Pywikibot/replace.py

From Wikibooks, open books for an open world
The source of this file is available in the Wikimedia Git repository.

Replace.py is part of the Pywikibot framework.

This bot replaces text. It can take the list of pages that might need changes from an XML dump, a text file, or one of the other page sources listed below, or it can change a single page. For more information, run:

python replace.py -help

Files

The bot uses three files in addition to the framework:

replace.py 
the main module
fixes.py 
a few predefined "fixes"
user-fixes.py 
a file for adding your own fixes; generate_user_files.py creates it nearly empty

Files that may be used for input and/or output:

filename.txt 
a file with a list of articles if specified with the parameter "-file"
filename.xml 
a local XML dump if used with parameter "-xml"
replacelog 
the log file, whose name can be set with the parameter "-log"

Parameters

Local

You can run replace.py with the following parameters (for example, python replace.py -file:articles_list.txt "errror" "error").

Source
-xml Retrieve information from a local XML dump (pages_current, see http://dumps.wikimedia.org/). Argument can also be given as "-xml:filename".
-file Work on all pages given in a local text file. Will read any [[wiki link]] and use these articles. Argument can also be given as "-file:filename".
-cat Work on all pages which are in a specific category. Argument can also be given as "-cat:categoryname".
-subcat Work in the same way as -cat, but also include subcategories.
-transcludes Work on all pages which transclude a specific template. Argument can also be given as "-transcludes:referredtemplate"; e.g. "-transcludes:stub" selects the pages that transclude the stub template.
-page Only edit a specific page. Argument can also be given as "-page:pagetitle". You can give this parameter multiple times to edit multiple pages.
-ref Work on all pages that link to a certain page. Argument can also be given as "-ref:referredpagetitle".
-filelinks Work on all pages that link to a certain image. Argument can also be given as "-filelinks:ImageName".
-links Work on all pages that are linked to from a certain page. Argument can also be given as "-links:linkingpagetitle".
-start Work on all pages in the wiki, starting at a given page. Choose "-start:!" to start at the beginning. Note: You are advised to use -xml instead of this option; this is meant for cases where there is no recent XML dump.
Replace parameters
-except:XYZ (older versions only) Ignore pages which contain XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.
-excepttitle:XYZ (newer versions only) Skip pages with titles that contain XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.
-excepttext:XYZ (newer versions only) Skip pages which contain the text XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.
-exceptinside:XYZ (newer versions only) Skip occurrences of the to-be-replaced text which lie within XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.
-exceptinsidetag:XYZ (newer versions only) Skip occurrences of the to-be-replaced text which lie within an XYZ tag. Possible values of XYZ include link, nowiki, ref, header, interwiki, and hyperlink.
-summary:XYZ Set the summary message text, bypassing the default edit summaries.
-fix:XYZ Perform one of the predefined replacement tasks, which are given in the dictionary 'fixes' defined inside the file fixes.py or user-fixes.py. The -regex argument and any replacements given on the command line are ignored if you use -fix. Currently available predefined fixes are:
  • HTML - convert HTML tags to wiki syntax, and fix XHTML.
  • syntax - try to fix bad wiki markup.
  • case-de - fix case errors in German.
  • grammar-de - fix grammar and typography in German.
-namespace:n Number of namespace to process. The parameter can be used multiple times. It works in combination with all other parameters except for the -start parameter. (If you want to change all pages in a particular namespace, add the namespace prefix; for example, -start:User:!.)
unnamed The first unnamed argument is the old text, the second is the new text. If the -regex argument is given, the first argument will be regarded as a regular expression, and the second argument may contain expressions like \1 or \g<name>.
Options
-always Do not ask for confirmation before each replacement.
-recursive Repeat the replacement until the text no longer changes.
-nocase Use case-insensitive search expressions (including regex).
-allowoverlap When occurrences of the pattern overlap, replace all of them. Warning! Do not use this option unless you know what you are doing, because it can easily lead to infinite loops.
-regex Make replacements using regular expressions. If this argument isn't given, the bot will make simple text replacements.
-dotall Make a dot (.) also match line breaks when using -regex.
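replace.py is written in Python, and the \1 and \g<name> forms mentioned above are the syntax of Python's re module. The following minimal sketch shows how -regex, -nocase, -dotall, -exceptinside and -recursive could fit together using only the standard re module. The names build_pattern, replace_text and the max_rounds guard are hypothetical, and the exception handling is simplified; this is not pywikibot's own implementation.

```python
import re


def build_pattern(old, regex=False, nocase=False, dotall=False):
    """Compile the search text according to -regex, -nocase and -dotall."""
    flags = 0
    if nocase:
        flags |= re.IGNORECASE  # like -nocase
    if dotall:
        flags |= re.DOTALL  # like -dotall
    # Without -regex the old text is matched literally, so escape it.
    return re.compile(old if regex else re.escape(old), flags)


def replace_text(text, old, new, regex=False, nocase=False, dotall=False,
                 exceptinside=(), recursive=False, max_rounds=100):
    """Replace `old` with `new`, leaving matches inside exception regions alone.

    `exceptinside` is a list of regular expressions; a match that overlaps
    a region matched by one of them is kept unchanged (in the spirit of
    -exceptinside). With `recursive`, the replacement is repeated until the
    text stops changing; `max_rounds` is this sketch's own (hypothetical)
    guard against the endless loops such repetition can cause.
    """
    pattern = build_pattern(old, regex, nocase, dotall)
    protected = []  # (start, end) spans of exception regions in the current text

    def substitute(match):
        for start, end in protected:
            if start < match.end() and match.start() < end:
                return match.group(0)  # inside an exception: keep as is
        # With -regex, expand \1 or \g<name>; otherwise insert `new` literally.
        return match.expand(new) if regex else new

    for _ in range(max_rounds):
        protected[:] = [m.span() for exc in exceptinside
                        for m in re.finditer(exc, text)]
        result = pattern.sub(substitute, text)
        if not recursive or result == text:
            return result
        text = result
    return text
```

For example, replace_text("x <nowiki>errror</nowiki> errror", "errror", "error", exceptinside=[r"<nowiki>.*?</nowiki>"]) fixes only the second occurrence, because the first lies inside the nowiki region.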

Global arguments available for all bots

-family:xyz Set the family of the wiki you want to work on, e.g. wikipedia, wiktionary, commons, wikitravel, …. This overrides the configuration in user-config.py. Default: user-config.py parameter family.
-lang:xx Set the language of the wiki you want to work on, where xx is the language code[1]. This overrides the configuration in user-config.py. Default: user-config.py parameter mylang.
-log Enable the logfile. Logs are stored in the logs subdirectory. Default: user-config.py parameter log.
-log:xyz Enable the logfile, using xyz as the filename.
-nolog Disable the logfile (if it is enabled by default).
-putthrottle:nn, -pt:nn Set the minimum time (in seconds) the bot will wait between saving pages. Default: user-config.py parameter putthrottle.
-verbose, -v Make the program print more detailed messages than usual about its current work or progress. This may be helpful when debugging or dealing with unusual situations. Default: not selected.
  1. Commons uses 'commons' for lang and family; Meta uses 'meta' for both.
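-putthrottle sets a minimum gap between saves. As an illustration only, here is a small sketch of such a throttle, assuming the delay is measured from the previous save. PutThrottle is a hypothetical class, not pywikibot's own throttle.

```python
import time


class PutThrottle:
    """Enforce a minimum delay between saves (the idea behind -putthrottle)."""

    def __init__(self, min_delay, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay  # seconds, like the nn in -pt:nn
        self.clock = clock          # injectable so the logic can be tested
        self.sleep = sleep
        self.last_save = None       # time of the previous save, if any

    def wait(self):
        """Block until at least min_delay seconds have passed since the last save."""
        now = self.clock()
        if self.last_save is not None:
            remaining = self.min_delay - (now - self.last_save)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_save = now
```

In a save loop, throttle.wait() would be called right before each save. Passing in clock and sleep makes the behaviour easy to check without actually waiting.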


Examples

If you want to change templates from the old syntax, e.g. {{msg:Stub}}, to the new syntax, e.g. {{Stub}}, download an XML dump file (page table) from http://dumps.wikimedia.org/, then use this command:

   python replace.py -xml -regex "{{msg:(.*?)}}" "{{\1}}"
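Before running the bot, the pattern can be tried out in plain Python with the standard re module, which uses the same regular expression syntax. The sample text below is made up for illustration; the second call shows why the non-greedy (.*?) matters when a page contains several templates.

```python
import re

# Pattern and replacement from the command above, as raw strings so that
# \1 reaches the regex engine unchanged (as it does from the shell).
OLD = r"{{msg:(.*?)}}"
NEW = r"{{\1}}"

text = "{{msg:Stub}} and {{msg:Cleanup}}"
result = re.sub(OLD, NEW, text)
print(result)  # {{Stub}} and {{Cleanup}}

# A greedy (.*) runs on to the last "}}" and swallows both templates.
greedy = re.sub(r"{{msg:(.*)}}", NEW, text)
print(greedy)  # {{Stub}} and {{msg:Cleanup}}
```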

Note that you can match patterns that span more than one line:

   python replace.py -regex -start:! "First line\nSecond line" ""
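Inside double quotes the shell leaves \n alone, so the bot receives a backslash followed by n, and the regular expression engine reads that as a newline. A Python raw string mimics this; the sample page text is made up.

```python
import re

# Raw string: "\n" stays a backslash plus "n", just as the shell passes it.
pattern = r"First line\nSecond line"

text = "Intro\nFirst line\nSecond line\nOutro"
result = re.sub(pattern, "", text)
print(repr(result))  # 'Intro\n\nOutro'
```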

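replace.py can also be used to insert or append text to a page. Note that the replacement text below contains an embedded newline; the > at the start of the second line is the shell's continuation prompt, not part of the text: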

   python replace.py -regex '(?ms)^(.*)$' "\1
    > [[Category:NewCat]]"
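The effect of this pattern can be checked with Python's re.sub. In this sketch the helper name append_category is hypothetical, and count=1 is its own choice: with a plain re.sub, a page ending in a newline could otherwise also produce an empty match at the very end, appending the category twice.

```python
import re


def append_category(text, category="[[Category:NewCat]]"):
    """Append `category` on a new line after the whole page text."""
    # (?m): ^ and $ match at line boundaries; (?s): . also matches newlines,
    # so (.*) captures everything from the first line to the end.
    replacement = "\\1\n" + category  # \1 = the whole page, then a newline
    # count=1: perform exactly one substitution (the whole-page match).
    return re.sub(r"(?ms)^(.*)$", replacement, text, count=1)


print(append_category("Some text."))
```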

If you have a dump called foobar.xml and want to fix typos, e.g. Errror -> Error, use this:

   python replace.py -xml:foobar.xml "Errror" "Error"
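With -xml, the dump is used to find the pages that might need changes. The sketch below illustrates that filtering idea with Python's standard xml.etree.ElementTree: it streams through a MediaWiki XML export and yields the titles of pages whose text contains the old text. It is not pywikibot's dump reader; titles_containing and the tiny sample dump are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def titles_containing(dump, old):
    """Yield titles of pages in a MediaWiki XML export whose text contains `old`.

    Tags are compared without their XML namespace, because the namespace
    URI changes between export format versions.
    """
    title = None
    for _event, elem in ET.iterparse(dump, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # strip a leading "{namespace}"
        if tag == "title":
            title = elem.text
        elif tag == "text":
            if old in (elem.text or ""):
                yield title
        elif tag == "page":
            elem.clear()  # free memory for pages already processed


SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Foo</title><revision><text>An Errror here.</text></revision></page>
  <page><title>Bar</title><revision><text>Nothing to fix.</text></revision></page>
</mediawiki>"""

found = list(titles_containing(io.BytesIO(SAMPLE.encode("utf-8")), "Errror"))
print(found)  # ['Foo']
```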

If you have a page called 'John Doe' and want to convert HTML tags to wiki syntax, use:

   python replace.py -page:John_Doe -fix:HTML
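A fix is essentially an ordered list of regular expression replacements applied to the page text. The sketch below shows that idea with a hypothetical, much-reduced stand-in for an HTML-to-wiki fix; the real HTML fix in fixes.py has its own, longer list, and apply_fix is not a pywikibot function.

```python
import re

# Hypothetical illustration only, not the contents of fixes.py.
HTML_FIX = [
    (r"(?i)<b>(.*?)</b>", r"'''\1'''"),  # bold tag -> wiki bold
    (r"(?i)<i>(.*?)</i>", r"''\1''"),    # italic tag -> wiki italics
    (r"(?i)<br>", "<br />"),             # XHTML-style line break
]


def apply_fix(text, replacements):
    """Apply (pattern, replacement) pairs in order, each to the whole text."""
    for old, new in replacements:
        text = re.sub(old, new, text)
    return text


print(apply_fix("<B>bold</B>, <i>italic</i><br>", HTML_FIX))
```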

If you do not give the old and new text on the command line, the bot prompts you for one or more replacements:

   python replace.py -file:blah.txt

Unless -always is given, the script asks for confirmation before modifying an article. Double-check the result to make sure that the bot did not introduce errors (especially when fixing misspelled words). You can also specify a set of articles in an external text file containing wiki links:

 [[plane]]
 [[vehicle]]
 [[train]]
 [[car]]

The bot is then called using something like:

 python replace.py [global-arguments] -file:articles_list.txt "errror" "error" 
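The -file option reads every [[wiki link]] in the file. A rough sketch of that extraction with a regular expression follows; the pattern, the function name and the dropping of duplicates are assumptions of this sketch, not the framework's exact behaviour.

```python
import re

# Captures the target of a [[wiki link]], stopping at "|" (label) or "#" (section).
LINK = re.compile(r"\[\[([^\]|#]+)")


def titles_from_file_text(text):
    """Return the linked page titles in order, without duplicates."""
    titles = []
    for match in LINK.finditer(text):
        title = match.group(1).strip()
        if title not in titles:
            titles.append(title)
    return titles


print(titles_from_file_text("[[plane]]\n[[vehicle]]\n[[train]]\n[[car]]"))
# ['plane', 'vehicle', 'train', 'car']
```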

Rather than specifying regular expressions on the command line, it is often better to define them as a fix in user-fixes.py and select it with -fix:

 python replace.py -file:articles_list.txt -fix:example2
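A fix is an entry in the fixes dictionary. The sketch below follows the layout of the example that generate_user_files.py writes into user-fixes.py in recent pywikibot versions (keys 'regex', 'msg' and 'replacements'); older versions may differ. The fix name example2, its summary and its replacements are hypothetical.

```python
# user-fixes.py is executed by the framework, which provides the `fixes`
# dictionary. It is defined here only so that this sketch runs on its own.
fixes = {}

# Hypothetical fix matching the -fix:example2 call above.
fixes['example2'] = {
    'regex': True,  # treat the patterns below as regular expressions
    'msg': {
        '_default': 'Bot: fixing common typos',  # edit summary
    },
    'replacements': [
        (r'\berrror\b', 'error'),
        (r'\brecieve\b', 'receive'),
    ],
}
```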

Example: Replacing multiple paragraphs

The original text of the page Meta:Sandbox is:

This page is for any tests.

Welcome to the sandbox!

To swap the two paragraphs (so that the second one comes before the first), run:

python replace.py -page:Meta:Sandbox -regex "This page is for any tests.\n\nWelcome to the sandbox!" "Welcome to the sandbox!\n\nThis page is for any tests."

In both the pattern and the replacement text, \n stands for a newline.
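The same swap can also be written with capture groups, so each paragraph is typed only once and written back in reverse order with \2 and \1. A sketch in Python's re syntax, assuming the paragraphs are separated by a single blank line (\n\n); the escaped full stop keeps "." from matching any character.

```python
import re

text = "This page is for any tests.\n\nWelcome to the sandbox!"

# Capture each paragraph, then emit them in reverse order.
swapped = re.sub(
    r"(This page is for any tests\.)\n\n(Welcome to the sandbox!)",
    r"\2\n\n\1",
    text,
)
print(swapped)
```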

External links