Pywikibot/weblinkchecker.py

weblinkchecker.py is a script in the Pywikibot framework that finds broken external links.

weblinkchecker.py can check either all URLs found in a single article or those in all articles (in alphabetical order). It only checks HTTP and HTTPS links, and it skips URLs inside HTML comments and nowiki tags. To speed things up, it checks up to 50 links at the same time using multiple threads.
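The two ideas in this paragraph (collect only HTTP/HTTPS links outside comments and nowiki sections, then check a bounded number of them in parallel) can be sketched as follows. This is a minimal illustration, not Pywikibot's actual code: `extract_http_links`, `check_all` and `is_alive` are hypothetical names, the regular expressions are deliberately rough (they ignore, for example, self-closing `<nowiki />` tags and URLs generated by templates), and `is_alive` is a naive HEAD-request check that some servers will reject.

```python
import re
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helpers for illustration only; not taken from weblinkchecker.py.
# Remove HTML comments and <nowiki>...</nowiki> sections before scanning.
_IGNORED = re.compile(r"<!--.*?-->|<nowiki>.*?</nowiki>", re.DOTALL | re.IGNORECASE)

# Rough pattern for http:// and https:// URLs; it stops at whitespace and at
# characters that typically end an external link in wikitext.
_URL = re.compile(r'https?://[^\s\[\]<>"|{}]+', re.IGNORECASE)


def extract_http_links(wikitext: str) -> list[str]:
    """Return HTTP(S) URLs in order of first appearance, without duplicates."""
    visible = _IGNORED.sub("", wikitext)
    return list(dict.fromkeys(_URL.findall(visible)))


def is_alive(url: str, timeout: float = 10) -> bool:
    """Naive check: send a HEAD request and treat any status below 400 as alive."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # HTTPError (4xx/5xx) is a subclass of URLError; timeouts are OSErrors.
        return False


def check_all(urls: list[str], check=is_alive, max_workers: int = 50) -> dict[str, bool]:
    """Run check(url) on at most max_workers threads at once; map each URL to its result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        return dict(zip(urls, pool.map(check, urls)))
```

For example, `check_all(extract_http_links(page_text))` would check every HTTP(S) link on a page with up to 50 concurrent requests. Passing the checking function as a parameter keeps the concurrency logic separate from the network code.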

The bot does not remove external links by itself; it only reports them, because removing them would require strong artificial intelligence. It reports a link as dead only if the link has been found offline at least twice, with at least one week between the first and the last failure. This prevents users from removing links just because of a temporary server failure. Keep in mind that the bot cannot yet distinguish a failure of your own connection from a server failure, so make sure you are on a stable Internet connection.
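The reporting rule above (at least two failures, with the first and last at least one week apart) can be expressed as a small predicate. This is a sketch of the rule as described here, not Pywikibot's implementation; `should_report` is a hypothetical name, and it assumes the failure times for one URL are available as a list of `datetime` values.

```python
from datetime import datetime, timedelta

ONE_WEEK = timedelta(weeks=1)


def should_report(failures: list[datetime], min_gap: timedelta = ONE_WEEK) -> bool:
    """Return True if a link failed at least twice and the first and last
    failures are at least min_gap apart (hypothetical helper)."""
    if len(failures) < 2:
        return False
    return max(failures) - min(failures) >= min_gap
```

Two failures three days apart are not enough; a third failure exactly one week after the first one is.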

The bot saves a history of broken links in the deadlinks subdirectory, e.g. deadlinks/deadlinks-wikipedia-de.dat. This file is not intended to be read or modified by humans. The .dat file is written when the bot terminates, either because it has finished or because the user pressed Ctrl+C.
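Writing the history only at termination, including after Ctrl+C, fits naturally with a `try`/`finally` block, because pressing Ctrl+C raises `KeyboardInterrupt` and `finally` still runs. The sketch below assumes pickle serialization and a `{url: [failure records]}` layout; `history_path`, `load_history`, `save_history` and `run` are hypothetical names, and the file-name pattern is only inferred from the deadlinks-wikipedia-de.dat example. Since unpickling can execute code, such a file should only ever be loaded from a trusted source, such as your own bot's output.

```python
import os
import pickle


def history_path(family: str, lang: str, directory: str = "deadlinks") -> str:
    """Build a path such as deadlinks/deadlinks-wikipedia-de.dat (pattern assumed)."""
    return os.path.join(directory, f"deadlinks-{family}-{lang}.dat")


def load_history(path: str) -> dict:
    """Load the stored {url: [failure records]} mapping, or start empty."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {}


def save_history(path: str, history: dict) -> None:
    """Write the history, creating the directory if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(history, f)


def run(path: str, work) -> dict:
    """Call work(history) and save the history afterwards, even on Ctrl+C."""
    history = load_history(path)
    try:
        work(history)
    finally:
        # Runs on normal completion and on KeyboardInterrupt (Ctrl+C).
        save_history(path, history)
    return history
```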

After the bot has checked some pages, run it on the same pages again at a later time with this command:

python weblinkchecker.py -repeat

If the bot finds a broken link that was already broken at least one week earlier, it logs it in a text file, e.g. deadlinks/results-wikipedia-de.txt. The log is formatted so that it can be posted on the wiki, where others can help you fix or remove the broken links from the affected pages.

Additionally, the bot can report broken links on the talk page of the article in which the URL was found (again, only once the link has been found unavailable at least twice, with at least one week between the first and the last failure). To use this feature, set report_dead_links_on_talk = True in your user-config.py.

Reports include a link to an archived copy in the Internet Archive's Wayback Machine, if one is available, so that important references can be kept.
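One way to look up such an archived copy is the Internet Archive's public availability endpoint, https://archive.org/wayback/available, which returns JSON with an `archived_snapshots.closest` entry when a snapshot exists. The sketch below is not necessarily how weblinkchecker.py finds archived copies; `availability_query`, `closest_snapshot` and `find_archived_copy` are hypothetical names, and only `find_archived_copy` touches the network.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the availability-API request URL for a (possibly dead) link."""
    return AVAILABILITY_API + "?" + urllib.parse.urlencode({"url": url})


def closest_snapshot(payload: dict) -> Optional[str]:
    """Extract the closest snapshot URL from the API's JSON, if there is one."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url: str, timeout: float = 10) -> Optional[str]:
    """Query the Wayback Machine and return a snapshot URL or None."""
    with urllib.request.urlopen(availability_query(url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Keeping the JSON parsing in `closest_snapshot` separate from the HTTP request makes the parsing logic easy to test without network access.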

For an explanation of the command-line syntax, run:

python weblinkchecker.py -help

See also