User:Whiteknight/Bot

From Wikibooks, open books for an open world
Due to a severe shortage of time, I am no longer able to be an active member of the Wikibooks project. Please see my page for more details about my absence. I will continue to monitor my talk page for correspondence and will try to reply quickly when possible.

This page is about the various automated and semi-automated tools that I create for use with Wikibooks. These are not to be confused with my Book Creator gadgets, which are semi-automated JavaScript tools. My bot scripts are all written in Perl.

Projects

This is a list of projects that I am working on, will be working on, or have basically already completed. These are all automated or semi-automated tools for creating, editing, or analyzing books and pages on Wikibooks.

  • Book designer gadget
  • Print version gadget
  • Book categorizer gadget
  • Page Transformer
  • User:Whiteknight/WEP

Current Status

The bot is in good working order.

Requesting Help

To request help from my bot, especially if you know Perl, use the User:Whiteknight/BotTask template on my user talk page as follows:

{{subst:User:Whiteknight/BotTask
    |init=
    |transform=
    |basepage=
    |pagelist=
    |summary=
    |getmode=
}}

Where:

init=
This is code that gets run before any page transformations are performed
transform=
This is code that is applied to every page
basepage=
This is the "main page" of a book, such as the book's TOC
pagelist=
A single page, or a list of pages separated by commas.
summary=
The edit summary to use
getmode=
The mode to use. Use "NONE" for no special get mode. Use "BOOK" to load the page list from the basepage (such as a book TOC). Other modes will be available eventually.

I double-check all requests before they are performed. If you don't know exactly what to put for init= or transform=, write a plain-English description and I will try to translate it for you.

WBRegex Page Processor

This is the primary tool in my bot arsenal. The page processor is a script that allows the user to apply an arbitrary amount of Perl code to the text of a page. It uses the Wikitarget.pm module to specify a list of pages, and the Wikisession.pm module to apply a transformation function to each page in the list. Each transformation has access to the previous and the next page in the list (for navigation templates).
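As a rough illustration of how a transformation can use the previous and next page in the list, here is a minimal sketch. It does not use the real modules: the page list, the in-memory `%wikitext` hash, and the `add_navigation` function are all hypothetical stand-ins chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical page list; in the real tool this would come from Wikitarget.pm.
my @pages = ("My Book/Intro", "My Book/Basics", "My Book/Advanced");

# Hypothetical in-memory stand-in for page text fetched from the server.
my %wikitext = map { $_ => "Text of $_\n" } @pages;

# Transformation: prepend a simple navigation line built from the
# previous and next page names (undef at either end of the list).
sub add_navigation {
    my ($text, $prev, $next) = @_;
    my @links;
    push @links, "[[$prev|Previous]]" if defined $prev;
    push @links, "[[$next|Next]]"     if defined $next;
    return join(" | ", @links) . "\n" . $text;
}

for my $i (0 .. $#pages) {
    my $prev = $i > 0       ? $pages[$i - 1] : undef;
    my $next = $i < $#pages ? $pages[$i + 1] : undef;
    $wikitext{ $pages[$i] } = add_navigation($wikitext{ $pages[$i] }, $prev, $next);
}

print $wikitext{"My Book/Basics"};
```

The first and last pages simply get a single link, since one of their neighbours is undefined.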

Common tasks can be saved as "filters". A filter is a piece of Perl code that takes the current wikitext as input and outputs a transformed version of that text for uploading. Filters can be loaded into the page processor, or they can be used as stand-alone programs to modify one page at a time.
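A filter in this sense might look something like the sketch below. The function name `add_category_filter` and its second argument are assumptions made for illustration; the actual filter interface is not documented here.

```perl
use strict;
use warnings;

# Hypothetical filter: takes the current wikitext, returns the transformed text.
# It appends a category link unless the page already carries that category.
sub add_category_filter {
    my ($text, $category) = @_;
    return $text if $text =~ /\[\[Category:\Q$category\E(?:\|[^\]]*)?\]\]/;
    $text =~ s/\s*\z/\n/;    # collapse trailing whitespace to one newline
    return $text . "[[Category:$category]]\n";
}

# Stand-alone use on a literal string; a real script might read STDIN instead.
my $page = "== Intro ==\nSome text.\n\n";
my $out  = add_category_filter($page, "Perl");
print $out;
```

Because the filter checks for the category first, running it twice on the same page leaves the text unchanged the second time.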

Because the page processor can execute an arbitrary amount of code on a per-page basis, various graphical widgets or user interfaces can be defined. The code is also able to add new pages to the list, or to modify the code that will be executed on later pages in the list. Some common tasks for the processor are:

  1. Adding or removing templates, categories, or links.
  2. Performing pattern-matching substitutions.
  3. Creating new pages from a template or model
  4. Examining the text of a page to make decisions based on existing content.

This program cannot move pages, delete pages, protect pages, or perform various other administrative tasks. It operates solely on the text of the page.

To Do:

Note: Availability

The code of my libraries and of my WBRegex page processor is not currently freely available.

Framework Reference

This section serves as a reference to my framework. It shows some of the things the framework can currently do, and gives some idea of what will be done in the future.

Wikisession.pm

This is the "core" module of my framework, and the central point on which all other modules rely. Wikisession.pm exports a class (Wikisession) that is an extension of the LWP::UserAgent class. This module maintains session data so that the user can log in to the Wikimedia server with a supplied username and password. All edits made by the library instance will then be attributed to that username.

This module exports a number of methods, of which the most important are:

new
This function takes a hash of values, such as "Username", "Password", "Server", etc., and returns a reference to a session object.
Login, Logout
These functions log in to Wikibooks, and log out of it, using the supplied HTML::Cookiejar object for storing the session information. If no cookiejar is supplied, the login information will be stored in a temporary location.
GetPageText, GetSectionText
These functions return the wikitext of a page or a particular section on a page.
PostPageText, PostSectionText, PostNewSection
These functions post text to the page. The first function deletes all previous text on the page, and replaces it with the supplied text. The second function does the same thing, but only in a specific section. The third function posts a new section to the page, with supplied header and text.
ProcessPageText, ProcessSectionText
These are the heart and soul of the module. Both of these functions take, as one of the parameters, a function reference. The passed function reference is called on the current text of the page, and the new text is posted to the page. In essence, if we have our processing function "my_function", the two following pieces of code are nearly identical:
$session->ProcessPageText("User:Whiteknight/bottest", "my summary", \&my_function);

and

my $text = $session->GetPageText("User:Whiteknight/bottest");
my $newtext = my_function($text);
$session->PostPageText("User:Whiteknight/bottest", $newtext, "my summary");

They are identical in function except that the second piece of code makes an extra HTTP request to the server, and creates a lot more overhead. Any additional parameters that are passed to ProcessPageText are passed directly to the subroutine. The "complete" C-style function prototype of the function is:

ProcessPageText($page, $summary, \&func, @args);

and if "$text" is the complete text from "$page", then \&func is called internally as follows:

$func->($text, @args);

So the processing functions can take any number of additional arguments, in any format, so long as the first argument is a piece of text (which the function is free to disregard).
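The dispatch described above can be tried locally with a mock. The sketch below is not Wikisession.pm: `MockSession` is a hypothetical class that keeps pages in a hash instead of talking to a server, and its method names follow the method list above. It only shows how extra arguments could be passed through to the processing function.

```perl
use strict;
use warnings;

# Mock session: NOT the real Wikisession.pm. Pages live in a hash
# instead of on a server, so the dispatch logic can be run locally.
package MockSession;

sub new { my ($class, %pages) = @_; return bless { pages => {%pages}, log => [] }, $class }

sub GetPageText { my ($self, $page) = @_; return $self->{pages}{$page} // "" }

sub PostPageText {
    my ($self, $page, $text, $summary) = @_;
    $self->{pages}{$page} = $text;
    push @{ $self->{log} }, "$page: $summary";   # record the edit summary
    return 1;
}

# Fetch the text, call the supplied function as $func->($text, @args),
# and post whatever it returns.
sub ProcessPageText {
    my ($self, $page, $summary, $func, @args) = @_;
    my $text = $self->GetPageText($page);
    return $self->PostPageText($page, $func->($text, @args), $summary);
}

package main;

# Processing function with two extra arguments passed through @args.
sub replace_word {
    my ($text, $from, $to) = @_;
    $text =~ s/\b\Q$from\E\b/$to/g;
    return $text;
}

my $session = MockSession->new("User:Whiteknight/bottest" => "colour and colour");
$session->ProcessPageText("User:Whiteknight/bottest", "my summary",
                          \&replace_word, "colour", "color");
print $session->GetPageText("User:Whiteknight/bottest"), "\n";
```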

None of the editing functions in this module explicitly account for edit conflicts. Instead, they fail silently. I may add functionality to change this in the future.

Wikihandle.pm

The Wikihandle Perl module creates a tied filehandle class that can be used for regular file-style I/O with the server. For instance:

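use Wikisession;    # modules are loaded by package name, without ".pm"
use Wikihandle;
# $session is assumed to be an already logged-in Wikisession object
tie(*WIKI, "Wikihandle", $session);
open WIKI, "User:Whiteknight/bottest";
my @text = <WIKI>;    # read the current page text, one line per element
print @text;
print WIKI "This is going to be the new page text!!!\n";
close WIKI;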

This module is nearly at a good place, but there are a few issues worth considering. Only a few of the tied-handle methods are implemented: READLINE, PRINT, PRINTF, OPEN, and CLOSE. The other methods either haven't been implemented or make no sense in this context. The module accepts the "<", ">", and ">>" modifiers to make the page readable, writable, or appendable, respectively. No other modes are allowed.
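For readers unfamiliar with tied filehandles, the following is a minimal sketch of the general technique, not the Wikihandle.pm source. `MockWikiHandle` is a hypothetical class backed by an in-memory hash rather than a session object, and the way it parses the mode prefix and buffers output until close is an assumption made for this example.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the server: page name => wikitext.
my %server = ("User:Whiteknight/bottest" => "line one\nline two\n");

package MockWikiHandle;

# tie() calls TIEHANDLE; the real module receives a session object instead.
sub TIEHANDLE { my ($class, $store) = @_; return bless { store => $store }, $class }

# open() on the tied handle: accept an optional "<", ">" or ">>" prefix.
sub OPEN {
    my ($self, $spec) = @_;
    my ($mode, $page) = $spec =~ /^\s*(>>|>|<)?\s*(.+?)\s*$/;
    $self->{mode}  = $mode // "<";
    $self->{page}  = $page;
    $self->{lines} = [ split /^/, $self->{store}{$page} // "" ];
    $self->{out}   = "";
    return 1;
}

# <HANDLE>: one line in scalar context, all remaining lines in list context.
sub READLINE {
    my ($self) = @_;
    return wantarray ? splice(@{ $self->{lines} }) : shift @{ $self->{lines} };
}

sub PRINT { my ($self, @items) = @_; $self->{out} .= join("", @items); return 1 }

# close(): write buffered output back, replacing or appending by mode.
sub CLOSE {
    my ($self) = @_;
    my $page = $self->{page};
    if    ($self->{mode} eq ">")  { $self->{store}{$page}  = $self->{out} }
    elsif ($self->{mode} eq ">>") { $self->{store}{$page} .= $self->{out} }
    return 1;
}

package main;

tie(*WIKI, "MockWikiHandle", \%server);
open(WIKI, ">>User:Whiteknight/bottest") or die "open failed";
print WIKI "line three\n";
close(WIKI);
print $server{"User:Whiteknight/bottest"};
```

Buffering writes until close keeps each open/close pair down to a single update of the stored page, which is one plausible design for a handle that ultimately posts to a server.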

Wikitarget.pm

This module is used to generate a list of pages on a wiki. The pages can be added manually, or they can be loaded from a TOC page. They cannot currently be loaded from an automatically generated page list such as Special:Prefixindex, DPL, Whatlinkshere, or categories.
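Loading a page list from a TOC page could work roughly as in the sketch below, which pulls wikilinks out of the TOC text and keeps those that point at subpages of the book. The function `pages_from_toc` and the sample TOC are hypothetical; how Wikitarget.pm actually parses a TOC is not documented here.

```perl
use strict;
use warnings;

# Extract the subpages of $basepage that are linked from TOC wikitext.
# Handles [[Book/Page]], [[Book/Page|label]] and relative [[/Page/]] links.
sub pages_from_toc {
    my ($basepage, $toc_text) = @_;
    my (@pages, %seen);
    while ($toc_text =~ /\[\[([^\]\|#]+)/g) {
        my $target = $1;
        $target =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
        if ($target =~ m{^/(.+?)/?$}) {         # relative subpage link
            $target = "$basepage/$1";
        }
        next unless index($target, "$basepage/") == 0;   # keep only this book's pages
        push @pages, $target unless $seen{$target}++;    # skip duplicates
    }
    return @pages;
}

my $toc = <<'END';
* [[My Book/Intro|Introduction]]
* [[/Basics/]]
* [[My Book/Intro]]
* [[Category:Perl]]
END

print "$_\n" for pages_from_toc("My Book", $toc);
```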