Perlwikibot/clean sandbox.pl

clean_sandbox.pl is a simple script that uses the unreleased 3.0 version of perlwikibot. It cleans a wiki's sandbox by submitting an edit with some standard text, overwriting whatever was there previously.

Preamble

#!/usr/bin/perl
use strict;
use warnings;

use utf8;
use Carp;
use Getopt::Long;
use Pod::Usage;
use Config::General qw(ParseConfig);

use MediaWiki::Bot 3.0.0;

our $VERSION = '0.0.1'; # Package variable, so other code can read it

Here, we use two pragmas that should be considered mandatory: strict and warnings. These are tools to make you a better programmer by forcing you to follow some rules.

We also use Getopt::Long to get command line options from the user. This greatly simplifies parsing @ARGV - you should never do it yourself. Always use a Perl module - Getopt::Long is one, but there are others.

Pod::Usage allows us to give the user man-style documentation using Perl's POD markup embedded in the source code.

Config::General allows us to easily parse configuration files. Again, there are others, but this one provides several features we'll make use of, like heredocs.

Finally, we need MediaWiki::Bot version 3.0.0 or higher. 3.0.0 is still under development, so method calls are subject to change.

The $VERSION string will be used later, and might be used by other scripts that use this file, like we did for MediaWiki::Bot.

POD

=head1 NAME

clean_sandbox.pl - Cleans a wiki's sandbox

=head1 SYNOPSIS

clean_sandbox.pl --wiki=meta --summary="Cleaning the sandbox"

Options:
    --help      Open this man page and exit
    --version   Print version information and exit
    --dry-run   Do everything but edit
    --debug     Print debug output
    --wiki      Sets which wiki to use
    --text      Sets what text to use
    --page      Sets what page to edit
    --summary   Sets an edit summary
    --username  Sets a non-default username
    --password  Request a password prompt

=head1 OPTIONS

=cut

Here, we provide some documentation for the user. This uses POD (see perldoc perlpod), and will be parsed and shown to the user if they provide the --help option. The =head1 control inserts a heading with the rest of the text on the line. More POD is included in the rest of the file, but isn't included in this page.

Parsing the command line

my $help;
my $version;
my $dry_run;
my $debug;
my $wiki;
my $text;
my $page;
my $summary;
my $username;
my $password;

GetOptions(
    'help|h|?'      => \$help,
    'version|v'     => \$version,
    'dry-run'       => \$dry_run,
    'debug|verbose' => \$debug,
    'wiki=s'        => \$wiki,
    'text=s'        => \$text,
    'page=s'        => \$page,
    'summary=s'     => \$summary,
    'username=s'    => \$username,
    'password:s'    => \$password, # We can ask for it interactively!
);

This declares variables for all our command line options, and gets Getopt::Long to parse @ARGV and assign to those variables for us. This is much better than attempting to do so manually.

On the left is the option name and any aliases. For example, help|h|? provides a name (help) and two aliases (h and ?); whether that option is present on the command line is recorded in $help. Others, like dry-run, have only the canonical name. Still others take a mandatory parameter, like wiki=s: the = indicates that the parameter is mandatory, and the s indicates that it is a string. Lastly, password has an optional string parameter, indicated by :s. The parameter is optional because we prefer to prompt for the password interactively: command line arguments are visible to all users on the system, so passing a password that way should be avoided.
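The three kinds of option spec can be tried out in isolation. The following is a minimal sketch, not part of clean_sandbox.pl: parse_args and password_state are hypothetical helpers, and the argument lists are made up. It uses GetOptionsFromArray from Getopt::Long so that a list other than the real @ARGV can be parsed.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse an explicit argument list instead of @ARGV.
sub parse_args {
    my @args = @_;
    my %opt;
    GetOptionsFromArray(
        \@args,
        'help|h|?'   => \$opt{help},      # flag with two aliases
        'wiki=s'     => \$opt{wiki},      # mandatory string value
        'password:s' => \$opt{password},  # optional string value
    ) or die "Bad command line\n";
    return \%opt;
}

# Hypothetical helper: describe which of the three states $password is in.
sub password_state {
    my ($password) = @_;
    return 'not requested' unless defined $password;
    return 'prompt'        if $password eq '';
    return 'given inline';
}

for my $args ( [], ['--password'], ['--password=secret'] ) {
    my $opt = parse_args(@$args);
    printf "%-20s => %s\n", "@$args", password_state($opt->{password});
}
```

An option with an optional string value ends up in one of three states: undefined when the option is absent, an empty string when it is given without a value, and the value itself otherwise. Only the middle state calls for an interactive prompt.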

Version data

if ($version) {
    require File::Basename;
    my $script = File::Basename::basename($0);
    print "$script version $VERSION\n" and exit;
}

If the user specified --version on the command line, $version will be true. We print a simple message containing the version string we declared above, then exit.

Prompt for password

if (defined $password and $password eq '') { # --password was given without a value, so prompt for it
    require Term::ReadKey;

    print "[clean_sandbox.pl] password: ";
    Term::ReadKey::ReadMode('noecho');      # Don't show the password
    $password = Term::ReadKey::ReadLine(0);
    Term::ReadKey::ReadMode('restore');     # Don't bork their terminal
    chomp $password;                        # Drop the trailing newline
    print "\n";                             # The user's Enter key wasn't echoed
}

If --password was specified on the command line without a value, we interactively prompt for the password. With the :s specifier, Getopt::Long leaves $password undefined when the option is absent and sets it to an empty string when the option is given with no value, so that is the case we test for. To read the password we use Term::ReadKey, which provides several functions useful for this task. Note that require is evaluated at runtime, whereas use is evaluated at compile time, even if the surrounding code would never run. use also calls import() to bring the module's default exports into the current package, whereas require doesn't. We could call import() ourselves, but it is just as easy not to in this case.

We'll use a standard method of reading in the password. First, we show the prompt, then set the terminal to 'noecho' read mode, which means the user's keystrokes aren't displayed on screen. Next, we read in the user's input and assign it to $password. Until now, this variable only recorded how --password was used on the command line; now it holds the text of the password. Finally, we restore the original characteristics of the user's terminal. If we didn't, it would keep operating in 'noecho' mode, which the user won't like.
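The runtime behaviour of require can be seen with a core module. This is a minimal sketch: script_name is a hypothetical helper, and the path is made up. Because require does not import anything, the function is called by its fully qualified name.

```perl
use strict;
use warnings;

# Hypothetical helper: load File::Basename only when it is actually needed.
sub script_name {
    my ($path) = @_;
    require File::Basename;                  # loaded at runtime, on the first call
    return File::Basename::basename($path);  # nothing was imported, so qualify the name
}

print script_name('/usr/local/bin/clean_sandbox.pl'), "\n";
```

A script that only occasionally needs a module, as with --version or --password here, avoids the cost of loading it on every run.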

Reading configuration

if (!$username or !$password or !$wiki) {
    warn 'Reading config/main.conf' if $debug;
    my %main = ParseConfig (
        -ConfigFile     => 'config/main.conf',
        -LowerCaseNames => 1,
        -AutoTrue       => 1,
        -UTF8           => 1,
    );
    $username = $main{'default'} unless $username;
    die "I can't figure out what account to use! Try setting default in config/main.conf, or use --username" unless $username;
    warn "Using $username" if $debug;
    die "There's no block for $username and you didn't specify enough data on the command line to continue" unless $main{'bot'}{$username};

    $password = $main{'bot'}{$username}{'password'} if (!$password);
    warn "Setting \$password" if $debug;
    $wiki = $main{'bot'}{$username}{'wiki'} unless $wiki;
    warn "Setting \$wiki to $wiki" if $debug;
}

If we don't have all of username, password, and wiki already, then we should read them in from a config file. Config::General provides the ParseConfig function to do this.

We give it the filename (a relative path, so it is resolved against the current working directory), and a few options. UTF8 is important because this file can and will include UTF-8 characters under many circumstances. An example config file:

default = Mike's bot account

<bot Mike's bot account>
    password    = fake password
    wiki        = enwikibooks
</bot>

When Config::General reads this in, it creates a hash which represents the data. Once we have that hash, we try to get the data we're missing, and warn the user if we can't accomplish that.
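To make the lookup concrete, here is a minimal sketch that does not call Config::General at all. The %main hash is a hand-written stand-in for what ParseConfig might return for the example file, so its exact shape is an assumption, and account_for is a hypothetical helper.

```perl
use strict;
use warnings;

# Hand-written stand-in for the parsed config; the real shape depends on
# the options passed to ParseConfig, so treat this as an assumption.
my %main = (
    default => "Mike's bot account",
    bot     => {
        "Mike's bot account" => {
            password => 'fake password',
            wiki     => 'enwikibooks',
        },
    },
);

# Hypothetical helper: pick the account block for a username,
# falling back to the configured default.
sub account_for {
    my ($main, $username) = @_;
    $username ||= $main->{default};
    die "No username given and no default configured\n" unless $username;
    my $account = $main->{bot}{$username}
        or die "There's no block for $username\n";
    return ($username, $account->{password}, $account->{wiki});
}

my ($user, $pass, $wiki) = account_for(\%main);
print "Using $user on $wiki\n";
```

The named block becomes a nested hash keyed by the block's name, which is why the script indexes $main{'bot'}{$username}.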

my $bot = MediaWiki::Bot->new(); # Create a default object so we can query sitematrix if need be
$bot->{'debug'} = $debug;

Notice that some of the warn statements are conditional on $debug. This is another command line flag that asks the script to output additional information about what it is doing to make debugging easier. We ask MediaWiki::Bot to do the same.

my $domain;
if (!$text or !$page or !$summary) {
    warn 'Reading config/clean_sandbox.conf' if $debug;
    my %conf = ParseConfig (
        -ConfigFile     => 'config/clean_sandbox.conf',
        -LowerCaseNames => 1,
        -UTF8           => 1,
    );
    if ($wiki =~ m/\w\.\w/) {
        $domain = $wiki;
        $wiki = $bot->domain_to_db($wiki);
    }
    %conf = %{ $conf{$wiki} }; # Keep just the part we want.

    $text    = $conf{'text'} unless $text; warn "Setting \$text to $text" if $debug;
    $page    = $conf{'page'} unless $page; warn "Setting \$page to $page" if $debug;
    $summary = $conf{'summary'} unless $summary; warn "Setting \$summary to $summary" if $debug;
}

Here, we use Config::General::ParseConfig to parse another config file. This one contains data for many wikis: where each sandbox is located, what standard text should be put on it, and what edit summary should be used. This is useful because some bot operators might not speak the language of the wiki whose sandbox their bot cleans. It also means they don't have to specify --page "Project:Sandbox" --text "{{/Don't edit this line}}\n<!--Practice your editing here-->" --summary "Bot: cleaning sandbox" every time. That data can be stored in the config file instead.

Because this file contains data for many wikis, and we don't need all of it, we throw away most of it. ParseConfig creates a large hash, but we keep only the part about the wiki we actually want to edit. Then, we find any data we're missing.
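The "keep one wiki's block, then fill in what is missing" step can be sketched on its own. In this sketch fill_missing is a hypothetical helper, and the %conf contents are placeholder values rather than the real config/clean_sandbox.conf. Unlike the script, it checks that a block for the wiki exists before dereferencing it.

```perl
use strict;
use warnings;

# Hypothetical helper: take the per-wiki defaults and use them for any of
# text, page, and summary that were not supplied on the command line.
sub fill_missing {
    my ($conf, $wiki, %given) = @_;
    my $defaults = $conf->{$wiki}
        or die "No configuration block for $wiki\n";
    for my $key (qw(text page summary)) {
        $given{$key} = $defaults->{$key} unless $given{$key};
    }
    return %given;
}

# Placeholder data standing in for the parsed config file.
my %conf = (
    enwikibooks => {
        page    => 'Project:Sandbox',
        text    => "{{/Don't edit this line}}",
        summary => 'Bot: cleaning sandbox',
    },
);

my %edit = fill_missing(\%conf, 'enwikibooks', summary => 'Custom summary');
print "$_: $edit{$_}\n" for sort keys %edit;
```

Values given on the command line win, and the config file only supplies what is still missing, which matches the order of precedence used in the script.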

Create a bot object

$domain = $bot->db_to_domain($wiki) if ($wiki !~ m/\w\.\w/);
$bot = MediaWiki::Bot->new({
    host        => $domain,
    login_data  => { username => $username, password => $password },
});

Unlike earlier versions of Perlwikibot, 3.0 handles a lot automatically to make writing scripts easier. Here, we create a new bot object, which will automatically be logged in and configured for us. Check the POD documentation for MediaWiki::Bot for details about the new() constructor.

Make the edit

die <<"END" if $dry_run;
This is where we would attempt the following edit:
\$bot->edit({
    page        => $page,
    text        => $text,
    summary     => $summary,
    is_minor    => 1,
});
on $domain
END

This is a here-document (heredoc) - it allows us to print a multi-line string easily. If the user specified --dry-run on the command line, we want to do everything up to this point, but not actually edit. So, if $dry_run is set, we print out this multi-line string showing what would have happened, and die. The << part of <<"END" tells Perl that a heredoc follows; END tells Perl what delimiter to look for to know the heredoc has ended; the double-quotes tell Perl that we want it to be an interpolated string.
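The effect of the double-quoted delimiter is easiest to see in a small standalone sketch. Here dry_run_message is a hypothetical helper and the arguments are sample values. Variables are interpolated, while the backslash keeps \$bot as literal text.

```perl
use strict;
use warnings;

# Hypothetical helper: build a dry-run message with an interpolating heredoc.
sub dry_run_message {
    my ($page, $summary) = @_;
    return <<"END";
Would call \$bot->edit() with:
    page    => $page
    summary => $summary
END
}

print dry_run_message('Project:Sandbox', 'Bot: cleaning sandbox');
```

With single quotes around the delimiter (<<'END') nothing would be interpolated, and $page would appear literally in the output.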

Actually make the edit

warn "Editing..." if $debug;
$bot->edit({
    page        => $page,
    text        => $text,
    summary     => $summary,
    is_minor    => 1,
}) or die "Couldn't edit";

This actually makes the edit by calling MediaWiki::Bot's edit() method, and passing it the page name, what text to put, the edit summary, and that it is a minor edit. See POD for details on edit().