Help:Export

Wiki pages can be exported in a special XML format, either to import them into another MediaWiki installation or to use them elsewhere, for instance to analyse their content.

How to export

There are at least four ways to export pages:

Paste the names of the articles into the box on Special:Export, or use the URL //en.wikibooks.org/wiki/Special:Export/article's_name. You can get a list of all page names (in a specified namespace) at Special:Allpages.
The backup script dumpBackup.php dumps all the wiki pages into an XML file. dumpBackup.php works only on MediaWiki 1.5 or newer, and you need direct access to the server to run it. Dumps of Wikimedia projects are made available (more or less) regularly at http://download.wikipedia.org.
There is an OAI-PMH interface for regularly fetching pages that have been modified since a specific time. For Wikimedia projects, this interface is not publicly available. OAI-PMH wraps the actual exported articles in its own container format.
Use the Python Wikipedia Robot Framework. This won't be explained here.
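The URL form from the first item can be built for any title with a few lines of Python. This is a minimal sketch: `export_url` is a hypothetical helper, the Wikibooks base URL is only an example, and replacing spaces with underscores follows the usual MediaWiki URL convention.

```python
from urllib.parse import quote

# Assumption: the wiki's article path; Wikibooks is used only as an example.
BASE_URL = "https://en.wikibooks.org/wiki/"


def export_url(title: str, base_url: str = BASE_URL) -> str:
    """Build a Special:Export URL for a single page title (hypothetical helper)."""
    # MediaWiki URLs conventionally use underscores in place of spaces.
    path_title = title.strip().replace(" ", "_")
    # Percent-encode unsafe characters but keep ':' and '/' readable.
    return base_url + "Special:Export/" + quote(path_title, safe=":/")


if __name__ == "__main__":
    print(export_url("Help:Export"))
    print(export_url("Main Page"))
```

Opening the resulting address in a browser, or fetching it with any HTTP client, returns the same XML as the form.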

By default, only the current version of a page is included. Optionally, you can get all versions, each with its date, time, user name and edit summary.

Additionally, you can copy the SQL database. This is how database dumps were made available before MediaWiki 1.5; it is not explained further here.

Using 'Special:Export'

First you need to know the names of pages you wish to export. To export all pages of a namespace:

Use Special:Allpages and choose the desired namespace.
Copy the list of page names to a text editor
Put all names on separate lines
If the selected namespace is not the main namespace, insert the namespace prefix before each page name, e.g. 'Help:Contents'
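The list preparation steps above can also be scripted. This sketch assumes the names copied from Special:Allpages arrive separated by tabs or line breaks; `prepare_export_list` is a hypothetical helper that trims the names, drops empty lines and duplicates, and adds the namespace prefix when one is given.

```python
def prepare_export_list(raw_text: str, prefix: str = "") -> str:
    """Turn pasted page names into one title per line (hypothetical helper).

    `prefix` is the namespace prefix such as "Help"; leave it empty for the
    main namespace.
    """
    # Assumption: copied names are separated by tabs and/or line breaks.
    names = raw_text.replace("\t", "\n").splitlines()
    result = []
    for name in names:
        name = name.strip()
        if not name:
            continue  # the export box must not contain empty lines
        if prefix and not name.startswith(prefix + ":"):
            name = f"{prefix}:{name}"
        result.append(name)
    # Remove duplicates while keeping the original order.
    return "\n".join(dict.fromkeys(result))


if __name__ == "__main__":
    print(prepare_export_list("Contents\tEditing\n\nExport\n", "Help"))
```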

Now you are ready to perform the export.

Go to Special:Export and paste all your page names into the textbox, making sure there are no empty lines.
Click 'Submit query'
Save the resulting XML to a file using your browser's save facility.
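Instead of using the form in a browser, the same request can be sent from a script. This sketch assumes the Special:Export form accepts the page list in a field named `pages` (one title per line) and a `curonly` flag for current revisions only; check these field names against your wiki's form before relying on them. `build_export_request` is a hypothetical helper, and it only builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Special:Export on Wikibooks; adjust for other wikis.
EXPORT_URL = "https://en.wikibooks.org/wiki/Special:Export"


def build_export_request(titles, export_url=EXPORT_URL, current_only=True):
    """Build (but do not send) a POST request for the given page titles."""
    # Assumed form fields: 'pages' holds one title per line, 'curonly' limits
    # the export to the current revision of each page.
    fields = {"pages": "\n".join(titles)}
    if current_only:
        fields["curonly"] = "1"
    data = urlencode(fields).encode("utf-8")
    return Request(
        export_url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the result to `urllib.request.urlopen` would send it; the response body is the XML you would otherwise save from the browser.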

and finally...

Open the XML file in a text editor. Scroll to the bottom to check for error messages.
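The check for error messages can be automated. This sketch treats a saved export as suspect if it is not well-formed XML or does not end with the closing `</mediawiki>` tag, assuming that an error message would either be appended after the document or cut it short. `find_export_problems` and `check_export_file` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def find_export_problems(data: bytes) -> list:
    """Return a list of problems found in exported XML bytes (empty if none)."""
    problems = []
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        # Trailing error text or a truncated file both make the XML invalid.
        problems.append(f"not well-formed XML: {err}")
    # Assumption: a clean export ends with the closing root tag.
    if not data.rstrip().endswith(b"</mediawiki>"):
        problems.append("file does not end with </mediawiki>")
    return problems


def check_export_file(path: str) -> list:
    """Read a saved export file and check it."""
    with open(path, "rb") as f:
        return find_export_problems(f.read())
```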

Now you can use this XML file to perform an import.

Export format

The format of the XML file you receive is the same in all cases. It is codified in XML Schema at http://www.mediawiki.org/xml/export-0.3.xsd. This format is not intended for viewing in a web browser. Some browsers show pretty-printed XML with "+" and "-" links to show or hide selected parts. Alternatively, the XML source can be viewed using the browser's "view source" feature or, after saving the XML file locally, with a program of your choice. If you read the XML source directly, it is not difficult to find the actual wikitext. Unless you use a special XML editor, "<" and ">" appear as &lt; and &gt; to avoid a conflict with XML tags, and, to avoid ambiguity, "&" is coded as &amp;.
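To see this escaping in action, Python's standard library can apply it and reverse it. The sketch escapes a made-up piece of wikitext the way it would appear in the raw XML source, then lets an XML parser restore the original text; the sample string is illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

wikitext = "Use <ref> tags & [[links]]"

# Escaping: this is how the wikitext looks in the raw XML source.
raw = "<text>" + escape(wikitext) + "</text>"
print(raw)  # <text>Use &lt;ref&gt; tags &amp; [[links]]</text>

# Parsing: an XML parser turns the entities back into the original characters.
print(ET.fromstring(raw).text)  # Use <ref> tags & [[links]]
```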

In the current version, the export format does not contain an XML replacement for wiki markup (see Wikipedia DTD for an older proposal). You only get the wikitext, exactly as you see it when editing the article.

Example

  <mediawiki xml:lang="en">
    <page>
      <title>Page title</title>
      <restrictions>sysop</restrictions>
      <revision>
        <timestamp>2001-01-15T13:15:00Z</timestamp>
        <contributor><username>Foobar</username></contributor>
        <minor />
        <comment>I have just one thing to say!</comment>
        <text>A bunch of [[text]] here.</text>
      </revision>
      <revision>
        <timestamp>2001-01-15T13:10:27Z</timestamp>
        <contributor><ip>10.0.0.2</ip></contributor>
        <comment>new!</comment>
        <text>An earlier [[revision]].</text>
      </revision>
    </page>
    
    <page>
      <title>Talk:Page title</title>
      <revision>
        <timestamp>2001-01-15T14:03:00Z</timestamp>
        <contributor><ip>10.0.0.2</ip></contributor>
        <comment>hey</comment>
        <text>WHYD YOU LOCK PAGE??!!! i was editing that jerk</text>
      </revision>
    </page>
  </mediawiki>
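A document like the example can be read with any XML library. This sketch uses Python's `xml.etree.ElementTree` on a trimmed copy of the example and reports, for each page, its title, the number of revisions and the contributor of the newest revision. `summarize_pages`, `child_text` and `local` are hypothetical helpers; `local` strips the `xmlns` namespace that real exports carry but the example omits.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the example, used as sample input.
SAMPLE = """<mediawiki xml:lang="en">
  <page>
    <title>Page title</title>
    <revision>
      <timestamp>2001-01-15T13:15:00Z</timestamp>
      <contributor><username>Foobar</username></contributor>
      <text>A bunch of [[text]] here.</text>
    </revision>
    <revision>
      <timestamp>2001-01-15T13:10:27Z</timestamp>
      <contributor><ip>10.0.0.2</ip></contributor>
      <text>An earlier [[revision]].</text>
    </revision>
  </page>
</mediawiki>"""


def local(tag):
    """Strip a '{namespace}' prefix so exports with xmlns also work."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name):
    """Return the text of the first direct child called `name`, or None."""
    for child in elem:
        if local(child.tag) == name:
            return child.text
    return None


def summarize_pages(xml_text):
    """Return (title, number of revisions, latest contributor) for each page."""
    root = ET.fromstring(xml_text)
    result = []
    for page in root:
        if local(page.tag) != "page":
            continue  # e.g. <siteinfo>
        revisions = [r for r in page if local(r.tag) == "revision"]
        # Timestamps share one ISO 8601 format, so string order is time order.
        latest = max(revisions, key=lambda r: child_text(r, "timestamp") or "", default=None)
        who = None
        if latest is not None:
            contrib = next((c for c in latest if local(c.tag) == "contributor"), None)
            if contrib is not None:
                # A contributor is either a registered user or an IP address.
                who = child_text(contrib, "username") or child_text(contrib, "ip")
        result.append((child_text(page, "title"), len(revisions), who))
    return result


print(summarize_pages(SAMPLE))  # [('Page title', 2, 'Foobar')]
```

This loads the whole document into memory, which is fine for a handful of pages but not for a full dump.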

DTD

Here is an unofficial, short Document Type Definition (DTD) version of the format. If you don't know what a DTD is, just ignore it.

<!ELEMENT mediawiki (siteinfo,page*)>
<!-- version contains the version number of the format (currently 0.3) -->
<!ATTLIST mediawiki
  version  CDATA  #REQUIRED 
  xmlns CDATA #FIXED "http://www.mediawiki.org/xml/export-0.3/"
  xmlns:xsi CDATA #FIXED "http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation CDATA #FIXED
    "http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd"
  xml:lang CDATA #IMPLIED
>
<!ELEMENT siteinfo (sitename,base,generator,case,namespaces)>
<!ELEMENT sitename (#PCDATA)>      <!-- name of the wiki -->
<!ELEMENT base (#PCDATA)>          <!-- url of the main page -->
<!ELEMENT generator (#PCDATA)>     <!-- MediaWiki version string -->
<!ELEMENT case (#PCDATA)>          <!-- how cases in page names are handled -->
   <!-- possible values: 'first-letter' | 'case-sensitive'
                         'case-insensitive' option is reserved for future -->
<!ELEMENT namespaces (namespace+)> <!-- list of namespaces and prefixes -->
  <!ELEMENT namespace (#PCDATA)>     <!-- contains namespace prefix -->
  <!ATTLIST namespace key CDATA #REQUIRED> <!-- internal namespace number -->
<!ELEMENT page (title,id?,restrictions?,(revision|upload)*)>
  <!ELEMENT title (#PCDATA)>         <!-- Title with namespace prefix -->
  <!ELEMENT id (#PCDATA)> 
  <!ELEMENT restrictions (#PCDATA)>  <!-- optional page restrictions -->
<!ELEMENT revision (id?,timestamp,contributor,minor?,comment,text)>
  <!ELEMENT timestamp (#PCDATA)>     <!-- according to ISO8601 -->
  <!ELEMENT minor EMPTY>             <!-- minor flag -->
  <!ELEMENT comment (#PCDATA)> 
  <!ELEMENT text (#PCDATA)>          <!-- Wikisyntax -->
  <!ATTLIST text xml:space CDATA  #FIXED "preserve">
<!ELEMENT contributor ((username,id) | ip)>
  <!ELEMENT username (#PCDATA)>
  <!ELEMENT ip (#PCDATA)>
<!ELEMENT upload (timestamp,contributor,comment?,filename,src,size)>
  <!ELEMENT filename (#PCDATA)>
  <!ELEMENT src (#PCDATA)>
  <!ELEMENT size (#PCDATA)>
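A DTD content model such as `(id?,timestamp,contributor,minor?,comment,text)` is essentially a regular expression over child element names. This sketch uses that idea to check the order of a `<revision>` element's children against the unofficial DTD; it is not a full DTD validator, and `revision_matches_dtd` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# Content model of <revision> from the DTD:
# (id?, timestamp, contributor, minor?, comment, text),
# written as a regex over child names, each followed by one space.
REVISION_MODEL = re.compile(r"(id )?timestamp contributor (minor )?comment text ")


def revision_matches_dtd(revision):
    """Return True if the children of a <revision> follow the DTD order."""
    names = "".join(child.tag.rsplit("}", 1)[-1] + " " for child in revision)
    return REVISION_MODEL.fullmatch(names) is not None


if __name__ == "__main__":
    ok = ET.fromstring("<revision><timestamp/><contributor/><minor/><comment/><text/></revision>")
    late_minor = ET.fromstring("<revision><timestamp/><contributor/><comment/><text/><minor/></revision>")
    print(revision_matches_dtd(ok), revision_matches_dtd(late_minor))  # True False
```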

Processing XML export

Many tools can process the exported XML. If you process a large number of pages (for instance a whole dump), the document probably won't fit in main memory, so you will need a parser based on SAX or another event-driven method.
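As one event-driven option, this sketch uses Python's `xml.etree.ElementTree.iterparse` (a standard-library alternative to SAX; `xml.sax` would also work). It yields one page at a time and clears the finished pages from the tree, so memory use stays roughly bounded by the largest single page. `iter_pages` is a hypothetical helper, and the wikitext it returns is that of the last `<revision>` in document order, which is not necessarily the newest one.

```python
import io
import xml.etree.ElementTree as ET


def iter_pages(source):
    """Yield (title, wikitext) for each <page>, one page at a time.

    `source` is a file name or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of <mediawiki>
    title = text = None
    for event, elem in context:
        if event != "end":
            continue
        name = elem.tag.rsplit("}", 1)[-1]  # ignore any xmlns prefix
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text
        elif name == "page":
            yield title, text
            title = text = None
            root.clear()  # drop finished pages so memory does not grow


if __name__ == "__main__":
    demo = io.BytesIO(
        b"<mediawiki>"
        b"<page><title>A</title><revision><text>first</text></revision></page>"
        b"<page><title>B</title><revision><text>second</text></revision></page>"
        b"</mediawiki>"
    )
    for title, text in iter_pages(demo):
        print(title, text)
```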

You can also use regular expressions to process parts of the XML code directly. This may be faster than other methods, but it is not recommended because such code is fragile and difficult to maintain.
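For comparison, here is what the regular-expression approach looks like for extracting page titles. The sketch shows why it is fragile: it silently assumes `<title>` has no attributes and never spans lines, and because it works on the raw source, entities must be decoded by hand (here with `html.unescape`, which also covers the XML entities). `titles_by_regex` is a hypothetical helper.

```python
import re
from html import unescape

# Fragile: assumes <title> has no attributes and never spans lines.
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def titles_by_regex(xml_text):
    """Extract page titles with a regular expression instead of an XML parser."""
    # The regex sees the raw source, so &amp;, &lt; and &gt; are still encoded.
    return [unescape(t) for t in TITLE_RE.findall(xml_text)]
```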

Please list methods and tools for processing XML export here:

Parse::MediaWikiDump is a Perl module for processing the XML dump file.
m:Processing MediaWiki XML with STX - Stream based XML transformation
The m:IBM History flow project can read the export after it has been processed with a small Python program, export-historyflow-expand.py.

Details and practical advice

To determine the namespace of a page, you have to match its title against the prefixes defined in

/mediawiki/siteinfo/namespaces/namespace
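A minimal sketch of that matching, for an export small enough to load into memory. It assumes an exact, case-sensitive match on the text before the first colon and ignores namespace aliases and the wiki's `case` setting; `namespace_map` and `namespace_of` are hypothetical helpers, and the sample siteinfo lists only three namespaces for brevity.

```python
import xml.etree.ElementTree as ET

# Illustrative siteinfo fragment; real exports list every namespace.
SITEINFO_SAMPLE = (
    "<mediawiki><siteinfo><namespaces>"
    '<namespace key="0" />'
    '<namespace key="1">Talk</namespace>'
    '<namespace key="12">Help</namespace>'
    "</namespaces></siteinfo></mediawiki>"
)


def local(tag):
    """Strip a '{namespace}' prefix so exports with xmlns also work."""
    return tag.rsplit("}", 1)[-1]


def namespace_map(root):
    """Map each prefix under siteinfo/namespaces to its namespace key."""
    prefixes = {}
    for elem in root.iter():
        if local(elem.tag) == "namespace":
            # The main namespace has an empty prefix.
            prefixes[elem.text or ""] = int(elem.get("key"))
    return prefixes


def namespace_of(title, prefixes):
    """Return the namespace key of `title`, defaulting to the main namespace."""
    prefix, sep, _ = title.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix]
    return prefixes.get("", 0)


if __name__ == "__main__":
    ns = namespace_map(ET.fromstring(SITEINFO_SAMPLE))
    print(namespace_of("Help:Contents", ns), namespace_of("Page title", ns))  # 12 0
```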

Possible restrictions are:
- sysop (protected pages)
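To list the protected pages in an export, you can look for `sysop` in each page's `<restrictions>` element. This sketch uses a simple substring test, on the assumption that the restriction string may name several actions and need not be parsed in detail; `protected_titles` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix so exports with xmlns also work."""
    return tag.rsplit("}", 1)[-1]


def protected_titles(xml_text):
    """Return the titles of pages whose <restrictions> mention 'sysop'."""
    root = ET.fromstring(xml_text)
    titles = []
    for page in root.iter():
        if local(page.tag) != "page":
            continue
        # Collect the text of the page's direct children by element name.
        fields = {local(child.tag): child.text or "" for child in page}
        # Assumption: a substring test is enough to spot sysop protection.
        if "sysop" in fields.get("restrictions", ""):
            titles.append(fields.get("title"))
    return titles
```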
