Wiki-based archival description and storage/Print version

From Wikibooks, open books for an open world

This wikibook is a practical manual for organizations and individuals who have archives that they need to arrange, describe, digitize, and store — and who wish part of these archives to be part of "the sum of all knowledge" in the Wikimedia movement. It explains a system of safe, permanent storage for archival items that encompasses everything from initial accessioning and evaluation; through arrangement, description, and digitization (or digital conversion for digitally-created items); to techniques of shelving, boxing, and conservation. It includes details of how to manage collaborative (crowd-sourced) transcriptions on Wikisource, and the integration of knowledge from the archives into Wikipedia articles.

If you are using the guidelines in this wikibook to manage your archives, we invite you to also edit these pages as you learn what works and what doesn't, in order to share your knowledge with the rest of the community. You may also like to share your specific experiences by creating learning resources on Wikiversity.

Wikis

What is an archive?

An archive is an accumulation of primary source records, built up over the lifetime of an individual or organization and kept to show the function of that person or organization. The records have been naturally and necessarily generated as a product of regular activities, and may also have been intentionally collected and selected to represent the history of the archive's owner. Read more on Wikipedia…

What do we mean by 'wiki-based'? Simply that the system described here makes use of the network of wiki sites operated by the Wikimedia movement, as well as the MediaWiki software that these sites run on. Being wiki-based in this way implies a few characteristics of an archival management system: it is web-based and spread across a number of different websites; many people can contribute to it, and all their contributions are visible to each other and traceable; and the supporting software is less prescriptive about how the system works than a traditional archival database is.

Your archive will exist across a number of wikis. Firstly, for material that is old enough to be public domain or that is freely licensed, the following Wikimedia sites will be used:

  • Wikipedia, for encyclopedic articles about items or their subjects;
  • Wikisource, for transcriptions of text;
  • Wikimedia Commons, for digital files (scans, photographs, etc.); and
  • Wikibooks, for the manual that you are now reading.

Secondly, you will have a number of your own wikis, completely under your control and responsibility:

  • a public wiki for material that you are allowed to publish but which is not suitable for any of the Wikimedia sites (primarily due to copyright restrictions); and
  • some number of closed wikis, one for each broad group of users you need to give access to (often, there is just one, called the 'staff wiki' or 'family only wiki' or similar).

For these wikis, we outline the technical infrastructure and administrative processes that you will require.

Read more about archival wikis.

Description

In the archival profession, 'description' usually refers to what is done with records while they are being arranged for storage: it is the point at which detailed metadata is recorded about each item, along with the item's place in the broader context of the whole archive (and of society). In this wikibook, we take a more general view: description takes place at every stage, from acquisition and accessioning through to the traditional stages of item- or series-level description.

Each act of description is, in one sense, the same: a single wiki page is created, and on it as much information as possible is recorded about the item. A page can also contain images, photographs, and videos, as well as links to more detailed descriptions. The most important things to describe are: where it has come from (provenance) and how it was originally stored (original order), as well as what has been done with it now and where it can be found (both online and off).

One advantage of every level of description being done in this way is that this creates a unique and meaningful identifier for what is being described: the URL of the wiki page. For instance, the first page that is created is one for the archive as a whole — and immediately the archive has a home page on the internet, and a globally-unique way of being referenced.

Read more about description.

Storage

The storage of physical items, their digital representations, and digitally-created items is very important. In the wiki system, storage is done progressively alongside description, because that way each item can be placed in its final storage location and given an identifier as it is described.

Here, we outline a system of storage for physical items that not only works to ensure (as far as possible) their long-term preservation, but also ties in with digitizing those items and making the digital representations available online. The system is also defined by access control (for example, by ensuring that the contents of a single folder or box are all at the same level of access, and thus will be described together on one wiki). Storing digitally-created items is easier, at least in that no physical storage facility is required; the preservation and long-term access requirements are just the same, though. The approach here is to treat digitizations and digitally-born files as effectively equivalent and to store them in the same way in MediaWiki.

Broadly speaking, the storage process is as follows:

  1. Physical items:
    1. Physical items are scanned and/or photographed (the original boxes etc. that they occupy are included in this).
    2. The items are added in accession order to boxes and folders.
  2. Digital items are converted to appropriate long-term formats.
  3. The resulting files (both digitizations and digitally-created) are uploaded to Commons or one of the archive's own wikis.
  4. A page is created for each item on its wiki, where all metadata is recorded, including the box or folder identifier. Some items will have just a single page; some will comprise multiple files (e.g. the front and back of a photograph if there's an inscription on the back) and so will have a page for each of these and an index page where the metadata is stored.
  5. The item's page is printed and stored alongside the item. This print contains the full URL of the item, to serve as a unique identifier.

Read more about storage.



Wikis

The key to making wiki-based archiving work is to use Wikimedia sites wherever possible, and your own wiki websites when not. This means that whenever an item is for general access and appropriately licensed, its digital representation should be uploaded to Commons (and, for textual works, also to Wikisource). If it is not appropriate to upload it there and it does not require any privacy controls, upload it to your own public wiki. Then, for closed-access material, you will have one or more invitation-only wikis, to which you can add private material. Note that each item should be documented on only a single wiki, and referenced from all other places where required.
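The routing rule in the paragraph above can be sketched as a small decision function. This is a minimal illustration, assuming each item's licensing and access status has already been decided; the `Item` fields, the `destinations` function, and the destination labels are hypothetical names, not part of any MediaWiki API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    title: str
    freely_licensed: bool        # public domain or under a free licence
    publishable: bool            # may be seen by the general public (no privacy controls)
    is_text: bool                # textual works also get a Wikisource transcription
    access_group: str = "staff"  # hypothetical group name for closed material


def destinations(item: Item) -> List[str]:
    """Return where an item's files should go (hypothetical labels)."""
    if item.publishable and item.freely_licensed:
        # Open, freely licensed material goes to the Wikimedia sites.
        sites = ["Wikimedia Commons"]
        if item.is_text:
            sites.append("Wikisource")
        return sites
    if item.publishable:
        # Publishable but not freely licensed, e.g. still in copyright.
        return ["public wiki"]
    # Everything else goes to the private wiki for its access group.
    return [f"private wiki ({item.access_group})"]
```

For example, a public-domain letter (`Item("Letter, 1901", True, True, True)`) is routed to Commons and Wikisource, while a staff-only file lands on the private wiki named after its group.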

The set of wikis will form a network across which your archive will be fully documented. Links between pages and between wikis are easy to create, and files uploaded to some wikis can be used on others (more about this below).

This wikibook is aimed (as mentioned in the introduction) at smaller archives, and it is for these that participation in the Wikimedia movement is of most value. Smaller archives often lack resources in particular areas (money, time, expertise, etc.), and one area where they can get caught out is in providing a high-quality, dependable, and permanent presence on the internet. Using Wikimedia wikis takes care of this: you don't need to worry about security, disaster recovery, availability, etc., and can just focus on your content.

Of course, there is lots of material that you won't want to upload to the Wikimedia sites, and for this you'll have your own wikis. These will run MediaWiki, and so can be tied in seamlessly with the Wikimedia sites.

Each item, series, file, storage location, etc. has a wiki page, and therefore a URL that can be used as a globally unique identifier.

Wikimedia sites

Wikimedia Commons

Wikimedia Commons is the most important project in the whole system. It is where all photos, documents, scans, audio recordings, movies, books and other files are stored and described. Everything on Commons must a) be freely licensed; b) be open for viewing by anyone (including its metadata); and c) work towards the goal of sharing the sum of all knowledge with all of humanity.

We don't provide a full overview of how to use Commons here; for that, please read the documentation at Commons:Help:Contents (and then come back and add anything here that you think is missing!).

Wikisource

Wikipedia

Wikipedia is at the end of the archiving process, because it is where the knowledge from the archive is distilled and turned into neat and well-researched articles about the topics of the archive. For example, if you're working on a set of letters

Your own wikis

For material that cannot be added to Wikimedia sites, you should set up your own wikis. These will also run on MediaWiki, so they can be tied in seamlessly with the Wikimedia ones, and skills and techniques will be common to the whole archive system.

You can either host your own wikis (if you have people who are interested in learning how) or use one of the MediaWiki hosting services listed on mw:Hosting.

Public wiki

Your archive should have a single public wiki of its own, preferably on its own domain name (e.g. http://curedalesarchive.org). This wiki will form the main entry point into your archive, and will link to material that is stored on all the other wikis. (There will also probably be ancillary overview pages for your archive on Commons and Wikisource.)

Your public wiki should be added to WikiApiary, for greater discoverability. You might also like to sign up to WikiApiary's notification service, which will tell you about possible issues with your wiki.

Upload full XML exports and an archive of your images directory to the Internet Archive (or at least tell WikiTeam about your wiki, and their bot will do this for you). This means that even in the distant future, when your site has been retired, researchers will still be able to resurrect your content.
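If you would rather produce an XML export yourself, the MediaWiki Action API can return page XML through the `query` module's `export` and `exportnowrap` parameters. The sketch below only builds the request URL; the `export_url` function and the endpoint in the example are hypothetical. Note that this API export contains only the current revision of each page; a full-history dump is normally made on the server with the `maintenance/dumpBackup.php --full` script. A single request is also limited in how many pages it returns, so large wikis need several requests; that paging is left out here.

```python
from typing import List, Optional
from urllib.parse import urlencode


def export_url(api_endpoint: str, titles: Optional[List[str]] = None) -> str:
    """Build an Action API URL that returns an XML export of pages.

    With explicit titles, only those pages are exported; otherwise the
    allpages generator selects pages (main namespace by default).
    """
    params = {"action": "query", "export": "1", "exportnowrap": "1"}
    if titles:
        # The API separates multiple values with '|'.
        params["titles"] = "|".join(titles)
    else:
        params["generator"] = "allpages"
        params["gaplimit"] = "max"
    return f"{api_endpoint}?{urlencode(params)}"
```

For instance, `export_url("https://curedalesarchive.org/w/api.php", ["Main Page"])` gives a URL that can be fetched with any HTTP client and saved as an `.xml` file.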

Private wikis

A separate wiki should be created for each distinct group of permissions required for your material. Often, only one private wiki is required, and it is best to not exceed three or four in most cases.

Interwiki links and file usage

Description

Archival description is the process that takes place while records are being prepared for storage. In wiki-based archiving, this process can happen at every stage from before accession through to the on-going access of the records. It is an iterative process, in which descriptions are progressively refined as time goes on.

Each act of description is, in one sense, the same: a single wiki page is created, and on it as much information as possible is recorded about the item (where an 'item' can also be a grouping of various records). A page can also contain images, photographs, and videos, as well as links to more detailed descriptions. The most important things to describe are: where it has come from (provenance) and how it was originally stored (original order), as well as what has been done with it now and where it can be found (both online and off).

One advantage of every level of description being done in this way is that this creates a unique and meaningful identifier for what is being described: the URL of the wiki page. For instance, the first page that is created is one for the archive as a whole, which immediately gives the archive a home page on the Internet and a globally-unique way of being referenced.
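Because the page URL doubles as the identifier, it helps to derive it the same way every time. The sketch below approximates how MediaWiki turns a page title into a URL on a wiki configured like the Wikimedia sites: runs of spaces and underscores collapse to a single underscore, the first letter is capitalized (the default `$wgCapitalLinks` behaviour), and other characters are percent-encoded. The function names and the `/wiki/` base path are assumptions; your own wiki's article path may differ, namespace prefixes are not handled, and MediaWiki itself leaves a few more punctuation characters unescaped than this does.

```python
from urllib.parse import quote


def normalize_title(title: str) -> str:
    """Approximate MediaWiki title normalization with capital links enabled."""
    # Collapse runs of spaces/underscores and trim the ends.
    text = " ".join(title.replace("_", " ").split())
    # The first character of a title is capitalized.
    text = text[:1].upper() + text[1:]
    return text.replace(" ", "_")


def page_url(article_base: str, title: str) -> str:
    """Join an assumed article base such as https://example.org/wiki/ with a title."""
    return article_base + quote(normalize_title(title), safe="/:")
```

With this, `"  letters  from_1901 "` and `"Letters from 1901"` both map to the same identifier, `Letters_from_1901`.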

Procedure

From the top-level wiki page of the archive (which may be the main page, or any other page), create a high-level list of the parts of the archive. Each part should be a link to another wiki page, and then each of those pages will have a further list of the items contained therein.
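The procedure above amounts to walking a tree of parts and writing one bulleted list of links per page. A minimal sketch, assuming the arrangement is held as a nested dictionary of page titles (the `child_lists` function and the titles used with it are hypothetical); it produces the wikitext `* [[Title]]` lines for each page that has parts, ready to paste or upload.

```python
from typing import Dict


def child_lists(parent: str, children: Dict[str, dict]) -> Dict[str, str]:
    """Map each page that has parts to the wikitext list linking to those parts."""
    pages: Dict[str, str] = {}
    if children:
        pages[parent] = "\n".join(f"* [[{child}]]" for child in children)
        # Recurse so that every part with its own parts gets a list too.
        for child, grandchildren in children.items():
            pages.update(child_lists(child, grandchildren))
    return pages
```

Pages with no parts (leaf items) get no list, since their own pages hold the item description instead.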

You should create pages for the archive itself and all of its constituent parts — from the points of view of the original order and provenance of the items, as well as the current arrangement and storage.

Provenance

The first set of pages to be created will be those describing the origin of the archive's contents. These will take the form of descriptions and lists of the boxes, folders, datasets, and other sources of the contents.

These pages are the first places that the scans, photographs, and other documentation will be added to.

Subject

The second set of pages to add to the wiki are those that group the above items by subject. So, for example, there will be pages listing all items pertaining to an individual person, or locality, as well as groupings by medium (book, photograph, etc.).
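Grouping by subject is an inversion of the item metadata: for each subject, list every item tagged with it. A minimal sketch, assuming each item's subjects have already been recorded (the `subject_lists` function and the titles are hypothetical). On a MediaWiki site you could instead add `[[Category:...]]` tags to item pages and let the category pages build these lists automatically; explicit lists are useful when you want to annotate or order them by hand.

```python
from collections import defaultdict
from typing import Dict, List


def subject_lists(item_subjects: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert item -> subjects into subject -> wikitext list of item links."""
    by_subject: Dict[str, List[str]] = defaultdict(list)
    for item, subjects in item_subjects.items():
        for subject in subjects:
            by_subject[subject].append(item)
    # Sort subjects and items so regenerated pages produce stable diffs.
    return {
        subject: "\n".join(f"* [[{item}]]" for item in sorted(items))
        for subject, items in sorted(by_subject.items())
    }
```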

Storage

The last aspect of description comes after the storage of the items, because it involves the documentation of the final storage locations (including on the Internet) of all items.

These wiki pages are also printed and kept alongside the physically-stored items, in order for the items to be linked back to the wiki contents. Each printed page has a URL at the bottom linking to the specific revision of that page as it was when it was printed.
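MediaWiki gives every saved revision a number (its `oldid`), and `index.php?title=...&oldid=...` always shows that exact revision, so it makes a stable link for a printout. A sketch assuming the wiki's script path is `/w` (as on Wikimedia sites) and that the revision ID comes from an Action API request such as `action=query&prop=revisions&rvprop=ids&titles=...&format=json&formatversion=2`. The function names are hypothetical, and the sample response in the usage below is abbreviated and illustrative.

```python
import json
from urllib.parse import quote


def latest_revid(api_response: str) -> int:
    """Read the current revision ID from a formatversion=2 prop=revisions response."""
    data = json.loads(api_response)
    page = data["query"]["pages"][0]
    return page["revisions"][0]["revid"]


def permanent_link(script_base: str, title: str, revid: int) -> str:
    """Link to one specific revision of a page, suitable for printing."""
    wiki_title = quote(title.strip().replace(" ", "_"), safe="/:")
    return f"{script_base}/index.php?title={wiki_title}&oldid={revid}"
```

Given a response whose page has revision 1234, `permanent_link("https://curedalesarchive.org/w", "Box 3", 1234)` yields `https://curedalesarchive.org/w/index.php?title=Box_3&oldid=1234`, which keeps pointing at the printed text even after the page is edited again.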

Storage

This page explains how to store physical and digitally-created archive items in a wiki system of archiving. It assumes you have a functioning wiki set up, and does not go into detail about how to describe items.

The rough workflow for a single item is as follows:

  1. its access level is determined;
  2. it's digitized (for physical items) or converted to a long-term preservation format (for digital-born items);
  3. the description is added to the appropriate wiki (with an identifier being assigned);
  4. the files are uploaded to the wiki.
  5. In addition, for physical items:
    1. the item is stored in an appropriate folder, box, or other container of the correct access level;
    2. an identifier (URI) from the wiki is stored with the physical item.

Access control

The first aspect to consider is access control, which is simply a matter of defining who can view or edit which parts of the collection. A separate wiki is used for each group of people to which access needs to be granted, and the physical storage system follows the same groups and nomenclature. Use as few groups (and wikis) as possible. For example, a common configuration is to have three access groups (and therefore three wikis): one completely open to the general international public; a second for members and associates of the organisation; and a third, closed wiki that is only accessible to staff and other trusted people. Items can be moved between wikis as required.

The physical storage system follows the same division of access control, so that the contents of each whole folder or box are all at the same level of access.
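The rule that a folder or box holds material of only one access level is easy to check mechanically. A minimal sketch, assuming you have a list of (container, access level) pairs, for instance gathered from your item pages; the `mixed_containers` function is hypothetical, and the level names follow the three-wiki example above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def mixed_containers(placements: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return containers holding items of more than one access level."""
    levels: Dict[str, Set[str]] = defaultdict(set)
    for container, level in placements:
        levels[container].add(level)
    # A correctly stored container has exactly one access level.
    return {c: lv for c, lv in levels.items() if len(lv) > 1}
```

An empty result means every box and folder can be described on a single wiki.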

Physical storage

Physical items are stored in accession order in their boxes and folders. This minimizes handling of the items and avoids the need to re-order the storage system as new items are accessioned. All items can be found via their on-wiki identifiers and other metadata.
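Storing in accession order means each new item simply goes at the end of the current container for its access level, and a new container is started when that one is full. A sketch under two assumptions not stated in the text: accession numbers are integers, and every container holds a fixed, hypothetical number of items. The `allocate` function is also hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def allocate(accessions: Iterable[Tuple[int, str]], capacity: int) -> Dict[str, List[List[int]]]:
    """Place items in accession order into containers, one sequence per access level."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    shelves: Dict[str, List[List[int]]] = {}
    for number, level in sorted(accessions):
        containers = shelves.setdefault(level, [[]])
        if len(containers[-1]) >= capacity:
            containers.append([])  # current container is full: start a new one
        containers[-1].append(number)
    return shelves
```

Because later accessions only ever append, items already shelved never need to move.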

Digitization

Physical items are photographed, scanned, or otherwise converted to a digital representation. This includes documenting the original order and storage of the items.

Flat-bed scanning

A flatbed scanner is required for digitizing flat documents, photographs, negatives, glass plates, plans, maps, etc.

An A4-sized scanner is suitable for many requirements, especially photos and documents. Foolscap and other documents larger than A4 will require an A3 scanner; anything larger than A3 generally requires specialty equipment, which is often out of the reach of smaller organisations.

Scanners with a backlight are required for film and negative scanning.

The minimum features of a scanner are:

  • A4 size
  • higher than 700 ppi for prints, and higher than 2700 ppi for negatives and slides.

All items should be scanned to TIFF or PNG format at 600 ppi or more, with a colour depth of 24 bits.
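The resolution and colour depth above determine how large the master files will be, which matters when planning storage. A short calculation, using A4 (210 by 297 mm) at 600 ppi and 24 bits per pixel; the function names are hypothetical, and the result covers the uncompressed pixel data only, so an actual TIFF or PNG will differ because of compression and headers.

```python
from typing import Tuple

MM_PER_INCH = 25.4


def pixel_dimensions(width_mm: float, height_mm: float, ppi: int) -> Tuple[int, int]:
    """Pixel width and height of a scan of the given physical size."""
    return (round(width_mm / MM_PER_INCH * ppi), round(height_mm / MM_PER_INCH * ppi))


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int = 24) -> int:
    """Size of the raw pixel data, ignoring file headers and compression."""
    return width_px * height_px * bits_per_pixel // 8
```

An A4 page at 600 ppi is therefore 4961 by 7016 pixels, or about 104 MB of raw 24-bit data; a 35 mm frame (36 by 24 mm) scanned at 2700 ppi comes to 3827 by 2551 pixels.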

Photography

t.b.c.

Identifiers

A page is created for each item on its wiki, where all metadata is recorded, including the box or folder identifier. Some items will have just a single page; some will comprise multiple files (e.g. the front and back of a photograph if there's an inscription on the back) and so will have a page for each of these and an index page where the metadata is stored.
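An index page for a multi-file item can be generated from its metadata and the list of its files. A minimal sketch that emits wikitext with a `wikitable` of metadata and a `<gallery>` of the component files; the `index_page` function, field names, file names, and captions are hypothetical, and a real archive would likely use a template rather than a hand-built table.

```python
from typing import Dict, List, Tuple


def index_page(metadata: Dict[str, str], files: List[Tuple[str, str]]) -> str:
    """Wikitext for an item's index page: a metadata table plus a gallery of its files."""
    lines = ['{| class="wikitable"']
    for field, value in metadata.items():
        lines += ["|-", f"! {field}", f"| {value}"]  # one row per metadata field
    lines.append("|}")
    lines.append("<gallery>")
    for filename, caption in files:
        lines.append(f"File:{filename}|{caption}")
    lines.append("</gallery>")
    return "\n".join(lines)
```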

The item's page is printed and stored alongside the item. This print contains the full URL of the item, to serve as a unique identifier.

Backups

Wikis, like all digital information systems, must be backed up. How to do that is outside the scope of this wikibook, but in addition to regular backups it is recommended that archival dumps of the wikis' pages be kept (and that dumps of the public wikis be uploaded to the Internet Archive, along with the wikis' digital files).
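Once you have an XML dump of a wiki's pages, packaging it with the wiki's files for long-term storage can be scripted. A minimal sketch using the standard `tarfile` module, assuming the dump file and a copy of the wiki's `images` directory are on local disk; the `bundle_backup` function and its file naming scheme are hypothetical choices, and uploading the result to the Internet Archive is not shown.

```python
import tarfile
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional


def bundle_backup(xml_dump: Path, images_dir: Path, out_dir: Path,
                  wiki_name: str, day: Optional[date] = None) -> Path:
    """Pack an XML dump and the wiki's images directory into one dated .tar.gz."""
    day = day or date.today()
    out_path = out_dir / f"{wiki_name}-{day.isoformat()}.tar.gz"
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(xml_dump, arcname=xml_dump.name)
        tar.add(images_dir, arcname="images")  # adds the directory recursively
    return out_path


if __name__ == "__main__":
    # Demonstration with throwaway files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "dump.xml").write_text("<mediawiki/>")
        (root / "images").mkdir()
        (root / "images" / "a.png").write_bytes(b"")
        archive = bundle_backup(root / "dump.xml", root / "images", root,
                                "publicwiki", date(2024, 1, 31))
        with tarfile.open(archive) as tar:
            print(archive.name, sorted(tar.getnames()))
```

A dated file name keeps successive dumps side by side, so an older state of the wiki can still be recovered.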