ETD Guide/Technical Issues/DiTeD and DIENST


Theses and dissertations are traditionally covered by the legal deposit law in Portugal. Nowadays, almost all theses and dissertations are written with word processors, which confirms that science and technology were among the first areas to adopt digital publishing.

In this context, the deposit of theses and dissertations emerged as an ideal case study for a scenario focused on a specific genre. To that end, the National Library of Portugal promoted the project DiTeD (Digital Thesis and Dissertations), from which a software package with the same name originated.

Requirements

Theses and dissertations carry special requirements for registration and access, since their contents are often used to produce other genres, such as books and papers, or may include sensitive material related to, for example, patents. The management system must therefore allow authors to declare special access requirements, which have to be registered and respected.

Universities have a long tradition of independence in their organization, culture, and procedures. As a consequence, it soon became clear that it would be impossible to reach, in the short or medium term, any overall agreement on common formats or standard procedures with the different administrative services. Therefore, the main objective defined for DiTeD was to develop, on top of the Internet, a framework that connects the National Library to the local university libraries and supports a fully digital circuit for the deposit of theses and dissertations.

Architecture

The basis for this framework was found in the DIENST technology [3], which provides a good set of core services. DIENST also has an open architecture that can be used with great flexibility, making it possible to extend its services and build new functionality. The basic entities of this architecture are shown in Figure 1 as a UML (Unified Modeling Language) class diagram.

diensts.jpg

Master Server

The Master Meta Server provides the centralized services, including a directory of all the local servers that are members of the system. Exactly one such server exists in each system.

In DiTeD this server is located at the National Library. It was renamed Master Server and differs substantially from the original versions developed for DIENST. The original server was designed to manage only metadata, whereas the DiTeD version must also manage the contents of the theses and dissertations and support the workflow for their submission and deposit.

DIENST Standard Server

The DIENST Standard Server is the server installed at the university libraries. This server was modified in DiTeD and renamed Local Server. It is composed of the following core modules:

  • Repository Service: This is where the documents are stored. It manages metadata structures and multiple content formats for the same document. These functions were substantially extended in DiTeD to support a specific metadata format and to recognize that a thesis or dissertation may be composed of several files. It is also possible to define and manage different collections on the same server.
  • Index Service: This service is responsible for indexing the metadata and responding to queries. Small adjustments were made in DiTeD to support diacritics in the indexes and queries, a requirement of written Portuguese.
  • User Interface: This service is responsible for the interaction with the user. It was extended in DiTeD to support a flexible multilingual interface and a workflow for submissions over HTTP.
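The diacritics requirement of the Index Service can be illustrated with a small sketch. It is not the DiTeD or DIENST implementation; it only shows one common approach, in which indexed words and query words are folded in the same way (Unicode decomposition, removal of combining marks, case folding). The function names `fold`, `build_index`, and `search` are hypothetical.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip combining marks and case so 'Dissertação' matches 'dissertacao'."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def build_index(records: dict[str, str]) -> dict[str, set[str]]:
    """Map each folded word to the identifiers of the records containing it."""
    index: dict[str, set[str]] = {}
    for record_id, title in records.items():
        for word in title.split():
            index.setdefault(fold(word), set()).add(record_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the identifiers that match every word of the query."""
    words = [fold(word) for word in query.split()]
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))
```

With this folding, a query typed without accents, or in a different case, still finds a title that was indexed with its Portuguese diacritics.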

Identifiers

Two Local Servers run at the National Library. One, named Deposit Server, stores the deposited theses and dissertations coming from all universities. The deposit consists of a copy, so in the end each thesis or dissertation exists in two places: the Local Server and the Deposit Server. A second Local Server is used as a virtual system for those university libraries that do not have the technical resources or skills needed to maintain their own server.

Each thesis or dissertation deposited in DiTeD automatically receives a URN [4], which is registered and managed by a namespace and resolution service. This is in fact a simple implementation of the concept of PURL (Persistent URL), with the particular property that it resolves a PURL by returning the work's real URL on the original Local Server, as long as that copy is still available. Otherwise, it returns the work's URL on the Deposit Server. The entities of this final DiTeD architecture are shown in Figure 2.

The prefix of the URN has the form "HTTP://PURL.PT/DITED", while the suffix is formed by an identifier of the university library (the "publisher") and by a specific identifier of the work itself, automatically assigned locally.

dited.jpg
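The identifier scheme and the fallback resolution described above can be sketched as follows. This is an illustration under stated assumptions, not the DiTeD resolution service: the prefix is taken from the text, but the "/" separators between prefix, publisher identifier, and work identifier are assumed, and the `Resolver` class, its methods, and the availability check are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

PREFIX = "HTTP://PURL.PT/DITED"  # prefix as given in the text


def make_urn(publisher_id: str, work_id: str) -> str:
    """Build an identifier; the '/' separators are an assumption."""
    return f"{PREFIX}/{publisher_id}/{work_id}"


@dataclass
class Locations:
    local_url: str    # copy on the university's Local Server
    deposit_url: str  # copy on the Deposit Server


class Resolver:
    """Hypothetical resolution table with fallback to the deposit copy."""

    def __init__(self, is_available: Callable[[str], bool]) -> None:
        self._table: dict[str, Locations] = {}
        self._is_available = is_available

    def register(self, urn: str, local_url: str, deposit_url: str) -> None:
        self._table[urn] = Locations(local_url, deposit_url)

    def resolve(self, urn: str) -> Optional[str]:
        locations = self._table.get(urn)
        if locations is None:
            return None  # unknown identifier
        if self._is_available(locations.local_url):
            return locations.local_url
        return locations.deposit_url  # local copy is gone, use the deposit
```

The availability check is passed in as a function so that the sketch stays independent of how a real service would test whether the Local Server still holds the work.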

Workflow

The workflow comprises two main steps: submission and deposit.

Submission

The submission process comprises the following steps:

Delivery: The process starts when the student submits the thesis or dissertation to a local server. In this step the student fills in a metadata form, which records the bibliographic information and the access conditions. All of this information is held in a pending status until it is checked.

Verification: In a second step a librarian checks the quality of the submission (logging in to the local server gives access to all pending submissions). This task is meant to be carried out by a local librarian, but it can also be carried out remotely, for example by a professional from the National Library. In the first phase of the project, the National Library performs this task, mainly to ensure uniform criteria and to test and tune the procedures.

Registration: If everything is correct (metadata and contents), the thesis or dissertation is stored in the local repository, and the student receives a confirmation. Otherwise, the student is contacted to solve any problem, and the submission remains in the pending status.
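The three submission steps can be read as a small state machine in which a submission stays pending until verification succeeds. The sketch below is only an illustration: in DiTeD the verification is performed by a librarian, so the automatic `verify` function merely stands in for that check, and the required field names and class names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    REGISTERED = "registered"


@dataclass
class Submission:
    author: str
    metadata: dict[str, str]
    files: list[str]
    status: Status = Status.PENDING            # delivery leaves it pending
    notes: list[str] = field(default_factory=list)


REQUIRED_FIELDS = ("title", "author", "access")  # hypothetical field names


def verify(submission: Submission) -> list[str]:
    """Stand-in for the librarian's check; returns the problems found."""
    problems = [
        f"missing metadata field: {name}"
        for name in REQUIRED_FIELDS
        if not submission.metadata.get(name)
    ]
    if not submission.files:
        problems.append("no content files attached")
    return problems


def review(submission: Submission) -> Status:
    """Register the submission if it passes, otherwise keep it pending."""
    problems = verify(submission)
    if problems:
        submission.notes.extend(problems)  # would be sent to the student
    else:
        submission.status = Status.REGISTERED
    return submission.status
```

A rejected submission keeps its pending status and accumulates notes, mirroring the case in which the student is contacted to solve a problem.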

Deposit

The deposit consists of copying the thesis or dissertation, together with its metadata, from the Local Server to the Deposit Server. This is done in the following steps:

What’s new: Periodically, the Master Server contacts the repository of a Local Server to check if there are new submissions. The Local Server replies, giving a list of the identifiers of the new submissions.

Delivery: For each new submission, the Master Server sends a request to the Local Server to deposit it in the Deposit Server. Because this Deposit Server is also a Local Server, this deposit works just like a normal local submission.

Verification: A librarian at the National Library checks the deposit. This double check is important, especially in the early stages of the project, to reassess the procedures and to test the automatic transfer of files over the Internet (not always a reliable process).

Registration: If everything is correct, the thesis or dissertation is stored in the deposit repository, the final URN (a PURL) is assigned, and both the student and the local librarian receive a confirmation. The metadata is also reused to produce a standard UNIMARC record for the national catalogue. If any problem is detected, the local librarian is contacted and the deposit remains in the pending status.

One can argue that, if the Deposit Server is really also a Local Server, then the first step is unnecessary and the Local Server could perform the delivery automatically after a successful submission. This may be a future optimization, but for now the extra step preserves the requirement of an asynchronous system, allowing the Master Server, for example, to better control the timing of the deposit (such as by giving preference to night periods).
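The pull-based design discussed above, in which the Master Server decides when to ask for new submissions, can be sketched in a few lines. This is not the DIENST protocol or the DiTeD code: the class and method names are hypothetical, the servers are in-memory objects rather than networked services, and a work is marked as announced as soon as it is listed, which a real system would only do after a confirmed transfer.

```python
class LocalServer:
    """In-memory stand-in for a Local Server (also used as Deposit Server)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.repository: dict[str, dict] = {}  # registered works
        self.pending: dict[str, dict] = {}     # works awaiting verification
        self._announced: set[str] = set()

    def register(self, work_id: str, record: dict) -> None:
        self.repository[work_id] = record

    def whats_new(self) -> list[str]:
        """List identifiers registered since the previous call."""
        new_ids = sorted(set(self.repository) - self._announced)
        self._announced.update(new_ids)
        return new_ids

    def submit(self, work_id: str, record: dict) -> None:
        """A deposit arrives like a normal submission: it starts pending."""
        self.pending[work_id] = record

    def deposit_to(self, target: "LocalServer", work_id: str) -> None:
        """Copy one registered work to the target server."""
        target.submit(f"{self.name}/{work_id}", dict(self.repository[work_id]))


class MasterServer:
    def __init__(self, local_servers: list[LocalServer],
                 deposit_server: LocalServer) -> None:
        self.local_servers = local_servers
        self.deposit_server = deposit_server

    def poll_once(self) -> int:
        """Ask every Local Server for new works and request their deposit."""
        count = 0
        for server in self.local_servers:
            for work_id in server.whats_new():
                server.deposit_to(self.deposit_server, work_id)
                count += 1
        return count
```

Because the Master Server initiates each round, `poll_once` can be scheduled whenever convenient, for example during the night, which is the kind of control the asynchronous design is meant to keep.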

Metadata

DiTeD utilizes a metadata structure for theses and dissertations defined by the National Library and coded in XML. This structure contains descriptive bibliographic information about the work and the author, as well as information about the advisers and jury members, access conditions, etc. This metadata structure is configurable at installation time, making the software flexible for use in other countries, or even with other publication genres. Metadata may also be accessed and exported in other formats, like UNIMARC and Dublin Core.
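As an illustration of exporting such a record to Dublin Core, the sketch below maps a few fields of a made-up XML record to Dublin Core elements. The source element names (`thesis`, `title`, `author`, `adviser`, `access`), the sample values, and the field mapping are all assumptions; the actual structure defined by the National Library is not reproduced here. Only the Dublin Core element names and namespace are standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Hypothetical record; not the real DiTeD metadata structure.
SAMPLE = """<thesis>
  <title>Um exemplo de tese</title>
  <author>Maria Silva</author>
  <adviser>João Costa</adviser>
  <access>restricted</access>
</thesis>"""

# Assumed mapping from the hypothetical fields to Dublin Core elements.
FIELD_MAP = {
    "title": "title",
    "author": "creator",
    "adviser": "contributor",
    "access": "rights",
}


def to_dublin_core(xml_text: str) -> ET.Element:
    """Convert a source record into a flat element of Dublin Core fields."""
    source = ET.fromstring(xml_text)
    record = ET.Element("record")
    for child in source:
        dc_name = FIELD_MAP.get(child.tag)
        if dc_name is None:
            continue  # fields without a mapping are skipped
        element = ET.SubElement(record, f"{{{DC_NS}}}{dc_name}")
        element.text = child.text
    return record
```

Keeping the mapping in a table such as `FIELD_MAP` reflects the idea of a structure that can be configured at installation time: supporting another metadata layout would mean changing the table, not the conversion code.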

Multilingual interface

DiTeD's user interface has multilingual capabilities, allowing the users to switch between the available languages at any time. The base configuration includes English and Portuguese.

Software availability

The software is maintained by the National Library of Portugal, and distributed freely for noncommercial use. Access to the software package may be requested by email to dited@bn.pt.

References

  1. <http://dited.bn.pt>
  2. <http://purl.org>
  3. <http://www.cs.cornell.edu/cdlrg/dienst/software/DienstSoftware.htm>
  4. Sollins, K; Masinter, L. (1994). Functional Requirements for Uniform Resource Names. RFC 1737.
