
OLAC Implementers' FAQ


Answers to Frequently Asked Questions about Implementing an OLAC Metadata Repository

  1. Is it required that an OLAC metadata repository describe an archive?
  2. What do the records in an OLAC metadata repository represent?
  3. How do I get started on implementing an OLAC metadata repository?
  4. How do I decide what to treat as an individual item?
  5. How do I create records for a web database?
  6. Where can I find good examples of sample metadata records?
  7. How do I decide if I should implement a static repository or a dynamic repository?

Implementing a static repository

  1. Where can I find complete examples of static repositories?
  2. How would I create and maintain a static repository by hand?
  3. Where can I find sample code for generating a static repository from a database?
  4. How do I test whether my static repository is valid?
  5. Where do I put my static repository when it is ready to publish?

Implementing a dynamic repository

  1. Where can I find complete examples of dynamic repositories?
  2. Where can I find sample code for implementing a dynamic repository?
  3. What are resumption tokens and how do they work?
  4. How do I test my dynamic repository to make sure it works?

Registering and improving a repository

  1. How do I register my repository with OLAC?
  2. How do I improve the quality of the metadata in my repository?
  3. What do I do when I want to update the information in my repository?
  4. Why aren't my changes being harvested?

General questions

The following questions are answered elsewhere in the general OLAC FAQ:

  1. What is the Open Language Archives Community?
  2. What is a language resource?
  3. What is language resource discovery?
  4. What is metadata?
  5. What is the OLAC Metadata Set?
  6. How is OLAC metadata disseminated?
  7. Who can benefit from using OLAC metadata?
  8. What is meant by Open in the context of Open Language Archives?
  9. How does a language archive join OLAC?
  10. Can an individual or small archive join OLAC without bothering with formats and protocols?
  11. How are language standards like the TEI represented in OLAC?
  12. How do I contribute new proposals for OLAC metadata?

Is it required that an OLAC metadata repository describe an archive?

In the prototypical case, a participating institution is an archive that curates language resources. In this case, the OLAC metadata repository it implements is a catalog of its archival holdings. (Participants that operate such archives can be identified by the presence of the <archivalSubmissionPolicy> element in the <olac-archive> description that is part of the Identify response of their repository.) But operating an archive is not a requirement. The mission of OLAC is to create a "worldwide virtual library of language resources" and anyone who can contribute useful information to the catalog of known language resources is invited to do so. This may include contributions from individuals rather than institutions. It may also include metadata repositories that are indexes to language resources supplied by others or that describe entry points into an online database of language-related information.
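The check described in the parenthetical above can be sketched in Python. This is only an illustration: the sample Identify response is a simplified, hypothetical fragment (a real response carries namespaces and many more elements), and the function name has_submission_policy is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

def has_submission_policy(identify_xml):
    """Return True if the XML contains an <archivalSubmissionPolicy> element."""
    root = ET.fromstring(identify_xml)
    for element in root.iter():
        # Namespaced tags look like "{namespace}name"; compare only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "archivalSubmissionPolicy":
            return True
    return False

# Hypothetical, heavily simplified Identify response used only for illustration.
SAMPLE = """<Identify>
  <description>
    <olac-archive>
      <archivalSubmissionPolicy>Deposits are accepted by arrangement.</archivalSubmissionPolicy>
    </olac-archive>
  </description>
</Identify>"""

print(has_submission_policy(SAMPLE))
```

Matching on the local name rather than a full namespace keeps the sketch independent of the exact schema version a repository uses.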

What do the records in an OLAC metadata repository represent?

Each record in an OLAC metadata repository represents a language resource that the participating archive wants to inform the world about. The resource could be a physical object like a book, a CD-ROM, a wax cylinder or a box of unpublished field notes. In such cases, the metadata record allows potential users to discover that the resource exists. When the resource is a digital object that is posted on the web (such as a document, a corpus of recordings, a database or a software program), then the metadata record can go a step further to provide access to the resource by supplying the URL in the <dc:identifier> element. The record need not describe a resource that the contributor actually has under archival control. For instance, a record could be an annotated bibliography entry in which the contributor has added value to the basic citation by supplying a description and subject classification. Or a record could describe a web page that is an index page giving links to language resources that are posted on other web sites.

How do I get started on implementing an OLAC metadata repository?

An OLAC metadata repository is essentially a set of language resource descriptions. A good place to start is to look at the OLAC Metadata Usage Guidelines to learn how OLAC describes language resources. That document defines and illustrates each of the possible elements of a language resource description. It is helpful to look at examples of complete metadata records from existing OLAC participants at the same time. A key decision you must make before you begin implementing is what to treat as an individual item. As you look at examples from other OLAC participants, make note of records that match your situation and that you could use as models for resource description in your repository. However, rather than simply using an existing record as a template for your own records, you should evaluate it against OLAC's best practice recommendations for language resource description. In the process you may discover ways to improve the quality of the metadata records you will produce.

Once you have a good idea of what the records in your repository will be like, the next step is to read the OLAC Repositories standard carefully. You will see that it defines two approaches: implementing a static repository and implementing a dynamic repository. You need to decide which kind of repository to implement, then focus on the corresponding section of the OLAC standard. If you are implementing a static repository, you will want to look at the more complete Specification for an OAI Static Repository, which is the standard from the digital library community that the OLAC standard is based on. You may also find it helpful at this stage to look at complete examples of static repositories. When you are ready to begin implementing, you may follow the instructions for creating a repository by hand or for using a script to generate a repository from a database.

Similarly, if you are implementing a dynamic repository, you will need to consult the complete specification of the Open Archives Initiative Protocol for Metadata Harvesting and may want to see complete examples of dynamic repositories. When you are ready to begin implementing, you will want to start by searching for sample code that you can use as a basis for your implementation.

How do I decide what to treat as an individual item?

The OLAC Repositories standard addresses this issue in a section on Guidelines concerning relevance and granularity. The basic guideline is this: "A metadata repository should treat resources with a single provenance as constituting a single unit with respect to OLAC metadata and should, therefore, describe them within a single record." For published resources, the publication unit typically constitutes the appropriate unit for the OLAC metadata record. Unpublished papers presenting research findings closely parallel typical published works and can be treated at a comparable level in an OLAC metadata record. For primary source materials (e.g., recordings, transcriptions, annotations, notes, data sets), the typical practice of archivists is to gather such materials into collections based on shared provenance, that is, based on having a common origin and history. These collections are then the primary units for metadata description, resulting in OLAC records of DCMI Type Collection. See the section on Granularity of resources in the OLAC Metadata Usage Guidelines for a more in-depth discussion of the principle of provenance as applied to collections and metadata within the OLAC context.

How do I create records for a web database?

There are many valuable language resources that are underused because they are part of the Deep Web (that is, the portion of the Internet that is obscured from discovery by general search engines because the resources are in a database that is accessible only via a search interface on its host site). Such resources can be brought to the indexable web by creating OLAC records for them, since it is a built-in service of OLAC to convert every record into a page that gets crawled by web spiders. (To see the set of pages for a given archive, click the Archives link on the OLAC home page, click the "More Details" link for the desired archive, and then click the "Records in Archive" link on the resulting Archive Details page.)

The only requirement on a web database for using an OLAC repository to expose and access its language resources is that it have publicly accessible URLs for the resources. These often involve a base URL that uses parameters to provide arguments to a query. For instance, the resources in the LINGUIST List Language Resources repository and the ODIN: Online Database of Interlinear Text repository are dynamic pages that generate a listing of everything held in the database about a particular language; the ISO 639-3 code for the language is a parameter to the URL that is given in the <dc:identifier> element of each record. (Click the "Records in Archive" link on the referenced Archive Details pages to see sample records.)
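As a rough illustration of such parameterized URLs, the following Python sketch builds one URL per language code. The base URL and the parameter name "code" are hypothetical placeholders, not the actual addresses used by either of the repositories mentioned above.

```python
from urllib.parse import urlencode

# Hypothetical base URL and parameter name, for illustration only.
BASE_URL = "http://example.org/language-resources"

def resource_url(iso_code, base_url=BASE_URL, param="code"):
    """Return the URL of the dynamic page listing resources for one language."""
    return base_url + "?" + urlencode({param: iso_code})

# One URL per ISO 639-3 code; each would go in the <dc:identifier> of a record.
for code in ["eng", "fra", "swh"]:
    print(resource_url(code))
```

A script of this kind could emit one OLAC record per language held in the database, each pointing at the corresponding dynamic page.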

Where can I find good examples of sample metadata records?

To see the metadata records from a given archive, click the Archives link on the OLAC home page, click the "More Details" link for the desired archive, and then click the "Records in Archive" link on the resulting Archive Details page. Alternatively, you may get to the same place by going to the List Records from an OLAC Archive page, selecting an archive from the dropdown list, and then clicking the Submit Query button. When you click on the identifier for a particular item, you will see an HTML representation of the metadata. To see the underlying XML format of the OLAC metadata record, click on the "OAI-PMH request for OLAC format" link.

The following are good examples of records for different kinds of resources:

The Metadata Usage Survey tool may be used to see examples of how individual metadata elements have been used throughout the OLAC catalog. Note, however, that many uses do not conform to best practice recommendations. A good way to use the tool is to identify ways of using elements that do conform to best practice and then click on the link for the number of occurrences to see a list of the records in which it has been used in this way. It is often helpful to see these elements in context to get ideas of how best to use them in complete metadata records.

How do I decide if I should implement a static repository or a dynamic repository?

There are two basic approaches as laid out in the OLAC Repositories standard. The first is to build a static repository. In a static repository, the entire catalog for the repository is expressed in a single XML file that contains all of the metadata records. That file is registered with the OLAC Static Repository Gateway, which is a live web service that implements the OAI Protocol for Metadata Harvesting in order to provide access to the contents of all registered repositories. This is the simpler of the two approaches and can be used when the repository is relatively small: the OLAC Static Repository Gateway is designed to handle repositories ranging from 1 to 5,000 records. When there is no existing catalog database, the implementer can use an XML editor to create and maintain a repository by hand. When there is an existing catalog database, the implementer can use a script to export it to the proper XML format.

The second approach is to implement a dynamic repository. This approach can be used for a catalog of any size, and it is required when the catalog is larger than 5,000 records. In a dynamic repository, the implementer writes program code that resides at a base URL on a web site and responds to the requests of the OAI Protocol for Metadata Harvesting in order to provide dynamic access to information in an existing catalog database. The trickiest part about implementing a dynamic repository is that it is necessary to implement flow control using a resumption token mechanism in order to ensure that the responses to individual protocol requests are not excessively large. The OAI community considers half a megabyte to be a reasonable response size (which corresponds to about 500 records in a typical repository).
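The flow control idea can be sketched in a few lines of Python. This is a minimal illustration, not a full protocol implementation: it pages through an in-memory list instead of a catalog database, and it encodes the token as the offset of the next record, which is an assumption made for this example (the protocol treats the token as opaque and leaves its format to the repository). The function names are hypothetical.

```python
PAGE_SIZE = 500  # about half a megabyte per response in a typical repository

def list_records_page(records, resumption_token=None, page_size=PAGE_SIZE):
    """Return (page, next_token) for one ListRecords-style response.

    Assumption: the token is simply the offset of the next record, encoded
    as a string. A real repository may use any opaque format it likes.
    """
    start = int(resumption_token) if resumption_token else 0
    page = records[start:start + page_size]
    end = start + len(page)
    # An empty token signals that the list is complete.
    next_token = str(end) if end < len(records) else ""
    return page, next_token

def harvest_all(records, page_size=PAGE_SIZE):
    """Simulate a harvester that follows tokens until the list is complete."""
    harvested, token = [], None
    while True:
        page, token = list_records_page(records, token, page_size)
        harvested.extend(page)
        if not token:
            return harvested
```

With 1,200 records and the default page size, the harvester in this sketch would make three requests: two full pages of 500 records and a final page of 200 records accompanied by an empty token.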

Implementing a static repository

Where can I find complete examples of static repositories?

The last thing in the section on static repositories in the OLAC Repositories standard is a link to a complete example. Clicking this link retrieves an XML document which is a hypothetical sample repository containing two records.

It is also possible to see the XML document behind each of the static repositories that have been implemented by participating archives. Approximately nine-tenths of the repositories are implemented in this way. A static repository can be identified by clicking the Archives link on the OLAC home page, then clicking the "More Details" link for the desired archive. Find the line that says "Base URL" and inspect the URL. If it begins with http://www.language-archives.org/sr/, then it is a static repository. The remainder of the Base URL is the URL (minus the http://) of the underlying XML document. In order to retrieve that document, copy the repository URL (by selecting everything in the Base URL following /sr/) and then paste it into a web browser.
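The copy-and-paste step above can also be expressed as a small Python function. This is a sketch of the rule as stated in this answer; the function name and the example address are hypothetical.

```python
GATEWAY_PREFIX = "http://www.language-archives.org/sr/"

def static_repository_url(base_url):
    """Return the URL of the underlying XML document, or None if the
    Base URL does not point at the static repository gateway."""
    if not base_url.startswith(GATEWAY_PREFIX):
        return None
    # The remainder of the Base URL is the document URL minus "http://".
    return "http://" + base_url[len(GATEWAY_PREFIX):]

# Hypothetical Base URL, for illustration only.
print(static_repository_url(
    "http://www.language-archives.org/sr/example.org/olac/repository.xml"))
```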

How would I create and maintain a static repository by hand?

Forthcoming.

Where can I find sample code for generating a static repository from a database?

Forthcoming.

How do I test whether my static repository is valid?

Three possible approaches are described below. Ultimately the third approach is the final test that every static repository must pass. But before the repository is placed on a publicly accessible web site (see next question), one of the first two methods may be used while the repository is in preparation.

  1. Using a schema validator running on your own machine. Forthcoming.

  2. Using a web-based schema validation service. Forthcoming.

  3. Using the OLAC validation service. After the static repository file has been placed on a publicly accessible web site (see next question), the final test of conformance to the OLAC Repositories standard is made by using the OLAC Archive Registration page. Simply paste the URL for the location of your repository file into the text box and click the "Validate Only" button. If the final line of the report says "SUCCESS," then you are ready to proceed to the registration step. On the other hand, if the final line says "FAILURE," you still have work to do. Make note of all the individual tests that failed and fix the offending content in the repository; then repeat the validation process. In the case of the XML Schema Validation, click on the "error logs" link to see the list of errors generated by the validator. The two validators report the errors differently, so look at both logs to get a fuller idea of what is wrong.

Before the job is finished, you will also want to test some of your records as described below to discover ways to improve the quality of the metadata in your repository.

Where do I put my static repository when it is ready to publish?

Forthcoming.

Implementing a dynamic repository

Where can I find complete examples of dynamic repositories?

Only about one-tenth of the OLAC repositories are implemented in this way. Two of them, Ethnologue and SIL Language and Culture Archives, implement a documentation page that provides a large set of links for testing all the verbs of the harvesting protocol with various combinations of parameters, including combinations that should generate error responses. You can explore a sample implementation and get a better feel for how the protocol works by clicking on one of the above links and trying the various links on the documentation page. The text of each link gives the complete parameter string that is appended to the Base URL to form a full URL. Clicking the link will retrieve the XML document that the dynamic repository returns as the response to the request. The full request URL will show in the location box of your browser. Click the browser's Back button to go back to the documentation page and try another protocol request.
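The way a parameter string is appended to the Base URL can be illustrated with a short Python sketch. The Base URL and record identifier below are hypothetical, and the metadata prefix "olac" is an assumption that should be checked against the OLAC Repositories standard; the verb names come from the OAI Protocol for Metadata Harvesting.

```python
from urllib.parse import urlencode

def request_url(base_url, verb, **params):
    """Form a full protocol request URL from a Base URL, a verb and parameters."""
    query = urlencode({"verb": verb, **params})
    return base_url + "?" + query

# Hypothetical Base URL and identifier, for illustration only.
BASE = "http://example.org/oai"

print(request_url(BASE, "Identify"))
print(request_url(BASE, "ListRecords", metadataPrefix="olac"))
print(request_url(BASE, "GetRecord",
                  identifier="oai:example.org:item-1",
                  metadataPrefix="olac"))
```

Note that urlencode escapes reserved characters such as the colons in the identifier, so the printed URL may look slightly different from one typed by hand while still denoting the same request.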

Where can I find sample code for implementing a dynamic repository?

Forthcoming.

What are resumption tokens and how do they work?

Forthcoming.

How do I test my dynamic repository to make sure it works?

Forthcoming.

Registering and improving a repository

How do I register my repository with OLAC?

Forthcoming.

How do I improve the quality of the metadata in my repository?

OLAC has defined best practice recommendations for language resource description. You should review a few records from your repository against this list of recommendations to see if there are things you could change in the implementation of your repository (or of the underlying database on which it is based) to follow even more of the recommendations. It is not possible to automate tests for compliance with all the recommendations, so a complete review must be done by hand. If you would like help doing such an audit of metadata quality, contact the OLAC Coordinators; the link in the footer of this page is a mailto: link to their addresses.

However, many aspects of metadata quality can be automatically tested. OLAC has implemented metadata metrics to give implementers feedback on how well they have followed those recommendations. The Freestanding Metadata Service allows you to learn the metadata quality score for any record. Simply paste the complete <olac:olac> record into the text box on the page and click the "Analyze" button. If the format of the record is not valid, the resulting page will report the errors. Otherwise, the resulting page shows an analysis of the record. The bottom part of the page gives the metadata quality score along with recommendations on changes that would improve the score. It is possible to click the browser's Back button to return to the record entry form, edit the record to make changes that you think will improve the quality score, and then click "Analyze" again to test the result.

What do I do when I want to update the information in my repository?

Forthcoming.

Why aren't my changes being harvested?

Forthcoming.


Comments? Further questions? Please click the mailto: link below to give us feedback so that we can make this page more helpful to future implementers.

http://www.language-archives.org/tools/faq.html
Last revised: 28 July 2008

Steven Bird and Gary Simons