OLAC FAQ


Answers to Frequently Asked Questions about OLAC

  1. What is the Open Language Archives Community?
  2. What is a language resource?
  3. What is language resource discovery?
  4. What is metadata?
  5. What is the OLAC Metadata Set?
  6. How is OLAC metadata disseminated?
  7. Who can benefit from using OLAC metadata?
  8. What is meant by Open in the context of Open Language Archives?
  9. How does a language archive join OLAC?
  10. Can an individual or small archive join OLAC without bothering with formats and protocols?
  11. How are language standards like the TEI represented in OLAC?

Some of the information below is borrowed from or based on the Dublin Core FAQ.

1. What is the Open Language Archives Community?

The Open Language Archives Community, or OLAC, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources. OLAC was founded at the Workshop on Web-Based Language Documentation and Description, held in Philadelphia in December 2000.

2. What is a language resource?

A language resource is any kind of DATA, TOOL or ADVICE pertaining to the documentation, description or analysis of a human language. Texts, recordings, dictionaries, annotations, field notebooks, software, protocols, data models, file formats, newsgroup archives and web indexes are some examples of such resources. OLAC metadata can be used to describe any kind of language resource. Language resources may be digital or non-digital, published or restricted. A language archive is any collection of language resources and their resource descriptions.

3. What is language resource discovery?

The most familiar methods for language resource discovery are mailing lists, web indexes, and the catalogs of archives and publishers. Users of these methods typically experience low precision and recall: one has to wade through many irrelevant resources, and relevant resources are easily overlooked. OLAC seeks to improve that situation by developing an infrastructure for language resource discovery. A key part of that infrastructure is a metadata set that is specialized for language resource description.

4. What is metadata?

The simplest definition of metadata is "structured data about data." Metadata is descriptive information about an object or resource, whether it be physical or electronic. While the term metadata itself is relatively new, the underlying concepts behind metadata have been in use for as long as collections of information have been organized. Library card catalogs represent a well-established type of metadata; they have served as collection management and resource discovery tools for decades. Metadata can be generated either "by hand" or automatically using software.

5. What is the OLAC Metadata Set?

The OLAC Metadata Set (OLACMS) is the set of metadata elements that members of OLAC have agreed to use for describing language resources. Uniform description across archives is ensured by limiting the values of certain metadata elements to the use of terms from agreed-upon controlled vocabularies. The OLACMS is equally applicable whether the resources are available online or not. The metadata set consists of all the elements of the Dublin Core Metadata Set, a widely accepted standard for describing resources of all types. To this core set, OLACMS adds a set of refinements and qualifications that are designed for describing fundamental properties of language resources, such as subject language, language data type, and software functionality. The OLACMS Standard uses XML to represent metadata descriptions.
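
As a rough illustration of the ideas above, the following Python sketch builds an XML description that combines Dublin Core elements with a language-specific qualification whose value is checked against a controlled vocabulary. The Dublin Core namespace URI is the published one. The `OLAC_NS` URI, the `record` element, the `code` attribute, and the three-item vocabulary are placeholders assumed for illustration; they are not taken from the OLACMS Standard, which should be consulted for the actual element names and schema.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # published Dublin Core namespace
OLAC_NS = "http://example.org/olac-sketch/"  # placeholder, NOT the real OLAC namespace

# Give the serialized XML readable prefixes instead of ns0, ns1.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("olac", OLAC_NS)

# Hypothetical controlled vocabulary of subject-language codes.
LANGUAGE_CODES = {"eng", "fra", "swh"}


def describe_resource(title, subject_language):
    """Return an XML string describing one language resource."""
    # Uniform description: reject values outside the agreed vocabulary.
    if subject_language not in LANGUAGE_CODES:
        raise ValueError(f"{subject_language!r} is not in the controlled vocabulary")
    record = ET.Element(f"{{{OLAC_NS}}}record")
    ET.SubElement(record, f"{{{DC_NS}}}title").text = title
    # A core Dublin Core element, qualified by a language-specific attribute.
    subject = ET.SubElement(record, f"{{{DC_NS}}}subject")
    subject.set(f"{{{OLAC_NS}}}code", subject_language)
    return ET.tostring(record, encoding="unicode")
```

Calling `describe_resource("A Swahili word list", "swh")` returns a single record as a string, while a code outside the vocabulary raises `ValueError` rather than producing a non-uniform description.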

6. How is OLAC metadata disseminated?

OLAC metadata is disseminated using the metadata harvesting protocol of the Open Archives Initiative (OAI). End users can access the metadata using an OAI service provider (which indexes both OLAC and non-OLAC archives based on Dublin Core metadata) or an OLAC service provider (which provides services unique to language resources by using the language-resource-specific metadata in the OLAC Metadata Set).
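
A minimal sketch of what harvesting might look like from a service provider's side is given below. It assumes the verb and parameter names of version 2.0 of the OAI protocol (`ListRecords`, `metadataPrefix`, `oai_dc`), which may differ from the protocol version in force when this FAQ was written. The base URL and the sample response are invented for illustration, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # published Dublin Core namespace


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a harvester would request to list an archive's records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def titles_from_response(xml_text):
    """Extract every Dublin Core title from a harvested XML response."""
    root = ET.fromstring(xml_text)
    return [title.text for title in root.iter(f"{{{DC_NS}}}title")]


# Hand-written stand-in for what a data provider might return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Swahili word list</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

Here `list_records_url("http://archive.example.org/oai")` yields the request URL, and `titles_from_response(SAMPLE_RESPONSE)` pulls the titles out of the reply. A generic OAI service provider would stop at the Dublin Core fields, whereas an OLAC service provider would request the OLAC metadata format and index the language-specific fields as well.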

7. Who can benefit from using OLAC metadata?

Anyone can use the OLAC Metadata Set to describe language resources. Language archives and language software repositories are the most common providers of OLAC metadata. (Participating archives and prospective participants are listed on the OLAC Organization page.) Individual researchers will soon be able to document the resources they manage using a simple form interface. The whole language resources community benefits from OLAC metadata, which makes it possible to identify relevant resources quickly.

8. What is meant by Open in the context of Open Language Archives?

OLAC is open in the sense that any archive can join, and any individual can access the metadata records of participating archives. Membership and access are free. Also, the process by which OLAC governs itself and makes decisions is visible to all community members and open to their participation. Open does not mean that users are free to do whatever they like with the metadata, nor does it mean that the described language resources are openly available.

9. How does a language archive join OLAC?

OLAC is open to participation by any language archive. To participate, archives must set up an OAI "data provider", exporting their catalogs to the Dublin Core and OLAC metadata formats for harvesting by the OAI Metadata Harvesting Protocol, and then register with the OAI and OLAC. In general, the catalog remains in its existing format (e.g. in a relational database) and a CGI script permits external services to harvest the records in the prescribed XML format. A second approach avoids the OAI protocol altogether. An entire metadata repository is dumped in a single XML file, and the OLAC virtual data provider takes care of the rest. For more information about both options, please see the page: How to Become an OLAC Data Provider.
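
The second approach can be pictured with the Python sketch below, which dumps a catalog held in a relational database into one XML document. The `catalog` table, its columns, and the `repository`/`item` element names are all hypothetical; the file format actually expected by the OLAC virtual data provider is defined in the OLAC documentation and is not reproduced here.

```python
import sqlite3
import xml.etree.ElementTree as ET


def dump_catalog(conn):
    """Serialize every row of the hypothetical `catalog` table as one XML document."""
    root = ET.Element("repository")
    rows = conn.execute(
        "SELECT identifier, title, language FROM catalog ORDER BY identifier"
    )
    for identifier, title, language in rows:
        item = ET.SubElement(root, "item", {"id": identifier})
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "language").text = language
    return ET.tostring(root, encoding="unicode")


def demo_catalog():
    """Create an in-memory catalog with two invented entries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE catalog (identifier TEXT, title TEXT, language TEXT)")
    conn.executemany(
        "INSERT INTO catalog VALUES (?, ?, ?)",
        [
            ("item-002", "Field recordings, 1998", "fra"),
            ("item-001", "A Swahili word list", "swh"),
        ],
    )
    return conn
```

Writing the result of `dump_catalog(demo_catalog())` to a file gives the single-file export; the catalog itself stays in the database, so the dump can simply be regenerated whenever the records change.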

10. Can an individual or small archive join OLAC without bothering with formats and protocols?

A simple form interface has been developed which permits individuals and small archives to document their resources. The only requirement for this method is a web browser. For more information please see the ORE page at http://www.linguistlist.org/ore. In addition, any individual interested in language resource development and archiving can join the community by self-subscribing to the OLAC-General mailing list. This is the means by which members of the community are informed of new developments and invited to participate in reviews and working groups.

11. How are language standards like the TEI represented in OLAC?

In OLAC parlance, the term standards refers to procedures and formats that govern participating services, such as the OLAC Metadata Set or the OAI Metadata Harvesting Protocol. In OLAC, so-called "language standards" like the TEI are viewed as a kind of language resource called ADVICE since they are not binding on members. OLAC follows a process involving working groups and voting, whereby such advice can become identified as community-agreed best practice.


Steven Bird and Gary Simons