The Open Language Archives Community

A Symposium at the 3rd Language Resources and Evaluation Conference, Las Palmas, Spain, 29-31 May 2002

Convened by Steven Bird (University of Pennsylvania), Hans Uszkoreit (Deutsches Forschungszentrum für Künstliche Intelligenz) and Gary Simons (SIL International)

Materials from the Launch

[ pdf | ppt ] Gary Simons: The Seven Pillars of Open Language Archiving
[ pdf | ppt ] Helen Dry and Anthony Aristar: OLAC, EMELD and ``Us''
[ pdf | ppt ] Hans Uszkoreit: Ontologies for Language Resource Description (to be provided)
[ pdf | ppt ] Laurent Romary and Zina Tucsnak: Experience with OLAC for the ATILF Archives
[ pdf | ppt ] Martin Wynne: Opening the Archives: OLAC, TRACTOR and the OTA
[ pdf | ppt ] Steven Bird: Getting Involved in OLAC

2-page handout

Overview

The symposium had two goals: to disseminate the OLAC vision to the language resources community, and to encourage that community to archive and publish its resources in archival formats and to document them using standard metadata.

Presentations addressed the following questions:

What is the Open Language Archives Community?
Why is language archiving important?
What does it take to participate in OLAC?

Discussion time was used to clarify the OLAC model and to identify and address any concerns raised by the audience. Substantive feedback will help to guide the future evolution of OLAC.

Program

The Seven Pillars of Open Language Archiving (Gary Simons, SIL International). Digital archiving of language documentation and description on the World-Wide Web holds the promise of unparalleled access to language information. But if it is not done well, it also offers the specter of frustration and chaos on an unparalleled scale. This talk presents an executive summary of our vision for the kind of infrastructure that would unlock the promise. Special focus is given to the seven pillars on which such an infrastructure would be erected: DATA, TOOLS, ADVICE, GATEWAY, METADATA, REVIEW, and STANDARDS.

OLAC, EMELD and ``Us'' (Helen Aristar-Dry and Anthony Aristar, Linguist List, Eastern Michigan University and Wayne State University). Over the past 11 years, the Linguist List has become the primary source of information for the linguistics community, reaching over 15,500 subscribers worldwide and maintaining four complete mirror sites. The Linguist List will augment its service by providing the primary entry point for OLAC, permitting linguists to browse distributed language resources in a single place. This talk will include a demonstration of a new Linguist List ``service provider'' and report progress on a new NSF-sponsored project to create a ``Showroom of Best Practice'' for language documentation.

Ontologies for Language Resource Description (Hans Uszkoreit, Deutsches Forschungszentrum für Künstliche Intelligenz). The success of global digital archiving initiatives depends on usable and widely accepted schemes for describing informational resources. In order to arrive at useful metadata sets, one has to combine accepted generic resource ontologies with transparent specialized ontologies for one or more relevant subject areas. A useful ontology for the transdisciplinary area of language technology has to unify ontologies of language research and resources with ontologies of several areas of information technology. We will demonstrate how such an ontology can be developed and employed for creating interfaces between the OLAC metadata set and other resources such as the portal LT-World, the ACL/DFKI Software Registry, and the Survey of the State of the Art in Language Technology.

Opening the Archives: OLAC, TRACTOR and the OTA (Martin Wynne, Oxford Text Archive). This talk will reflect on the experience of delivering resource descriptions from two archives: the TELRI Research Archive of Computational Tools and Resources (TRACTOR) and the Oxford Text Archive. The process of migrating and harvesting the metadata records from both archives is examined. The merits and drawbacks of the OLAC metadata set, and its suitability for multilingual language tools and resources, are appraised, and some alternatives are considered. Some preliminary thoughts are offered on the experience of participating in OLAC with two significant archives that have vastly different holdings, archiving and distribution policies, and metadata standards.

Experience with OLAC for the ATILF Archives (Laurent Romary and Zina Tucsnak, Analyse et Traitements Informatisés de la Langue Française). We will provide some feedback on the experience of implementing OLAC at ATILF, an institution which acts as a central point in France for online delivery of textual and lexical data. We will show that, despite the simplicity of deployment offered by the OLAC principles and format, the main difficulty of implementing OLAC in a large institution like ATILF is to provide coherent metadata covering a wide and heterogeneous set of information sources, from both a technical and a conceptual point of view. This requires that we implement various methods of recovery from those information sources.

Getting Involved in OLAC (Steven Bird, University of Pennsylvania). This talk will describe the OLAC ``starter kit'', a low entry-cost method for resource creators to document their work. Various routes for exporting existing archive catalogs to OLAC format will be described. The talk will conclude with a call for widespread participation in OLAC.

Steven Bird and Gary Simons