OLAC News Archive
In October 2007, the US National Science Foundation held a workshop to assess the state of the art in documenting endangered languages and to plot directions for the future. There were 25 invited participants representing funding agencies, data providers, tool providers, and archives. OLAC was invited to present its contribution: developing an infrastructure for indexing endangered language documentation. The report shows that OLAC indexes resources on over 3,100 of the world's languages, including 2,186 living languages that are known to have fewer than 100,000 speakers.
The US National Science Foundation has funded a project, "OLAC: Accessing the World's Language Resources", which aims to greatly improve access to language resources for linguists and the broader communities of interest by achieving an order-of-magnitude increase in the coverage of the OLAC catalog and in the use of OLAC search services. The project will do so through two main areas of activity: developing guidelines and services that encourage language archives to follow best common practices that facilitate language resource discovery through OLAC, and developing services that bridge from the resource catalogs of the library and web domains to the OLAC catalog.
In 2005, the OLAC Search Engine handled 824,676 queries, an average of 2,259 per day or 68,723 per month. The most popular languages searched for in 2005 were Dutch, English, Quechua, Arabic, Greek, German, Chinese, and Malay. Only 35% of queries specified a particular archive; the majority were generic searches across all archives. The most commonly searched repository was SIL-LCA, followed by PARADISEC and SCOIL.
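The daily and monthly averages follow directly from the annual total. A quick check, assuming a 365-day year (2005 was not a leap year) and twelve equal months:

```python
# Recompute the 2005 OLAC Search Engine averages from the annual total.
TOTAL_QUERIES_2005 = 824_676

per_day = TOTAL_QUERIES_2005 / 365   # 2005 had 365 days
per_month = TOTAL_QUERIES_2005 / 12  # simple twelve-month average

print(f"per day:   {per_day:,.0f}")    # per day:   2,259
print(f"per month: {per_month:,.0f}")  # per month: 68,723
```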
The 2006 E-MELD workshop will focus on 'Tools and Standards: the State of the Art.' This annual workshop marks the culmination of the 5-year E-MELD project; one goal of the workshop is to review digital standards ratified by the community in prior workshops on text, lexicons, databases, and annotation. The workshop will be held in July, in conjunction with the LSA Summer Meeting at Michigan State University.
A tutorial on OLAC was held at the Annual Meeting of the Linguistic Society of America in January. The focus of the presentations was on audio and video recording. The event was officially sponsored by the LSA's Committee for Endangered Languages and their Preservation.
Four repositories joined OLAC in 2005: the Audio Archive of Linguistic Fieldwork at the Berkeley Language Center, UC Berkeley, USA; the Comparative Corpus of Spoken Portuguese at IEL Unicamp, Campinas, Brazil; ODIN, the Online Database of Interlinear Text, at California State University, Fresno, USA; and the Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC) at the Universities of Melbourne, Sydney and New England, and the Australian National University.
The 2005 E-MELD workshop will focus on linguistic ontologies and data categories, both as aids in linguistic annotation and as tools for the fine-grained search and retrieval of language documentation. It will be held in July, in conjunction with the LSA Institute at MIT.
A tutorial at the annual meeting of the Linguistic Society of America, held in Oakland, California, in January 2005, provided a forum where people compiling documentary linguistic resources could learn about current best practices for creating and conserving those resources. The tutorial was organized by Jeff Good (MPI Leipzig) and Heidi Johnson (University of Texas, Austin and AILLA).
OLAC will be organizing a booth in the publishers' exhibit hall at the Linguistic Society of America meeting in San Francisco in January. OLAC search services will be demonstrated, and OLAC archives are invited to send someone to help staff the booth, give live demonstrations of their web interfaces, and hand out flyers for their projects. The booth and web access are expensive, and we welcome any offers of sponsorship. If you would like to be involved, please contact Heidi Johnson.
The Linguistic Data Consortium at the University of Pennsylvania now hosts a powerful OLAC search interface. Features include result summaries by archive, result ranking, approximate language name matching, and country-based searches. (The service was developed by Amol Kamat, Baden Hughes, and Steven Bird at the University of Melbourne, with sponsorship from the Department of Computer Science and Software Engineering and the Linguistic Data Consortium.)
The archive report cards, added to the OLAC site in March, give summary statistics for each repository and an assessment of the quality of the repository's metadata. The assessment is based on OLAC and Dublin Core guidelines. An updated version of the system is now available, including a revised evaluation algorithm to account for changes in DC recommendations, and revised labelling of the reports for consistency with OLAC terminology. The evaluation metric rewards the use of OLAC extensions (controlled vocabularies), and what we consider to be the most important DC elements: title, date, subject, description, and identifier. The report cards can be accessed by clicking the "REPORT CARD" links on the OLAC Archives page. (The service was developed by Amol Kamat, Baden Hughes, and Steven Bird at the University of Melbourne, with sponsorship from the Department of Computer Science and Software Engineering, and the Linguistic Data Consortium.)
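The exact report-card evaluation algorithm is not described here. The sketch below is a hypothetical illustration of a metric that rewards the two things the item names: presence of the five core DC elements (title, date, subject, description, identifier) and use of OLAC extensions (controlled vocabularies). The record representation, the equal 0.5 weighting, and the function names `score_record` and `score_archive` are all assumptions for illustration, not the report-card implementation.

```python
# Hypothetical report-card style metric; NOT the actual OLAC algorithm.
# A record is modelled (assumption) as a dict mapping a DC element name to a
# list of (value, uses_olac_vocabulary) pairs.

CORE_ELEMENTS = ("title", "date", "subject", "description", "identifier")


def score_record(record, ext_weight=0.5):
    """Score one record in [0, 1]: core-element coverage plus extension use."""
    # Fraction of the five core DC elements that have at least one value.
    present = sum(1 for name in CORE_ELEMENTS if record.get(name))
    core_score = present / len(CORE_ELEMENTS)

    # Fraction of all values that use an OLAC controlled vocabulary.
    values = [v for vals in record.values() for v in vals]
    typed = sum(1 for _, uses_vocab in values if uses_vocab)
    ext_score = typed / len(values) if values else 0.0

    return (1 - ext_weight) * core_score + ext_weight * ext_score


def score_archive(records):
    """Average record score for a repository (0.0 for an empty repository)."""
    if not records:
        return 0.0
    return sum(score_record(r) for r in records) / len(records)


if __name__ == "__main__":
    example = {
        "title": [("A grammar of Quechua", False)],
        "subject": [("Quechua", True)],
        "language": [("que", True)],
    }
    print(round(score_record(example), 3))  # 0.533
```

In the example, two of the five core elements are present (0.4) and two of the three values use a controlled vocabulary (about 0.667), so the equal weighting gives 0.533. Averaging per-record scores keeps the archive score independent of repository size, which is one plausible design choice among several.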
The OLAC Metadata standard has been promoted to 'adopted' status by the OLAC Council following a 12-month period of experimentation by OLAC implementers. The standard defines the format used for the interchange of metadata within the framework of the Open Archives Initiative. The metadata set is based on Qualified Dublin Core, but the format allows for the use of extensions to express community-specific qualifiers.
In a recent Survey of Digital Library Aggregation Services, published by the Digital Library Federation, Martha Brogan praised the Open Language Archives Community as exemplary. She concluded her discussion with the following statement:
OLAC is exemplary in several ways: the technical and social infrastructure that it has developed to support its community of contributors, based on shared principles and standards; the resources that it provides at its Web site about its purpose, scope, history, tools, news and events; and the efforts of its two leaders -- Gary Simons and Steven Bird [2003a, 2003b, 2003c] -- to articulate the challenges, analyze the options, and recommend possible solutions to their community of contributors in order to improve OLAC. With the formal appointment of an Outreach Working Group and its other efforts to accommodate small archives that lack technical support, OLAC's content and influence is likely to grow.
The OLAC Repositories standard has been promoted to 'adopted' status by the OLAC Council following a 9-month period of experimentation by OLAC implementers. This document defines the standards OLAC archives must follow in implementing a metadata repository. Originally called the OLAC Protocol for Metadata Harvesting, the document was first drafted in December 2001. A year later it was broadened to cover the new static repository model and given a more general title. Since then there have been two further revisions in response to consultation with the community. After another round of revision in consultation with the OLAC Council, the document is now adopted as the second OLAC standard.
The OLAC Council has now been ratified by the Advisory Board. The council members are: Anthony Aristar (LINGUIST), Chris Cieri (LDC), Gary Holton (ANLC), Chu-Ren Huang (Academia Sinica), Heidi Johnson (AILLA), Laurent Romary (ATILF), Joan Spanne (SIL) and Martin Wynne (OTA). These members have first-hand experience of OLAC and will make decisions about OLAC standards, best practices and repositories, as described in the document process and the registration process.
The OLAC Process document has been promoted to 'adopted' status by the OLAC Advisory Board. The process document summarizes the governing ideas of OLAC and describes how OLAC is organized and how it operates, including the document process and working group process. First drafted in May 2001, the document has been revised in consultation with the community over the past two years, as it has progressed through 'draft', 'proposed', and 'candidate' status. After one more round of revision in consultation with the Advisory Board, the document is now adopted as the first OLAC standard.
The OLAC Working Group on Outreach will raise awareness of the activities and resources of OLAC by facilitating the production of general-audience documents describing various aspects of OLAC and by contacting individuals and organizations who manage archives but are not yet part of OLAC.
New OLAC infrastructure is now in place, including a new database schema for the OLAC 1.0 metadata format, a static repository gateway based on the new OAI standard, fully updated repository validation and registration software, and a new open source OLAC software suite released on SourceForge, which includes a harvester, an aggregator, and an OLAC-DC crosswalk. This infrastructure was developed by Haejoong Lee, Gary Simons and Steven Bird with sponsorship from the US National Science Foundation.
On March 20, 2003, BBC News published an article called Digital race to save languages, which talks about the language archives community "fighting against time to save decades of data on the world's endangered languages from ending on the digital scrap heap." The article is based on interviews with Steven Bird and Peter Austin.
An OLAC workshop was held in Philadelphia on 10-12 December; it revised the OLAC standards and controlled vocabularies, reviewed OLAC archives and services, and considered proposals for new activities.
On November 4, 2002, Wired News published an article called Word Up: Keeping Languages Alive which discusses OLAC in connection with the Rosetta Project, one of our member archives. In the article, Gary Simons is quoted as saying: "The computing and recording technologies that are now standard tools in doing field linguistics are changing so quickly that information captured electronically today could cease to be accessible in another decade or two if special care is not taken to ensure that it is archived in stable formats by stable institutions." (Developing recommendations in this area will be a key focus of OLAC in 2003 - Steven Bird)
The August 2002 issue of Scientific American has an article called Saving Dying Languages which includes a discussion of OLAC.
The OLAC Working Group on Linguistic Types will create the OLAC-Linguistic-Type vocabulary that describes the nature or genre of the content of a language resource from a linguistic standpoint.
The OLAC Working Group on Language Codes will create OLAC standards concerning language code vocabularies and their management. In scope are all human languages, living, recently extinct, ancient and constructed (including proto and artificial languages). To learn more, and to join the group, please see the group's web page.
In the lead-up to the European Launch, several more archives have recently joined OLAC. These include: Analyse et Traitement Informatique de la Langue Française, Survey of California and Other Indian Languages, A Multimodal Database of Communicative Interaction (includes the CHILDES database), Academia Sinica, the Rosetta Project 1000 Language Archive, and the SIL Language and Culture Archive.
OLAC was launched at the 3rd Language Resources and Evaluation Conference, in the Alfredo Kraus Auditorium, in Las Palmas, Spain, 29 May 2002 (14:40-16:40). There were presentations by Gary Simons, Helen Aristar-Dry, Hans Uszkoreit, Martin Wynne, Laurent Romary, Steven Bird and Nicholas Ostler.
OLAC was launched at the 76th Annual Meeting of the Linguistic Society of America, in the San Francisco Hyatt Regency, 3-6 January 2002. The event included presentations from Gary Simons, Helen Dry, Megan Crowhurst, Chu-Ren Huang, Mark Liberman, Gary Holton and Steven Bird.
The launch marks the freezing of the OLAC metadata set for a one year period to encourage widespread adoption.
LINGUIST, the home of linguistics on the internet, has launched the primary OLAC service provider.
The Ethnologue is a database of linguistic, demographic and geographical information for over 7,000 living and recently extinct languages. Gary Simons has created an OLAC interface for the Ethnologue, permitting the database to be accessed via the OLAC cross-archive search engine. Enter any language name in the search field in the banner of this page, and view hits from the Ethnologue amongst the results.
The OLAC Protocol for Metadata Harvesting is the standard that defines how OLAC service providers harvest metadata from OLAC data providers. A draft has been posted for comment.
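The harvesting model can be illustrated with the generic OAI-PMH `ListRecords` verb, which OLAC builds on. This is a minimal sketch under stated assumptions, not the text of the draft protocol: it assumes an OAI-PMH 2.0 endpoint at a hypothetical base URL, assumes `olac` as the metadata prefix, and uses hypothetical helper names (`list_records_url`, `parse_page`, `harvest`). It follows resumption tokens until the repository signals the end of the list.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 response namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, resumption_token=None, metadata_prefix="olac"):
    """Build a ListRecords request URL for a (hypothetical) OAI-PMH endpoint.

    In OAI-PMH, resumptionToken is an exclusive argument, so follow-up
    requests carry only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        # "olac" as the metadataPrefix is an assumption for this sketch.
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (records, next_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = root.findall(".//oai:record", OAI_NS)
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken marks the last page.
    token = (token_el.text or "").strip() if token_el is not None else ""
    return records, token or None


def _fetch_url(url):
    """Default fetcher: perform a real HTTP GET and return the body bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def harvest(base_url, fetch=None):
    """Yield every record from a repository, following resumption tokens."""
    fetch = fetch or _fetch_url
    token = None
    while True:
        records, token = parse_page(fetch(list_records_url(base_url, token)))
        yield from records
        if token is None:
            break
```

Passing a custom `fetch` callable keeps network access separate from the paging logic, so the loop can be exercised against canned responses without contacting a live repository.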
OLAC has developed an experimental cross-archive search engine. It now harvests 18,000 records from 13 OLAC archives: LDC, ELRA, DFKI, TDProject, Perseus, ANLC, APS, LACITO, CBOLD, AISRI, TRACTOR, OTA and Ethnologue. The search engine may be accessed via the banner on this page. Users may query it by entering language names and/or linguistic resource types. A fielded search function is also available.
Because OLAC archives are also members of the Open Archives Initiative, queries on OAI service providers return hits from OLAC archives. Users can test this out by visiting the ARC cross-archive searching service and entering the query term "lexicon". OLAC is developing a new cross-archive searching service based on ARC.
Another feature that OLAC archives inherit from the OAI is a gateway for web crawlers, permitting OLAC records to be discovered using conventional search engines.
OLAC presented in Bulgaria [11/01]
Martin Wynne (Oxford Text Archive) presented OLAC at the 6th TELRI Seminar (Bansko, November 2001).
OLAC presented in Japan [11/01]
OLAC was presented by Chu-Ren Huang (Academia Sinica, Taiwan) in a workshop at the 6th Natural Language Processing Pacific Rim Symposium (Tokyo, November 2001).
OLAC announced in D-Lib Magazine [10/01]
A short piece on OLAC appeared in the October issue of D-Lib Magazine.
OLAC Metadata Set released for comment [10/01]
The OLAC Metadata Set defines a series of qualifications to the Dublin Core Element Set, tailored to language resources. A new version of the metadata set draft has been posted (2001-10-22), along with an updated version of the XML schema (version 0.4). Feedback is welcomed.
NSF Funds Digital Archive for Endangered Languages [7/01]
Anthony Aristar at Wayne State University and colleagues at Eastern Michigan University, the University of Pennsylvania, and the University of Arizona have been awarded a $2 million NSF grant to develop a public digital archive of endangered language data. The archive will employ OLAC metadata.
OLAC Process document released for comment [5/01]
This document summarizes the governing ideas of OLAC and describes how OLAC is organized and how it operates. Comments are invited from the community.