OLAC News Archive

OLAC Presents at NSF Workshop on Documenting Endangered Languages: [10/07]

In October 2007, the US National Science Foundation held a workshop to assess the state of the art in documenting endangered languages and to plot directions for the future. There were 25 invited participants representing funding agencies, data providers, tool providers, and archives. OLAC was invited to present its contribution of developing an infrastructure for indexing endangered language documentation. The report shows that OLAC indexes resources from over 3,100 of the world's languages, including 2,186 living languages that are known to have fewer than 100,000 speakers.

OLAC receives new NSF Sponsorship: [8/07]

The US National Science Foundation has funded a project, OLAC: Accessing the World's Language Resources, which aims to greatly improve access to language resources for linguists and the broader communities of interest by achieving an order-of-magnitude increase in the coverage of the OLAC catalog and in the use of OLAC search services. The project will do so through two main areas of activity: developing guidelines and services that encourage language archives to follow best common practices that will facilitate language resource discovery through OLAC, and developing services to bridge from the resource catalogs of the library and web domains to the OLAC catalog.

OLAC Search Engine Handles 2000 Queries Per Day: [4/06]

In 2005, the OLAC Search Engine handled 824,676 queries, an average of 2,259 per day or 68,723 per month. The most popular languages searched for in 2005 were Dutch, English, Quechua, Arabic, Greek, German, Chinese, and Malay. Only 35% of queries specified a particular archive; the majority were generic searches across all archives. The most commonly searched repository was SIL-LCA, followed by PARADISEC and SCOIL.
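The daily and monthly averages follow directly from the annual total. A quick check, assuming a 365-day year (2005 was not a leap year) and twelve equal months:

```python
# Annual total quoted in the news item.
TOTAL_QUERIES_2005 = 824_676
DAYS_IN_2005 = 365   # 2005 was not a leap year
MONTHS_IN_YEAR = 12

per_day = TOTAL_QUERIES_2005 / DAYS_IN_2005
per_month = TOTAL_QUERIES_2005 / MONTHS_IN_YEAR

print(f"per day:   {per_day:.0f}")    # per day:   2259
print(f"per month: {per_month:.0f}")  # per month: 68723
```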

http://www.language-archives.org/tools/search

SIL Language and Culture Archives

Pacific And Regional Archive for Digital Sources in Endangered Cultures

Survey of California and Other Indian Languages

EMELD Workshop on Digital Language Documentation: [3/06]

The 2006 E-MELD workshop will focus on 'Tools and Standards: the State of the Art.' This annual workshop marks the culmination of the 5-year E-MELD project; one goal of the workshop is to review digital standards ratified by the community in prior workshops on text, lexicons, databases, and annotation. The workshop will be held in July, in conjunction with the LSA Summer Meeting at Michigan State University.

Workshop website

EMELD: Electronic Metastructure for Endangered Languages Data

LSA Summer Meeting

OLAC Tutorial at the LSA Annual Meeting [1/06]

A tutorial on OLAC was held at the Annual Meeting of the Linguistic Society of America in January. The focus of the presentations was on audio and video recording. The event was officially sponsored by the LSA's Committee for Endangered Languages and their Preservation.

Workshop website

LSA Annual Meeting, Albuquerque, January 2006.

LSA Committee for Endangered Languages and their Preservation

New OLAC Repositories in 2005 [1/06]

Four repositories joined OLAC in 2005: the Audio Archive of Linguistic Fieldwork at the Berkeley Language Center, UC Berkeley, USA; the Comparative Corpus of Spoken Portuguese at IEL Unicamp, Campinas, Brazil; ODIN - The Online Database of Interlinear Text at California State University, Fresno, USA; and the Pacific And Regional Archive for Digital Sources in Endangered Cultures (PARADISEC), at the Universities of Melbourne, Sydney, and New England and the Australian National University.

Full list of OLAC Archives

EMELD Workshop on Digital Language Documentation: [4/05]

The 2005 E-MELD workshop will focus on linguistic ontologies and data categories as aids in linguistic annotation and as tools for the fine-grained search and retrieval of language documentation. It will be held in July, in conjunction with the LSA Institute at MIT.

Workshop website

EMELD: Electronic Metastructure for Endangered Languages Data

LSA Institute

LSA Tutorial on Archiving and Linguistic Resources: [1/05]

This tutorial provided a forum where people who are compiling documentary linguistic resources could learn about current best practices for creating and conserving those resources. The tutorial was organized by Jeff Good (MPI Leipzig) and Heidi Johnson (University of Texas, Austin and AILLA) and held at the annual meeting of the Linguistic Society of America, in Oakland, California, in January 2005.

Tutorial abstracts and slides

OLAC at the LSA: [10/04]

OLAC will be organizing a booth in the publishers' exhibit hall at the Linguistic Society of America meeting in San Francisco in January. OLAC search services will be demonstrated, and OLAC archives are invited to send someone to help staff the booth, to give live demonstrations of their web interfaces, and to hand out flyers for their projects. The booth and web access are expensive, and we welcome any offers of sponsorship. If you would like to be involved, please contact Heidi Johnson.

LSA Conference website

LDC Hosts OLAC Search Interface: [7/04]

The Linguistic Data Consortium at the University of Pennsylvania now hosts a powerful OLAC search interface. Features include result summaries by archive, result ranking, approximate language name matching, and country-based searches. (The service was developed by Amol Kamat, Baden Hughes, and Steven Bird at the University of Melbourne, with sponsorship from the Department of Computer Science and Software Engineering and the Linguistic Data Consortium.)

OLAC Search Interface

Archive Report Cards: [7/04]

The archive report cards, added to the OLAC site in March, give summary statistics for each repository and an assessment of the quality of the repository's metadata. The assessment is based on OLAC and Dublin Core guidelines. An updated version of the system is now available, including a revised evaluation algorithm to account for changes in DC recommendations, and revised labelling of the reports for consistency with OLAC terminology. The evaluation metric rewards the use of OLAC extensions (controlled vocabularies), and what we consider to be the most important DC elements: title, date, subject, description, and identifier. The report cards can be accessed by clicking the "REPORT CARD" links on the OLAC Archives page. (The service was developed by Amol Kamat, Baden Hughes, and Steven Bird at the University of Melbourne, with sponsorship from the Department of Computer Science and Software Engineering, and the Linguistic Data Consortium.)
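The news item does not give the evaluation algorithm itself. As a purely illustrative sketch, a score of this general shape could combine the share of the five named DC elements that are present with the share of elements that carry an OLAC extension. The function name `score_record`, the record representation, and the 50/50 weighting are all hypothetical and are not taken from the report-card service.

```python
from typing import List, Tuple

# The five DC elements the news item names as most important.
CORE_ELEMENTS = ("title", "date", "subject", "description", "identifier")

# Hypothetical record model: a list of (element_name, uses_olac_extension) pairs.
Record = List[Tuple[str, bool]]


def score_record(record: Record) -> float:
    """Return a score in [0, 1]. The equal weighting is an assumption."""
    if not record:
        return 0.0
    present = {name for name, _ in record}
    # Share of the five core elements that appear at least once.
    core_share = sum(1 for e in CORE_ELEMENTS if e in present) / len(CORE_ELEMENTS)
    # Share of all elements that use an OLAC extension (controlled vocabulary).
    coded_share = sum(1 for _, coded in record if coded) / len(record)
    return 0.5 * core_share + 0.5 * coded_share


# Four of the five core elements are present; one of four elements is coded.
example = [("title", False), ("date", False), ("subject", True), ("identifier", False)]
print(score_record(example))
```

A real metric would likely weight elements differently and average over every record in a repository; the sketch only shows how the two kinds of reward mentioned above can be combined into one number.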

OLAC Archives Page (see "REPORT CARD" links)

Report for full set of repositories

Documentation on report cards

OLAC Metadata standard adopted: [3/04]

The OLAC Metadata standard has been promoted to `adopted' status by the OLAC Council following a 12-month period of experimentation by OLAC implementers. This document defines the format used for the interchange of metadata within the framework of the Open Archives Initiative. The metadata set is based on Qualified Dublin Core, but the format allows for the use of extensions to express community-specific qualifiers.

OLAC Metadata standard

OLAC Archive on board European Space Agency mission [3/04]

On March 2, the Rosetta Disk left Earth on board an Ariane-5 rocket from the European Spaceport in Kourou, French Guiana. The mission's target is the comet Churyumov-Gerasimenko, which will be reached in 2014 after a "billiard ball" journey through the Solar System lasting more than ten years. The Rosetta Disk is a modern version of the Rosetta Stone. The 2-inch nickel disk is micro-etched with 30,000 pages of information covering over 1,000 languages. For each language there is a simple dictionary, a guide to pronunciation and counting, and a traditional story with translation. Additionally, to help language decipherment in the remote future, a translation of a common text (the first three chapters of the book of Genesis) is provided in all languages. The disk can be read with the aid of an optical microscope. The materials on the disk come from the Rosetta 1000 Language Archive, an OLAC repository.

Rosetta 1000 Language Archive

European Space Agency Rosetta Mission

LanguageLog: Offsite backup for world's languages

OLAC identified as "exemplary" in DLF report: [1/04]

In a recent Survey of Digital Library Aggregation Services, published by the Digital Library Federation, Martha Brogan praised the Open Language Archives Community as exemplary. She concluded her discussion with the following statement:

OLAC is exemplary in several ways: the technical and social infrastructure that it has developed to support its community of contributors, based on shared principles and standards; the resources that it provides at its Web site about its purpose, scope, history, tools, news and events; and the efforts of its two leaders -- Gary Simons and Steven Bird [2003a, 2003b, 2003c] -- to articulate the challenges, analyze the options, and recommend possible solutions to their community of contributors in order to improve OLAC. With the formal appointment of an Outreach Working Group and its other efforts to accommodate small archives that lack technical support, OLAC's content and influence is likely to grow.

A Survey of Digital Library Aggregation Services

Digital Library Federation

Outreach Working Group

OLAC Repositories standard adopted: [9/03]

The OLAC Repositories standard has been promoted to `adopted' status by the OLAC Council following a 9-month period of experimentation by OLAC implementers. This document defines the standards OLAC archives must follow in implementing a metadata repository. Originally called the OLAC Protocol for Metadata Harvesting, this document was first drafted in December 2001. A year later it was broadened to cover the new static repository model and given a more general title. Since then there have been two further revisions in response to consultation with the community. After a further round of revision in consultation with the OLAC Council, the document is now adopted as the second OLAC standard.

OLAC Repositories standard

Static repositories

OLAC Council appointed: [8/03]

The OLAC Council has now been ratified by the Advisory Board. The council members are: Anthony Aristar (LINGUIST), Chris Cieri (LDC), Gary Holton (ANLC), Chu-Ren Huang (Academia Sinica), Heidi Johnson (AILLA), Laurent Romary (ATILF), Joan Spanne (SIL) and Martin Wynne (OTA). These people have experiential knowledge of OLAC and will make decisions about OLAC standards, best practices and repositories as described in the document process and registration process.

OLAC Council

OLAC Process document

OLAC Process document adopted: [7/03]

The OLAC Process document has been promoted to `adopted' status by the OLAC Advisory Board. The process document summarizes the governing ideas of OLAC and describes how OLAC is organized and how it operates, including the document process and working group process. First drafted in May 2001, the document has been revised in consultation with the community over the past two years, as it has progressed through `draft', `proposed', and `candidate' status. After one more round of revision in consultation with the Advisory Board, the document is now adopted as the first OLAC standard.

OLAC Process document

OLAC Advisory Board

OLAC Working Group on Outreach: Call for Participation [6/03]

The OLAC Working Group on Outreach will raise awareness of the activities and resources of OLAC by facilitating the production of general-audience documents describing various aspects of OLAC and by contacting individuals and organizations who manage archives but are not yet part of OLAC.

OLAC Working Group on Outreach

New OLAC infrastructure [5/03]

New OLAC infrastructure is now in place, including a new database schema for the OLAC 1.0 metadata format, a static repository gateway based on the new OAI standard, fully updated repository validation and registration software, and a new open source OLAC software suite released on SourceForge, which includes a harvester, an aggregator, and an OLAC-DC crosswalk. This infrastructure was developed by Haejoong Lee, Gary Simons, and Steven Bird with sponsorship from the US National Science Foundation.

OLAC Tools page

Download OLAC suite from sourceforge

OLAC in BBC News [3/03]

On March 20, 2003, BBC News published an article called Digital race to save languages, which talks about the language archives community "fighting against time to save decades of data on the world's endangered languages from ending on the digital scrap heap." The article is based on interviews with Steven Bird and Peter Austin.

Digital race to save languages by Andy Webster.

OLAC Workshop [12/02]

An OLAC workshop held in Philadelphia on 10-12 December revised the OLAC standards and controlled vocabularies, reviewed OLAC archives and services, and considered proposals for new activities.

IRCS Workshop on Open Language Archives

OLAC in Wired News [11/02]

On November 4, 2002, Wired News published an article called Word Up: Keeping Languages Alive which discusses OLAC in connection with the Rosetta Project, one of our member archives. In the article, Gary Simons is quoted as saying: "The computing and recording technologies that are now standard tools in doing field linguistics are changing so quickly that information captured electronically today could cease to be accessible in another decade or two if special care is not taken to ensure that it is archived in stable formats by stable institutions." (Developing recommendations in this area will be a key focus of OLAC in 2003 - Steven Bird)

Word Up: Keeping Languages Alive by Kendra Mayfield

OLAC in Scientific American [8/02]

The August 2002 issue of Scientific American has an article called Saving Dying Languages which includes a discussion of OLAC.

Saving Dying Languages, by Wayt Gibbs

Scientific American, August 2002

OLAC Working Group on Linguistic Types: Call for Participation [7/02]

The OLAC Working Group on Linguistic Types will create the OLAC-Linguistic-Type vocabulary that describes the nature or genre of the content of a language resource from a linguistic standpoint.

OLAC Working Group on Linguistic Types

OLAC Language Codes Working Group: Call for Participation [5/02]

The OLAC Working Group on Language Codes will create OLAC standards concerning language code vocabularies and their management. In scope are all human languages, living, recently extinct, ancient and constructed (including proto and artificial languages). To learn more, and to join the group, please see the group's web page.

OLAC Language Codes Working Group

More Archives Join OLAC [5/02]

In the lead-up to the European Launch, several more archives have recently joined OLAC. These include: Analyse et Traitement Informatique de la Langue Française, Survey of California and Other Indian Languages, A Multimodal Database of Communicative Interaction (includes the CHILDES database), Academia Sinica, the Rosetta Project 1000 Language Archive, and the SIL Language and Culture Archive.

More information about participating archives

OLAC Launch in Europe [5/02]

OLAC was launched at the 3rd Language Resources and Evaluation Conference, in the Alfredo Kraus Auditorium, in Las Palmas, Spain, 29 May 2002 (14:40-16:40). There were presentations by Gary Simons, Helen Aristar-Dry, Hans Uszkoreit, Martin Wynne, Laurent Romary, Steven Bird and Nicholas Ostler.

Program, abstracts and presentations

OLAC Launch in North America [1/02]

OLAC was launched at the 76th Annual Meeting of the Linguistic Society of America, in the San Francisco Hyatt Regency, 3-6 January 2002. The event included presentations from Gary Simons, Helen Dry, Megan Crowhurst, Chu-Ren Huang, Mark Liberman, Gary Holton and Steven Bird.

Program, abstracts and presentations

The launch marks the freezing of the OLAC metadata set for a one-year period to encourage widespread adoption.

LINGUIST announces OLAC service provider [12/01]

LINGUIST, the home of linguistics on the internet, has launched the primary OLAC service provider.

LINGUIST homepage

LINGUIST Service Provider

Announcement

SIL Ethnologue joins OLAC [12/01]

The Ethnologue is a database of linguistic, demographic and geographical information for over 7,000 living and recently extinct languages. Gary Simons has created an OLAC interface for the Ethnologue, permitting the database to be accessed via the OLAC cross-archive search engine. Enter any language name in the search field in the banner of this page, and view hits from the Ethnologue amongst the results.

www.ethnologue.com

Introduction to the Ethnologue

OLAC Protocol for Metadata Harvesting [12/01]

The OLAC Protocol for Metadata Harvesting is the standard that defines how OLAC service providers harvest metadata from OLAC data providers. A draft has been posted for comment.
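Since OLAC harvesting sits within the Open Archives Initiative framework, a harvest request can be pictured as an OAI-PMH request against a data provider's base URL. The following minimal sketch only builds such a URL and performs no network access. The base URL `http://archive.example.org/oai` and the helper name `list_records_url` are hypothetical, and the `olac` metadata prefix is an assumption about how a repository labels its OLAC-format records.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "olac") -> str:
    """Build an OAI-PMH ListRecords request URL for a data provider.

    `ListRecords` and `metadataPrefix` are standard OAI-PMH request
    parameters; the default prefix value is an assumption.
    """
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hypothetical data provider; a service provider would fetch and parse this URL.
url = list_records_url("http://archive.example.org/oai")
print(url)  # http://archive.example.org/oai?verb=ListRecords&metadataPrefix=olac
```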

OLAC Protocol for Metadata Harvesting

The announcement on OLAC-Implementers

OLAC Search Engine [12/01]

OLAC has developed an experimental cross-archive search engine. It now harvests 18,000 records from 13 OLAC archives: LDC, ELRA, DFKI, TDProject, Perseus, ANLC, APS, LACITO, CBOLD, AISRI, TRACTOR, OTA and Ethnologue. The search engine may be accessed via the banner on this page. Users may query it by entering language names and/or linguistic resource types. A fielded search function is also available.

Because OLAC archives are also members of the Open Archives Initiative, queries on OAI service providers return hits from OLAC archives. Users can test this out by visiting the ARC cross-archive searching service and entering the query term "lexicon". OLAC is developing a new cross-archive searching service based on ARC.

Another feature that OLAC archives inherit from the OAI is a gateway for web crawlers, permitting OLAC records to be discovered using conventional search engines.

Learn more about OLAC archives

The ARC Cross-Archive Searching Service

OAI Gateway Service for Web Crawlers

Open Archives Initiative

OLAC presented in Bulgaria [11/01]

Martin Wynne (Oxford Text Archive) presented OLAC at the 6th TELRI Seminar (Bansko, November 2001).

Abstract: Opening the Archives

Workshop website: 6th TELRI Seminar on Multilingual Corpus Research

TELRI: Trans-European Language Resources Infrastructure

OLAC presented in Japan [11/01]

OLAC was presented by Chu-Ren Huang (Academia Sinica, Taiwan) in a workshop at the 6th Natural Language Processing Pacific Rim Symposium (Tokyo, November 2001).

Proceedings paper: The Open Language Archives Community and Asian Language Resources

Conference website: Workshop on Language Resources in Asia

OLAC announced in D-Lib Magazine [10/01]

A short piece on OLAC appeared in the October issue of D-Lib magazine.

D-Lib announcement

D-Lib magazine

OLAC Metadata Set released for comment [10/01]

The OLAC Metadata Set defines a series of qualifications to the Dublin Core Element Set, tailored to language resources. A new version of the metadata set draft has been posted (2001-10-22), along with an updated version of the XML schema (version 0.4). Feedback is welcomed.

OLAC Metadata Set (2001-10-22)

The announcement about the metadata set on OLAC-General

The XML schema version 0.4

The announcement about the schema on OLAC-Implementers

NSF Funds Digital Archive for Endangered Languages [7/01]

Anthony Aristar at Wayne State University, together with colleagues at Eastern Michigan University, the University of Pennsylvania, and the University of Arizona, has been awarded a $2 million NSF grant to develop a public digital archive of endangered language data. The archive will employ OLAC metadata.

E-MELD: Electronic Metastructure for Endangered Languages Data

Linguist List

NSF press release

OLAC Process document released for comment [5/01]

This document summarizes the governing ideas of OLAC and describes how OLAC is organized and how it operates. Comments are invited from the community.

OLAC Process document

The call for review posted to OLAC-General