OLAC Record
oai:www.ldc.upenn.edu:LDC2021T04

Metadata
Title:ATIS - Seven Languages
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Mansour, Saab, and Batool Haider. ATIS - Seven Languages LDC2021T04. Web Download. Philadelphia: Linguistic Data Consortium, 2021
Contributor:Mansour, Saab
Haider, Batool
Date (W3CDTF):2021
Date Issued (W3CDTF):2021-01-15
Description:*Introduction* ATIS - Seven Languages was developed by Amazon Web Services, Inc. and consists of 5,871 English utterances from ATIS (Air Travel Information Services) corpora, specifically ATIS2 (LDC93S5), ATIS3 Training Data (LDC94S19), and ATIS3 Test Data (LDC95S26), translated into six languages: Spanish, German, French, Portuguese, Chinese, and Japanese. The ATIS collection was developed to support the research and development of speech understanding systems. Participants were presented with various hypothetical travel planning scenarios and asked to solve them by interacting with partially or completely automated ATIS systems. The resulting utterances were recorded and transcribed. Data was collected in the early 1990s at five US sites: Raytheon BBN, Carnegie Mellon University, MIT Laboratory for Computer Science, National Institute of Standards and Technology, and SRI International.
*Data* Following the original ATIS division, the data is split into 4,978 utterances for training and 893 utterances for testing. The training utterances were selected from the Class A (context-independent) training data in the ATIS2 and ATIS3 corpora. The test utterances come from the November 1993 and December 1994 data sets in ATIS3. The original English utterances were manually translated into the six languages, and this release also includes the original English utterances. Each utterance is annotated with named entities via table lookup; markers include city, airline, and airport names and dates. Data is stored in UTF-8 encoded tab-separated value files.
*Samples* Please view the following samples: English Source (TXT), Japanese Translation (TXT), French Translation (TXT).
*Updates* As of February 15, 2024, an additional set of localized translation files was added for all languages. In these files, the entities were localized to values related to the countries in which the language is spoken rather than simply translated or transliterated (e.g., for German, “New York” would be mapped to “New York” in the test_DE.tsv test set but to “München” in the test_DE_loc.tsv test set). They are UTF-8 encoded tab-separated value files.
Extent:Corpus size: 3370 KB
Identifier:LDC2021T04
https://catalog.ldc.upenn.edu/LDC2021T04
ISBN: 1-58563-954-0
ISLRN: 713-838-074-718-6
DOI: 10.35111/g9h5-0p74
Language:English
Spanish
German
French
Portuguese
Japanese
Chinese
Language (ISO639):eng
spa
deu
fra
por
jpn
zho
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2021T04
Rights Holder:Portions © 2021 Amazon Web Services, Inc., © 1993-1995, 2019, 2021 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text
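
The Description states that the data is distributed as UTF-8 encoded tab-separated value files, for example test_DE.tsv and test_DE_loc.tsv. The following is a minimal Python sketch of how such a file might be loaded and its rows counted. This record does not describe the column layout or say whether the files carry a header row, so the sketch keeps every row as a plain list of strings and leaves the header as an option. The helper names `read_tsv` and `count_rows` are hypothetical and are not part of the release.

```python
import csv
from pathlib import Path


def read_tsv(path):
    """Return every row of a UTF-8 tab-separated file as a list of strings."""
    with Path(path).open(encoding="utf-8", newline="") as handle:
        # Assumption: quote characters inside utterances are literal text,
        # so quoting is disabled rather than interpreted by the csv module.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


def count_rows(path, has_header=False):
    """Count data rows; has_header is an assumption the caller must verify."""
    rows = read_tsv(path)
    if has_header and rows:
        return len(rows) - 1
    return len(rows)
```

If the files follow the split given in the Description, counting the rows of a training file and a test file for one language would be a quick way to check them against the stated 4,978 and 893 utterances.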

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2021T04
DateStamp:  2024-02-15
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Mansour, Saab; Haider, Batool. 2021. Linguistic Data Consortium.
Terms: area_Asia area_Europe country_DE country_ES country_FR country_GB country_JP country_PT dcmi_Text iso639_deu iso639_eng iso639_fra iso639_jpn iso639_por iso639_spa iso639_zho olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2021T04
Up-to-date as of: Mon Mar 25 7:21:13 EDT 2024