OLAC Record: Multilingual ATIS

OLAC Record
oai:www.ldc.upenn.edu:LDC2019T04

Metadata

Title: Multilingual ATIS

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Upadhyay, Shyam, et al. Multilingual ATIS LDC2019T04. Web Download. Philadelphia: Linguistic Data Consortium, 2019

Contributor: Upadhyay, Shyam

Hakkani-Tur, Dilek

Tur, Gokhan

Rastogi, Abhinav

Date (W3CDTF): 2019

Date Issued (W3CDTF): 2019-02-15

Description: *Introduction* Multilingual ATIS was developed by Google Inc. and consists of 5,871 utterances from ATIS2 (LDC93S5), ATIS3 Training Data (LDC94S19), and ATIS3 Test Data (LDC95S26), annotated and translated into Hindi and Turkish. The ATIS (Air Travel Information Services) collection was developed to support the research and development of speech understanding systems. Participants were presented with various hypothetical travel planning scenarios and asked to solve them by interacting with partially or completely automated ATIS systems. The resulting utterances were recorded and transcribed. Data was collected in the early 1990s at five US sites: Raytheon BBN, Carnegie Mellon University, MIT Laboratory for Computer Science, National Institute of Standards and Technology, and SRI International.

*Data* The data in this release is separated into training and test sets following the original ATIS division. The training set contains 4,978 utterances selected from the Class A (context-independent) training data in the ATIS2 and ATIS3 corpora. The test set contains 893 utterances from the November 1993 and December 1994 data sets in ATIS3. The original English utterances were manually translated into Hindi and Turkish. For each utterance, this release also includes the original English text and a machine translation of the manual Hindi or Turkish translation back into English. Each utterance is annotated with named entities via table lookup; markers include city, airline, and airport names, and dates. All data is stored in UTF-8-encoded tab-separated value files.

*Samples* Please view this sample.

*Updates* None at this time.
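The Description field states that the data is stored in UTF-8 encoded tab-separated value files and gives the training and test set sizes. The following is a minimal Python sketch of how such a file might be read. The record does not list file names, column order, or whether the files have a header row, so the function name `read_tsv_rows` and any path passed to it are hypothetical, and the sketch returns raw rows without interpreting the columns.

```python
import csv
from pathlib import Path

# Set sizes as stated in the record's Description field.
TRAIN_UTTERANCES = 4978
TEST_UTTERANCES = 893
TOTAL_UTTERANCES = 5871


def read_tsv_rows(path):
    """Return every row of a UTF-8 tab-separated file as a list of strings.

    Assumption: one record per line. Column meanings are not given in the
    OLAC record, so rows are returned uninterpreted.
    """
    with Path(path).open(encoding="utf-8", newline="") as handle:
        # QUOTE_NONE treats quotation marks inside utterances as literal text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row for row in reader]


# The stated split sizes add up to the stated corpus total.
assert TRAIN_UTTERANCES + TEST_UTTERANCES == TOTAL_UTTERANCES
```

Comparing `len(read_tsv_rows(...))` for each file against the constants above would be one way to sanity-check a download, after allowing for a header row if one is present.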

Extent: Corpus size: 3688 KB

Identifier: LDC2019T04

https://catalog.ldc.upenn.edu/LDC2019T04

ISBN: 1-58563-874-9

ISLRN: 470-988-441-460-3

DOI: 10.35111/ac9t-pw34

Language: English

Hindi

Turkish

Language (ISO639): eng

hin

tur

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2019T04

Rights Holder: Portions © 2019 Google Inc., © 1993-1995, 2019 Trustees of the University of Pennsylvania

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2019T04

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format
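
The GetRecord entries above refer to OAI-PMH requests for this record in OLAC and simple DC formats. As a rough sketch, such a request URL can be assembled from the `verb`, `identifier`, and `metadataPrefix` arguments defined by OAI-PMH. The repository base URL is not given in this record, so `base_url` below is a placeholder the caller must supply, and `build_getrecord_url` is a hypothetical helper name. The prefix `oai_dc` is the standard OAI-PMH name for simple Dublin Core; `olac` is assumed to be the prefix the repository uses for OLAC format.

```python
from urllib.parse import urlencode

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2019T04"


def build_getrecord_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL.

    base_url is an assumption: the record does not state the repository's
    OAI-PMH endpoint, so the caller has to provide it.
    """
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Usage with a placeholder endpoint (not the real LDC endpoint).
dc_url = build_getrecord_url("https://example.org/oai", OAI_IDENTIFIER, "oai_dc")
olac_url = build_getrecord_url("https://example.org/oai", OAI_IDENTIFIER, "olac")
```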

Search Info
Citation: Upadhyay, Shyam; Hakkani-Tur, Dilek; Tur, Gokhan; Rastogi, Abhinav. 2019. Linguistic Data Consortium.
Terms: area_Asia area_Europe country_GB country_IN country_TR dcmi_Text iso639_eng iso639_hin iso639_tur olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2019T04
Up-to-date as of: Wed Oct 29 7:01:52 EDT 2025
