OLAC Record
oai:www.ldc.upenn.edu:LDC96T11

Metadata
Title:COMLEX Syntax Text Corpus Version 2.0
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Macleod, Catherine, Adam Meyers, and Ralph Grishman. COMLEX Syntax Text Corpus Version 2.0 LDC96T11. Web Download. Philadelphia: Linguistic Data Consortium, 1996
Contributor:Macleod, Catherine
Meyers, Adam
Grishman, Ralph
Date (W3CDTF):1996
Description:*Introduction* COMLEX Syntax Text Corpus Version 2.0 was developed by the Linguistic Data Consortium (LDC) and consists of approximately 30,000 newswire documents in English. The corpus served as the basis for a tagging task for the COMLEX English Syntax Lexicon (LDC98L21), in which 750 of the most common verbs in the corpus were tagged with COMLEX complements. This task differed from the usual tagging of a corpus in that the tags appear in the dictionary, not in the corpus. The tag in a dictionary entry consists of the byte number where the text example can be located in the corpus, the source, and the complement name.
*Data* The corpus totals about 100 MB of text, including parts of the Brown Corpus (7 MB), Wall Street Journal (27 MB), San Jose Mercury (30 MB), and Associated Press (29.5 MB). Much of the text retains SGML and other tags from the original sources. In addition to the text file, the corpus contains a TABLE file that lists the start, length, and ending bytes of each individual source document, as well as of each source overall (e.g. Wall Street Journal, San Jose Mercury, Brown Corpus).
*Samples* For an example of the data in this corpus, please view this sample (TXT).
*Updates* None at this time.
Identifier:LDC96T11
https://catalog.ldc.upenn.edu/LDC96T11
ISBN: 1-58563-148-5
ISLRN: 184-170-097-975-5
DOI: 10.35111/ryhw-kn17
Language:English
Language (ISO639):eng
License:COMLEX For-Profit Agreement: https://catalog.ldc.upenn.edu/license/comlex-for-profit-agreement.pdf
COMLEX Non-member Agreement: https://catalog.ldc.upenn.edu/license/comlex-non-member-agreement.pdf
COMLEX Non-Profit Agreement: https://catalog.ldc.upenn.edu/license/comlex-non-profit-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Rights Holder:Portions © 1987-1994 Dow Jones & Company, Inc., © 1991 San Jose Mercury News, © 1989-1991, 1994-1996 The Associated Press, © 1996 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text
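
The Description above says that dictionary tags point into the corpus by byte number and that the TABLE file records start and length bytes per source document. The following is a minimal sketch of how such offsets might be used, not the corpus's own tooling. The layout of the TABLE file is not given in this record, so the in-memory rows of (start, length, label) are a hypothetical stand-in, and the names read_span and locate are illustrative.

```python
from bisect import bisect_right


def read_span(path, start, length):
    """Return `length` bytes starting at byte offset `start` of the file at `path`."""
    # Binary mode keeps offsets in bytes rather than decoded characters.
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(length)


def locate(table, offset):
    """Return the label of the row whose byte range contains `offset`, or None.

    `table` is a list of (start, length, label) tuples sorted by start.
    This row shape is an assumption, not the real TABLE file format.
    """
    starts = [row[0] for row in table]
    # Index of the last row that begins at or before the offset.
    i = bisect_right(starts, offset) - 1
    if i < 0:
        return None
    start, length, label = table[i]
    return label if offset < start + length else None
```

Given a byte number taken from a dictionary tag, locate would identify which document range it falls in, and read_span would pull the surrounding text from the corpus file. Whether the real ending byte is inclusive or exclusive is not stated here, so the sketch relies only on start and length.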

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC96T11
DateStamp:  2021-06-28
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Macleod, Catherine; Meyers, Adam; Grishman, Ralph. 1996. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC96T11
Up-to-date as of: Mon Mar 25 7:20:00 EDT 2024