OLAC Record oai:www.ldc.upenn.edu:LDC2003T16 |
Metadata | ||
Title: | SummBank 1.0 | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Radev, Dragomir, et al. SummBank 1.0 LDC2003T16. Web Download. Philadelphia: Linguistic Data Consortium, 2003. | |
Contributor: | Radev, Dragomir | |
Teufel, Simone | ||
Saggion, Horacio | ||
Lam, Wai | ||
Blitzer, John | ||
Celebi, Arda | ||
Drabek, Elliott | ||
Liu, Danyu | ||
Qi, Hong | ||
Allison, Tim | ||
Date (W3CDTF): | 2003 | |
Date Issued (W3CDTF): | 2003-12-18 | |
Description: | *Introduction* SummBank 1.0 contains the data created for the Summer 2001 Johns Hopkins University Workshop, which focused on text summarization in a cross-lingual information retrieval framework. The goal was to gather a corpus of original documents and summaries for use as gold standards by the document summarization community. The source data consists of 18,147 aligned bilingual (Cantonese and English) article pairs from the Information Services Department of the Hong Kong Special Administrative Region of the People's Republic of China, which were published by the LDC in 2000 as Hong Kong News Parallel Text. *Data* This release contains 40 news clusters in English and Chinese, 360 multi-document, human-written non-extractive summaries, and nearly two million single-document and multi-document extracts created by automatic and manual methods. MEAD was the summarizer that was reimplemented and upgraded during the workshop; versions of the software are available from the MEAD website. This distribution includes roughly two million text files, totalling approximately 13GB uncompressed. The text files are encoded as UTF-8 for English and as GB or Big5 for Chinese. *Updates* Additional information, updates, and bug fixes may be available on the SummBank website. | |
Extent: | Corpus size: 17825792 KB | |
Identifier: | LDC2003T16 | |
https://catalog.ldc.upenn.edu/LDC2003T16 | ||
ISBN: 1-58563-274-0 | ||
ISLRN: 352-475-235-734-5 | ||
DOI: 10.35111/7v71-fh28 | ||
Language: | Yue Chinese | |
English | ||
Language (ISO639): | yue | |
eng | ||
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2003T16 | |
Rights Holder: | Portions © 1997-2000 The Government of the Hong Kong Special Administrative Region (HKSAR), © 2000, 2003 Trustees of the University of Pennsylvania | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2003T16 | |
DateStamp: | 2020-11-30 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Radev, Dragomir; Teufel, Simone; Saggion, Horacio; Lam, Wai; Blitzer, John; Celebi, Arda; Drabek, Elliott; Liu, Danyu; Qi, Hong; Allison, Tim. 2003. SummBank 1.0. Linguistic Data Consortium. | |
Terms: | area_Asia area_Europe country_CN country_GB dcmi_Text iso639_eng iso639_yue olac_primary_text |