OLAC Record
oai:catalogue.elra.info:ELRA-W0073

Metadata
Title:Quaero Old Press Extended Named Entity corpus
Abstract:This corpus consists of the manual annotation of 76 newspaper issues published in 1890-1891 and provided by the French National Library (Bibliothèque Nationale de France). Three different titles are used (Le Temps, La Croix and Le Figaro) for a total of 295 pages. The corpus is fully manually annotated according to the Quaero extended and structured named entity definition.
Access Rights:Rights available for: Research Use, Commercial Use
Date Available (W3CDTF):2013-02-13
Date Issued (W3CDTF):2013-02-07
Date Modified (W3CDTF):2013-02-13
Description:Written Corpora
The Quaero Old Press Extended Named Entity corpus consists of the manual annotation of 76 newspaper issues published in 1890-1891 and provided by the French National Library (Bibliothèque Nationale de France). Three different titles are used (Le Temps, La Croix and Le Figaro) for a total of 295 pages. The corpus is fully manually annotated according to the Quaero extended and structured named entity definition, which differentiates entity "types" and "components". The training part of the corpus is composed of 231 pages and contains 1,297,742 words, 114,599 types and 136,113 components. The test corpus is composed of 64 pages and contains 363,455 words, 33,083 types and 40,432 components.
The Quaero Old Press Extended Named Entity Corpus consists of:
- 76 newspaper issues published in 1890-1891 and provided by the French National Library (Bibliothèque Nationale de France) (images and OCR output),
- 295 extracted pages in text format along with the corresponding images,
- the fully annotated txt corpus, which amounts to about 1.3 million words,
- a sub-corpus serving as a mini-reference corpus for quality evaluation purposes,
- tools developed for the extraction of text and images, for annotation and for evaluation,
- guidelines.
Identifier:ELRA-W0073
http://catalog.elra.info/product_info.php?products_id=1194
Language:French
Language (ISO639):fra
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0073
DateStamp:  2013-02-13
GetRecord:  OAI-PMH request for simple DC format
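
The GetRecord entries above refer to OAI-PMH requests for this record in two metadata formats. As a rough sketch only, such a request URL could be assembled as follows. The base URL is a hypothetical placeholder (this record does not state the archive's OAI-PMH endpoint), the helper name build_get_record_url is made up for illustration, and the prefix "olac" for the OLAC format is an assumption; "oai_dc" is the standard OAI-PMH prefix for simple Dublin Core.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not give the archive's OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:catalogue.elra.info:ELRA-W0073"


def build_get_record_url(base_url, identifier, metadata_prefix):
    """Return an OAI-PMH GetRecord request URL for a single record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # "olac" is assumed for the OLAC format; "oai_dc" is simple Dublin Core.
    for prefix in ("olac", "oai_dc"):
        print(build_get_record_url(BASE_URL, OAI_IDENTIFIER, prefix))
```

The colons in the identifier are percent-encoded by urlencode, which is why the identifier is passed as a query parameter rather than concatenated into the URL by hand.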

Search Info

Citation: n.a. 2013. ELRA (European Language Resources Association).
Terms: area_Europe country_FR dcmi_Text iso639_fra olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0073
Up-to-date as of: Sun Nov 12 1:45:31 EST 2017