OLAC Record
oai:catalogue.elra.info:ELRA-W0019

Metadata
Title:Dutch PAROLE Distributable Corpus
Abstract:This Dutch corpus is a 3-million-word selection built according to the specifications of the PAROLE project. Over 250,000 words of corpus text (with TEI markup suppressed) have been PoS-tagged automatically. A total of 59,798 running words have been manually corrected and checked.
Access Rights:Rights available for: Research Use, Commercial Use
Date Available (W3CDTF):1999-07-12
Date Issued (W3CDTF):2004-09-14
Date Modified (W3CDTF):2004-05-12
Description:Written Corpora
The Dutch PAROLE Distributable Corpus is a 3-million-word selection from the 20-million-word Dutch PAROLE Reference corpus. The corpus was annotated and checked according to the common core PAROLE tagset. The Dutch data were also checked for type.

The Dutch PAROLE Distributable Corpus contains the following texts:

BOOKS:
- Van Sterkenburg: Wdlijst tot wdboek, 1984, 65,344 words
- Taal vt Journaal, 1989, 56,215 words
- WNT-portret, 1992, 60,133 words

NEWSPAPERS: Short Newspaper texts:
- MN_Collection, 1986-1988, 19,537 words
- CVNP(S)-Collection, 1983-1990, 179,220 words

PERIODICAL: Short texts from:
- Local Papers, 1985-1988, 47,019 words
- Magazines, 1985-1989, 164,589 words

MISCELLANEOUS:
Texts to be read out in TV-news broadcasts for:
- General audience, 1992-1995, 1,285,824 words
- Youth, 1991-1995, 1,008,658 words
Short texts from Ephemera, 1985-1986, 131,692 words

TOTAL: 3,018,231 words

Over 250,000 words of corpus text have been PoS-tagged automatically. A total of 59,798 running words have been manually corrected and checked at least twice with respect to maximal granularity, according to a lexicographer's manual. The extra 9,000 words over the required 50,000 words compensate for the occurrence of ca. 5,300 "keywords" in the original texts. The fully corrected material has been subjected to an automated post-control operation that checks the pertinence relations between the various feature values and instantiates default values wherever a mismatch (indicating a correction error) was found.

Ca. 200,000 words have been checked once for PoS and type. Type was checked in addition to the required PoS for reasons of quality. This material has been subjected to an automated correction procedure addressing the feature slots (positions) beyond the first two, which hold PoS and type, so as to resolve discrepancies between the manually corrected PoS and type and the possibly erroneous, automatically assigned values of the remaining slots.
More info on the Parole project: http://www.elda.org/catalogue/fr/text/doc/parole.html
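
The component word counts listed in the description can be checked against the stated total with a few lines of Python. This is only an illustrative sketch: the figures are copied from this record, while the dictionary keys and the names `COMPONENT_WORDS`, `STATED_TOTAL`, and `total_words` are made up for the example and are not part of the corpus distribution.

```python
# Word counts per component, copied from the record's description.
# The labels are informal keys chosen for this example only.
COMPONENT_WORDS = {
    "Books: Wdlijst tot wdboek (1984)": 65_344,
    "Books: Taal vt Journaal (1989)": 56_215,
    "Books: WNT-portret (1992)": 60_133,
    "Newspapers: MN_Collection (1986-1988)": 19_537,
    "Newspapers: CVNP(S)-Collection (1983-1990)": 179_220,
    "Periodical: Local Papers (1985-1988)": 47_019,
    "Periodical: Magazines (1985-1989)": 164_589,
    "Misc: TV news, general audience (1992-1995)": 1_285_824,
    "Misc: TV news, youth (1991-1995)": 1_008_658,
    "Misc: Ephemera (1985-1986)": 131_692,
}

STATED_TOTAL = 3_018_231  # "TOTAL" as given in the record


def total_words(counts):
    """Sum the per-component word counts."""
    return sum(counts.values())


computed = total_words(COMPONENT_WORDS)
print(f"computed total: {computed:,}")
print(f"matches stated total: {computed == STATED_TOTAL}")

# Manually corrected words beyond the required 50,000.
surplus = 59_798 - 50_000
print(f"manually corrected surplus: {surplus:,}")
```

The component counts add up to the stated total of 3,018,231 words. The exact surplus of manually corrected words is 9,798, which the description gives in rounded form.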
Identifier:ELRA-W0019
http://catalog.elra.info/product_info.php?products_id=543
Language:Dutch, Flemish
Language (ISO639):nld
Medium:CD-ROM
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0019
DateStamp:  1999-07-12
GetRecord:  OAI-PMH request for simple DC format
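
For readers unfamiliar with OAI-PMH, a GetRecord request is an HTTP query carrying a `verb`, an `identifier`, and a `metadataPrefix` argument. The sketch below shows how such a URL might be assembled for this record. The base URL is a hypothetical placeholder, since the archive's actual endpoint is not stated in this record, and `get_record_url` is a helper invented for the example. The prefixes `oai_dc` (simple Dublin Core) and `olac` are assumed to correspond to the two formats named above.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real OAI-PMH endpoint is not given in this record.
BASE_URL = "https://example.org/oai"

# Identifier as listed under "OaiIdentifier" in this record.
OAI_IDENTIFIER = "oai:catalogue.elra.info:ELRA-W0019"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL (illustrative helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Assumed prefixes: "oai_dc" for simple DC, "olac" for the OLAC format.
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")
olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")
print(dc_url)
print(olac_url)
```

Only the `metadataPrefix` value differs between the two requests; the identifier stays the same.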

Search Info

Citation: n.a. 2004. ELRA (European Language Resources Association).
Terms: area_Europe country_NL dcmi_Text iso639_nld olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0019
Up-to-date as of: Fri Jun 23 1:04:09 EDT 2017