OLAC Record: NPChunks

OLAC Record
oai:catalogue.elra.info:ELRA-W0089

Metadata

Title: NPChunks

Access Rights: Rights available for: nonCommercialUse, commercialUse

Date Available (W3CDTF): 2016-01-20

Date Issued (W3CDTF): 2016-01-20

Date Modified (W3CDTF): 2016-01-20

Description: NPChunks is a training corpus of approximately 1,000 sentences (24,243 tokens in total) selected at random from the written part of the CINTIL corpus. For more information on the CINTIL corpus, see ELRA-W0050, ISLRN: 176-775-844-396-0. The corpus is PoS-annotated at token level, including punctuation. Noun phrases were recognized and annotated with specific tags. The corpus was automatically PoS-tagged with the MBT tagger (http://ilk.uvt.nl/mbt/) and lemmatized with MBLEM (http://ilk.uvt.nl/mbma/), following the annotation scheme of the Corpus of Reference of Contemporary Portuguese. YamCha (http://chasen.org/~taku/software/yamcha/) was used to recognize noun-phrase chunks and to identify the elements appearing at the beginning, in the middle, and at the end of a noun phrase.

Identifier: ELRA-W0089

ISLRN: 412-883-442-173-8

Identifier (URI): https://catalog.elra.info/en-us/repository/browse/ELRA-W0089/

Language: Portuguese

Language (ISO639): por

Medium: downloadable

Publisher: ELRA (European Language Resources Association)

Type (DCMI): Text

Type (OLAC): primary_text
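
The Description field says that chunk annotation marks the elements at the beginning, in the middle, and at the end of each noun phrase. The record does not give the actual tag names or file layout, so the following is only a minimal sketch of how such position tags could be decoded into phrases. The tag strings "B-NP", "I-NP", "E-NP" and "O", the function name, and the sample sentence are all assumptions made for illustration, not the corpus's documented format.

```python
from typing import List, Tuple

# Hypothetical tag names; the record does not state the real ones.
BEGIN, MIDDLE, END = "B-NP", "I-NP", "E-NP"


def extract_noun_phrases(tagged: List[Tuple[str, str]]) -> List[List[str]]:
    """Group (token, tag) pairs into noun phrases using position tags."""
    phrases: List[List[str]] = []
    current: List[str] = []
    for token, tag in tagged:
        if tag == BEGIN:
            # A new phrase starts; flush any phrase still open.
            if current:
                phrases.append(current)
            current = [token]
        elif tag in (MIDDLE, END) and current:
            current.append(token)
            if tag == END:
                # The end tag closes the phrase.
                phrases.append(current)
                current = []
        else:
            # Token outside a noun phrase (or a stray middle/end tag):
            # close whatever phrase is open.
            if current:
                phrases.append(current)
                current = []
    if current:
        phrases.append(current)
    return phrases


# Invented sample sentence, not taken from the corpus.
sample = [
    ("O", "B-NP"), ("governo", "E-NP"), ("aprovou", "O"),
    ("a", "B-NP"), ("nova", "I-NP"), ("lei", "E-NP"), (".", "O"),
]
print(extract_noun_phrases(sample))
```

A phrase tagged only with the begin tag is treated here as a single-token noun phrase, which is a design choice of this sketch rather than something the record specifies.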

OLAC Info

Archive: ELRA Catalogue of Language Resources

Description: http://www.language-archives.org/archive/catalogue.elra.info

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:catalogue.elra.info:ELRA-W0089

DateStamp: 2016-01-20

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: n.a. 2016. ELRA (European Language Resources Association).
Terms: area_Europe country_PT dcmi_Text iso639_por olac_primary_text

http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0089
Up-to-date as of: Wed Jul 15 7:05:09 EDT 2026
