OLAC Record
oai:catalogue.elra.info:ELRA-W0061

Metadata
Title:CINTIL-DependencyBank
Abstract:The CINTIL-DependencyBank (Silva and Branco, 2012) is a corpus of sentences annotated with their syntactic dependency graphs and grammatical function tags composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 sentences; 3,082 tokens). In addition, there are 779 sentences (5,654 tokens) that are used for regression testing of the computational grammar that supported the annotation of the corpus.
Access Rights:Rights available for: Commercial Use, Research Use
Date Available (W3CDTF):2012-12-05
Date Issued (W3CDTF):2012-12-05
Date Modified (W3CDTF):2012-12-05
Description:Written Corpora
The CINTIL-DependencyBank (Silva and Branco, 2012) is a corpus of sentences annotated with their syntactic dependency graphs and grammatical function tags composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 sentences; 3,082 tokens). In addition, there are 779 sentences (5,654 tokens) that are used for regression testing of the computational grammar that supported the annotation of the corpus. For the creation of this TreeBank we adopted a semi-automatic analysis with a double-blind annotation followed by adjudication. The resulting dataset contains two information levels: morpho-syntactic information and grammatical functions. The main motivation behind the creation of this resource was to build a high quality data set with syntactic information that could support the development of a large set of automatic resources and tools for Portuguese for language technology research and development. For more information see also: Silva, João, and Branco, António, 2012, "Deep, consistent and also useful: Extracting vistas from deep corpora for shallower tasks", In Proceedings, Workshop on Advanced Treebanking, LREC2012 - The 8th International Conference on Language Resources and Evaluation, Istanbul, Turkey, May 21-27, 2012, pp.45-52.
Identifier:ELRA-W0061
http://catalog.elra.info/product_info.php?products_id=1180
Language:Portuguese
Language (ISO639):por
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0061
DateStamp:  2012-12-05
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: n.a. 2012. ELRA (European Language Resources Association).
Terms: area_Europe country_PT dcmi_Text iso639_por olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0061
Up-to-date as of: Wed Mar 29 3:50:55 EDT 2017