OLAC Record
oai:catalogue.elra.info:ELRA-W0040

Metadata
Title:Venice Italian Treebank (VIT)
Abstract:The Venice Italian Treebank (VIT) contains about 272,000 words distributed over six different domains: bureaucratic, political, economic and financial, literary, scientific, and news. In addition, some 60,000 tokens of spoken dialogues in different Italian varieties were annotated. The annotation follows general X-bar criteria with 29 constituency labels and 102 PoS tags. VIT is also made available in a broad annotation version with 10 constituency labels and 22 PoS tags for machine learning purposes. The format is plain text with square bracketing. A UPenn-style version, readable by the open-source query tool CorpusSearch, is also provided.
Access Rights:Rights available for: Research Use, Commercial Use
Date Available (W3CDTF):2006-02-13
Date Created (W3CDTF):2005-06-01
Date Issued (W3CDTF):2006-02-10
Date Modified (W3CDTF):2014-10-23
Description:Written Corpora
The Venice Italian Treebank (VIT) is the result of a collaboration among people working at the Laboratory of Computational Linguistics (LCL) of the University of Venice in the years 1995-2005. It is partly the result of annotation carried out internally with no specific project in mind and no financial support. This work was partly related to the development of a lexicon, a morphological analyzer, a tagger, and a deep parser of Italian. All these resources were finally ready at the beginning of the '90s, when the LCL got involved in the first national projects.

The VIT contains about 272,000 words distributed over six different domains, which is what makes it so relevant for the study of the structure of the Italian language. The following domains were annotated:

Domain                  Number of words   Time span
Bureaucratic            20,000            1986
Politics                40,000            1984
Economic & financial    12,000            1987
Literary                10,000            1984
Scientific              20,000            1985
News                    170,000           1994

In addition, some 60,000 tokens of spoken dialogues in different Italian varieties were annotated. The annotation follows general X-bar criteria with 29 constituency labels and 102 PoS tags. VIT is also made available in a broad annotation version with 10 constituency labels and 22 PoS tags for machine learning purposes. The format is plain text with square bracketing. A UPenn-style version, readable by the open-source query tool CorpusSearch, is also provided.
Identifier:ELRA-W0040
http://catalog.elra.info/product_info.php?products_id=831
Language:Italian
Language (ISO639):ita
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0040
DateStamp:  2006-02-13
GetRecord:  OAI-PMH request for simple DC format
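
The GetRecord entries in this record refer to OAI-PMH requests. As a rough sketch of how such a request URL is put together, the snippet below builds one for this record's identifier in both metadata formats. The query parameters (verb, identifier, metadataPrefix) are the standard OAI-PMH ones. The base URL is a hypothetical placeholder, since the actual endpoint is not given in this record, and the prefixes "olac" and "oai_dc" are assumed to be the ones the archive exposes.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real OAI-PMH endpoint is not stated in this record.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:catalogue.elra.info:ELRA-W0040"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Assumed prefixes: "olac" for the OLAC format, "oai_dc" for simple Dublin Core.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")

print(olac_url)
print(dc_url)
```

The colons in the identifier are percent-encoded by urlencode, so the identifier survives as a single query parameter.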

Search Info

Citation: n.a. 2006. ELRA (European Language Resources Association).
Terms: area_Europe country_IT dcmi_Text iso639_ita olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0040
Up-to-date as of: Mon Feb 27 0:30:50 EST 2017