OLAC Record
oai:catalogue.elra.info:ELRA-S0059

Metadata
Title:ILE: Italian LExicon
Abstract:ILE is a 588,000-entry Italian lexicon transcribed with SAMPA notation. The morpho-lexicon was obtained by processing an Italian dictionary and adding by hand all possible inflections. The base lexicon is enriched with names and neologisms found in the 65,000 most frequent words of the newspaper "Il Sole 24 Ore", together with the most frequent Italian proper names and surnames (from the telephone directory), geographical names, acronyms, company names, and commonly used foreign words. A total of about 601,000 different transcriptions are provided for the 588,000-word lexicon.
Access Rights:Rights available for: Research Use, Commercial Use
Date Available (W3CDTF):1998-11-23
Date Issued (W3CDTF):2004-09-14
Date Modified (W3CDTF):2005-05-02
Description:Speech Related
ILE is a 588,000-entry Italian lexicon transcribed with SAMPA notation. It was generated, mainly for speech recognition purposes, by means of a morphological analyzer handling more than 100,000 morphemes, each of them transcribed and manually checked. Each stem was combined with all its possible suffixes to form valid words. Verbal forms do not include clitics.
The morpho-lexicon was obtained by processing an Italian dictionary and adding by hand all possible inflections. This base lexicon was then enriched with names and neologisms found in the 65,000 most frequent words of the newspaper "Il Sole 24 Ore". The most frequent Italian proper names and surnames (from the telephone directory), geographical names, acronyms, company names, and commonly used foreign words were also added to the lexicon.
All words are transcribed using SAMPA units for the Italian language. Where a word has multiple pronunciations, one row is provided for each different transcription (a total of about 601,000 different transcriptions are provided for the 588,000-word lexicon). Stressed vowels are marked with a preceding ASCII double quote ("). Foreign words are also transcribed using only SAMPA units for the Italian language, which leads to some awkward but effective transcriptions, at least for speech recognition purposes.
Some samples of ILE follow:
    ANCORA         "a n k o r a
    ANCORA         a n k "o r a
    CESSARE        tS e ss "a r e
    CESSEREBBERO   tS e ss e r "E bb e r o
    CITTA`         tS i tt "a
    AIDS           "a i d s
    AIDS           a i d i "E ss e
    BABY-SITTER    b E b i s "i tt e r
    BABY-SITTER    b e i b i s "i tt e r
    BLUE-JEANS     b l u dZ "i n s
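The row format visible in the samples (a word followed by space-separated SAMPA units, one row per pronunciation, with a double quote prefixed to the stressed vowel) can be read with a few lines of Python. This is a minimal sketch, not the distributed tooling: the exact file layout and character encoding on the CD-ROM are not stated in this record, so the whitespace-separated layout is an assumption inferred from the samples, and the names `parse_row`, `load_lexicon` and `stressed_index` are hypothetical.

```python
from collections import defaultdict

STRESS_MARK = '"'  # ASCII double quote, prefixed to the stressed vowel


def parse_row(row):
    """Split one row into (word, list of SAMPA units).

    Assumes the word contains no whitespace (e.g. BABY-SITTER is hyphenated).
    """
    word, *units = row.split()
    return word, units


def load_lexicon(rows):
    """Group transcriptions by word; a word may appear on several rows."""
    lexicon = defaultdict(list)
    for row in rows:
        if row.strip():  # skip blank lines
            word, units = parse_row(row)
            lexicon[word].append(units)
    return dict(lexicon)


def stressed_index(units):
    """Return the position of the stressed unit, or None if none is marked."""
    for i, unit in enumerate(units):
        if unit.startswith(STRESS_MARK):
            return i
    return None


# Rows copied from the samples in this record.
SAMPLE_ROWS = [
    'ANCORA "a n k o r a',
    'ANCORA a n k "o r a',
    'CESSARE tS e ss "a r e',
    'AIDS "a i d s',
    'AIDS a i d i "E ss e',
]

lexicon = load_lexicon(SAMPLE_ROWS)
for word, variants in lexicon.items():
    for units in variants:
        print(word, units, stressed_index(units))
```

Keeping a list of transcriptions per word preserves the multiple-pronunciation rows (for example the two stress patterns of ANCORA) instead of overwriting earlier ones.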
Identifier:ELRA-S0059
http://catalog.elra.info/product_info.php?products_id=529
Language:Italian
Language (ISO639):ita
Medium:CD-ROM
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Sound
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-S0059
DateStamp:  1998-11-23
GetRecord:  OAI-PMH request for simple DC format
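An OAI-PMH GetRecord request is an HTTP request carrying the `verb`, `identifier` and `metadataPrefix` arguments defined by the protocol. The sketch below only builds such a URL for this record. The repository base URL is not given in this record, so `BASE_URL` is a hypothetical placeholder, and `get_record_url` is an illustrative helper name. The prefix `oai_dc` selects simple Dublin Core; OLAC repositories conventionally use the prefix `olac` for the OLAC format, which is assumed rather than confirmed here.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real base URL is not stated in this record.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


url = get_record_url("oai:catalogue.elra.info:ELRA-S0059", "oai_dc")
print(url)
```

`urlencode` percent-encodes the colons in the identifier, so the identifier can be passed exactly as it appears in the OaiIdentifier field.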

Search Info

Citation: n.a. 2004. ELRA (European Language Resources Association).
Terms: area_Europe country_IT dcmi_Sound iso639_ita olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-S0059
Up-to-date as of: Tue Sep 5 1:32:59 EDT 2017