OLAC Record
oai:catalogue.elra.info:ELRA-E0035

Metadata
Title:DEFT'08 Evaluation Package
Abstract:DEFT (DEfi Fouille de Texte, Text Mining Challenge) organizes evaluation campaigns in the field of text mining. The DEFT 2008 edition addresses the classification of texts by topic and genre. The DEFT'08 Evaluation Package makes it possible to compare two corpora of different genres (a corpus of newspaper articles extracted from the newspaper Le Monde and a corpus of encyclopaedic articles extracted from the free online encyclopaedia Wikipedia) on the basis of the same set of pre-defined categories.
Access Rights:Rights available for: Evaluation Use
Date Available (W3CDTF):2012-03-28
Date Issued (W3CDTF):2010-10-22
Date Modified (W3CDTF):2012-03-28
Description:Written Corpora
DEFT (DEfi Fouille de Texte, Text Mining Challenge) organizes evaluation campaigns in the field of text mining. The DEFT 2008 edition addresses the classification of texts by topic and genre. Automatic classification has many applications in text mining, and many application fields have been explored, from email routing to strategic or scientific monitoring. In recent years, a new line of research on text genre classification has emerged. Beyond recognizing a document's topic, recognizing its genre informs how the document will be used. Questions that can be raised are: How can we recognize both document topic and genre? Can a difference in genre influence the recognition of a document's topical category, and conversely, can a difference in topic influence the recognition of a document's genre? To evaluate classification software from that perspective, the DEFT'08 Evaluation Package makes it possible to compare two corpora of different genres (a corpus of newspaper articles extracted from the newspaper Le Monde and a corpus of encyclopaedic articles extracted from the free online encyclopaedia Wikipedia) on the basis of the same set of pre-defined categories. Although a newspaper article highlights news whereas an encyclopaedic article disseminates knowledge, both share a number of general topical categories, called "column" for the former and "category" for the latter. The task consists of testing on those corpora, on the one hand, the robustness of a topical classification model subjected to variations in text genre and, on the other hand, possible improvements to topical classification through the recognition of text genre.
Identifier:ELRA-E0035
http://catalog.elra.info/product_info.php?products_id=1165
Language:French
Language (ISO639):fra
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-E0035
DateStamp:  2012-03-28
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: n.a. 2010. ELRA (European Language Resources Association).
Terms: area_Europe country_FR dcmi_Text iso639_fra olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-E0035
Up-to-date as of: Fri May 5 1:20:21 EDT 2017