OLAC Record
oai:scholarspace.manoa.hawaii.edu:10125/74645

Metadata
Title:Collecting and annotating corpora for three under-resourced languages of France: Methodological issues
Bibliographic Citation:Bernhard, Delphine, Ligozat, Anne-Laure, Bras, Myriam, Martin, Fanny, Vergez-Couret, Marianne, Erhart, Pascale, Sibille, Jean, Todirascu, Amalia, Boula de Mareüil, Philippe, Huck, Dominique; 2021-06; Kaipuleohone University of Hawai'i Digital Language Archive; http://hdl.handle.net/10125/74645.
Creator:Bernhard, Delphine
Ligozat, Anne-Laure
Bras, Myriam
Martin, Fanny
Vergez-Couret, Marianne
Erhart, Pascale
Sibille, Jean
Todirascu, Amalia
Boula de Mareüil, Philippe
Huck, Dominique
Date (W3CDTF):2021-06
Description:In contrast to French, the vast majority of regional languages of France can be considered as under-resourced. In this article, we present the results of a research project aiming to produce annotated resources for three regional languages of France: Alsatian, Occitan, and Picard. These languages cover three different language families (Germanic and two subfamilies of Romance, Oïl and Oc languages) and different sociolinguistic situations. Yet, they all face issues common to many under-resourced languages: lack of human and financial resources and presence of geolinguistic variation. The originality of this project is that it brought together researchers from different fields (sociolinguistics, descriptive linguistics, dialectology, natural language processing, digital humanities) to work together towards the common goal of developing annotated corpora for Alsatian, Occitan, and Picard. This created a favorable and stimulating working environment which could not have been achieved had different research groups worked independently, each on a single language. This article details the annotation process, with a special focus on the delimitation of the tokens and the definition of the part-of-speech tags.
National Foreign Language Resource Center
Format:42 pages
Identifier:Bernhard, Delphine, Anne-Laure Ligozat, Myriam Bras, Fanny Martin, Marianne Vergez-Couret, Pascale Erhart, Jean Sibille, Amalia Todirascu, Philippe Boula de Mareüil, & Dominique Huck. 2021. Collecting and annotating corpora for three under-resourced languages of France: Methodological issues. Language Documentation & Conservation 15: 316-357. http://hdl.handle.net/10125/74645.
1934-5275
Identifier (URI):http://hdl.handle.net/10125/74645
Publisher:University of Hawaii Press
Rights:Creative Commons Attribution-NonCommercial 4.0 International
Attribution-NonCommercial 3.0 United States
http://creativecommons.org/licenses/by-nc/3.0/us/
Subject:corpus
annotations
tokenization
part-of-speech
Alsatian
Occitan
Picard
Table Of Contents:bernhard_et_al.pdf
Type:Article
Type (DCMI):Text

OLAC Info

Archive:  Language Documentation and Conservation
Description:  http://www.language-archives.org/archive/ldc.scholarspace.manoa.hawaii.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:scholarspace.manoa.hawaii.edu:10125/74645
DateStamp:  2021-06-23
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Bernhard, Delphine; Ligozat, Anne-Laure; Bras, Myriam; Martin, Fanny; Vergez-Couret, Marianne; Erhart, Pascale; Sibille, Jean; Todirascu, Amalia; Boula de Mareüil, Philippe; Huck, Dominique. 2021. University of Hawaii Press.
Terms: dcmi_Text
http://www.language-archives.org/item.php/oai:scholarspace.manoa.hawaii.edu:10125/74645
Up-to-date as of: Sun Oct 29 7:26:32 EDT 2023