OLAC Record: Structured, tokenized and tagged data from Infral's blogs

OLAC Record
oai:mulce.org:mce-infral-tagged_blogs

Metadata

Title: Structured, tokenized and tagged data from Infral's blogs

Access Rights: open access after registration

Audience: Researchers or teachers in educational sciences or linguistics

Bibliographic Citation: Laurent, M. (2011). Structured, tokenized and tagged data from Infral's blogs. LETEC Corpus from INFRAL collection, Chanier, T. (editor). Mulce.org : Clermont Université. [oai : mulce.org:mce-infral-tagged_blogs ; http://repository.mulce.org ]

Conforms To: IMS-CP for packaging

Contributor (author): Laurent Mario

Contributor (compiler): Laurent Mario

Chanier Thierry

Contributor (data_inputter): Laurent Mario

Contributor (depositor): Chanier Thierry

Contributor (editor): Chanier Thierry

Contributor (researcher): Laurent Mario ; Chanier Thierry

Creator: Laurent Mario ; Chanier Thierry

Creator (URI): Mario Laurent

Date Created (W3CDTF): 2011-12-02

Description: This corpus is based on data extracted from the global Learning & Teaching Corpus Infral, archived in the Mulce data repository: http://repository.mulce.org. It was created by Mario Laurent as part of his Master's project, carried out at the Laboratoire de Recherche sur le Langage, Université Blaise Pascal, Clermont-Ferrand.

Structuring language interactions into exploitable corpora is necessary to analyze the data from the Infral project. To understand the development of intercultural competences, we have to quantify aspects of each participant's production, such as language use and lexical diversity. To achieve this, we used the Python programming language and the NLTK library. During the Infral course, participants from a French and a German university communicated in both languages via blogs. We developed a program that converts plain text from Infral's blogs into a structured XML file in which each message is tokenized into words. Each word is tagged according to its form and its original language.
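The conversion described above can be illustrated with a minimal Python sketch. It is not the program deposited with the corpus: it replaces NLTK with a standard-library regular-expression tokenizer so that it runs without extra downloads, and the element names (`message`, `w`), the attribute names (`form`, `lang`), the form categories and the tiny function-word lists are all hypothetical choices made for illustration only.

```python
import re
import xml.etree.ElementTree as ET

# Toy function-word lists: a hypothetical stand-in for real language identification.
FRENCH_HINTS = {"le", "la", "les", "je", "et", "est", "un", "une", "des", "nous"}
GERMAN_HINTS = {"der", "die", "das", "ich", "und", "ist", "ein", "eine", "wir", "nicht"}

# A run of word characters, or a single punctuation character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tag_form(token):
    """Classify a token by its surface form (categories are illustrative)."""
    if token.isdigit():
        return "number"
    if token.isalpha():
        return "word"
    if not any(ch.isalnum() or ch == "_" for ch in token):
        return "punct"
    return "other"


def tag_language(token):
    """Guess the language of a token from the toy function-word lists."""
    lowered = token.lower()
    if lowered in FRENCH_HINTS:
        return "fra"
    if lowered in GERMAN_HINTS:
        return "deu"
    return "unknown"


def message_to_xml(author, text):
    """Turn one plain-text blog message into a <message> element of tagged <w> tokens."""
    message = ET.Element("message", {"author": author})
    for token in TOKEN_RE.findall(text):
        word = ET.SubElement(
            message, "w", {"form": tag_form(token), "lang": tag_language(token)}
        )
        word.text = token
    return message


def type_token_ratio(message):
    """Lexical diversity of a message: distinct word forms divided by total words."""
    words = [w.text.lower() for w in message.iter("w") if w.get("form") == "word"]
    return len(set(words)) / len(words) if words else 0.0


if __name__ == "__main__":
    msg = message_to_xml("student_01", "Ich bin hier, et nous sommes 2!")
    print(ET.tostring(msg, encoding="unicode"))
    print(type_token_ratio(msg))
```

Storing the form and language as attributes on each token keeps the original text recoverable from the element content, while measures such as the type-token ratio can be computed per message or per participant by filtering on those attributes.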

Extent: 28,500 KB

Format (IMT): text/xml

application/pdf

Identifier: mce-infral-tagged_blogs

Identifier (URI): http://mulce.univ-bpclermont.fr:8080/PlateFormeMulce/VIEW/PUBLIC/03/VMeta.do?adr=Infral%2FCorpus_objets%2Fmce-infral-tagged_blogs

Language: French

German

Language (ISO639): fra

deu

Publisher: Mulce (MULtimodal Corpus Exchange) ; Universite Blaise Pascal ; Clermont-Ferrand:France ; URL:http://mulce.org

References: Abendroth-Timmer, D., Bechtel, M., Chanier, T. & Ciekanski, M. (2009) "From developing to investigating intercultural competence in practice through oral and written interactions in online exchanges", Kongress für Fremdsprachendidaktik der Deutschen Gesellschaft für Fremdsprachenforschung (DGFF-Tagung), Universität Leipzig, October 2009. [http://edutice.archives-ouvertes.fr/edutice-00548891/ ]

Abendroth-Timmer, D., Chanier, T., Ciekanski, M., Bechtel, M. & Henning, E.-V. (2010) "Du développement à l’investigation de la compétence interculturelle en pratique à partir des interactions à l’oral et à l’écrit dans des échanges en ligne à distance." Colloque "Plurilingualism and Pluriculturalism in a Globalised World: which Pedagogy?" (PLIDAM), 17-19 June, Paris.

Laurent, M. (2011). Structuration des données des blogues de la formation Infral à l’aide des outils de programmation Python et NLTK. Report of Master 2 Sciences du Langage, Université Blaise Pascal.

Chanier, T. & Ciekanski, M. (2010). Utilité du partage des corpus pour l'analyse des interactions en ligne en situation d'apprentissage : un exemple d'approche méthodologique autour d'une base de corpus d'apprentissage. ALSIC - Apprentissage des Langues et Systèmes d'Information et de Communication 13 [http://edutice.archives-ouvertes.fr/edutice-00486676/ ]

Reffay, C., Chanier, T., Noras, M. & Betbeder, M.-L. (2008). Contribution à la structuration de corpus d'apprentissage pour un meilleur partage en recherche. In Basque, J. & Reffay, C. (dir.), numéro spécial EPAL (échanger pour apprendre en ligne), Sciences et Technologies de l'Information et de la Communication pour l'Education et la Formation (STICEF), 15. [http://sticef.univ-lemans.fr/num/vol2008/01-reffay/sticef_2008_reffay_01p.pdf , http://edutice.archives-ouvertes.fr/edutice-00159733 ]

References (URI): http://edutice.archives-ouvertes.fr/edutice-00548891/

http://edutice.archives-ouvertes.fr/edutice-00486676/

http://sticef.univ-lemans.fr/num/vol2008/01-reffay/sticef_2008_reffay_01p.pdf

Requires: mce-infral-letec-all

Rights: Rights holders of this corpus are: Thierry Chanier ; Dagmar Abendroth-Timmer; Maud Ciekanski ; Mark Bechtel ; Laurent Mario ; licence = http://creativecommons.org/licenses/by-nc-sa/2.0/

Rights (URI): http://lrl-diffusion.univ-bpclermont.fr/mulce/metadata/vdex/mce_licence.xml

Spatial Coverage (ISO3166): DE

FR

Spatial Coverage (TGN): 7005286

7008356

Subject: NLP; XML; telecollaboration ; intercultural; online teaching

French language

Subject (ISO639): fra

Subject (LCSH): Education

Data processing

Computer-assisted instruction

Language and languages

Study and teaching

Subject (OLAC): applied_linguistics

discourse_analysis

text_and_corpus_linguistics

Temporal Coverage: name=Infral course ; start=2008-09-29; end=2009-01-09

name=Master Project ; start=2011-03-01; end=2011-06-30

Type (DCMI): Dataset

Collection

Type (Discourse): dialogue

narrative

Type (OLAC): primary_text

OLAC Info

Archive: Multimodal Learning and teaching Corpora Exchange

Description: http://www.language-archives.org/archive/mulce.org

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:mulce.org:mce-infral-tagged_blogs

DateStamp: 2012-09-05

GetRecord: OAI-PMH request for simple DC format
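The GetRecord entries above refer to standard OAI-PMH requests, whose query string takes a `verb`, an `identifier` and a `metadataPrefix`. The following Python sketch builds such a request URL for this record. The base URL is a hypothetical placeholder, since the archive's actual OAI-PMH endpoint is not given in this record, and using `olac` as the prefix for the OLAC-format request is likewise an assumption; `oai_dc` is the prefix the protocol defines for simple Dublin Core.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real OAI-PMH base URL is not stated in this record.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier, metadata_prefix="oai_dc", base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record identifier."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


if __name__ == "__main__":
    record_id = "oai:mulce.org:mce-infral-tagged_blogs"
    print(get_record_url(record_id))          # simple DC format
    print(get_record_url(record_id, "olac"))  # OLAC format (assumed prefix)
```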

Search Info
Citation: Mario Laurent; Laurent Mario ; Chanier Thierry. 2011. Mulce (MULtimodal Corpus Exchange) ; Universite Blaise Pascal ; Clermont-Ferrand:France ; URL:http://mulce.org.
Terms: area_Europe country_DE country_FR dcmi_Collection dcmi_Dataset iso639_deu iso639_fra olac_applied_linguistics olac_dialogue olac_discourse_analysis olac_narrative olac_primary_text olac_text_and_corpus_linguistics

Inferred Metadata
Country: France
Area: Europe

http://www.language-archives.org/item.php/oai:mulce.org:mce-infral-tagged_blogs
Up-to-date as of: Fri Dec 28 1:37:10 EST 2018
