Title:Structured, tokenized and tagged data from Infral's blogs
Access Rights:open access after registration
Audience:Researchers or teachers in educational sciences or linguistics
Bibliographic Citation:Laurent, M. (2011). Structured, tokenized and tagged data from Infral's blogs. LETEC Corpus from INFRAL collection, Chanier, T. (editor). Mulce.org : Clermont Université. [oai : mulce.org:mce-infral-tagged_blogs ; http://repository.mulce.org ]
Conforms To:IMS-CP for packaging
Contributor (author):Laurent Mario
Contributor (compiler):Laurent Mario
Chanier Thierry
Contributor (data_inputter):Laurent Mario
Contributor (depositor):Chanier Thierry
Contributor (editor):Chanier Thierry
Contributor (researcher):Laurent Mario ; Chanier Thierry
Creator: Laurent Mario ; Chanier Thierry
Creator (URI):Mario Laurent
Date Created (W3CDTF):2011-12-02
Description:This corpus is based on data extracted from the global Learning & Teaching Corpus Infral, archived in the Mulce data repository: http://repository.mulce.org. It was created by Mario Laurent as part of his Master's project, carried out at the Laboratoire de Recherche sur le Langage, Université Blaise Pascal, Clermont-Ferrand.
Structuring language interactions into exploitable corpora is necessary to analyze the data from the Infral project. To understand the development of intercultural competences, we have to quantify aspects of each participant's production, such as language use or lexical diversity. To achieve this, we used the Python programming language and the NLTK library. During the Infral course, participants from a French and a German university communicated in both languages via blogs. We developed a program that converts the plain text of Infral's blogs into a structured XML file in which each message is tokenized into words. Each word is tagged according to its form and its original language.
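The processing described above can be illustrated with a minimal sketch. It is not the program used to build this corpus: it assumes a regex tokenizer from the Python standard library as a stand-in for an NLTK tokenizer, and the element and attribute names (message, w, form, lang), the two small function-word lists and the type-token ratio used as a lexical diversity measure are all hypothetical choices made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, tiny function-word lists; a real tagger needs much larger lexicons.
FRENCH_HINTS = {"le", "la", "les", "je", "et", "est", "un", "une", "de", "pour"}
GERMAN_HINTS = {"der", "die", "das", "ich", "und", "ist", "ein", "eine", "nicht", "für"}

# A run of word characters, or a single non-space, non-word character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens (stand-in for an NLTK tokenizer)."""
    return TOKEN_RE.findall(text)


def form_of(token):
    """Classify a token by its surface form."""
    if not re.match(r"\w", token):
        return "punctuation"
    if token.isalpha():
        return "word"
    if token.isdigit():
        return "number"
    return "other"


def language_of(token):
    """Guess the language of a word from the hint lists (assumed heuristic)."""
    lowered = token.lower()
    if lowered in FRENCH_HINTS:
        return "fra"
    if lowered in GERMAN_HINTS:
        return "deu"
    return "unknown"


def message_to_xml(author, text):
    """Build a <message> element holding one <w> child per token."""
    message = ET.Element("message", author=author)
    for token in tokenize(text):
        form = form_of(token)
        word = ET.SubElement(message, "w", form=form)
        if form == "word":
            word.set("lang", language_of(token))
        word.text = token
    return message


def type_token_ratio(message):
    """Lexical diversity: distinct word forms divided by total words."""
    words = [w.text.lower() for w in message.iter("w") if w.get("form") == "word"]
    return len(set(words)) / len(words) if words else 0.0


if __name__ == "__main__":
    msg = message_to_xml("student01", "Ich bin da, et je suis la.")
    print(ET.tostring(msg, encoding="unicode"))
    print(type_token_ratio(msg))
```

Tagging each word at the token level, rather than each message, is what makes it possible to count language use per participant even when a single blog post mixes French and German.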
Extent:28 500 KB
Format (IMT):text/xml
Identifier (URI):http://mulce.univ-bpclermont.fr:8080/PlateFormeMulce/VIEW/PUBLIC/03/VMeta.do?adr=Infral%2FCorpus_objets%2Fmce-infral-tagged_blogs
Language (ISO639):fra
Publisher: Mulce (MULtimodal Corpus Exchange) ; Universite Blaise Pascal ; Clermont-Ferrand:France ; URL:http://mulce.org
References:Abendroth-Timmer, D., Bechtel, M., Chanier, T. & Ciekanski, M. (2009) "From developing to investigating intercultural competence in practice through oral and written interactions in online exchanges", Kongress für Fremdsprachendidaktik der Deutschen Gesellschaft für Fremdsprachenforschung (DGFF-Tagung), Universität Leipzig, October 2009. [http://edutice.archives-ouvertes.fr/edutice-00548891/ ]
Abendroth-Timmer, D., Chanier, T., Ciekanski, M., Bechtel, M. & Henning, E.-V. (2010) "Du développement à l’investigation de la compétence interculturelle en pratique à partir des interactions à l’oral et à l’écrit dans des échanges en ligne à distance." Colloque "Plurilingualism and Pluriculturalism in a Globalised World: which Pedagogy?" (PLIDAM), 17-19 June, Paris.
Laurent, M. (2011). Structuration des données des blogues de la formation Infral à l’aide des outils de programmation Python et NLTK. Report of Master 2 Sciences du Langage, Université Blaise Pascal.
Chanier, T. & Ciekanski, M. (2010). Utilité du partage des corpus pour l'analyse des interactions en ligne en situation d'apprentissage : un exemple d'approche méthodologique autour d'une base de corpus d'apprentissage. ALSIC - Apprentissage des Langues et Systèmes d'Information et de Communication 13 [http://edutice.archives-ouvertes.fr/edutice-00486676/ ]
Reffay, C., Chanier, T., Noras, M. & Betbeder, M.-L. (2008). Contribution à la structuration de corpus d'apprentissage pour un meilleur partage en recherche. In Basque, J. & Reffay, C. (dir.), numéro spécial EPAL (échanger pour apprendre en ligne), Sciences et Technologies de l'Information et de la Communication pour l'Education et la Formation (STICEF), 15. [http://sticef.univ-lemans.fr/num/vol2008/01-reffay/sticef_2008_reffay_01p.pdf , http://edutice.archives-ouvertes.fr/edutice-00159733 ]
References (URI):http://edutice.archives-ouvertes.fr/edutice-00548891/
Rights: Rights holders of this corpus are: Thierry Chanier ; Dagmar Abendroth-Timmer; Maud Ciekanski ; Mark Bechtel ; Laurent Mario ; licence = http://creativecommons.org/licenses/by-nc-sa/2.0/
Rights (URI):http://lrl-diffusion.univ-bpclermont.fr/mulce/metadata/vdex/mce_licence.xml
Spatial Coverage (ISO3166):DE
Spatial Coverage (TGN):7005286
Subject:NLP; XML; telecollaboration ; intercultural; online teaching
French language
Subject (ISO639):fra
Subject (LCSH):Education
Data processing
Computer-assisted instruction
Language and languages
Study and teaching
Subject (OLAC):applied_linguistics
Temporal Coverage:name=Infral course ; start=2008-09-29; end=2009-01-09
name=Master Project ; start=2011-03-01; end=2011-06-30
Type (DCMI):Dataset
Type (Discourse):dialogue
Type (OLAC):primary_text


Citation: Laurent Mario ; Chanier Thierry. 2011. Mulce (MULtimodal Corpus Exchange) ; Universite Blaise Pascal ; Clermont-Ferrand:France ; URL:http://mulce.org.
Country: France
Area: Europe

