OLAC Record: LEONIDE - Longitudinal Learner Corpus in Italiano, Deutsch and English 1.1

OLAC Record
oai:clarin.eurac.edu:20.500.12124/25

Metadata

Title: LEONIDE - Longitudinal Learner Corpus in Italiano, Deutsch and English 1.1

Bibliographic Citation: http://hdl.handle.net/20.500.12124/25

Creator: Glaznieks, Aivars

Frey, Jennifer-Carmen

Stopfner, Maria

Zanasi, Lorenzo

Nicolas, Lionel

Date (W3CDTF): 2020-07-06T10:24:27Z

Date Available: 2020-07-06T10:24:27Z

Description: LEONIDE is a longitudinal corpus of student essays documenting the language competences and writing development of lower secondary school students in three different languages. The corpus contains 2,512 texts from 163 pupils who participated in the project “One school, many languages”, conducted in eight schools in the officially multilingual Italian province of South Tyrol / Alto Adige (Zanasi & Stopfner, 2018). The aim of the project was to document the development of the pupils' plurilingual linguistic and communicative skills by collecting oral and written language samples in Italian, German and English, in order to obtain a global view of their individual linguistic repertoire. LEONIDE contains all the texts written by the participating students during the course of the project; the overall size of the corpus amounts to ca. 240,000 tokens.

The texts were collected over the span of three consecutive years (2015-2018) in public middle schools (i.e. lower secondary school, grade 6 to grade 8). The pupils were 11 years old at the beginning of the data collection and 13 years old at the end. In each grade, two written texts were collected that differ with respect to genre: the first was elicited using a picture story retelling task; the second is an opinion text on different aspects related to the pupils’ life and public discourse. For each genre and each grade, the corpus provides texts in the three languages German, Italian and English. In order to reflect the school system of the Province of South Tyrol / Alto Adige, about half of the texts were collected in four schools in which German is the main language of teaching and Italian is taught as L2. The other half were collected in four schools in which Italian is the main language of teaching and German is taught as L2. In all schools, English is taught as L3 (i.e. as a foreign language at school). Subdivided by language, the corpus contains 844 Italian, 833 German and 835 English texts.

Manual annotation: The corpus is fully anonymised and annotated with target hypotheses correcting orthography errors in the text, as well as annotations on structural elements (paragraphs, line breaks, bullet points, symbols or emoticons etc.), foreign word insertions and transcript surface features (e.g. deletions, corrections or insertions by the student, unreadable or ambiguous items).

Automatic annotation: Automatic linguistic annotation included sentence splitting, tokenisation, lemmatisation and part-of-speech tagging.

Text metadata: The corpus provides a series of relevant person-related metadata (e.g. age, gender, first language(s), school and possible special needs of the students) as well as task-related metadata (e.g. task year, text genre).

Usage: As the corpus documents the development of plurilingual competences of individual learners over a period of three years, it will allow both quantitative research on the characteristics of young learners’ language over a relatively long period and investigations of the development of individuals, taking into account a wide range of person-related metadata. In addition, it allows contrastive analyses of the young learners’ progress in their L1, L2 and L3.

Availability: The corpus will be available for corpus queries via an ANNIS search interface and as a download for academic purposes (ACA-BY-NC-NORED 1.0) on the Eurac Research CLARIN Centre by the end of 2020.

References: Zanasi, L. & Stopfner, M. (2018). Rilevare, osservare, consultare. Metodi e strumenti per l’analisi del plurilinguismo nella scuola secondaria di primo grado. In C. M. Coonan, A. Bier & E. Ballarin (Eds.), La didattica delle lingue nel nuovo millennio. Le sfide dell’internazionalizzazione (pp. 135-148). Edizioni Ca’ Foscari. http://doi.org/10.30687/978-88-6969-227-7/009

Glaznieks, A., Frey, J.-C., Stopfner, M., Zanasi, L. & Nicolas, L. (accepted). LEONIDE: A longitudinal trilingual corpus of young learners of Italian, German and English. In: International Journal of Learner Corpus Linguistics.
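The text counts given in the description can be cross-checked with a little arithmetic. The following is a minimal sketch in Python: the per-language counts, the number of pupils and the collection design (three grades, two genres, three languages) are taken from the description, while the "full design" ceiling and the coverage ratio are derived here under the assumption that every pupil could have contributed one text per grade, genre and language. The record itself does not state these derived figures.

```python
# Figures quoted in the record's description.
texts_by_language = {"Italian": 844, "German": 833, "English": 835}
reported_total = 2512
pupils = 163
grades, genres, languages = 3, 2, 3

# The per-language counts should add up to the reported total.
total_texts = sum(texts_by_language.values())

# Assumption (not stated in the record): a pupil who took part in every
# collection round would contribute one text per grade, genre and language.
texts_per_pupil = grades * genres * languages
full_design_total = pupils * texts_per_pupil

# Share of the hypothetical full design that the corpus actually contains.
coverage = total_texts / full_design_total

print(f"sum of per-language counts: {total_texts}")
print(f"texts per pupil under the full design: {texts_per_pupil}")
print(f"hypothetical maximum: {full_design_total}")
print(f"coverage of the full design: {coverage:.1%}")
```

The gap between the reported total and the hypothetical maximum would be consistent with some pupils missing individual collection rounds, but the record does not say so explicitly.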

1.1

Identifier (URI): http://hdl.handle.net/20.500.12124/25

Language: German

Italian

English

Language (ISO639): deu

ita

eng

Publisher: Institute for Applied Linguistics, Eurac Research

Rights: CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)

https://gitlab.inf.unibz.it/commul/var/eurac-licenses/-/raw/v1.0/EULA-CLARIN-ACA-BY-NC-NORED.md

Subject: multilingualism

evaluation

language competences

learner corpus

L1

L2

student essays

picture story

opinion texts

argumentative essay

Type: corpus

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: Eurac Research CLARIN Centre

Description: http://www.language-archives.org/archive/clarin.eurac.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:clarin.eurac.edu:20.500.12124/25

DateStamp: 2025-03-27

GetRecord: OAI-PMH request for simple DC format
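The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch, such a request URL can be assembled in Python from the OAI identifier and a metadata prefix. The `verb`, `identifier` and `metadataPrefix` parameters come from the OAI-PMH protocol; `oai_dc` is the protocol's standard prefix for simple Dublin Core, and `olac` is assumed here to be the prefix for the OLAC format. The base URL is a hypothetical placeholder, since the record does not list the repository endpoint.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real endpoint is not given in this record.
BASE_URL = "https://example.org/oai/request"

# OAI identifier as listed in the record.
OAI_IDENTIFIER = "oai:clarin.eurac.edu:20.500.12124/25"


def get_record_url(metadata_prefix: str, base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord URL for the given metadata prefix."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": OAI_IDENTIFIER,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# "oai_dc" is simple Dublin Core; "olac" is assumed for the OLAC format.
print(get_record_url("oai_dc"))
print(get_record_url("olac"))
```

Replacing `BASE_URL` with the repository's actual OAI-PMH endpoint should yield requests equivalent to the GetRecord links listed in this record.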

Search Info

Citation: Glaznieks, Aivars; Frey, Jennifer-Carmen; Stopfner, Maria; Zanasi, Lorenzo; Nicolas, Lionel. 2020. Institute for Applied Linguistics, Eurac Research.

Terms: area_Europe country_DE country_GB country_IT dcmi_Text iso639_deu iso639_eng iso639_ita olac_primary_text
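The search terms above appear to follow a `facet_value` pattern. A minimal Python sketch for grouping them by facet is shown below. It assumes that the text before the first underscore names the facet and the remainder is the value; that convention is inferred from the terms themselves and is not documented in this record. The function name `group_terms` is illustrative only.

```python
# Terms line copied from the record.
TERMS = (
    "area_Europe country_DE country_GB country_IT dcmi_Text "
    "iso639_deu iso639_eng iso639_ita olac_primary_text"
)


def group_terms(line: str) -> dict[str, list[str]]:
    """Group whitespace-separated terms by the prefix before the first underscore."""
    grouped: dict[str, list[str]] = {}
    for term in line.split():
        # Assumed convention: "<facet>_<value>"; the value may itself contain underscores.
        facet, _, value = term.partition("_")
        grouped.setdefault(facet, []).append(value)
    return grouped


grouped = group_terms(TERMS)
for facet, values in grouped.items():
    print(facet, values)
```

Splitting only on the first underscore keeps multi-part values such as `primary_text` intact.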

http://www.language-archives.org/item.php/oai:clarin.eurac.edu:20.500.12124/25
Up-to-date as of: Fri Oct 17 1:18:44 EDT 2025
