OLAC Record: KoKo German L1 Learner Corpus v2

OLAC Record
oai:clarin.eurac.edu:20.500.12124/11

Metadata

Title: KoKo German L1 Learner Corpus v2

Bibliographic Citation: http://hdl.handle.net/20.500.12124/11

Creator: Abel, Andrea

Glaznieks, Aivars

Culy, Chris

Date (W3CDTF): 2019-09-19T13:21:30Z

Date Available: 2019-09-19T13:21:30Z

Description: The KoKo Corpus is an error-annotated learner corpus of L1 German speakers. It was created to investigate and describe the writing skills of German-speaking secondary-school pupils at the end of their school career by analysing authentic texts produced in classrooms. The corpus consists of 1503 argumentative essays with manual transcription annotations and linguistic error annotations. Error annotation covers the orthographic level only. Transcription annotations reflect surface features of the text, such as the graphical arrangement and self-corrections.

The corpus-building process was guided by two goals:
1. describe writing skills at the transition from secondary school to university;
2. determine external factors that may influence the distribution of writing skills, such as region, sociolinguistic factors (gender, age), socio-economic factors, and language-related biographical factors (L1, preferred variety of German, reading and writing habits, etc.).

The pupils were selected from three German-speaking areas: North Tyrol (Austria), South Tyrol (Italy), and Thuringia (Germany). Classes were sampled randomly, with the size of the city in which the school was located (small vs. medium vs. big) and the type of school (general education vs. education specific to a particular profession) as sampling strata. Since the data were collected during regular courses, the corpus as a whole represents the typical composition of secondary-school classes in the three regions. Most of the participants are German native speakers (n=1319, 82.7%).

Person-related metadata provides information about:
- writer's L1
- writer's gender
- type of school the essay comes from
- location of the school the essay comes from
- grade attended at data collection

In addition, the corpus is automatically annotated, including tokenisation, sentence splitting, POS-tagging and lemmatisation.
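
The stratified sampling of classes mentioned in the Description can be illustrated with a minimal sketch. It is not the procedure the corpus creators used: the sampling frame, the field names (`city_size`, `school_type`), the label `vocational` for profession-specific schools, and the number of classes drawn per stratum are all hypothetical and only show the general technique.

```python
import random
from collections import defaultdict


def stratified_sample(classes, per_stratum, seed=0):
    """Draw up to `per_stratum` classes at random from each stratum.

    A stratum is one (city_size, school_type) combination.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for school_class in classes:
        key = (school_class["city_size"], school_class["school_type"])
        strata[key].append(school_class)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = min(per_stratum, len(members))  # never ask for more than exist
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical sampling frame: 3 city sizes x 2 school types x 4 classes each.
classes = [
    {"id": f"{size}-{kind}-{i}", "city_size": size, "school_type": kind}
    for size in ("small", "medium", "big")
    for kind in ("general", "vocational")
    for i in range(4)
]

picked = stratified_sample(classes, per_stratum=2)
```

Drawing within each stratum separately ensures every combination of city size and school type appears in the sample, which a single simple random draw would not guarantee.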

Identifier (URI): http://hdl.handle.net/20.500.12124/11

Is Replaced By (URI): http://hdl.handle.net/20.500.12124/12

Language: German

Language (ISO639): deu

Publisher: Institute for Applied Linguistics, Eurac Research

Replaces (URI): http://hdl.handle.net/20.500.12124/10

Rights: CLARIN ACADEMIC END-USER LICENCE (ACA-BY-NC-NORED 1.0)

https://gitlab.inf.unibz.it/commul/var/eurac-licenses/-/raw/v1.0/EULA-CLARIN-ACA-BY-NC-NORED.md

Subject: learner corpus

German varieties

students in secondary school

argumentative essays

Type: corpus

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: Eurac Research CLARIN Centre

Description: http://www.language-archives.org/archive/clarin.eurac.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:clarin.eurac.edu:20.500.12124/11

DateStamp: 2023-03-17

GetRecord: OAI-PMH request for simple DC format
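
The two GetRecord entries above refer to OAI-PMH requests whose link targets are not preserved in this record. As a rough sketch, such a request URL can be assembled from the OaiIdentifier listed here. The base URL below is a placeholder, since the repository's actual OAI-PMH endpoint is not given in this record. The `oai_dc` prefix is the one defined by OAI-PMH for simple Dublin Core; `olac` is assumed to be the prefix the repository uses for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real OAI-PMH base URL is not stated in this record.
BASE_URL = "https://example.org/oai/request"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:clarin.eurac.edu:20.500.12124/11"


def get_record_url(metadata_prefix, base_url=BASE_URL, identifier=OAI_IDENTIFIER):
    """Build an OAI-PMH GetRecord request URL for one metadata format."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


olac_url = get_record_url("olac")   # OLAC format (prefix name assumed)
dc_url = get_record_url("oai_dc")   # simple Dublin Core format
```

Fetching either URL from a real endpoint would return an XML response wrapping the record; the sketch only constructs the request and performs no network access.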

Search Info
Citation: Abel, Andrea; Glaznieks, Aivars; Culy, Chris. 2019. Institute for Applied Linguistics, Eurac Research.
Terms: area_Europe country_DE dcmi_Text iso639_deu olac_primary_text

http://www.language-archives.org/item.php/oai:clarin.eurac.edu:20.500.12124/11
Up-to-date as of: Fri Oct 17 1:18:45 EDT 2025
