LINDAT/CLARIAH-CZ Repository Home

Linguistic Data and NLP Tools


Citation Support (with Persistent IDs)

Deposit Free and Safe

License of your Choice (Open licenses encouraged)

Easy to Find

Easy to Cite

“There ought to be only one grand dépôt of art in the world, to which the artist might repair with his works, and on presenting them receive what he required...” (Ludwig van Beethoven, 1801)

What's New

corpus | Various

HELLO CAMPANIA! Ghana Collection

Author(s):

Di Salvo, Margherita ; Cataldo, Violetta ; Maffia, Marta and Asienda, Hannaora Marlene

Description:

The HELLO CAMPANIA! Ghana collection contains 12 sociolinguistic interviews collected with 4 first-generation and 8 second-generation migrants living in Naples. It also contains 9 language portraits.

Academic Use

corpus | Various

HELLO CAMPANIA! Sri Lanka Collection

Author(s):

Di Salvo, Margherita ; Cataldo, Violetta ; Maffia, Marta and Noschese, Maria Paola

Description:

The corpus consists of 48 audio files totalling 20:38 of recordings (public) and their corresponding transcriptions in ELAN (available upon request). The collection also includes 15 language portraits. It is organized in four bundles:
- 1G_audio: all audio files collected with 1st-generation migrants (30 files)
- 1G_portrait: the language portraits collected with 1st-generation migrants (13 files)
- 2G_audio: all audio files collected with 2nd-generation migrants (18 files)
- 2G_portrait: the language portraits collected with 2nd-generation migrants (2 files)

Academic Use

corpus | Various

HELLO CAMPANIA! Ukraina Collection

Author(s):

Di Salvo, Margherita ; Noschese, Maria Paola ; Cataldo, Violetta and Maffia, Marta

Description:

The Ukrainian collection contains data for 26 first-generation (G1) speakers, 19 female and 6 male. For each group, the collection contains three folders: the sociolinguistic interview and a language portrait.

Academic Use

Most Viewed Items - Last Month

corpus | Various

HELLO CAMPANIA! Ghana Collection

Author(s):

Di Salvo, Margherita ; Cataldo, Violetta ; Maffia, Marta and Asienda, Hannaora Marlene

Academic Use

corpus | Learner Language

KoKo German L1 Learner Corpus v1

Author(s):

Abel, Andrea ; Glaznieks, Aivars and Culy, Chris

Description:

The KoKo Corpus is an error-annotated learner corpus of L1 German speakers. It was created to investigate and describe the writing skills of German-speaking secondary-school pupils at the end of their school career by analysing authentic texts produced in classrooms. The corpus building process was guided by two goals:

1. to describe writing skills at the transition from secondary school to university;
2. to determine external factors that may influence the distribution of writing skills, such as region, sociolinguistic factors (gender, age), socio-economic factors, and language-related biographical factors (L1, preferred variety of German, reading and writing habits, etc.).

The pupils were selected from three German-speaking areas: North Tyrol (Austria), South Tyrol (Italy), and Thuringia (Germany). Classes were sampled randomly, stratified by the size of the city in which the school was located (small vs. medium vs. big) and by the type of school (general education vs. education specific to a particular profession). Since data were collected during regular courses, the corpus as a whole reflects the typical composition of secondary-school classes in the three regions. Most of the participants are German native speakers (n=1319, 82.7%).

Person-related metadata provides information about:
- the writer's L1
- the writer's gender
- the type of school the essay comes from
- the location of the school the essay comes from
- the grade attended at the time of data collection

Academic Use

data management resource | Learner Language

Core Metadata Schema for Learner Corpora (LC-meta) v2

Author(s):

Paquot, Magali ; König, Alexander ; Stemle, Egon and Frey, Jennifer-Carmen

Description:

This document contains a list of metadata fields that can be used to describe learner corpus data. The core metadata scheme is structured around 8 metadata types:
- Administrative metadata
- Corpus design metadata
- Learner
- Text (language sample)
- Situational and task characteristics
- Annotation
- Annotator
- Transcriber

Publicly Available