Please use the following text to cite this item:
Wedig, Helena and Strobl, Carola, 2024, German Summary Corpus (GerSumCo) v1.0.0, CLARIN DSpace, http://hdl.handle.net/20.500.12124/81
dc.contributor.author: Wedig, Helena
dc.contributor.author: Strobl, Carola
dc.date.accessioned: 2024-11-14T17:14:49Z
dc.date.available: 2024-11-14T17:14:49Z
dc.date.issued: 2024
dc.description: The GerSumCo (German Summary Corpus) is a learner corpus of syntheses written by L2 German writers (CEFR B2/C1) and L1 German writers. The corpus was created to enable a comparative analysis of the academic writing of L1 German and L2 German students. The two subcorpora (L1 and L2) contain a total of 286 texts (178 L1 and 108 L2), written by 286 students at 14 universities and language schools in Germany (Bamberg, Bochum, Dresden, Hamburg, Hildesheim, Kiel, Leipzig, Magdeburg, Osnabrück, Potsdam, Trier, Wuppertal), Poland (Gdansk) and China (Hangzhou). The texts were collected between 2022 and 2024 as part of a PhD research project that uses GerSumCo and Beldeko for a contrastive interlanguage analysis aimed at identifying L1-dependent features of cohesion in L2 and L1 German.

The metadata files (Meta_GerSumCo_L1 and Meta_GerSumCo_L2) contain the following information:
- Up to three L1s of each writer
- Up to three L2s of each writer
- Collection date
- Topic
- Whether the text was written as homework or in class
- The student group the text belonged to

The file names encode the following information:
- Whether the text belongs to the L1 or the L2 subcorpus
- Topic

The summaries are 230 words long on average. They were written either in class on computers or as homework, within a 60-minute time frame. Students were permitted to use online dictionaries but not AI-based tools. Each student summarised two source texts on one of four topics related to language variation in German: Kiezdeutsch, Mundartdebatte in der Schweiz, Viadrinisch and Varianten-Wörterbuch des Deutschen.

This version contains the texts as TXT files and their manual annotations as CSV files. For each token, the CSV files record the token ID, sentence ID, source text form, target form, automatically annotated lemma, POS tag (STTS) and simple UPOS part-of-speech tag.
dc.identifier.uri: http://hdl.handle.net/20.500.12124/81
dc.language.iso: deu
dc.publisher: University of Antwerp
dc.rights: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
dc.rights.label: PUB
dc.rights.uri: https://creativecommons.org/licenses/by-nc/4.0/
dc.subject: cohesion
dc.subject: L2 German
dc.subject: summaries
dc.subject: learner language
dc.subject: synthesis writing
dc.title: German Summary Corpus (GerSumCo) v1.0.0
dc.type: corpus
local.branding: Various
local.contact.person: Helena Wedig, helena.wedig@uantwerpen.be, University of Antwerp
local.contact.person: Carola Strobl, carola.strobl@uantwerpen.be, University of Antwerp
local.files.count: 6
local.files.size: 1457044
local.has.files: yes
local.hasCMDI: false
local.hidden: false
local.language.name: German
local.size.info: 286 texts
local.sponsor: nationalFunds; 1181323N; Research Foundation – Flanders (FWO); A corpus-based analysis of grammatical cohesion in L2 German: Insights into the effect of learners' native language on academic writing proficiency in a foreign language.
metashare.ResourceInfo#ContentInfo.mediaType: text
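The token-level CSV annotations listed in the description can be read with a few lines of Python. The sketch below is hypothetical: the record names the annotation layers (token ID, sentence ID, source text form, target form, lemma, STTS POS tag, UPOS tag) but not the actual column headers or delimiter, so the names in FIELDS and the default comma delimiter are assumptions to adapt to the real files. The sample rows are invented for illustration and are not taken from GerSumCo.

```python
import csv
import io
from collections import Counter
from typing import TextIO

# HYPOTHETICAL column names: the record lists these annotation layers but not
# the real CSV headers or delimiter, so adapt both to the actual files.
FIELDS = [
    "token_id",     # token ID
    "sentence_id",  # sentence ID
    "source_form",  # form as written by the student
    "target_form",  # target form
    "lemma",        # automatically annotated lemma
    "pos_stts",     # STTS part-of-speech tag
    "upos",         # simple UPOS tag
]


def read_tokens(handle: TextIO, delimiter: str = ",") -> list[dict[str, str]]:
    """Read one annotation CSV into a list of per-token dicts."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    # Fail loudly if the assumed headers do not match the file.
    missing = [f for f in FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return [{f: row[f] for f in FIELDS} for row in reader]


def group_sentences(tokens: list[dict[str, str]]) -> dict[str, list[dict[str, str]]]:
    """Group tokens by sentence ID, keeping the order of the file."""
    sentences: dict[str, list[dict[str, str]]] = {}
    for tok in tokens:
        sentences.setdefault(tok["sentence_id"], []).append(tok)
    return sentences


def summarize(tokens: list[dict[str, str]]) -> tuple[Counter, int]:
    """Count UPOS tags and tokens whose target form differs from the source form."""
    upos_counts = Counter(tok["upos"] for tok in tokens)
    changed = sum(1 for tok in tokens if tok["source_form"] != tok["target_form"])
    return upos_counts, changed


if __name__ == "__main__":
    # Invented rows for illustration only; not taken from GerSumCo.
    sample = io.StringIO(
        "token_id,sentence_id,source_form,target_form,lemma,pos_stts,upos\n"
        "1,1,Kiezdeutsch,Kiezdeutsch,Kiezdeutsch,NN,NOUN\n"
        "2,1,ist,ist,sein,VAFIN,AUX\n"
        "3,1,ein,eine,ein,ART,DET\n"
        "4,1,Varietät,Varietät,Varietät,NN,NOUN\n"
    )
    tokens = read_tokens(sample)
    upos_counts, changed = summarize(tokens)
    print(len(group_sentences(tokens)), upos_counts.most_common(), changed)
```

Checking the header up front means a mismatch between the assumed and the real column names raises an error instead of silently producing empty values. For a real file, open it with open(path, newline="", encoding="utf-8") and pass the handle to read_tokens; the UTF-8 encoding is also an assumption.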