Please use the following text to cite this item:
De Camillis, Flavia; Chiocchetti, Elena and Stemle, Egon W., 2023, MT@BZ translation corpus v1.0, CLARIN DSpace, http://hdl.handle.net/20.500.12124/60
dc.contributor.author De Camillis, Flavia
dc.contributor.author Chiocchetti, Elena
dc.contributor.author Stemle, Egon W.
dc.date.accessioned 2023-06-18T18:33:02Z
dc.date.available 2023-06-18T18:33:02Z
dc.date.issued 2023-06-13
dc.description MT@BZ is a translation corpus consisting of 52 decrees published by the Autonomous Province of Bolzano (South Tyrol), aligned with their machine-translated versions. More precisely, it contains 26 decrees in German and the same 26 in Italian in their official versions, which the project team machine translated into Italian and into German, respectively. Of the 26 decrees, 10 are COVID-19-related and 16 are miscellaneous. Overall, the texts amount to around 130,000 words. The machine translation was carried out with a customized version of ModernMT. The corpus was first uploaded to the annotation platform WebAnno and later transferred to INCEpTION. Four annotators annotated the machine translation errors according to an ad hoc error taxonomy for quality assessment. Finally, the annotations were curated to create a gold standard corpus.
dc.identifier.uri http://hdl.handle.net/20.500.12124/60
dc.language.iso ita
dc.language.iso deu
dc.publisher Institute for Applied Linguistics, Eurac Research
dc.relation.isbasedon https://gitlab.inf.unibz.it/commul/mt-bz/data/bundle/-/tags/v1.0
dc.relation.isreferencedby https://events.tuni.fi/uploads/2023/06/11678752-proceedings-eamt2023.pdf
dc.rights Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
dc.rights.label PUB
dc.rights.uri https://creativecommons.org/licenses/by-nc/4.0/
dc.source.uri https://www.eurac.edu/it/institutes-centers/istituto-di-linguistica-applicata/projects/mtbz
dc.subject machine translation
dc.subject annotation
dc.subject translation errors
dc.subject accuracy
dc.subject fluency
dc.subject Italian
dc.subject German
dc.subject South Tyrolean German
dc.subject legal language
dc.title MT@BZ translation corpus v1.0
dc.type corpus
local.branding Lexicography, Terminology, and Translation
local.contact.person Elena Chiocchetti elena.chiocchetti@eurac.edu Eurac Research
local.contact.person Corpus Manager clarin@eurac.edu Eurac Research CLARIN Centre (ERCC)
local.files.count 5
local.files.size 37159305
local.has.files yes
local.language.name Italian
local.language.name German
local.size.info 52 texts
local.size.info 130,000 tokens
local.sponsor ownFunds / Institute for Applied Linguistics, Eurac Research Machine Translation at South Tyrolean Institutions
metashare.ResourceInfo#ContentInfo.mediaType text