Please use the following text to cite this item:
De Camillis, Flavia; Chiocchetti, Elena and Stemle, Egon W., 2023, MT@BZ translation corpus v1.0, CLARIN DSpace, http://hdl.handle.net/20.500.12124/60
dc.contributor.author | De Camillis, Flavia |
dc.contributor.author | Chiocchetti, Elena |
dc.contributor.author | Stemle, Egon W. |
dc.date.accessioned | 2023-06-18T18:33:02Z |
dc.date.available | 2023-06-18T18:33:02Z |
dc.date.issued | 2023-06-13 |
dc.description | MT@BZ is a translation corpus consisting of 52 decrees published by the Autonomous Province of Bolzano (South Tyrol), each aligned with its machine-translated version. More precisely, it comprises 26 decrees in their official German versions and the same 26 in their official Italian versions, which the project team machine translated into Italian and into German, respectively. Of the 26 decrees, 10 are COVID-19 related and 16 are miscellaneous. Overall, the texts amount to around 130,000 words. The machine translation was carried out with a customized version of ModernMT. The corpus was first uploaded to the annotation platform WebAnno and later transferred to INCEpTION. Four annotators annotated the machine translation errors according to an ad hoc error taxonomy for quality assessment. Finally, the annotations were curated to create a gold standard corpus. |
dc.identifier.uri | http://hdl.handle.net/20.500.12124/60 |
dc.language.iso | ita |
dc.language.iso | deu |
dc.publisher | Institute for Applied Linguistics, Eurac Research |
dc.relation.isbasedon | https://gitlab.inf.unibz.it/commul/mt-bz/data/bundle/-/tags/v1.0 |
dc.relation.isreferencedby | https://events.tuni.fi/uploads/2023/06/11678752-proceedings-eamt2023.pdf |
dc.rights | Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) |
dc.rights.label | PUB |
dc.rights.uri | https://creativecommons.org/licenses/by-nc/4.0/ |
dc.source.uri | https://www.eurac.edu/it/institutes-centers/istituto-di-linguistica-applicata/projects/mtbz |
dc.subject | machine translation |
dc.subject | annotation |
dc.subject | translation errors |
dc.subject | accuracy |
dc.subject | fluency |
dc.subject | Italian |
dc.subject | German |
dc.subject | South Tyrolean German |
dc.subject | legal language |
dc.title | MT@BZ translation corpus v1.0 |
dc.type | corpus |
local.branding | Lexicography, Terminology, and Translation |
local.contact.person | Elena Chiocchetti elena.chiocchetti@eurac.edu Eurac Research |
local.contact.person | Corpus Manager clarin@eurac.edu Eurac Research CLARIN Centre (ERCC) |
local.files.count | 5 |
local.files.size | 37159305 |
local.has.files | yes |
local.language.name | Italian |
local.language.name | German |
local.size.info | 52 texts |
local.size.info | 130,000 tokens |
local.sponsor | ownFunds / Institute for Applied Linguistics, Eurac Research Machine Translation at South Tyrolean Institutions |
metashare.ResourceInfo#ContentInfo.mediaType | text |
This item is Publicly Available and licensed under: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)