Please use the following text to cite this item:
Stemle, Egon W. and Steger, Johannes M., 2010, KrdWrd CANOLA Corpus 1.1, CLARIN DSpace, http://hdl.handle.net/20.500.12124/9
dc.contributor.author | Stemle, Egon W. |
dc.contributor.author | Steger, Johannes M. |
dc.date.accessioned | 2019-08-14T16:05:22Z |
dc.date.available | 2019-08-14T16:05:22Z |
dc.date.issued | 2010-11-25 |
dc.description | The CANOLA Corpus is a visually annotated English web corpus for training classification engines to remove boilerplate from unseen web pages. It was harvested, annotated and evaluated with the tools and infrastructure of the KrdWrd Project. |
dc.identifier.uri | http://hdl.handle.net/20.500.12124/9 |
dc.language.iso | eng |
dc.publisher | Institute for Applied Linguistics, Eurac Research |
dc.relation.isbasedon | https://github.com/krdwrd/data/releases/tag/v1.1 |
dc.relation.isbasedon | https://github.com/krdwrd/doc_CANOLA/releases/tag/v1.1 |
dc.relation.isreferencedby | https://www.sigwac.org.uk/raw-attachment/wiki/WAC5/WAC5_proceedings.pdf |
dc.relation.replaces | http://hdl.handle.net/20.500.12124/8 |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.label | PUB |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.source.uri | https://krdwrd.github.io |
dc.subject | boiler plate removal |
dc.subject | web page cleaning |
dc.subject | WaC |
dc.subject | Web as Corpus |
dc.subject | training data |
dc.subject | manual annotation |
dc.title | KrdWrd CANOLA Corpus 1.1 |
dc.type | corpus |
local.branding | CMC & WaC |
local.contact.person | Corpus Manager, clarin@eurac.edu, Eurac Research CLARIN Centre (ERCC) |
local.files.count | 2 |
local.files.size | 11356195 (bytes) |
local.has.files | yes |
local.hasCMDI | false |
local.hidden | false |
local.language.name | English |
local.size.info | 216 files |
metashare.ResourceInfo#ContentInfo.mediaType | text |
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).