Please use the following text to cite this item:
Stemle, Egon W. and Steger, Johannes M., 2010, KrdWrd CANOLA Corpus 1.1, CLARIN DSpace, http://hdl.handle.net/20.500.12124/9
dc.contributor.author: Stemle, Egon W.
dc.contributor.author: Steger, Johannes M.
dc.date.accessioned: 2019-08-14T16:05:22Z
dc.date.available: 2019-08-14T16:05:22Z
dc.date.issued: 2010-11-25
dc.description: The CANOLA Corpus is a visually annotated English web corpus for training classification engines to remove boilerplate from unseen web pages. It was harvested, annotated, and evaluated using the tools and infrastructure of the KrdWrd Project.
dc.identifier.uri: http://hdl.handle.net/20.500.12124/9
dc.language.iso: eng
dc.publisher: Institute for Applied Linguistics, Eurac Research
dc.relation.isbasedon: https://github.com/krdwrd/data/releases/tag/v1.1
dc.relation.isbasedon: https://github.com/krdwrd/doc_CANOLA/releases/tag/v1.1
dc.relation.isreferencedby: https://www.sigwac.org.uk/raw-attachment/wiki/WAC5/WAC5_proceedings.pdf
dc.relation.replaces: http://hdl.handle.net/20.500.12124/8
dc.rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.label: PUB
dc.rights.uri: https://creativecommons.org/licenses/by-sa/4.0/
dc.source.uri: https://krdwrd.github.io
dc.subject: boiler plate removal
dc.subject: web page cleaning
dc.subject: WaC
dc.subject: Web as Corpus
dc.subject: training data
dc.subject: manual annotation
dc.title: KrdWrd CANOLA Corpus 1.1
dc.type: corpus
local.branding: CMC & WaC
local.contact.person: Corpus Manager clarin@eurac.edu Eurac Research CLARIN Centre (ERCC)
local.files.count: 2
local.files.size: 11356195
local.has.files: yes
local.hasCMDI: false
local.hidden: false
local.language.name: English
local.size.info: 216 files
metashare.ResourceInfo#ContentInfo.mediaType: text

Version History

Version	Date	Summary
2*	2010-11-25 00:00:00
	2010-09-10 00:00:00
* Selected version