Integrable open-boundary conditions for the supersymmetric t-J model: the quantum-group-invariant case

Since the Web is by far the largest public repository of natural language texts, recent experiments, methods, and tools in corpus linguistics often use the Web as a corpus. Applications where high accuracy is crucial must contend with the non-negligible number of orthographic and grammatical errors that occur in Web documents. In this article we investigate the distribution of orthographic errors of various types in Web pages. As a by-product, we develop methods for efficiently detecting erroneous pages and for marking orthographic errors in acceptable Web documents, thus reducing the number of errors in corpora and linguistic knowledge bases automatically retrieved from the Web.


Towards information retrieval on historical document collections: the role of matching procedures and special lexica

Gotscharek, Reffle, Ringlstetter et al. 2010

IJDAR


PoCoTo - an open source system for efficient interactive postcorrection of OCRed historical texts

Vobl, Gotscharek, Reffle et al. 2014



Christoph Ringlstetter

Unsupervised profiling of OCRed historical documents

Lexical postcorrection of OCR-results: the web as a dynamic secondary dictionary?

Orthographic Errors in Web Pages: Toward Cleaner Web Corpora

