Many historical newspapers are being digitized. We aim to support access to them through text analysis of the OCR'd content. However, the OCR output contains many errors, so extracting meaningful content from it is difficult. We propose a pipeline of processing steps and describe its first two steps here: segmentation and genre identification. The heading-based segmentation procedure was quite successful. Genre identification worked well for easily defined genre categories such as weather reports. We also propose additional techniques that may improve accuracy still further.
Ontologies, mainly lightweight ontologies, are ubiquitous throughout the Internet and are succeeding in replacing human expertise. We conducted a study in which physicians and nurses performed a search task in the medical domain; it demonstrates that lightweight ontologies perform well as a substitute for expertise. The extent to which the substitution succeeds depends on the context of use. Our study investigates lightweight ontologies with respect to the context of use in which they are applied. The better we understand the context of use, the better we can inform ontology design and evaluation. We describe ontologies through characteristics and context through parameters. By varying ontology characteristics and testing the effect on the performance of an ontology-supported task for a context parameter, such as the level of user expertise, we increase our understanding of ontology design and evaluation. Our study shows that changing ontologies by varying some of their characteristics has a direct and significant impact on the performance of the ontology-supported task for different levels of user expertise.