Abstract. Within specific domains, users generally face the challenge of populating an ontology according to their needs. Especially for novelty detection and forecasting, users want to integrate novel information contained in natural-language text documents into their own ontology so that the knowledge base can be utilised in a further step. In this paper, we propose a semantic document ranking approach that serves as a prerequisite for ontology population. Because the underlying ontology is used for both query generation and document ranking, query and ranking are structured and therefore promise a better ranking in terms of relevance and novelty than approaches that do not use semantics.

Keywords: Document ranking, Ontology-based information extraction, Novelty detection, Semantic similarity.
Motivation

The existence and steady growth of the Web has granted us vast amounts of web documents whose information can be discovered and utilised for specific information needs. Some existing information extraction (IE) techniques make use of background information provided by Semantic Web ontologies. In the past, various ontology-based information extraction (OBIE) systems have been proposed in which ontologies are used within the IE process. Although quite a few notable ontologies exist, in many application areas the appropriate ontologies are too small owing to their domain specificity and hence need to be populated by adding instances and properties. For ontology population, a crucial task is to find new textual information that is relevant to the domain expert but has not yet been stored in the knowledge base (KB) and, therefore, has not yet been made usable. In this work, we focus on the worthwhile interplay between an existing KB and a text document corpus, which, in the use case of trend detection, is created on demand.

Within the area of ontology population, we propose a novel approach for document ranking in the context of structural search for "novel" items in text documents. We claim that semantics can be used to rank documents according to the novel items they are expected to contain.