Entity extraction is the process of identifying meaningful entities in text documents. In enterprises, extracted entities support numerous applications, including search and recommendation. However, the problem is particularly challenging in enterprise domains for several reasons. First, enterprise entities lack redundancy, which makes earlier web-based systems such as NELL and OpenIE ineffective: relying only on high-precision, low-recall patterns, as those systems do, misses the majority of sparse enterprise entities, while adding lower-precision patterns in a sparse setting introduces substantial noise. Second, semantic drift is common in enterprises (for example, "Blue" refers to "Windows Blue"), so public signals from the web cannot be applied directly to enterprise entities. Moreover, many internal entities never appear on the web, so sparse internal signals are the only source for discovering them. To address these challenges, we propose an end-to-end framework for entity extraction in enterprises that takes an enterprise corpus and a limited set of seeds as input and generates a high-quality entity collection as output. We introduce the novel concept of the Semantic Pattern Graph, which leverages public signals to understand the underlying semantics of lexical patterns, reinforces pattern evaluation using the mined semantics, and yields more accurate and complete entities. Experiments on Microsoft enterprise data show the effectiveness of our approach.
The goal of this study is to develop a taxonomy of earthquake response and recovery, built from online information resources, for organizing and sharing earthquake-related online information. The study used a constructivist/interpretivist research paradigm. The taxonomy was built with a combination of top-down and bottom-up approaches, and its design drew on facet analysis of disaster management, the timeframe of disaster management, and modular design. Two case studies were conducted to demonstrate the usefulness of the taxonomy for organizing and sharing information. The facet-based taxonomy can be used to organize online information for browsing and navigation, and to index and tag online information resources to support searching. It creates a common language that earthquake management stakeholders can use to share knowledge. The categories in the top three levels of the taxonomy can be applied to the management of other types of disasters. The taxonomy has implications for earthquake online information management, knowledge management, and disaster management. The approach can be used to build taxonomies for managing online information resources on other topics, including various types of time-sensitive disaster response. The taxonomy offers a common language for sharing information on disasters, which has great social relevance.