Recent research on entity linking (EL) has introduced a plethora of promising techniques, ranging from deep neural networks to joint inference. But despite numerous papers there is surprisingly little understanding of the state of the art in EL. We attack this confusion by analyzing differences between several versions of the EL problem and presenting a simple yet effective, modular, unsupervised system, called Vinculum, for entity linking. We conduct an extensive evaluation on nine data sets, comparing Vinculum with two state-of-the-art systems, and elucidate key aspects of the system that include mention extraction, candidate generation, entity type prediction, entity coreference, and coherence.
For roughly a century, many representation learning approaches have been proposed to learn the intrinsic structure of data, including linear and nonlinear, supervised and unsupervised ones. In particular, deep architectures have been widely applied to representation learning in recent years and have delivered top results in many tasks, such as image classification, object detection and speech recognition. In this paper, we review the development of data representation learning methods. Specifically, we investigate both traditional feature learning algorithms and state-of-the-art deep learning models. The history of data representation learning is introduced, and available resources (e.g., online courses, tutorials and books) and toolboxes are provided. Finally, we conclude this paper with remarks and some interesting research directions on data representation learning.
Can crowdsourced annotation of training data boost performance for relation extraction over methods based solely on distant supervision? While crowdsourcing has been shown effective for many NLP tasks, previous researchers found only minimal improvement when applying the method to relation extraction. This paper demonstrates that a much larger boost is possible, e.g., raising F1 from 0.40 to 0.60. Furthermore, the gains are due to a simple, generalizable technique, Gated Instruction, which combines an interactive tutorial, feedback to correct errors during training, and improved screening.
Entity Recognition (ER) is a key component of relation extraction systems and many other natural-language processing applications. Unfortunately, most ER systems are restricted to producing labels from a small set of entity classes, e.g., person, organization, location or miscellaneous. In order to intelligently understand text and extract a wide range of information, it is useful to more precisely determine the semantic classes of entities mentioned in unstructured text. This paper defines a fine-grained set of 112 tags, formulates the tagging problem as multi-class, multi-label classification, describes an unsupervised method for collecting training data, and presents the FIGER implementation. Experiments show that the system accurately predicts the tags for entities. Moreover, it provides useful information for a relation extraction system, increasing the F1 score by 93%. We make FIGER and its data available as a resource for future work.
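To make the multi-class, multi-label formulation concrete, here is a minimal sketch in which a mention may receive several tags at once. It is an illustration only, not the FIGER implementation: the function name `predict_tags`, the 0.5 threshold, the tag names, and the scores are all assumptions made for this example.

```python
from typing import Dict, List


def predict_tags(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every tag whose score reaches the threshold.

    Multi-label: zero, one, or several tags can be returned for one mention.
    The scores would come from some classifier; here they are hypothetical.
    """
    return sorted(tag for tag, score in scores.items() if score >= threshold)


# Hypothetical per-tag scores for a single entity mention.
example_scores = {"/person": 0.9, "/person/artist": 0.7, "/organization": 0.1}
predicted = predict_tags(example_scores)
print(predicted)
```

Unlike a single-label tagger that picks only the highest-scoring class, this keeps both the coarse and the fine tag when both clear the threshold.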
The last decade has seen a rapid growth in both the volume and variety of network traffic, while at the same time, the need to analyze the traffic for quality of service, security, and misuse has become increasingly important. In this paper, we will present a traffic analysis system that couples visual analysis with a declarative knowledge representation based on first order logic. Our system supports multiple iterations of the sense-making loop of analytic reasoning, by allowing users to save their discoveries as they are found and to reuse them in future iterations. We will show how the knowledge base can be used to improve both the visual representations and the basic analytical tasks of filtering and changing level of detail. More fundamentally, the knowledge representation can be used to classify the traffic. We will present the results of applying the system to successfully classify 80% of network traffic from one day in our laboratory.
INTRODUCTION
The last decade has seen a rapid growth in both the volume and variety of network traffic, while at the same time it is becoming ever more important for analysts to understand network behaviors to provide quality of service, security, and misuse monitoring. To aid analysts in these tasks, researchers have proposed numerous visualization techniques that apply exploratory analysis to network traffic. The sense-making loop of information visualization is critical for analysis [5]. The loop involves a repeated sequence of hypothesis, experiment, and discovery. However, current visual analysis systems for network traffic do not support sense-making well because they provide no means for analysts to save their discoveries and build upon them. As such, it becomes the analyst's burden to remember and reason about the multitude of patterns observed during visual analysis, which quickly becomes impossible in massive datasets typical of network traffic.
In this paper we present a network traffic visualization system that enables previous visual discoveries to be used in future analysis. The system accomplishes this by allowing the analyst to interactively create logical models of the visual discoveries. The logical models are stored in a knowledge representation and can be reused. The reuse of knowledge creates an analytical cycle as summarized in figure 1. In addition to facilitating the sense-making loop, knowledge representations allow the creation of more insightful visualizations that the analyst can use to discover more complex and subtle patterns.
To evaluate effectiveness, we will present the results of applying our system to analyze one day of network traffic from our laboratory. This paper will be structured as follows: section 2 will provide an overview of the visual analysis process; section 3 will give a sampling of related work in this area; section 4 will describe the system's knowledge representation; section 5 will overview the visual knowledge creation; section 6 will demonstrate how the system leverages the knowledge base to improve visual analysis...
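The idea of saving discoveries as reusable logical models and then using them to classify traffic can be sketched roughly as follows. This is a simplified stand-in, not the authors' system: plain Python predicates take the place of first-order logic clauses, and the `Flow` fields, the `KnowledgeBase` class, the rule labels, and the sample flows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Flow:
    """A hypothetical, heavily simplified network flow record."""
    src_port: int
    dst_port: int
    protocol: str


Rule = Callable[[Flow], bool]


class KnowledgeBase:
    """Stores named rules so an earlier discovery can be reused later."""

    def __init__(self) -> None:
        self._rules: Dict[str, Rule] = {}

    def add(self, label: str, rule: Rule) -> None:
        # Save a discovery as a named, reusable predicate.
        self._rules[label] = rule

    def classify(self, flow: Flow) -> Optional[str]:
        # Return the label of the first matching rule, or None if unclassified.
        for label, rule in self._rules.items():
            if rule(flow):
                return label
        return None


kb = KnowledgeBase()
kb.add("web", lambda f: f.protocol == "tcp" and f.dst_port in (80, 443))
kb.add("dns", lambda f: f.protocol == "udp" and f.dst_port == 53)

flows = [Flow(51000, 443, "tcp"), Flow(51001, 53, "udp"), Flow(51002, 6000, "tcp")]
labels = [kb.classify(f) for f in flows]

# Fraction of flows that at least one saved rule explains.
coverage = sum(label is not None for label in labels) / len(flows)
print(labels, coverage)
```

Flows that no rule explains stay unlabeled, which is where an analyst would look next when adding rules in a further iteration of the loop.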