Abstract. In this paper we describe a methodology for harvesting information from large distributed repositories (e.g. large Web sites) with minimal user intervention. The methodology is based on a combination of information extraction, information integration and machine learning techniques. Learning is seeded by extracting information from structured sources (e.g. databases and digital libraries) or from a user-defined lexicon. The retrieved information is then used to partially annotate documents. These annotated documents bootstrap learning for simple Information Extraction (IE) methods, which in turn produce further annotations; these annotate more documents, which are then used to train more complex IE engines, and so on. In this paper we describe the methodology and its implementation in the Armadillo system, compare it with the current state of the art, and describe the details of an implemented application. Finally we draw some conclusions and highlight some challenges and future work.
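The seed-and-bootstrap cycle described above can be sketched as a toy loop: a seed lexicon annotates documents, trivial context patterns are learned from those annotations, and the patterns extract new entities that enlarge the lexicon for the next round. The lexicon, documents, and one-word-of-context pattern rule are all illustrative toys, not part of the Armadillo system itself.

```python
# Toy sketch of seeded bootstrapping for information extraction.
# All data and the pattern-learning rule are invented for illustration.

def annotate(doc, lexicon):
    """Return token positions of known entities (tokens in the lexicon)."""
    tokens = doc.split()
    return [i for i, tok in enumerate(tokens) if tok in lexicon]

def learn_patterns(doc, positions):
    """Learn trivial context patterns: the word preceding a known entity."""
    tokens = doc.split()
    return {tokens[i - 1] for i in positions if i > 0}

def extract(doc, patterns):
    """Apply learned patterns to propose new entities."""
    tokens = doc.split()
    return {tokens[i + 1] for i, tok in enumerate(tokens)
            if tok in patterns and i + 1 < len(tokens)}

def bootstrap(docs, seed_lexicon, rounds=2):
    lexicon = set(seed_lexicon)
    for _ in range(rounds):
        patterns = set()
        for doc in docs:
            patterns |= learn_patterns(doc, annotate(doc, lexicon))
        for doc in docs:
            lexicon |= extract(doc, patterns)  # new annotations seed the next round
    return lexicon

docs = ["professor Smith teaches at Sheffield",
        "professor Jones visited professor Smith"]
print(sorted(bootstrap(docs, {"Smith"})))  # → ['Jones', 'Smith']
```

Starting from the single seed "Smith", the learned pattern "professor <X>" pulls in "Jones" on the first round; a real system would replace the toy pattern rule with a trained IE engine at each stage.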
Abstract. The process of document annotation for the Semantic Web is complex and time-consuming, as it requires a great deal of manual annotation. Information Extraction from texts (IE) is a technology used by some very recent systems to reduce the burden of annotation. The integration of IE systems into annotation tools is quite a new development, and it is still necessary to consider the impact of the IE system on the whole annotation process. In this paper we first discuss a number of requirements for the use of IE as support for annotation. We then present and discuss a model of interaction that addresses these issues, together with Melita, an annotation framework that implements a methodology for active annotation for the Semantic Web based on IE. Finally we present an experiment that quantifies the gain from using IE to support human annotators.
Abstract. This paper is intended as a follow-up to a previous study of ours, "Financial Time Series Forecasting: A Machine Learning Approach", which evaluates traditional machine learning techniques for the task of financial time series forecasting. In this paper we use the same base dataset, but apply a newer branch of machine learning techniques known as Deep Learning. These techniques were introduced with the objective of moving machine learning closer to one of its original goals: artificial intelligence. Deep architectures are known to excel at tasks such as image and text recognition, but have not been exploited as much in the field of finance. In particular, for this study we use Convolutional Neural Networks (CNNs) to forecast the direction of the next period's price with respect to the current price. We achieve an accuracy of 65% when forecasting the next-month price direction and 60% for the next-week price direction. While these results are clearly better than chance, we are not able to match or surpass the results obtained by industry-leading techniques such as Logistic Regression and Support Vector Machines.
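The forecasting task described above can be framed as binary classification: each example is a fixed-length window of past prices, and the label records whether the price moved up over the next period. The sketch below shows only this data-preparation step with invented prices and window sizes; it is not the paper's actual pipeline, and the CNN that would be trained on these windows is omitted.

```python
# Illustrative sketch: turning a price series into (window, direction) pairs.
# The prices, window length, and horizon are invented for illustration.
import numpy as np

def make_dataset(prices, window=4, horizon=1):
    X, y = [], []
    for t in range(window, len(prices) - horizon + 1):
        X.append(prices[t - window:t])                        # lookback window
        y.append(int(prices[t + horizon - 1] > prices[t - 1]))  # 1 = price went up
    return np.array(X), np.array(y)

prices = [100, 101, 99, 102, 103, 101, 104]
X, y = make_dataset(prices)
print(X.shape)      # (3, 4): three examples, four past prices each
print(y.tolist())   # [1, 0, 1]: up, down, up
```

With a weekly horizon the label would compare prices five trading days apart; the CNN would then consume each window much as it would a one-dimensional image.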
Abstract. A top priority in any business is the constant need to increase revenue and profitability. One cause of a decrease in profits is when current customers stop transacting. When a customer leaves, or churns from, a business, the opportunity for potential sales or cross-selling is lost. If a customer leaves the business without any form of notice, the company may find it hard to respond and take corrective action.
This paper describes an initial prototype of the Companions project (www.companions-project.org): the Senior Companion (SC), designed as a platform to demonstrate novel approaches to: (1) the use of Information Extraction (IE) techniques to extract the content of incoming dialogue utterances after an ASR phase; (2) the conversion of the input to RDF form to allow the generation of new facts from existing ones, under the control of a Dialogue Manager (DM) that also has access to stored knowledge and to knowledge accessed in real time from the web, all in RDF form; (3) a DM expressed as a stack-and-network virtual machine that models mixed initiative in dialogue control; and (4) a tuned dialogue act detector based on corpus evidence. The prototype platform was evaluated, and we describe this evaluation; the platform is also designed to support more extensive forms of emotion detection carried by both speech and lexical content, as well as extended forms of machine learning. We describe preliminary studies and results for these, in particular a novel approach to enabling reinforcement learning for open dialogue systems through the detection of emotion in the speech signal and its deployment as a form of learned DM, operating at a higher level than the DM virtual machine and able to direct the SC's responses to a more emotionally appropriate part of its repertoire.
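The stack-and-network virtual machine in point (3) can be illustrated with a toy: dialogue networks are pushed onto a stack, the topmost network drives the conversation, and a user interruption (mixed initiative) pushes a sub-dialogue that is popped when finished, resuming the interrupted one. The network names below are invented; this is a sketch of the general idea, not the SC's actual DM.

```python
# Toy sketch of a stack-based dialogue manager for mixed initiative.
# Network names are hypothetical illustrations.

class DialogueManager:
    def __init__(self):
        self.stack = []

    def push(self, network):
        """Activate a dialogue network; it takes control of the conversation."""
        self.stack.append(network)

    def pop(self):
        """Finish the current network and resume the one beneath it."""
        return self.stack.pop()

    def current(self):
        """The network currently driving the dialogue, if any."""
        return self.stack[-1] if self.stack else None

dm = DialogueManager()
dm.push("photo_discussion")   # system initiative: talk about a photo
dm.push("clarify_person")     # user interrupts: asks who is in the photo
print(dm.current())           # clarify_person
dm.pop()                      # sub-dialogue done, resume the photo talk
print(dm.current())           # photo_discussion
```

The stack gives the resume-after-interruption behaviour; the networks themselves would each encode the transitions of one conversational task.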