Previous work analyzing social networks has mainly focused on binary friendship relations. However, in online social networks the low cost of link formation can lead to networks with heterogeneous relationship strengths (e.g., acquaintances and best friends mixed together). In this case, the binary friendship indicator provides only a coarse representation of relationship information. In this work, we develop an unsupervised model to estimate relationship strength from interaction activity (e.g., communication, tagging) and user similarity. More specifically, we formulate a link-based latent variable model, along with a coordinate ascent optimization procedure for inference. We evaluate our approach on real-world data from Facebook, showing that the estimated link weights result in higher autocorrelation and lead to improved classification accuracy.
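The general shape of such a model can be sketched as follows. This is a minimal illustrative toy, not the paper's exact formulation: it assumes per-link latent strengths z drawn around a linear function of profile-similarity features, binary interaction indicators generated through a logistic link, and coordinate ascent that alternates between updating z and the model parameters. All variable names and the simple Gaussian/logistic factorization are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: per-link similarity features X (n_links x d) and binary
# interaction indicators Y (n_links x m), e.g. "communicated", "tagged".
n, d, m = 200, 3, 2
X = rng.normal(size=(n, d))
true_w = np.array([1.0, -0.5, 0.3])
z_true = X @ true_w + 0.1 * rng.normal(size=n)
Y = (rng.random((n, m)) < 1 / (1 + np.exp(-z_true[:, None]))).astype(float)

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

# Parameters: w maps similarity to strength, theta maps strength to
# interaction log-odds; z is the per-link latent relationship strength.
w = np.zeros(d)
theta = np.ones(m)
z = np.zeros(n)
lr, var = 0.05, 1.0

for _ in range(300):
    # (1) Update z with w, theta fixed: gradient of the log-posterior,
    # combining the Gaussian prior term and the Bernoulli likelihood.
    grad_z = -(z - X @ w) / var + (Y - sigmoid(z[:, None] * theta)) @ theta
    z += lr * grad_z
    # (2) Update w by least squares (closed form for the Gaussian term).
    w = np.linalg.lstsq(X, z, rcond=None)[0]
    # (3) Update theta by a gradient step on the Bernoulli likelihood.
    grad_t = ((Y - sigmoid(z[:, None] * theta)) * z[:, None]).sum(axis=0)
    theta += lr * grad_t / n

# The recovered strengths should correlate with the true latent values.
print(np.corrcoef(z, z_true)[0, 1])
```

The alternating structure is the key point: each coordinate block (z, then w, then theta) is updated with the others held fixed, which is what makes the joint optimization tractable.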
In recent years, as dialogue systems have proliferated, interest has grown in the use of natural language generation in spoken dialogue. Our research assumes that trainable natural language generation is needed to support more flexible and customized dialogues with human users. This paper focuses on methods for automatically training the sentence planning module of a spoken language generator. Sentence planning is a set of inter-related but distinct tasks, one of which is sentence scoping, i.e., the choice of syntactic structure for elementary speech acts and the decision of how to combine them into one or more sentences. The paper first presents SPOT, a trainable sentence planner, and a new methodology for automatically training SPOT on the basis of feedback provided by human judges. Our methodology is unique in depending neither on hand-crafted rules nor on the existence of a domain-specific corpus. SPOT first randomly generates a candidate set of sentence plans and then selects one. We show that SPOT learns to select a sentence plan whose rating is, on average, only 5% worse than that of the top human-ranked sentence plan. We then experimentally evaluate SPOT by asking human judges to compare its output with that of a hand-crafted template-based generation component, two rule-based sentence planners, and two baseline sentence planners. We show that SPOT performs better than the rule-based systems and the baselines, and as well as the hand-crafted system.
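The generate-then-select step can be illustrated with a toy learned ranker. This sketch trains a pairwise perceptron over simple plan features from human-ranked candidate sets; the feature set, training rule, and data are illustrative assumptions (the actual SPOT system uses its own features and a boosting-based ranker).

```python
# Illustrative sketch: learning to pick the best candidate sentence plan
# from human rankings, via a pairwise perceptron over toy plan features.

def features(plan):
    """Toy features of a sentence plan: sentence count and whether a
    discourse cue word is used (illustrative only)."""
    return [plan["n_sentences"], 1.0 if plan["has_cue_word"] else 0.0]

def score(w, plan):
    return sum(wi * fi for wi, fi in zip(w, features(plan)))

def train(ranked_sets, epochs=20):
    """Pairwise perceptron: for each judged candidate set (sorted
    best-first by human rating), push higher-rated plans above
    lower-rated ones."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for candidates in ranked_sets:
            for better, worse in zip(candidates, candidates[1:]):
                if score(w, better) <= score(w, worse):
                    fb, fw = features(better), features(worse)
                    w = [wi + (b - c) for wi, b, c in zip(w, fb, fw)]
    return w

def select(w, candidates):
    """At generation time, pick the highest-scoring candidate plan."""
    return max(candidates, key=lambda p: score(w, p))

# Toy judged data: judges prefer fewer sentences with a cue word.
data = [[{"n_sentences": 1, "has_cue_word": True},
         {"n_sentences": 2, "has_cue_word": True},
         {"n_sentences": 3, "has_cue_word": False}]]
w = train(data)
best = select(w, data[0])
print(best["n_sentences"])  # → 1
```

The essential idea carries over: the planner never needs hand-crafted selection rules, only relative human judgments over randomly generated candidates.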
This paper reports a controlled study on a large number of filter feature selection methods for text classification. Over 100 variants of five major feature selection criteria were examined using four well-known classification algorithms: a Naive Bayesian (NB) approach, a Rocchio-style classifier, a k-nearest neighbor (kNN) method, and a Support Vector Machine (SVM) system. Two benchmark collections were chosen as the testbeds: Reuters-21578 and a small portion of Reuters Corpus Version 1 (RCV1), making the new results comparable to published results. We found that feature selection methods based on χ² statistics consistently outperformed those based on other criteria (including information gain) for all four classifiers and both data collections, and that a further increase in performance was obtained by combining uncorrelated and high-performing feature selection methods. The results we obtained using only 3% of the available features are among the best reported, including results obtained with the full feature set.
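For a binary term/class contingency table, the χ² criterion has a standard closed form, which can be sketched as follows. The helper names and the toy counts are illustrative, not taken from the paper.

```python
# Illustrative sketch: chi-square scoring of terms for filter feature
# selection, using the closed form for a 2x2 contingency table.

def chi2_score(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.
    n11: docs containing the term, in the class
    n10: docs containing the term, not in the class
    n01: docs without the term, in the class
    n00: docs without the term, not in the class
    """
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return num / den if den else 0.0

def select_top_features(term_counts, k):
    """Rank terms by chi-square score and keep the top k."""
    ranked = sorted(term_counts,
                    key=lambda t: chi2_score(*term_counts[t]),
                    reverse=True)
    return ranked[:k]

# Toy counts: one term strongly associated with the class, one whose
# occurrence is independent of the class (hence score 0).
counts = {
    "dividend": (49, 27652, 141, 774106),
    "the":      (100, 500, 100, 500),
}
print(select_top_features(counts, 1))  # → ['dividend']
```

A term distributed independently of the class yields a zero score, which is why χ²-style criteria discard common function words while retaining class-discriminative terms.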
Techniques for automatically training modules of a natural language generator have recently been proposed, but a fundamental concern is whether the quality of utterances produced with trainable components can compete with hand-crafted template-based or rule-based approaches. In this paper, we experimentally evaluate a trainable sentence planner for a spoken dialogue system by eliciting subjective human judgments. In order to perform an exhaustive comparison, we also evaluate a hand-crafted template-based generation component, two rule-based sentence planners, and two baseline sentence planners. We show that the trainable sentence planner performs better than the rule-based systems and the baselines, and as well as the hand-crafted system.