ABSTRACT. A keyword query is the representation of a user's information need, and results from a complex cognitive process that often leads to under-specification. We propose an unsupervised method, Latent Concept Modeling (LCM), for mining and modeling latent search concepts in order to recreate the conceptual view of the original information need. We use Latent Dirichlet Allocation (LDA) to exhibit highly specific query-related topics from pseudo-relevance feedback documents. We define these topics as the latent concepts of the user query. We perform a thorough evaluation of our approach over two large ad-hoc TREC collections. Our findings reveal that the proposed method accurately models latent concepts, while being very effective in a query expansion retrieval setting.
This paper describes the sentiment analysis systems we built for SemEval-2015 Task 10, Subtasks B and E. For Subtask B, a Logistic Regression classifier was trained on several groups of features, including lexical, syntactic, lexicon-based, Z-score, and semantic features. A weighting schema was applied to the positive and negative labels to account for the unbalanced distribution of tweets between the two classes. This system ranked third among 40 participants, achieving an average F1 of 64.27 on the Twitter 2015 data set, just 0.57% below the first-ranked system. We also present our participation in Subtask E, where our system ranked second on the Kendall metric but first on the Spearman metric for ranking Twitter terms according to their association with positive sentiment.
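The label-weighting idea from this abstract can be sketched with scikit-learn's built-in class weighting. The tweets, labels, and feature extraction below are illustrative placeholders, not the system's actual feature set or training data.

```python
# Hedged sketch: a Logistic Regression classifier whose class weights
# offset an unbalanced positive/negative tweet distribution.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, deliberately unbalanced training set (3 positive vs 1 negative)
tweets = ["great phone, love it", "awesome battery life",
          "fantastic camera quality", "terrible screen, broke fast"]
labels = ["positive", "positive", "positive", "negative"]

# class_weight="balanced" reweights each label inversely to its frequency,
# standing in for the manual weighting schema the abstract mentions
clf = make_pipeline(TfidfVectorizer(),
                    LogisticRegression(class_weight="balanced"))
clf.fit(tweets, labels)
print(clf.predict(["love the camera"]))
```

The real system combines many more feature groups (lexicon scores, Z score, semantic features) than the single TF-IDF representation shown here.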
This paper describes our contribution to the Opinion Target Extraction (OTE) and Sentiment Polarity subtasks of the SemEval-2015 ABSA task. A CRF model with IOB notation was adopted for OTE, with several groups of features including syntactic, lexical, semantic, and sentiment-lexicon features. Our OTE submission ranked fifth among twenty submissions. For Sentiment Polarity, a Logistic Regression model with a weighting schema for positive and negative labels was used; several groups of features (lexical, syntactic, semantic, lexicon, and Z score) were extracted. Our Sentiment Polarity submission ranked third among ten submissions on the restaurant data set, third among thirteen on the laptops data set, and first among eleven on the hotel data set, which is an out-of-domain set.
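The IOB notation mentioned for OTE can be illustrated with a small decoding helper: given per-token IOB labels (as a CRF would emit), it collects the tagged spans into opinion-target phrases. The tokens, tag names, and sentence below are illustrative, not the system's actual CRF output.

```python
# Hedged sketch: decode IOB-labelled tokens into opinion-target spans.
def iob_to_targets(tokens, tags):
    """Collect spans labelled B-TARGET/I-TARGET into target phrases."""
    targets, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-TARGET":                # a new target starts here
            if current:
                targets.append(" ".join(current))
            current = [token]
        elif tag == "I-TARGET" and current:  # continue the open target
            current.append(token)
        else:                                # O closes any open target
            if current:
                targets.append(" ".join(current))
            current = []
    if current:
        targets.append(" ".join(current))
    return targets

tokens = ["the", "fried", "rice", "was", "great"]
tags = ["O", "B-TARGET", "I-TARGET", "O", "O"]
print(iob_to_targets(tokens, tags))  # → ['fried rice']
```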
The variety and diversity of published content are currently expanding in all fields of scholarly communication. Yet scientific knowledge graphs (SKGs) provide only poor images of the varied directions of alternative scientific choices, and in particular of scientific controversies, which are not currently identified and interpreted. We propose to use the rich variety of knowledge present in search histories to represent cliques modeling the main interpretable information-retrieval practices of a “cognitive community”: users identified by their choice of keywords and by their search experience around the same research question. Modeling typical cliques belonging to the same cognitive community is achieved through a new conceptual framework based on user profiles: a bipartite geometric scientific knowledge graph, SKG GRAPHYP. Further interpretation studies will test differences between documentary profiles and their meaning in the various contexts that studies on “disagreements in scientific literature” have outlined. This final adjusted version of GRAPHYP optimizes the modeling of “Manifold Subnetworks of Cliques in Cognitive Communities” (MSCCC), captured from previous user experience in the same search domain. Cliques are built from graph grids of three parameters outlining the manifold of search experiences: the mass of users; the intensity of use of items; and attention, identified in the information-retrieval literature as a ratio of “feature augmentation”, whose mean value allows calculation of an observed “steady” value of the user/item ratio or, conversely, of a documentary behavior “deviating” from this mean. A positive first test illustrates our approach and motivates further work on modeling subnetworks of users in search experience, which could help identify the varied alternative documentary sources of information retrieval, and in particular scientific controversies and scholarly disputes.
In this paper, we present our contribution to SemEval-2016 Task 7: Determining Sentiment Intensity of English and Arabic Phrases, where we use web search engines for unsupervised sentiment intensity prediction in English and Arabic. Our work is based, first, on a group of classic sentiment lexicons (e.g. Sentiment140 Lexicon, SentiWordNet) and, second, on the ability of web search engines to find the co-occurrence of sentences with predefined negative and positive words. The use of web search engines (e.g. the Google Search API) improves the results on phrases built from opposite-polarity terms.
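The co-occurrence idea in this abstract can be sketched as a log-ratio of how often a phrase appears alongside positive versus negative seed words. The `hits()` function below is a hypothetical stand-in for a real web search API call, and the canned counts are made up so the sketch stays runnable; the seed words and scoring form are assumptions, not the paper's exact formula.

```python
# Hedged sketch: sentiment intensity from web co-occurrence counts.
import math

FAKE_HITS = {  # hypothetical hit counts for "phrase AND seed" queries
    ("not bad", "excellent"): 900,
    ("not bad", "terrible"): 400,
}

def hits(phrase, seed):
    """Stand-in for a web search hit count of phrase + seed word."""
    return FAKE_HITS.get((phrase, seed), 1)  # smooth unseen pairs

def intensity(phrase, pos_seed="excellent", neg_seed="terrible"):
    """Log-ratio of co-occurrence with positive vs negative seeds.

    Positive scores lean positive, negative scores lean negative."""
    return math.log2(hits(phrase, pos_seed) / hits(phrase, neg_seed))

print(round(intensity("not bad"), 2))
```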
Sentiment lexicon-based features have proved their worth in recent work on sentiment analysis in Twitter, and features from automatically constructed lexicons appear influential enough to deserve attention. In this paper, we propose a new metric for estimating the word polarity score, called natural entropy (ne), and use it to construct a new sentiment lexicon from the Sentiment140 corpus. We derive six features from the new lexicon and show that the ne metric outperforms the PMI metric that has been used for the same purpose. For evaluation, we build a state-of-the-art system for sentiment analysis in short texts using a supervised classifier trained on several groups of features, including n-grams, sentiment lexicons, negation, Z score, and semantic features. This system was among the best in both SemEval-2015 tasks: Sentiment Analysis in Twitter and Aspect-Based Sentiment Analysis. We investigate the impact of the lexicon-based features extracted from existing manually and automatically constructed lexicons on system performance, as well as the impact of the proposed ne metric.
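The PMI baseline that the ne metric is compared against can be sketched from corpus counts; since the abstract does not give the ne formula, only the standard PMI polarity score is shown here, with add-one smoothing and counts that are illustrative placeholders.

```python
# Hedged sketch: the PMI polarity score used for Sentiment140-style
# lexicons, score(w) = PMI(w, positive) - PMI(w, negative), which
# reduces to a log ratio of class-conditional occurrence rates.
import math

def pmi_polarity(pos_count, neg_count, n_pos, n_neg):
    """PMI-based polarity of a word from its corpus counts.

    pos_count/neg_count: occurrences of the word in positive/negative
    tweets; n_pos/n_neg: total token counts of each class. Add-one
    smoothing avoids log(0) for words unseen in one class."""
    p_w_pos = (pos_count + 1) / (n_pos + 1)
    p_w_neg = (neg_count + 1) / (n_neg + 1)
    return math.log2(p_w_pos / p_w_neg)

# A word seen 90 times in positive tweets and 10 in negative ones
print(round(pmi_polarity(90, 10, 10000, 10000), 2))
```

A positive score marks a positive-leaning word; the paper's contribution is replacing this scoring function with the entropy-based ne metric.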
The INEX QA track aimed to evaluate complex question-answering tasks in which answers are short texts generated from Wikipedia by extracting relevant short passages and aggregating them into a coherent summary. Such a task combines question answering, XML/passage retrieval, and automatic summarization in order to get closer to real information needs. Building on the groundwork carried out in the 2009-2010 edition to determine the subtasks and a novel evaluation methodology, the 2011 edition experimented with contextualizing tweets using a recent cleaned dump of Wikipedia. Participants had to contextualize 132 tweets from the New York Times (NYT). Both the informativeness and the readability of answers were evaluated. 13 teams from 6 countries actively participated in this track. The tweet contextualization task will continue in 2012 as part of the CLEF INEX lab, with the same methodology and baseline but on a much wider range of tweet types.
Abstract. INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2014 evaluation campaign, which consisted of three tracks. The Interactive Social Book Search Track investigated user information-seeking behavior when interacting with various sources of information for realistic task scenarios, and how the user interface impacts search and the search experience. The Social Book Search Track investigated the relative value of authoritative metadata and user-generated content for search and recommendation, using a test collection with data from Amazon and LibraryThing together with user profiles and personal catalogues. The Tweet Contextualization Track investigated helping a user understand a tweet by providing a short background summary generated from relevant Wikipedia passages aggregated into a coherent summary. INEX 2014 was an exciting year in which, for the third time, we ran our workshop as part of the CLEF labs in order to facilitate knowledge transfer between the evaluation forums. This paper gives an overview of all the INEX 2014 tracks, their aims and tasks, the test collections built, and the participants, and presents an initial analysis of the results.