Distributional term representations

Lavelli, Alberto; Sebastiani, Fabrizio; Zanoli, Roberto

doi:10.1145/1031171.1031284

Cited by 36 publications

(34 citation statements)

References 37 publications

“…LIA experimented with TF-IDF combined with Gini purity criteria; they also used a set of tokens extracted from the author and entity metadata on the Twitter website as a feature vector. UAMCLYR investigated the role of Distributional Term Representations [14], which represent terms by means of contextual information given by term co-occurrence statistics. They used an SVM classifier, and their best result was achieved with a bag-of-words representation and Boolean weighting.…”

Section: Discussion (mentioning)

confidence: 99%

Tweet Expansion Method for Filtering Task in Twitter

Karisani

Oroumchian

Rahgozar

2015

Lecture Notes in Computer Science

Abstract. In this article we propose a supervised method for expanding tweet contents to improve the recall of the tweet filtering task in online reputation management systems. Our method does not use any external resources. It consists of creating a K-NN classifier in three steps. In these steps, the tweets labeled related and unrelated in the training set are expanded by extracting and adding the most discriminative terms, calculating and adding the most frequent terms, and re-weighting the original tweet terms from the training set. Our experiments on the RepLab 2013 data set show that our method improves the performance of the filtering task, in terms of the F measure, by up to 13% over state-of-the-art classifiers such as SVM. This data set consists of 61 entities from the automotive, banking, university, and music domains.

Introduction. Twitter is one of the most widely used social networks in the world. According to reports,¹ as of February 2015 Twitter had 288 million users. This large number of users has made it one of the most studied social networks in computer science [1][2][3]. On Twitter, users can post messages of at most 140 characters; their followers can then read and re-tweet these messages. The huge amount of information spread on Twitter and other social networks every day has led to the emergence of Online Reputation Management (ORM) systems. ORM is about monitoring Internet users' opinions regarding organizations, products, or celebrities [4]. The main tasks of ORM systems are retrieving the messages posted by users, analyzing the messages, and visualizing the results [3]. An important step in ORM is detecting the messages that are related to a specific entity; in other words, classifying messages based on their context. This step is known as the filtering task. If it is carried out properly, it reduces noise, and one can expect higher-quality results.

This task is quite challenging due to the ambiguity of entity names and the short length of messages. For instance, if an ORM system wants to analyze users' impressions of the BMW company, it must be able to recognize the tweets that contain this name (or other related names). However, this is not an easy task, because users may also abbreviate other phrases to BMW. For example, the 90s TV series "Boy Meets World" is also abbreviated to BMW in tweets due to the constraints on message length. Therefore, methods more sophisticated than simple keyword matching are required to carry out this step correctly.

The short length of messages is the main challenge in applying regular classification and disambiguation techniques to tweet filtering [3]. In this research, we propose a supervised method to address this problem through tweet expansion. We expand the content of each tweet with more related words in order to increase the accuracy of matching tweets with keywords. Although we onl...

¹ http://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/
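The abstract only outlines the three expansion steps, so the following is a hypothetical sketch of the general idea, not the authors' algorithm. Everything specific here is an assumption made for illustration: the smoothed log-ratio used to score "discriminative" terms, the number of added terms `top_n`, the re-weighting factor `orig_weight`, and the cosine-similarity majority vote used for K-NN. Only the training tweets are expanded; the test tweet is compared as-is.

```python
import math
from collections import Counter

# Hypothetical sketch of tweet expansion + K-NN; not the authors' exact method.

def class_term_counts(tweets, labels):
    """Count term occurrences separately for each class label."""
    counts = {}
    for tokens, label in zip(tweets, labels):
        counts.setdefault(label, Counter()).update(tokens)
    return counts

def discriminative_terms(counts, label, top_n):
    """Assumed score: smoothed log-ratio of in-class vs. out-of-class counts."""
    own = counts[label]
    other = Counter()
    for lab, c in counts.items():
        if lab != label:
            other.update(c)
    score = {t: math.log((own[t] + 1) / (other[t] + 1)) for t in own}
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:top_n]]

def expand(tokens, label, counts, top_n=2, orig_weight=2.0):
    """Turn one labeled training tweet into an expanded, weighted bag of words."""
    # Re-weight the original tweet terms (orig_weight is an assumed factor).
    vec = Counter({t: orig_weight * n for t, n in Counter(tokens).items()})
    # Add the most discriminative terms of the tweet's class.
    for t in discriminative_terms(counts, label, top_n):
        vec[t] += 1.0
    # Add the most frequent terms of the tweet's class.
    for t, _ in counts[label].most_common(top_n):
        vec[t] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(tokens, train_vecs, train_labels, k=3):
    """Majority vote among the k expanded training tweets most similar to the query."""
    query = Counter(tokens)
    ranked = sorted(zip(train_vecs, train_labels), key=lambda p: -cosine(query, p[0]))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

# Toy data echoing the BMW / "Boy Meets World" ambiguity.
train = [["bmw", "car", "engine"], ["bmw", "new", "model", "car"],
         ["boy", "meets", "world", "episode"], ["bmw", "episode", "tv"]]
labels = ["related", "related", "unrelated", "unrelated"]
counts = class_term_counts(train, labels)
vecs = [expand(t, l, counts) for t, l in zip(train, labels)]
print(knn_classify(["bmw", "engine"], vecs, labels, k=1))  # → related
```

Expanding the training side keeps the test tweet untouched, which matches the abstract's statement that the labeled training tweets are the ones being expanded.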

“…Distributional term representations (DTRs) are tools for term representation that rely on term occurrence and co-occurrence statistics (Lavelli et al 2005). The intuition behind DTRs is that the meaning of a term can be deduced from its context, where the context of a term is determined by the other terms it frequently co-occurs with or by the documents in which the term occurs more frequently.…”

Section: Distributional Term Representations (mentioning)

confidence: 99%
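The two kinds of context named in the quote correspond to the two DTRs compared by Lavelli et al.: the document occurrence representation (DOR), in which a term is a vector over documents, and the term co-occurrence representation (TCOR), in which a term is a vector over other terms. The sketch below is a minimal illustration that assumes raw counts with L2 normalization; the weighting schemes studied in the paper are not reproduced here.

```python
import math
from collections import Counter, defaultdict

def l2_normalize(vec):
    """Scale a sparse vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {k: w / norm for k, w in vec.items()} if norm else {}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def dor(docs):
    """DOR: each term is a vector over documents (raw occurrence counts,
    L2-normalized; the weighting is a simplifying assumption)."""
    reps = defaultdict(Counter)
    for d, tokens in enumerate(docs):
        for t in tokens:
            reps[t][d] += 1
    return {t: l2_normalize(v) for t, v in reps.items()}

def tcor(docs):
    """TCOR: each term is a vector over other terms, counting the documents
    in which the two terms co-occur."""
    reps = defaultdict(Counter)
    for tokens in docs:
        vocab = set(tokens)
        for t in vocab:
            for u in vocab - {t}:
                reps[t][u] += 1
    return {t: l2_normalize(v) for t, v in reps.items()}

docs = [["stock", "market", "price"], ["stock", "price", "shares"],
        ["river", "bank", "water"], ["bank", "loan", "price"]]
D, T = dor(docs), tcor(docs)
print(round(cosine(D["stock"], D["price"]), 3))  # → 0.816
print(cosine(T["stock"], T["shares"]), cosine(T["stock"], T["river"]))
```

Terms that never share a document (here `stock` and `river`) get orthogonal DOR and TCOR vectors, while terms with overlapping contexts score higher, which is the intuition the quote describes.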

“…Little work has been reported on DTRs for (unimodal) information retrieval (Carrillo et al 2009; Lavelli et al 2005). In the latter field, DTRs have only been used for processing unimodal information (e.g., text), where the representation of a term is determined by its context in unimodal information.…”

Section: Distributional Term Representations (mentioning)

confidence: 99%

Multimodal indexing based on semantic cohesion for image retrieval

Montes

Sucar

2011

Inf Retrieval

This paper introduces two novel strategies for representing multimodal images, with application to multimedia image retrieval. We consider images that are composed of both text and labels: while text describes the image content at a very high semantic level (e.g., making reference to places, dates or events), labels provide a mid-level description of the image (i.e., in terms of the objects that can be seen in the image). Accordingly, the main assumption of this work is that by combining information from text and labels we can develop very effective retrieval methods. We study standard information fusion techniques for combining both sources of information. However, although the performance of such techniques is highly competitive, they cannot effectively capture the content of images. Therefore, we propose two novel representations for multimodal images that attempt to exploit the semantic cohesion among terms from different modalities. Such representations are based on distributional term representations widely used in computational linguistics. Under the considered representations, the content of an image is modeled by a distribution of co-occurrences over terms or of occurrences over other images, in such a way that the representation can be considered an expansion of the multimodal terms in the image. We report experimental results using the SAIAPR TC12 benchmark on two sets of topics used in ImageCLEF competitions, with manually and automatically generated labels. Experimental results show that the proposed representations significantly outperform both standard multimodal techniques and unimodal methods. Results on manually assigned labels provide an upper bound on the retrieval performance that can be obtained, whereas results with automatically generated labels are encouraging. The novel representations are able to capture the content of multimodal images more effectively.
We emphasize that although we have applied our representations to multimedia image retrieval, the same formulation can be adopted for modeling other multimodal documents (e.g., videos).
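A minimal sketch of the co-occurrence variant described in this abstract, under several assumptions that are not taken from the paper: an image's multimodal terms are its text words plus its labels (labels carry a hypothetical `label:` prefix to keep the modalities apart), co-occurrence is counted per image, the image representation is the normalized sum of the co-occurrence rows of its own terms, and images are ranked against a query by histogram intersection. The occurrence-over-images variant is not shown.

```python
from collections import Counter, defaultdict

def multimodal_terms(text_tokens, labels):
    """Hypothetical encoding: words as-is, labels prefixed to keep modalities apart."""
    return set(text_tokens) | {"label:" + lab for lab in labels}

def cooccurrence(collection):
    """For each pair of multimodal terms, count the images in which both occur."""
    co = defaultdict(Counter)
    for terms in collection:
        for t in terms:
            for u in terms - {t}:
                co[t][u] += 1
    return co

def image_distribution(terms, co):
    """Sum the co-occurrence rows of an image's own terms and normalize to a
    probability distribution over terms (an 'expansion' of those terms)."""
    total = Counter()
    for t in terms:
        total.update(co.get(t, {}))
    z = sum(total.values())
    return {u: c / z for u, c in total.items()} if z else {}

def intersection(p, q):
    """Histogram intersection of two distributions (assumed similarity score)."""
    return sum(min(p[u], q[u]) for u in p.keys() & q.keys())

images = [
    multimodal_terms(["beach", "holiday"], ["sand", "sea", "sky"]),
    multimodal_terms(["surf", "trip"], ["sea", "sky", "person"]),
    multimodal_terms(["city", "night"], ["building", "sky"]),
]
co = cooccurrence(images)
q = image_distribution(multimodal_terms(["beach"], []), co)
scores = [intersection(q, image_distribution(img, co)) for img in images]
ranking = sorted(range(len(images)), key=lambda i: -scores[i])
print(ranking)  # → [0, 1, 2]
```

In this toy collection, a query containing only the word `beach` ranks the surfing image above the city image even though neither shares any text with the query, because its labels `sea` and `sky` co-occur with `beach` elsewhere in the collection.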

“…The context vectors used in BoC are generated using RI and 'Document Occurrence Representation' (DOR). DOR is based on the work of Lavelli et al [13] and considers the meaning of a term as the bag of documents in which it occurs. When RI is used together with DOR, the term t is represented as a context vector:…”

Section: Random Indexing (mentioning)

confidence: 99%
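The quoted excerpt stops before its formula, so the following is only an illustrative sketch of the usual Random Indexing (RI) recipe applied to DOR contexts, not the cited paper's exact definition. Each document receives a sparse random index vector (a few +1/-1 entries in a high-dimensional space); a term's context vector is the sum of the index vectors of the documents it occurs in; and a Bag of Concepts (BoC) document vector is the sum of its terms' context vectors. The dimension (512), the number of non-zero entries (8), the seed, and the occurrence-count weighting are hypothetical choices.

```python
import math
import random
from collections import Counter

def index_vectors(n_docs, dim=512, nonzeros=8, seed=0):
    """One sparse ternary random index vector per document: a few +1/-1 entries."""
    rng = random.Random(seed)
    vecs = []
    for _ in range(n_docs):
        positions = rng.sample(range(dim), nonzeros)
        vecs.append({p: rng.choice((1, -1)) for p in positions})
    return vecs

def context_vectors(docs, dim=512, nonzeros=8, seed=0):
    """RI + DOR: a term accumulates the index vectors of the documents it occurs
    in, weighted by its occurrence count in each document (an assumption)."""
    idx = index_vectors(len(docs), dim, nonzeros, seed)
    ctx = {}
    for d, tokens in enumerate(docs):
        for t, n in Counter(tokens).items():
            v = ctx.setdefault(t, [0.0] * dim)
            for p, s in idx[d].items():
                v[p] += n * s
    return ctx

def bag_of_concepts(tokens, ctx, dim=512):
    """BoC document vector: the sum of the context vectors of its terms."""
    out = [0.0] * dim
    for t in tokens:
        for i, x in enumerate(ctx.get(t, ())):
            out[i] += x
    return out

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

docs = [["madrid", "spain", "capital"], ["barcelona", "spain", "coast"],
        ["tokyo", "japan", "capital"]]
ctx = context_vectors(docs)
boc = [bag_of_concepts(d, ctx) for d in docs]
print(cosine(ctx["madrid"], ctx["spain"]), cosine(ctx["madrid"], ctx["tokyo"]))
print(cosine(boc[0], boc[1]), cosine(boc[1], boc[2]))
```

Because random sparse index vectors are nearly orthogonal, the fixed-size context vectors approximate full DOR vectors without materializing one dimension per document.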

Concept Based Representations for Ranking in Geographic Information Retrieval

Carrillo

Villatoro-Tello

Eliasmith

et al. 2010

Advances in Natural Language Processing

Abstract. Geographic Information Retrieval (GIR) is a specialized Information Retrieval (IR) branch that deals with information related to geographical locations. Traditional IR engines are perfectly able to retrieve the majority of the relevant documents for most geographical queries, but they have severe difficulties generating a pertinent ranking of the retrieved results, which leads to poor performance. A key reason for this ranking problem has been a lack of information. Therefore, previous GIR research has tried to fill this gap using robust geographical resources (e.g., a geographical ontology), while other research with the same aim has used relevance feedback techniques instead. This paper explores the use of Bag of Concepts (BoC; a representation where documents are considered as the union of the meanings of their terms) and Holographic Reduced Representation (HRR; a novel representation for textual structure) as re-ranking mechanisms for GIR. Our results reveal an improvement in mean average precision (MAP) when compared to the traditional vector space model, even when Pseudo Relevance Feedback is employed.
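The abstract names Holographic Reduced Representations without describing them. In Plate's HRR, structure is encoded by binding pairs of random vectors with circular convolution and superposing the results by addition; a bound component is approximately recovered by circular correlation with the other member of the pair. The sketch below illustrates only that mechanism in pure Python. How the cited paper maps geographic text onto bindings is not stated here, so the roles `role_near` and `role_in`, the fillers, and the dimension are hypothetical.

```python
import math
import random

def random_vector(dim, rng):
    """HRR vector: i.i.d. Gaussian entries with mean 0 and variance 1/dim."""
    sd = 1.0 / math.sqrt(dim)
    return [rng.gauss(0.0, sd) for _ in range(dim)]

def bind(a, b):
    """Circular convolution: c[j] = sum_k a[k] * b[(j - k) mod n]."""
    n = len(a)
    return [sum(a[k] * b[(j - k) % n] for k in range(n)) for j in range(n)]

def involution(a):
    """Approximate inverse used for decoding: a*[i] = a[-i mod n]."""
    n = len(a)
    return [a[-i % n] for i in range(n)]

def unbind(c, a):
    """Circular correlation of a with c, i.e. bind(involution(a), c)."""
    return bind(involution(a), c)

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

rng = random.Random(7)
dim = 512
role_near, role_in, madrid, spain, tokyo = (random_vector(dim, rng) for _ in range(5))
# Hypothetical structure "near Madrid, in Spain": superposed role-filler bindings.
trace = [x + y for x, y in zip(bind(role_near, madrid), bind(role_in, spain))]
candidates = {"madrid": madrid, "spain": spain, "tokyo": tokyo}
decoded = unbind(trace, role_near)
best = max(candidates, key=lambda name: cosine(decoded, candidates[name]))
print(best)
```

Decoding is noisy, so the recovered vector is matched against a set of known candidates; the direct O(n²) convolution keeps the sketch dependency-free, whereas practical implementations usually compute it with an FFT.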
