2013
DOI: 10.1007/978-3-642-40140-4_12

How to Extract Relevant Knowledge from Tweets?

Abstract: Tweets exchanged over the Internet are an important source of information, even if their characteristics make them difficult to analyze (e.g., a maximum of 140 characters; noisy data). In this paper, we investigate two different problems. The first one is related to the extraction of representative terms from a set of tweets. More precisely, we address the following question: are traditional information retrieval measures appropriate when dealing with tweets? The second problem is related to the evolu…

Cited by 7 publications (6 citation statements)
References 17 publications (12 reference statements)
“…It should be noted that social media texts pose new challenges that are not present in the processing of medical literature. These new problems are the management of metadata included in the text [18], the detection of misspellings, word shortenings [19, 20], slang and emoticons and to cope with ungrammatical phrases, among others. Moreover, while many terms present in clinical records and medical literature could be linked to domain resources, lay terms are not usually recorded in any structured resource.…”
Section: Discussion (mentioning)
confidence: 99%
“…Another important advantage is that such texts can be easily linked to biomedical ontologies by matching detected entities to concepts in these semantic resources. Meanwhile social media texts are markedly different from medical literature, and thereby the processing of social media texts poses additional challenges such as the management of metadata associated to the text (such as tags in tweets) [18], the detection of typos and unconventional spelling, word shortenings [19, 20], slang and emoticons [21] and lack of punctuation marks, among others. Moreover, these texts are often very short and with an informal nature, making the processing task extremely challenging.…”
Section: Related Work (mentioning)
confidence: 99%
“…Social media texts pose additional challenges to those associated with the processing of clinical records and medical literature. These new challenges include the management of metainformation included in the text (for example as tags in tweets) (Bouillot et al, 2013), the detection of typos and unconventional spelling, word shortenings (Neunerdt et al, 2013; Moreira et al, 2013) and slang and emoticons (Balahur, 2013), among others.…”
Section: Discussion (mentioning)
confidence: 99%
“…TF-IDF measure considers both term frequency and inverse document frequency to calculate the importance of features [25]. More recently, a generalized TF-IDF measure [26] is proposed by considering different level of hierarchies among words to effectively analyze tweet user behaviors.…”
Section: Feature Selection (mentioning)
confidence: 99%
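
As a rough illustration of the classic TF-IDF weighting mentioned in the statement above, the sketch below scores terms in a handful of tokenized tweets. It uses the textbook form (relative term frequency times the log of inverse document frequency) and is not the generalized measure the statement refers to, nor the method of the cited paper. The function name `tf_idf` and the sample tweets are invented for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document (textbook TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy tweets, whitespace-tokenized for simplicity.
tweets = [
    "flu season is here".split(),
    "flu shot today".split(),
    "traffic is bad today".split(),
]
weights = tf_idf(tweets)
```

With this weighting, a term that occurs in only one tweet (such as "season") scores higher than one shared across tweets (such as "flu"), and a term present in every tweet scores zero. Real tweets would need more careful tokenization for hashtags, mentions, and abbreviations.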