HLP@UPenn at SemEval-2017 Task 4A: A simple, self-optimizing text classification system combining dense and sparse vectors

Sarker, Abeed; González, Graciela

doi:10.18653/v1/s17-2105

Cited by 11 publications

(6 citation statements)

References 14 publications


“…For all three tasks, extracted concepts could be matched exactly to the forum posts, thus negating the potential benefit of normalization. The exact matching can perhaps be explained by the fact that data collection and extraction from noisy text sources such as social media typically rely on keyword-based searching [54].…”

Section: Discussion (mentioning)

confidence: 99%

Data-Driven Lexical Normalization for Medical Social Media

Dirkson, Verberne, Sarker et al. 2019

MTI


In the medical domain, user-generated social media text is increasingly used as a valuable complementary knowledge source to scientific medical literature. The extraction of this knowledge is complicated by colloquial language use and misspellings. However, lexical normalization of such data has not been addressed effectively. This paper presents a data-driven lexical normalization pipeline with a novel spelling correction module for medical social media. Our method significantly outperforms state-of-the-art spelling correction methods and can detect mistakes with an F1 of 0.63 despite extreme imbalance in the data. We also present the first corpus for spelling mistake detection and correction in a medical patient forum.



“…unigrams, bigrams or trigram, respectively. Sarker et al [15] and Pal and Gosh [11] used n-gram features for developing sentiment analysis methods and evaluated their methods against the same datasets that we use in this work. Here, we explore the fol-Table 2.…”

Section: Methods (mentioning)

confidence: 99%

Evaluating the Accuracy and Efficiency of Sentiment Analysis Pipelines with UIMA

Altrabsheh, Kontonatsios, Korkontzelos 2019

Natural Language Processing and Information Systems


Sentiment analysis methods co-ordinate text mining components, such as sentence splitters, tokenisers and classifiers, into pipelined applications to automatically analyse the emotions or sentiment expressed in textual content. However, the performance of sentiment analysis pipelines is known to be substantially affected by the constituent components. In this paper, we leverage the Unstructured Information Management Architecture (UIMA) to seamlessly co-ordinate components into sentiment analysis pipelines. We then evaluate a wide range of different combinations of text mining components to identify optimal settings. More specifically, we evaluate different pre-processing components, e.g. tokenisers and stemmers, feature weighting schemes, e.g. TF and TFIDF, feature types, e.g. bigrams, trigrams and bigrams+trigrams, and classification algorithms, e.g. Support Vector Machines, Random Forest and Naive Bayes, against 6 publicly available datasets. The results demonstrate that optimal configurations are consistent across the 6 datasets while our UIMA-based pipeline yields a robust performance when compared to baseline methods.


Section: F1 (mentioning)

confidence: 99%

Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

2019


The organizing committee would like to thank the program committee, consisting of 13 researchers, for their thoughtful input on the submissions, as well as the organizers of the ACL for their support and management. Finally, a huge thanks to all authors who submitted a paper to the workshop or participated in the shared tasks; this workshop would not have been possible without them and their hard work.

