Sabino Miranda-Jiménez scite author profile

Sabino Miranda-Jiménez

4 Publications

132 Citation Statements Received

54 Citation Statements Given


Affiliations

Consejo Nacional de Humanidades, Ciencias y Tecnologías, CTIC Foundation, Consejo Nacional de Ciencia y Tecnología

Publications


Empirical Study of Machine Learning Based Approach for Opinion Mining in Tweets

Sidorov, Miranda-Jiménez, Viveros-Jiménez, et al. 2013


Abstract. Opinion mining deals with determining the sentiment orientation (positive, negative, or neutral) of a (short) text. Recently, it has attracted great interest both in academia and in industry due to its potential applications. One of the most promising applications is the analysis of opinions in social networks. In this paper, we examine how classifiers work while doing opinion mining over Spanish Twitter data. We explore how different settings (n-gram size, corpus size, number of sentiment classes, balanced vs. unbalanced corpus, various domains) affect the precision of the machine learning algorithms. We experimented with Naïve Bayes, Decision Tree, and Support Vector Machines. We also describe language-specific preprocessing of tweets, in our case for the Spanish language. The paper presents the best parameter settings for practical applications of opinion mining in Spanish Twitter. We also present a novel resource for analysis of emotions in texts: the Spanish Emotion Lexicon, a dictionary that contains 2,036 words marked with the probability of expressing one of the six basic emotions (Probability Factor of Affective use, PFA).


A case study of Spanish text transformations for twitter sentiment analysis

Téllez, Miranda-Jiménez, Graff, et al. 2017

Expert Systems with Applications


Sentiment analysis is a text mining task that determines the polarity of a given text, i.e., its positiveness or negativeness. Recently, it has received a lot of attention given the interest in opinion mining in micro-blogging platforms. These new forms of textual expression present new challenges for text analysis given the use of slang and orthographic and grammatical errors, among others. Along with these challenges, a practical sentiment classifier should be able to handle large workloads efficiently. The aim of this research is to identify which text transformations (lemmatization, stemming, entity removal, among others), tokenizers (e.g., word n-grams), and token weighting schemes have the greatest impact on the accuracy of a classifier (Support Vector Machine) trained on two Spanish corpora. The methodology is to exhaustively analyze all the combinations of the text transformations and their respective parameters to find out which characteristics the best performing classifiers have in common. Furthermore, among the different text transformations studied, we introduce a novel approach based on the combination of word-based n-grams and character-based q-grams. The results show that this novel combination of words and characters produces a classifier that outperforms the traditional word-based combination by 11.17% and 5.62% on the INEGI and TASS'15 datasets, respectively.
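To make the idea of combining word-based n-grams with character-based q-grams concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`word_ngrams`, `char_qgrams`, `combined_tokens`), the `~` separator, the lowercasing step, and the default values of `n` and `q` are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def word_ngrams(text: str, n: int) -> List[str]:
    """Return word-based n-grams; words inside one n-gram are joined with '~'."""
    words = text.split()
    return ["~".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def char_qgrams(text: str, q: int) -> List[str]:
    """Return character-based q-grams over the whole string, spaces included."""
    return [text[i:i + q] for i in range(len(text) - q + 1)]


def combined_tokens(
    text: str,
    n_values: Sequence[int] = (1, 2),   # assumed word n-gram sizes
    q_values: Sequence[int] = (3,),     # assumed character q-gram sizes
) -> List[str]:
    """Concatenate word n-grams and character q-grams into one token list.

    Note: a word unigram and a q-gram can be the same string (e.g. "muy");
    this sketch does not tag them apart.
    """
    text = text.lower()
    tokens: List[str] = []
    for n in n_values:
        tokens.extend(word_ngrams(text, n))
    for q in q_values:
        tokens.extend(char_qgrams(text, q))
    return tokens


if __name__ == "__main__":
    print(combined_tokens("Muy bueno"))
```

The resulting token list could then be fed to any weighting scheme and classifier. Character q-grams span word boundaries and tolerate misspellings, which is one plausible reason they help on informal text.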


An automated text categorization framework based on hyperparameter optimization

Téllez, Moctezuma, Miranda-Jiménez, et al. 2018

Knowledge-Based Systems


A great variety of text tasks such as topic or spam identification, user profiling, and sentiment analysis can be posed as a supervised learning problem and tackled using a text classifier. A text classifier consists of several subprocesses. Some of them are general enough to be applied to any supervised learning problem, whereas others are specifically designed to tackle a particular task, using complex and computationally expensive processes such as lemmatization, syntactic analysis, etc. Contrary to traditional approaches, we propose a minimalistic and wide system able to tackle text classification tasks independent of domain and language, namely µTC. It is composed of some easy-to-implement text transformations, text representations, and a supervised learning algorithm. These pieces produce a competitive classifier even in the domain of informally written text. We provide a detailed description of µTC along with an extensive experimental comparison with relevant state-of-the-art methods. µTC was compared on 30 different datasets. Regarding accuracy, µTC obtained the best performance on 20 datasets while achieving competitive results on the remaining 10. The compared datasets include several problems such as topic and polarity classification, spam detection, user profiling, and authorship attribution. Furthermore, it is important to state that our approach allows the usage of the technology even without knowledge of machine learning and natural language processing.
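As a rough illustration of what "easy-to-implement text transformations" and a searchable configuration space might look like, here is a small Python sketch. It does not reproduce µTC's actual API or its optimizer: the three flags (`lowercase`, `strip_diacritics`, `collapse_repeats`) and the helper names are hypothetical, and the exhaustive enumeration stands in for whatever search strategy a real system would use.

```python
import itertools
import re
import unicodedata
from typing import Dict, Iterator


def normalize(text: str, lowercase: bool, strip_diacritics: bool,
              collapse_repeats: bool) -> str:
    """Apply simple, language-independent transformations controlled by flags."""
    if lowercase:
        text = text.lower()
    if strip_diacritics:
        # Decompose characters, then drop combining marks (e.g. "í" -> "i").
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed
                       if unicodedata.category(ch) != "Mn")
    if collapse_repeats:
        # Collapse runs of three or more identical characters to one.
        text = re.sub(r"(.)\1{2,}", r"\1", text)
    return text


def configurations() -> Iterator[Dict[str, bool]]:
    """Enumerate every on/off combination of the (hypothetical) flags."""
    flags = ("lowercase", "strip_diacritics", "collapse_repeats")
    for values in itertools.product((False, True), repeat=len(flags)):
        yield dict(zip(flags, values))


if __name__ == "__main__":
    for config in configurations():
        print(config, "->", normalize("Holaaaa Día", **config))
```

In a full pipeline, each configuration would be scored by training and validating a classifier on the transformed text, and the best-scoring configuration would be kept.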


EvoDAG: A semantic Genetic Programming Python library

Graff, Téllez, Miranda-Jiménez, 2016

