RTextTools: A Supervised Learning Package for Text Classification

Collingwood, Loren; Jurka, Timothy P.; Boydstun, Amber E.; Grossman, Emiliano; Atteveldt, Wouter van

doi:10.32614/rj-2013-001

Cited by 87 publications

(64 citation statements)

References 10 publications

(11 reference statements)


“…We included all of the available algorithms within RTextTools apart from the neural network, which did not converge in pilot assessments. The algorithms were support vector machine (SVM) using the radial basis function kernel with the penalty parameter of error term set to 1 and a gamma parameter set to 1/number of features [16], scaled linear discriminant analysis (SLDA) with eigenvalue threshold set to ≥1, bootstrapped boosting (bagging) with 25 bootstrap replications [17], boosting [18], random classification and regression forests with 500 trees [19], classification and regression tree [20], maximum entropy without regularization [21], and generalized linear models with L1 (lasso) penalized regularization (GLM/LASSO) [22]. …”

Section: Methods (mentioning)

confidence: 99%
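The radial basis function kernel named in the statement above can be illustrated with a minimal sketch. It assumes only what the quote states (gamma set to 1/number of features); the function name and the toy vectors are hypothetical, and this is not the SVM implementation that RTextTools or the citing study used.

```python
import math


def rbf_kernel(x, y, gamma=None):
    """Return exp(-gamma * ||x - y||^2) for two equal-length feature vectors.

    When gamma is not given it defaults to 1 / number of features,
    mirroring the setting described in the quoted statement.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if gamma is None:
        gamma = 1.0 / len(x)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical term-count vectors for two short documents (4 features).
doc_a = [1.0, 0.0, 2.0, 0.0]
doc_b = [0.0, 1.0, 2.0, 1.0]

print(rbf_kernel(doc_a, doc_a))  # identical documents give similarity 1.0
print(rbf_kernel(doc_a, doc_b))  # squared distance 3, gamma 0.25
```

With more features the default gamma shrinks, so the kernel becomes less sensitive to any single term difference.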

“…When assessed as an ensemble of multiple algorithms working together, recall is evaluated alongside coverage (the proportion of cases within the dataset to which the recall value applies) [21]. The F value is analogous to interrater reliability and, as such, we will accept agreements ≥.80 between the algorithms and the human codes as evidence that the algorithms can complete the categorization task with acceptable accuracy.…”

Section: Methods (mentioning)

confidence: 99%
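The pairing of coverage and recall for an ensemble, as described in the statement above, can be sketched as follows. This is an illustration under stated assumptions: a document counts as "covered" when at least `min_agree` algorithms predict the same label, and the reported value is the share of covered documents whose consensus label matches the human code. The function name, labels, and toy data are hypothetical and are not taken from the cited study.

```python
from collections import Counter


def ensemble_summary(predictions, human_codes, min_agree):
    """Return (coverage, agreement) for one agreement threshold.

    predictions: one list of predicted labels per document, one label per algorithm.
    human_codes: the human-assigned label for each document.
    min_agree:   minimum number of algorithms that must predict the same label.
    """
    if not human_codes or len(predictions) != len(human_codes):
        raise ValueError("predictions and human_codes must be non-empty and aligned")
    covered = 0
    correct = 0
    for doc_predictions, truth in zip(predictions, human_codes):
        # Most frequent label across algorithms and how many voted for it.
        label, votes = Counter(doc_predictions).most_common(1)[0]
        if votes >= min_agree:
            covered += 1
            if label == truth:
                correct += 1
    coverage = covered / len(human_codes)
    agreement = correct / covered if covered else float("nan")
    return coverage, agreement


# Hypothetical output of three algorithms on four comments.
predictions = [
    ["respected", "respected", "respected"],
    ["popular", "popular", "innovator"],
    ["professional", "interpersonal", "popular"],
    ["innovator", "innovator", "innovator"],
]
human_codes = ["respected", "innovator", "professional", "innovator"]

print(ensemble_summary(predictions, human_codes, min_agree=3))
print(ensemble_summary(predictions, human_codes, min_agree=2))
```

Raising the threshold typically trades coverage for agreement: fewer documents qualify, but the ones that do are more likely to match the human code.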


Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy

et al. 2017


Background: Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor’s activity for the purposes of quality assurance, safety, and continuing professional development.

Objective: The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors’ professional performance in the United Kingdom.

Methods: We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians’ colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests.

Results: Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to “popular” (recall=.97), “innovator” (recall=.98), and “respected” (recall=.87) codes and was lower for the “interpersonal” (recall=.80) and “professional” (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as “respected,” “professional,” and “interpersonal” related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Scores did not vary between doctors who were rated as popular or innovative and those who were not rated at all (P>.05).

Conclusions: Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high performance. Colleague open-text comments that signal respect, professionalism, and being interpersonal may be key indicators of doctor’s performance.

“…Among CAP projects using computerized methods, the modal tool applied was the RTextTools package (Jurka, Collingwood, Boydstun, Grossman, & van Atteveldt, 2013). As Jurka et al (2013) note, RTextTools is recommended for data sets no larger than around 30,000 documents. The memory demands of the methods it utilizes can lead to long processing times even on newer desktop computers.…”

Section: Practical Practices For Text Coding (mentioning)

confidence: 99%

“… These include many advanced methods we do not discuss: for example, SVM, multinomial logistic regression (maximum entropy), or neural networks (Jurka et al, ).…”

mentioning

confidence: 99%

Collaborating with the Machines: A Hybrid Method for Classifying Policy Documents

Loftis, Mortensen 2018

Policy Studies Journal


Governments produce vast and growing quantities of freely available text: laws, rules, budgets, press releases, and so forth. This information flood is facilitating important, growing research programs in policy and public administration. However, tightening research budgets and the information's vast scale forces political science and public policy to aspire to do more with less. Meeting this challenge means applied researchers must innovate. This article makes two contributions for practical text coding—the process of sorting government text into researcher‐defined coding schemes. First, we propose a method of combining human coding with automated computer classification for large data sets. Second, we present a well‐known algorithm for automated text classification, the Naïve Bayes classifier, and provide software for working with it. We argue and provide evidence that this method can help applied researchers using human coders to get more from their research budgets, and we demonstrate the method using classical examples from the study of policy agendas.
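The Naïve Bayes classifier mentioned in this abstract can be illustrated with a minimal sketch. It is a generic textbook multinomial variant with add-one (Laplace) smoothing, not the software the article provides; the function names, the whitespace tokenizer, and the toy policy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Count documents per class and words per class from labeled text."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log posterior for the given text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            # Add-one smoothing keeps unseen class/word pairs from zeroing the score.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-coded training documents for two policy topics.
documents = [
    "budget tax spending deficit",
    "tax cuts budget",
    "hospital health insurance",
    "health care hospital funding",
]
labels = ["economy", "economy", "health", "health"]
model = train_naive_bayes(documents, labels)

print(classify("tax budget proposal", model))
print(classify("hospital insurance", model))
```

In a hybrid workflow of the kind the abstract describes, the human-coded documents would play the role of `documents` and `labels`, and the classifier would be applied to the uncoded remainder.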


“…The comparative study is conducted in the R data mining language based on the models trained using the RTextTools [11] package and its dependencies like e1071 [5]. Each of the three datasets is considered individually while evaluating the performance of the classification algorithms.…”

Section: Methods (mentioning)

confidence: 99%

Comparative Evaluation of Supervised Learning Algorithms for Sentiment Analysis of Movie Reviews

Palkar, Gala, Shah et al. 2016

IJCA


Online forums and social networking websites provide users with a platform for expressing their opinions. Manually evaluating these reviews for crucial analytical information is cumbersome. Sentiment analysis deals with analyzing such massively available textual data and determining its polarity. This research paper provides a comparative study of multiple well-known supervised machine learning algorithms on three standard datasets confined to the domain of movie reviews. The study is supported by illustrative plots and experimental results. The research work can be used as a base for further exploration in predicting the sentiment value of textual data in alternate domains using advanced machine learning algorithms.
