2013
DOI: 10.32614/rj-2013-001

RTextTools: A Supervised Learning Package for Text Classification

Abstract: Social scientists have long hand-labeled texts to create datasets useful for studying topics from congressional policymaking to media reporting. Many social scientists have begun to incorporate machine learning into their toolkits. RTextTools was designed to make machine learning accessible by providing a start-to-finish product in less than 10 steps. After installing RTextTools, the initial step is to generate a document term matrix. Second, a container object is created, which holds all the objects needed fo…
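The first two steps named in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than R and does not use or reproduce the RTextTools API: the function `build_document_term_matrix`, the `Container` class, its fields, and the toy documents are all hypothetical. Because the abstract is truncated, what the container holds here (the matrix, the hand-assigned labels, and a train/test split) is an assumption.

```python
from collections import Counter
from dataclasses import dataclass


def build_document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    # Naive tokenization (lowercase, split on whitespace) keeps the sketch short.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    column = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for term, count in Counter(tokens).items():
            row[column[term]] = count
        matrix.append(row)
    return vocabulary, matrix


@dataclass
class Container:
    """Hypothetical container: bundles what later training steps would need."""
    matrix: list
    labels: list
    train_rows: range
    test_rows: range


# Toy hand-labeled documents (invented for illustration).
documents = ["budget vote passes senate", "senate budget debate", "storm hits coast"]
labels = ["policy", "policy", "weather"]

vocabulary, matrix = build_document_term_matrix(documents)
container = Container(matrix, labels, train_rows=range(0, 2), test_rows=range(2, 3))
```

Each row of `matrix` is one document and each column one vocabulary term, so a classifier can treat the rows as feature vectors.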

Cited by 87 publications (64 citation statements)
References 10 publications (11 reference statements)
“…We included all of the available algorithms within RTextTools apart from the neural network, which did not converge in pilot assessments. The algorithms were support vector machine (SVM) using the radial basis function kernel with the penalty parameter of error term set to 1 and a gamma parameter set to 1/number of features [16], scaled linear discriminant analysis (SLDA) with eigenvalue threshold set to ≥1, bootstrapped boosting (bagging) with 25 bootstrap replications [17], boosting [18], random classification and regression forests with 500 trees [19], classification and regression tree [20], maximum entropy without regularization [21], and generalized linear models with L1 (lasso) penalized regularization (GLM/LASSO) [22]. …”
Section: Methods (mentioning)
confidence: 99%
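The SVM settings quoted above (radial basis function kernel, gamma set to 1/number of features) can be made concrete with a short sketch of the kernel itself. This is a plain-Python illustration under stated assumptions, not the code used in the cited study or in RTextTools; the function name `rbf_kernel` and the example vectors are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=None):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    if gamma is None:
        # Setting described in the quoted statement: 1 / number of features.
        gamma = 1.0 / len(x)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Two toy term-count vectors with four features each, so gamma defaults to 0.25.
same = rbf_kernel([1, 0, 0, 0], [1, 0, 0, 0])
different = rbf_kernel([1, 0, 0, 0], [0, 1, 0, 0])
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart. Tying gamma to the feature count slows that decay as the vocabulary grows.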
“…When assessed as an ensemble of multiple algorithms working together, recall is evaluated alongside coverage (the proportion of cases within the dataset to which the recall value applies) [21]. The F value is analogous to interrater reliability and, as such, we will accept agreements ≥.80 between the algorithms and the human codes as evidence that the algorithms can complete the categorization task with acceptable accuracy.…”
Section: Methods (mentioning)
confidence: 99%
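The evaluation described in the statement above can be sketched as follows. The code computes an F value for one label and, for an ensemble, the coverage and the share of covered cases that match the human code. It is a plain-Python illustration: the names `f_score` and `ensemble_coverage`, the rule that a case is "covered" when at least `min_agree` algorithms share a label, and the toy predictions are all assumptions, not the cited study's procedure.

```python
from collections import Counter


def f_score(human, predicted, label):
    """Harmonic mean of precision and recall for one label."""
    pairs = list(zip(human, predicted))
    tp = sum(h == label and p == label for h, p in pairs)
    fp = sum(h != label and p == label for h, p in pairs)
    fn = sum(h == label and p != label for h, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def ensemble_coverage(predictions_by_algorithm, human, min_agree):
    """Return (coverage, match_rate) for cases where >= min_agree algorithms agree."""
    covered = correct = 0
    for i, truth in enumerate(human):
        votes = Counter(preds[i] for preds in predictions_by_algorithm)
        top_label, top_count = votes.most_common(1)[0]
        if top_count >= min_agree:
            covered += 1
            correct += top_label == truth
    coverage = covered / len(human)
    match_rate = correct / covered if covered else 0.0
    return coverage, match_rate


# Toy human codes and three algorithms' predictions (invented for illustration).
human = ["a", "a", "b", "b"]
predictions = [
    ["a", "a", "b", "a"],
    ["a", "b", "b", "a"],
    ["a", "a", "b", "b"],
]

f_first = f_score(human, predictions[0], "a")
strict = ensemble_coverage(predictions, human, min_agree=3)
loose = ensemble_coverage(predictions, human, min_agree=2)
```

In this toy data, requiring all three algorithms to agree covers half the cases, and every covered case matches the human code. Accepting two-way agreement covers every case but lets one error in. That trade-off is why the statement reports the two quantities together.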
“…Among CAP projects using computerized methods, the modal tool applied was the RTextTools package (Jurka, Collingwood, Boydstun, Grossman, & van Atteveldt, 2013). As Jurka et al (2013) note, RTextTools is recommended for data sets no larger than around 30,000 documents. The memory demands of the methods it utilizes can lead to long processing times even on newer desktop computers.…”
Section: Practical Practices For Text Coding (mentioning)
confidence: 99%
“… These include many advanced methods we do not discuss: for example, SVM, multinomial logistic regression (maximum entropy), or neural networks (Jurka et al, ).…”
mentioning
confidence: 99%
“…The comparative study is conducted in the R data mining language based on the models trained using the RTextTools [11] package and its dependencies like e1071 [5]. Each of the three datasets is considered individually while evaluating the performance of the classification algorithms.…”
Section: Methods (mentioning)
confidence: 99%