2022
DOI: 10.1371/journal.pone.0270034

Automating document classification with distant supervision to increase the efficiency of systematic reviews: A case study on identifying studies with HIV impacts on female sex workers

Abstract: There remains a limited understanding of the HIV prevention and treatment needs among female sex workers in many parts of the world. Systematic reviews of existing literature can help fill this gap; however, well-done systematic reviews are time-demanding and labor-intensive. Here, we propose an automatic document classification approach to a systematic review to significantly reduce the effort in reviewing documents and optimize empiric decision making. We first describe a manual document classification pro…

citations
Cited by 2 publications
(1 citation statement)
references
References 39 publications
“…As an input to machine learning models, most often bag-of-words (BOW) text representations were applied (N = 30/89, 33.7%) [32, 41, 52, 54–56, 59, 61, 68, 72, 82, 84, 85, 87, 89, 92, 93, 95, 96, 100, 106, 108, 110, 112, 114, 115, 119–122], followed by term-frequency/inverse document frequency (TF-IDF) (N = 16/89, 18.0%) [45, 53, 57, 60, 63, 66, 68, 73, 76, 83, 91, 109, 115, 116, 122, 123], topic models (N = 10/89, 11.2%) [45, 60, 84, 86, 91, 93, 104, 107, 109, 115, 123], keywords (N = 9, 10.1%) [52, 75, 76, 91, 98, 100, 117, 123, 127], standardized terms such as Medical Subject ...…”
Section: Results
Citation type: mentioning
confidence: 99%