“…A variety of types of text data were represented in the selected articles including EMRs (i.e., clinical notes, progress notes, patient safety records [17,30-36]), lexical documents (i.e., language treebanks which are bodies of text that have been parsed semantically and syntactically, WordNet database [37-43]), organizational documents (i.e., maintenance logs/data, accident reports, requirements documentation [44-47]), abstracts and scientific articles (i.e., PubMed and various engineering journals [29,48-50]), various bodies of text (corpora) (i.e., non-language corpora, non-medical/medical/biomedical corpora, language corpus [50-53]), social media data (i.e., Twitter, meme tracker from various social media websites [54-56]), product reviews (i.e., general product, Chinese tourism, Amazon product [13,57,58]), and news articles (i.e., magazines, newswires, consumer reports [54,59,60]). Almost all empirical articles (85.4%) described preprocessing methods to improve NLP algorithm performance.…”