Medical documents classification using topic modeling

Nuser, Maryam; Al-Horani, Enas

doi:10.11591/ijeecs.v17.i3.pp1524-1530

Cited by 4 publications (2 citation statements)

References 20 publications (15 reference statements)


“…Ahn et al. [11] implemented topic modelling using LDA to find similar and different topics among organisations in tweets about the Ridgecrest earthquake, in order to predict public engagement and encourage more effective communication during natural disasters. In another study, which developed topic modeling for medical documents by constructing a document-term matrix to capture word occurrences in each document, LDA achieved an accuracy of 71.4% for correctly classified documents [12].…”

Section: Introduction (mentioning)

confidence: 92%

Topic prediction modelling on social media content using machine learning

Dewi Aisha, Ayu Wulandhari

2024

IJEECS


The ease of expressing opinions about companies or institutions on social media has produced both positive and negative judgments. On social media, positive and negative information alike is easily found and spread. The concern is that negative information will lead to negative public opinion; if that happens, the company will suffer a loss of trust that harms its reputation. To monitor issues before they get out of control, a company therefore wants to know which topics or opinions are developing in the community. To that end, this study proposes topic modelling with latent Dirichlet allocation (LDA) to identify the topics being discussed on social media. The model achieved a coherence score of 0.558, and in direct human judgment it was rated correct 80% of the time on average. The findings reveal 4 topic groups that represent the corporate social media content. They give companies information about the latest topics and opinions developing in society, which can inform decision-making on current issues and thereby increase trust in, and the perceived reliability of, the company.
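The excerpt and abstract above describe LDA applied to a document-term matrix that records word occurrences per document. A minimal pure-Python sketch of that pipeline follows, assuming collapsed Gibbs sampling as the inference method; neither source states which inference algorithm or library was used. The toy documents, the helper names (`build_dtm`, `lda_gibbs`, `top_words`) and the hyperparameters `alpha`, `beta`, and `iters` are hypothetical, and no coherence score is computed.

```python
import random
from collections import Counter


def build_dtm(docs):
    """Document-term matrix: one row per document, one column per vocabulary
    word, each cell holding that word's count in the document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    dtm = [[0] * len(vocab) for _ in tokenized]
    for i, toks in enumerate(tokenized):
        for w, c in Counter(toks).items():
            dtm[i][index[w]] = c
    return dtm, vocab


def lda_gibbs(dtm, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (hypothetical hyperparameters).
    Returns (doc_topic, topic_word) count tables."""
    rng = random.Random(seed)
    n_docs, n_words = len(dtm), len(dtm[0])
    # Expand each DTM row into word ids, one entry per occurrence.
    docs = [[j for j, c in enumerate(row) for _ in range(c)] for row in dtm]
    doc_topic = [[0] * n_topics for _ in range(n_docs)]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic for every token.
    for d, words in enumerate(docs):
        z_d = []
        for w in words:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)
    for _ in range(iters):
        for d, words in enumerate(docs):
            for i, w in enumerate(words):
                k = assign[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, vocab, n=3):
    """The n most frequent words assigned to each topic."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in topic_word]


# Hypothetical toy corpus with two obvious themes.
corpus = [
    "fever cough flu fever",
    "cough flu fever",
    "bank loan interest loan",
    "interest bank loan",
]
dtm, vocab = build_dtm(corpus)
doc_topic, topic_word = lda_gibbs(dtm, n_topics=2)
print(top_words(topic_word, vocab, n=2))
for d, row in enumerate(doc_topic):
    print(d, row.index(max(row)))  # dominant topic of each document
```

Each row of `doc_topic` counts how many of a document's tokens were assigned to each topic, so the index of its largest entry gives a rough dominant topic. A realistic pipeline would typically add stop-word removal and a much larger corpus.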


“…To date, text classification methods have mostly been used to assign multiple topics to documents [6], to group documents into a fixed number of predefined classes [7], for sentiment analysis to determine a writer's viewpoint or polarity with respect to some topic [8], for spam filtering of emails [9], and for automatic hate speech detection [10]. In the era of big data, the growing number of complex documents makes traditional machine learning methods difficult to apply, because conventional learning processes are not designed for big data and do not work properly with high data volumes.…”

Section: Introduction (mentioning)

confidence: 99%

Automated hierarchical classification of scanned documents using convolutional neural network and regular expression

Arief, Mutiara, Kusuma, et al.

2022

IJECE


This research proposes automated hierarchical classification of scanned documents whose content consists of unstructured text and special patterns (specific, short strings), using a convolutional neural network (CNN) and a regular expression method (REM). The research data are digital correspondence documents in PDF image format from the Pusat Data Teknologi dan Informasi (Technology and Information Data Center). The document hierarchy covers the type of letter, type of manuscript letter, origin of letter, and subject of letter. The method consists of preprocessing, classification, and storage in a database. Preprocessing covers text extraction with Tesseract optical character recognition (OCR) and construction of document word vectors with Word2Vec. For hierarchical classification, the CNN classifies 5 types of letters, and regular expressions classify 4 types of manuscript letter, 15 origins of letter, and 25 subjects of letter. The classified documents are stored in a Hive database within a Hadoop big data architecture. The dataset contains 5200 documents: 4000 for training, 1000 for testing, and 200 for classification prediction. Of the 200 new documents, 188 were classified correctly and 12 incorrectly, giving an accuracy of 94% for the automated hierarchical classification. As future work, content-based search of the classified scanned documents can be developed.
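The regular-expression levels of that hierarchy can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not publish its patterns, so every label and pattern in the hypothetical `RULES` table is invented, only two labels per level are shown instead of 4, 15, and 25, and the CNN that predicts the top-level letter type is not reproduced (its prediction is passed in as a plain string). The `accuracy` helper reproduces the reported arithmetic, 188 correct out of 200.

```python
import re

# Hypothetical rules: the paper's real patterns are not given in the abstract.
# Each level maps to an ordered list of (label, compiled regex) pairs.
RULES = {
    "manuscript_type": [
        ("memorandum", re.compile(r"\bmemorandum\b", re.I)),
        ("invitation", re.compile(r"\binvitation\b", re.I)),
    ],
    "origin": [
        ("finance", re.compile(r"\bref(?:erence)?\s*:\s*FIN-\d+", re.I)),
        ("hr", re.compile(r"\bref(?:erence)?\s*:\s*HR-\d+", re.I)),
    ],
    "subject": [
        ("budget", re.compile(r"\bsubject\s*:.*\bbudget\b", re.I)),
        ("training", re.compile(r"\bsubject\s*:.*\btraining\b", re.I)),
    ],
}


def classify_levels(text, rules=RULES):
    """First matching label per level, or 'unknown' when no pattern matches."""
    return {
        level: next((label for label, rx in patterns if rx.search(text)), "unknown")
        for level, patterns in rules.items()
    }


def classify_document(ocr_text, letter_type):
    """Combine a top-level letter type (assumed to come from a CNN, not shown)
    with the regex-based lower levels."""
    return {"letter_type": letter_type, **classify_levels(ocr_text)}


def accuracy(predicted, expected):
    """Fraction of predictions equal to the expected labels."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


text = "MEMORANDUM\nRef: FIN-042\nSubject: Q3 budget revision"
print(classify_document(text, "notice"))
# 188 of 200 documents correct, as in the abstract.
print(accuracy([1] * 188 + [0] * 12, [1] * 200))  # prints 0.94
```

Ordering the patterns within each level makes the first match win, which is one simple way to resolve documents that match several rules; the paper may resolve conflicts differently.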
