Two-stage topic modelling of scientific publications: A case study of University of Nairobi, Kenya

Muchene, Leacky; Safari, Wende Clarence

doi:10.1371/journal.pone.0243208

Cited by 21 publications

(13 citation statements)

References 34 publications

“…Due to the unsupervised nature of the method which potentially could remove the human bias from the review process, and capacity to process a large number of documents at a relatively low computational cost, the use of NLP and Topic Modelling is becoming more popular in academia for explorative literature studies (Valle et al, 2014; Liu et al, 2016; Asmussen and Møller, 2019; Muchene and Safari, 2021). The interpretation of the results produced by LDA models, however, might pose a challenge if the initial hypothesis is not supported by manual overview of the text material or if the number of topics produced by the model is not cross-fold validated against the initial dataset.…”

Section: Discussion (mentioning)

confidence: 99%

The Hitchhiker's Guide to Integration of Social and Ethical Awareness in Precision Livestock Farming Research

2021

While fully automated livestock production may be considered the ultimate goal for optimising productivity at the farm level, the benefits and costs of such a development at the scale at which it needs to be implemented must also be considered from social and ethical perspectives. Automation resulting from Precision Livestock Farming (PLF) could alter fundamental views of human-animal interactions on farm and, even further, potentially compromise human and animal welfare and health if PLF development does not include a flexible, holistic strategy for integration. To investigate topic segregation, inclusion of socio-ethical aspects, and consideration of human-animal interactions within the PLF research field, the abstracts from 644 peer-reviewed publications were analysed using the recent advances in the Natural Language Processing (NLP). Two Latent Dirichlet Allocation (LDA) probabilistic models with varying number of topics (13 and 3 for Model 1 and Model 2, respectively) were implemented to create a generalised research topic overview. The visual representation of topics produced by LDA Model 1 and Model 2 revealed prominent similarities in the terms contributing to each topic, with only weight for each term being different. The majority of terms for both models were process-oriented, obscuring the inclusion of social and ethical angles in PLF publications. A subset of articles (5%, n = 32) was randomly selected for manual examination of the full text to evaluate whether abstract text and focus reflected that of the article as a whole. Few of these articles (12.5%, n = 4) focused specifically on broader ethical or societal considerations of PLF or (9.4%, n = 3) discussed PLF with respect to human-animal interactions. While there was consideration of the impact of PLF on animal welfare and farmers in nearly half of the full texts examined (46.9%, n = 15), this was often limited to a few statements in passing. 
Further, these statements were typically general rather than specific and presented PLF as beneficial to human users and animal recipients. To develop PLF that is in keeping with the ethical values and societal concerns of the public and consumers, projects, and publications that deliberately combine social context with technological processes and results are needed.

“…The extracted semantic structures are called topics and represent recurring patterns or clusters of co-occurring words in documents (27). Topics are extracted based on a probabilistic model that determines the most frequent co-occurring words over all documents (28). Key elements of TM are words or terms (a basic unit of discrete data), documents (a sequence of terms), corpus (a collection of documents), and document-term-matrix (DTM; a matrix that presents the frequency of each word in each document) (28).…”

Section: Topic Modeling (mentioning)

confidence: 99%

“…Topics are extracted based on a probabilistic model that determines the most frequent co-occurring words over all documents (28). Key elements of TM are words or terms (a basic unit of discrete data), documents (a sequence of terms), corpus (a collection of documents), and document-term-matrix (DTM; a matrix that presents the frequency of each word in each document) (28). An example of a DTM is presented in Table 1 where each cell is a frequency of terms used (column) in each document (row).…”

Section: Topic Modeling (mentioning)

confidence: 99%
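
The document-term matrix described in the statements above can be illustrated with a minimal sketch. It is not taken from the cited paper: the toy corpus, the `build_dtm` helper name, and the lowercase whitespace tokenization are all assumptions chosen for illustration. Each row is a document, each column is a term, and each cell holds the frequency of that term in that document.

```python
from collections import Counter


def build_dtm(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i.

    Hypothetical helper: tokenization is a plain lowercase whitespace split,
    which is an assumption, not the preprocessing used in the cited study.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # Columns: every distinct term in the corpus, in sorted order.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Rows: one frequency vector per document; missing terms count as 0.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Toy corpus (made up for illustration only).
corpus = [
    "caries risk caries",
    "gingival recession risk",
]
vocabulary, dtm = build_dtm(corpus)
print(vocabulary)
for row in dtm:
    print(row)
```

A matrix of this shape is the usual input to a topic model such as LDA; real pipelines typically also remove stop words and normalize terms before counting.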

“…Latent Dirichlet Allocation (LDA) is a common TM method and one of the most popular ML algorithms (28). LDA extracts previously unknown or latent information from an immense number of documents' original texts and unstructured data (28,29). As patient charts are frequently found in the form of text files without any organized format, LDA is an appropriate method to extract salient information from these charts.…”

Section: Latent Dirichlet Allocation (mentioning)

confidence: 99%

An Application of Machine Learning Techniques to Analyze Patient Information to Improve Oral Health Outcomes

Ameli, Gibson, Khanna et al.

2022

Front. Dent. Med

Objective: Various health-related fields have applied Machine learning (ML) techniques such as text mining, topic modeling (TM), and artificial neural networks (ANN) to automate tasks otherwise completed by humans to enhance patient care. However, research in dentistry on the integration of these techniques into the clinic arena has yet to exist. Thus, the purpose of this study was to: introduce a method of automating the reviewing patient chart information using ML, provide a step-by-step description of how it was conducted, and demonstrate this method's potential to identify predictive relationships between patient chart information and important oral health-related contributors.

Methods: A secondary data analysis was conducted to demonstrate the approach on a set of anonymized patient charts collected from a dental clinic. Two ML applications for patient chart review were demonstrated: (1) text mining and Latent Dirichlet Allocation (LDA) were used to preprocess, model, and cluster data in a narrative format and extract common topics for further analysis, (2) Ordinal logistic regression (OLR) and ANN were used to determine predictive relationships between the extracted patient chart data topics and oral health-related contributors. All analysis was conducted in R and SPSS (IBM, SPSS, statistics 22).

Results: Data from 785 patient charts were analyzed. Preprocessing of raw data (data cleaning and categorizing) identified 66 variables, of which 45 were included for analysis. Using LDA, 10 radiographic findings topics and 8 treatment planning topics were extracted from the data. OLR showed that caries risk, occlusal risk, biomechanical risk, gingival recession, periodontitis, gingivitis, assisted mouth opening, and muscle tenderness were highly predictable using the extracted radiographic and treatment planning topics and chart information. Using the statistically significant predictors obtained from OLR, ANN analysis showed that the model can correctly predict >72% of all variables except for bruxism and tooth crowding (63.1 and 68.9%, respectively).

Conclusion: Our study presents a novel approach to address the need for data-enabled innovations in the field of dentistry and creates new areas of research in dental analytics. Utilizing ML methods and its application in dental practice has the potential to improve clinicians' and patients' understanding of the major factors that contribute to oral health diseases/conditions.

The Biased Coin Flip Process for Nonparametric Topic Modeling

Wood, Wang, Arnold

2021

Document Analysis and Recognition – ICDAR 2021
