Braja Gopal Patra scite author profile

Objective: Social determinants of health (SDoH) are nonclinical dispositions that impact patient health risks and clinical outcomes. Leveraging SDoH in clinical decision-making can potentially improve diagnosis, treatment planning, and patient outcomes. Despite increased interest in capturing SDoH in electronic health records (EHRs), such information is typically locked in unstructured clinical notes. Natural language processing (NLP) is the key technology to extract SDoH information from clinical text and expand its utility in patient care and research. This article presents a systematic review of the state-of-the-art NLP approaches and tools that focus on identifying and extracting SDoH data from unstructured clinical text in EHRs.

Materials and Methods: A broad literature search was conducted in February 2021 using 3 scholarly databases (ACL Anthology, PubMed, and Scopus) following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A total of 6402 publications were initially identified, and after applying the study inclusion criteria, 82 publications were selected for the final review.

Results: Smoking status (n = 27), substance use (n = 21), homelessness (n = 20), and alcohol use (n = 15) are the most frequently studied SDoH categories. Homelessness (n = 7) and other less-studied SDoH (eg, education, financial problems, social isolation and support, family problems) are mostly identified using rule-based approaches. In contrast, machine learning approaches are popular for identifying smoking status (n = 13), substance use (n = 9), and alcohol use (n = 9).

Conclusion: NLP offers significant potential to extract SDoH data from narrative clinical notes, which in turn can aid in the development of screening tools, risk prediction models, and clinical decision support systems.
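
To make the idea of a rule-based approach concrete, the following is a minimal sketch of keyword matching over a clinical note. The category names, regular expressions, and the `extract_sdoh` helper are all hypothetical and are not taken from any of the reviewed systems, which typically rely on much richer lexicons and on negation handling.

```python
import re

# Hypothetical keyword rules for three SDoH categories (illustration only).
SDOH_RULES = {
    "smoking": re.compile(
        r"\b(smok(?:e|es|er|ing|ed)?|tobacco|cigarettes?)\b", re.IGNORECASE
    ),
    "alcohol_use": re.compile(r"\b(alcohol|etoh)\b", re.IGNORECASE),
    "homelessness": re.compile(
        r"\b(homeless(?:ness)?|no fixed address)\b", re.IGNORECASE
    ),
}


def extract_sdoh(note):
    """Return the matched phrases for each SDoH category found in a note."""
    found = {}
    for category, pattern in SDOH_RULES.items():
        matches = [m.group(0) for m in pattern.finditer(note)]
        if matches:
            found[category] = matches
    return found


# Made-up example note, not real patient text.
note = "Patient is currently homeless and smokes one pack per day. Denies alcohol use."
result = extract_sdoh(note)
print(result)
```

Note that this sketch flags "alcohol" even though the note says the patient denies use. Plain keyword rules do not handle negation, so a practical system would need an extra step for it.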

A content-based literature recommendation system for datasets to improve data reusability – A case study on Gene Expression Omnibus (GEO) datasets

Patra

Maroufy

Soltanalizadeh

et al. 2020

Journal of Biomedical Informatics

A content-based dataset recommendation system for researchers—a case study on Gene Expression Omnibus (GEO) repository

Patra

Roberts

2020

It is a growing trend among researchers to make their data publicly available for experimental reproducibility and data reusability. Sharing data with fellow researchers helps increase the visibility of the work. On the other hand, some researchers are held back by a lack of data resources. To overcome this challenge, many repositories and knowledge bases have been established to ease data sharing. Further, in the past two decades, there has been an exponential increase in the number of datasets added to these repositories. However, most of these repositories are domain-specific, and none of them can recommend datasets to researchers/users. Naturally, it is challenging for a researcher to keep track of all the relevant repositories for potential use. Thus, a dataset recommender system that recommends datasets to a researcher based on previous publications can enhance their productivity and expedite further research. This work adopts an information retrieval (IR) paradigm for dataset recommendation. We hypothesize that two fundamental differences exist between dataset recommendation and PubMed-style biomedical IR beyond the corpus. First, instead of keywords, the query is the researcher, embodied by his or her publications. Second, to filter the relevant datasets from non-relevant ones, researchers are better represented by a set of interests, as opposed to the entire body of their research. This second idea is implemented using a non-parametric clustering technique: each researcher's publications are grouped into clusters, and datasets are recommended using the cosine similarity between the vector representations of the publication clusters and the datasets. After manual evaluation by five researchers, the proposed method achieved a maximum normalized discounted cumulative gain at 10 (NDCG@10) of 0.89, a partial precision at 10 (p@10) of 0.78, and a strict p@10 of 0.61. To the best of our knowledge, this is the first study of its kind on content-based dataset recommendation. We hope that this system will further promote data sharing, offset the researchers’ workload in identifying the right dataset and increase the reusability of biomedical datasets. Database URL: http://genestudy.org/recommends/#/
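
The ranking and evaluation steps described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy vectors, the dataset names, and the choice to score each dataset by its best-matching cluster are assumptions. The NDCG function uses the common log2 discount and takes the ideal ordering from the same list of judged items.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(cluster_vectors, dataset_vectors, k=10):
    """Rank datasets by their highest similarity to any publication cluster.

    Taking the maximum over clusters is an assumption made for this sketch.
    """
    scores = {
        name: max(cosine(vec, cluster) for cluster in cluster_vectors)
        for name, vec in dataset_vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def ndcg_at_k(relevances, k=10):
    """NDCG@k for graded relevance labels given in ranked order."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0


# Toy data: two interest clusters for one researcher and three datasets.
clusters = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
datasets = {
    "dataset_a": [0.9, 0.1, 0.0],
    "dataset_b": [0.0, 0.0, 1.0],
    "dataset_c": [0.1, 0.8, 0.1],
}
ranking = recommend(clusters, datasets, k=2)
print(ranking)
print(ndcg_at_k([2, 1, 0]))
```

Representing a researcher by several cluster vectors instead of one averaged vector lets a dataset match a single interest strongly without being diluted by unrelated publications.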

JU_NLP at SemEval-2016 Task 11: Identifying Complex Words in a Sentence

Mukherjee

Patra

Das

et al. 2016

The complex word identification task refers to the process of identifying difficult words in a sentence from the perspective of readers belonging to a specific target audience. This task has immense importance in the field of lexical simplification, which improves the readability of texts that contain challenging words. As a participant in SemEval-2016 Task 11, we developed two systems that use various lexical and semantic features to identify complex words, one based on a Naïve Bayes classifier and the other on a Random Forest classifier. The Naïve Bayes system achieves the maximum G-score of 76.7% after incorporating rule-based post-processing techniques.
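
For readers unfamiliar with the G-score, the sketch below computes it for binary labels, assuming it is defined as the harmonic mean of accuracy and recall, which is how the shared task's evaluation is generally described. The `g_score` helper and the example labels are hypothetical.

```python
def g_score(gold, predicted):
    """Harmonic mean of accuracy and recall for binary labels (1 = complex word).

    Assumes the G-score is the harmonic mean of accuracy and recall.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")

    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    accuracy = correct / len(gold)

    true_positives = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    positives = sum(1 for g in gold if g == 1)
    recall = true_positives / positives if positives else 0.0

    if accuracy + recall == 0:
        return 0.0
    return 2 * accuracy * recall / (accuracy + recall)


# Made-up labels: accuracy is 3/5 and recall is 1/2.
gold = [1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0]
score = g_score(gold, predicted)
print(round(score, 4))
```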

Shared Task on Sentiment Analysis in Indian Languages (SAIL) Tweets - An Overview

Patra

Das

Das

et al. 2015

Sentiment analysis on Twitter has been considered a vital task for a decade from various academic and commercial perspectives. Far more work has been done on Twitter sentiment analysis and opinion mining for English than for the Indian languages. Here, we summarize the objectives and evaluation of the sentiment analysis task on tweets for three Indian languages, namely Bengali, Hindi and Tamil. This is the first attempt at a sentiment analysis task in the context of Indian language tweets. The main objective of this task was to classify the tweets into positive, negative and neutral polarity. Tweets from each language were provided for training and testing. Each participating team was asked to submit two systems, one constrained and one unconstrained, for each of the languages. We ranked the systems by accuracy. A total of six teams submitted results, and the maximum accuracies achieved for Bengali, Hindi and Tamil are 43.2%, 55.67%, and 39.28%, respectively.

Multimodal mood classification of Hindi and Western songs

2018

Music information retrieval has emerged as a mainstream research area in the past two decades. Experiments on music mood classification have been performed mainly on Western music based on audio, lyrics and a combination of both. Unfortunately, due to the scarcity of digitized resources, Indian music fares poorly in music mood retrieval research. In this paper, we identified the mood taxonomy and prepared multimodal mood-annotated datasets for Hindi and Western songs. We identified important audio and lyric features using a correlation-based feature selection technique. Finally, we developed mood classification systems using Support Vector Machines and Feed Forward Neural Networks based on the features collected from audio, lyrics, and a combination of both. The best-performing multimodal systems, which use Feed Forward Neural Networks, achieved F-measures of 75.1 and 83.5 for classifying the moods of the Hindi and Western songs, respectively. A comparative analysis indicates that the selected features work well for mood classification of the Western songs and produce better results than the mood classification systems for Hindi songs.

Social connectedness as a determinant of mental health: A scoping review

Wickramaratne

Yangchen

Lepow

et al. 2022

PLoS ONE

Public health and epidemiologic research have established that social connectedness promotes overall health. Yet there have been no recent reviews of findings from research examining social connectedness as a determinant of mental health. The goal of this review was to evaluate recent longitudinal research probing the effects of social connectedness on depression and anxiety symptoms and diagnoses in the general population. A scoping review was performed of the PubMed and PsycINFO databases from January 2015 to December 2021 following PRISMA-ScR guidelines using a defined search strategy. The search yielded 66 unique studies. In research with populations other than pregnant women, 83% (19 of 23) of studies reported that social support protected against symptoms of depression, with the remaining 17% (4 of 23) reporting minimal or no evidence that lower levels of social support predict depression at follow-up. In research with pregnant women, 83% (24 of 29) of studies found that low social support increased postpartum depressive symptoms. In 8 of 9 studies that focused on loneliness, feeling lonely at baseline was related to adverse outcomes at follow-up, including higher risks of major depressive disorder, depressive symptom severity, generalized anxiety disorder, and lower levels of physical activity. In 5 of 8 reports, smaller social network size predicted depressive symptoms or disorder at follow-up. In summary, most recent relevant longitudinal studies have demonstrated that social connectedness protects adults in the general population from depressive symptoms and disorders. The results, which were largely consistent across settings, exposure measures, and populations, support efforts to improve clinical detection of high-risk patients, including adults with low social support and elevated loneliness.

JU_CSE: A Conditional Random Field (CRF) Based Approach to Aspect Based Sentiment Analysis

Patra

Mandal

Das

et al. 2014
