The generalized Dirichlet distribution in enhanced topic detection

Caballero, Karla; Barajas, Joel; Akella, Ram

doi:10.1145/2396761.2396860

Cited by 23 publications

(22 citation statements)

References 14 publications

“…For this application, we fit a GD-LDA model developed by authors of [5] to extract the topics from the corpus set. This consists in all the processed text entry notes (noun phrases + terms) of all the patients.…”

Section: Topic Based Features (mentioning)

confidence: 99%

“…Each entry has an average length of 173 terms after constructing noun phrases, performing stemming, and removing stop words. We fit the GDLDA [5] model using all the text entries and K = [50, 75, 100] topics. Figure 3 shows some of the obtained topics and how these topics are aligned with symptoms and procedures for a particular disease.…”

Section: Experimental Settings and Numerical Feature Extraction (mentioning)

confidence: 99%

Dynamically Modeling Patient's Health State from Electronic Medical Records

Barajas

Akella

2015

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Self Cite

In this paper, we present a method to dynamically estimate the probability of mortality inside the Intensive Care Unit (ICU) by combining heterogeneous data. We propose a method based on Generalized Linear Dynamic Models that models the probability of mortality as a latent state that evolves over time. This framework allows us to combine different types of features (lab results, vital signs readings, doctor and nurse notes, etc.) into a single state, which is updated each time new patient data is observed. In addition, we include text features based on medical noun phrase extraction and Statistical Topic Models. These features provide context about the patient that cannot be captured when only numerical features are used. We fill in the missing values using a Regularized Expectation Maximization based method that assumes temporal data. We test our proposed approach using 15,000 Electronic Medical Records (EMRs) obtained from the MIMIC II public dataset. Experimental results show that the proposed model allows us to detect an increase in the probability of mortality before it occurs. We report an AUC of 0.8657. After 24 hours, our proposed model clearly outperforms other methods in the literature in sensitivity (0.7885 versus 0.6559 for Naive Bayes) and in F-score (0.5929 versus 0.4662 for the Apache III score).

CCS Concepts: Mathematics of computing → Time series analysis; Applied computing → Health care information systems; Information systems → Content analysis and feature selection.

“…The interior nodes are distributions over topics called super-topics. Recently, in [6], the authors presented a new model to find correlation among topics in a corpus using the Generalized Dirichlet distribution model instead of the Dirichlet distribution.…”

Section: Related Work (mentioning)

confidence: 99%

“…Due to space constraint, we only show the graph obtained from our NTSeg model. Note that other models such as PAM, LDCC, LDSEG, GD-LDA [6] and CTM, only form unigrams in a topic leading to ambiguous interpretation. For example, presenting the unigram "confidence" will not be that insightful in a correlation graph.…”

Section: Correlation Graph (mentioning)

confidence: 99%

“…In the topic modeling literature, metrics such as perplexity computation or log-likelihood have often been used. For example, PAM uses empirical log-likelihood [10] as an evaluation metric and so does a recently proposed method GD-LDA [6]. Log-likelihood has also been widely used as one of the evaluation metrics, for example in [3].…”

Section: Document Likelihood Experiments (mentioning)

confidence: 99%

An unsupervised topic segmentation model incorporating word order

Jameel

Lam

2013

Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval

We present a new unsupervised topic discovery model for a collection of text documents. In contrast to the majority of state-of-the-art topic models, our model does not break the document's structure, such as paragraphs and sentences. In addition, it preserves word order in the document. As a result, it can generate two levels of topics of different granularity, namely segment-topics and word-topics. It can also generate n-gram words in each topic. We also develop an approximate inference scheme using Gibbs sampling. We conduct extensive experiments using publicly available data from different collections and show that our model improves the quality of several text mining tasks, such as supporting fine-grained topics with n-gram words in the correlation graph, segmenting a document into topically coherent sections, document classification, and document likelihood estimation.

Deriving Probabilistic SVM Kernels from Exponential Family Approximations to Multivariate Distributions for Count Data

Zamzami

Bouguila

2019

Unsupervised and Semi-Supervised Learning
