Buckling of frames braced by flexural bracing

We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree models (HLTMs). The variables at the bottom level of an HLTM are observed binary variables that represent the presence/absence of words in a document. The variables at other levels are binary latent variables, with those at the lowest latent level representing word co-occurrence patterns and those at higher levels representing co-occurrence of patterns at the level below. Each latent variable gives a soft partition of the documents, and document clusters in the partitions are interpreted as topics. Latent variables at high levels of the hierarchy capture long-range word co-occurrence patterns and hence give thematically more general topics, while those at low levels of the hierarchy capture short-range word co-occurrence patterns and give thematically more specific topics. Unlike LDA-based topic models, HLTMs do not refer to a document generation process and use word variables instead of token variables. They use a tree structure to model the relationships between topics and words, which is conducive to the discovery of meaningful topics and topic hierarchies.

Greedy learning of latent tree models for multidimensional clustering

Liu

Zhang

Chen

et al. 2013

Mach Learn

A data-driven method for syndrome type identification and classification in traditional Chinese medicine

Zhang

Liu

et al. 2017

Journal of Integrative Medicine

The efficacy of traditional Chinese medicine (TCM) treatments for Western medicine (WM) diseases relies heavily on the proper classification of patients into TCM syndrome types. The authors developed a data-driven method for solving the classification problem, where syndrome types were identified and quantified based on statistical patterns detected in unlabeled symptom survey data. The new method is a generalization of latent class analysis (LCA), which has been widely applied in WM research to solve a similar problem, i.e., to identify subtypes of a patient population in the absence of a gold standard. A well-known weakness of LCA is that it makes an unrealistically strong independence assumption. The authors relaxed the assumption by first detecting symptom co-occurrence patterns from survey data and then using those statistical patterns, instead of the symptoms, as features for LCA. This new method consists of six steps: data collection, symptom co-occurrence pattern discovery, statistical pattern interpretation, syndrome identification, syndrome type identification, and syndrome type classification. A software package called Lantern has been developed to support the application of the method. The method was illustrated using a data set on vascular mild cognitive impairment.

Model-based clustering of high-dimensional data: Variable selection versus facet determination

Poon

Zhang

Liu

et al. 2013

International Journal of Approximate Reasoning

Variable selection is an important problem for cluster analysis of high-dimensional data. It is also a difficult one. The difficulty originates not only from the lack of class information but also from the fact that high-dimensional data are often multifaceted and can be meaningfully clustered in multiple ways. In such a case, the effort to find one subset of attributes that presumably gives the "best" clustering may be misguided. It makes more sense to identify various facets of a data set (each being based on a subset of attributes), cluster the data along each one, and present the results to the domain experts for appraisal and selection. In this paper, we propose a generalization of Gaussian mixture models and demonstrate its ability to automatically identify natural facets of data and cluster data along each of those facets simultaneously. We present empirical results to show that facet determination usually leads to better clustering results than variable selection.

Learning Analytics for Monitoring Students Participation Online: Visualizing Navigational Patterns on Learning Management System

Poon

Kong

Yau

et al. 2017

Trends and Features of the Applications of Natural Language Processing Techniques for Clinical Trials Text Analysis

Chen

Xie

Cheng

et al. 2020

Applied Sciences

Natural language processing (NLP) is an effective tool for generating structured information from unstructured data of the kind commonly found in clinical trial texts. Such interdisciplinary research has gradually grown into a flourishing research field with accumulated scientific outputs available. In this study, bibliographical data collected from the Web of Science, PubMed, and Scopus databases from 2001 to 2018 were investigated using three prominent methods: performance analysis, science mapping, and, in particular, an automatic text analysis approach named structural topic modeling. Topical trend visualization and test analysis were further employed to quantify the effects of the year of publication on topic proportions. The distributions of topics across prolific countries/regions and institutions were also visualized and compared. In addition, scientific collaborations between countries/regions, institutions, and authors were explored using social network analysis. The findings are essential for facilitating the development of NLP-enhanced clinical trial text processing, boosting scientific and technological NLP-enhanced clinical trial research, and facilitating inter-country/region and inter-institution collaborations.

Lexicon-Based Sentiment Convolutional Neural Networks for Online Review Analysis

Huang

Xie

Rao

et al. 2022

IEEE Trans. Affective Comput.


Leonard K. M. Poon

Model-based multidimensional clustering of categorical data

Latent tree models for hierarchical topic detection
