Xiaohua Hu scite author profile

Knowledge discovery in databases, or data mining, is an important direction in the development of data and knowledge-based systems. Because of the huge amount of data stored in large numbers of existing databases, and because the amount of data generated in electronic forms is growing rapidly, it is necessary to develop efficient methods to extract knowledge from databases. An attribute-oriented rough set approach has been developed for knowledge discovery in databases. The method integrates the machine-learning paradigm, especially learning-from-examples techniques, with rough set techniques. An attribute-oriented concept tree ascension technique is first applied in generalization, which substantially reduces the computational complexity of database learning processes. Then the cause-effect relationship among the attributes in the database is analyzed using rough set techniques, and the unimportant or irrelevant attributes are eliminated. Thus concise and strong rules with little or no redundant information can be learned efficiently. Our study shows that attribute-oriented induction combined with rough set theory provides an efficient and effective mechanism for knowledge discovery in database systems.
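The attribute-elimination step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard rough set dependency degree (the share of rows in the positive region) and a simple greedy removal order, and it skips the concept tree generalization step entirely. The function names and the toy table are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def dependency(rows, cond, dec):
    """Fraction of rows whose condition class lies inside a single decision class."""
    dec_blocks = partition(rows, [dec])
    pos = 0
    for block in partition(rows, cond):
        if any(block <= d for d in dec_blocks):
            pos += len(block)
    return pos / len(rows)

def drop_redundant(rows, cond, dec):
    """Greedily drop attributes whose removal leaves the dependency degree unchanged."""
    full = dependency(rows, cond, dec)
    kept = list(cond)
    for a in cond:
        trial = [x for x in kept if x != a]
        if trial and dependency(rows, trial, dec) == full:
            kept = trial
    return kept

# Hypothetical toy table: "buy" is fully determined by "size"; "color" is irrelevant.
rows = [
    {"size": "small", "color": "red", "buy": "no"},
    {"size": "small", "color": "blue", "buy": "no"},
    {"size": "large", "color": "red", "buy": "yes"},
    {"size": "large", "color": "blue", "buy": "yes"},
]
reduced = drop_redundant(rows, ["size", "color"], "buy")
```

On this toy table the sketch keeps only `size`, since dropping `color` does not change how well the remaining attributes determine `buy`.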

Exploiting Wikipedia as external knowledge for document clustering

Zhang

et al. 2009

214

114

In traditional text clustering methods, documents are represented as "bags of words" without considering the semantic information of each document. For instance, if two documents use different collections of core words to represent the same topic, they may be falsely assigned to different clusters due to the lack of shared core words, although the core words they use are probably synonyms or semantically associated in other forms. The most common way to solve this problem is to enrich document representation with the background knowledge in an ontology. There are two major issues for this approach: (1) the coverage of the ontology is limited, even for WordNet or MeSH, and (2) using ontology terms as replacement or additional features may cause information loss or introduce noise. In this paper, we present a novel text clustering method to address these two issues by enriching document representation with Wikipedia concept and category information. We develop two approaches, exact match and relatedness-match, to map text documents to Wikipedia concepts, and further to Wikipedia categories. Then the text documents are clustered based on a similarity metric which combines document content information, concept information, and category information. The experimental results using the proposed clustering framework on three datasets (20-newsgroup, TDT2, and LA Times) show that clustering performance improves significantly by enriching document representation with Wikipedia concepts and categories.
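A similarity metric that combines content, concept, and category information might look like the sketch below. The abstract does not give the formula, so the additive form, the weights `alpha` and `beta`, and the dictionary layout of a document are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def combined_similarity(d1, d2, alpha=0.5, beta=0.5):
    """Word similarity plus weighted concept and category similarity (assumed form)."""
    return (cosine(d1["words"], d2["words"])
            + alpha * cosine(d1["concepts"], d2["concepts"])
            + beta * cosine(d1["categories"], d2["categories"]))

# Hypothetical documents: no shared words, but the same mapped concept and category.
doc_a = {"words": {"car": 1.0, "engine": 1.0},
         "concepts": {"Automobile": 1.0},
         "categories": {"Vehicles": 1.0}}
doc_b = {"words": {"automobile": 1.0, "motor": 1.0},
         "concepts": {"Automobile": 1.0},
         "categories": {"Vehicles": 1.0}}

word_only = cosine(doc_a["words"], doc_b["words"])
enriched = combined_similarity(doc_a, doc_b)
```

The two toy documents share no words, so a bag-of-words comparison scores them as unrelated, while the enriched score is positive because both map to the same concept and category.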

Single-cell transcriptomic analysis of the tumor ecosystems underlying initiation and progression of papillary thyroid carcinoma

Shi

et al. 2021

Nat Commun

100

The tumor ecosystem of papillary thyroid carcinoma (PTC) is poorly characterized. Using single-cell RNA sequencing, we profile transcriptomes of 158,577 cells from 11 patients’ paratumors, localized/advanced tumors, initially-treated/recurrent lymph nodes and radioactive iodine (RAI)-refractory distant metastases, covering comprehensive clinical courses of PTC. Our data identifies a “cancer-primed” premalignant thyrocyte population with normal morphology but altered transcriptomes. Along the developmental trajectory, we also discover three phenotypes of malignant thyrocytes (follicular-like, partial-epithelial-mesenchymal-transition-like, dedifferentiation-like), whose composition shapes bulk molecular subtypes, tumor characteristics and RAI responses. Furthermore, we uncover a distinct BRAF-like-B subtype with predominant dedifferentiation-like thyrocytes, enriched cancer-associated fibroblasts, worse prognosis and promising prospect of immunotherapy. Moreover, potential vascular-immune crosstalk in PTC provides theoretical basis for combined anti-angiogenic and immunotherapy. Together, our findings provide insight into the PTC ecosystem that suggests potential prognostic and therapeutic implications.

Genetic Dissection of Ethanol Tolerance in the Budding Yeast Saccharomyces cerevisiae

Wang

Tan

et al. 2007

146

Uncovering genetic control of variation in ethanol tolerance in natural populations of yeast Saccharomyces cerevisiae is essential for understanding the evolution of fermentation, the dominant lifestyle of the species, and for improving efficiency of selection for strains with high ethanol tolerance, a character of great economic value for the brewing and biofuel industries. To date, as many as 251 genes have been predicted to be involved in influencing this character. Candidacy of these genes was determined from a tested phenotypic effect following gene knockout, from an induced change in gene function under an ethanol stress condition, or by mutagenesis. This article represents the first genomics approach for dissecting genetic variation in ethanol tolerance between two yeast strains with a highly divergent trait phenotype. We developed a simple but reliable experimental protocol for scoring the phenotype and a set of STR/SNP markers evenly covering the whole genome. We created a mapping population comprising 319 segregants from crossing the parental strains. On the basis of the data sets, we find that the tolerance trait has a high heritability and that additive genetic variance dominates genetic variation of the trait. Segregation at five QTL detected has explained ~50% of phenotypic variation; in particular, the major QTL mapped on yeast chromosome 9 has accounted for a quarter of the phenotypic variation. We integrated the QTL analysis with the predicted candidacy of ethanol resistance genes and found that only a few of these candidates fall in the QTL regions.

Improving motor imagery practice with synchronous action observation in stroke patients

Sun

Wei

Luo

et al. 2016

Topics in Stroke Rehabilitation

Xiaohua Hu

Learning in Relational Databases: A Rough Set Approach

Exploiting Wikipedia as external knowledge for document clustering

Single-cell transcriptomic analysis of the tumor ecosystems underlying initiation and progression of papillary thyroid carcinoma

Genetic Dissection of Ethanol Tolerance in the Budding Yeast Saccharomyces cerevisiae

Improving motor imagery practice with synchronous action observation in stroke patients
