Enriching consumer health vocabulary through mining a social Q&A site: A similarity-based approach

He, Zhe; Chen, Zhiwei; Oh, Sanghee; Hou, Jinghui; Bian, Jiang

doi:10.1016/j.jbi.2017.03.016

Cited by 37 publications

(25 citation statements)

References 30 publications

“…Our semantic relation evaluation dataset focused on nouns and adjectives, which was based on the statistics of synsets in WordNet [ 40 ], where 69.79% (82,115/117,659) of the synsets were nouns, and 15.43% (18,156/117,659) of synsets were adjectives. We constructed the evaluation dataset in four steps: 1) We employed a named entity recognition tool developed in our lab, simiTerm [ 41 ], to generate all the candidate evaluation terms based on the N-gram model. 2) We filtered out the noisy terms including: Terms with more than four words; Terms with a frequency < 100 in the corpus; Terms starting or ending with a stop word (We used the default stop word list in simiTerm ); Unigrams that are not noun, adjective, or gerund; Multi-grams not ending with a noun or a gerund.…”

Section: Methods (mentioning)

confidence: 99%
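
As a rough illustration of the filtering rules in step 2 of the quoted passage, the sketch below applies them to candidate terms that already carry part-of-speech tags. It is not simiTerm's code: the function name `keep_term`, the small `STOP_WORDS` set, and the use of Penn Treebank tags (`NN*`, `JJ*`, `VBG`) are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical stop-word list; simiTerm's default list is not reproduced here.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "with"}

# A candidate term is a list of (token, POS tag) pairs.
# Penn Treebank tags are assumed: NN* = noun, JJ* = adjective, VBG = gerund.
TaggedTerm = List[Tuple[str, str]]


def is_noun(tag: str) -> bool:
    return tag.startswith("NN")


def is_adjective(tag: str) -> bool:
    return tag.startswith("JJ")


def is_gerund(tag: str) -> bool:
    return tag == "VBG"


def keep_term(term: TaggedTerm, frequency: int,
              max_words: int = 4, min_frequency: int = 100) -> bool:
    """Return True if the term survives all of the quoted noise filters."""
    # Drop empty terms and terms with more than four words.
    if not term or len(term) > max_words:
        return False
    # Drop terms with a corpus frequency below 100.
    if frequency < min_frequency:
        return False
    # Drop terms that start or end with a stop word.
    if term[0][0].lower() in STOP_WORDS or term[-1][0].lower() in STOP_WORDS:
        return False
    last_tag = term[-1][1]
    # Unigrams must be a noun, an adjective, or a gerund.
    if len(term) == 1:
        return is_noun(last_tag) or is_adjective(last_tag) or is_gerund(last_tag)
    # Multi-grams must end with a noun or a gerund.
    return is_noun(last_tag) or is_gerund(last_tag)
```

For instance, under these assumptions `keep_term([("blood", "NN"), ("sugar", "NN")], 250)` keeps the term, while the same term with a frequency of 99 is dropped.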

Evaluating semantic relations in neural word embeddings with biomedical and general domain knowledge bases

Chen

Liu

et al. 2018

BMC Med Inform Decis Mak

Self Cite

Background: In the past few years, neural word embeddings have been widely used in text mining. However, the vector representations of word embeddings mostly act as a black box in downstream applications using them, thereby limiting their interpretability. Even though word embeddings are able to capture semantic regularities in free text documents, it is not clear how different kinds of semantic relations are represented by word embeddings and how semantically-related terms can be retrieved from word embeddings.

Methods: To improve the transparency of word embeddings and the interpretability of the applications using them, in this study, we propose a novel approach for evaluating the semantic relations in word embeddings using external knowledge bases: Wikipedia, WordNet and Unified Medical Language System (UMLS). We trained multiple word embeddings using health-related articles in Wikipedia and then evaluated their performance in the analogy and semantic relation term retrieval tasks. We also assessed if the evaluation results depend on the domain of the textual corpora by comparing the embeddings of health-related Wikipedia articles with those of general Wikipedia articles.

Results: Regarding the retrieval of semantic relations, we were able to retrieve semanti. Meanwhile, the two popular word embedding approaches, Word2vec and GloVe, obtained comparable results on both the analogy retrieval task and the semantic relation retrieval task, while dependency-based word embeddings had much worse performance in both tasks. We also found that the word embeddings trained with health-related Wikipedia articles obtained better performance in the health-related relation retrieval tasks than those trained with general Wikipedia articles.

Conclusion: It is evident from this study that word embeddings can group terms with diverse semantic relations together. The domain of the training corpus does have impact on the semantic relations represented by word embeddings. We thus recommend using domain-specific corpus to train word embeddings for domain-specific text mining tasks.

“…Authors suggest that query expressions must be carefully chosen when sampling social media for disease-related micro-blogs. In their work in a similar domain, He et al experimented with social question-and-answer sites corpora on two disease domains - diabetes and cancer - in order to identify new, meaningful consumer terms [42]. Others developed a model comprising an ensemble of classifiers for mining social media data streams by combining similarity-based and genetic algorithm classifiers [43].…”

Section: Similarity-based Approaches (mentioning)

confidence: 99%

Time-aware domain-based social influence prediction

et al. 2020

Online Social Networks (OSNs) have established virtual platforms enabling people to express their opinions, interests and thoughts in a variety of contexts and domains, allowing legitimate users as well as spammers and other untrustworthy users to publish and spread their content. Hence, the concept of social trust has attracted the attention of information processors/data scientists and information consumers/business firms. One of the main reasons for acquiring the value of Social Big Data (SBD) is to provide frameworks and methodologies using which the credibility of OSNs users can be evaluated. These approaches should be scalable to accommodate large-scale social data. Hence, there is a need for well comprehending of social trust to improve and expand the analysis process and inferring credibility of SBD. Given the exposed environment's settings and fewer limitations related to OSNs, the medium allows legitimate and genuine users as well as spammers and other low trustworthy users to publish and spread their content. Hence, this paper presents an approach that incorporates semantic analysis and machine learning modules to measure and predict users' trustworthiness in numerous domains in different time periods. The evaluation of the conducted experiment validates the applicability of the incorporated machine learning techniques to predict highly trustworthy domain-based users.

“…One example is the study by Jiang and Yang (2015) that enhanced CHV with new terms by mining online health communities and determined whether terms should be added to the ontology using co-occurrence metrics. Another example is a study by He et al (2017) exploring mining a social Q&A site to find candidate terms for expanding CHV. This stands as important work, although the results of this have not been incorporated into the publically accessible CHV.…”

Section: Expanding CHV (mentioning)

confidence: 99%

An Automatic Approach to Extending the Consumer Health Vocabulary

Monselise

Greenberg

Liang

et al. 2020

Journal of Data and Information Science

Purpose: Given the ubiquitous presence of the internet in our lives, many individuals turn to the web for medical information. A challenge here is that many laypersons (as “consumers”) do not use professional terms found in the medical nomenclature when describing their conditions and searching the internet. The Consumer Health Vocabulary (CHV) ontology, initially developed in 2007, aimed to bridge this gap, although updates have been limited over the last decade. The purpose of this research is to implement a means of automatically creating a hierarchical consumer health vocabulary. This overall purpose is improving consumers’ ability to search for medical conditions and symptoms with an enhanced CHV and improving the search capabilities of our searching and indexing tool HIVE (Helping Interdisciplinary Vocabulary Engineering).

Design/methodology/approach: The research design uses ontological fusion, an approach for automatically extracting and integrating the Medical Subject Headings (MeSH) ontology into CHV, and further convert CHV from a flat mapping to a hierarchical ontology. The additional relationships and parent terms from MeSH allow us to uncover relationships between existing terms in the CHV ontology as well. The research design also included improving the search capabilities of HIVE identifying alternate relationships and consolidating them to a single entry.

Findings: The key findings are an improved CHV with a hierarchical structure that enables consumers to search through the ontology and uncover more relationships.

Research limitations: There are some cases where the improved search results in HIVE return terms that are related but not completely synonymous. We present an example and discuss the implications of this result.

Practical implications: This research makes available an updated and richer CHV ontology using the HIVE tool. Consumers may use this tool to search consumer terminology for medical conditions and symptoms. The HIVE tool will return results about the medical term linked with the consumer term as well as the hierarchy of other medical terms connected to the term.

Originality/value: This is a first attempt in over a decade to improve and enhance the CHV ontology with current terminology and the first research effort to convert CHV's original flat ontology structure to a hierarchical structure. This research also enhances the HIVE infrastructure and provides consumers with a simple, efficient mechanism for searching the CHV ontology and providing meaningful data to consumers.
