2017
DOI: 10.1016/j.jbi.2017.03.016

Enriching consumer health vocabulary through mining a social Q&A site: A similarity-based approach

Abstract: The widely known vocabulary gap between health consumers and healthcare professionals hinders information seeking and health dialogue of consumers on end-user health applications. The Open Access and Collaborative Consumer Health Vocabulary (OAC CHV), which contains health-related terms used by lay consumers, has been created to bridge such a gap. Specifically, the OAC CHV facilitates consumers’ health information retrieval by enabling consumer-facing health applications to translate between professional langu…

Cited by 37 publications (25 citation statements)
References 30 publications
“…Our semantic relation evaluation dataset focused on nouns and adjectives, which was based on the statistics of synsets in WordNet [40], where 69.79% (82,115/117,659) of the synsets were nouns, and 15.43% (18,156/117,659) of synsets were adjectives. We constructed the evaluation dataset in four steps: 1) We employed a named entity recognition tool developed in our lab, simiTerm [41], to generate all the candidate evaluation terms based on the N-gram model. 2) We filtered out the noisy terms including: Terms with more than four words; Terms with a frequency < 100 in the corpus; Terms starting or ending with a stop word (We used the default stop word list in simiTerm); Unigrams that are not noun, adjective, or gerund; Multi-grams not ending with a noun or a gerund.…”
Section: Methods (mentioning)
confidence: 99%
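
The noisy-term filters listed in step 2 of the statement above can be illustrated with a minimal sketch. It is not simiTerm's implementation: the stop word list, the coarse part-of-speech labels, and the function name `keep_term` are all hypothetical, and the part-of-speech tags are assumed to come from some external tagger.

```python
# Hypothetical stop word list; simiTerm's default list is not reproduced here.
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "or", "to", "with", "for"}

# Hypothetical coarse part-of-speech labels; any tagger could supply them.
UNIGRAM_OK = {"NOUN", "ADJ", "GERUND"}
MULTIGRAM_END_OK = {"NOUN", "GERUND"}


def keep_term(tokens, pos_tags, frequency, min_freq=100, max_words=4):
    """Return True if a candidate term survives the quoted filters."""
    # Reject empty or misaligned input.
    if len(tokens) == 0 or len(tokens) != len(pos_tags):
        return False
    # Terms with more than four words are dropped.
    if len(tokens) > max_words:
        return False
    # Terms with a corpus frequency below 100 are dropped.
    if frequency < min_freq:
        return False
    # Terms starting or ending with a stop word are dropped.
    if tokens[0].lower() in STOP_WORDS or tokens[-1].lower() in STOP_WORDS:
        return False
    # Unigrams must be a noun, adjective, or gerund.
    if len(tokens) == 1:
        return pos_tags[0] in UNIGRAM_OK
    # Multi-grams must end with a noun or a gerund.
    return pos_tags[-1] in MULTIGRAM_END_OK


# Illustrative candidates only; these are not data from the cited study.
candidates = [
    (["blood", "sugar", "level"], ["NOUN", "NOUN", "NOUN"], 120),
    (["the", "pancreas"], ["DET", "NOUN"], 500),
    (["insulin"], ["NOUN"], 99),
]
kept = [" ".join(toks) for toks, tags, freq in candidates if keep_term(toks, tags, freq)]
print(kept)
```

Keeping the thresholds as keyword arguments makes it easy to see how the candidate set changes under a different frequency cutoff or length limit.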
“…Authors suggest that query expressions must be carefully chosen when sampling social media for disease-related micro-blogs. In their work in a similar domain, He et al experimented with social question-and-answer sites corpora on two disease domains - diabetes and cancer - in order to identify new, meaningful consumer terms [42]. Others developed a model comprising an ensemble of classifiers for mining social media data streams by combining similarity-based and genetic algorithm classifiers [43].…”
Section: Similarity-based Approaches (mentioning)
confidence: 99%
“…One example is the study by Jiang and Yang (2015) that enhanced CHV with new terms by mining online health communities and determined whether terms should be added to the ontology using co-occurrence metrics. Another example is a study by He et al (2017) exploring mining a social Q&A site to find candidate terms for expanding CHV. This stands as important work, although the results of this have not been incorporated into the publicly accessible CHV.…”
Section: Expanding Chv (mentioning)
confidence: 99%