Proceedings of the BioNLP 2018 Workshop 2018
DOI: 10.18653/v1/w18-2313
Phrase2VecGLM: Neural generalized language model–based semantic tagging for complex query reformulation in medical IR

Abstract: In fact-based information retrieval, state-of-the-art performance is traditionally achieved by knowledge graphs driven by knowledge bases, as they can represent facts about and capture relationships between entities very well. However, in domains such as medical information retrieval, where addressing specific information needs of complex queries may require understanding query intent by capturing novel associations between potentially latent concepts, these systems can fall short. In this work, we develop a no…

Cited by 3 publications (7 citation statements)
References 31 publications (50 reference statements)
“…We develop a novel sequence-to-set end-to-end encoder-decoder-based neural framework for multi-label prediction, by training document representations using no external supervision labels, for pseudo-relevance feedback-based unsupervised semantic tagging of a large collection of documents. We find that in this unsupervised task setting of PRF-based semantic tagging for query expansion, a multi-term prediction training objective that jointly optimizes both prediction of the TF-IDF-based document pseudo-labels and the log likelihood of the labels given the document encoding surpasses previous methods such as Phrase2VecGLM (Das et al., 2018) that used neural generalized language models for the same. Our initial hypothesis that bidirectional or self-attentional models could learn the most efficient semantic representations of documents when coupled with a loss more effective than cross-entropy at reducing language model perplexity of document encodings is corroborated in all experimental setups.…”
Section: Discussion
confidence: 81%
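The TF-IDF-based document pseudo-labels mentioned in the quote above can be illustrated with a minimal, stdlib-only sketch: each document is tagged with its top-scoring TF-IDF terms, which then serve as unsupervised "labels". The function name, parameters, and toy scoring below are ours for illustration, not from either paper.

```python
import math
from collections import Counter

def tfidf_pseudo_labels(docs, k=3):
    """Assign each document its top-k TF-IDF terms as pseudo-labels.

    A simplified stand-in for the TF-IDF-based pseudo-labeling step
    the citing work describes (no stemming, stopwording, or phrases).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    labels = []
    for toks in tokenized:
        tf = Counter(toks)
        # Classic tf * idf score per term in this document.
        scores = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        labels.append(sorted(scores, key=scores.get, reverse=True)[:k])
    return labels
```

Terms that are frequent in one document but rare across the collection rank highest, so the resulting pseudo-labels act as cheap topical tags without any external supervision.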
“…We ran several sets of experiments with various document encoders, employing pre-trained and fine-tuned embedding schemes like skip-gram (Mikolov et al., 2013a) and Probabilistic FastText (Athiwaratkun et al., 2018); see Appendix B. The experimental setup is the same as that of Phrase2VecGLM (Das et al., 2018), the only other known system for this dataset, which performs "unsupervised semantic tagging of documents by PRF" for downstream query expansion. We therefore take this system as the current state-of-the-art baseline, while our non-attention-based document encoding models constitute our standard baselines.…”
Section: Unsupervised Task Experiments
confidence: 99%
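The "query expansion by PRF" that these citing works benchmark against can be sketched in a few lines: rank documents by overlap with the query, assume the top-ranked ones are relevant (the pseudo-relevance assumption), and append their most frequent non-query terms to the query. This toy version uses raw term overlap and counts; the function and parameter names are ours, and the actual systems use learned document representations instead.

```python
from collections import Counter

def prf_expand(query, docs, top_docs=2, top_terms=2):
    """Pseudo-relevance feedback query expansion (simplified sketch).

    Scores documents by term overlap with the query, then expands the
    query with the most frequent unseen terms from the top documents.
    """
    q = set(query.lower().split())
    tokenized = [d.lower().split() for d in docs]
    # Rank documents by how many query terms they contain.
    ranked = sorted(range(len(docs)),
                    key=lambda i: len(q & set(tokenized[i])),
                    reverse=True)
    # Pool candidate expansion terms from the assumed-relevant docs.
    feedback = Counter()
    for i in ranked[:top_docs]:
        feedback.update(t for t in tokenized[i] if t not in q)
    expansion = [t for t, _ in feedback.most_common(top_terms)]
    return query.split() + expansion
```

In the systems discussed above, the expansion terms come from semantic tags predicted for the feedback documents rather than raw term counts, but the PRF control flow is the same.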
“…In a related medical IR challenge (Roberts et al. 2017), the authors specifically mention that with only six partially annotated queries for system development, it is likely that systems were either under- or over-tuned on these queries. Since the setup of the seq2set framework is an attempt to model the PRF-based query expansion method of its closest related work (Das et al. 2018), where the effort is also to train a neural generalized language model for unsupervised semantic tagging, we choose this system as the benchmark against which to compare our end-to-end approach for the same task.…”
Section: Related Work
confidence: 99%
“…Experiments, Unsupervised Task Setting: We ran several sets of experiments with various document encoders, employing word embedding schemes like skip-gram (Mikolov et al. 2013) and Probabilistic FastText (Athiwaratkun, Wilson, and Anandkumar 2018). The experimental setup is the same as that of Phrase2VecGLM (Das et al. 2018), the only other known system for this dataset, which performs "unsupervised semantic tagging of documents by PRF" for downstream query expansion. We therefore take this system as the current state-of-the-art baseline, while our non-attention-based document encoding models constitute our standard baselines.…”
Section: Task Settings: Semantic Tagging for Query Expansion
confidence: 99%