Manirupa Das scite author profile

Emerging infectious diseases are critical issues of public health and the economic and social stability of nations. As demonstrated by the international response to the severe acute respiratory syndrome (SARS) and influenza A, rapid genomic sequencing is a crucial tool to understand diseases that occur at the interface of human and animal populations. However, our ability to make sense of sequence data lags behind our ability to acquire the data. The potential of sequence data on pathogens is not fully realized until raw data are translated into public health intelligence. Sequencing technologies have become highly mechanized. If the political will for data sharing remains strong, the frontier for progress in emerging infectious diseases will be in analysis of sequence data and translation of results into better public health science and policy. For example, applying analytical tools such as Supramap (http://supramap.osu.edu) to genomic data for pathogens, public health scientists can track specific mutations in pathogens that confer the ability to infect humans or resist drugs. The results produced by the Supramap application are compelling visualizations of pathogen lineages and features mapped into geographic information systems that can be used to test hypotheses and to follow the spread of diseases across geography and hosts and communicate the results to a wide audience.


Phrase2VecGLM: Neural generalized language model–based semantic tagging for complex query reformulation in medical IR

Das, Fosler‐Lussier, Lin et al. 2018


In fact-based information retrieval, state-of-the-art performance is traditionally achieved by knowledge graphs driven by knowledge bases, as they can represent facts about and capture relationships between entities very well. However, in domains such as medical information retrieval, where addressing specific information needs of complex queries may require understanding query intent by capturing novel associations between potentially latent concepts, these systems can fall short. In this work, we develop a novel, completely unsupervised, neural language model-based ranking approach for semantic tagging of documents, using the document to be tagged as a query into the model to retrieve candidate phrases from top-ranked related documents, thus associating every document with novel related concepts extracted from the text. For this we extend the word embedding-based generalized language model (GLM) due to Ganguly et al. (2015) to employ phrasal embeddings, and use the semantic tags thus obtained for downstream query expansion, both directly and in feedback loop settings. Our method, evaluated using the TREC 2016 clinical decision support challenge dataset, shows statistically significant improvement not only over various baselines that use standard MeSH terms and UMLS concepts for query expansion, but also over baselines using human expert-assigned concept tags for the queries, on top of a standard Okapi BM25-based document retrieval system.
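The retrieval setup this abstract mentions, query expansion on top of an Okapi BM25 ranker, can be illustrated with a minimal sketch. This is not the Phrase2VecGLM model. The toy documents, the query, and the hand-picked `tags` list are hypothetical stand-ins for the semantic tags the paper's model would produce, and `k1` and `b` are set to commonly used defaults rather than values taken from the paper.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant, as used by several common implementations.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def expand_query(query_terms, tags):
    """Direct query expansion: append tag terms that are not already in the query."""
    expanded = list(query_terms)
    for tag in tags:
        if tag not in expanded:
            expanded.append(tag)
    return expanded


# Hypothetical toy collection (already tokenized).
docs = [
    ["chest", "pain", "myocardial", "infarction", "treatment"],
    ["heart", "attack", "risk", "factors"],
    ["seasonal", "influenza", "vaccine", "trial"],
]
query = ["heart", "attack"]
tags = ["myocardial", "infarction"]  # assumed tags, chosen by hand for illustration

base = bm25_scores(query, docs)
expanded = bm25_scores(expand_query(query, tags), docs)
```

In this toy case the first document shares no term with the original query, so its baseline score is zero; after expansion it matches the added tag terms and receives a positive score, while the unrelated third document stays at zero.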


Sequence-to-Set Semantic Tagging for Complex Query Reformulation and Automated Text Categorization in Biomedical IR using Self-Attention

Das, Li, Fosler‐Lussier et al. 2020


Novel contexts, comprising a set of terms referring to one or more concepts, may often arise in complex querying scenarios such as in evidence-based medicine (EBM) involving biomedical literature. These may not explicitly refer to entities or canonical concept forms occurring in a fact-based knowledge source, e.g. the UMLS ontology. Moreover, hidden associations between related concepts meaningful in the current context may not exist within a single document, but across documents in the collection. Predicting semantic concept tags of documents can therefore serve to associate documents related in unseen contexts, or categorize them, in information filtering or retrieval scenarios. Thus, inspired by the success of sequence-to-sequence neural models, we develop a novel sequence-to-set framework with attention, for learning document representations in a unique unsupervised setting, using no human-annotated document labels or external knowledge resources and only corpus-derived term statistics to drive the training. This can effect term transfer within a corpus for semantically tagging a large collection of documents. Our sequence-to-set modeling approach to predicting semantic tags gives, to the best of our knowledge, the state of the art both for an unsupervised query expansion (QE) task on the TREC CDS 2016 challenge dataset, when evaluated on an Okapi BM25-based document retrieval system, and over the MLTM system baseline (Soleimani and Miller, 2016) for supervised and semi-supervised multi-label prediction tasks with the del.icio.us and Ohsumed datasets. We make our code and data publicly available.


High-spin states of $$^{204}$$At: isomeric states and shears band structure

et al. 2022




Manirupa Das

Towards methods for systematic research on big data

Genome informatics of influenza A: from data sharing to shared analytical capabilities
