Keyphrase provides highly-summative information that can be effectively used for understanding, organizing and retrieving text content. Though previous studies have provided many workable solutions for automated keyphrase extraction, they commonly divided the to-be-summarized content into multiple text chunks, then ranked and selected the most meaningful ones. These approaches could neither identify keyphrases that do not appear in the text, nor capture the real semantic meaning behind the text. We propose a generative model for keyphrase prediction with an encoder-decoder framework, which can effectively overcome the above drawbacks. We name it as deep keyphrase generation since it attempts to capture the deep semantic meaning of the content with a deep learning method. Empirical analysis on six datasets demonstrates that our proposed model not only achieves a significant performance boost on extracting keyphrases that appear in the source text, but also can generate absent keyphrases based on the semantic meaning of the text. Code and dataset are available at https://github.com/memray/seq2seq-keyphrase.
Different texts shall by nature correspond to different number of keyphrases. This desideratum is largely missing from existing neural keyphrase generation models. In this study, we address this problem from both modeling and evaluation perspectives.We first propose a recurrent generative model that generates multiple keyphrases as delimiter-separated sequences. Generation diversity is further enhanced with two novel techniques by manipulating decoder hidden states. In contrast to previous approaches, our model is capable of generating diverse keyphrases and controlling number of outputs.We further propose two evaluation metrics tailored towards the variable-number generation. We also introduce a new dataset (ST A C KEX) that expands beyond the only existing genre (i.e., academic writing) in keyphrase generation tasks. With both previous and new evaluation metrics, our model outperforms strong baselines on all datasets.
Sentence simplification aims to reduce the complexity of a sentence while retaining its original meaning. Current models for sentence simplification adopted ideas from machine translation studies and implicitly learned simplification mapping rules from normalsimple sentence pairs. In this paper, we explore a novel model based on a multi-layer and multi-head attention architecture and we propose two innovative approaches to integrate the Simple PPDB (A Paraphrase Database for Simplification), an external paraphrase knowledge base for simplification that covers a wide range of real-world simplification rules. The experiments show that the integration provides two major benefits: (1) the integrated model outperforms multiple stateof-the-art baseline models for sentence simplification in the literature (2) through analysis of the rule utilization, the model seeks to select more accurate simplification rules. The code and models used in the paper are available at https://github.com/ Sanqiang/text_simplification.
Recently, concatenating multiple keyphrases as a target sequence has been proposed as a new learning paradigm for keyphrase generation. Existing studies concatenate target keyphrases in different orders but no study has examined the effects of ordering on models' behavior. In this paper, we propose several orderings for concatenation and inspect the important factors for training a successful keyphrase generation model. By running comprehensive comparisons, we observe one preferable ordering and summarize a number of empirical findings and challenges, which can shed light on future research on this line of work.
No abstract
Objective This study aimed to conduct a systematic review and meta-analysis to compare differences in health utilities (HUs) assessed by self and proxy respondents in children, as well as to evaluate the effects of health conditions, valuation methods, and proxy types on the differences. Methods Eligible studies published in PubMed, Embase, Web of Science, and Cochrane Library up to December 2019 were identified according to PRISMA guidelines. Meta-analyses were performed to calculate the weighted mean differences (WMDs) in HUs between proxy- versus self-reports. Mixed-effects meta-regressions were applied to explore differences in WMDs among each health condition, valuation method and proxy type. Results A total of 30 studies were finally included, comprising 211 pairs of HUs assessed by 15,294 children and 16,103 proxies. This study identified 34 health conditions, 10 valuation methods, and 3 proxy types. In general, proxy-reported HUs were significantly different from those assessed by children themselves, while the direction and magnitude of these differences were inconsistent regarding health conditions, valuation methods, and proxy types. Meta-regression demonstrated that WMDs were significantly different in patients with ear diseases relative to the general population; in those measured by EQ-5D, Health utility index 2 (HUI2), and Pediatric asthma health outcome measure relative to Visual analogue scale method; while were not significantly different in individuals adopting clinician-proxy and caregiver-proxy relative to parent-proxy. Conclusion Divergence existed in HUs between self and proxy-reports. Our findings highlight the importance of selecting appropriate self and/or proxy-reported HUs in health-related quality of life measurement and economic evaluations.
Different texts shall by nature correspond to different number of keyphrases. This desideratum is largely missing from existing neural keyphrase generation models. In this study, we address this problem from both modeling and evaluation perspectives.We first propose a recurrent-generative model that generates multiple keyphrases as delimiter-separated sequences. Generation diversity is further enhanced with two novel techniques by manipulating decoder hidden states. In contrast to previous approaches, our model is capable of generating variable number of diverse keyphrases.We further propose two evaluation metrics tailored towards variable-number generation. We also introduce a new dataset (ST A C KEX) that expand beyond the only existing genre (i.e., academic writing) in keyphrase generation tasks. With both previous and new evaluation metrics, our model outperforms strong baselines on all datasets.
has been administrating a national cervical cancer screening program since 1992 by coordinating triennial cytology exam screenings for the female population between 25 and 69 years of age. Up to 80% of cancers are prevented through mass screening, but this comes at the expense of considerable screening activity and leads to overtreatment of clinically asymptomatic precancers. In this article, we present a continuous-time, time-inhomogeneous hidden Markov model which was developed to understand the screening process and cervical cancer carcinogenesis in detail. By leveraging 1.7 million individual's multivariate time-series of medical exams performed over a 25-year period, we simultaneously estimate all model parameters. We show that an age-dependent model reflects the Norwegian screening program by comparing empirical survival curves from observed registry data and data simulated from the proposed model. The model can be generalized to include more detailed individual-level covariates as well as new types of screening exams. By utilizing individual screening histories and covariate data, the proposed model shows potential for improving strategies for cancer screening programs by personalizing recommended screening intervals.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.