We examine the problem of generating a set of diverse paraphrase sentences while (1) preserving the meaning of the original sentence and (2) imposing diversity in various semantic aspects, such as lexical or syntactic structure. Existing work on paraphrase generation has focused more on the former, while the latter has been treated as a fixed style transfer, such as transferring from positive to negative sentiment, even at the cost of losing semantics. In this work, we consider style transfer as a means of imposing diversity, under a paraphrasing correctness constraint that the target sentence must remain a paraphrase of the original sentence. Our goal is to maximize the diversity of a set of k generated paraphrases, which we call the diversified paraphrase (DP) problem. Our key contribution is deciding the style guidance at generation time in the direction that increases the diversity of the output with respect to the paraphrases generated previously. As pre-materializing training data for all style decisions is impractical, we train with biased data, but with debiasing guidance. Compared to state-of-the-art methods, our proposed model generates more diverse yet semantically consistent paraphrase sentences. Specifically, our model, trained on the MSCOCO dataset, achieves the highest embedding scores, .94/.95/.86, similar to state-of-the-art results, but with an mBLEU score that is 8.73% lower (more diverse).
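The DP objective, choosing each new output so that it differs from those already produced, can be illustrated with a minimal sketch. This is not the model described in the abstract, which decides style guidance during generation; it is only a post-hoc greedy selection over a pool of candidates that are assumed to have already passed a paraphrase check. The token-set overlap used here is a crude stand-in for mBLEU, and the names `jaccard`, `mutual_overlap`, and `select_diverse` are hypothetical.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two sentences (a crude stand-in for BLEU)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def mutual_overlap(sentences: list[str]) -> float:
    """Mean pairwise overlap within a set; lower means more diverse."""
    pairs = list(combinations(sentences, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def select_diverse(candidates: list[str], k: int) -> list[str]:
    """Greedily pick up to k candidates.

    Assumption: every candidate is already a valid paraphrase. Each step
    picks the candidate whose worst-case overlap with the sentences
    chosen so far is smallest.
    """
    if not candidates or k <= 0:
        return []
    chosen = [candidates[0]]
    remaining = list(candidates[1:])
    while remaining and len(chosen) < k:
        best = min(remaining, key=lambda c: max(jaccard(c, s) for s in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With a pool such as "a man rides a horse", "a man rides a brown horse", and "someone is on horseback", selecting two sentences keeps the first and the third, since the second shares most of its tokens with the first.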
As amyloid-β (Aβ) peptide is considered a biomarker and pathological culprit of Alzheimer's disease, Aβ-targeting compounds have been investigated for diagnostic development and drug discovery for the disorder. Unlike amyloid plaque-targeting agents, such as clinically available amyloid radiotracers that intercalate into the β-sheet structures of the aggregates, monomer- and oligomer-targeting chemicals are difficult to develop, as the transient and polymorphic nature of these peptides impedes their structural understanding. Here, we report a mapping approach to explore the targeting residues of Aβ-imaging probes and Aβ-regulating drug candidates by utilizing a set of fragmented Aβ hexamers immobilized on a 96-well microplate in combination with fluorescent full-length Aβ for on-plate aggregation. To evaluate the mapping potential of the peptide plate, we tested previously reported fluorescent imaging agents (CRANAD-28, bis-ANS), aggregation inhibitors (curcumin, scyllo-inositol), and aggregate dissociators (necrostatin-1, sunitinib) targeting Aβ. Our approach enabled a mechanistic understanding of compounds targeting nonfibrillar Aβ at the level of the interacting sequence.
This paper studies the problem of supporting question answering in a new language with limited training resources. In the extreme scenario where no such resource exists, one can (1) transfer labels from another language and (2) generate labels from unlabeled data, using a translator and an automatic labeling function, respectively. However, these approaches inevitably introduce noise into the training data, due to translation or generation errors, so the data must be used judiciously according to its varying confidence. To address this challenge, we propose a weakly supervised framework that quantifies the noise in automatically generated labels in order to deemphasize or fix noisy data during training. On a reading comprehension task, we demonstrate the effectiveness of our model on low-resource languages with varying similarity to English, namely Korean and French.
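One common way to deemphasize noisy examples is to weight each example's loss by a confidence score. The sketch below shows that general idea only. The abstract does not say how the framework quantifies noise or how it fixes labels, so the confidence values are assumed to be given, and the function name `weighted_nll` is hypothetical.

```python
import math


def weighted_nll(label_probs: list[float], confidences: list[float]) -> float:
    """Confidence-weighted mean negative log-likelihood.

    label_probs: model probability assigned to each (possibly noisy) label.
    confidences: assumed per-example trust in [0, 1]; 0 ignores the example.
    """
    if len(label_probs) != len(confidences):
        raise ValueError("label_probs and confidences must have equal length")
    total_weight = sum(confidences)
    if total_weight == 0:
        return 0.0
    eps = 1e-12  # guards against log(0)
    weighted_sum = sum(
        -w * math.log(max(p, eps)) for p, w in zip(label_probs, confidences)
    )
    return weighted_sum / total_weight
```

In this sketch, a translated or generated example with low confidence contributes little to the training signal, while an example with confidence 0 is dropped entirely.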