Paraphrases are sentences or phrases that convey the same meaning using different wording. Although the logical definition of paraphrases requires strict semantic equivalence, linguistics accepts a broader, approximate equivalence, thereby allowing far more examples of "quasi-paraphrase." But approximate equivalence is hard to define. Thus, the phenomenon of paraphrases, as understood in linguistics, is difficult to characterize. In this article, we list a set of 25 operations that generate quasi-paraphrases. We then empirically validate the scope and accuracy of this list by manually analyzing random samples of two publicly available paraphrase corpora. Finally, we provide the distribution of naturally occurring quasi-paraphrases in English text.
We present a graph-based semi-supervised label propagation algorithm for acquiring open-domain labeled classes and their instances from a combination of unstructured and structured text sources. This acquisition method significantly improves coverage compared to a previous set of labeled classes and instances derived from free text, while achieving comparable precision.
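The abstract does not spell out the propagation rule, so the following is a minimal sketch of generic iterative label propagation, not the authors' algorithm. It assumes a hypothetical weighted graph whose nodes are candidate instances, with edges standing in for evidence extracted from unstructured or structured text, plus a few seed nodes with known class labels that stay clamped while their label scores spread to neighbors. The function name, graph, and labels are all illustrative.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=10):
    """Generic iterative label propagation (hypothetical sketch, not the paper's method).

    edges: list of (u, v, weight) tuples forming an undirected graph.
    seeds: dict mapping node -> {label: score}; seed scores stay fixed.
    Returns a dict mapping every node to its {label: score} estimate.
    """
    neighbors = defaultdict(list)
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    # Start from the seed labels; non-seed nodes begin with no label mass.
    scores = {node: dict(seeds.get(node, {})) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                # Clamp seed nodes to their known labels.
                updated[node] = dict(seeds[node])
                continue
            acc = defaultdict(float)
            total = sum(w for _, w in nbrs)
            for nbr, w in nbrs:
                for label, s in scores[nbr].items():
                    acc[label] += w * s
            # Weighted average of the neighbors' label scores from the previous round.
            updated[node] = {label: s / total for label, s in acc.items()} if total > 0 else {}
        scores = updated
    return scores


# Hypothetical toy graph: edges stand in for co-occurrence evidence.
edges = [("Paris", "Lyon", 1.0), ("Lyon", "Marseille", 1.0), ("Toyota", "Honda", 1.0)]
seeds = {"Paris": {"city": 1.0}, "Toyota": {"car maker": 1.0}}
result = propagate_labels(edges, seeds)
print(result["Lyon"])   # {'city': 0.96875}
print(result["Honda"])  # {'car maker': 1.0}
```

With more iterations, Lyon and Marseille approach a score of 1.0 for `city`, while the disconnected Toyota and Honda component never receives that label; labels travel only along edges, which is why combining evidence from several text sources can widen coverage.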
Paraphrases are textual expressions that convey the same meaning using different surface forms. Capturing the variability of language, they play an important role in many natural language applications, including question answering, machine translation, and multi-document summarization. In linguistics, paraphrases are characterized by approximate conceptual equivalence. Since no automated semantic interpretation system available today can identify conceptual equivalence, paraphrases are difficult to acquire without human effort. In this paper, we present a method for automatically acquiring paraphrases from a monolingual corpus. We learn paraphrases at both the surface and lexico-syntactic levels and build two paraphrase resources, each containing about 2 million phrases. We evaluate these paraphrases extrinsically by using them to learn patterns for Information Extraction (IE). We show that the lexico-syntactic paraphrases perform better than the surface-level paraphrases for IE. We further show that the patterns learned using the lexico-syntactic paraphrases attain performance comparable to the traditional IE approach of learning patterns from domain-specific corpora.
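The abstract does not describe how paraphrases are mined from the monolingual corpus. One standard approach consistent with it is distributional similarity: phrases that occur in similar contexts are treated as candidate paraphrases. The sketch below assumes that approach, uses only one-word left and right contexts with cosine similarity, and covers surface-level phrases only. The corpus, phrase list, and function names are hypothetical, and the sketch reproduces neither the paper's lexico-syntactic level nor its scale.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, phrases, window=1):
    """Map each target phrase to a Counter of (side, word) context features."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        tokens = sent.lower().split()
        for phrase in phrases:
            p = phrase.split()
            n = len(p)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == p:
                    # Record up to `window` words on each side of the match.
                    for w in tokens[max(0, i - window):i]:
                        vectors[phrase][("L", w)] += 1
                    for w in tokens[i + n:i + n + window]:
                        vectors[phrase][("R", w)] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_paraphrases(sentences, phrases, target):
    """Rank candidate phrases by contextual similarity to `target`."""
    vecs = context_vectors(sentences, phrases)
    candidates = [(p, cosine(vecs[target], vecs[p])) for p in phrases if p != target]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus.
corpus = [
    "google acquired youtube in 2006",
    "google bought youtube in 2006",
    "facebook acquired instagram in 2012",
    "facebook bought instagram in 2012",
    "obama visited paris in 2009",
]
ranked = rank_paraphrases(corpus, ["acquired", "bought", "visited"], "acquired")
print(ranked)  # [('bought', 1.0), ('visited', 0.0)]
```

Note that plain distributional similarity also rewards phrases such as antonyms that share contexts, which is one reason extrinsic evaluation (here, IE pattern learning) is useful.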
<p>An ever-increasing energy demand and the
environmental problems associated with exhaustible fossil fuels have led to the
search for alternative renewable sources of energy. In this context, biodiesel
has attracted worldwide attention as an alternative to fossil fuel because it is
renewable, non-toxic, biodegradable, and carbon-neutral, and hence eco-friendly.
Although homogeneous catalysts have their own merits, much attention is currently
being paid to the chemical synthesis of heterogeneous catalysts for biodiesel
production, as they can be tuned to specific requirements and easily recovered,
which enhances their reusability. Recently, biomass-derived heterogeneous catalysts
have risen to the forefront of biodiesel production because of their sustainable,
economical, and eco-friendly nature. Further, nanocatalysts and bifunctional
catalysts have emerged as powerful catalysts, owing, respectively, to their high
surface area and their potential to convert both free fatty acids and
triglycerides to biodiesel. This review highlights the latest synthesis routes
for various types of catalysts, including acidic, basic, bifunctional, and
nanocatalysts derived from different chemicals as well as from biomass. In
addition, the impact of different catalyst preparation methods on biodiesel yield
is discussed in detail.</p>
Extreme multi-label classification (XMC) systems have been successfully applied in e-commerce (Shen et al., 2020; Dahiya et al., 2021) to retrieve products based on customer behavior. Such systems require large amounts of customer behavior data (e.g., queries, clicks, and purchases) for training. However, behavioral data is limited in low-traffic e-commerce stores, which hurts the performance of these systems. In this paper, we present a technique that augments behavioral training data via query reformulation. We use the Aggregated Label eXtreme Multi-label Classification (AL-XMC) system (Shen et al., 2020) as an example semantic matching model and show via crowd-sourced human judgments that, when the training data is augmented through query reformulations, the quality of AL-XMC improves over a baseline that does not use query reformulation. We also show in online A/B tests that our method significantly improves business metrics for the AL-XMC model.
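The abstract does not say how reformulations are generated or weighted. The sketch below assumes they are already available as a hypothetical mapping from each logged query to alternative phrasings, and copies each (query, product) training pair onto those reformulations at a reduced, hypothetical weight, skipping duplicates. It illustrates only the augmentation step; the AL-XMC model itself is not shown, and the product IDs are made up.

```python
from typing import Dict, List, Tuple


def augment_with_reformulations(
    pairs: List[Tuple[str, str]],
    reformulations: Dict[str, List[str]],
    weight: float = 0.5,  # hypothetical down-weight for synthetic pairs
) -> List[Tuple[str, str, float]]:
    """Return weighted (query, product, weight) training examples.

    Original behavioral pairs keep weight 1.0; each reformulated query
    inherits the product of its source query at `weight`.
    """
    augmented: List[Tuple[str, str, float]] = []
    seen = set()
    # Keep the genuine behavioral pairs first, deduplicated.
    for query, product in pairs:
        if (query, product) not in seen:
            augmented.append((query, product, 1.0))
            seen.add((query, product))
    # Add reformulated queries that map to the same products.
    for query, product in pairs:
        for alt in reformulations.get(query, []):
            if (alt, product) not in seen:
                augmented.append((alt, product, weight))
                seen.add((alt, product))
    return augmented


# Hypothetical low-traffic store log and reformulation table.
log = [("running shoes", "SKU1"), ("yoga mat", "SKU2")]
alts = {"running shoes": ["jogging sneakers", "running shoes"]}
print(augment_with_reformulations(log, alts))
# [('running shoes', 'SKU1', 1.0), ('yoga mat', 'SKU2', 1.0), ('jogging sneakers', 'SKU1', 0.5)]
```

Down-weighting synthetic pairs is one way to keep noisy reformulations from overwhelming genuine behavioral signal; whether the original work does this is not stated in the abstract.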