Over the past few years, massive amounts of world knowledge have been accumulated in publicly available knowledge bases, such as Freebase, NELL, and YAGO. Yet despite their seemingly huge size, these knowledge bases are greatly incomplete. For example, over 70% of people included in Freebase have no known place of birth, and 99% have no known ethnicity. In this paper, we propose a way to leverage existing Web-search-based question-answering technology to fill in the gaps in knowledge bases in a targeted way. In particular, for each entity attribute, we learn the best set of queries to ask, such that the answer snippets returned by the search engine are most likely to contain the correct value for that attribute. For example, if we want to find Frank Zappa's mother, we could ask the query who is the mother of Frank Zappa. However, this is likely to return 'The Mothers of Invention', which was the name of his band. Our system learns that it should (in this case) add disambiguating terms, such as Zappa's place of birth, in order to make it more likely that the search results contain snippets mentioning his mother. Our system also learns how many different queries to ask for each attribute, since in some cases, asking too many can hurt accuracy (by introducing false positives). We discuss how to aggregate candidate answers across multiple queries, ultimately returning probabilistic predictions for possible values for each attribute. Finally, we evaluate our system and show that it is able to extract a large number of facts with high confidence.
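The aggregation step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say how candidate answers are combined, so the score-summing rule, the function name `aggregate_candidates`, and the toy scores below are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def aggregate_candidates(per_query_candidates):
    """Combine candidate answers from several queries into probabilities.

    per_query_candidates: list of dicts, one per query, mapping a candidate
    value to a non-negative score (hypothetical; e.g. a snippet-match score).
    Returns a dict mapping each candidate to its share of the total score.
    """
    totals = defaultdict(float)
    for candidates in per_query_candidates:
        for value, score in candidates.items():
            totals[value] += score

    norm = sum(totals.values())
    if norm == 0:
        return {}
    return {value: total / norm for value, total in totals.items()}


# Toy input: the plain query surfaces the band name, while a query with a
# disambiguating term (assumed here) surfaces only the intended candidate.
queries = [
    {"The Mothers of Invention": 1.0, "candidate_mother_name": 1.0},
    {"candidate_mother_name": 2.0},
]
predictions = aggregate_candidates(queries)
```

With these made-up scores, the placeholder `candidate_mother_name` receives most of the probability mass because it is supported by both queries. Note that in this sketch every extra query adds its candidates to the pool, including wrong ones.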
Recent years have witnessed a proliferation of large-scale knowledge bases, including Wikipedia, Freebase, YAGO, Microsoft's Satori, and Google's Knowledge Graph. To increase the scale even further, we need to explore automatic methods for constructing knowledge bases. Previous approaches have primarily focused on text-based extraction, which can be very noisy. Here we introduce Knowledge Vault, a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories. We employ supervised machine learning methods for fusing these distinct information sources. The Knowledge Vault is substantially bigger than any previously published structured knowledge repository, and features a probabilistic inference system that computes calibrated probabilities of fact correctness. We report the results of multiple studies that explore the relative utility of the different information sources and extraction methods.
The task of data fusion is to identify the true values of data items (e.g., the true date of birth for Tom Cruise) among multiple observed values drawn from different sources (e.g., Web sites) of varying (and unknown) reliability. A recent survey [20] has provided a detailed comparison of various fusion methods on Deep Web data. In this paper, we study the applicability and limitations of different fusion techniques on a more challenging problem: knowledge fusion. Knowledge fusion identifies true subject-predicate-object triples extracted by multiple information extractors from multiple information sources. These extractors perform the tasks of entity linkage and schema alignment, thus introducing an additional source of noise that is quite different from that traditionally considered in the data fusion literature, which only focuses on factual errors in the original sources. We adapt state-of-the-art data fusion techniques and apply them to a knowledge base with 1.6B unique knowledge triples extracted by 12 extractors from over 1B Web pages, which is three orders of magnitude larger than the data sets used in previous data fusion papers. We show that data fusion approaches hold great promise for solving the knowledge fusion problem, and we suggest interesting research directions through a detailed error analysis of the methods.
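As a rough illustration of the data fusion task described above, the sketch below applies plain majority voting: each data item takes the value reported by the most provenances. This is a deliberately simple baseline chosen for illustration, not one of the adapted techniques the abstract refers to. The function name `fuse_by_vote`, the tuple layout, and the toy observations are assumptions.

```python
from collections import Counter, defaultdict


def fuse_by_vote(observations):
    """Pick the most frequently observed value for each data item.

    observations: iterable of (data_item, value, provenance) tuples, where
    data_item is a (subject, predicate) pair and provenance identifies the
    source or extractor that reported the value.
    Returns {data_item: (winning_value, fraction_of_votes)}.
    """
    counts = defaultdict(Counter)
    for data_item, value, _provenance in observations:
        counts[data_item][value] += 1

    fused = {}
    for data_item, counter in counts.items():
        total = sum(counter.values())
        value, votes = counter.most_common(1)[0]
        fused[data_item] = (value, votes / total)
    return fused


# Toy observations (made up for illustration): four provenances report a
# date of birth, one of them with an error.
item = ("Tom Cruise", "date_of_birth")
observations = [
    (item, "1962-07-03", "site_a"),
    (item, "1962-07-03", "site_b"),
    (item, "1963-07-03", "site_c"),
    (item, "1962-07-03", "site_d"),
]
fused = fuse_by_vote(observations)
```

Majority voting treats every provenance as equally reliable. The abstract's point is that sources and extractors vary in reliability, and that extractors add their own errors, so a realistic method would need to weight provenances rather than count them.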
Advanced oxidation protein products (AOPPs), a novel protein marker of oxidative damage, have been confirmed to accumulate in patients with inflammatory bowel disease (IBD), as well as those with diabetes and chronic kidney disease. However, the role of AOPPs in the intestinal epithelium remains unclear. This study was designed to investigate whether AOPPs have an effect on intestinal epithelial cell (IEC) death and intestinal injury. Immortalized rat intestinal epithelial (IEC-6) cells and normal Sprague Dawley rats were treated with AOPP-albumin prepared by incubation of rat serum albumin (RSA) with hypochlorous acid. Epithelial cell death, nicotinamide adenine dinucleotide phosphate (NADPH) oxidase subunit activity, reactive oxygen species (ROS) generation, apoptosis-related protein expression, and c-jun N-terminal kinase (JNK) phosphorylation were detected both in vivo and in vitro. In addition, we measured AOPP deposition and IEC death in 23 subjects with Crohn's disease (CD). Extracellular AOPP-RSA accumulation induced apoptosis in IEC-6 cultures. The triggering effect of AOPPs was mainly mediated by a redox-dependent pathway, including NADPH oxidase-derived ROS generation, JNK phosphorylation, and poly(ADP-ribose) polymerase-1 (PARP-1) activation. Chronic AOPP-RSA administration to normal rats resulted in AOPP deposition in the villous epithelial cells and in inflammatory cells in the lamina propria. These changes were accompanied by IEC death, inflammatory cellular infiltration, and intestinal injury. Both cell death and intestinal injury were ameliorated by chronic treatment with apocynin. Furthermore, AOPP deposition was also observed in IECs and inflammatory cells in the lamina propria of patients with CD. Tissue with a high AOPP immunoreactive score showed increased apoptosis. Our results demonstrate that AOPPs trigger IEC death and intestinal tissue injury via a redox-mediated pathway.
These data suggest that AOPPs may represent a novel pathogenic factor that contributes to IBD progression. Targeting AOPP-induced cellular mechanisms might emerge as a promising therapeutic option for patients with IBD.