Database researchers have striven to improve the capability of a database in terms of both performance and functionality. We assert that the usability of a database is as important as its capability. In this paper, we study why database systems today are so difficult to use. We identify a set of five pain points and propose a research agenda to address them. In particular, we introduce a presentation data model and recommend direct data manipulation with a schema-later approach. We also stress the importance of provenance and of consistency across presentation models.
Regular expressions have served as the dominant workhorse of practical information extraction for several years. However, there has been little work on reducing the manual effort involved in building high-quality, complex regular expressions for information extraction tasks. In this paper, we propose ReLIE, a novel transformation-based algorithm for learning such complex regular expressions. We evaluate the performance of our algorithm on multiple datasets and compare it against the CRF algorithm. We show that ReLIE, in addition to being an order of magnitude faster, outperforms CRF under conditions of limited training data and cross-domain data. Finally, we show how the accuracy of CRF can be improved by using features extracted by ReLIE.
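As a rough illustration of the idea of refining a regular expression against labeled data, the sketch below scores a few candidate patterns by F1 and keeps the best one. It is not the ReLIE algorithm itself: the abstract does not describe its transformations, so the candidate patterns, the sample text, the gold matches, and the helper names `f1_score` and `pick_best` are all hypothetical.

```python
import re

def f1_score(pattern, text, gold):
    """F1 of the set of strings matched by `pattern` against the gold set."""
    found = {m.group() for m in re.finditer(pattern, text)}
    true_pos = len(found & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(found)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)

def pick_best(candidates, text, gold):
    """Return the candidate pattern with the highest F1 (one greedy step)."""
    return max(candidates, key=lambda p: f1_score(p, text, gold))

# Hypothetical labeled example: extract phone numbers, not dates or room numbers.
text = "Call 555-1234 or 555-9876 before 2024-01-15, room 12."
gold = {"555-1234", "555-9876"}

# Hypothetical candidates: a loose starting pattern and two variants of it.
candidates = [r"\d+", r"\d+-\d+", r"\d{3}-\d{4}"]
best = pick_best(candidates, text, gold)
```

A full learner would generate the candidates automatically from the current best pattern and repeat the step until the score stops improving.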
We examine wet scavenging of soluble trace gases in storms observed during the Deep Convective Clouds and Chemistry (DC3) field campaign. We conduct high‐resolution simulations with the Weather Research and Forecasting model with Chemistry (WRF‐Chem) of a severe storm in Oklahoma. The model represents well the storm location, size, and structure as compared with Next Generation Weather Radar reflectivity, and simulated CO transport is consistent with aircraft observations. Scavenging efficiencies (SEs) between inflow and outflow of soluble species are calculated from aircraft measurements and model simulations. Using a simple wet scavenging scheme, we simulate the SE of each soluble species within the error bars of the observations. The simulated SEs of all species except nitric acid (HNO3) are highly sensitive to the values specified for the fractions retained in ice when cloud water freezes. To reproduce the observations, we must assume zero ice retention for formaldehyde (CH2O) and hydrogen peroxide (H2O2) and complete retention for methyl hydroperoxide (CH3OOH) and sulfur dioxide (SO2), likely to compensate for the lack of aqueous chemistry in the model. We then compare scavenging efficiencies among storms that formed in Alabama and northeast Colorado and the Oklahoma storm. Significant differences in SEs are seen among storms and species. More scavenging of HNO3 and less removal of CH3OOH are seen in storms with higher maximum flash rates, an indication of more graupel mass. Graupel is associated with mixed‐phase scavenging and lightning production of nitrogen oxides (NOx), processes that may explain the observed differences in HNO3 and CH3OOH scavenging.
We have developed semi‐independent methods for determining CH2O scavenging efficiencies (SEs) during strong midlatitude convection over the western, south‐central Great Plains, and southeastern regions of the United States during the 2012 Deep Convective Clouds and Chemistry (DC3) Study. The Weather Research and Forecasting model coupled with chemistry (WRF‐Chem) was employed to simulate one DC3 case to provide an independent approach to estimating SEs and the opportunity to study CH2O retention in ice when liquid drops freeze. Measurements of CH2O in storm inflow and outflow were acquired on board the NASA DC‐8 and the NSF/National Center for Atmospheric Research Gulfstream V (GV) aircraft employing cross‐calibrated infrared absorption spectrometers. This study also relied heavily on the nonreactive tracers i‐/n‐butane and i‐/n‐pentane measured on both aircraft in determining lateral entrainment rates during convection as well as their ratios to ensure that inflow and outflow air masses did not have different origins. Of the five storm cases studied, the various tracer measurements showed that the inflow and outflow from four storms were coherently related. The combined average of the various approaches from these storms yields remarkably consistent CH2O scavenging efficiencies: 54% ± 3% for 29 May; 54% ± 6% for 6 June; 58% ± 13% for 11 June; and 41% ± 4% for 22 June. The WRF‐Chem SE result of 53% for 29 May was achieved only when assuming complete CH2O degassing from ice. Further analysis indicated that proper selection of corresponding inflow and outflow time segments is more important than the particular mixing model employed.
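To make the arithmetic behind a scavenging efficiency concrete, here is a minimal sketch under a simple two-component mixing assumption: a nonreactive, insoluble tracer gives the fraction of entrained background air, and the SE is the percentage shortfall of the observed outflow relative to the outflow expected from mixing alone. The abstract does not give the mixing models actually used, so the formulas, the function names, and every number below are illustrative assumptions rather than the study's method or data.

```python
def entrainment_fraction(tracer_in, tracer_out, tracer_bg):
    """Fraction of outflow air that is entrained background air.

    Assumes two-component mixing of an insoluble, nonreactive tracer:
    tracer_out = (1 - f) * tracer_in + f * tracer_bg
    """
    return (tracer_in - tracer_out) / (tracer_in - tracer_bg)

def scavenging_efficiency(c_in, c_out, c_bg, f_entrain):
    """Percent of a soluble species removed between inflow and outflow.

    The expected outflow is what mixing alone would give; the SE is the
    relative shortfall of the observed outflow, in percent.
    """
    expected = (1.0 - f_entrain) * c_in + f_entrain * c_bg
    return 100.0 * (1.0 - c_out / expected)

# Hypothetical mixing ratios (arbitrary units), not measurements from DC3.
f = entrainment_fraction(tracer_in=100.0, tracer_out=80.0, tracer_bg=50.0)
se = scavenging_efficiency(c_in=2.0, c_out=0.7, c_bg=0.5, f_entrain=f)
print(f"entrained fraction = {f:.2f}, SE = {se:.0f}%")
```

With these made-up inputs the entrained fraction is 0.4, the mixing-only outflow would be 1.4, and an observed outflow of 0.7 corresponds to an SE of 50%.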
Entity resolution (ER) is the task of identifying different representations of the same real-world entities across databases. It is a key step for knowledge base creation and text mining. Recent adaptation of deep learning methods for ER mitigates the need for dataset-specific feature engineering by constructing distributed representations of entity records. While these methods achieve state-of-the-art performance over benchmark data, they require large amounts of labeled data, which are typically unavailable in realistic ER applications. In this paper, we develop a deep learning-based method that targets low-resource settings for ER through a novel combination of transfer learning and active learning. We design an architecture that allows us to learn a transferable model from a high-resource setting to a low-resource one. To further adapt to the target dataset, we incorporate active learning that carefully selects a few informative examples to fine-tune the transferred model. Empirical evaluation demonstrates that our method achieves comparable, if not better, performance compared to state-of-the-art learning-based methods while using an order of magnitude fewer labels. (Footnote 1: 17k labels were used for the DBLP-Scholar scenario.)
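The selection step in active learning can be sketched with plain uncertainty sampling: ask for labels on the record pairs whose predicted match probability is closest to 0.5. The abstract does not say which selection criterion the authors use, so this is a common stand-in rather than their method, and the function name `select_uncertain` and the probabilities are hypothetical.

```python
def select_uncertain(match_probs, k):
    """Return indices of the k pairs whose match probability is nearest 0.5.

    `match_probs` holds the transferred model's predicted probability that
    each unlabeled record pair refers to the same entity.
    """
    ranked = sorted(range(len(match_probs)),
                    key=lambda i: abs(match_probs[i] - 0.5))
    return ranked[:k]

# Hypothetical predictions for five unlabeled record pairs.
probs = [0.95, 0.52, 0.10, 0.45, 0.70]
to_label = select_uncertain(probs, k=2)
```

The selected pairs would then be labeled by a person and used to fine-tune the model before the next round of selection.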
Semantic role labeling (SRL) is crucial to natural language understanding as it identifies the predicate-argument structure in text with semantic labels. Unfortunately, resources required to construct SRL models are expensive to obtain and simply do not exist for most languages. In this paper, we present a two-stage method to enable the construction of SRL models for resource-poor languages by exploiting monolingual SRL and multilingual parallel data. Experimental results show that our method outperforms existing methods. We use our method to generate Proposition Banks with high to reasonable quality for 7 languages in three language families and release these resources to the research community.
Deep convective transport of gaseous precursors to ozone (O3) and aerosols to the upper troposphere is affected by liquid phase and mixed‐phase scavenging, entrainment of free tropospheric air, and aqueous chemistry. The contributions of these processes are examined using aircraft measurements obtained in storm inflow and outflow during the 2012 Deep Convective Clouds and Chemistry (DC3) experiment combined with high‐resolution (dx≤3 km) WRF‐Chem simulations of a severe storm, an air mass storm, and a mesoscale convective system (MCS). The simulation results for the MCS suggest that formaldehyde (CH2O) is not retained in ice when cloud water freezes, in agreement with previous studies of the severe storm. By analyzing WRF‐Chem trajectories, the effects of scavenging, entrainment, and aqueous chemistry on outflow mixing ratios of CH2O, methyl hydroperoxide (CH3OOH), and hydrogen peroxide (H2O2) are quantified. Liquid phase microphysical scavenging was the dominant process reducing CH2O and H2O2 outflow mixing ratios in all three storms. Aqueous chemistry did not significantly affect outflow mixing ratios of any of the three species. In the severe storm and MCS, the higher than expected reductions in CH3OOH mixing ratios in the storm cores were primarily due to entrainment of low‐background CH3OOH. In the air mass storm, lower CH3OOH and H2O2 scavenging efficiencies (SEs) than in the MCS were partly due to entrainment of higher background CH3OOH and H2O2. Overestimated rain and hail production in WRF‐Chem reduces the confidence in ice retention fraction values determined for the peroxides and CH2O.