Background: Various methods for differential expression analysis are widely used to identify the features that best distinguish between categories of samples. Multiple hypothesis testing may leave out explanatory features, each of which may be composed of individually insignificant variables. Multivariate hypothesis testing remains outside the mainstream because of the heavy computational overhead of large-scale matrix operations. Random forest provides a classification strategy for calculating variable importance. However, it may be unsuitable for some sample distributions. Results: Building on the idea of an ensemble classifier, we develop a feature selection tool for differential expression analysis on expression profiles (ECFS-DEA). To account for differences in sample distribution, a graphical user interface allows the selection of different base classifiers. Inspired by random forest, a common measure that is applicable to any base classifier is proposed for calculating variable importance. After a feature is interactively selected from the sorted individual variables, a projection heatmap is presented using k-means clustering. A ROC curve is also provided; both intuitively demonstrate the effectiveness of the selected feature. Conclusions: Feature selection through ensemble classifiers helps to select important variables and is thus applicable to different sample distributions. Experiments on simulated and realistic data demonstrate the effectiveness of ECFS-DEA for differential expression analysis on expression profiles. The software is available at http://bio-nefu.com/resource/ecfs-dea.
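As an illustration of a variable-importance measure that works with any base classifier, the following minimal sketch uses out-of-bag permutation importance over a bootstrap ensemble. This is a swapped-in, well-known technique in the spirit of random forest; the abstract does not specify the measure ECFS-DEA actually uses, so the function names (`ensemble_importance`, `make_demo`), the toy `NearestCentroid` base classifier, and all parameters are assumptions made for illustration only.

```python
import random
from statistics import mean


class NearestCentroid:
    """Toy stand-in base classifier (assumption); any object with fit/predict works."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [min(self.centroids, key=lambda c: dist2(x, self.centroids[c]))
                for x in X]


def accuracy(model, X, y):
    preds = model.predict(X)
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def ensemble_importance(X, y, make_classifier, n_rounds=30, seed=0):
    """Mean out-of-bag accuracy drop per variable when that variable is permuted."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    drops = [[] for _ in range(d)]
    for _ in range(n_rounds):
        boot = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]    # held-out samples
        if not oob or len({y[i] for i in boot}) < 2:
            continue                                      # skip degenerate rounds
        model = make_classifier().fit([X[i] for i in boot], [y[i] for i in boot])
        X_oob = [list(X[i]) for i in oob]
        y_oob = [y[i] for i in oob]
        base = accuracy(model, X_oob, y_oob)
        for j in range(d):
            col = [row[j] for row in X_oob]
            rng.shuffle(col)                              # break the link to the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_oob, col)]
            drops[j].append(base - accuracy(model, X_perm, y_oob))
    return [mean(s) if s else 0.0 for s in drops]


def make_demo(n=60, seed=1):
    """Synthetic data: variable 0 separates the classes, variable 1 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label * 4.0 + rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_demo()
    print(ensemble_importance(X, y, NearestCentroid))
```

Because the measure only calls `fit` and `predict`, swapping in a different base classifier requires no change to `ensemble_importance`, which is the property the abstract emphasizes.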
Large-scale flow models constructed using standard coarsening procedures may not accurately resolve detailed near-well effects. Such effects are often important to capture, however, as the interaction of the well with the formation can have a dominant impact on process performance. In this work, a near-well upscaling procedure, which provides three-phase wellblock properties, is developed and tested. The overall approach represents an extension of a recently developed oil-gas upscaling procedure and entails the use of local well computations (over a region referred to as the local well model (LWM)) along with a gradient-based optimization procedure to minimize the mismatch between fine and coarse-scale well rates, for oil, gas, and water, over the LWM. The gradients required for the minimization are computed efficiently through solution of adjoint equations. The LWM boundary conditions are determined using an iterative local-global procedure. With this approach, pressures and saturations computed during a global coarse-scale simulation are interpolated onto LWM boundaries and then used as boundary conditions for the fine-scale LWM computations. In addition to extending the overall approach to the three-phase case, this work also introduces new treatments that provide improved accuracy in cases with significant flux from the gas cap into the well block. The near-well multiphase upscaling method is applied to heterogeneous reservoir models, with production from vertical and horizontal wells. Simulation results illustrate that the method is able to accurately capture key near-well effects and to provide predictions for component production rates that are in close agreement with reference fine-scale results. The level of accuracy of the procedure is shown to be significantly higher than that of a standard approach which uses only upscaled single-phase flow parameters.
Existing end-to-end dialog systems perform less effectively when data is scarce. To achieve acceptable success in real-life online services with only a handful of training examples, both fast adaptability and reliable performance are highly desirable for dialog systems. In this paper, we propose the Meta-Dialog System (MDS), which combines the advantages of both meta-learning approaches and human-machine collaboration. We evaluate our methods on a new extended-bAbI dataset and a transformed MultiWOZ dataset for low-resource goal-oriented dialog learning. Experimental results show that MDS significantly outperforms non-meta-learning baselines and can achieve more than 90% per-turn accuracy with only 10 dialogs on the extended-bAbI dataset.
Existing dialog state tracking (DST) models are trained with dialog data in a random order, neglecting rich structural information in a dataset. In this paper, we propose to use curriculum learning (CL) to better leverage both the curriculum structure and the schema structure for task-oriented dialogs. Specifically, we propose a model-agnostic framework called Schema-aware Curriculum Learning for Dialog State Tracking (SaCLog), which consists of a preview module that pre-trains a DST model with schema information, a curriculum module that optimizes the model with CL, and a review module that augments mispredicted data to reinforce the CL training. We show that our proposed approach improves DST performance over both a transformer-based and an RNN-based DST model (TripPy and TRADE, respectively) and achieves new state-of-the-art results on WOZ2.0 and MultiWOZ2.1.
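The general idea of replacing random-order training with a curriculum can be sketched as follows. This is a generic easy-to-hard staging scheme, not SaCLog's actual scheduler: the abstract does not describe how difficulty is scored or how stages are formed, so `curriculum_stages`, the `difficulty` callback, and the turn-count heuristic in the usage note are all hypothetical.

```python
def curriculum_stages(examples, difficulty, n_stages=3):
    """Return cumulative training pools ordered from easy to hard.

    Stage s contains the easiest ceil(len(examples) * s / n_stages) examples,
    so each stage is a superset of the previous one and the last stage
    covers the whole dataset.
    """
    ranked = sorted(examples, key=difficulty)  # easiest first
    stages = []
    for s in range(1, n_stages + 1):
        cutoff = -(-len(ranked) * s // n_stages)  # integer ceiling division
        stages.append(ranked[:cutoff])
    return stages
```

A training loop would then iterate over the stages in order, shuffling within each pool. For instance, with dialogs represented as strings and the number of whitespace-separated tokens used as a stand-in difficulty score, `curriculum_stages(dialogs, lambda d: len(d.split()))` yields three progressively larger pools.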