Long-term traffic prediction is highly challenging due to the complexity of traffic systems and the constantly changing nature of many impacting factors. In this paper, we focus on the spatio-temporal factors, and propose a graph multi-attention network (GMAN) to predict traffic conditions for time steps ahead at different locations on a road network graph. GMAN adopts an encoder-decoder architecture, where both the encoder and the decoder consist of multiple spatio-temporal attention blocks to model the impact of the spatio-temporal factors on traffic conditions. The encoder encodes the input traffic features and the decoder predicts the output sequence. Between the encoder and the decoder, a transform attention layer converts the encoded traffic features into sequence representations of the future time steps, which serve as the input of the decoder. The transform attention mechanism models the direct relationships between historical and future time steps, which helps to alleviate the error propagation problem among prediction time steps. Experimental results on two real-world traffic prediction tasks (i.e., traffic volume prediction and traffic speed prediction) demonstrate the superiority of GMAN. In particular, in the 1 hour ahead prediction, GMAN outperforms state-of-the-art methods by up to 4% in MAE. The source code is available at https://github.com/zhengchuanpan/GMAN.
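The idea of attending directly from future time steps to historical ones can be illustrated with a minimal sketch. It assumes plain scaled dot-product attention in which each future step has an embedding used as the query, each historical step has an embedding used as the key, and the encoded traffic features are the values. The function name, the embeddings, and the toy numbers are hypothetical and are not taken from the GMAN source code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def transform_attention(future_emb, hist_emb, hist_feats):
    """Map encoded historical features to one representation per future step.

    future_emb: one query vector per future time step (assumed embeddings)
    hist_emb:   one key vector per historical time step (assumed embeddings)
    hist_feats: one encoded feature vector per historical time step (values)
    """
    dim = len(hist_emb[0])
    width = len(hist_feats[0])
    outputs = []
    for query in future_emb:
        # Relevance of every historical step to this future step.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in hist_emb
        ]
        weights = softmax(scores)
        # Weighted sum of the historical features.
        outputs.append(
            [sum(w * feat[c] for w, feat in zip(weights, hist_feats))
             for c in range(width)]
        )
    return outputs


# Toy example: 3 historical steps, 2 future steps.
hist_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hist_feats = [[10.0, 0.0], [20.0, 1.0], [30.0, 2.0]]
future_emb = [[1.0, 0.0], [0.0, 1.0]]
future_repr = transform_attention(future_emb, hist_emb, hist_feats)
```

Because every future step reads the historical steps directly, its representation does not depend on a previously predicted step, which is the property the abstract links to reduced error propagation.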
The development of new protein-ligand scoring functions using machine learning algorithms, such as random forest, has been of significant interest. By efficiently utilizing expanded feature sets and a large set of experimental data, random forest based scoring functions (RFbScore) can achieve better correlations to experimental protein-ligand binding data with known crystal structures; however, more extensive tests indicate that such enhancement in scoring power comes with significant under-performance in docking and screening power tests compared to traditional scoring functions. In this work, in order to improve scoring-docking-screening powers of protein-ligand docking functions simultaneously, we have introduced a ΔvinaRF parameterization and feature selection framework based on random forest. Our developed scoring function ΔvinaRF20, which employs twenty descriptors in addition to the AutoDock Vina score, can achieve superior performance in all power tests of both CASF-2013 and CASF-2007 benchmarks compared to classical scoring functions. The ΔvinaRF20 scoring function and its code are freely available on the web at: https://www.nyu.edu/projects/yzhang/DeltaVina.
Recent developments have made model-based imputation of network data feasible in principle, but the extant literature provides few practical examples of its use. In this paper we consider 14 schools from the widely used In-School Survey of Add Health (Harris et al., 2009), applying an ERGM-based estimation and simulation approach to impute the network missing data for each school. Add Health's complex study design leads to multiple types of missingness, and we introduce practical techniques for handling each. We also develop a cross-validation based method – Held-Out Predictive Evaluation (HOPE) – for assessing this approach. Our results suggest that ERGM-based imputation of edge variables is a viable approach to the analysis of complex studies such as Add Health, provided that care is used in understanding and accounting for the study design.
Data sparsity and data imbalance are practical and challenging issues in cross-domain recommender systems. This paper addresses those problems by leveraging concepts derived from representation learning, adversarial learning and transfer learning (particularly, domain adaptation). Although various transfer learning methods have shown promising performance in this context, our proposed novel method RecSys-DAN focuses on alleviating the cross-domain and within-domain data sparsity and data imbalance and learns transferable latent representations for users, items and their interactions. Different from existing approaches, the proposed method transfers the latent representations from a source domain to a target domain in an adversarial way. The mapping functions in the target domain are learned by playing a min-max game with an adversarial loss, aiming to generate domain-indistinguishable representations for a discriminator. Four neural architectural instances of RecSys-DAN are proposed and explored. Empirical results on real-world Amazon data show that, even without using labeled data (i.e., ratings) in the target domain, RecSys-DAN achieves performance competitive with the state-of-the-art supervised methods. More importantly, RecSys-DAN is highly flexible to both unimodal and multimodal scenarios, and thus it is more robust to the cold-start recommendation, which is difficult for previous methods. Index Terms: adversarial learning, neural networks, recommender systems, imbalanced data, domain adaptation.
A review text is normally represented as a bag-of-words (BOW) in sentiment classification. Such a simplified BOW model has fundamental deficiencies in modeling some complex linguistic phenomena such as negation. In this work, we propose a dual-view co-training algorithm based on dual-view BOW representation for semi-supervised sentiment classification. In dual-view BOW, we automatically construct antonymous reviews and model a review text by a pair of bags-of-words with opposite views. We use the original and antonymous views in pairs in the training, bootstrapping and testing processes, all based on a joint observation of the two views. The experimental results demonstrate the advantages of our approach in meeting the two co-training requirements, addressing the negation problem, and enhancing the semi-supervised sentiment classification efficiency.
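A dual-view representation of this kind can be sketched as follows. The sketch assumes a tiny hand-written antonym lexicon and a simple reversal rule: a negation word is dropped and the word it negates is kept, while every other word is swapped for its antonym when one is known. The lexicon, the rule, and the function names are illustrative assumptions, not the construction procedure used in the paper.

```python
from collections import Counter

# Hypothetical toy lexicon; a real system would need a far larger one.
ANTONYMS = {"good": "bad", "bad": "good", "like": "dislike", "dislike": "like"}
NEGATIONS = {"not", "no", "never"}


def antonymous_tokens(tokens):
    """Build the tokens of the antonymous (opposite-view) review."""
    reversed_tokens = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in NEGATIONS and i + 1 < len(tokens):
            # Drop the negation word and keep the negated word unchanged.
            reversed_tokens.append(tokens[i + 1])
            i += 2
        else:
            # Swap the word for its antonym when the lexicon has one.
            reversed_tokens.append(ANTONYMS.get(token, token))
            i += 1
    return reversed_tokens


def dual_view_bow(text):
    """Return (original bag-of-words, antonymous bag-of-words)."""
    tokens = text.lower().split()
    return Counter(tokens), Counter(antonymous_tokens(tokens))


original_view, antonymous_view = dual_view_bow(
    "I do not like this phone but the screen is good"
)
```

In this toy case the original view contains "not", "like" and "good", while the antonymous view contains "like" and "bad" and no negation word, so the two bags carry opposite polarity for the same review.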
Estimation of variances and covariances is required for many statistical methods such as the t-test, principal component analysis and linear discriminant analysis. High-dimensional data such as gene expression microarray data and financial data pose challenges to traditional statistical and computational methods. In this paper, we review some recent developments in the estimation of variances, covariance matrices, and precision matrices, with emphasis on applications to microarray data analysis.
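One reason high-dimensional data is hard for the traditional estimator can be shown with a small sketch: when there are fewer samples than variables, the sample covariance matrix is singular, so it has no inverse and cannot be used directly as a precision matrix. The data and helper names below are made up for illustration and do not come from the review.

```python
def sample_covariance(rows):
    """Unbiased sample covariance matrix of a list of observation rows."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


# Two samples (n = 2) of three variables (p = 3): more variables than samples.
data = [[1.0, 2.0, 3.0], [3.0, 6.0, 4.0]]
cov = sample_covariance(data)
is_singular = abs(det3(cov)) < 1e-9
```

With n samples the sample covariance has rank at most n - 1, so here the 3x3 matrix has rank at most 1 and a zero determinant.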
A new method of modification optimization for double helical gears is proposed, aimed at reducing vibration and noise and raising machining efficiency. Firstly, the straight profile of the rack-cutter edge is replaced by three-segment parabolas, and the equation of the rack-cutter profile is ultimately represented in the rack-cutter surface. Secondly, the physical and mathematical models of tooth contact analysis and loaded tooth contact analysis of double helical gears are introduced, and the loaded transmission errors are obtained. The optimal modification parameters are obtained by minimizing the amplitude of the loaded transmission error. Finally, a set of equipment for measuring loaded transmission error, the analysis software platform and the test-bed for vibration and noise are designed, and the feasibility of the modification optimization method is verified. Compared with the values before modification, vibration and noise after modification decrease on average by 18% and 2.7 dB, respectively.