Computational material discovery is under intense study owing to its ability to explore the vast space of chemical systems. Neural network potentials (NNPs) have been shown to be particularly effective in conducting atomistic simulations for such purposes. However, existing NNPs are generally designed for narrow target materials, making them unsuitable for broader applications in material discovery. Here we report the development of a universal NNP called PreFerred Potential (PFP), which is able to handle any combination of 45 elements. Particular emphasis is placed on the datasets, which include a diverse set of virtual structures used to attain the universality. We demonstrate the applicability of PFP in selected domains: lithium diffusion in LiFeSO4F, molecular adsorption in metal-organic frameworks, an order–disorder transition of Cu-Au alloys, and material discovery for a Fischer–Tropsch catalyst. These examples showcase the power of PFP, which provides a highly useful tool for material discovery.
Unsupervised domain adaptation is the problem setting where the data-generating distributions in the source and target domains are different and labels in the target domain are unavailable. An important question in unsupervised domain adaptation is how to measure the difference between the source and target domains. Existing discrepancy measures for unsupervised domain adaptation either require high computation costs or have no theoretical guarantee. To mitigate these problems, this paper proposes a novel discrepancy measure called source-guided discrepancy (S-disc), which, unlike the existing ones, exploits labels in the source domain. As a consequence, S-disc can be computed efficiently with a finite-sample convergence guarantee. In addition, it is shown that S-disc can provide a tighter generalization error bound than the one based on an existing discrepancy measure. Finally, experimental results demonstrate the advantages of S-disc over the existing discrepancy measures.
We consider a document classification problem where document labels are absent and only relevant keywords of a target class and unlabeled documents are given. Although heuristic methods based on pseudo-labeling have been considered, theoretical understanding of this problem is still limited. Moreover, previous methods cannot easily incorporate well-developed techniques in supervised text classification. In this paper, we propose a theoretically guaranteed learning framework that is simple to implement and has flexible choices of models, e.g., linear models or neural networks. We demonstrate how to optimize the area under the receiver operating characteristic curve (AUC) effectively and also discuss how to adjust it to optimize other well-known evaluation metrics such as the accuracy and F1-measure. Finally, we show the effectiveness of our framework using benchmark datasets.
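As a rough illustration of the metric named in this abstract, the sketch below computes the empirical AUC as the fraction of correctly ordered positive-negative score pairs, together with a pairwise logistic surrogate of the kind commonly minimized in its place. It assumes labeled scores are available, which the keyword-only setting above does not provide, so it shows the quantity being targeted rather than the paper's estimator. The function names are hypothetical.

```python
import math


def empirical_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    total = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))


def pairwise_logistic_loss(scores_pos, scores_neg):
    """Mean of log(1 + exp(-(s_pos - s_neg))) over all pairs.

    A smooth surrogate for 1 - AUC (an assumption of this sketch, not
    necessarily the loss used in the paper).
    """
    total = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            margin = sp - sn
            # Numerically stable softplus(-margin).
            total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    pos = [0.9, 0.8]
    neg = [0.1, 0.85]
    print(empirical_auc(pos, neg))
    print(pairwise_logistic_loss(pos, neg))
```

Because the surrogate is defined on score differences, any scoring model (linear or neural) can be plugged in, which matches the model flexibility the abstract emphasizes.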
Two bottlenecks of binary classification from positive and unlabeled data (PU classification) are the requirements that the given unlabeled patterns be drawn from the test marginal distribution and that the penalty of a false positive error be identical to that of a false negative error. However, such requirements are often not fulfilled in practice. In this paper, we generalize PU classification to the class prior shift and asymmetric error scenarios. Based on the analysis of the Bayes optimal classifier, we show that given a test class prior, PU classification under class prior shift is equivalent to PU classification with asymmetric error. Then, we propose two different frameworks to handle these problems, namely, a risk minimization framework and a density ratio estimation framework. Finally, we demonstrate the effectiveness of the proposed frameworks and compare both frameworks through experiments using benchmark datasets.
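The equivalence stated in this abstract can be made concrete with the standard Bayes-rule argument, sketched below under the assumption that only the class prior changes between training and test while the class-conditional densities stay fixed. This is a textbook derivation and not necessarily the paper's own; the function names are hypothetical. Thresholding the test-time posterior at 0.5 turns out to be the same decision as thresholding the training-time posterior at a shifted value, which is exactly the form of a cost-sensitive (asymmetric error) decision rule.

```python
def shifted_posterior(p_train, prior_train, prior_test):
    """Posterior p(y=+1 | x) under the test prior, given the training posterior.

    Assumes class-conditional densities are unchanged (class prior shift only).
    """
    pos = (prior_test / prior_train) * p_train
    neg = ((1.0 - prior_test) / (1.0 - prior_train)) * (1.0 - p_train)
    return pos / (pos + neg)


def equivalent_threshold(prior_train, prior_test):
    """Threshold on the training posterior equivalent to 0.5 on the test posterior."""
    num = prior_train * (1.0 - prior_test)
    return num / (num + prior_test * (1.0 - prior_train))


if __name__ == "__main__":
    prior_train, prior_test = 0.3, 0.6
    threshold = equivalent_threshold(prior_train, prior_test)
    for p in [0.05, 0.2, 0.25, 0.5, 0.9]:
        by_posterior = shifted_posterior(p, prior_train, prior_test) > 0.5
        by_threshold = p > threshold
        print(p, by_posterior, by_threshold)
```

Under these assumptions, the threshold corresponds to a false positive cost proportional to `prior_train * (1 - prior_test)` and a false negative cost proportional to `prior_test * (1 - prior_train)`, since a cost-sensitive Bayes rule predicts positive when the posterior exceeds `c_FP / (c_FP + c_FN)`.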
Learning from triplet comparison data has been extensively studied in the context of metric learning, where we want to learn a distance metric between two instances, and ordinal embedding, where we want to learn an embedding of the given instances in a Euclidean space that preserves the comparison order as well as possible. Unlike fully labeled data, triplet comparison data can be collected in a more accurate and human-friendly way. Although learning from triplet comparison data has been considered in many applications, the fundamental question of whether we can learn a classifier only from triplet comparison data has remained unanswered. In this paper, we give a positive answer to this question by proposing an unbiased estimator for the classification risk under the empirical risk minimization framework. Since the proposed method is based on the empirical risk minimization framework, it inherently has the advantage that any surrogate loss function and any model, including neural networks, can be easily applied. Furthermore, we theoretically establish an estimation error bound for the proposed empirical risk minimizer. Finally, we provide experimental results to show that our method works well empirically and outperforms various baseline methods.