Polymer Genome is a web-based machine-learning capability to perform near-instantaneous predictions of a variety of polymer properties. The prediction models are trained on (and interpolate between) an underlying database of polymers and their properties obtained from first principles computations and experimental measurements. In this contribution, we first provide an overview of some of the critical technical aspects of Polymer Genome, including polymer data curation, representation, learning algorithms, and prediction model usage. Then, we provide a series of pedagogical examples to demonstrate how Polymer Genome can be used to predict dozens of polymer properties, appropriate for a range of applications. We close this contribution with a discussion of the remaining challenges and possible future directions.
We study the problem of learning a named entity recognition (NER) tagger using noisy labels from multiple weak supervision sources. Though cheap to obtain, the labels from weak supervision sources are often incomplete, inaccurate, and contradictory, making it difficult to learn an accurate NER model. To address this challenge, we propose a conditional hidden Markov model (CHMM), which can effectively infer true labels from multi-source noisy labels in an unsupervised way. CHMM enhances the classic hidden Markov model with the contextual representation power of pretrained language models. Specifically, CHMM learns token-wise transition and emission probabilities from the BERT embeddings of the input tokens to infer the latent true labels from noisy observations. We further refine CHMM with an alternate-training approach (CHMM-ALT). It fine-tunes a BERT-NER model with the labels inferred by CHMM, and this BERT-NER's output is regarded as an additional weak source to train the CHMM in return. Experiments on four NER benchmarks from various domains show that our method outperforms state-of-the-art weakly supervised NER models by wide margins.
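The core idea of conditioning a hidden Markov model on contextual embeddings can be sketched as follows. This is an illustrative toy, not the authors' implementation: the projection matrices `W_trans` and `W_emis` are random stand-ins for learned parameters, the `emb` array stands in for BERT token embeddings, and Viterbi decoding is used in place of the full training procedure.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def viterbi_decode(trans, emis, init):
    """Decode latent labels given token-wise transition/emission probabilities.
    trans: (T, K, K) per-token transition probabilities
    emis:  (T, K)    per-token probability of the noisy observation under each state
    init:  (K,)      initial state distribution
    """
    T, K = emis.shape
    delta = np.log(init) + np.log(emis[0])
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(trans[t])  # (K, K) score of each state pair
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(emis[t])
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
T, D, K = 5, 8, 3                      # tokens, embedding dim, label states
emb = rng.normal(size=(T, D))          # stand-in for BERT token embeddings
W_trans = rng.normal(size=(D, K * K))  # hypothetical learned projection to transition logits
W_emis = rng.normal(size=(D, K))       # hypothetical learned projection to emission logits

# Token-wise probabilities conditioned on the embeddings, as in CHMM
trans = softmax((emb @ W_trans).reshape(T, K, K), axis=-1)
emis = softmax(emb @ W_emis, axis=-1)
labels = viterbi_decode(trans, emis, init=np.full(K, 1.0 / K))
print(labels)  # one latent label id per token
```

The key departure from a classic HMM is visible in the two matrix products: the transition and emission distributions vary per token because they are computed from that token's contextual embedding, rather than being shared globally.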
A single-step, template-free aerosol chemical vapor deposition (ACVD) method is demonstrated to grow well-aligned SnO2 nanocolumn arrays. The ACVD system parameters, which control thin film morphologies, were systematically explored to gain a qualitative understanding of nanocolumn growth mechanisms. Key growth variables include feed rates, substrate temperature, and deposition time. System dynamics relating synthesis variables to aerosol characteristics and processes (collision and sintering) are elucidated. By adjusting system parameters, control of the aspect ratio, height, and crystal structure of columns is demonstrated. A self-catalyzed (SnO2 particles) vapor-solid (VS) growth mechanism, whereby a vapor-particle deposition regime results in the formation of nanocrystals that act as nucleation sites for the preferential formation and growth of nanocolumns, is proposed and supported. Density functional theory (DFT) calculations indicate that the preferential orientation of thin films is a function of the system redox conditions, further supporting the proposed VS growth mechanism. When taken together, these results provide quantitative insight into the growth mechanism(s) of SnO2 nanocolumn thin films via ACVD, which is critical for engineering these, and other, nanostructured films for direct incorporation into functional devices.
Flexible polymer dielectrics tolerant to electric field and temperature extremes are urgently needed for a spectrum of electrical and electronic applications. Given the complexity of the dielectric breakdown mechanism and the vast chemical space of polymers, the discovery of suitable candidates is nontrivial. We have laid the foundation for a systematic search of the polymer chemical space, which starts with “gold-standard” experimental measurements and data on the temperature-dependent breakdown strength (E_bd) for a benchmark set of commercial dielectric polymer films. Phenomenological guidelines are derived from this data set on easily accessible properties (or “proxies”) that are correlated with E_bd. Screening criteria based on these proxy properties (e.g., band gap, charge injection barrier, and cohesive energy density) and other necessary characteristics (e.g., a high glass transition temperature to maintain thermal stability and a high dielectric constant for high energy density) were then set up. These criteria, along with machine learning models of these properties, were used to screen candidates from a list of more than 13 000 previously synthesized polymers, followed by experimental validation of some of the screened candidates. These efforts have led to the creation of a consistent and high-quality data set of temperature-dependent E_bd, and the identification of screening criteria, chemical design rules, and a list of optimal polymer candidates for high-temperature and high-energy-density capacitor applications, thus demonstrating the power of an integrated and informatics-based philosophy for rational materials design.
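The proxy-based screening step described above amounts to filtering a candidate list through a set of property thresholds. The sketch below is purely illustrative: the polymer names, property values, and threshold numbers are placeholders, not the values derived in the study.

```python
# Hypothetical candidates with ML-predicted proxy properties (values invented)
candidates = [
    {"name": "polymer_A", "band_gap_eV": 5.1, "Tg_C": 210, "eps_r": 3.4},
    {"name": "polymer_B", "band_gap_eV": 2.8, "Tg_C": 250, "eps_r": 4.0},
    {"name": "polymer_C", "band_gap_eV": 4.6, "Tg_C": 150, "eps_r": 3.1},
]

# Placeholder thresholds standing in for the screening criteria
criteria = {
    "band_gap_eV": lambda v: v >= 4.0,  # wide band gap as a proxy correlated with E_bd
    "Tg_C": lambda v: v >= 200,         # high glass transition temp for thermal stability
    "eps_r": lambda v: v >= 3.0,        # dielectric constant for high energy density
}

# Keep only candidates that pass every criterion
screened = [c["name"] for c in candidates
            if all(check(c[key]) for key, check in criteria.items())]
print(screened)  # ['polymer_A']
```

In the actual workflow, the property values would come from trained machine learning surrogates rather than a hand-written table, but the downselection logic is the same conjunction of per-property checks.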
Weakly-supervised learning (WSL) has shown promising results in addressing label scarcity on many NLP tasks, but manually designing a comprehensive, high-quality labeling rule set is tedious and difficult. We study interactive weakly-supervised learning: the problem of iteratively and automatically discovering novel labeling rules from data to improve the WSL model. Our proposed model, named PRBoost, achieves this goal via iterative prompt-based rule discovery and model boosting. It uses boosting to identify large-error instances and then discovers candidate rules from them by prompting pre-trained LMs with rule templates. The candidate rules are judged by human experts, and the accepted rules are used to generate complementary weak labels and strengthen the current model. Experiments on four tasks show PRBoost outperforms state-of-the-art WSL baselines by up to 7.1% and bridges the gaps with fully supervised models. Our implementation is available at https://github.com/rz-zhang/PRBoost.
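The boosting step that surfaces large-error instances can be illustrated with a standard AdaBoost-style weight update: instances the current model misclassifies are upweighted, and the heaviest instances are the ones handed to the prompted LM (and human expert) for rule discovery. This is a toy sketch with invented labels, not the paper's implementation.

```python
import numpy as np

# Toy labels and the current weak model's predictions (invented for illustration)
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])

# Start from uniform instance weights, then apply an AdaBoost-style update
weights = np.full(len(y_true), 1.0 / len(y_true))
wrong = y_true != y_pred
err = weights[wrong].sum()
alpha = 0.5 * np.log((1 - err) / err)  # model weight; larger when error is small
weights *= np.exp(alpha * np.where(wrong, 1.0, -1.0))
weights /= weights.sum()

# The largest-weight (large-error) instances become candidates for rule discovery
candidate_ids = np.argsort(weights)[::-1][:2]
print(sorted(candidate_ids.tolist()))  # [2, 4] -- the two misclassified instances
```

After the expert accepts a discovered rule, its weak labels are folded back into training, and the loop repeats on whatever the strengthened model still gets wrong.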
In today's data-rich world, there is a growing need to scrutinise and retrieve information from data. Clustering is an analytical technique that divides data into groups of similar objects: each group, called a cluster, contains objects with strong affinities to one another but significant differences from objects in other groups. Of the two main families of clustering techniques, partitional and hierarchical, this paper examines and compares hierarchical clustering algorithms. Hierarchical clustering aims to create a hierarchy of clusters: it can be viewed as a set of simple (flat) clusterings arranged in a tree structure, built by recursively partitioning or merging the entities in a top-down or bottom-up manner. The algorithms are described and analysed in terms of factors such as dataset size, dataset type, number of clusters formed, consistency, accuracy, and efficiency. By discussing the various implementations of hierarchical clustering algorithms, we aim to help new researchers and beginners understand how they function, so that they can develop new approaches and innovations for improvement.
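The bottom-up (agglomerative) variant described above can be sketched in a few lines: start with every point in its own cluster and repeatedly merge the closest pair until the requested number of clusters remains. This minimal sketch uses single linkage (closest-member distance) on 1-D points; real implementations typically work on vectors and offer several linkage criteria.

```python
def single_linkage(points, k):
    """Agglomerative clustering of 1-D points down to k clusters."""
    clusters = [[i] for i in range(len(points))]  # each point starts alone

    def dist(a, b):
        # single linkage: distance between the closest members of two clusters
        return min(abs(points[i] - points[j]) for i in a for j in b)

    while len(clusters) > k:
        # find and merge the closest pair of clusters
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] += clusters.pop(j)
    return [sorted(c) for c in clusters]

data = [0.0, 0.4, 0.5, 5.0, 5.2, 9.9]
print(single_linkage(data, 3))  # [[0, 1, 2], [3, 4], [5]]
```

Recording the sequence of merges, rather than stopping at a fixed k, yields the full dendrogram that makes the method hierarchical; top-down (divisive) algorithms build the same tree by recursive splitting instead.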
A rich body of literature has emerged in recent years that discusses the extraction of structured information from materials science text through named entity recognition models. Relatively little work has been done to address the “normalization” of extracted entities, that is, recognizing that two or more seemingly different entities actually refer to the same entity in reality. In this work, we address the normalization of polymer named entities, polymers being a class of materials that often have a variety of common names for the same material in addition to the IUPAC name. We have trained supervised clustering models using Word2Vec and fastText word embeddings reported in previous work so that named entities referring to the same polymer are categorized within the same cluster in the word embedding space. We report the use of parameterized cosine distance functions to cluster and normalize textually derived entities, achieving an F1 score of 0.85. Furthermore, a labeled data set of polymer names was utilized to train our model and to infer the true total number of unique polymers that are actively reported in the literature. For ∼15,500 polymer named entities extracted from our corpus of 0.5 million papers, we detected 6734 unique clusters (i.e., unique polymers), 632 of which were manually curated to train the normalization model. This work will serve as a critical ingredient in a natural language processing-based pipeline for the automatic and efficient extraction of knowledge from the polymer literature.
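The normalization step can be pictured as nearest-neighbor assignment under a cosine distance with a cutoff: a newly extracted surface form joins the cluster of its closest canonical name if the distance is small enough, and otherwise seeds a new cluster. The sketch below is illustrative only: the embedding vectors are toy 3-D stand-ins for Word2Vec/fastText vectors, and the 0.3 threshold is an invented value, not the parameterized distance function fit in the paper.

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def normalize(entity_vec, canonical, threshold=0.3):
    """Map an extracted entity to its nearest canonical cluster, or None if too far."""
    name, d = min(
        ((name, cosine_distance(entity_vec, vec)) for name, vec in canonical.items()),
        key=lambda p: p[1],
    )
    return name if d < threshold else None  # None -> start a new cluster

# Toy canonical-name embeddings (invented for illustration)
canonical = {
    "polystyrene": np.array([1.0, 0.1, 0.0]),
    "polyethylene": np.array([0.0, 1.0, 0.2]),
}
print(normalize(np.array([0.9, 0.2, 0.0]), canonical))  # 'polystyrene'
print(normalize(np.array([0.0, 0.0, 1.0]), canonical))  # None (new cluster)
```

The supervised part of the approach lies in fitting the distance parameterization and threshold on the manually curated clusters, so that common names, abbreviations, and IUPAC names of the same polymer land within the cutoff of one another.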