Alberto Alexander Gayle scite author profile

Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings; most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELFIES (SELF-referencIng Embedded Strings). SELFIES has since simplified and enabled numerous new applications in chemistry. In this manuscript, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete Future Projects for

Navigating the challenges of medical English education: a novel approach using computational linguistics

Gayle

2016

Preprint

Recent studies have shown that international medical graduates (IMG) comprise a substantial and increasingly larger share of the medical workforce, internationally. IMGs wishing to work in English-speaking countries face many challenges. And overcoming such challenges plays an important role in ensuring a more comfortable transition and improved outcomes for patients. This study addresses one such area of concern: the efficient acquisition of advanced language competence for use in the medical workplace. This research also addresses the needs of medical students and practitioners in other countries, where English is not the primary language. Medical terminology and phrasing is based on a tradition spanning more than 2500 years, a tradition that cuts across typical linguistic and cultural boundaries. Indeed, as is commonly understood, the language required by doctors and other medical professionals varies substantially from the norm. In the present study, this dynamic is exploited to identify and characterize the language and patterns of usage specific to medical English, as it is used in practice and reporting. Overall, constructions comprised of preposition-dependent nouns, verbs and adjectives were found to be most prevalent (38%), followed by prepositional phrases (33%). The former includes constructions such as "present with", "present to", and "present in"; while constructions such as "of … patient", "in … group", and "with … disease" comprise the latter. Preposition-independent noun and verb-based constructions were far less prevalent overall (18% and 5%, respectively). Up to now, medical language reference and learning material has focused on relatively uncommon,

Additive Compendium Map of Outbreak Risk Determinants of West Nile Virus in Europe at NUTS3

Gayle

2020

Preprint

Annual emergence of West Nile virus depends on a complex transmission chain. Predictive efforts are consequently confounded by time-varying associations and scale-dependent effect variability. SHAP (SHapley Additive exPlanations) is a novel AI-driven solution with potential to overcome this. SHAP takes a high-performance XGBoost model and deductively imputes the marginal contribution of each feature with respect to the log relative risk associated with the local XGBoost prediction (an additive model). The resulting effect matrix is dimensionally identical to the original data but IID and homogenized in terms of units, scale, and interpretation. Such "synthetic data" can therefore serve as a surrogate to allow for high-power statistical analyses. Here, we applied SHAP to a database consisting of high-resolution data from various domains - climate, environment, economic, sociodemographic, vector and host distribution - to derive an effect matrix of WNV outbreak risk determinants in Europe. This effect data proved superior to the original, nominal data in predictive tasks and delivered qualitatively compelling, domain-specific risk mappings. Further applications are discussed and others are invited to experiment.
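
The idea of an effect matrix with the same shape as the input data can be illustrated with a minimal sketch. This is not the authors' pipeline: the fitted XGBoost model is replaced by a hypothetical `toy_model`, the feature names and values are invented, and Shapley values are computed by brute-force subset enumeration against a mean baseline rather than with the tree-based algorithm a real SHAP workflow would use.

```python
from itertools import combinations
from math import factorial


def toy_model(row):
    # Hypothetical stand-in for a fitted model's log relative risk.
    # The temp * hosts term is an interaction, so the model is not purely additive.
    temp, rain, hosts = row
    return 0.5 * temp + 0.2 * rain + 0.1 * temp * hosts


def shapley_row(model, row, baseline):
    """Exact Shapley values for one row; absent features take the baseline value."""
    n = len(row)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [row[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [row[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def effect_matrix(model, data):
    """Return the column-mean baseline and one row of Shapley values per data row."""
    n = len(data[0])
    baseline = [sum(r[j] for r in data) / len(data) for j in range(n)]
    return baseline, [shapley_row(model, r, baseline) for r in data]


# Invented example data: columns are (temp, rain, hosts).
data = [[1.0, 2.0, 0.0], [3.0, 0.0, 1.0], [2.0, 4.0, 2.0]]
baseline, effects = effect_matrix(toy_model, data)

for row, phi in zip(data, effects):
    # Each row of effects sums to the prediction minus the baseline prediction.
    print(row, [round(p, 3) for p in phi])
```

Every entry of `effects` is expressed in the model's output units, whatever the units of the original column, which is the sense in which the abstract calls the effect matrix homogenized. Brute-force enumeration grows exponentially with the number of features, so it is only practical for toy cases like this one.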
