Closing the gap between measurable genetic information and observable traits is a longstanding challenge in genomics. Yet the prediction of molecular phenotypes from DNA sequences alone remains limited and inaccurate, largely because of the scarcity of annotated data and the inability to transfer learnings between prediction tasks. Here, we present an extensive study of foundation models pre-trained on DNA sequences, named the Nucleotide Transformer, integrating information from 3,202 diverse human genomes, as well as 850 genomes from a wide range of species, including model and non-model organisms. These transformer models yield transferable, context-specific representations of nucleotide sequences, which allow for accurate molecular phenotype prediction even in low-data settings. We show that the representations alone match or outperform specialized methods on 11 of 18 prediction tasks, and on up to 15 after fine-tuning. Despite no supervision, the transformer models learnt to focus attention on key genomic elements, including those that regulate gene expression, such as enhancers. Lastly, we demonstrate that utilizing model representations alone can improve the prioritization of functional genetic variants. The training and application of foundation models in genomics explored in this study provide a widely applicable stepping stone toward accurate molecular phenotype prediction from DNA sequence alone.
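As a rough illustration of predicting a phenotype from "the representations alone", the sketch below fits a small logistic-regression probe on frozen, mean-pooled sequence embeddings. It is a minimal sketch under stated assumptions, not the study's pipeline: the helper names (`mean_pool`, `fit_logistic_probe`, `predict`), the learning rate and the epoch count are all hypothetical, and the embeddings are assumed to be precomputed lists of per-token vectors.

```python
import math


def mean_pool(token_embeddings):
    """Average per-token vectors into one fixed-size sequence vector."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[j] for tok in token_embeddings) / n for j in range(dim)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_probe(X, y, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent; X stays frozen.

    X is a list of pooled embedding vectors, y a list of 0/1 labels.
    lr and epochs are illustrative defaults, not tuned values.
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return the predicted class (0 or 1) for one pooled embedding."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0
```

Only the probe's weights are trained here; the embedding model itself is never updated, which is what distinguishes this setting from fine-tuning.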
The authors investigated factors influencing the occurrence of dissolved lead in tap water using different sampling protocols. The principal factor affecting the concentration of dissolved lead at the distribution system taps was the length of lead service lines (LSLs). However, dissolved lead levels in first‐litre samples were also associated with lead particles being trapped in the aerator. Collecting the first‐litre sample after 30 min of stagnation provided a good estimate of lead concentration in premise plumbing and LSLs, even though it could sometimes underestimate peak lead concentrations in the LSLs. It also gave mean exposure estimates close to those obtained using random daytime sampling. Lead levels remained relatively high in flushed samples despite the short (26‐s) contact time between the water and the lead pipe, illustrating high rates of mass transfer.
SUMMARY
The effective bulk and shear viscosities of the matrix of a partially molten rock are important properties for the process of melt migration and matrix compaction. A set of porosity‐dependent effective bulk and shear viscosity models is presented, based on a self‐consistent poroelastic formulation for partially molten rock in which the melt occurs in spherical, ellipsoidal, film‐like or tubular inclusions. For these melt geometries, a MATLAB program operated through a graphical user interface is provided; it allows the calculation of effective bulk and shear viscosities as a function of porosity, as well as effective elastic moduli and seismic velocities. For all melt geometries, an inverse porosity dependence of the effective bulk viscosity is found. Depending on the assumed melt geometry, this formulation predicts both the effective bulk and shear viscosity to drop to zero at and above finite melt fractions of 20–50 per cent. Fitting equations are derived, allowing direct implementation into two‐phase flow melt–matrix formulations.
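The two qualitative behaviours described above (an inverse porosity dependence of the bulk viscosity, and both viscosities vanishing at a finite melt fraction) can be illustrated with a toy parameterization. This is a minimal sketch under stated assumptions, not the fitting equations derived in the paper: the functional forms, the function names, the intrinsic matrix viscosity `eta0`, the prefactor `c` and the critical melt fraction `phi_c` are all hypothetical, with `phi_c = 0.3` chosen only because it lies within the quoted 20–50 per cent range.

```python
def effective_bulk_viscosity(phi, eta0=1.0e19, phi_c=0.3, c=1.0):
    """Toy effective bulk viscosity (Pa s) at melt fraction phi.

    Assumed form: c * eta0 * (phi_c - phi) / phi, i.e. inversely
    proportional to porosity and zero at and above phi_c.
    """
    if phi <= 0.0:
        raise ValueError("phi must be positive; the 1/phi term diverges at 0")
    if phi >= phi_c:
        return 0.0  # assumed loss of matrix strength at the critical fraction
    return c * eta0 * (phi_c - phi) / phi


def effective_shear_viscosity(phi, eta0=1.0e19, phi_c=0.3):
    """Toy effective shear viscosity (Pa s), linear decay to zero at phi_c."""
    if phi < 0.0:
        raise ValueError("phi must be non-negative")
    if phi >= phi_c:
        return 0.0
    return eta0 * (1.0 - phi / phi_c)
```

In a two‐phase flow code, functions of this shape would be evaluated at each grid point from the local porosity; the actual coefficients and exponents depend on melt geometry and would need to come from the paper's fitted equations.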