Scalable metagenomic taxonomy classification using a reference genome database

Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to analyze large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge.
Results: A method is presented to shift computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data to show accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150 giga-base Tyrolean Iceman dataset was found to take <20 h on a single-node, 40-core, large-memory machine and provide new insights on the metagenomic contents of the sample.
Availability: Software was implemented in C++ and is freely available at http://sourceforge.net/projects/lmat
Contact: allen99@llnl.gov
Supplementary information: Supplementary data are available at Bioinformatics online.

Using populations of human and microbial genomes for organism detection in metagenomes

Ames, Gardner, Martí, et al. 2015. Genome Res.

Identifying causative disease agents in human patients from shotgun metagenomic sequencing (SMS) presents a powerful tool to apply when other targeted diagnostics fail. Numerous technical challenges remain, however, before SMS can move beyond the role of research tool. Accurately separating the known and unknown organism content remains difficult, particularly when SMS is applied as a last resort. The true amount of human DNA that remains in a sample after screening against the human reference genome and filtering nonbiological components left from library preparation has previously been underreported. In this study, we create the most comprehensive collection of microbial and reference-free human genetic variation available in a database optimized for efficient metagenomic search by extracting sequences from GenBank and the 1000 Genomes Project. The results reveal new human sequences found in individual Human Microbiome Project (HMP) samples. Individual samples contain up to 95% human sequence, and 4% of the individual HMP samples contain 10% or more human reads. Left unidentified, human reads can complicate and slow down further analysis, lead to inaccurately labeled microbial taxa, and ultimately raise privacy concerns as more human genome data is collected.

On the Role of NVRAM in Data-intensive Architectures: An Evaluation

Essen, Pearce, Ames, et al. 2012

Observations for Model Intercomparison Project (Obs4MIPs): status for CMIP6

et al. 2020

Abstract. The Observations for Model Intercomparison Project (Obs4MIPs) was initiated in 2010 to facilitate the use of observations in climate model evaluation and research, with a particular target being the Coupled Model Intercomparison Project (CMIP), a major initiative of the World Climate Research Programme (WCRP). To this end, Obs4MIPs (1) targets observed variables that can be compared to CMIP model variables; (2) utilizes dataset formatting specifications and metadata requirements closely aligned with CMIP model output; (3) provides brief technical documentation for each dataset, designed for nonexperts and tailored towards relevance for model evaluation, including information on uncertainty, dataset merits, and limitations; and (4) disseminates the data through the Earth System Grid Federation (ESGF) platforms, making the observations searchable and accessible via the same portals as the model output. Taken together, these characteristics of the organization and structure of obs4MIPs should entice a more diverse community of researchers to engage in the comparison of model output with observations and to contribute to a more comprehensive evaluation of the climate models. At present, the number of obs4MIPs datasets has grown to about 80; many are undergoing updates, with another 20 or so in preparation, and more than 100 are proposed and under consideration. A partial list of current global satellite-based datasets includes humidity and temperature profiles; a wide range of cloud and aerosol observations; ocean surface wind, temperature, height, and sea ice fraction; surface and top-of-atmosphere longwave and shortwave radiation; and ozone (O3), methane (CH4), and carbon dioxide (CO2) products. 
A partial list of proposed products expected to be useful in analyzing CMIP6 results includes the following: alternative products for the above quantities, additional products for ocean surface flux and chlorophyll products, a number of vegetation products (e.g., FAPAR, LAI, burned area fraction), ice sheet mass and height, carbon monoxide (CO), and nitrogen dioxide (NO2). While most existing obs4MIPs datasets consist of monthly-mean gridded data over the global domain, products with higher time resolution (e.g., daily) and/or regional products are now receiving more attention. Along with an increasing number of datasets, obs4MIPs has implemented a number of capability upgrades including (1) an updated obs4MIPs data specifications document that provides additional search facets and generally improves congruence with CMIP6 specifications for model datasets, (2) a set of six easily understood indicators that help guide users as to a dataset's maturity and suitability for application, and (3) an option to supply supplemental information about a dataset beyond what can be found in the standard metadata. With the maturation of the obs4MIPs framework, the dataset inclusion process, and the dataset formatting guidelines and resources, the scope of the observations being considered is expected to grow to include gridded in situ datasets as well as datasets with a regional focus, and the ultimate intent is to judiciously expand this scope to any observation dataset that has applicability for evaluation of the types of Earth system models used in CMIP.

Toward Standardized Data Sets for Climate Model Experimentation

Durack, Taylor, Eyring, et al. 2018. Eos

DI-MMAP—a scalable memory-map runtime for out-of-core data-intensive applications

et al. 2013

We present DI-MMAP, a high-performance runtime that memory-maps large external data sets into an application's address space and shows significantly better performance than the Linux mmap system call. Our implementation is particularly effective when used with high-performance locally attached Flash arrays on highly concurrent, latency-tolerant data-intensive HPC applications. We describe the kernel module and show performance results on a benchmark test suite, a new bioinformatics metagenomic classification application, and on a level-asynchronous Breadth-First Search (BFS) graph traversal algorithm. Using DI-MMAP, the metagenomics classification application performs up to 4× better than standard Linux mmap. A fully external memory configuration of BFS executes up to 7.44× faster than traditional mmap. Finally,

Coordinating an operational data distribution network for CMIP6 data

et al. 2021

Abstract. The distribution of data contributed to the Coupled Model Intercomparison Project Phase 6 (CMIP6) is via the Earth System Grid Federation (ESGF). The ESGF is a network of internationally distributed sites that together work as a federated data archive. Data records from climate modelling institutes are published to the ESGF and then shared around the world. It is anticipated that CMIP6 will produce approximately 20 PB of data to be published and distributed via the ESGF. In addition to this large volume of data a number of value-added CMIP6 services are required to interact with the ESGF; for example the citation and errata services both interact with the ESGF but are not a core part of its infrastructure. With a number of interacting services and a large volume of data anticipated for CMIP6, the CMIP Data Node Operations Team (CDNOT) was formed. The CDNOT coordinated and implemented a series of CMIP6 preparation data challenges to test all the interacting components in the ESGF CMIP6 software ecosystem. This ensured that when CMIP6 data were released they could be reliably distributed.

DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications

Essen

Hsieh

Ames

et al. 2012

Abstract. We present DI-MMAP, a high-performance runtime that memory-maps large external data sets into an application's address space and shows significantly better performance than the Linux mmap system call. Our implementation is particularly effective when used with high-performance locally attached Flash arrays on highly concurrent, latency-tolerant data-intensive HPC applications. We describe the kernel module and show performance results on a benchmark test suite and on a new bioinformatics metagenomic classification application. For the complex metagenomics classification application, DI-MMAP performs up to 4.88× better than standard Linux mmap.


Sasha Ames

Scalable metagenomic taxonomy classification using a reference genome database
