Changjun Wu scite author profile

Abstract: Metagenomics is the study of environmental microbial communities using state-of-the-art genomic tools. Recent advancements in high-throughput technologies have enabled the accumulation of large volumes of metagenomic data that, until a couple of years ago, were deemed impractical to generate. A primary bottleneck, however, is the lack of scalable algorithms and open source software for large-scale data processing. In this paper, we present the design and implementation of a novel parallel approach to identify protein families from large-scale metagenomic data. Given a set of peptide sequences, we reduce the problem to one of detecting arbitrarily sized dense subgraphs in bipartite graphs. Our approach efficiently parallelizes this task on a distributed memory machine through a combination of divide-and-conquer and combinatorial pattern matching heuristic techniques. We present performance and quality results from extensively testing our implementation on 160K randomly sampled sequences from the CAMERA environmental sequence database using 512 nodes of a BlueGene/L supercomputer.


Dynamic analyses of rice blast resistance for the assessment of genetic and environmental effects

Jiang et al. 2007, Plant Breeding


A doubled haploid population was employed to characterize the dynamic changes of the genetic components involved in rice blast resistance, including main-effect quantitative trait loci (QTLs), epistatic QTLs and QTL-by-environment interactions. The study was carried out at three different developmental stages of rice, using natural infection tests over 2 years. The number of main-effect QTLs, epistatic QTLs and their environmental interactions differed across the various measuring stages. One QTL (d12) on chromosome 12 was detected at all stages, whereas most QTLs were active only at one or two stages in the population. These findings suggest that the unstable expression of most QTLs identified for blast resistance was influenced by the developmental status of the plants, epistatic effects between different loci and the environments in which they were grown. These findings demonstrate the complexity of expression of rice blast resistance and have important implications for durable resistance breeding and map-based cloning of quantitative traits.


Polymorphism in exon 3 of follicle stimulating hormone beta (FSHB) subunit gene and its association with litter traits and superovulation in the goat

Zhang, Zeng et al. 2011, Small Ruminant Research


GPU-Accelerated Protein Family Identification for Metagenomics

Kalyanaraman 2013


The clustering of putative protein/Open Reading Frame (ORF) sequences available from large-scale metagenomics survey projects is a core analytical function that has led to the identification and characterization of novel protein families of environmental microbial communities. The implementation of this function, however, is currently challenged not only by data size but also by data complexity. In this paper, we present a CPU-GPU implementation of a randomized graph clustering heuristic called Shingling, which was originally developed by Gibson et al. Our implementation uses the CPU and GPU for different stages of computation, using GPUs for the most time-consuming steps. Experimental results on a 2M ocean metagenomics data set obtained from the Sorcerer II Global Ocean Sampling project show that our new implementation is able to achieve a ∼7X speedup over our serial implementation without using asynchronous CPU-GPU communication, with the GPU part alone contributing to over ∼374X speedup in the accelerated part. Qualitative evaluation of the 2M data set shows that our method is able to improve sensitivity of clustering over existing methods, and is more successful in recruiting sequences into the clustering without impacting the overall specificity. As a demonstration of a large-scale run, we were able to cluster a real-world homology graph, containing 11M vertices and 640M edges and constructed from sequences of an ongoing Pacific Ocean metagenomics survey project, in about 94 minutes.
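To make the Shingling idea in the abstract above more concrete, here is a minimal single-pass sketch in Python. It is not the authors' CPU-GPU implementation: the function names (`make_hashes`, `shingles`, `cluster_by_shared_shingles`), the parameters `s` and `c`, the linear hash family, and the union-find grouping step are all assumptions chosen for illustration, and the published heuristic, as we understand it, applies shingling in more than one pass. Vertex ids are assumed to be non-negative integers smaller than `PRIME`.

```python
import random
from collections import defaultdict

PRIME = 2_147_483_647  # modulus for the illustrative linear hashes (assumption)


def make_hashes(c, seed=0):
    """Draw c random linear hashes x -> (a*x + b) mod PRIME, shared by all vertices."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(c)]


def shingles(neighbors, s, hashes):
    """Return one shingle per hash: the s neighbors with the smallest hash values."""
    if len(neighbors) < s:
        return set()  # too few neighbors to form a shingle of size s
    result = set()
    for j, (a, b) in enumerate(hashes):
        ranked = sorted(neighbors, key=lambda x: (a * x + b) % PRIME)
        result.add((j, tuple(sorted(ranked[:s]))))
    return result


def cluster_by_shared_shingles(adj, s=2, c=20, seed=0):
    """Group vertices that share at least one shingle (simplified, single pass)."""
    hashes = make_hashes(c, seed)
    by_shingle = defaultdict(list)
    for v, nbrs in adj.items():
        for sh in shingles(nbrs, s, hashes):
            by_shingle[sh].append(v)

    parent = {v: v for v in adj}  # union-find forest over the vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for members in by_shingle.values():
        root = find(members[0])
        for v in members[1:]:
            parent[find(v)] = root

    groups = defaultdict(set)
    for v in adj:
        groups[find(v)].add(v)
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical toy graph: vertex -> set of neighbor ids.
toy = {
    0: {10, 11, 12}, 1: {10, 11, 12}, 2: {10, 11, 12},
    3: {20, 21, 22}, 4: {20, 21, 22},
    5: {30},
}
clusters = cluster_by_shared_shingles(toy)
```

Vertices with similar neighbor sets are likely to produce a common shingle under at least one hash, so they end up in the same group; vertices with disjoint neighbor sets never share a shingle. Raising `s` makes matches stricter, while raising `c` gives similar vertices more chances to match.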




Changjun Wu

The genome of the domesticated apple (Malus × domestica Borkh.)

Polymorphism of the growth hormone gene and its association with growth traits in Boer goat bucks

Pyramiding and evaluation of the brown planthopper resistance genes Bph14 and Bph15 in hybrid rice

pGraph: Efficient Parallel Construction of Large-Scale Protein Sequence Homology Graphs

An efficient parallel approach for identifying protein families in large-scale metagenomic data sets

Dynamic analyses of rice blast resistance for the assessment of genetic and environmental effects

Polymorphism in exon 3 of follicle stimulating hormone beta (FSHB) subunit gene and its association with litter traits and superovulation in the goat

GPU-Accelerated Protein Family Identification for Metagenomics
