Background: High-throughput targeted sequencing of the 16S ribosomal RNA marker gene is often used to profile and characterize the taxonomic composition of microbial communities. This type of large-scale, high-throughput sequencing data is increasingly being applied to the study of infectious diseases such as diarrhea. While many studies are limited to single "snapshots" of these communities, there is increasing recognition that longitudinal profiling is required to understand community dynamics and the complex relationships between those dynamics and phenotypes of interest. Statistical methods that determine which microbial features are differentially expressed are required as an initial step toward characterizing phenotypic associations with community dynamics in the context of big data and infectious diseases.
We assembled teams of genomics professionals to assess whether we could rapidly develop pipelines that facilitate analysis of high-throughput sequencing data and so answer questions commonly asked by biologists and others new to bioinformatics. In January 2015, teams were assembled on the National Institutes of Health (NIH) campus to address questions in the DNA-seq, epigenomics, metagenomics and RNA-seq subfields of genomics. This hackathon had only two rules: the data used had to be housed at the National Center for Biotechnology Information (NCBI) or be submitted there by a participant within the next six months, and all software going into the pipeline had to be open-source or open-use. Questions proposed by organizers, along with suggested tools and approaches, were distributed to participants a few days before the event and were refined during the event. Pipelines were published on GitHub, a web service providing publicly available, free-usage tiers for collaborative software development (https://github.com/features/). The code was published at https://github.com/DCGenomics/ with separate repositories for each team, starting with hackathon_v001.
Cell lines are an indispensable tool in biomedical research and are often used as surrogates for tissues. An important question is how well a cell line's transcriptional and regulatory processes reflect those of its tissue of origin. We analyzed RNA-Seq data from GTEx for 127 paired Epstein-Barr virus-transformed lymphoblastoid cell lines and whole blood samples, and for 244 paired fibroblast cell lines and skin biopsies. A combination of gene expression and network analyses shows that while cell lines carry the expression signatures of their primary tissues, albeit at reduced levels, they also exhibit changes in their patterns of transcription factor regulation. Cell cycle genes are over-expressed in cell lines compared to primary tissue, and these genes show reduced targeting by repressive transcription factors. Our results provide insight into the expression and regulatory alterations observed in cell lines and suggest that these changes should be considered when using cell lines as models.