Author contributions: DCJ coordinated all analyses, isolated DNA for sequencing, analysed and filtered SNP calls, conducted diversity analysis and GWAS and drafted the manuscript. CR produced phenotype data for growth on various solid media and growth rates in liquid media. AR conducted analysis of dating using mitochondrial data. DS conducted GWAS. MP analysed all phenotype data. TM identified LTR transposon insertions and analysed transposon insertion data. FXM conducted crosses for analysis of spore viability. ZI produced indel calls with Cortex. WL conducted analysis of recombination rate, linkage disequilibrium decay and PCA for distance between strains. TMKC assisted with phenotype and population analysis. RP analysed Cortex and GATK indel calls. MM conducted amino acid profiling. JLDL and AC produced automated measures of cell morphology. SB aligned reads and produced GATK SNP calls. GH analysed population structure using fineSTRUCTURE. BO'F estimated the TMRCA from the nuclear genome using ACG. TK identified LTR transposon insertions. JTS produced de novo assemblies. LB developed the custom Workspace workflow Spotsizer. BT assisted with sequence analysis. DAB assisted with analysis of novel genes. TS assisted with strain verification. SC produced images of wild strains and assisted with strain verification. JEEUH assisted with SNP validation. LvT and MT assisted with LTR validation. LJ and JL assisted with manual measures of cell morphology and FACS. SA produced gene expression data. MF, KM and ND assisted with sequencing. WB initiated and assisted with strain collection. JH coordinated manual measures of cell morphology and FACS. RECS coordinated automated measures of cell morphology. MR coordinated amino acid profiling. NM conducted analysis of recombination and linkage disequilibrium and advised on aspects of diversity and GWAS. DJB advised on GWAS. RD facilitated sequencing.
JB contributed to the initiation and development of the project and financed the JB laboratory.
Accessions: Sequence data are archived in the European Nucleotide Archive (www.ebi.ac.uk/ena/), Study Accessions PRJEB2733 and PRJEB6284 (Supplementary Table 7). All SNPs and indels were submitted to NCBI dbSNP (www.ncbi.nlm.nih.gov/SNP/). Accessions are 974514578-974688138 (SNPs) and 974702618-974688139 (indels).
Abstract: Natural variation within species reveals aspects of genome evolution and function. The fission yeast Schizosaccharomyces pombe is an important model for eukaryotic biology, but researchers typically use one standard laboratory strain. To extend the utility of this model, we surveyed the genomic and phenotypic variation in 161 natural isolates. We sequenced the genomes of all strains, revealing moderate genetic diversity (π = 3 × 10⁻³) and weak global population structure. We estimate that dispersal of S. pombe began within human antiquity (~340 BCE), and ancestors of these strains reached the Americas at ~1623 CE. We quantified 74 traits, revealing substantial heritable phenotypic diversity. We cond...
Long noncoding RNAs (lncRNAs) constitute the majority of transcripts in mammalian genomes, yet their functions remain largely unknown. As part of the FANTOM6 project, we systematically knocked down the expression of 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Cap Analysis of Gene Expression (CAGE). Antisense oligonucleotides targeting the same lncRNAs exhibited global concordance, and the molecular phenotype, measured by CAGE, recapitulated the observed cellular phenotypes while providing additional insights into the affected genes and pathways. Here, we disseminate the largest-to-date lncRNA knockdown data set with molecular phenotyping (over 1000 CAGE deep-sequencing libraries) for further exploration and highlight functional roles for ZNF213-AS1 and lnc-KHDC3L-2.
Inflammatory bowel disease (IBD) is a chronic intestinal disorder with two main types, Crohn’s disease (CD) and ulcerative colitis (UC), whose molecular pathology is not well understood. The majority of IBD-associated SNPs are located in non-coding regions and are hard to characterize because the regulatory regions active in IBD are not known. Here we profile transcription start sites (TSSs) and enhancers in the descending colon of 94 IBD patients and controls. IBD-upregulated promoters and enhancers are highly enriched for IBD-associated SNPs and are bound by the same transcription factors. IBD-specific TSSs are associated with genes with roles in both inflammatory cascades and gut epithelia, while TSSs distinguishing UC and CD are associated with gut epithelial functions. We find that as few as 35 TSSs can distinguish active CD, UC, and controls with 85% accuracy in an independent cohort. Our data constitute a foundation for understanding the molecular pathology, gene regulation, and genetics of IBD.
Background: 5′-end sequencing assays, and Cap Analysis of Gene Expression (CAGE) in particular, have been instrumental in studying transcriptional regulation. 5′-end methods provide genome-wide maps of transcription start sites (TSSs) with base-pair resolution. Because active enhancers often feature bidirectional TSSs, such data can also be used to predict enhancer candidates. The current availability of mature and comprehensive computational tools for the analysis of 5′-end data is limited, preventing efficient analysis of new and existing 5′-end data. Results: We present CAGEfightR, a framework for analysis of CAGE and other 5′-end data implemented as an R/Bioconductor package. CAGEfightR can import data from BigWig files and allows for fast and memory-efficient prediction and analysis of TSSs and enhancers. Downstream analyses include quantification, normalization, annotation with transcript and gene models, TSS shape statistics, linking TSSs to enhancers via co-expression, identification of enhancer clusters, and genome-browser style visualization. While built to analyze CAGE data, we demonstrate the utility of CAGEfightR in analyzing nascent RNA 5′-data (PRO-Cap). CAGEfightR is implemented using standard Bioconductor classes, making it easy to learn, use and combine with other Bioconductor packages, for example popular differential expression tools such as limma, DESeq2 and edgeR. Conclusions: CAGEfightR provides a single, scalable and easy-to-use framework for comprehensive downstream analysis of 5′-end data. CAGEfightR is designed to be interoperable with other Bioconductor packages, thereby unlocking hundreds of mature transcriptomic analysis tools for 5′-end data. CAGEfightR is freely available via Bioconductor: bioconductor.org/packages/CAGEfightR.
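The idea that bidirectional TSSs can flag enhancer candidates can be illustrated with a toy sketch. This is not CAGEfightR's algorithm or API (the package is written in R). It is a simplified Python illustration that assumes per-base 5′-end counts on each strand and scores each position with a directionality value D = (P - M) / (P + M), where P is the plus-strand signal just downstream and M is the minus-strand signal just upstream. The function names, window size, and thresholds are all hypothetical.

```python
from typing import List, Tuple


def directionality(plus: float, minus: float) -> float:
    """Return (plus - minus) / (plus + minus), or 0.0 when there is no signal."""
    total = plus + minus
    if total == 0:
        return 0.0
    return (plus - minus) / total


def bidirectional_candidates(
    plus_counts: List[float],
    minus_counts: List[float],
    window: int = 3,          # hypothetical flank size in bases
    min_total: float = 4.0,   # hypothetical minimum combined signal
    max_abs_d: float = 0.8,   # hypothetical cutoff for "balanced" signal
) -> List[Tuple[int, float]]:
    """Scan midpoints and keep those with divergent, balanced transcription.

    For midpoint i, minus-strand counts are summed over the `window` bases
    upstream of i and plus-strand counts over the `window` bases downstream.
    A midpoint is reported when the combined signal reaches `min_total` and
    |D| does not exceed `max_abs_d`.
    """
    if len(plus_counts) != len(minus_counts):
        raise ValueError("strand count vectors must have equal length")

    n = len(plus_counts)
    hits: List[Tuple[int, float]] = []
    for i in range(window, n - window):
        upstream_minus = sum(minus_counts[i - window:i])
        downstream_plus = sum(plus_counts[i + 1:i + 1 + window])
        if upstream_minus + downstream_plus < min_total:
            continue
        d = directionality(downstream_plus, upstream_minus)
        if abs(d) <= max_abs_d:
            hits.append((i, d))
    return hits


if __name__ == "__main__":
    # Divergent toy locus: minus-strand signal upstream, plus-strand downstream.
    plus = [0, 0, 0, 0, 0, 3, 2, 0, 0, 0]
    minus = [0, 0, 2, 3, 0, 0, 0, 0, 0, 0]
    for position, score in bidirectional_candidates(plus, minus):
        print(position, round(score, 2))
```

A locus with signal on only one strand gives |D| = 1 and is filtered out, which is the behaviour expected of a unidirectional promoter-like TSS in this simplified scheme.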