The advent of high-resolution chromosome conformation capture assays (such as 5C, Hi-C and Pore-C) has allowed for unprecedented sequence-level investigations into the structure–function relationship of the genome. In order to comprehensively understand this relationship, computational tools are required that utilize data generated from these assays to predict 3D genome organization (the 3D genome reconstruction problem). Many computational tools have been developed that answer this need, but a comprehensive comparison of their underlying algorithmic approaches has not been conducted. This manuscript provides a comprehensive review of the existing computational tools (from November 2006 to September 2019, inclusive) that can be used to predict 3D genome organizations from high-resolution chromosome conformation capture data. Overall, existing tools were found to use a relatively small set of algorithms from one or more of the following categories: dimensionality reduction, graph/network theory, maximum likelihood estimation (MLE) and statistical modeling. Solutions in each category are far from maturity, and the breadth and depth of various algorithmic categories have not been fully explored. While the tools for predicting 3D structure for a genomic region or single chromosome are diverse, there is a general lack of algorithmic diversity among computational tools for predicting the complete 3D genome organization from high-resolution chromosome conformation capture data.
Development of yellow mustard (Sinapis alba L.) with superior quality traits (low erucic and linolenic acid contents, and low glucosinolate content) could make this species a potential oilseed crop. We have recently isolated three inbred lines, Y1127, Y514 and Y1035, with low (3.8%), medium (12.3%) and high (20.8%) linolenic acid (C18∶3) content, respectively, in this species. Inheritance studies detected two fatty acid desaturase 3 (FAD3) gene loci controlling the variation of C18∶3 content. QTL mapping revealed that the two FAD3 gene loci were responsible for 73.0% and 23.4% of the total variation and were located on the linkage groups Sal02 and Sal10, respectively. The FAD3 gene on Sal02 was referred to as SalFAD3.LA1 and that on Sal10 as SalFAD3.LA2. The dominant and recessive alleles were designated as LA1 and la1 for SalFAD3.LA1, and LA2 and la2 for SalFAD3.LA2. Cloning and alignment of the coding and genomic DNA sequences revealed that the SalFAD3.LA1 and SalFAD3.LA2 genes each contained 8 exons and 7 introns. LA1 had a coding DNA sequence (CDS) of 1143 bp encoding a polypeptide of 380 amino acids, whereas la1 was a loss-of-function allele due to an insertion of 584 bp in exon 3. Both LA2 and la2 had a CDS of 1152 bp encoding a polypeptide of 383 amino acids. Allele-specific markers for LA1, la1, LA2 and la2 co-segregated with the C18∶3 content in the F2 populations and will be useful for improving fatty acid composition through marker-assisted selection in yellow mustard breeding.
Abstract. Background: Hi-C is a proximity-based ligation reaction used to detect regions of the genome that are close in 3D space (or "interacting"). Typically, results from Hi-C experiments (whole-genome contact maps) are visualized as heatmaps or Circos plots. While informative, these visualizations do not intuitively represent the complex organization and folding of the genome in 3D space, making the interpretation of the underlying 3D genomic organization difficult. Our objective was to use existing tools to generate a graph-based representation of a whole-genome contact map that leads to a more intuitive visualization. Methodology: Whole-genome contact maps were converted into graphs where each vertex represented a genomic region and each edge represented a detected or known interaction between two vertices. Three types of interactions were represented in the graph: linear, intra-chromosomal (cis-), and inter-chromosomal (trans-) interactions. Each edge had an associated weight related to the linear distance (Hi-C experimental resolution) or the associated interaction frequency from the contact map. Graphs were generated based on this representation scheme for whole-genome contact maps from a fission yeast dataset where yeast mutants were used to identify specific principles influencing genome organization (GEO accession: GSE56849). Graphs were visualized in Cytoscape with an edge-weighted spring-embedded layout where vertices and linear interaction edges were coloured according to their corresponding chromosome. Results: The graph-based visualizations (compared to the equivalent heatmaps) more intuitively represented the effects of the rad21 mutant on genome organization. Specifically, the graph-based visualizations clearly highlighted the loss of structural globules and a greater intermingling of chromosomes in the mutant strain when compared to the wild-type.
The graph-based representation and visualization protocol developed here will aid in understanding the complex organization and folding of the genome.
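The representation scheme described in the Methodology can be illustrated with a minimal sketch. This is not the authors' code: the names `Edge` and `contact_map_to_edges`, the input layout (a list of bins plus a dictionary of pairwise counts), the `min_count` threshold, and the toy data are all assumptions made for illustration. Linear edges are weighted by the bin resolution, and cis and trans edges by the interaction frequency, as the abstract describes.

```python
from collections import namedtuple

# Hypothetical edge record: vertex indices, interaction type, and weight.
Edge = namedtuple("Edge", "source target kind weight")


def contact_map_to_edges(bins, contacts, resolution, min_count=1):
    """Convert a binned contact map into a typed, weighted edge list.

    bins       -- list of (chromosome, start) tuples, sorted by chromosome
                  and position; each bin becomes one vertex (assumed layout)
    contacts   -- dict mapping (i, j) bin-index pairs to interaction frequency
    resolution -- bin size in base pairs, used as the linear-edge weight
    min_count  -- assumed threshold below which a contact is not "detected"
    """
    edges = []
    # Linear edges join consecutive bins on the same chromosome.
    for i in range(len(bins) - 1):
        if bins[i][0] == bins[i + 1][0]:
            edges.append(Edge(i, i + 1, "linear", resolution))
    # Detected contacts become cis (same chromosome) or trans edges.
    for (i, j), freq in sorted(contacts.items()):
        if freq < min_count:
            continue
        kind = "cis" if bins[i][0] == bins[j][0] else "trans"
        edges.append(Edge(i, j, kind, freq))
    return edges


# Toy data (fabricated for illustration only, not from GSE56849).
bins = [("I", 0), ("I", 10000), ("I", 20000), ("II", 0), ("II", 10000)]
contacts = {(0, 2): 5, (1, 3): 2, (0, 4): 0}
edges = contact_map_to_edges(bins, contacts, resolution=10000)
for edge in edges:
    print(edge)
```

An edge list of this shape could be written out as a delimited table and imported into a network viewer, with the `kind` and `weight` columns driving edge styling and an edge-weighted layout.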
In order to comprehensively understand the structure-function relationship of the genome, 3D genome structures must first be predicted from biological data (like Hi-C) using computational tools. Many of these existing tools rely partially or completely on multi-dimensional scaling (MDS) to embed predicted structures in 3D space. MDS is known to have inherent problems when applied to high-dimensional datasets like Hi-C. Alternatively, t-Distributed Stochastic Neighbor Embedding (t-SNE) is able to overcome these problems but has not been applied to predict 3D genome structures. In this manuscript, we present a new workflow called StoHi-C (pronounced "stoic") that uses t-SNE to predict 3D genome structure from Hi-C data. StoHi-C was used to predict 3D genome structures for multiple, independent existing fission yeast Hi-C datasets. Overall, StoHi-C was able to generate 3D genome structures that more clearly exhibit the established principles of fission yeast 3D genomic organization.
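The general idea of embedding Hi-C bins in 3D with t-SNE can be sketched as follows. This is not the StoHi-C implementation: it assumes NumPy and scikit-learn are available, the function name `embed_contacts_3d` is hypothetical, the conversion from interaction frequency to distance (`1 / (count + 1)`) is an assumption chosen only so that frequent contacts map to short distances, and the input matrix is random toy data.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_contacts_3d(contacts, perplexity=5.0, seed=0):
    """Embed genomic bins in 3D from a symmetric contact-frequency matrix.

    The frequency-to-distance transform below is an illustrative
    assumption, not a method taken from the abstract.
    """
    contacts = np.asarray(contacts, dtype=float)
    # Assumed transform: higher interaction frequency -> smaller distance.
    dist = 1.0 / (contacts + 1.0)
    np.fill_diagonal(dist, 0.0)
    tsne = TSNE(
        n_components=3,        # 3D coordinates for each bin
        metric="precomputed",  # interpret the input as pairwise distances
        init="random",         # required when distances are precomputed
        perplexity=perplexity, # must be smaller than the number of bins
        random_state=seed,
    )
    return tsne.fit_transform(dist)


# Toy symmetric contact matrix (random counts, for illustration only).
rng = np.random.default_rng(0)
counts = rng.poisson(5, size=(20, 20))
counts = counts + counts.T
coords = embed_contacts_3d(counts)
print(coords.shape)
```

Each row of `coords` is one bin's 3D position. Because t-SNE is stochastic, different seeds give different layouts, so any comparison between structures would need to account for that.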