Deciphering the 3D genome organization across species from Hi-C data

Three-dimensional (3D) genome organization plays a critical role in regulating gene expression and genome function. The growing body of Hi-C and Micro-C data from a range of species provides insight into the mechanisms that shape the 3D genome, such as loop extrusion. While visual patterns like topologically associating domains (TADs) and loops are conserved across species, the underlying biological mechanisms may differ. Both species-specific architectural factors and DNA sequences influence chromatin folding, which complicates comparative studies of how 3D genome organization evolves.

This work leverages existing Hi-C data and machine learning to explore species-specific 3D genome folding mechanisms and predict chromatin structures from DNA sequences. Here, we present Chimaera, a neural network that not only predicts Hi-C or Micro-C contact maps from DNA sequence, but also enables the search, quantification, and interpretation of associations between DNA sequences and 3D genome patterns.

Fundamentals of Chromosome Conformation Capture

Chromosome conformation capture (3C) and its derivatives have revolutionized our understanding of genome organization. These techniques quantify how often distal DNA segments contact each other in a population of cells, yielding uniquely information-rich data on genome topology, up to the genome-wide scale.

The principal steps of 3C and related methods are similar: crosslink chromatin, digest with restriction enzymes, re-ligate, and interrogate the rearranged DNA fragments. Variants like 4C, 5C, and ChIA-PET offer increased throughput and targeted analysis of specific genomic loci or protein-mediated interactions. In contrast, Hi-C generates an “all-versus-all” contact map, providing an unbiased genome-wide view of chromatin folding.
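The digestion step above can be mimicked in silico to see which fragments a given enzyme would produce. The sketch below is illustrative only: it assumes a 4-cutter recognizing GATC and cutting just before the site (as DpnII and MboI do), scans a single strand, and uses a made-up toy sequence. The function name `restriction_fragments` is hypothetical and not part of any pipeline described here.

```python
def restriction_fragments(seq, site="GATC"):
    """Return (start, end) half-open intervals of fragments cut at each site.

    Assumption: the enzyme cuts immediately before the recognition site,
    so every occurrence of `site` starts a new fragment.
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(site, pos + 1)  # continue searching after this hit
    bounds = [0] + cuts + [len(seq)]
    # Drop empty fragments (for example, when the sequence starts with a site).
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


toy_sequence = "AAGATCTTGATCAA"  # toy example, not real genomic sequence
fragments = restriction_fragments(toy_sequence)
```

In a real Hi-C workflow, read pairs are typically assigned to such fragments (or directly to fixed-size bins) before contacts are counted.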

Computational Approaches to Hi-C Data Analysis

Analyzing Hi-C data involves several key steps. First, sequencing reads are aligned to the reference genome, and valid read pairs corresponding to contacting DNA fragments are identified. Normalization techniques are then applied to correct for various biases. The resulting contact matrix enables the visualization of long-range chromatin interactions and the identification of structural features like TADs and compartments.
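The binning and normalization steps can be sketched as follows. This is a minimal illustration, assuming read pairs have already been aligned and filtered to valid intra-chromosomal contacts given as `(pos1, pos2)` coordinates. The normalization shown is a simple iterative matrix balancing in the spirit of iterative correction; it is one common choice, not necessarily the one used for the datasets discussed here, and the function names are hypothetical.

```python
import numpy as np


def bin_contacts(pairs, chrom_length, bin_size):
    """Count read pairs into a symmetric bin-by-bin contact matrix."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = np.zeros((n_bins, n_bins), dtype=float)
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i, j] += 1
        if i != j:
            matrix[j, i] += 1  # keep the matrix symmetric
    return matrix


def iterative_correction(matrix, n_iter=50, tol=1e-6):
    """Rescale rows and columns until all non-empty rows have equal sums."""
    balanced = np.asarray(matrix, dtype=float).copy()
    bias = np.ones(len(balanced))
    for _ in range(n_iter):
        row_sums = balanced.sum(axis=1)
        covered = row_sums > 0  # bins with no contacts are left untouched
        if not covered.any():
            break
        scale = np.ones_like(row_sums)
        scale[covered] = row_sums[covered] / row_sums[covered].mean()
        balanced /= np.outer(scale, scale)
        bias *= scale  # accumulated per-bin bias estimate
        if np.abs(scale - 1).max() < tol:
            break
    return balanced, bias


# Toy read pairs on a 30 bp "chromosome" with 10 bp bins.
pairs = [(5, 15), (5, 25), (15, 25), (5, 5)]
raw = bin_contacts(pairs, chrom_length=30, bin_size=10)
balanced, bias = iterative_correction(raw)
```

Balancing assumes that every bin should be equally visible in the experiment, so systematic differences in row sums are treated as technical bias. Not every matrix can be balanced this way, so very sparse bins are usually masked beforehand.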

Advances in computational methods have enabled the exploration of polymer models and the integration of Hi-C data with other genomic and epigenomic information. However, the high sequencing depth required for high-resolution contact maps remains a significant challenge, limiting the widespread adoption of Hi-C.

Comparative Genomics of 3D Genome Organization

By applying Chimaera to Hi-C and Micro-C datasets across multiple species, we can gain insights into the evolutionary dynamics of 3D genome organization. Chimaera not only predicts chromatin contact maps from DNA sequences but also enables the identification and quantification of various 3D genome features, such as insulation, loops, stripes, and fountains/jets.

Exploring the latent representations learned by Chimaera, we can detect and validate the roles of specific DNA sequence elements in shaping chromatin structure. For instance, we confirm the importance of CTCF in generating insulation patterns in vertebrates and BEAF-32 in Drosophila, while also identifying previously unreported motifs in mouse and Drosophila.

Interestingly, Chimaera shows that the strand orientation of genes affects loop formation in Dictyostelium, corroborating the hypothesis that convergent gene positioning influences 3D genome organization in this amoeba. Genes also have a pronounced, though varied, effect on predicted chromatin interactions in other organisms.
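To make "convergent gene positioning" concrete, the sketch below lists neighboring gene pairs that are transcribed toward each other, that is, a plus-strand gene immediately followed by a minus-strand gene. It is an illustration of the concept only, not part of Chimaera. The `Gene` record, the function name, and the toy annotation are hypothetical, and all genes are assumed to lie on one chromosome.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def convergent_pairs(genes):
    """Return names of adjacent gene pairs oriented toward each other."""
    ordered = sorted(genes, key=lambda g: g.start)
    return [
        (left.name, right.name)
        for left, right in zip(ordered, ordered[1:])
        if left.strand == "+" and right.strand == "-"
    ]


# Toy annotation with made-up gene names and coordinates.
genes = [
    Gene("geneA", 100, 900, "+"),
    Gene("geneB", 1200, 2000, "-"),
    Gene("geneC", 2500, 3100, "-"),
    Gene("geneD", 3500, 4200, "+"),
    Gene("geneE", 4600, 5300, "-"),
]
pairs_found = convergent_pairs(genes)
```

Pairs found this way could then be compared against loop anchors called from a contact map to test whether convergent orientation coincides with looping.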

Principles of Chromosomal Folding

Chromatin Looping and Topological Domains

Chromatin loops are dynamic structures that facilitate communication between regulatory elements and their target genes, as well as between the start and end of a gene. The formation of these loops is associated with changes in transcriptional state, underscoring their functional relevance.

Topologically associating domains (TADs) are self-interacting chromosomal regions that are demarcated by insulating boundaries. TADs are thought to play a role in gene regulation by facilitating or restricting interactions between regulatory elements and their target genes.
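Insulating boundaries are often located with an insulation score: the average contact frequency in a square window that spans a candidate boundary, which dips where few contacts cross. The sketch below is a simplified version under stated assumptions: it takes a dense, already normalized matrix, uses a fixed window in bins, and skips the log-ratio and smoothing steps that established tools apply. The function name is hypothetical.

```python
import numpy as np


def insulation_score(matrix, window):
    """Mean contact frequency across the boundary at the left edge of each bin.

    For bin i, the window covers contacts between the `window` bins
    upstream (i - window .. i - 1) and downstream (i .. i + window - 1).
    Positions too close to the matrix edge are left as NaN.
    """
    matrix = np.asarray(matrix, dtype=float)
    n = len(matrix)
    scores = np.full(n, np.nan)
    for i in range(window, n - window + 1):
        scores[i] = matrix[i - window:i, i:i + window].mean()
    return scores


# Toy map with two 4-bin domains: strong contacts inside, weak between.
toy_map = np.ones((8, 8))
toy_map[:4, :4] = 10
toy_map[4:, 4:] = 10
scores = insulation_score(toy_map, window=2)
boundary_bin = int(np.nanargmin(scores))
```

In this toy map the score reaches its minimum at bin 4, the first bin of the second domain, which is where the two domains meet.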

Spatial Compartmentalization of the Genome

Hi-C analyses have revealed the existence of two main chromatin compartments, A and B, which exhibit distinct functional and structural properties. The A compartment is enriched in active genes and open chromatin, while the B compartment is associated with repressed genomic regions and heterochromatin.
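Compartments are commonly assigned from the leading eigenvector of the correlation matrix of distance-normalized contacts. The sketch below follows that general recipe under simplifying assumptions: a dense, balanced, single-chromosome matrix with no empty bins. The function name and toy matrix are hypothetical. The sign of an eigenvector is arbitrary, so in practice it is oriented with an external track such as GC content or gene density before bins are labeled A or B.

```python
import numpy as np


def compartment_eigenvector(matrix):
    """Leading eigenvector of the correlation of observed/expected contacts."""
    matrix = np.asarray(matrix, dtype=float)
    n = len(matrix)
    # Expected contacts at each genomic distance: the mean of that diagonal.
    expected = np.zeros_like(matrix)
    for d in range(n):
        mean_d = np.diagonal(matrix, d).mean()
        idx = np.arange(n - d)
        expected[idx, idx + d] = mean_d
        expected[idx + d, idx] = mean_d
    obs_exp = np.divide(
        matrix, expected, out=np.zeros_like(expected), where=expected > 0
    )
    corr = np.corrcoef(obs_exp)  # row-by-row Pearson correlation
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    return eigvecs[:, -1]  # eigenvector of the largest eigenvalue


# Toy matrix: bins with the same label contact each other more often.
labels = np.array([0, 0, 1, 1, 0, 0, 1, 1])
toy_map = np.where(labels[:, None] == labels[None, :], 4.0, 1.0)
eigenvector = compartment_eigenvector(toy_map)
```

Bins with positive and negative eigenvector values are then treated as two groups, and the group enriched in active genes and open chromatin is called A.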

Role of Architectural Proteins

Proteins like CTCF, cohesin, and BEAF-32 have been identified as key architectural factors that shape the 3D genome. These proteins mediate chromatin looping and the formation of TAD boundaries, thereby influencing gene regulation and genome stability.

Evolutionary Dynamics of 3D Genome Structure

Conserved Features of Genome Organization

Despite the diversity of genome structures across species, certain organizational principles appear to be conserved. For example, the Rabl configuration, with clustering of centromeres and telomeres, has been observed in multiple organisms.

Species-Specific Variations in 3D Genome

While visual patterns like TADs and loops are shared, the underlying mechanisms governing 3D genome formation can differ significantly between species. This is likely due to the interplay between species-specific architectural factors and DNA sequence elements.

Influence of Genome Evolution

Genome rearrangements, mutations, and gene insertions can have profound impacts on the 3D organization of the genome. By applying Chimaera to predict the consequences of such genomic changes, we can gain a deeper understanding of the evolutionary forces shaping chromatin folding patterns.
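An in silico rearrangement experiment of this kind can be sketched as follows: edit the sequence, run the predictor on both versions, and compare. The code below shows only the sequence-editing side, for an inversion. The `predict` argument stands in for any sequence-to-contact model and is a hypothetical callable, not Chimaera's actual interface. The toy predictor used at the end is a placeholder that simply counts bases.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (other characters pass through)."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(seq, start, end):
    """Return seq with the half-open interval [start, end) inverted."""
    if not 0 <= start <= end <= len(seq):
        raise ValueError("segment out of range")
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def mutation_effect(predict, seq, start, end):
    """Difference between predictions for the inverted and reference sequence.

    `predict` is assumed to map a DNA string to a sequence of numbers,
    for example a flattened contact map.
    """
    reference = predict(seq)
    mutated = predict(invert_segment(seq, start, end))
    return [m - r for r, m in zip(reference, mutated)]


def toy_predict(seq):
    # Placeholder "model": counts of G and C, for illustration only.
    return [seq.count("G"), seq.count("C")]


inverted = invert_segment("AAACGTTT", 1, 4)
effect = mutation_effect(toy_predict, "AAACGTTT", 1, 4)
```

Deletions and insertions can be handled the same way by swapping `invert_segment` for another editing function, as long as the edited sequence still fits the model's input length.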

Functional Implications of 3D Genome Architecture

Regulation of Gene Expression

The highly organized chromatin architecture facilitates communication between genes and their regulatory elements, influencing gene expression programs and cellular differentiation.

Genome Stability and Chromosomal Interactions

The spatial arrangement of the genome also plays a crucial role in maintaining genome stability and orchestrating specific chromosomal interactions, such as those involved in DNA repair processes.

Cell Type-Specific Genome Organization

Chromatin folding patterns exhibit cell type-specific differences, reflecting the dynamic nature of the 3D genome and its adaptation to the functional requirements of different cellular contexts.

Integrating the insights gained from Chimaera with other techniques should help clarify the interplay between genome sequence, architecture, and function, and advance our understanding of how the 3D genome evolves.
