Background
DNA methylation is an epigenetic process that involves the addition of a methyl group (CH3) to a DNA molecule. In vertebrates, DNA methylation typically occurs at cytosine nucleotides that are followed by guanine (CpG). As with all epigenetic marks, this type of modification does not change the underlying nucleotide sequence itself but can still significantly impact how genes are regulated and expressed.
Methyl‑sequencing (also known as methyl-seq or bisulfite sequencing) is a method used by researchers to uncover methylation patterns in their DNA samples. By sequencing DNA methylation marks during different stages of development, scientists have established that methylation plays a critical role in cell and tissue differentiation [2,3]. They've also found that abnormal methylation patterns are often a sign of cellular dysfunction and can contribute to a number of disease processes, including cancer [4,5] and neurological disorders [6].
To improve our understanding of methylation signatures, Loyfer et al. created a comprehensive DNA methylation atlas to serve as a baseline human “methylome” across multiple cell types from normal tissues. The resulting atlas, which includes 39 cell types sorted from 205 healthy tissue samples, improves upon previously published methylomes that were limited in scope, restricted in sample type, and/or confined to predefined subsets of CpG methylation sites. This human DNA methylation atlas is a valuable resource for gene regulation research and for efforts to discover tissue-specific biomarkers for use in liquid biopsies.
Methods
The researchers obtained 205 normal human tissue and cell samples from 137 donors. The tissue samples were enzymatically dissociated, incubated with antibodies specific to the desired cell type, and sorted by fluorescence-activated cell sorting (FACS). To prepare for methyl-seq, DNA was extracted from the sorted cell populations, representing 77 different primary cell types, and subjected to bisulfite conversion. The researchers used Accel-NGS Methyl-Seq DNA library preparation kits (now called the xGen™ Methyl‑Seq DNA Library Prep Kit) to generate sequencing libraries from the bisulfite‑treated DNA and performed whole genome bisulfite sequencing (WGBS) on an Illumina NovaSeq® instrument.
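To make the bisulfite readout concrete, the following minimal sketch simulates conversion in silico: unmethylated cytosines are read as thymine after conversion and amplification, while methylated cytosines remain cytosine. It then estimates the methylation level at each CpG from a handful of reads. The function names, the toy reference sequence, and the methylation states are illustrative assumptions and are not part of the authors' pipeline; the sketch considers only the forward strand and ignores sequencing error and incomplete conversion.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as it would be read after bisulfite conversion.

    Unmethylated C is deaminated and read as T; methylated C is protected.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def cpg_positions(reference):
    """Indices of the C in every CpG dinucleotide of the reference."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def cpg_methylation(reference, reads):
    """Fraction of reads that still show C at each reference CpG."""
    levels = {}
    for pos in cpg_positions(reference):
        calls = [read[pos] for read in reads if read[pos] in "CT"]
        levels[pos] = sum(c == "C" for c in calls) / len(calls) if calls else None
    return levels


# Toy reference with CpGs at indices 1, 5, and 9 (hypothetical sequence).
reference = "ACGTTCGACCGA"

# Four DNA molecules with different (made-up) methylation states.
molecule_states = [{1, 5, 9}, {1}, {1, 9}, set()]
reads = [bisulfite_convert(reference, state) for state in molecule_states]

levels = cpg_methylation(reference, reads)
print(reads[0])  # ACGTTCGATCGA (the non-CpG C at index 8 is converted)
print(levels)    # {1: 0.75, 5: 0.25, 9: 0.5}
```

Averaging such per-CpG fractions over a region gives the kind of regional methylation value that downstream block-level analyses work with.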
After mapping reads to the human genome, the researchers used a custom-built genomic segmentation algorithm [7] to divide the genome into non-overlapping continuous blocks according to their DNA methylation content. These blocks were subjected to clustering analysis based on variability in average methylation across all samples. Subsequent analysis identified genomic blocks that were differentially methylated according to cell type and categorized them as either unmethylated or methylated “markers.” A subset of the markers (e.g., the top 1,000 markers per cell type) was used in additional analytical tests, such as motif analysis, chromatin analysis, gene set annotation enrichment, and gene association.
Results
Segmenting methylomes into genomic blocks maintains the regional context of DNA methylation patterns that are consistent within cell types
As described in the Methods, Loyfer et al. built their DNA methylation atlas by performing WGBS on 205 human tissue samples, producing a dataset of 205 methylomes. The researchers noted that, when arranged by cell type, the methylomes exhibited distinctive changes between cell types in a block-like manner. To investigate these differentially methylated regions, they developed a computational program to segment the genomes into blocks based on stretches of correlated CpG sites that are similarly methylated, rather than focusing on individual CpGs. This resulted in nearly 3 million methylation blocks containing an average of 8 CpGs and spanning 544 bp. Methylation patterns across the blocks were very similar within the same cell type across different donors and were consistent enough within cell types to enable a clustering algorithm to group samples according to lineage relationships. For example, pancreatic islet cells that originate from the same endocrine progenitor cell type clustered together.
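The idea of grouping neighboring, similarly methylated CpGs into blocks can be illustrated with a deliberately simplified sketch. This is not the authors' segmentation algorithm [7]; it is a greedy stand-in that starts a new block whenever the next CpG deviates, in any sample, from the running mean of the current block by more than a threshold. The function name, the `max_diff` threshold of 0.2, and the toy values are all assumptions made for illustration.

```python
def segment_blocks(cpg_methylation, max_diff=0.2):
    """Greedily group consecutive CpGs into blocks.

    cpg_methylation: list of per-CpG tuples, one methylation value per sample.
    A CpG joins the current block if, in every sample, its value is within
    max_diff of the block's running mean; otherwise a new block starts.
    Returns a list of (start, end) index pairs, with end exclusive.
    """
    if not cpg_methylation:
        return []
    blocks = []
    start = 0
    sums = list(cpg_methylation[0])
    for i in range(1, len(cpg_methylation)):
        size = i - start
        means = [s / size for s in sums]
        current = cpg_methylation[i]
        if all(abs(v - m) <= max_diff for v, m in zip(current, means)):
            sums = [s + v for s, v in zip(sums, current)]
        else:
            blocks.append((start, i))
            start = i
            sums = list(current)
    blocks.append((start, len(cpg_methylation)))
    return blocks


# Seven consecutive CpGs measured in two hypothetical samples.
cpgs = [
    (0.90, 0.90), (0.95, 0.85),                 # methylated in both samples
    (0.90, 0.10), (0.85, 0.05), (0.90, 0.10),   # unmethylated only in sample 2
    (0.10, 0.10), (0.05, 0.15),                 # unmethylated in both samples
]
print(segment_blocks(cpgs))  # [(0, 2), (2, 5), (5, 7)]
```

In this toy example the middle block is the interesting one: it is the kind of region whose methylation differs between samples as a unit, rather than CpG by CpG.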
Identification and characterization of cell-type-specific methylation markers
The researchers organized the samples into 39 groups of related cells, such as blood cells (monocytes, granulocytes, etc.) or lung epithelium (alveolar and bronchial). They identified genomic blocks that were unmethylated in a single cell type group but methylated in all other groups and sorted them by absolute difference in methylation. The top 25 uniquely unmethylated regions for each cell type (1,246 total markers) were highlighted as potential biomarkers for translational research applications, such as identifying the cellular origin of circulating cell-free DNA (cfDNA) fragments. Next, the researchers selected the top 250 unmethylated markers for each cell type, identified adjacent genes, and performed gene annotation enrichment analysis. They found that genes near the unmethylated genomic blocks tended to be expressed in those cells and to reflect the function of the cell type. Loyfer et al. also performed a series of assays to study the DNA accessibility and chromatin packaging of cell-type-specific markers and performed motif analysis. In general, the top motifs included key transcription factor binding motifs and known master regulators. These data, combined with additional fragment-level analysis of the unmethylated genomic blocks, helped the researchers create a catalog of putative gene regulatory regions in each cell type. In the opposite scenario (i.e., genomic blocks that were uniquely methylated rather than unmethylated compared to other cell types), Loyfer et al. observed strong enrichment of CpG islands and CTCF binding sites.
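The marker selection logic described above (blocks unmethylated in one group, methylated in every other group, ranked by the size of the difference) can be sketched as follows. This is an assumed, simplified formulation rather than the authors' implementation: the function name, the `low` and `high` cutoffs of 0.33 and 0.66, the use of the least-methylated other group as the background, and the example values are all hypothetical.

```python
def select_unmethylated_markers(block_meth, target, top_n=25, low=0.33, high=0.66):
    """Rank blocks that are unmethylated only in the target group.

    block_meth: {block_id: {group_name: average methylation in that group}}.
    A block qualifies if the target group is at or below `low` and every
    other group is at or above `high`. Qualifying blocks are sorted by the
    gap between the least-methylated other group and the target group.
    """
    candidates = []
    for block, by_group in block_meth.items():
        target_level = by_group[target]
        background = min(v for g, v in by_group.items() if g != target)
        if target_level <= low and background >= high:
            candidates.append((block, background - target_level))
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:top_n]


# Made-up average methylation per block and cell type group.
block_meth = {
    "block_A": {"hepatocyte": 0.05, "monocyte": 0.92, "neuron": 0.88},
    "block_B": {"hepatocyte": 0.10, "monocyte": 0.15, "neuron": 0.90},
    "block_C": {"hepatocyte": 0.20, "monocyte": 0.80, "neuron": 0.85},
    "block_D": {"hepatocyte": 0.90, "monocyte": 0.05, "neuron": 0.95},
}

for block, gap in select_unmethylated_markers(block_meth, "hepatocyte"):
    print(block, round(gap, 2))
```

Here `block_B` is rejected as a hepatocyte marker because it is also unmethylated in monocytes, which shows why the comparison has to hold against all other groups and not just on average.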
Leveraging the methylation atlas to identify the tissue of origin for cfDNA and deconvolute mixed/composite samples
To study methylomes from mixed tissue samples or cfDNA, the researchers also developed a deconvolution algorithm that uses the top 25 markers for each cell type. When applied to methyl‑sequencing results from cfDNA obtained from SARS-CoV-2-positive donors, the algorithm computationally identified the cell types from which the cfDNA originated, such as granulocytes and erythrocyte progenitors. The researchers also identified previously undetected cfDNA that originated in vascular endothelial cells. The deconvolution process was also applied to previously published methylomes, which revealed that some samples were composed of mixed cell types rather than a single type. For example, some lung methylome samples consisted largely of blood (40%) and endothelium (34%), along with smooth muscle cells (5%), whereas lung epithelial cells accounted for only 20%. These results demonstrate the capability of the methylation atlas and related deconvolution algorithm to permit analysis of composite tissue and cfDNA samples by distinguishing unique methylation signatures corresponding to cell type.
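One generic way to frame deconvolution is as a non-negative least squares problem: the methylation observed at each marker in a mixed sample is modeled as a weighted sum of the reference values for each cell type, and the weights are the cell type fractions. The sketch below solves this with projected gradient descent using only the standard library. It is an assumption-laden illustration of the general idea, not the authors' deconvolution algorithm; the function name, the reference values, and the simulated mixture are hypothetical.

```python
def deconvolve(atlas, mixture, iterations=2000):
    """Estimate cell type fractions by non-negative least squares.

    atlas:   rows are markers, columns are cell types (reference methylation).
    mixture: observed methylation of the mixed sample at the same markers.
    Uses projected gradient descent, then rescales the result to sum to 1.
    """
    n_markers, n_types = len(atlas), len(atlas[0])
    # 1 / trace(A^T A) never exceeds 1 / (largest eigenvalue), so it is a stable step.
    step = 1.0 / sum(v * v for row in atlas for v in row)
    x = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residual = [
            sum(atlas[m][t] * x[t] for t in range(n_types)) - mixture[m]
            for m in range(n_markers)
        ]
        gradient = [
            sum(atlas[m][t] * residual[m] for m in range(n_markers))
            for t in range(n_types)
        ]
        # Gradient step followed by projection onto the constraint x >= 0.
        x = [max(0.0, x[t] - step * gradient[t]) for t in range(n_types)]
    total = sum(x)
    return [v / total for v in x] if total > 0 else x


cell_types = ["granulocyte", "erythrocyte progenitor", "endothelium"]

# Hypothetical reference: two markers per cell type, each unmethylated (0.05)
# in its own cell type and methylated (0.95) in the other two.
atlas = [
    [0.05, 0.95, 0.95], [0.05, 0.95, 0.95],
    [0.95, 0.05, 0.95], [0.95, 0.05, 0.95],
    [0.95, 0.95, 0.05], [0.95, 0.95, 0.05],
]

# Simulate a noise-free mixed sample with known fractions.
true_fractions = [0.5, 0.3, 0.2]
mixture = [sum(a * f for a, f in zip(row, true_fractions)) for row in atlas]

estimate = deconvolve(atlas, mixture)
for name, fraction in zip(cell_types, estimate):
    print(f"{name}: {fraction:.3f}")
```

With noise-free input the estimate recovers the simulated fractions; real cfDNA data would add sampling noise and coverage effects that this toy model does not address.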
Conclusion
The DNA methylation atlas developed by Loyfer et al. represents a significant contribution to the research community. This comprehensive collection of cell types and associated methylation patterns presents opportunities for comparative methylome analysis, identification of new regulatory circuits, and many other research applications. The atlas also provides cell-type-specific biomarkers that can be used to discriminate the cellular origin of cfDNA in liquid biopsy samples, and a set of computational tools to help infer the composition of mixed tissue samples.