Extract statistics from the human and mouse reference genomes: the number of (1) CG-motifs and (2) CCGG-motifs, and (3) the distribution of gene types. CG-motifs are of interest for the methylome because their cytosines can be methylated, and DNA methylation has been shown to affect gene expression (ref). CCGG-motifs are the target of the restriction enzyme MspI and are used to obtain a reduced representation of the genome for DNA methylation analysis.
In mouse, X % of genome positions (21M / 3000M) are CG-motifs; of these, X % (1.6M / 21M) lie within CCGG-motifs.
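A minimal sketch of how these counts and ratios could be computed for a single sequence, in plain Python. The function name `motif_stats` and the returned keys are hypothetical and not taken from the project code. It assumes the sequence is already loaded as a string, and it counts on the forward strand only, which is enough here because CG and CCGG are both palindromic.

```python
def motif_stats(sequence):
    """Count CG and CCGG motifs in one DNA sequence (hypothetical helper)."""
    # Reference genomes are often soft-masked (lowercase repeats), so normalise case.
    seq = sequence.upper()

    # Neither CG nor CCGG can overlap with itself, so the
    # non-overlapping str.count gives the exact number of occurrences.
    n_cg = seq.count("CG")
    n_ccgg = seq.count("CCGG")

    return {
        "length": len(seq),
        "cg": n_cg,
        "ccgg": n_ccgg,
        # CG-motifs relative to sequence length, as in "21M / 3000M".
        "cg_pct": 100.0 * n_cg / len(seq) if seq else 0.0,
        # Each CCGG contains exactly one CG, so this is the share of
        # CG-motifs that sit inside a CCGG-motif, as in "1.6M / 21M".
        "ccgg_pct_of_cg": 100.0 * n_ccgg / n_cg if n_cg else 0.0,
    }


if __name__ == "__main__":
    stats = motif_stats("ACGTccggACG")
    print(stats["cg"], stats["ccgg"])
```

For a whole genome, the same function would be applied per chromosome and the counts summed before computing the percentages, so that no motif is counted across a chromosome boundary.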
Todo: add figure from the code, rmd?