Input files
The main input for RLM is a BAM file produced by a bisulfite sequencing alignment tool. RLM supports BAM files from BSMAP, BISMARK, segemehl or GEM (e.g. as included in gemBS), and the alignment tool that was used must be specified with the option -a when running RLM. The four alignment tools use different tags to report the strand each read originated from, and RLM uses this option to decide which tag to look for.
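As a minimal sketch of this kind of tag selection (not RLM's own code), the Python below maps an aligner name to an assumed strand tag and returns the original strand of a read. The dictionary `STRAND_TAGS` and the function `original_strand` are hypothetical. The tag conventions shown are our assumptions: BSMAP's `ZS` tag, whose first character is taken as the original strand, and Bismark's `XG` tag, where `CT` is taken to mean the top strand and `GA` the bottom strand. segemehl and GEM are left out because their tags are not described here.

```python
from typing import Mapping

# Hypothetical mapping from the aligner name passed to -a to the SAM tag
# assumed to carry strand information. segemehl and GEM are not covered.
STRAND_TAGS = {
    "bsmap": "ZS",    # assumed: e.g. "ZS:Z:+-", first character = original strand
    "bismark": "XG",  # assumed: "CT" = top strand, "GA" = bottom strand
}


def original_strand(aligner: str, tags: Mapping[str, str]) -> str:
    """Return '+' (top) or '-' (bottom) for a read, given its SAM tags."""
    key = aligner.lower()
    if key not in STRAND_TAGS:
        raise ValueError(f"no strand tag known for aligner {aligner!r}")
    tag = STRAND_TAGS[key]
    value = tags.get(tag)
    if value is None:
        raise KeyError(f"read lacks the {tag} tag expected for {aligner}")
    if key == "bsmap":
        # The first character of the ZS value is read as the original strand.
        if value[:1] not in ("+", "-"):
            raise ValueError(f"unexpected {tag} value {value!r}")
        return value[0]
    # Bismark: the genome conversion (XG) determines top vs. bottom strand.
    if value == "CT":
        return "+"
    if value == "GA":
        return "-"
    raise ValueError(f"unexpected {tag} value {value!r}")
```

Keying the lookup on the aligner name mirrors the idea that a single command-line option decides which tag is consulted for every read.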
We recommend trimming low-quality bases (and, for Swift libraries, potentially tail trimming) before alignment (e.g. using cutadapt), as well as removing technical duplicates after alignment (e.g. using Picard MarkDuplicates). This reduces technical bias and keeps read-level metrics free of artefacts. We also recommend sorting the BAM file by position to reduce memory consumption when running RLM in paired-end mode, although RLM can also process unsorted or name-sorted BAM files.
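One generic reason input order affects memory in paired-end mode: a tool that pairs mates while streaming through a file has to hold each read until its mate arrives. The hypothetical `pair_mates` function below sketches such a buffer in Python. It illustrates the general idea only, is not RLM's implementation, and simplifies reads to `(name, position)` tuples. In a position-sorted file a mate usually follows within about one fragment length, so few reads wait in the buffer; in an unsorted file mates can be far apart and the buffer grows accordingly.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical, simplified read record: (query_name, position).
Read = Tuple[str, int]


def pair_mates(
    reads: Iterable[Read], stats: Optional[Dict[str, int]] = None
) -> Iterator[Tuple[Read, Read]]:
    """Yield (first_mate, second_mate) pairs in the order they complete.

    Only reads whose mate has not been seen yet are kept in memory.
    If `stats` is given, it receives the peak buffer size and the number
    of reads left without a mate once the input is exhausted.
    """
    pending: Dict[str, Read] = {}  # query_name -> mate seen first
    peak = 0
    for read in reads:
        mate = pending.pop(read[0], None)
        if mate is None:
            pending[read[0]] = read  # wait for the other mate
            peak = max(peak, len(pending))
        else:
            yield mate, read
    if stats is not None:
        stats["peak_buffer"] = peak
        stats["unpaired"] = len(pending)


# Mates close together (as in a position-sorted file) vs. far apart.
close = [("a", 100), ("a", 150), ("b", 200), ("b", 260)]
spread = [("a", 100), ("b", 200), ("c", 300), ("a", 9000), ("b", 9100), ("c", 9200)]
close_stats: Dict[str, int] = {}
spread_stats: Dict[str, int] = {}
list(pair_mates(close, close_stats))    # peak_buffer == 1
list(pair_mates(spread, spread_stats))  # peak_buffer == 3
```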
RLM can process BAM files from different bisulfite sequencing experiments, such as whole genome bisulfite sequencing (WGBS), reduced representation bisulfite sequencing (RRBS), hybrid capture methods or amplicons, with single-end or paired-end reads. For RRBS, we recommend running BSMAP in RRBS mode to reduce runtime and avoid mis-mappings.
RLM can be run with any reference genome (including custom genomes or assemblies); however, it should be the same genome (ideally the same file) that was used to align the reads in the BAM file. RLM compares the number and order of reference sequences in the BAM header with those in the reference genome FASTA file, and any difference in number or order results in an error and terminates the program.
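A consistency check of this kind can be sketched in Python as below, assuming the comparison is made on sequence names: each `@SQ` line of the SAM/BAM header text carries a name in its `SN:` field, and a FASTA name is taken as the text after `>` up to the first whitespace. The function names are hypothetical, extracting the header text from a real BAM file (e.g. with `samtools view -H`) is not shown, and RLM's actual check may differ in detail.

```python
from typing import Iterable, List, Sequence


def bam_header_refs(sam_header: str) -> List[str]:
    """Reference names from @SQ lines of a SAM header, in header order."""
    names = []
    for line in sam_header.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
                    break
    return names


def fasta_refs(fasta_lines: Iterable[str]) -> List[str]:
    """Sequence names from FASTA '>' lines (text up to the first whitespace)."""
    return [(line[1:].split() or [""])[0] for line in fasta_lines if line.startswith(">")]


def check_same_references(bam_names: Sequence[str], fasta_names: Sequence[str]) -> None:
    """Raise ValueError if the number or order of sequences differs."""
    if len(bam_names) != len(fasta_names):
        raise ValueError(
            f"BAM header lists {len(bam_names)} sequences, FASTA has {len(fasta_names)}"
        )
    for i, (bam_name, fasta_name) in enumerate(zip(bam_names, fasta_names)):
        if bam_name != fasta_name:
            raise ValueError(
                f"sequence {i}: BAM header has {bam_name!r}, FASTA has {fasta_name!r}"
            )
```

Raising on the first mismatch matches the behaviour described above, where a difference stops the program rather than being silently tolerated.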