Difference in total counts of each probability per modification in different pore chemistries

I have samples basecalled with Dorado (v0.4.1) with 5mCG_5hmCG modification detection enabled, using either:

- `dna_r10.4.1_e8.2_400bps_hac@v4.2.0`
- `dna_r9.4.1_e8_hac@v3.3`

I've noticed that the two "batches" have distinct distributions of modification probabilities. In particular, the R9.4.1 samples show a massive peak of C:m (5mC) calls at the far right of the histogram (second plot), which looks like some sort of artefact.
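For context, the probabilities being histogrammed come from the BAM's ML tag, which per the SAM tags specification stores each call as an 8-bit integer `i` covering the interval [i/256, (i+1)/256). A peak at the far right should therefore correspond to ML values at or near 255. Below is a minimal sketch for checking how much of a BAM falls in that top bin, assuming `samtools` is available. The helper names are hypothetical (not part of modkit), and the fraction pools every modification code in the ML tag (both `m` and `h`) rather than isolating C:m.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for inspecting raw ML values (not part of modkit).

# Print the probability interval covered by one ML value, following the
# SAM tags spec: value i covers [i/256, (i+1)/256).
ml_to_interval() {
  awk -v ml="$1" 'BEGIN { printf "%.4f-%.4f\n", ml / 256, (ml + 1) / 256 }'
}

# Read SAM text on stdin and print the fraction of ML values equal to 255
# (the top bin). Pools all modification codes present in the ML tag.
top_bin_fraction() {
  grep -o 'ML:B:C,[0-9,]*' |
    cut -d, -f2- |
    tr ',' '\n' |
    awk 'NF { n++; if ($1 == 255) top++ }
         END { if (n) printf "%.4f\n", top / n; else print "NA" }'
}

ml_to_interval 255   # prints 0.9961-1.0000
```

Usage, run once per chemistry: `samtools view "${input_bam}" | head -n 10000 | top_bin_fraction`.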

I'm wondering what could explain this, and what the best way to mitigate it would be.

Command:
```bash
# Sample modification probabilities from one BAM and write histograms
modkit sample-probs \
    "${input_bam}" \
    --log-filepath "${log}" \
    --percentiles 0.1,0.25,0.5,0.75,0.9 \
    --out-dir "${output}" \
    --hist \
    --prefix "${prefix}" \
    --suppress-progress \
    --force
```
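Because the two batches differ in total counts, raw histogram heights are not directly comparable. One option is to rescale each modification code's counts to fractions before plotting. A minimal sketch, assuming the histogram has been reduced to a tab-separated file with the columns code, bin start, count. This layout and the function name `normalize_counts` are assumptions; adapt the column numbers to whatever `--hist` actually writes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert per-code histogram counts to fractions.
# Assumed input (TSV): code <TAB> bin_start <TAB> count
normalize_counts() {
  awk -F'\t' '
    $3 ~ /^[0-9]+$/ {            # skip header or malformed rows
      k++; code[k] = $1; bin[k] = $2; cnt[k] = $3
      total[$1] += $3            # running total per modification code
    }
    END {
      for (i = 1; i <= k; i++)   # emit rows in input order
        printf "%s\t%s\t%.4f\n", code[i], bin[i],
          (total[code[i]] ? cnt[i] / total[code[i]] : 0)
    }
  '
}
```

Usage (file names hypothetical): `normalize_counts < r9_counts.tsv > r9_fractions.tsv`, then plot the fraction column for both chemistries on the same axes.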


![Counts (1)](https://github.com/user-attachments/assets/ea0327e1-5a56-4b75-b700-07e4eb4fad90)
![Counts](https://github.com/user-attachments/assets/c802c90b-07ca-4384-8cfb-0c87c862b542)


Issue #269