RNA004 Does not output any CTC data #379

Open
VBHarrisN opened this issue Feb 5, 2024 · 3 comments
Comments

@VBHarrisN

Hello!

I am working on training an RNA-specific basecaller model. To that end, I have been attempting to use the RNA004 basecaller model for training. However, this model does not seem to output the CTC data correctly. No matter what data I put in, the resulting chunks.npy is always 0 by 10000. To make sure it was not my data, I fed the same RNA data through the DNA r_10 basecalling model and got a 59000 by 9996 numpy array. Furthermore, every output file from the RNA004 model is under 1 kB, so I believe they are effectively empty. The console even prints "saving CTC data" when using the RNA004 model. I believe this is a bug: the RNA004 model does not throw any errors, it just does not save any data. I am not sure how to proceed, as I need the RNA CTC data to train my own basecalling model.

Let me know if I can provide any more information to help diagnose/solve this problem!

@iiSeymour
Member

Only high-quality chunks (>99% accuracy by default) are saved for training. You will want to change this filter with --min-accuracy-save-ctc so that it is in line with the accuracy distribution of your RNA calls.

https://github.com/nanoporetech/bonito/blob/master/bonito/cli/basecaller.py#L211
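As a rough illustration of what an accuracy filter of this kind does, here is a minimal sketch. It is not bonito's code: `ChunkAlignment`, `accuracy`, and `keep_for_training` are hypothetical names, and the sketch assumes that accuracy is alignment identity (matching bases divided by alignment length), that the threshold is a fraction between 0 and 1, and that chunks with no alignment cannot be scored and are dropped. Check the linked source for the actual behaviour and scale.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ChunkAlignment:
    """Hypothetical record for one basecalled chunk aligned to a reference."""
    read_id: str
    matches: Optional[int]       # matching bases, None if the chunk did not align
    block_length: Optional[int]  # alignment length including gaps


def accuracy(aln: ChunkAlignment) -> Optional[float]:
    """Return alignment identity as a fraction in [0, 1], or None if unaligned."""
    if aln.matches is None or not aln.block_length:
        return None
    return aln.matches / aln.block_length


def keep_for_training(alns: Iterable[ChunkAlignment],
                      min_accuracy: float = 0.99) -> List[ChunkAlignment]:
    """Keep chunks whose accuracy is at least min_accuracy (assumed fraction)."""
    kept = []
    for aln in alns:
        acc = accuracy(aln)
        if acc is None:
            # Assumption: an unaligned chunk has no accuracy and is never saved.
            continue
        if acc >= min_accuracy:
            kept.append(aln)
    return kept


chunks = [
    ChunkAlignment("read_a", 995, 1000),   # accuracy 0.995
    ChunkAlignment("read_b", 900, 1000),   # accuracy 0.90
    ChunkAlignment("read_c", None, None),  # did not align
]

print([c.read_id for c in keep_for_training(chunks)])        # ['read_a']
print([c.read_id for c in keep_for_training(chunks, 0.85)])  # ['read_a', 'read_b']
print([c.read_id for c in keep_for_training(chunks, 0.0)])   # ['read_a', 'read_b']
print([c.read_id for c in keep_for_training(chunks, 15)])    # []
```

Under these assumptions, a threshold above 1 rejects every chunk, and even a threshold of 0 saves nothing if no chunk aligns to the reference, so it may be worth confirming that the RNA reads actually map before tuning the flag.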

@iiSeymour iiSeymour added the question Further information is requested label Apr 4, 2024
@iiSeymour iiSeymour self-assigned this Apr 4, 2024
@VBHarrisN
Author

I had read about this in other GitHub issues. We tried setting the --min-accuracy-save-ctc flag to 15, 1, and 0.2. No data was ever written to chunks.npy. Our data typically has an average quality score of 14. I don't fully understand how you judge whether a chunk is high quality or not.

@Sgreenfield9

We're in the same boat. I've actually dropped my --min-accuracy-save-ctc flag down to 0 but still nothing.
