very different corrections when bootstrapping #45

Open
jbruxaux opened this issue Feb 20, 2024 · 14 comments

Comments

@jbruxaux

Hi!

I used ReLERNN to estimate the recombination rate along a very long genome, running the analysis in pieces of 500 Mb. The pieces give comparable results when I use the output of the "predict" function, but they differ a lot after correction with the "bscorrect" function.
For example, before correction:
recombination_rate_1Mb_chr8_2
And after correction:
recombination_rate_1Mb_chr8
Any idea what could cause such differences? Is there anything I should do?
Thanks in advance!

@andrewkern
Member

interesting behavior here. is there any reason to believe that that portion of the chromosome is non-recombining?

@jbruxaux
Author

No, not at all. Each portion is 500 Mb long (except the last one, which is shorter) and harbors a lot of SNPs, and no other parameter differs for that region (heterozygosity, nucleotide diversity...).

@andrewkern
Member

can you rerun the BSCORRECT module? I wonder if you will get a different answer?

@jbruxaux
Author

I will have to wait for a few days (GPU nodes in maintenance on our cluster), but I will try as soon as possible.

@andrewkern
Member

you might also try this on a cpu?

@jbruxaux
Author

jbruxaux commented Jun 4, 2024

Hi!

Sorry, it took me a few weeks to come back to this, but I re-ran the analysis on a smaller chromosome (only the BSCORRECT part), and I get the exact same result as last time (using the cmp command). So this part is at least consistent.

@andrewkern
Member

okay! this is very strange. any chance this has to do with seeds being maintained across runs? are you setting seeds manually?

@jbruxaux
Author

Yes, I used the same seed!
I re-ran again with different seeds, and here is what I get (for a smaller chromosome).
The original run:
recombination_rate_1Mb_chr12_3
The new run:
recombination_rate_1Mb_chr12_4
For reference, the uncorrected results:
recombination_rate_1Mb_chr12_2
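
To illustrate why an identical rerun with the same seed says little about how stable the correction is, here is a toy sketch in Python. It is not ReLERNN's code: `toy_correction` is a hypothetical stand-in that simply averages a few random draws, and the seed values are arbitrary. The only point is that a fixed seed reproduces the same noise exactly, while a new seed draws different noise.

```python
import numpy as np


def toy_correction(seed, n_reps=4):
    """Hypothetical stand-in for a bootstrap-style correction.

    Averages n_reps noisy draws; this is NOT how ReLERNN computes its correction.
    """
    rng = np.random.default_rng(seed)
    errors = rng.normal(loc=0.0, scale=1.0, size=n_reps)
    return errors.mean()


same_a = toy_correction(seed=42)
same_b = toy_correction(seed=42)  # same seed: identical random draws
other = toy_correction(seed=7)    # different seed: different draws

print("same seed gives identical result:", same_a == same_b)
print("different seed gives identical result:", same_a == other)
```

So a byte-identical output from `cmp` is what one would expect with a reused seed, whether or not the correction itself is well estimated.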

@andrewkern
Member

okay this is definitely a seed issue. when is the last time you pulled the code? we recently fixed a potential bug in commit #48 which may affect bootstrapping with a manual seed.

@nspope
Contributor

nspope commented Jun 10, 2024

what version are you using @jbruxaux ? there was a bug fixed wrt setting seeds a couple months ago (here) but you'll have to use the current github version of the repo

@andrewkern
Member

what version are you using @jbruxaux ? there was a bug fixed wrt setting seeds a couple months ago (here) but you'll have to use the current github version of the repo

jinx

@jbruxaux
Author

jbruxaux commented Jun 14, 2024

A quick update. I re-ran the analysis from scratch without any seed (and with the latest ReLERNN version). I still see this pattern, but not as strongly.
Here are the non-corrected results:
recombination_rate_1Mb_chr12_6

And here are the corrected results:
recombination_rate_1Mb_chr12_5

What do you think?
As you can see, my third part (starting at 1000 on the graph, so 1 Gb in reality) shows almost no correction compared to the other two.

@nspope
Contributor

nspope commented Sep 23, 2024

Hey @jbruxaux sorry for the long silence ... if you still have your script handy, could you please post the exact invocation for BSCORRECT? Were you using the default arguments; or a copy-paste from the examples/example_pipeline.sh (e.g. with --nSlice 2 --nReps 2); or something else?

If it's the copy-paste from example_pipeline.sh, then I think that's the issue -- two slices and two reps per slice is way too few (the defaults are 100 slices, 1000 reps per slice or thereabouts), which would make the correction very, very noisy.
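
A rough way to see why so few replicates give a noisy correction: the sketch below is a toy Monte Carlo in Python, not ReLERNN's actual procedure. It assumes the correction behaves like the mean of independent noisy simulated errors; the function name `correction_spread`, the bias of 0.5, the noise scale, and the replicate counts are all made up for illustration.

```python
import numpy as np


def correction_spread(n_sims, n_trials=200, seed=0):
    """Spread (std. dev.) of a toy correction estimated from n_sims simulations.

    Each trial estimates a bias (assumed true value 0.5) as the mean of
    n_sims noisy simulated errors. Illustrative only.
    """
    rng = np.random.default_rng(seed)
    draws = rng.normal(loc=0.5, scale=1.0, size=(n_trials, n_sims))
    estimates = draws.mean(axis=1)  # one correction estimate per trial
    return estimates.std()


spread_few = correction_spread(4)      # e.g. 2 slices x 2 reps
spread_many = correction_spread(4000)  # a much larger simulation budget

print("spread with 4 simulations:   ", spread_few)
print("spread with 4000 simulations:", spread_many)
```

Under these assumptions the spread shrinks roughly as one over the square root of the number of simulations, so a correction built from a handful of replicates can land almost anywhere from one chromosome piece to the next.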

Thanks!

@jbruxaux
Author

I indeed used the example_pipeline.sh script, with --nSlice 2 --nReps 2. So that explains the issue. I will give it another try with more reasonable values. Thank you for your feedback!
