Recombination event interval prediction #173

Open
XingerTang opened this issue Aug 28, 2024 · 0 comments
Contributor

XingerTang commented Aug 28, 2024

@gregorgorjanc @AprilYUZhang
While predicting the exact locus of a recombination is challenging with the current estimated segregation data, an alternative is to predict the interval in which each recombination event occurs.

Based on the estimated segregation data with the highest accuracy we have obtained so far (the one with random peeling update in #171), we can reuse the previous method of estimating recombination sites to predict recombination intervals. The best result reaches 0.9 precision and 0.7 recall at the same time for the specific generation; however, the mean length of the intervals is about 60 loci.

The method we use calculates the sum of the products of the differences between the probabilities of the different segregation states at adjacent loci. The advantages of using this method include:

  1. ability to detect recombination sites that are not found by the other methods,
  2. narrower interval predicted,
  3. fewer wrongly predicted intervals,

which can be illustrated by the following figures, where lines of different colors represent different methods of predicting the recombination events and the red crosses mark the true recombination sites:
[Figures: Ind 139, Ind 173, Ind 646]
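
The description of the score above is brief, so the following is only a sketch of one possible reading of it, not the code actually used. It assumes the segregation estimates are available as one list of state probabilities per locus, and it takes the "sum of the multiplications of the differences" to mean the sum of pairwise products of the per-state differences between adjacent loci. The function name `recombination_score` and the input layout are hypothetical.

```python
from itertools import combinations


def recombination_score(seg_probs):
    """Score each pair of adjacent loci (one reading of the method, assumed).

    seg_probs: list with one entry per locus; each entry is a list of
    probabilities, one per segregation state (assumed layout).
    Returns a list with one score per adjacent pair of loci.
    """
    scores = []
    for left, right in zip(seg_probs, seg_probs[1:]):
        # Change in probability of each segregation state between the two loci.
        diffs = [r - l for l, r in zip(left, right)]
        # Sum of the products of every pair of those differences.
        scores.append(sum(a * b for a, b in combinations(diffs, 2)))
    return scores


example = [
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],  # same state as the previous locus
    [0.0, 1.0, 0.0, 0.0],  # state switches here
]
scores = recombination_score(example)
```

Under this reading, and if the probabilities at each locus sum to one, the score is never positive, so it is the magnitude of the score that signals a change of segregation state.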

The prediction is made by first finding the sites with high absolute values (spikes) and then comparing the "gradient" between the spikes with thresholds. If the gradient is higher than the threshold, we predict that there is a recombination in the interval.
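
The two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the results: a spike is assumed to be any site whose absolute score reaches `spike_threshold`, and the "gradient" between two consecutive spikes is assumed to be the absolute change in score divided by the distance in loci. The function name and both threshold parameters are hypothetical.

```python
def predict_intervals(scores, spike_threshold, gradient_threshold):
    """Return (start, end) index pairs predicted to contain a recombination.

    Assumed definitions (not confirmed by the issue):
      spike    = site with abs(score) >= spike_threshold
      gradient = abs(score change) / distance between consecutive spikes
    """
    spikes = [i for i, s in enumerate(scores) if abs(s) >= spike_threshold]
    intervals = []
    for a, b in zip(spikes, spikes[1:]):
        gradient = abs(scores[b] - scores[a]) / (b - a)
        if gradient > gradient_threshold:
            intervals.append((a, b))
    return intervals


scores = [0.0, 0.9, 0.1, 0.0, -0.8, 0.0]
intervals = predict_intervals(scores, spike_threshold=0.5, gradient_threshold=0.3)
```

With these toy values the spikes are at indices 1 and 4, and the interval between them is kept because the assumed gradient exceeds the threshold. Raising `gradient_threshold` makes the predictor more conservative.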

When evaluating the results, each predicted interval can be assigned to only one true recombination site, and each true recombination site can be assigned to only one predicted interval.
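
A one-to-one assignment of this kind can be sketched as below. The issue does not say how ties are resolved when several intervals contain the same site, so the greedy order here (intervals sorted by end point, each taking the smallest unmatched site it contains) is an assumption, as are the function names.

```python
def match_intervals(intervals, true_sites):
    """Pair each interval with at most one true site, and vice versa."""
    unmatched = sorted(true_sites)
    matches = []
    # Process intervals by end point so short, early intervals claim sites first.
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        for site in unmatched:
            if start <= site <= end:
                matches.append(((start, end), site))
                unmatched.remove(site)
                break  # this interval is now used up
    return matches


def precision_recall(intervals, true_sites):
    """Precision = matched / predicted intervals; recall = matched / true sites."""
    n_matched = len(match_intervals(intervals, true_sites))
    precision = n_matched / len(intervals) if intervals else 0.0
    recall = n_matched / len(true_sites) if true_sites else 0.0
    return precision, recall


# Two of three intervals contain a distinct true site; the site at 40 is missed.
precision, recall = precision_recall([(0, 10), (5, 8), (20, 30)], [6, 7, 40])
```

Because a second interval covering an already matched site counts as a false positive, overlapping or duplicated predictions lower precision rather than inflating it.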

The final prediction result is the following:

[Figure: 40spikes]

The mean length of the intervals can be restricted further; however, doing so would reduce the recall and precision.
