DNA Mutation doesn't handle unknown bases #475

Open
Paucey opened this issue Feb 17, 2025 · 1 comment

Comments

@Paucey
Contributor

Paucey commented Feb 17, 2025

From what I understand, DNA strings sometimes contain "n" (or "N") to indicate an unknown base. As currently set up, DNA Mutation doesn't recognize "n" as a valid base, so it will likely throw an error if it encounters one during mutation. I think there are two options for handling this, and both are simple to implement:

  1. Keep unknown bases unknown: skip them when choosing which positions to mutate, so an "n" is never changed, or,

  2. Treat an "n" like any other base and change it to one of the known bases when it is selected for mutation.

For option 1, it could just be a matter of filtering out "n" or "N" when building the list of mutable positions in the original dnaString:

List<Integer> availablePositions = new ArrayList<>();
for (int i = 0; i < dnaString.length(); i++) {
  char base = dnaString.charAt(i);
  if (base == 'n' || base == 'N') {
    continue; // Unknown base: never offer this position for mutation
  }
  availablePositions.add(i);
}
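To show how that filter could fit into a full mutation pass, here is a minimal, self-contained sketch. The class name, the `mutate` and `differentBase` helpers, and the four-letter `BASES` array are all assumptions made for illustration; the project's actual methods may look different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of option 1; not the project's actual class.
class UnknownBasePreservingMutation {
    // Assumed alphabet; the real BASES array in the project may differ.
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** Returns the indices of every position that is not 'n' or 'N'. */
    static List<Integer> mutablePositions(String dnaString) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < dnaString.length(); i++) {
            char base = dnaString.charAt(i);
            if (base != 'n' && base != 'N') {
                positions.add(i);
            }
        }
        return positions;
    }

    /** Picks an uppercase known base that differs from the original (case-insensitive). */
    static char differentBase(char original, Random random) {
        char upper = Character.toUpperCase(original);
        char candidate;
        do {
            candidate = BASES[random.nextInt(BASES.length)];
        } while (candidate == upper);
        return candidate;
    }

    /** Mutates up to mutationCount known positions; unknown bases are left as they are. */
    static String mutate(String dnaString, int mutationCount, Random random) {
        List<Integer> positions = mutablePositions(dnaString);
        Collections.shuffle(positions, random); // random choice without repeats
        int count = Math.min(mutationCount, positions.size());
        char[] result = dnaString.toCharArray();
        for (int k = 0; k < count; k++) {
            int index = positions.get(k);
            result[index] = differentBase(result[index], random);
        }
        return new String(result);
    }

    public static void main(String[] args) {
        // The two N's stay in place; three of the other positions change.
        System.out.println(mutate("ACGTNNACGT", 3, new Random()));
    }
}
```

Capping the count at the number of mutable positions means a string made up mostly of N's cannot trigger an out-of-bounds access; whether to cap or to report an error in that case is a design choice this sketch does not settle.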

For option 2, I think it would just be a matter of adding a small check at the beginning of the getDifferentBase() method to catch unknown bases:

if (originalBase == 'n' || originalBase == 'N') { // Unknown base in original DNA sequence
  return BASES[random.nextInt(BASES.length)]; // Randomly choose any known base
}
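For comparison, here is a rough sketch of how that check could sit inside a complete method. Only the early return comes from the snippet above; the rest of the method body, the class name, the `Random` parameter, and the `BASES` contents are assumptions, since the existing getDifferentBase() implementation isn't shown here.

```java
import java.util.Random;

// Hypothetical sketch of option 2; not the project's actual class.
class UnknownBaseResolvingMutation {
    // Assumed alphabet; the real BASES array in the project may differ.
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    static char getDifferentBase(char originalBase, Random random) {
        // Unknown base: any known base counts as "different", so pick one at random.
        if (originalBase == 'n' || originalBase == 'N') {
            return BASES[random.nextInt(BASES.length)];
        }
        // Known base: redraw until the candidate differs from the original.
        char upper = Character.toUpperCase(originalBase);
        char candidate;
        do {
            candidate = BASES[random.nextInt(BASES.length)];
        } while (candidate == upper);
        return candidate;
    }

    public static void main(String[] args) {
        Random random = new Random();
        System.out.println(getDifferentBase('N', random)); // one of A, C, G, T
        System.out.println(getDifferentBase('A', random)); // one of C, G, T
    }
}
```

Note that this approach overwrites the N, so the output no longer records that the base was originally unknown.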

Which do you think would be the better approach? @VerisimilitudeX

@VerisimilitudeX
Owner

Option 1 sounds good! It’s critical that the N stays there. We can use imputation to determine its value later.
