[MISC] Display layout #191

Open
eseiler opened this issue Jul 28, 2023 · 2 comments

eseiler commented Jul 28, 2023

  • Confirm difference between k-mer and minimiser view with current display_layout
  • Add R scripts
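For reference, the k-mer view and the minimiser view named in the first item can be contrasted with a small Python sketch. It is an illustration under stated assumptions, not what display_layout does internally: it picks the smallest k-mer in each window of w consecutive k-mers using plain lexicographic order, whereas real tools usually order k-mers by a hash. The helper names `kmers` and `minimisers` are hypothetical.

```python
def kmers(seq: str, k: int) -> list[str]:
    """k-mer view: every length-k substring of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def minimisers(seq: str, k: int, w: int) -> list[str]:
    """Minimiser view: the smallest k-mer in each window of w consecutive k-mers.

    Assumption: lexicographic order (real tools typically use a hashed order).
    A minimiser selected by several overlapping windows is reported once.
    """
    ks = kmers(seq, k)
    selected = []
    last_pos = -1
    for start in range(len(ks) - w + 1):
        # Position of the smallest k-mer in this window (leftmost on ties).
        pos = min(range(start, start + w), key=lambda j: ks[j])
        if pos != last_pos:
            selected.append(ks[pos])
            last_pos = pos
    return selected


print(len(kmers("ACGTACGTAC", 3)))       # 8
print(minimisers("ACGTACGTAC", 3, 3))    # ['ACG', 'CGT', 'ACG', 'CGT']
```

On `ACGTACGTAC` with k = 3 and w = 3, the k-mer view has 8 entries and the minimiser view 4, which is why the two views can yield different size and sharing estimates.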

Need to reconfirm:

I noticed that it makes a difference whether the prepared files (which use canonical k-mers) or the original sequence files (which use single-strand k-mers) are used.
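A minimal Python sketch of why the two inputs can report different shared counts, assuming "canonical" means the lexicographically smaller of a k-mer and its reverse complement (the exact definition used when preparing the files is an assumption here). A sequence and its reverse complement share no single-strand k-mers but share all of their canonical k-mers. The function names are hypothetical.

```python
# Complement table for the DNA alphabet (assumes uppercase ACGT input).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def distinct_kmers(seq: str, k: int, use_canonical: bool) -> set[str]:
    """Distinct k-mers of seq, either single-strand or canonical."""
    result = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        result.add(canonical(kmer) if use_canonical else kmer)
    return result


a, b = "AACG", "CGTT"  # b is the reverse complement of a
print(distinct_kmers(a, 3, False) & distinct_kmers(b, 3, False))          # set()
print(sorted(distinct_kmers(a, 3, True) & distinct_kmers(b, 3, True)))    # ['AAC', 'ACG']
```

This toy case only shows the mechanism; it does not by itself account for the specific RefSeq numbers below.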

For RefSeq:

                       Sum size  Sum shared
Sequence files   33,144,873,270   9,978,878
Minimiser files  30,633,516,270  29,443,347
eseiler self-assigned this Jul 28, 2023
eseiler commented Jul 29, 2023

After a rerun (files in #192):

                 Sum Estimated Size  Sum Actual Shared k-mers  Percent Shared
Sequence Files       33,144,869,031                 9,564,640           0.029
Minimiser Files      30,751,967,528                37,070,817           0.121
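The Percent Shared column appears to be 100 × (Sum Actual Shared k-mers) / (Sum Estimated Size). This formula is inferred rather than stated in the issue, but it reproduces both rows, as this short Python check shows. The helper name `percent_shared` is hypothetical.

```python
# Values copied from the table above: (Sum Estimated Size, Sum Actual Shared k-mers)
ROWS = {
    "Sequence Files": (33_144_869_031, 9_564_640),
    "Minimiser Files": (30_751_967_528, 37_070_817),
}


def percent_shared(estimated_size: int, shared: int) -> float:
    """Shared k-mers as a percentage of the estimated size (inferred formula)."""
    return 100 * shared / estimated_size


for name, (estimated_size, shared) in ROWS.items():
    print(f"{name}: {percent_shared(estimated_size, shared):.3f}")
# Sequence Files: 0.029
# Minimiser Files: 0.121
```

The same formula also reproduces the later values of 0.029 and 0.168.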

eseiler commented Aug 9, 2023

Using minimisers seems to result in better layouts; see the new plots in #195.

                 Sum Estimated Size  Sum Actual Shared k-mers  Percent Shared
Sequence Files       33,144,869,031                 9,564,641           0.029
Minimiser Files      28,332,169,000                47,688,807           0.168

For minimiser files, two bins stand out:

tb_index  size       shared_size  ub_count  kind    splits
46        3,848,605  3,734,818    192       merged  1
47        3,854,088  3,658,388    192       merged  1
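To quantify how far these two bins stand out, shared_size can be related to size. A quick Python check, assuming both columns are counted in the same units so that the ratio is meaningful: across all minimiser files the shared fraction is only about 0.17 %, while in these two bins it is roughly 95 to 97 %. The helper `shared_fraction` is hypothetical.

```python
# Rows copied from the table above: (tb_index, size, shared_size, ub_count, kind, splits)
BINS = [
    (46, 3_848_605, 3_734_818, 192, "merged", 1),
    (47, 3_854_088, 3_658_388, 192, "merged", 1),
]


def shared_fraction(size: int, shared_size: int) -> float:
    """Fraction of a bin's size that is reported as shared."""
    return shared_size / size


for tb_index, size, shared_size, ub_count, kind, splits in BINS:
    print(f"bin {tb_index} ({kind}, ub_count={ub_count}): "
          f"{shared_fraction(size, shared_size):.3f} shared")
# bin 46 (merged, ub_count=192): 0.970 shared
# bin 47 (merged, ub_count=192): 0.949 shared
```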
