unitigs with no colors #73
Comments
Dear @hangsuUNC, In color mode, it is guaranteed that each k-mer gets at least one color, so if you have k-mers or unitigs without colors, it is a bug in either Bifrost or Pyfrost (which is developed independently from Bifrost). I am not familiar with the Pyfrost syntax, but I had a quick look at the Pyfrost README and saw that to iterate over pairs of k-mers and colors, the syntax is:
From your code, it seems that Thanks, |
Unfortunately, I don't know why this fails. I think that contacting the Pyfrost author first is a good idea. Make sure you link this issue in the one you are going to create, and I'll assist with any Bifrost-related test or bug. |
Thank you so much, Guillaume! Will do! Best, Hang |
Hi @hangsuUNC, I'll close this for now since it is unclear at the moment if the issue is with Pyfrost or Bifrost. Don't hesitate to reopen or link to this issue if there is some progress on the matter. Guillaume |
Hi Guillaume, Sorry for the delayed response! I contacted the Pyfrost author @lrvdijk and examined different versions of Bifrost and its output graph. Lucas created a test C++ program using the Bifrost API directly (so not using the Pyfrost Python library), and it still fails with k-mers that have no colors. It sounds like an issue with the Bifrost color matrix rather than with the Python library. Could you please help check the color matrix for the Bifrost graph? If there is any additional information you need for testing, please let us know! Thanks a lot for your help! Best, Hang |
For reference, here's the C++ test program (using the Catch2 test framework, but you get the idea):
#include <fstream>
#include <iostream>
#include <unordered_map>
#include <catch2/catch.hpp>
#include <ColoredCDBG.hpp>
#include <Kmer.hpp>

TEST_CASE("Test unitig color data", "[unitig_color_data]") {
    CCDBG_Build_opt opt;
    opt.filename_graph_in = "data/MT_graph_Bfrost_graph.gfa";
    opt.filename_colors_in = "data/MT_graph_Bfrost_graph.bfg_colors";

    // Load the graph and its color file using 2 threads.
    ColoredCDBG<> ccdbg(opt.k, opt.g);
    ccdbg.read(opt.filename_graph_in, opt.filename_colors_in, 2);
    auto total_num_colors = ccdbg.getColorNames().size();

    std::ofstream anchors;
    anchors.open("data/anchors.txt");

    for (auto const& um : ccdbg) {
        auto colorset = um.getData()->getUnitigColors(um);
        std::cout << "Testing colorset of " << um.getMappedHead().toString() << std::endl;
        REQUIRE(colorset != nullptr);

        // Count how many colors are attached to each k-mer position of this unitig.
        std::unordered_map<size_t, size_t> colors_per_kmer{};
        for (auto it = colorset->begin(um); it != colorset->end(); ++it) {
            colors_per_kmer.emplace(it.getKmerPosition(), 0).first->second++;
        }

        // Write out the k-mers that carry every color in the graph.
        for (auto const& p : colors_per_kmer) {
            if (p.second == total_num_colors) {
                anchors << um.getUnitigKmer(p.first).toString() << std::endl;
            }
        }
    }
    anchors.close();
}
Graph is created with Bifrost <1.2 and the test is also run with the same pre-1.2 Bifrost version. For many k-mers, the colorset pointer is fine, but for some it's not. (When using the Python wrapper, the same k-mer fails too). |
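One detail of the test above is worth spelling out: `colors_per_kmer` only receives entries for k-mer positions that the color-set iterator actually yields, so a k-mer with zero colors would simply be absent from the map, and the only assertion is that the color-set pointer is non-null. The following is a minimal, library-free Python sketch of the same counting logic that also reports positions with no color at all. The function name `classify_kmers` and the toy data are hypothetical and not part of Bifrost or Pyfrost; the sketch assumes each (position, color) pair is listed once.

```python
from collections import Counter


def classify_kmers(num_kmers, color_pairs, total_num_colors):
    """Classify the k-mer positions of one unitig by their color count.

    num_kmers: number of k-mers in the unitig (positions 0..num_kmers-1)
    color_pairs: iterable of (kmer_position, color_id) tuples, each pair once
    total_num_colors: number of colors in the whole graph

    Returns (uncolored, anchors): positions with no color, and positions
    that carry every color.
    """
    colors_per_kmer = Counter(pos for pos, _color in color_pairs)
    # Counter returns 0 for positions that never appeared in color_pairs.
    uncolored = [p for p in range(num_kmers) if colors_per_kmer[p] == 0]
    anchors = [p for p in range(num_kmers) if colors_per_kmer[p] == total_num_colors]
    return uncolored, anchors


# Toy unitig with 4 k-mers in a graph with 3 colors (made-up data).
pairs = [(0, 0), (0, 1), (0, 2), (1, 0), (3, 1), (3, 2)]
uncolored, anchors = classify_kmers(4, pairs, 3)
print("uncolored positions:", uncolored)
print("anchor positions:", anchors)
```

Iterating over the full range of positions, rather than over the keys of the map, is what makes uncolored k-mers visible in the result.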
I attached the bifrost graphs here for your reference! Thanks in advance! Hang |
Hi @hangsuUNC, I am reopening the issue. Would it also be possible for you to share the input data used to build the graph, as well as the exact Bifrost version/commit used? Thanks! Guillaume |
Hi Guillaume, Thanks for your reply! Here is the construction command:
Bifrost build -t ~{num_threads} -k ~{kmersize} -i -d -c -s ~{sep=" -s " fas} -r ~{ref} -o ~{outputpref}_Bfrost_graph
The docker I use is listed here:
All of the outputs show the same issue. Here is the input file merged into a single FASTA: Thanks again for your help! Best, Hang |
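The `~{...}` placeholders in the construction command above look like WDL-style template expressions, where `~{sep=" -s " fas}` joins the elements of the array `fas` with ` -s ` so that every input file gets its own `-s` flag. As a rough illustration of what the expanded command would look like, here is a small Python sketch. The helper name `bifrost_build_command` and all file names and values are hypothetical; only the flags themselves are taken from the command as posted.

```python
def bifrost_build_command(num_threads, kmersize, fas, ref, outputpref):
    """Expand the templated command into an argument list.

    fas: list of FASTA paths, each passed with its own -s flag,
         mirroring ~{sep=" -s " fas} preceded by a single -s.
    """
    cmd = ["Bifrost", "build",
           "-t", str(num_threads),
           "-k", str(kmersize),
           "-i", "-d", "-c"]
    for fa in fas:
        cmd += ["-s", fa]
    cmd += ["-r", ref, "-o", f"{outputpref}_Bfrost_graph"]
    return cmd


# Hypothetical inputs, for illustration only.
cmd = bifrost_build_command(4, 31, ["a.fa", "b.fa"], "ref.fa", "MT_graph")
print(" ".join(cmd))
```

Printing the expanded command this way makes it easy to include the exact invocation, with real file names, when reporting a reproduction case.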
Hi,
Thanks for this wonderful tool! I'm writing to ask a question about the colors of unitigs. I found that about 3% of the unitigs created by Bifrost have no colors. Is this because these unitigs do not exist in any of the samples (random recombinations between a set of k-mers)? I used Pyfrost to load the graph and the color files for the analysis.
Here is the command:
Bifrost build -t 16 -k 31 -c -s -r -o <Bifrost_graph>
Pyfrost:
import pyfrost

g = pyfrost.load(<Bifrost_graph>)
nodelist = list(g.nodes)
no_colors = 0
for node in nodelist:
    try:
        colors = g.nodes[node]['colors']
    except Exception:
        # Count every node whose color lookup fails.
        no_colors += 1
Results:
![image](https://private-user-images.githubusercontent.com/129195112/262075208-3980854b-fca8-472a-968a-313930e7ceeb.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5MjExNjUsIm5iZiI6MTczODkyMDg2NSwicGF0aCI6Ii8xMjkxOTUxMTIvMjYyMDc1MjA4LTM5ODA4NTRiLWZjYTgtNDcyYS05NjhhLTMxMzkzMGU3Y2VlYi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA3JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwN1QwOTM0MjVaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT00OTNjYTNmZGQzNDZhMGNhZGNiZDgxZWY0YWU2NWVkOTk4YmFmMmMzYTlmMjUyYjI3YmFiYWE5MmQ2YWY3ZDllJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.TuNeKv-lY26CoFDiLYQtwqhwNzNlPwZ1OoSciNMfFgY)
Thanks for your help!
Best,
Hang