unitigs with no colors #73

Open
hangsuUNC opened this issue Aug 21, 2023 · 10 comments


hangsuUNC commented Aug 21, 2023

Hi,

Thanks for this wonderful tool! I'm writing to ask a question about the colors of unitigs. I found that about 3% of the unitigs created by Bifrost have no colors. Is this because these unitigs do not exist in any of the samples (random recombinations between a set of k-mers)? I used pyfrost to load the graph and the color files for the analysis.

Here is the command:
Bifrost build -t 16 -k 31 -c -s -r -o <Bifrost_graph>
Pyfrost:
import pyfrost

g = pyfrost.load(<Bifrost_graph>)
nodelist = list(g.nodes)
no_colors = 0
for node in nodelist:
    try:
        colors = g.nodes[node]['colors']
    except Exception:
        # Count every node whose color lookup fails
        no_colors += 1

Results:
image

Thanks for your help!

Best,

Hang

@GuillaumeHolley
Collaborator

Dear @hangsuUNC,

In color mode, it is guaranteed that each k-mer gets at least one color, so if you have k-mers or unitigs without colors, it is a bug in either Bifrost or Pyfrost (which is developed independently from Bifrost). I am not familiar with the Pyfrost syntax, but I had a quick look at the Pyfrost README and saw that the syntax to iterate over pairs of k-mers and colors is:

for n, data in g.nodes(data=True):
    for c in data['colors']:
        print("Node", n, "has color", c)

From your code, it seems that data=True is not used and that you access colors per unitig rather than per k-mer, which I am not sure is possible. Could you look into that first?
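One way to make the check suggested above more informative is to separate nodes that have no `colors` entry at all from nodes whose color set is present but empty. The following is a minimal sketch only: it works on plain `(node, data)` pairs, and the toy list stands in for what `g.nodes(data=True)` might yield. Whether Pyfrost's data objects support the `in` test and iteration used here is an assumption, and `classify_nodes` is a hypothetical helper, not part of Pyfrost.

```python
from collections import Counter

def classify_nodes(node_data_pairs):
    """Tally nodes as 'colored', 'empty', or 'missing' (hypothetical helper)."""
    tally = Counter()
    for node, data in node_data_pairs:
        if "colors" not in data:
            tally["missing"] += 1   # no 'colors' attribute at all
        elif len(list(data["colors"])) == 0:
            tally["empty"] += 1     # attribute present, but it holds no colors
        else:
            tally["colored"] += 1
    return tally

# Hypothetical stand-in data; with Pyfrost this would be g.nodes(data=True).
toy_nodes = [
    ("ACGTA", {"colors": [0, 1]}),
    ("CGTAC", {"colors": []}),
    ("GTACG", {}),
]
result = classify_nodes(toy_nodes)
print(dict(result))
```

Reporting the two failure modes separately may help tell a lookup problem in the wrapper apart from a color set that is really empty.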

Thanks,
Guillaume

@hangsuUNC
Author

hangsuUNC commented Aug 21, 2023

Hi Guillaume,

Thanks for your reply! I tried:
nodes_info = []
for n, data in g.nodes(data=True):
    num = 0
    for c in data['colors']:
        num += 1
    nodes_info.append(["Node", n, "has color", c])

Got an error message:
image

Not sure what that means... I will contact the pyfrost author later!

Thank you!

Hang

@GuillaumeHolley
Collaborator

GuillaumeHolley commented Aug 21, 2023

Unfortunately, I don't know why this fails. I think that contacting the Pyfrost author first is a good idea. Make sure you link this issue in the one you are going to create, and I'll assist with any Bifrost-related test or bug.

@hangsuUNC
Author

Thank you so much, Guillaume! Will do!

Best,

Hang

@GuillaumeHolley
Collaborator

Hi @hangsuUNC,

I'll close this for now since it is unclear at the moment if the issue is with Pyfrost or Bifrost. Don't hesitate to reopen or link to this issue if there is some progress on the matter.

Guillaume

@hangsuUNC
Author

hangsuUNC commented Nov 8, 2023

Hi Guillaume,

Sorry for the delayed response! I contacted the pyfrost author @lrvdijk and examined different versions of Bifrost and its output graph. Lucas created a test C++ program that uses the Bifrost API directly (so not the pyfrost Python library), and it still fails with k-mers that have no colors. It sounds like a Bifrost color matrix issue rather than a Python library issue.

Could you please help check the color matrix of the Bifrost graph? If you need any additional information for testing, please let us know!
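As a toy illustration of the invariant under discussion (every k-mer taken from the input should carry at least one color), here is a small sketch of a colored k-mer index. It is not how Bifrost stores colors: it ignores canonical k-mers and reverse complements, the sample strings are hypothetical, and `build_color_index` is a made-up name.

```python
from collections import defaultdict

def build_color_index(samples, k):
    """Map each k-mer to the set of sample indices (colors) that contain it."""
    index = defaultdict(set)
    for color, seq in enumerate(samples):
        for i in range(len(seq) - k + 1):
            # A k-mer only enters the index together with the color it came from.
            index[seq[i:i + k]].add(color)
    return index

samples = ["ACGTAC", "CGTACG"]  # hypothetical toy inputs
index = build_color_index(samples, k=3)
uncolored = [kmer for kmer, colors in index.items() if not colors]
print(len(index), "k-mers,", len(uncolored), "without colors")
```

Because a k-mer is only ever inserted alongside a color, an empty color set cannot arise in this simplified model, which is why uncolored k-mers in a real graph point to a bug rather than to the data.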

Thanks a lot for your help!

Best,

Hang

@lrvdijk

lrvdijk commented Nov 8, 2023

For reference, here's the C++ test program (using the Catch2 test framework, but you get the idea):

#include <fstream>
#include <iostream>
#include <unordered_map>

#include <catch2/catch.hpp>
#include <ColoredCDBG.hpp>
#include <Kmer.hpp>

TEST_CASE("Test unitig color data", "[unitig_color_data]") {
    CCDBG_Build_opt opt;
    opt.filename_graph_in = "data/MT_graph_Bfrost_graph.gfa";
    opt.filename_colors_in = "data/MT_graph_Bfrost_graph.bfg_colors";

    ColoredCDBG<> ccdbg(opt.k, opt.g);
    ccdbg.read(opt.filename_graph_in, opt.filename_colors_in, 2);
    auto total_num_colors = ccdbg.getColorNames().size();

    std::ofstream anchors;
    anchors.open("data/anchors.txt");

    for(auto const& um : ccdbg) {
        auto colorset = um.getData()->getUnitigColors(um);
        std::cout << "Testing colorset of " << um.getMappedHead().toString() << std::endl;
        REQUIRE(colorset != nullptr);

        std::unordered_map<size_t, size_t> colors_per_kmer{};

        for(auto it = colorset->begin(um); it != colorset->end(); ++it) {
            colors_per_kmer.emplace(it.getKmerPosition(), 0).first->second++;
        }

        for(auto const& p : colors_per_kmer) {
            if(p.second == total_num_colors) {
                anchors << um.getUnitigKmer(p.first).toString() << std::endl;
            }
        }
    }

    anchors.close();
}

The graph was created with Bifrost <1.2, and the test was run with the same pre-1.2 Bifrost version.

For many k-mers, the colorset pointer is fine, but for some it's not.

Screenshot 2023-11-08 at 3 12 55 PM

(When using the Python wrapper, the same k-mer fails too).
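The per-k-mer counting done by the C++ test above can be mirrored in plain Python to make its logic easier to follow. This is a sketch under assumptions: `color_hits` is a hypothetical list of `(kmer_position, color_id)` pairs standing in for what iterating a unitig's color set yields, each pair is assumed to appear once, and `anchor_positions` is not a Bifrost or Pyfrost function.

```python
from collections import Counter

def anchor_positions(color_hits, total_num_colors):
    """Return the k-mer positions that carry every color.

    color_hits: iterable of (kmer_position, color_id) pairs, one per
    (k-mer, color) occurrence within a single unitig.
    """
    # Count how many colors were seen at each k-mer position.
    colors_per_kmer = Counter(pos for pos, _color in color_hits)
    return sorted(pos for pos, n in colors_per_kmer.items()
                  if n == total_num_colors)

# Hypothetical hits for a unitig with three k-mers and three colors.
hits = [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0), (2, 1), (2, 2)]
anchors = anchor_positions(hits, total_num_colors=3)
print(anchors)
```

Note that a k-mer with zero colors never appears in `color_hits`, so a counting loop of this shape cannot detect it on its own. In the C++ test, the check that would catch a missing color set is the `REQUIRE(colorset != nullptr)` line.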

@hangsuUNC
Author

bifrost_graphs.zip

I attached the bifrost graphs here for your reference!

Thanks in advance!

Hang

@GuillaumeHolley
Collaborator

Hi @hangsuUNC,

I am reopening the issue. Would it also be possible for you to share the input data used to build the graph, as well as the exact Bifrost version/commit used? Thanks!

Guillaume

@hangsuUNC
Author

Hi Guillaume,

Thanks for your reply! Here is the construction command:

Bifrost build -t ~{num_threads} -k ~{kmersize} -i -d -c -s ~{sep=" -s " fas} -r ~{ref} -o ~{outputpref}_Bfrost_graph
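For readers unfamiliar with the WDL-style `~{...}` placeholders in the command above, the sketch below shows how the template might expand once values are filled in. The flags are the ones in the command; the file names, thread count, and the `bifrost_build_command` helper are hypothetical, and `fas` is assumed to be a non-empty list of FASTA paths.

```python
import shlex

def bifrost_build_command(num_threads, kmersize, fas, ref, outputpref):
    """Expand the templated command into a single shell string (hypothetical helper)."""
    args = ["Bifrost", "build",
            "-t", str(num_threads), "-k", str(kmersize),
            "-i", "-d", "-c"]
    # `-s ~{sep=" -s " fas}` joins the list with " -s ", so each file gets its own -s.
    for path in fas:
        args += ["-s", path]
    args += ["-r", ref, "-o", f"{outputpref}_Bfrost_graph"]
    return shlex.join(args)

# Hypothetical values for illustration only.
cmd = bifrost_build_command(16, 31, ["a.fasta", "b.fasta"], "ref.fasta", "out")
print(cmd)
```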

The Docker images I used are listed here:

  1. hangsuunc/bifrost:v1 (Bifrost 1.2.0)
  2. us-central1-docker.pkg.dev/broad-dsp-lrma/fusilli/fusilli:devel (Bifrost 1.0.6.5)

All of the outputs show the same issue...

Here is the input, merged into a single FASTA file:
all.fasta.gz

Thanks again for your help!

Best,

Hang
