seqwish crashes on chrom21 for 3 hgsvc samples #18

Open
glennhickey opened this issue Jul 8, 2019 · 4 comments

glennhickey commented Jul 8, 2019

This ran through fine on one sample (HG00514), but when I scaled up to 3 it crashed. The input sequences can be found here:

https://transfer.sh/SZ5pU/hgsvc-chr21-seqs.tar.gz

# runs in 40min
./pan-minimap2 hg38_chr21.fa HG00514_chr21_0.fa HG00514_chr21_1.fa HG00733_chr21_0.fa HG00733_chr21_1.fa NA19240_chr21_0.fa NA19240_chr21_1.fa | fpa drop -l 10000 > hgsvc_seqwish_fpa10000.paf

# (hgsvc_chr21.fa is the above sequences catted together with hg38 first)
seqwish -s hgsvc_chr21.fa -p hgsvc_seqwish_fpa10000.paf -t 16 -b work/x -g hgsvc_seqwish_fpa10000.gfa

# crashes after 7.5 hours
seqwish: /ebs1/seqwish/src/links.cpp:23: void seqwish::derive_links(seqwish::seqindex_t&, size_t, mmmulti::map<long unsigned int, long unsigned int>&, mmmulti::map<long unsigned int, long unsigned int>&, mmmulti::map<long unsigned int, long unsigned int>&): Assertion `v1.size() == v2.size() == 1' failed.
Command terminated by signal 6
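
A note on reading the assertion text: in C++, `v1.size() == v2.size() == 1` groups left to right as `(v1.size() == v2.size()) == 1`, so as printed it passes whenever the two sizes are equal (not only when both are 1) and fails exactly when they differ. The sketch below only illustrates that language rule. The vectors are hypothetical stand-ins for `v1` and `v2`, the helper names are made up for illustration, and whether a stricter "both equal 1" check is what seqwish intends is an assumption that this thread does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The expression as printed in the assertion message: (a == b) yields a
// bool, which is then compared with 1 (i.e. true).
inline bool chained_check(std::size_t a, std::size_t b) {
    return (a == b) == 1;
}

// A stricter reading, "both sizes are exactly 1". Whether seqwish intends
// this is an assumption; the thread does not say.
inline bool both_one_check(std::size_t a, std::size_t b) {
    return a == 1 && b == 1;
}

// Hypothetical stand-ins for v1 and v2; the real types in links.cpp may differ.
inline void self_check() {
    std::vector<int> one{7}, two{7, 8}, also_two{9, 10};

    assert(chained_check(one.size(), one.size()));         // 1 vs 1: passes
    assert(chained_check(two.size(), also_two.size()));    // 2 vs 2: also passes
    assert(!chained_check(one.size(), two.size()));        // 1 vs 2: would abort

    assert(both_one_check(one.size(), one.size()));
    assert(!both_one_check(two.size(), also_two.size()));  // 2 vs 2 is rejected
}
```

Read this way, the abort (signal 6 is SIGABRT, which a failed `assert` raises) says that the two containers had different sizes at that point. It does not by itself point to an out-of-memory condition.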

Is it possible that 126G of RAM is not enough?

Owner

ekg commented Jul 8, 2019 via email

Owner

ekg commented Jul 8, 2019 via email

@glennhickey
Author

The test case I sent the other day was just one sample (hg38 + 2 sequences). This one (I put a new link to the data above) contains those, plus another 4 sequences. I'm working on a disk with 1.6T free space.

I don't do any particular awking, but my sequences have unique names:

grep '>' *.fa
HG00514_chr21_0.fa:>HG00514_chr21_0_0
HG00514_chr21_0.fa:>HG00514_chr21_0_1
HG00514_chr21_0.fa:>HG00514_chr21_0_2
HG00514_chr21_1.fa:>HG00514_chr21_1_0
HG00514_chr21_1.fa:>HG00514_chr21_1_1
HG00733_chr21_0.fa:>HG00733_chr21_0_0
HG00733_chr21_1.fa:>HG00733_chr21_1_0
hg38_chr21.fa:>chr21
NA19240_chr21_0.fa:>NA19240_chr21_0_0
NA19240_chr21_1.fa:>NA19240_chr21_1_0
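
The uniqueness of the names can also be checked mechanically instead of by eye. Below is a minimal sketch, assuming the sequence name is the header text after `>` up to the first whitespace (a common convention, though not necessarily what seqwish's FASTA reader does). The function names are hypothetical and are not part of seqwish or any other tool mentioned here.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Extract the name from a FASTA header line: the text after '>' up to the
// first space, tab, or carriage return (assumed convention).
inline std::string fasta_name(const std::string& header) {
    assert(!header.empty() && header[0] == '>');  // precondition: a header line
    std::size_t end = header.find_first_of(" \t\r", 1);
    return header.substr(1, end == std::string::npos ? std::string::npos : end - 1);
}

// Return every name that occurs more than once, each reported a single time,
// in the order in which the first repeat is seen.
inline std::vector<std::string> duplicate_names(std::istream& in) {
    std::set<std::string> seen, reported;
    std::vector<std::string> dups;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] != '>') continue;  // skip sequence lines
        const std::string name = fasta_name(line);
        if (!seen.insert(name).second && reported.insert(name).second) {
            dups.push_back(name);
        }
    }
    return dups;
}

// Convenience wrapper for FASTA text held in memory.
inline std::vector<std::string> duplicate_names_in_text(const std::string& text) {
    std::istringstream in(text);
    return duplicate_names(in);
}
```

To check the combined input, `duplicate_names` could be given a `std::ifstream` opened on the concatenated file (here, hgsvc_chr21.fa). An empty result means that no name repeats under the assumed convention.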

Owner

ekg commented Jul 16, 2019

@glennhickey I'm not sure that the fasta reader is going to be OK with the sequences named that way. But I can't be sure that this is the problem. I'll see if I can reproduce with a simpler test.
