Cuttlefish 2 complains that the file kmc_01999_100_100.bin cannot be opened #47

Open
sebschmi opened this issue Jan 3, 2025 · 3 comments

sebschmi commented Jan 3, 2025

Input file: 2505x human pangenome called references.fa.

Cuttlefish command:

ulimit -n 2048
/bin/time -v cuttlefish build -s "references.fa" -k 27 -t 128 -o "unitigs" -w "workdir" --ref -c 1

Error log:

Constructing the compacted reference de Bruijn graph for k = 27.

Enumerating the edges of the de Bruijn graph.
<many many many * that I cut out here>
Stage 1: 100%
Warning: using counter_max == 1 will cause not storying counters in KMC output file, all counters will be assumed to be 1. This is experimental and is not currently supported in kmc_tools. Will be implemented soon.
Stage 2: 0%
*
Structural information for the de Bruijn graph is written to <my directory>/unitigs.json.
Error: can not open file : <my directory>/kmc_01999_100_100.bin

Usage :
<remainder of log removed>

Directory at time of crash:

$ ls
kmc_01999_000_000.bin
kmc_01999_001_001.bin
kmc_01999_002_002.bin
kmc_01999_003_003.bin
kmc_01999_004_004.bin
kmc_01999_005_005.bin
kmc_01999_006_006.bin
kmc_01999_007_007.bin
kmc_01999_008_008.bin
kmc_01999_009_009.bin
kmc_01999_010_010.bin
kmc_01999_011_011.bin
kmc_01999_012_012.bin
kmc_01999_013_013.bin
kmc_01999_014_014.bin
kmc_01999_015_015.bin
kmc_01999_016_016.bin
kmc_01999_017_017.bin
kmc_01999_018_018.bin
kmc_01999_019_019.bin
kmc_01999_020_020.bin
kmc_01999_021_021.bin
kmc_01999_022_022.bin
kmc_01999_023_023.bin
kmc_01999_024_024.bin
kmc_01999_025_025.bin
kmc_01999_026_026.bin
kmc_01999_027_027.bin
kmc_01999_028_028.bin
kmc_01999_029_029.bin
kmc_01999_030_030.bin
kmc_01999_031_031.bin
kmc_01999_032_032.bin
kmc_01999_033_033.bin
kmc_01999_034_034.bin
kmc_01999_035_035.bin
kmc_01999_036_036.bin
kmc_01999_037_037.bin
kmc_01999_038_038.bin
kmc_01999_039_039.bin
kmc_01999_040_040.bin
kmc_01999_041_041.bin
kmc_01999_042_042.bin
kmc_01999_043_043.bin
kmc_01999_044_044.bin
kmc_01999_045_045.bin
kmc_01999_046_046.bin
kmc_01999_047_047.bin
kmc_01999_048_048.bin
kmc_01999_049_049.bin
kmc_01999_050_050.bin
kmc_01999_051_051.bin
kmc_01999_052_052.bin
kmc_01999_053_053.bin
kmc_01999_054_054.bin
kmc_01999_055_055.bin
kmc_01999_056_056.bin
kmc_01999_057_057.bin
kmc_01999_058_058.bin
kmc_01999_059_059.bin
kmc_01999_060_060.bin
kmc_01999_061_061.bin
kmc_01999_062_062.bin
kmc_01999_063_063.bin
kmc_01999_064_064.bin
kmc_01999_065_065.bin
kmc_01999_066_066.bin
kmc_01999_067_067.bin
kmc_01999_068_068.bin
kmc_01999_069_069.bin
kmc_01999_070_070.bin
kmc_01999_071_071.bin
kmc_01999_072_072.bin
kmc_01999_073_073.bin
kmc_01999_074_074.bin
kmc_01999_075_075.bin
kmc_01999_076_076.bin
kmc_01999_077_077.bin
kmc_01999_078_078.bin
kmc_01999_079_079.bin
kmc_01999_080_080.bin
kmc_01999_081_081.bin
kmc_01999_082_082.bin
kmc_01999_083_083.bin
kmc_01999_084_084.bin
kmc_01999_085_085.bin
kmc_01999_086_086.bin
kmc_01999_087_087.bin
kmc_01999_088_088.bin
kmc_01999_089_089.bin
kmc_01999_090_090.bin
kmc_01999_091_091.bin
kmc_01999_092_092.bin
kmc_01999_093_093.bin
kmc_01999_094_094.bin
kmc_01999_095_095.bin
kmc_01999_096_096.bin
kmc_01999_097_097.bin
kmc_01999_098_098.bin
kmc_01999_099_099.bin
references.fa
unitigs.json
workdir

As the ls output shows, the file that Cuttlefish tries to open does not exist. After the run, at least 20% of the disk was still free, so it did not run out of disk space.

Member

jamshed commented Jan 3, 2025

Hi @sebschmi,

Looks like KMC is trying to open a lot of files in this instance. Can you please try raising the open-file-handle limit with ulimit and see if it succeeds?
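
To check whether the file-handle limit is plausibly the bottleneck, it can help to compare the current limits with the number of descriptors a process actually holds. The following is a minimal sketch for Linux only; it assumes /proc is mounted, and the helper name count_open_fds is made up for illustration rather than part of Cuttlefish or KMC.

```shell
#!/usr/bin/env bash
# Sketch (Linux only): report the open-file limits and count the file
# descriptors a process currently holds. count_open_fds is a hypothetical
# helper name, not a Cuttlefish or KMC tool.

# /proc/<pid>/fd contains one entry per open descriptor of that process.
# Prints 0 if the process does not exist or cannot be inspected.
count_open_fds() {
    local pid="$1"
    ls "/proc/$pid/fd" 2>/dev/null | wc -l
}

echo "soft limit: $(ulimit -Sn)"
echo "hard limit: $(ulimit -Hn)"
echo "descriptors open in this shell: $(count_open_fds $$)"

# While the build is running, the same helper can be pointed at its PID
# (replace <PID> with the actual process ID):
#   count_open_fds <PID>
```

If the count for the running build approaches the soft limit shortly before the crash, that would be consistent with the failure to open the next bin file.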

Author

sebschmi commented Jan 7, 2025

Thanks for the fast response! What ulimit should I set? Now I am using 2048.

Member

jamshed commented Jan 8, 2025

Hi Sebastian,

I don't have a good sense of how many files KMC might need in this case. How about trying a large count, like 16384 or so?

Also, if you aren't concerned about higher RAM usage, you can try adding -m <RAM in GB> to the command, which may reduce the KMC bucket count.
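
The ulimit suggestion could be applied roughly as follows. This is a sketch under assumptions: the target of 16384 is only the ballpark figure mentioned above, not a value KMC is known to require, and clamp_nofile is a hypothetical helper. An unprivileged user can raise the soft limit only up to the hard limit, so the target is clamped first.

```shell
#!/usr/bin/env bash
# Sketch: raise the soft open-file limit in the current shell before
# launching the build. The target of 16384 is an illustrative assumption.

# Print the requested limit, capped at the hard limit.
# A hard limit of "unlimited" imposes no cap.
clamp_nofile() {
    local requested="$1" hard="$2"
    if [ "$hard" = "unlimited" ] || [ "$requested" -le "$hard" ]; then
        echo "$requested"
    else
        echo "$hard"
    fi
}

target=16384
hard=$(ulimit -Hn)
new_limit=$(clamp_nofile "$target" "$hard")

# Set the soft limit; this affects this shell and the processes it starts.
ulimit -Sn "$new_limit"
echo "soft limit: $(ulimit -Sn), hard limit: $hard"

# Then rerun the original command from this same shell, for example:
# /bin/time -v cuttlefish build -s "references.fa" -k 27 -t 128 -o "unitigs" -w "workdir" --ref -c 1
```

If the hard limit turns out to be below the target, raising it further usually needs administrator rights or a change to the system's limits configuration.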
