Bootstrap analyses large data memory error #73

Open
Ptero64 opened this issue Jun 7, 2021 · 2 comments


@Ptero64

Ptero64 commented Jun 7, 2021

Hello,
I am trying to run bootstrap analyses with ASTRAL on a large dataset (~14000 loci). Unfortunately, the run failed with a Java error message that seems to be related to a memory issue.

To Reproduce
Here is the command used:

#Multi-locus bootstrapping (MLBS) (use 1000 uboot2 output from iqtree)

java -jar astral.5.7.7.jar -i ./Input_unrooted_tree/InputGenesTree_NoColap.trees -b bootstrap_genetrees_path.txt -r 1000 -s 1984 -o Results/Astral_MLBS_1000.tre 2>out_Astral_MLBS_1000.log

#version with Gene+Site resampling

java -jar astral.5.7.7.jar -i ./Input_unrooted_tree/InputGenesTree_NoColap.trees -b bootstrap_genetrees_path.txt -g -r 500 -s 1984 -o Results/Astral_MLBS_GeneSite_500.tre 2>out_Astral_MLBS_GeneSite_500.log

Log file
Here is the log file out_Astral_MLBS_1000.log:
================== ASTRAL =====================

This is ASTRAL version 5.7.7
Gene trees are treated as unrooted
13388 trees read from ./Input_unrooted_tree/InputGenesTree_NoColap.trees
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
at java.lang.StringBuffer.append(StringBuffer.java:367)
at java.io.BufferedReader.readLine(BufferedReader.java:358)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at phylonet.coalescent.CommandLine.readTreeFileAsString(CommandLine.java:728)
at phylonet.coalescent.CommandLine.readOptions(CommandLine.java:374)
at phylonet.coalescent.CommandLine.main(CommandLine.java:485)

Log from out_Astral_MLBS_GeneSite_1000.log:
================== ASTRAL =====================

This is ASTRAL version 5.7.7
Gene trees are treated as unrooted
13388 trees read from ./Input_unrooted_tree/InputGenesTree_NoColap.trees
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:541)
at java.lang.StringBuffer.append(StringBuffer.java:350)
at java.util.regex.Matcher.appendReplacement(Matcher.java:888)
at java.util.regex.Matcher.replaceAll(Matcher.java:955)
at java.lang.String.replaceAll(String.java:2223)
at phylonet.coalescent.CommandLine.readTreeFileAsString(CommandLine.java:730)
at phylonet.coalescent.CommandLine.readOptions(CommandLine.java:374)
at phylonet.coalescent.CommandLine.main(CommandLine.java:485)

Version
astral 5.7.7
Additional context
I am running it on an HPC cluster, requesting 10 cores x 50G memory (high-memory nodes). The input bootstrap trees are from iqtree2 (ufboot).
ASTRAL analyses (LPP) using the same input worked correctly.

Thank you in advance for the help,

regards
nicolas

@Ptero64
Author

Ptero64 commented Jun 7, 2021

I tried asking for 100 replicates, and this time I get this error message:

================== ASTRAL =====================

This is ASTRAL version 5.7.7
Gene trees are treated as unrooted
13388 trees read from ./Input_unrooted_tree/InputGenesTree_NoColap.trees
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:541)
at java.lang.StringBuffer.append(StringBuffer.java:350)
at java.util.regex.Matcher.appendReplacement(Matcher.java:888)
at java.util.regex.Matcher.replaceAll(Matcher.java:955)
at java.lang.String.replaceAll(String.java:2223)
at phylonet.coalescent.CommandLine.readTreeFileAsString(CommandLine.java:730)
at phylonet.coalescent.CommandLine.readOptions(CommandLine.java:374)
at phylonet.coalescent.CommandLine.main(CommandLine.java:485)

@smirarab
Owner

The issue is lack of memory. ASTRAL bootstrapping is a bit inefficient with memory. Two solutions come to mind.

  • If you have 50GB on your machine, you can add the -Xmx option. For example, you can run
java -Xmx47g -jar astral.5.7.7.jar -i ./Input_unrooted_tree/InputGenesTree_NoColap.trees -b bootstrap_genetrees_path.txt -g -r 100 -s 1984 -o Results/Astral_MLBS_GeneSite_500.tre 2>out_Astral_MLBS_GeneSite_500.log

This will tell Java that it can use up to 47G of memory. If you are running other things on that machine, you may want to reduce that a bit.
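As a rough illustration of leaving some headroom, here is a minimal sketch of a helper that turns a memory allocation into an -Xmx flag. The function name xmx_flag and the default 3 GB headroom are assumptions made for this example (the 3 GB simply mirrors the 50 GB to 47g example above); they are not part of ASTRAL or Java.

```shell
# Hypothetical helper (not part of ASTRAL or Java): build a -Xmx flag from the
# memory granted to the job, in GB, minus some headroom, in GB, for the JVM's
# non-heap memory and anything else running on the machine.
xmx_flag() {
  local total_gb="$1"
  local headroom_gb="${2:-3}"   # assumed default headroom
  local heap_gb=$(( total_gb - headroom_gb ))
  if [ "$heap_gb" -lt 1 ]; then
    echo "xmx_flag: headroom leaves no room for the heap" >&2
    return 1
  fi
  printf '%s\n' "-Xmx${heap_gb}g"
}

# A 50 GB allocation with 3 GB headroom gives the value used above.
xmx_flag 50 3   # prints -Xmx47g
```

The result could then be passed along as `java "$(xmx_flag 50 3)" -jar astral.5.7.7.jar ...`. If other jobs share the node, a larger headroom is probably the safer choice.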

  • ASTRAL bootstrapping is nothing more than running ASTRAL 101 or 1001 times. If you just run ASTRAL manually that many times, it will work just fine. To do that, you would need to:
  1. Create the 100 or 1000 bootstrap replicate inputs to ASTRAL. ASTRAL can do that using the -k bootstraps_norun option. So you would run
java -Xmx47g -jar astral.5.7.7.jar -i ./Input_unrooted_tree/InputGenesTree_NoColap.trees -b bootstrap_genetrees_path.txt -g -r 100 -s 1984 -o Results/MLBS-reps -k bootstraps_norun 2>out_Astral_MLBS_GeneSite_500.log

This will produce files like Results/MLBS-reps.35.bs.

  2. Then, run ASTRAL on each of these files separately.
  3. Also run ASTRAL on the main input file with no bootstrapping.
  4. Draw bipartition support onto the main ASTRAL tree using the collection of ASTRAL bootstrap replicate trees. Many tools can do this. My favorite is RAxML's -f b option.
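The steps that follow the replicate generation could be scripted roughly as below. This is only a sketch under stated assumptions: the replicate files are assumed to match Results/MLBS-reps.*.bs (based on the file name quoted above), the output names (Results/Astral_main.tre, Results/Astral_reps.tre, the run name MLBS), the 8g heap, and the raxmlHPC binary name are made up for illustration, and -m GTRCAT is included on the assumption that RAxML asks for a model argument even for -f b. By default the functions only print the commands so they can be checked first.

```shell
#!/usr/bin/env bash
# Sketch: run ASTRAL per bootstrap replicate, run it on the main input,
# then map replicate support onto the main tree with RAxML -f b.
# All names below are assumptions unless they appear in the thread.

ASTRAL_JAR="${ASTRAL_JAR:-astral.5.7.7.jar}"
GENE_TREES="${GENE_TREES:-./Input_unrooted_tree/InputGenesTree_NoColap.trees}"
REP_PREFIX="${REP_PREFIX:-Results/MLBS-reps}"  # prefix given to -o with -k bootstraps_norun
HEAP="${HEAP:-8g}"                             # assumed heap for a single ASTRAL run
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print commands only, 0 = also execute

# Print a command, and execute it unless DRY_RUN=1.
run() {
  printf '%s\n' "$*"
  if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

# Run ASTRAL on one gene-tree file ($1), writing the species tree to $2.
astral_one() {
  run java "-Xmx$HEAP" -jar "$ASTRAL_JAR" -i "$1" -o "$2"
}

# One ASTRAL run per replicate file (<prefix>.<n>.bs).
run_replicates() {
  local f
  for f in "$REP_PREFIX".*.bs; do
    [ -e "$f" ] || continue   # the glob matched nothing
    astral_one "$f" "$f.tre"
  done
}

# ASTRAL on the main input, with no bootstrapping.
run_main() {
  astral_one "$GENE_TREES" "Results/Astral_main.tre"
}

# Collect the replicate species trees and draw support on the main tree.
map_support() {
  if [ "$DRY_RUN" != "1" ]; then
    cat "$REP_PREFIX".*.bs.tre > Results/Astral_reps.tre
  fi
  run raxmlHPC -f b -m GTRCAT -t Results/Astral_main.tre -z Results/Astral_reps.tre -n MLBS
}
```

After sourcing the file, calling `run_replicates; run_main; map_support` prints the planned commands, and setting `DRY_RUN=0` first executes them. Because each replicate is an independent ASTRAL run, the loop could also be split across cluster jobs instead of running serially.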

Hope one of these two solutions helps.
