issue with constax :( #15

Open
mashalcopperman opened this issue Mar 18, 2024 · 2 comments

mashalcopperman commented Mar 18, 2024

Hi there, I'm having an issue I hope you can help with. The checks passed, but there is an error during the training process. I attached the log file (let me know if you can access it).

log_constax2_2024-03-18_13-13-54.txt

(constax2) [copperm2@dev-amd20 Cecilia]$ constax -d /mnt/home/copperm2/Databases/sh_general_release_dynamic_s_all_25.07.2023.fasta -t -f /mnt/home/copperm2/Databases/trainfiles/UNITE --mem 128 -n 16 -i outputs/14_constax_euk/train.fasta -b
Welcome to CONSTAX version 2.0.19 build 0 - The CONSensus TAXonomy classifier
This software is distributed under MIT License
© Copyright 2022, Julian A. Liber, Gian M. N. Benucci & Gregory M. Bonito
https://github.com/liberjul/CONSTAXv2
https://constax.readthedocs.io/

Please cite us as:
CONSTAX2: Improved taxonomic classification of environmental DNA markers
Julian Aaron Liber, Gregory Bonito, Gian Maria Niccolò Benucci
Bioinformatics, Volume 37, Issue 21, 1 November 2021, Pages 3941–3943; doi: https://doi.org/10.1093/bioinformatics/btab347
Overwriting previous classification...
Performing training and overwriting training files...
Using the user-supplied pathfile at /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/pathfile.txt
All needed executables exist.
SINTAX: vsearch
RDP: classifier
CONSTAX: /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0
python /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/detect_format.py -d /mnt/home/copperm2/Databases/sh_general_release_dynamic_s_all_25.07.2023.fasta 2>&1
UNITE
Memory size: 128mb
python /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/FormatRefDB.py -d /mnt/home/copperm2/Databases/sh_general_release_dynamic_s_all_25.07.2023.fasta -t /mnt/home/copperm2/Databases/trainfiles/UNITE -f UNITE -p /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0
Importing subscripts from /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0


Reformatting database

UNITE format detected

Reference database FASTAs formatted in 2.970266915 seconds...

Training Taxonomy


Adding Full Lineage

Database formatting complete



Training SINTAX Classifier
vsearch -makeudb_usearch /mnt/home/copperm2/Databases/trainfiles/UNITE/sh_general_release_dynamic_s_all_25.07.2023__UTAX.fasta -output /mnt/home/copperm2/Databases/trainfiles/UNITE/sintax.db
^[__________________________________________________________________________
Training BLAST Classifier
makeblastdb -in /mnt/home/copperm2/Databases/trainfiles/UNITE/sh_general_release_dynamic_s_all_25.07.2023__RDP_trained.fasta -dbtype nucl -out /mnt/home/copperm2/Databases/trainfiles/UNITE/sh_general_release_dynamic_s_all_25.07.2023__BLAST

Building a new DB, current time: 03/18/2024 13:17:43
New DB name: /mnt/home/copperm2/Databases/trainfiles/UNITE/sh_general_release_dynamic_s_all_25.07.2023__BLAST
New DB title: /mnt/home/copperm2/Databases/trainfiles/UNITE/sh_general_release_dynamic_s_all_25.07.2023__RDP_trained.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 326727 sequences in 80.4233 seconds.


Training RDP Classifier
classifier train -o /mnt/home/copperm2/Databases/trainfiles/UNITE/. -s /mnt/home/copperm2/Databases/trainfiles/UNITE/sh_general_release_dynamic_s_all_25.07.2023__RDP_trained.fasta -t /mnt/home/copperm2/Databases/trainfiles/UNITE/sh_general_release_dynamic_s_all_25.07.2023__RDP_taxonomy_trained.txt -Xmx > rdp_train.out 2>&1
cp /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/rRNAClassifier.properties /mnt/home/copperm2/Databases/trainfiles/UNITE/


Assigning taxonomy to OTU's representative sequences
python /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/check_input_names.py -i outputs/14_constax_euk/train.fasta
vsearch -sintax -db /mnt/home/copperm2/Databases/trainfiles/UNITE/sintax.db -tabbedout ./taxonomy_assignments/otu_taxonomy.sintax -strand both -sintax_cutoff 0.8 -threads 16
sed -i'' -e 's|([0-1][.][0-9]{2}|&00|g' ./taxonomy_assignments/otu_taxonomy.sintax
python /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/split_inputs.py -i
Input FASTA:

./taxonomy_assignments/blast.out
blastn -query _*.fasta -db /mnt/home/copperm2/Databases/trainfiles/UNITE/ -num_threads 16 -outfmt 7 qacc sacc evalue bitscore pident qcovs -max_target_seqs 10 >> ./taxonomy_assignments/blast.out
python /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/blast_to_df.py -i ./taxonomy_assignments/blast.out -o ./taxonomy_assignments/otu_taxonomy.blast -d /mnt/home/copperm2/Databases/sh_general_release_dynamic_s_all_25.07.2023.fasta -t /mnt/home/copperm2/Databases/trainfiles/UNITE -f UNITE
classifier classify --conf 0.8 --format allrank --train_propfile /mnt/home/copperm2/Databases/trainfiles/UNITE/rRNAClassifier.properties -o ./taxonomy_assignments/otu_taxonomy.rdp -Xmx
Command Error: Failed to find input file ""
usage: [options] [,idmappingfile] ...
-b,--bootstrap_outfile the output file containing the number of
matching assignments out of 100 bootstraps for
major ranks. Default is null
-c,--conf assignment confidence cutoff used to determine
the assignment count for each taxon. Range
[0-1], Default is 0.8.
-d,--metadata the tab delimited metadata file for the samples,
with first row containing attribute name and
first column containing the sample name
-f,--format tab-delimited output format:
[allrank|fixrank|biom|filterbyconf|db]. Default
is allRank.
allrank: outputs the results for all ranks
applied for each sequence: seqname, orientation,
taxon name, rank, conf, ...
fixrank: only outputs the results for fixed
ranks in order: domain, phylum, class, order,
family, genus
biom: outputs rich dense biom format if OTU or
metadata provided
filterbyconf: only outputs the results for major
ranks as in fixrank, results below the
confidence cutoff were bin to a higher rank
unclassified_node
db: outputs the seqname, trainset_no, tax_id,
conf.
-g,--gene 16srrna, fungallsu, fungalits_warcup,
fungalits_unite. Default is 16srrna. This option
can be overwritten by -t option
-h,--hier_outfile tab-delimited output file containing the
assignment count for each taxon in the
hierarchical format. Default is null.
-m,--biomFile the input clluster biom file. The classification
result will replace the taxonomy of the
corresponding cluster id.
-o,--outputFile tab-delimited text output file for
classification assignment.
-q,--queryFile legacy option, no longer needed
-s,--shortseq_outfile the output file containing the sequence names
that are too short to be classified
-t,--train_propfile property file containing the mapping of the
training files if not using the default. Note:
the training files and the property file should
be in the same directory.
-w,--minWords minimum number of words for each bootstrap
trial. Default(maximum) is 1/8 of the words of
each sequence. Minimum is 5


Combining Taxonomies
python /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/CombineTaxonomy.py -c 0.8 -o ./outputs/ -x ./taxonomy_assignments/ -b -e 1.0 -m 10 -p 0.0 -f UNITE -d /mnt/home/copperm2/Databases/sh_general_release_dynamic_s_all_25.07.2023.fasta -t /mnt/home/copperm2/Databases/trainfiles/UNITE -i False --hl null --iso_qc 75 --iso_id 1 --hl_qc 75 --hl_id 1 -s false -n false


packages in environment at /mnt/home/copperm2/miniconda3/envs/constax2:

Name Version Build Channel

_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
blast 2.5.0 hc0b0e79_3 bioconda
boost 1.80.0 py311h59ea3da_4 conda-forge
boost-cpp 1.80.0 h75c5d50_0 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
ca-certificates 2022.12.7 ha878542_0 conda-forge
constax 2.0.19 pyhdfd78af_0 bioconda
icu 70.1 h27087fc_0 conda-forge
ld_impl_linux-64 2.39 hcc3a1bd_1 conda-forge
libblas 3.9.0 16_linux64_openblas conda-forge
libcblas 3.9.0 16_linux64_openblas conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 12.2.0 h65d4601_19 conda-forge
libgfortran-ng 12.2.0 h69a702a_19 conda-forge
libgfortran5 12.2.0 h337968e_19 conda-forge
libgomp 12.2.0 h65d4601_19 conda-forge
liblapack 3.9.0 16_linux64_openblas conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libopenblas 0.3.21 pthreads_h78a6416_3 conda-forge
libsqlite 3.40.0 h753d276_0 conda-forge
libstdcxx-ng 12.2.0 h46fd767_19 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libzlib 1.2.13 h166bdaf_4 conda-forge
ncurses 6.3 h27087fc_1 conda-forge
numpy 1.24.1 pypi_0 pypi
openjdk 8.0.332 h166bdaf_0 conda-forge
openssl 3.0.7 h0b41bf4_1 conda-forge
pandas 1.5.2 pypi_0 pypi
pip 22.3.1 pyhd8ed1ab_0 conda-forge
python 3.11.0 he550d4f_1_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-tzdata 2024.1 pyhd8ed1ab_0 conda-forge
python_abi 3.11 3_cp311 conda-forge
pytz 2022.7.1 pyhd8ed1ab_0 conda-forge
rdptools 2.0.3 hdfd78af_1 bioconda
readline 8.1.2 h0f457ee_0 conda-forge
setuptools 66.0.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
tzdata 2022g h191b570_0 conda-forge
vsearch 2.22.1 hf1761c0_0 bioconda
wheel 0.38.4 pyhd8ed1ab_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
zlib 1.2.13 h166bdaf_4 conda-forge
zstd 1.5.2 h6239696_4 conda-forge


(constax2) [copperm2@dev-amd20 Cecilia]$ constax -d /mnt/home/copperm2/Databases/sh_general_release_dynamic_s_all_25.07.2023.fasta -t -f /mnt/home/copperm2/Databases/trainfiles/UNITE --mem 128 --check -i outputs/14_constax_euk/train.fasta -b
Welcome to CONSTAX version 2.0.19 build 0 - The CONSensus TAXonomy classifier
This software is distributed under MIT License
© Copyright 2022, Julian A. Liber, Gian M. N. Benucci & Gregory M. Bonito
https://github.com/liberjul/CONSTAXv2
https://constax.readthedocs.io/

Please cite us as:
CONSTAX2: Improved taxonomic classification of environmental DNA markers
Julian Aaron Liber, Gregory Bonito, Gian Maria Niccolò Benucci
Bioinformatics, Volume 37, Issue 21, 1 November 2021, Pages 3941–3943; doi: https://doi.org/10.1093/bioinformatics/btab347
Overwriting previous classification...
Performing training and overwriting training files...
Using the user-supplied pathfile at /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/pathfile.txt
All needed executables exist.
SINTAX: vsearch
RDP: classifier
CONSTAX: /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0
python /mnt/home/copperm2/miniconda3/envs/constax2/opt/constax-2.0.19-0/detect_format.py -d /mnt/home/copperm2/Databases/sh_general_release_dynamic_s_all_25.07.2023.fasta 2>&1
UNITE
Memory size: 128mb
All checks passed, rerun without --check flag.

@liberjul
Owner

Hi @mashalcopperman,

The issue I see in the log file is that the inputs didn't get formatted correctly because the Python modules numpy and pandas are missing. You should be able to install them from the command line with the right conda environment activated:

pip install numpy pandas

When you run CONSTAX again, you can remove the -t/--train flag, given that the training was successful.
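
If it helps, here is a minimal sketch for confirming that both modules are visible to the Python interpreter in the active environment before rerunning. It is not part of CONSTAX; the function name missing_modules is hypothetical, and the script only assumes it is run with the constax2 environment activated.

```python
import importlib.util
import sys

# Modules the reply above identifies as missing (taken from the thread).
REQUIRED = ("numpy", "pandas")


def missing_modules(names=REQUIRED):
    """Return the top-level module names the current interpreter cannot find.

    Hypothetical helper for illustration; it looks modules up without
    importing them.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Printing the interpreter path shows which environment is being checked.
    print("Interpreter:", sys.executable)
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("numpy and pandas are both importable.")
```

If the interpreter path printed is not inside the constax2 environment, the check is probably running against a different Python than the one CONSTAX calls.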

I hope that helps,

Julian

@mashalcopperman
Author

mashalcopperman commented Mar 19, 2024 via email
