Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inStrain compare - output #164

Open
marina-vn opened this issue Oct 31, 2023 · 9 comments
Open

inStrain compare - output #164

marina-vn opened this issue Oct 31, 2023 · 9 comments

Comments

@marina-vn
Copy link

marina-vn commented Oct 31, 2023

I'm running the program "instrain Compare" (v. 1.8.0) and it only generates one file in the output directory (MG_mapped_only_iSp_COMPARE_comparisonsTable.tsv) and no figures.

The command that I'm running is:

inStrain compare -i BL090113_mapped_only_iSp/ BL090210_mapped_only_iSp/ BL090317_mapped_only_iSp/ BL090421_mapped_only_iSp/ BL090512_mapped_only_iSp/ BL090609_mapped_only_iSp/ BL090721_mapped_only_iSp/ BL090804_mapped_only_iSp/ BL090915_mapped_only_iSp/ BL091006_mapped_only_iSp/ BL091105_mapped_only_iSp/ BL091222_mapped_only_iSp/ BL100120_mapped_only_iSp/ BL100217_mapped_only_iSp/ BL100322_mapped_only_iSp/ BL100413_mapped_only_iSp/ BL100525_mapped_only_iSp/ BL100622_mapped_only_iSp/ BL100706_mapped_only_iSp/ BL100803_mapped_only_iSp/ BL100914_mapped_only_iSp/ BL101019_mapped_only_iSp/ BL101116_mapped_only_iSp/ BL101214_mapped_only_iSp/ BL110112_mapped_only_iSp/ BL110208_mapped_only_iSp/ BL110314_mapped_only_iSp/ BL110412_mapped_only_iSp/ BL110517_mapped_only_iSp/ BL110615_mapped_only_iSp/ BL110704_mapped_only_iSp/ BL110802_mapped_only_iSp/ BL110913_mapped_only_iSp/ BL111010_mapped_only_iSp/ BL111129_mapped_only_iSp/ BL111219_mapped_only_iSp/ BL120110_mapped_only_iSp/ BL120214_mapped_only_iSp/ BL120313_mapped_only_iSp/ BL120411_mapped_only_iSp/ BL120518_mapped_only_iSp/ BL120620_mapped_only_iSp/ BL120703_mapped_only_iSp/ BL120807_mapped_only_iSp/ BL120913_mapped_only_iSp/ BL121009_mapped_only_iSp/ BL121106_mapped_only_iSp/ BL121211_mapped_only_iSp/ BL130115_mapped_only_iSp/ BL130206_mapped_only_iSp/ BL130312_mapped_only_iSp/ BL130417_mapped_only_iSp/ BL130507_mapped_only_iSp/ BL130604_mapped_only_iSp/ BL130709_mapped_only_iSp/ BL130801_mapped_only_iSp/ BL130917_mapped_only_iSp/ BL131015_mapped_only_iSp/ BL131105_mapped_only_iSp/ BL131204_mapped_only_iSp/ BL140114_mapped_only_iSp/ BL140211_mapped_only_iSp/ BL140310_mapped_only_iSp/ BL140407_mapped_only_iSp/ BL140505_mapped_only_iSp/ BL140602_mapped_only_iSp/ BL140707_mapped_only_iSp/ BL140804_mapped_only_iSp/ BL140916_mapped_only_iSp/ BL141007_mapped_only_iSp/ BL141111_mapped_only_iSp/ BL141216_mapped_only_iSp/ BL150113_mapped_only_iSp/ BL150210_mapped_only_iSp/ BL150310_mapped_only_iSp/ BL150415_mapped_only_iSp/ BL150512_mapped_only_iSp/ BL150609_mapped_only_iSp/ BL150707_mapped_only_iSp/ BL150804_mapped_only_iSp/ BL150916_mapped_only_iSp/ BL151006_mapped_only_iSp/ BL151110_mapped_only_iSp/ BL151215_mapped_only_iSp/ -p 30 -d -o MG_mapped_only_iSp_COMPARE2

And the it seems to without any issue, but it doesn't generate the output that I need:


***************************************************
    ..:: inStrain compare Step 1. Load data ::..
***************************************************

Loading Profiles into RAM: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 84/84 [00:00<00:00, 219.56it/s]
1 of 1 scaffolds are in at least 2 samples
***************************************************
..:: inStrain compare Step 2. Run comparisons ::..
***************************************************

Running group 1 of 1
Comparing scaffolds: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [03:51<00:00, 231.64s/it]
***************************************************
..:: inStrain compare Step 3. Auxiliary processing ::..
***************************************************

***************************************************
..:: inStrain compare Step 4. Store results ::..
***************************************************


$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

..:: inStrain compare finished ::..

Output tables........ /scr/marina_scr/MG20/bamreads/MG_mapped_only_iSp_COMPARE2/output/
Figures.............. /scr/marina_scr/MG20/bamreads/MG_mapped_only_iSp_COMPARE2/figures/
Logging.............. /scr/marina_scr/MG20/bamreads/MG_mapped_only_iSp_COMPARE2/log/

See documentation for output descriptions - https://instrain.readthedocs.io/en/latest/

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

..:: Overall ::..
InStrain version 1.8.0 started at 2023-10-31 15:51:19 and ended at 2023-10-31 15:55:13.
Runtime = 3 minutes, 54 seconds
Command = /home/emm4/anaconda3/envs/inStrain2/bin/inStrain compare -i BL090113_mapped_only_iSp/ BL090210_mapped_only_iSp/ BL090317_mapped_only_iSp/ BL090421_mapped_only_iSp/ BL090512_mapped_only_iSp/ BL090609_mapped_only_iSp/ BL090721_mapped_only_iSp/ BL090804_mapped_only_iSp/ BL090915_mapped_only_iSp/ BL091006_mapped_only_iSp/ BL091105_mapped_only_iSp/ BL091222_mapped_only_iSp/ BL100120_mapped_only_iSp/ BL100217_mapped_only_iSp/ BL100322_mapped_only_iSp/ BL100413_mapped_only_iSp/ BL100525_mapped_only_iSp/ BL100622_mapped_only_iSp/ BL100706_mapped_only_iSp/ BL100803_mapped_only_iSp/ BL100914_mapped_only_iSp/ BL101019_mapped_only_iSp/ BL101116_mapped_only_iSp/ BL101214_mapped_only_iSp/ BL110112_mapped_only_iSp/ BL110208_mapped_only_iSp/ BL110314_mapped_only_iSp/ BL110412_mapped_only_iSp/ BL110517_mapped_only_iSp/ BL110615_mapped_only_iSp/ BL110704_mapped_only_iSp/ BL110802_mapped_only_iSp/ BL110913_mapped_only_iSp/ BL111010_mapped_only_iSp/ BL111129_mapped_only_iSp/ BL111219_mapped_only_iSp/ BL120110_mapped_only_iSp/ BL120214_mapped_only_iSp/ BL120313_mapped_only_iSp/ BL120411_mapped_only_iSp/ BL120518_mapped_only_iSp/ BL120620_mapped_only_iSp/ BL120703_mapped_only_iSp/ BL120807_mapped_only_iSp/ BL120913_mapped_only_iSp/ BL121009_mapped_only_iSp/ BL121106_mapped_only_iSp/ BL121211_mapped_only_iSp/ BL130115_mapped_only_iSp/ BL130206_mapped_only_iSp/ BL130312_mapped_only_iSp/ BL130417_mapped_only_iSp/ BL130507_mapped_only_iSp/ BL130604_mapped_only_iSp/ BL130709_mapped_only_iSp/ BL130801_mapped_only_iSp/ BL130917_mapped_only_iSp/ BL131015_mapped_only_iSp/ BL131105_mapped_only_iSp/ BL131204_mapped_only_iSp/ BL140114_mapped_only_iSp/ BL140211_mapped_only_iSp/ BL140310_mapped_only_iSp/ BL140407_mapped_only_iSp/ BL140505_mapped_only_iSp/ BL140602_mapped_only_iSp/ BL140707_mapped_only_iSp/ BL140804_mapped_only_iSp/ BL140916_mapped_only_iSp/ BL141007_mapped_only_iSp/ BL141111_mapped_only_iSp/ BL141216_mapped_only_iSp/ BL150113_mapped_only_iSp/ BL150210_mapped_only_iSp/ BL150310_mapped_only_iSp/ BL150415_mapped_only_iSp/ BL150512_mapped_only_iSp/ BL150609_mapped_only_iSp/ BL150707_mapped_only_iSp/ BL150804_mapped_only_iSp/ BL150916_mapped_only_iSp/ BL151006_mapped_only_iSp/ BL151110_mapped_only_iSp/ BL151215_mapped_only_iSp/ -p 30 -o MG_mapped_only_iSp_COMPARE2 -d

..:: Checkpoints ::..

..:: Filter reads report ::..


..:: Profile report ::..

* Profiling splits *

* Merging splits and profiling genes *

..:: Geneome level report ::..

..:: Plotting ::..

..:: Compare ::..
CreateScaffoldComparisonObjects took <1 second       ( 0.4% of overall) RAM went from 162.02 MB to 163.21 MB (increased by 1.20 MB)
multiprocessing      took 3.0 minutes, 52.0 seconds (99.1% of overall)  RAM went from 163.21 MB to 305.41 MB (increased by 142.20 MB)
SaveResults          took <1 second       ( 0.4% of overall)    RAM went from 306.69 MB to 307.27 MB (increased by 600.00 KB)

Wall time                       3 minutes, 49 seconds
Total processes used            1
Average number processes used   1.0
Paralellization efficiency      100.0%
Units profiled                  1
Average time per unit           3.0 minutes, 49.0 seconds
Median time per unit            3.0 minutes, 49.0 seconds
Maximum unit time               3.0 minutes, 49.0 seconds
Longest running unit            vSAG-37-F6
Per-process efficiency          ['100.0']
unit per-process strating RAM           ['214.34 MB']
unit per-process final RAM              ['214.34 MB']
unit per-process minimum RAM            ['214.34 MB']
unit per-process maximum RAM            ['214.34 MB']

..:: Failures ::..
No failures

Any idea of what could be the issue and hot to fix it?

Best,

log.log

@MrOlm
Copy link
Owner

MrOlm commented Oct 31, 2023

Hello @marina-vn - could you let me know what files you'd like to have and are missing? Also, are you mapping to a single genome or multiple genomes?

Thanks,
Matt

@marina-vn
Copy link
Author

Hi Matt, I'm mapping a temporal serie to a single genome and I was hopping to get some figure that shows a comparison of the samples based on SNVs, to see how the population changes month by month.

I'm looking into the "plot" module, but I'm not sure on how to use it with the "compare" results. It says that "Skipping plot 10 - you don't have all required information. You need to run inStrain genome_wide first" but in the manual it reads that if you have provided the gene file in the "profile" step is better, but thought I have done that I'm getting this message. Also, I'm not sure where it's generating the other 9 plots, because thay aren't in the directory that I'm providing not in the one that I'm working in

@MrOlm
Copy link
Owner

MrOlm commented Oct 31, 2023

Hi @marina-vn - OK this makes sense. If you want to dig into this on a SNV-by-SNV level I would consider also adding .bam files to your inStrain compare command in order to run the SNV Pooling functions.

  1. I think this is a small bug with the inStrain compare module, where it wants to have a scaffold to bin file to make that plot. I think the best workaround would just be to make an .stb file that just has you genome, and pass that to inStrain compare. That should make it make plot 10. Info on that is here: https://instrain.readthedocs.io/en/master/user_manual.html#preparing-a-scaffold-to-bin-file

  2. Plots 1-9 are generated within inStrain profile, not inStrain compare.

Best,
Matt

@marina-vn
Copy link
Author

Hi Matt

I have added the bams files an the scaffold to bin, but I'm getting this error:

Traceback (most recent call last):
  File "/home/emm4/anaconda3/envs/inStrain2/bin/inStrain", line 31, in <module>
    inStrain.controller.Controller().main(args)
  File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/controller.py", line 57, in main
    self.compare_operation(args)
  File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/controller.py", line 89, in compare_operation
    inStrain.compare_controller.CompareController(args).main()
  File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/compare_controller.py", line 46, in main
    self.parse_arguments()
  File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/compare_controller.py", line 150, in parse_arguments
    self.name2bam = gen_name_2_bam(self.inputs, args.bams, self.kwargs.get('processes', 1))
  File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/compare_controller.py", line 679, in gen_name_2_bam
    name2bam[name] = inStrain.profile.samtools_ops.prepare_bam_fie(bam, processes)
  File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/profile/samtools_ops.py", line 61, in prepare_bam_fie
    pysam.AlignmentFile(bam).mapped
  File "pysam/libcalignmentfile.pyx", line 748, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 958, in pysam.libcalignmentfile.AlignmentFile._open
  File "pysam/libchtslib.pyx", line 361, in pysam.libchtslib.HTSFile.check_truncation
OSError: no BGZF EOF marker; file may be truncated

I have installed the program this afternoon using the following command:

conda install -c conda-forge -c bioconda -c defaults instrain

Is there anything else that I should doregarding the installation to correct this error?

@MrOlm
Copy link
Owner

MrOlm commented Nov 1, 2023

Hi @marina-vn - that error means that pysam doesn't like something about the .bam files you're providing. If you provide the same .bam files that you provided to inStrain profile it should work (since it's the same pysam code in both inStrain profile and inStrain compare)

Best,
Matt

@marina-vn
Copy link
Author

Hi @MrOlm ,

I have submitted the bams in the same order than the profiles and the scaffold-to-bin file, and now I'm getting this error


..:: inStrain compare Step 3. Auxiliary processing ::..


Traceback (most recent call last):
File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3653, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'cryptic'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/emm4/anaconda3/envs/inStrain2/bin/inStrain", line 31, in <module>
    inStrain.controller.Controller().main(args)
  File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/controller.py", line 57, in main
    self.compare_operation(args)
  File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/controller.py", line 89, in compare_operation
    inStrain.compare_controller.CompareController(args).main()
  File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/compare_controller.py", line 73, in main
    self.run_auxillary_processing()
  File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/compare_controller.py", line 297, in run_auxillary_processing
    PM = inStrain.polymorpher.PoolController(self.SC_objects, self.name2bam)
  File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/inStrain/polymorpher.py", line 65, in __init__
    db = db[db['cryptic'] == False]
  File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/pandas/core/frame.py", line 3761, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/emm4/anaconda3/envs/inStrain2/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3655, in get_loc
    raise KeyError(key) from err
KeyError: 'cryptic'

@MrOlm
Copy link
Owner

MrOlm commented Nov 3, 2023

huh- could you send me the log file please? and can you confirm that each of your inStrain profile objects has at least 1 SNV called in the SNV output tables?

Best,
Matt

@marina-vn
Copy link
Author

Hi,

I have checked the profiles and removed two that didn't have any SNV, run it again and it generated the output but not the figure:

`***************************************************
..:: inStrain compare Step 1. Load data ::..


Loading Profiles into RAM: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 82/82 [00:01<00:00, 42.79it/s]
1 of 1 scaffolds are in at least 2 samples


..:: inStrain compare Step 2. Run comparisons ::..


Running group 1 of 1
Comparing scaffolds: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [03:33<00:00, 213.40s/it]


..:: inStrain compare Step 3. Auxiliary processing ::..


Pulling 74482 SNVs from 82 scaffolds. This should take ~ 1.4 min in total
Pulling SNVs from BAMs: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 82/82 [00:03<00:00, 23.05it/s]
Generating pooled SNV summary table: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.46it/s]
Cannot cluster genome 37F6.fasta; 14 of 3321 comaprisons involve no genomic overlap at all: see log for more


..:: inStrain compare Step 4. Store results ::..


making plots 10
Plotting plot 10
Cannot cluster genome 37F6.fasta; 14 of 3321 comaprisons involve no genomic overlap at all: see log for more
Done!
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

..:: inStrain compare finished ::..

Output tables........ /scr/marina_scr/MG20/bamreads/MG_mapped_only_iSp_COMPARE_bams_noempty/output/
Figures.............. /scr/marina_scr/MG20/bamreads/MG_mapped_only_iSp_COMPARE_bams_noempty/figures/
Logging.............. /scr/marina_scr/MG20/bamreads/MG_mapped_only_iSp_COMPARE_bams_noempty/log/

See documentation for output descriptions - https://instrain.readthedocs.io/en/latest/

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$`

If I remove the 3 different samples that appear to be a problem (Cannot cluster genome 37F6.fasta; 14 of 3321 comaprisons involve no genomic overlap at all: see log for more) it does generate the dendogram, but I was wondering why those samples suppose a problem when they are correctly mapped and are present in all the output files

Also, is there a way to represent the similarty between samples that isn't the dendogram? With 82 different samples, it's not a very legible way of understanding the results, I was wondering if there's an easy way of generaint a double entry table/heatmap that depicts the similarity between the samples

Best,

noempty_log.log

@MrOlm
Copy link
Owner

MrOlm commented Nov 8, 2023

Hi @marina-vn -

Yes- I agree with that many samples a dendrogram can become very unweildy! I would recommend making heatmaps from the genomewide-compare.tsv (https://instrain.readthedocs.io/en/latest/example_output.html#genomewide-compare-tsv) . The link above should make it pretty straightforward for you to do so, but please let me know if you run into trouble.

Best,
Matt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants