field values combined_report.py script #35

bala-ruokavirasto · 2022-01-27T15:28:00Z

Dear INNUCA team,

Is there any documentation about the field values listed below, which appear in the combined_report.py script?

fields = ['#samples',
'number_reads_sequenced', 'number_bp_sequenced',
'min_reads_length', 'max_reads_length',
'reads_kraken_number_taxon_found', 'reads_kraken_percentage_unknown_fragments',
'reads_kraken_most_abundant_taxon', 'reads_kraken_percentage_most_abundant_taxon',
'first_coverage',
'trueCoverage_absent_genes', 'trueCoverage_multiple_alleles', 'trueCoverage_sample_coverage',
'second_Coverage',
'pear_assembled_reads', 'pear_unassembled_reads', 'pear_dicarded_reads',
'SPAdes_number_contigs', 'SPAdes_number_bp', 'SPAdes_filtered_contigs', 'SPAdes_filtered_bp',
'assembly_coverage_initial', 'assembly_coverage_filtered', 'mapped_reads_percentage',
'mapping_filtered_contigs', 'mapping_filtered_bp',
'Pilon_changes', 'Pilon_contigs_changed', 'Pilon_contigs', 'Pilon_bp',
'MLST_scheme', 'MLST_ST',
'assembly_kraken_number_taxon_found', 'assembly_kraken_percentage_unknown_fragments',
'assembly_kraken_most_abundant_taxon', 'assembly_kraken_percentage_most_abundant_taxon',
'insert_size_mean', 'insert_size_sd',
'final_assembly']

I would like to know some minimum information about these field values. Although most of them are straightforward, I would like to be sure that they mean what I think they mean, in case you have some documentation for them.

Thanks in advance,

Best Regards,
Bala

ramirma · 2022-02-01T15:06:15Z

Dear Bala,

I am afraid we never got around to creating proper documentation with a detailed description of each of those items. The ones that seem less straightforward to me are:
'first_coverage': total number of bp in output divided by the provided genome size.
'trueCoverage_absent_genes': if it is one of the species for which chewBBACA has a set of reference genes (expected to be present in all isolates), this is the number of missing genes in that set.
'trueCoverage_multiple_alleles': if it is one of the species for which chewBBACA has a set of reference genes (expected to be present in all isolates), this is the number of possible alleles present in the genes of that set (this may suggest intra-species contamination, i.e. multiple strains of the same species in the sample).
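
As a rough illustration of the 'first_coverage' definition above, the calculation amounts to a single division. This is only a sketch and not the actual INNUca code: the function name `estimated_coverage`, its parameter names, and the numbers used are hypothetical.

```python
def estimated_coverage(total_bp: int, genome_size_bp: int) -> float:
    """Total sequenced bp divided by the provided (expected) genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return total_bp / genome_size_bp


# Hypothetical example: 210 Mbp of sequence for a 2.1 Mbp genome.
coverage = estimated_coverage(total_bp=210_000_000, genome_size_bp=2_100_000)
print(coverage)  # 100.0
```

So a value of 100 here would mean that, on average, each position of a genome of the provided size is covered about 100 times by the sequenced bases.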

Do let us know if there is anything else we can help you with.

Best Regards,

Mario
