field values combined_report.py script #35

bala-ruokavirasto opened this issue Jan 27, 2022 · 1 comment

@bala-ruokavirasto

Dear INNUCA team,

Is there any documentation about the field values listed below, which appear in the combined_report.py script?

fields = ['#samples',
'number_reads_sequenced', 'number_bp_sequenced',
'min_reads_length', 'max_reads_length',
'reads_kraken_number_taxon_found', 'reads_kraken_percentage_unknown_fragments',
'reads_kraken_most_abundant_taxon', 'reads_kraken_percentage_most_abundant_taxon',
'first_coverage',
'trueCoverage_absent_genes', 'trueCoverage_multiple_alleles', 'trueCoverage_sample_coverage',
'second_Coverage',
'pear_assembled_reads', 'pear_unassembled_reads', 'pear_dicarded_reads',
'SPAdes_number_contigs', 'SPAdes_number_bp', 'SPAdes_filtered_contigs', 'SPAdes_filtered_bp',
'assembly_coverage_initial', 'assembly_coverage_filtered', 'mapped_reads_percentage',
'mapping_filtered_contigs', 'mapping_filtered_bp',
'Pilon_changes', 'Pilon_contigs_changed', 'Pilon_contigs', 'Pilon_bp',
'MLST_scheme', 'MLST_ST',
'assembly_kraken_number_taxon_found', 'assembly_kraken_percentage_unknown_fragments',
'assembly_kraken_most_abundant_taxon', 'assembly_kraken_percentage_most_abundant_taxon',
'insert_size_mean', 'insert_size_sd',
'final_assembly']

I would like some basic information about these field values. Most of them are straightforward, but I would like to confirm that they mean what I think they mean, if you have documentation for them.

Thanks in advance,

Best Regards,
Bala

@ramirma
Member

ramirma commented Feb 1, 2022

Dear Bala,

I am afraid we never got around to creating proper documentation with a detailed description of each of those items. The ones that seem less straightforward to me are:
'first_coverage': total number of bp in the output divided by the provided genome size.
'trueCoverage_absent_genes': if the species is one of those for which chewBBACA has a set of reference genes (expected to be present in all isolates), this is the number of missing genes in that set.
'trueCoverage_multiple_alleles': if the species is one of those for which chewBBACA has a set of reference genes (expected to be present in all isolates), this is the number of possible alleles present in the genes of that set (this may suggest intra-species contamination, i.e. multiple strains of the same species in the sample).
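
As a rough illustration of the 'first_coverage' definition above (total bp divided by the provided genome size), here is a minimal sketch. The function name, the parameter names, and the example numbers are hypothetical and are not taken from INNUca's code. It also assumes both values are expressed in bp, which may not match the units the pipeline expects on its command line.

```python
def estimated_coverage(total_bp: int, genome_size_bp: int) -> float:
    """Return the sequenced bases divided by the expected genome size.

    Hypothetical helper for illustration only; both arguments are
    assumed to be in base pairs.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return total_bp / genome_size_bp


# Made-up example: 210 Mbp of sequence for a 2.1 Mbp genome.
coverage = estimated_coverage(210_000_000, 2_100_000)
print(f"{coverage:.1f}x")
```

Under this reading, a sample with fewer sequenced bases for the same genome size would simply report a proportionally lower value.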

Do let us know if there is anything else we can help you with.

Best Regards,

Mario
