Update CARD-Download-README.txt
Cateline authored Oct 25, 2024
1 parent 2d80ab5 commit b709416
Showing 1 changed file with 15 additions and 53 deletions.
68 changes: 15 additions & 53 deletions case_studies/CARD/CARD_data/CARD-Download-README.txt
@@ -1,10 +1,8 @@
# CARD README

Use or reproduction of these materials, in whole or in part, by any commercial
organization whether or not for non-commercial (including research) or commercial purposes
is prohibited, except with written permission of McMaster University. Commercial uses are
offered only pursuant to a written license and user fee. To obtain permission and begin
the licensing process, see http://card.mcmaster.ca/about.
## Source:
This dataset was downloaded from the Comprehensive Antibiotic Resistance Database (CARD) in 2024-10 at https://card.mcmaster.ca/download/0/broadstreet-v3.3.0.tar.bz2


CITATION:

@@ -14,57 +12,21 @@ prediction at the Comprehensive Antibiotic Resistance Database" Nucleic Acids Re

## CARD SHORT NAMES

A CARD-specific abbreviation for AMR gene names associated with Antibiotic Resistance
Ontology terms, often not based on the literature. This is used for programmatic and
compatibility purposes and is not ontologically relevant. Each ontology term with an
associated AMR detection model has a CARD Short Name that appears in CARD data files
and output generated by RGI. If the original gene name is less than 15 characters, the
CARD short name is identical; if the gene name is greater than 15 characters, the CARD
Short Name has been abbreviated by CARD curators specifically to identify the proper
gene or protein name. All CARD Short Names are unique and have whitespace characters
replaced by underscore characters. The convention for pathogen names is capitalized
first letter of the genus followed by the lowercase first three letters of the species
name. The antibiotic abbreviations are from https://journals.asm.org/journal/aac/abbreviations
plus some custom abbreviations by the CARD curators. Simple CARD Short Names often do not
involve either, e.g. CTX-M-15, but where applicable the CARD Short Names follow pathogen_gene
or pathogen_gene_drug. The full lists of abbreviations can be found in the enclosed files:

"shortname_antibiotics.tsv"
"shortname_pathogens.tsv"

## FASTA

Nucleotide and corresponding protein FASTA downloads are available as separate files for
each model type. For example, the "protein homolog" model type contains sequences of
antimicrobial resistance genes that do not include mutation as a determinant of resistance
- these data are appropriate for BLAST analysis of metagenomic data or searches excluding
secondary screening for resistance mutations. In contrast, the "protein variant" model
includes reference wild type sequences used for mapping SNPs conferring antimicrobial
resistance - without secondary mutation screening, analyses using these data will include
false positives for antibiotic resistant gene variants or mutants.

## MODELS
The CARD database uses standardized abbreviations, known as CARD Short Names, for AMR gene names associated with Antibiotic Resistance Ontology terms. These names are created for compatibility across data files and outputs from the Resistance Gene Identifier (RGI). Short Names for genes with 15 or fewer characters retain the original gene name, while longer names are abbreviated to uniquely represent each gene or protein. All CARD Short Names replace whitespace with underscores. For pathogen names, CARD follows the convention of capitalizing the first letter of the genus followed by the first three letters of the species in lowercase. Where applicable, CARD Short Names adopt formats such as “pathogen_gene,” “pathogen_gene_drug,” or “gene_drug.” Full lists of these abbreviations are available in the provided files:

The file "card.json" contains the complete data for all of CARD's AMR detection models,
including reference sequences, SNP mapping data, model parameters, and ARO classification.
"card.json" is used by the Resistance Gene Identifier software.
shortname_antibiotics.tsv
shortname_pathogens.tsv
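
The naming convention described above can be illustrated with a minimal Python sketch. The function names `pathogen_abbreviation` and `compose_short_name` are hypothetical and are not part of CARD or RGI. The sketch covers only the mechanical parts of the convention (genus initial plus three species letters, underscore joining, whitespace replacement). It does not shorten gene names longer than 15 characters, since those abbreviations are assigned by CARD curators, and the authoritative abbreviations remain the ones listed in the two .tsv files.

```python
from typing import Optional


def pathogen_abbreviation(binomial: str) -> str:
    """Abbreviate 'Genus species' as the capitalized genus initial followed by
    the first three letters of the species in lowercase."""
    genus, species = binomial.split()[:2]
    return genus[0].upper() + species[:3].lower()


def compose_short_name(gene: str, pathogen: Optional[str] = None,
                       drug: Optional[str] = None) -> str:
    """Join whichever parts are present in pathogen_gene_drug order and
    replace any whitespace with underscores."""
    parts = [part for part in (pathogen, gene, drug) if part]
    return "_".join("_".join(part.split()) for part in parts)
```

For example, `pathogen_abbreviation("Escherichia coli")` returns `"Ecol"`, and a simple name such as `compose_short_name("CTX-M-15")` is returned unchanged because it has no pathogen or drug part.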

Values for "High Confidence TB", "Moderate Confidence TB", "Minimal Confidence TB", and
"Indeterminate Confidence TB" were obtained from https://platform.reseqtb.org.

## INDEX FILES

The file "aro_index.tsv" contains a list of ARO tagging of GenBank accessions stored in
CARD.
## FASTA

The file "aro_categories.tsv" contains a list of ARO terms used to categorize all entries
in CARD and results via the RGI. These categories reflect AMR gene family, target drug
class, and mechanism of resistance.
The FASTA files included here contain retrieved sequences of antimicrobial resistance genes.

The file "aro_categories_index.tsv" contains a list of GenBank accessions stored
in CARD cross-referenced with the major categories within the ARO. These categories
reflect AMR gene family, target drug class, and mechanism of resistance, so GenBank
accessions may have more than one cross-reference. For more complex categorization of
the data, use the full ARO available at http://card.mcmaster.ca/download.
## Data Files Downloaded
aro_index.tsv
This file contains an index of ARO (Antibiotic Resistance Ontology) identifiers with associated GenBank accessions. Each entry includes information used to link antibiotic resistance genes to GenBank sequences.
shortname_antibiotics.tsv
Contains standardized abbreviations for antibiotics used in CARD’s short names. These abbreviations, which follow conventions from the American Society for Microbiology (ASM) and additional custom terms, provide a uniform naming system for antibiotics referenced within CARD data.

The file "snps.txt" lists the SNPs associated with specific detection models.
shortname_pathogens.tsv
Lists standardized abbreviations for pathogens used in CARD. Each abbreviation represents pathogen names in a condensed format, commonly the first letter of the genus followed by the first three letters of the species. This abbreviation system simplifies pathogen referencing in CARD outputs.
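
The tab-separated files listed above can be read with Python's standard `csv` module. The sketch below assumes that each file starts with a header row, which this README does not state, and the helper name `load_tsv` is hypothetical. No column names are assumed; they are taken from whatever header the file provides.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_tsv(path: str) -> List[Dict[str, str]]:
    """Read a tab-separated file into a list of row dictionaries.

    Assumes the first line is a header row; its fields become the keys.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

Calling `load_tsv("aro_index.tsv")` from the directory holding the downloaded files would return one dictionary per data row, so printing the keys of the first row is a quick way to check which columns a given CARD release actually contains.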
