fasta_reference

Dave Lawrence edited this page Nov 21, 2022 · 1 revision

Indexed fasta files

For fast random access, we need indexed FASTA files. If the files are compressed, they must be compressed with bgzip (BGZF) rather than plain gzip.

If your files are gzipped, you will see the error:

[E::fai_build3_core] Cannot index files compressed with gzip, please use bgzip
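
To tell ahead of time whether an existing file is plain gzip or BGZF, you can inspect its header bytes. The sketch below is a heuristic and not part of samtools: the function name `is_bgzf` is made up for illustration, and it assumes the BGZF "BC" extra subfield comes first in the gzip header, which is how bgzip normally writes it.

```shell
# Hypothetical helper: succeeds (exit 0) if the file starts with a BGZF block header.
# BGZF is gzip with FLG.FEXTRA set (bytes 1f 8b 08 04) and a "BC" (42 43) extra
# subfield at bytes 12-13. Assumption: "BC" is the first extra subfield.
is_bgzf() {
    # Read the first 14 bytes as one continuous lowercase hex string
    header=$(head -c 14 "$1" | od -An -tx1 | tr -d ' \t\n')
    case "$header" in
        1f8b0804????????????????4243) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_bgzf my_genome.fna.gz || echo "plain gzip, recompress with bgzip"` (the file name here is a placeholder). A plain gzip file can be converted with `gzip -dc in.gz | bgzip > out.gz`.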

Download

Pick your FASTA from the NCBI human genome assemblies.

You can download and bgzip it in one step:

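# Assembly to fetch (GRCh38.p14 from RefSeq)
FASTA_VERSION=GCF_000001405.40_GRCh38.p14
FASTA_FILE="${FASTA_VERSION}_genomic.fna.gz"
# Stream the download, decompress the gzip, and recompress as BGZF
wget --quiet -O - "https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/all_assembly_versions/${FASTA_VERSION}/${FASTA_FILE}" | gzip -d | bgzip > "${FASTA_FILE}"
# Build the index for random access
samtools faidx "${FASTA_FILE}"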
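
Once indexing finishes, samtools writes `.fai` and `.gzi` files next to the FASTA, and regions can be fetched without reading the whole file. A minimal sketch follows; the wrapper name `fetch_region` is hypothetical and only there to show the call shape.

```shell
# Hypothetical wrapper: print the sequence for one region of an indexed FASTA.
# $1 = bgzipped, faidx-indexed FASTA
# $2 = region as NAME:START-END (1-based, inclusive)
fetch_region() {
    samtools faidx "$1" "$2"
}
```

For example, `fetch_region "${FASTA_FILE}" NC_000001.11:1000000-1000100`. Note that RefSeq FASTA files name sequences by accession (`NC_000001.11` is chromosome 1 in GRCh38), not `chr1`.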