
Releases: HKU-BAL/Clair3

v1.0.0

06 Mar 14:52
  1. Added the Clair3 version number to the VCF header (#141).
  2. Fixed the numpy.int issue when using newer numpy versions (#165, PR contributor @Aaron Tyler).
  3. The new version converts all IUPAC bases to 'N' in both VCF and GVCF output; use --keep_iupac_bases to keep the IUPAC bases (#153).
  4. Added the options --use_longphase_for_intermediate_phasing, --use_whatshap_for_final_output_phasing, --use_longphase_for_final_output_phasing, and --use_whatshap_for_final_output_haplotagging to disambiguate between intermediate phasing and final-output phasing with either WhatsHap or LongPhase; the old options remain usable (#164).
  5. Fixed a shell-script interpreter selection problem when running Clair3 as a host user within a Docker container (#175).
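The new phasing options in item 4 can be combined in an ordinary run. A minimal sketch, assuming placeholder input paths and model directory (not taken from the release notes), that picks LongPhase for the intermediate phasing step and WhatsHap for phasing the final output:

```shell
# Sketch: selecting phasers explicitly with the v1.0.0 options.
# All paths below are placeholders; adjust to your data and model install.
CLAIR3_V1_CMD="run_clair3.sh \
  --bam_fn=sample.bam \
  --ref_fn=ref.fa \
  --threads=8 \
  --platform=ont \
  --model_path=/opt/models/ont \
  --output=./clair3_out \
  --use_longphase_for_intermediate_phasing \
  --use_whatshap_for_final_output_phasing"
# Print the assembled command; run it directly once Clair3 is installed.
echo "${CLAIR3_V1_CMD}"
```

The two steps are independent, so you can mix phasers freely, e.g. LongPhase for both steps by swapping in --use_longphase_for_final_output_phasing.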

v0.1-r12

20 Aug 03:21
2dd8b44
  1. CRAM input is supported (#117).
  2. Bumped dependency versions to Python 3.9 (#96), TensorFlow 2.8, Samtools 1.15.1, and WhatsHap 1.4.
  3. The VCF DP tag now shows raw coverage for both pileup and full-alignment calls (before r12, sub-sampled coverage was shown for pileup calls if average DP > 144) (#128).
  4. Fixed Illumina representation unification out-of-range error in training (#110).
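The CRAM support in item 1 goes through the same --bam_fn option as BAM input. A hedged sketch with placeholder paths; note that CRAM stores reads relative to a reference, so --ref_fn must be the same reference the CRAM was created with:

```shell
# Sketch: calling variants from a CRAM file (placeholder paths).
# CRAM decoding requires the reference the file was written against,
# so --ref_fn must match the CRAM's reference.
CLAIR3_CRAM_CMD="run_clair3.sh \
  --bam_fn=sample.cram \
  --ref_fn=ref.fa \
  --threads=8 \
  --platform=hifi \
  --model_path=/opt/models/hifi \
  --output=./clair3_cram_out"
echo "${CLAIR3_CRAM_CMD}"
```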

v0.1-r11.1

13 Jun 03:28
Pre-release

Users, please ignore this pre-release. It exists only so that Zenodo could pull and archive Clair3 for the first time.

v0.1-r11

04 Apr 10:16
e8c2e50
  1. Variant calling is ~2.5x faster than v0.1-r10, tested with ONT Q20 data, with feature generation for both pileup and full-alignment calling now implemented in C (co-contributors @cjw85, @ftostevin-ont, @EpiSlim).
  2. Added the lightning-fast LongPhase as an option for phasing; enable it with --longphase_for_phasing. The new option is disabled by default to align with the default behavior of previous versions, but we recommend enabling it when calling human variants with ≥20x long reads.
  3. Added the --min_coverage and --min_mq options (#83).
  4. Added the --min_contig_size option to skip calling variants in short contigs when using a genome assembly as input.
  5. Read haplotagging after phasing and before full-alignment calling is now integrated into full-alignment calling, avoiding an intermediate BAM file.
  6. Supported the .csi BAM index for large references (#90). For more speedup details, please check the Notes on r11.
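Items 2 and 6 can be sketched together: index the BAM with a .csi index (which `samtools index -c` produces and which r11 accepts for large references) and enable LongPhase. Paths are placeholders, not from the release notes:

```shell
# Sketch: .csi index (r11, #90) plus LongPhase phasing (r11).
# samtools index -c writes a .csi index instead of the default .bai,
# which is required for references with contigs longer than ~512 Mb.
INDEX_CMD="samtools index -c sample.bam"
CLAIR3_R11_CMD="run_clair3.sh \
  --bam_fn=sample.bam \
  --ref_fn=ref.fa \
  --threads=8 \
  --platform=ont \
  --model_path=/opt/models/ont \
  --output=./clair3_out \
  --longphase_for_phasing"
# Print the two-step pipeline; run it once samtools and Clair3 are installed.
echo "${INDEX_CMD} && ${CLAIR3_R11_CMD}"
```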

The two minor patches of v0.1-r11 are included in all installation options.

v0.1-r10

13 Jan 12:43
  1. Added a new ONT Guppy5 model (r941_prom_sup_g5014); benchmarking results are linked from the release page. This sup model is also applicable to reads basecalled with the hac and fast modes. The old r941_prom_sup_g506 model, which was fine-tuned from the Guppy3/4 model, is now obsolete.

  2. Added the --var_pct_phasing option to control the percentage of top-ranked heterozygous pileup variants used for WhatsHap phasing.

v0.1-r9

01 Dec 12:05

Added the --enable_long_indel option to output indel variant calls >50bp (#64). Benchmarking results are linked from the release page.
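The flag is additive, so it can simply be appended to an existing command line. A sketch with placeholder paths:

```shell
# Sketch: appending --enable_long_indel (r9) so indel calls >50bp are
# emitted. All paths are placeholders.
BASE_CMD="run_clair3.sh --bam_fn=sample.bam --ref_fn=ref.fa \
  --threads=8 --platform=ont --model_path=/opt/models/ont --output=./out"
LONG_INDEL_CMD="${BASE_CMD} --enable_long_indel"
echo "${LONG_INDEL_CMD}"
```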

v0.1-r8

11 Nov 13:59
  1. Added the --enable_phasing option, which adds a step after Clair3 calling to output variants phased by WhatsHap (#63).
  2. Fixed unexpected program termination on successful runs.
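The option in item 1 is a single extra flag on a normal run; the phased VCF is produced as an additional output after calling finishes. A sketch, with placeholder paths:

```shell
# Sketch: --enable_phasing (r8) runs WhatsHap on the called variants as a
# post-processing step. Paths are placeholders.
PHASED_CMD="run_clair3.sh --bam_fn=sample.bam --ref_fn=ref.fa \
  --threads=8 --platform=ont --model_path=/opt/models/ont \
  --output=./out --enable_phasing"
echo "${PHASED_CMD}"
```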

v0.1-r7

19 Oct 09:10
6cd8994
  1. Increased var_pct_full in ONT mode from 0.3 to 0.7. The indel F1-score increased by ~0.2%, but calling a ~50x ONT dataset took ~30 minutes longer.
  2. Expanded fall-through to the next most likely variant when the network prediction has insufficient read coverage (#53, commit 09a7d18, contributor @ftostevin-ont); accuracy improved on complex indels.
  3. Streamlined the pileup and full-alignment training workflows, reducing disk-space demand in model training (#55, commit 09a7d18, contributor @ftostevin-ont).
  4. Added a mini_epochs option to Train.py; performance improved slightly when training a model for ONT Q20 data using mini-epochs (#60, contributor @ftostevin-ont).
  5. Massively reduced disk-space demand when outputting GVCF. GVCF intermediate files are now compressed with lz4, making them five times smaller with little speed penalty.
  6. Added --remove_intermediate_dir to remove intermediate files as soon as they are no longer needed (#48).
  7. Renamed ONT pre-trained models with Medaka's naming convention.
  8. Fixed training data spilling over to validation data (#57).

v0.1-r6

04 Sep 13:47
ab47f45
  1. Reduced the memory footprint at the SortVcf stage (#45).
  2. Reduced the ulimit -n (number of files simultaneously opened) requirement (#45, #47).
  3. Added a Clair3-Illumina package to bioconda (#42).
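Item 2 concerns the per-process open-file limit, which you can inspect before a run. A sketch for checking it; the 4096 threshold below is illustrative, not a documented Clair3 requirement:

```shell
# Check the open-file limit that the ulimit -n requirement (r6) refers to.
# Clair3 opens many intermediate files in parallel; r6 reduced how many
# it needs open at once.
OPEN_FILE_LIMIT=$(ulimit -n)
echo "current open-file limit: ${OPEN_FILE_LIMIT}"
# 4096 is an illustrative threshold, not an official minimum.
if [ "${OPEN_FILE_LIMIT}" != "unlimited" ] && [ "${OPEN_FILE_LIMIT}" -lt 4096 ]; then
  echo "consider raising the limit for this session: ulimit -n 4096"
fi
```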

v0.1-r5

19 Jul 15:11
  1. Modified the data generator in model training to avoid memory exhaustion and unexpected segmentation faults in TensorFlow (contributor @ftostevin-ont).
  2. Simplified the Dockerfile workflow to reuse container caching (contributor @amblina).
  3. Fixed ALT output for reference calls (contributor @wdecoster).
  4. Fixed a bug in multi-allelic AF computation (AF of [ACGT]Del variants was wrong before r5).
  5. Added AD tag to the GVCF output.
  6. Added the --call_snp_only option to call SNPs only (#40).
  7. Added pileup and full-alignment output validity checks to avoid workflow crashes (#32, #38).
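Item 6's flag restricts the output to SNPs. A sketch with placeholder paths, using the Illumina platform as an arbitrary example:

```shell
# Sketch: restricting output to SNPs with --call_snp_only (r5).
# Paths are placeholders; run the printed command once Clair3 is installed.
SNP_ONLY_CMD="run_clair3.sh --bam_fn=sample.bam --ref_fn=ref.fa \
  --threads=8 --platform=ilmn --model_path=/opt/models/ilmn \
  --output=./out --call_snp_only"
echo "${SNP_ONLY_CMD}"
```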