- add the `--fast` option for `clust-mst` to use the more efficient Kssd sketch strategy when computing the all-vs-all genome distances.
  - The `--fast` option can be combined with the `--append`, `--presketched`, and `--premsted` options.
  - The `--drlevel` option sets the dimension reduction level for Kssd sketches. The default value is 3, which corresponds to a dimension reduction of $1 / 2^{4 \times 3} = 1/4096$.
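The default reduction quoted for `--drlevel` can be checked with a short calculation. The sketch below assumes the relation generalizes to $1 / 2^{4 \cdot \text{drlevel}}$ for other levels. That is inferred from the default case, not stated in this changelog. The function name is hypothetical and is not part of RabbitTClust.

```python
from fractions import Fraction


def kssd_reduction_fraction(drlevel: int = 3) -> Fraction:
    """Return the assumed sketch fraction 1 / 2^(4 * drlevel).

    Assumption: the formula is generalized from the documented default
    (drlevel = 3 gives 1/4096); other levels are an extrapolation.
    """
    if drlevel < 0:
        raise ValueError("drlevel must be non-negative")
    return Fraction(1, 2 ** (4 * drlevel))


default_fraction = kssd_reduction_fraction()  # documented default level 3
print(default_fraction)
```

Using `Fraction` keeps the result exact, so the default level can be compared directly against `1/4096`.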
- add the `--newick-tree` option to output the tree in Newick format for `clust-mst`.
- support incremental clustering via the `--append` option, used together with the `--presketched` or `--premsted` option.
Note:
- When clustering the genome set `A+B` from a pre-generated sketch `A_sketch` and an appended genome set `B`, the sketch parameters used for `A_sketch` and `B` may differ from those that would be generated for the whole genome set `A+B`. However, unless the genome lengths in `B` differ substantially from those already covered by `A_sketch`, the impact on the automatically generated parameters is minimal.
  - This is because the sketch parameters for the appended set `B`, including the $k$-mer size, sketch size, and containment compress ratio, are the same as those of the pre-generated sketch `A_sketch`. The automatic parameter generation, carried out by the `tune_parameters()` function, depends on whole-set information such as the minimum, maximum, and mean genome length. These statistics, and therefore the generated parameters, change little when the genome lengths in `B` are not substantially different.
- During clustering, the sketches are sorted in decreasing order of genome length using an unstable sort. Consequently, the relative order of sketches with identical genome lengths may vary slightly. This does not significantly affect the clustering result.
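The first note can be illustrated with the length statistics it mentions. The sketch below only computes the minimum, maximum, and mean genome length for a set `A` and for `A+B`. It does not reproduce `tune_parameters()`, whose internals are not described here. The genome lengths and the helper name are hypothetical.

```python
from statistics import mean


def length_summary(lengths):
    """Return (min, max, mean) for a collection of genome lengths."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("need at least one genome length")
    return min(lengths), max(lengths), mean(lengths)


# Hypothetical genome lengths in base pairs.
set_a = [4_600_000, 4_800_000, 5_100_000, 5_300_000]
set_b = [4_700_000, 5_000_000]  # appended genomes of similar length

summary_a = length_summary(set_a)
summary_ab = length_summary(set_a + set_b)

# Relative change of the mean length after appending B.
mean_shift = abs(summary_ab[2] - summary_a[2]) / summary_a[2]
print(summary_a, summary_ab, mean_shift)
```

In this example the minimum and maximum are unchanged and the mean moves by less than one percent, which is the situation in which parameters derived from `A` alone would remain close to those derived from `A+B`.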
- switch parameter parsing to CLI11.
- save the intermediate files (sketch and MST files) in binary format.
- remove the `-f` option for loading a pre-generated sketch or MST file; it is replaced by the `--presketched` and `--premsted` options.
  More details are available via `clust-mst --help` or `clust-greedy --help`.
- add the `-m` parameter to set the minimum genome length (minLen); genomes shorter than minLen are ignored.
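The effect of the minimum-length threshold can be sketched as a simple filter. This is an illustration of the behavior described above, not the tool's implementation. The function name and the sample data are hypothetical.

```python
def filter_by_min_length(genomes, min_len):
    """Keep genomes whose length is at least min_len.

    genomes: iterable of (name, length) pairs.
    Genomes with length < min_len are dropped, matching the description
    that genomes shorter than minLen are ignored.
    """
    return [(name, length) for name, length in genomes if length >= min_len]


# Hypothetical input: genome names with lengths in base pairs.
genomes = [("g1", 9_500), ("g2", 10_000), ("g3", 4_800_000)]
kept = filter_by_min_length(genomes, min_len=10_000)
print(kept)
```

A genome whose length equals the threshold is kept here, since only lengths strictly below minLen are described as ignored.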
- update `calSize` for gz files, which is used when automatically generating the $k$-mer size.
- Update robin-hood-hashing to the latest version to fix the compile error with `g++ 12.0+`.
- Add the `clust-greedy` module for greedy incremental clustering.
- The previous MST-based clustering module is now the `clust-mst` module.
- First version of RabbitTClust, a large-scale genome clustering tool based on sketching techniques and the Minimum Spanning Tree (MST).