Commit

Merge pull request #345 from rmcolq/dev
Merge dev into master - prepare for release v0.11.0-alpha.0
leoisl authored Aug 17, 2023
2 parents 3291165 + cf1206c commit 94a4fdb
Showing 71 changed files with 1,314 additions and 2,732 deletions.
14 changes: 7 additions & 7 deletions .github/workflows/ci.yaml
@@ -1,6 +1,6 @@
name: Pandora CI

on: [ push, pull_request ]
on: [ pull_request ]

jobs:
test:
@@ -27,17 +27,17 @@ jobs:
- name: Build and test release build
run: |
mkdir build_release && cd build_release
cmake -DCMAKE_BUILD_TYPE=Release -DHUNTER_JOBS_NUMBER=4 ..
make -j4
ctest -V
cmake -DCMAKE_BUILD_TYPE=Release -DHUNTER_JOBS_NUMBER=2 ..
make -j2
# ctest -V
./pandora --help
cd ..
- name: Build and test debug build
run: |
mkdir build_debug && cd build_debug
cmake -DCMAKE_BUILD_TYPE=Debug -DHUNTER_JOBS_NUMBER=4 ..
make -j4
ctest -V
cmake -DCMAKE_BUILD_TYPE=Debug -DHUNTER_JOBS_NUMBER=2 ..
make -j2
# ctest -V
./pandora --help
cd ..
6 changes: 5 additions & 1 deletion .gitignore
@@ -104,5 +104,9 @@ pandora-linux-precompiled*
/example/out/

example/make_prg*
/scripts/measure_performance_from_log/.ipynb_checkpoints/
/build_test/
.ipynb_checkpoints
scripts/compare_pandora_results/pandora_multisample.matrix*
!test/test_cases/sample_example/pangenome.prg.fa

/debugging/
44 changes: 30 additions & 14 deletions CHANGELOG.md
@@ -7,37 +7,53 @@ this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm

## [Unreleased]

## [0.10.0-alpha.1]
## [0.11.0-alpha.0]

### Changed
This version is a major release that breaks backwards compatibility with previous versions of `pandora`.
It improves `pandora` runtime performance by 15x and RAM usage by 20x;

### Changed
- The `pandora` index changed from a set of files in a directory structure to a single, compressible and indexable `zip`
file (`pandora` indexes now have the suffix `.panidx.zip`). This is now the single file that is produced by the
`pandora index` command and is required as argument to all the other `pandora` commands. This index is self contained in
the sense that it encodes all the information and metadata about it (e.g. which PRGs were used to create it, window and
kmer size, etc). This new index provide the infrastructure for the next features and simplifies working with large
reference pangenome collections, with a few million PRGs. This new index breaks backwards compatibility with previous
`pandora` versions. The structure of this zip archive is as follows:
* `_prgs`: The PRGs themselves used as input to create this index;
* `_prg_names`: The names of the PRGs;
* `_prg_min_path_lengths`: the length of the shortest path through each PRG;
* `_prg_names`: The names of the PRGs used as input to create this index;
* `_prg_max_path_lengths`: the length of the longest path through each PRG;
* `_prg_lengths`: the length of the string representation of each PRG;
* `_minhash`: the minimizer hash data structure;
* `_metadata`: metadata about the index (first line is window size, second is kmer size);
* `_metadata`: metadata about the index;
* `*.gfa`: the several GFA files describing the minimizing kmer graph for each PRG;
* `*.fa`: the string representation of each PRG;
- Minimum C++ standard upgraded from `C++11` to `C++14`;
- We now test whether the genotype confidence of a variant is greater than or equal to the threshold provided by `--gt-conf`. Previously we only tested if it was greater than. [[#320][320]]
- We now test whether the genotype confidence of a variant is greater than or equal to the threshold provided by
`--gt-conf`. Previously we only tested if it was greater than;

### Removed
- Removed CLI parameters `-w` and `-k` from the following `pandora` subcommands: `compare`, `discover`, `map`,
- Removed CLI parameters `-w`, `-k` and `--clean` from the following `pandora` subcommands: `compare`, `discover`, `map`,
`seq2path`;
- Removed `merge_index` subcommand;
- Removed gene-DBG and noise-filtering modules;

### Fixed
- Several refactoring to the `pandora` index implementation;

- Fixed a major bug on finding the longest path through PRGs;
- Several refactorings to the `pandora` index implementation;
- Optimisation of the `pandora` index data structure;

### Added
- A memory-efficient way to load PRGs when indexing, where we don't need to load all PRGs at once to index them, but
just load on demand;
- A memory-efficient way to load PRGs when indexing and mapping, where we don't need to load all PRGs at once to process
them, but just load on demand (also known as lazy loading). This is particularly useful when working with very large
PanRGs;
- Random multimapping of reads if they map equally well to several graphs, reducing mapping bias. Added parameter
`--rng-seed` to `pandora map/compare/discover` commands to make multimapping deterministic, if required;
- A new parameter to deal with auto-updating error rate and kmer model (see `--auto-update-params` parameter in
`pandora map/compare/discover` commands);
- Three new parameters to control when a gene should be filtered out due to too low or too high coverage (see
`--min-abs-gene-coverage`, `--min-rel-gene-coverage` and `--max-rel-gene-coverage` parameters in
`pandora map/compare/discover` commands);


## [0.10.0-alpha.0]

@@ -170,8 +186,8 @@ their changes meticulously documented here.

- k-mer coverage underflow bug in `LocalPRG` [[#183][183]]

[Unreleased]: https://github.com/rmcolq/pandora/compare/0.10.0-alpha.1...HEAD
[0.10.0-alpha.1]: https://github.com/rmcolq/pandora/compare/0.10.0-alpha.1...0.10.0-alpha.0
[Unreleased]: https://github.com/rmcolq/pandora/compare/0.11.0-alpha.0...HEAD
[0.11.0-alpha.0]: https://github.com/rmcolq/pandora/compare/0.11.0-alpha.0...0.10.0-alpha.0
[0.10.0-alpha.0]: https://github.com/rmcolq/pandora/compare/0.10.0-alpha.0...0.9.2
[0.9.2]: https://github.com/rmcolq/pandora/compare/0.9.2...0.9.1
[0.9.1]: https://github.com/rmcolq/pandora/releases/tag/0.9.1
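
Note on the `--rng-seed` entry in the CHANGELOG above: the text says reads that map equally well to several graphs are now assigned at random, and that a seed makes this deterministic. The following is only a minimal sketch of seeded tie-breaking, not pandora's implementation. The function name `pick_best_graph` and the use of plain integer scores are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not part of pandora): returns the index of one
// best-scoring graph, choosing uniformly at random among ties.
// Precondition: scores is not empty.
std::size_t pick_best_graph(const std::vector<uint32_t>& scores, std::mt19937& rng)
{
    const uint32_t best = *std::max_element(scores.begin(), scores.end());

    // Collect every graph that ties for the best score.
    std::vector<std::size_t> tied;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        if (scores[i] == best) {
            tied.push_back(i);
        }
    }

    // Draw one of the tied graphs; the draw depends only on the rng state.
    std::uniform_int_distribution<std::size_t> pick(0, tied.size() - 1);
    return tied[pick(rng)];
}
```

Two generators constructed with the same seed, for example `std::mt19937 rng(42);`, produce the same sequence of choices, which is the property a fixed seed is meant to provide.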
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -12,7 +12,7 @@ HunterGate(

# project configuration
set(PROJECT_NAME_STR pandora)
project(${PROJECT_NAME_STR} VERSION "0.10.0.1" LANGUAGES C CXX)
project(${PROJECT_NAME_STR} VERSION "0.11.0" LANGUAGES C CXX)
set(ADDITIONAL_VERSION_LABELS "")
configure_file( include/version.h.in ${CMAKE_BINARY_DIR}/include/version.h )

6 changes: 3 additions & 3 deletions README.md
@@ -76,13 +76,13 @@ In this binary, all libraries are linked statically.

* **Download**:
```
wget https://github.com/rmcolq/pandora/releases/download/0.10.0-alpha.1/pandora-linux-precompiled-v0.10.0-alpha.1
wget https://github.com/rmcolq/pandora/releases/download/0.11.0-alpha.0/pandora-linux-precompiled-v0.11.0-alpha.0
```

* **Running**:
```
chmod +x pandora-linux-precompiled-v0.10.0-alpha.1
./pandora-linux-precompiled-v0.10.0-alpha.1 -h
chmod +x pandora-linux-precompiled-v0.11.0-alpha.0
./pandora-linux-precompiled-v0.11.0-alpha.0 -h
```

* **Notes**:
Binary file modified example/out_truth/prgs/pangenome.prg.bin.zip
Binary file not shown.
Binary file modified example/out_truth/prgs/pangenome.prg.fa.panidx.zip
Binary file not shown.
Binary file modified example/out_truth/prgs/pangenome.prg.gfa.zip
Binary file not shown.
Binary file modified example/out_truth/prgs/pangenome.update_DS.zip
Binary file not shown.
Binary file modified example/out_truth/updated_prgs/pangenome_updated.prg.bin.zip
Binary file not shown.
Binary file modified example/out_truth/updated_prgs/pangenome_updated.prg.fa.panidx.zip
Binary file not shown.
Binary file modified example/out_truth/updated_prgs/pangenome_updated.prg.gfa.zip
Binary file not shown.
Binary file modified example/out_truth/updated_prgs/pangenome_updated.update_DS.zip
Binary file not shown.
4 changes: 2 additions & 2 deletions example/run_pandora.sh
@@ -3,9 +3,9 @@ set -eu

########################################################################################################################
# configs
pandora_version="0.10.0-alpha.1"
pandora_version="0.11.0-alpha.0"
pandora_URL="https://github.com/rmcolq/pandora/releases/download/${pandora_version}/pandora_${pandora_version}"
make_prg_version="0.4.0"
make_prg_version="0.5.0"
make_prg_URL="https://github.com/iqbal-lab-org/make_prg/releases/download/${make_prg_version}/make_prg_${make_prg_version}"
########################################################################################################################

5 changes: 2 additions & 3 deletions include/Maths.h
@@ -28,8 +28,7 @@ class Maths {
}

template <class Iterator>
inline static typename std::iterator_traits<Iterator>::value_type mean(
Iterator begin, Iterator end)
inline static double mean(Iterator begin, Iterator end)
{
typedef
typename std::iterator_traits<Iterator>::difference_type difference_type;
@@ -40,7 +39,7 @@ class Maths {
return get_default_value<Iterator>();
}

return Maths::sum(begin, end) / number_of_elements;
return ((double)Maths::sum(begin, end)) / number_of_elements;
}

template <class Iterator>
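
Note on the `Maths::mean` change above: when the element type is integral, dividing the sum by the element count truncates the result, so returning `double` and casting the sum first keeps the fractional part. The sketch below shows the effect with two hypothetical free functions, `mean_truncating` and `mean_as_double`. They stand in for the old and new behaviour and are not the real `Maths` class, whose `sum` and `get_default_value` helpers are not shown in this hunk.

```cpp
#include <cstdint>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the old behaviour: the result has the element
// type, so integral inputs are divided with truncation.
template <class Iterator>
typename std::iterator_traits<Iterator>::value_type mean_truncating(
    Iterator begin, Iterator end)
{
    using value_type = typename std::iterator_traits<Iterator>::value_type;
    const auto n = std::distance(begin, end);
    if (n == 0) {
        return value_type(); // assumption: empty ranges yield a default value
    }
    return std::accumulate(begin, end, value_type()) / n;
}

// Hypothetical stand-in for the new behaviour: cast the sum to double before
// dividing, so the fractional part survives.
template <class Iterator>
double mean_as_double(Iterator begin, Iterator end)
{
    using value_type = typename std::iterator_traits<Iterator>::value_type;
    const auto n = std::distance(begin, end);
    if (n == 0) {
        return 0.0; // assumption: the real code returns its own default here
    }
    return static_cast<double>(std::accumulate(begin, end, value_type())) / n;
}
```

For a `std::vector<uint32_t>` holding `3` and `4`, `mean_truncating` returns `3` while `mean_as_double` returns `3.5`.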
7 changes: 5 additions & 2 deletions include/compare_main.h
@@ -19,7 +19,6 @@
#include "pangenome/pangraph.h"
#include "pangenome/pannode.h"
#include "index.h"
#include "noise_filtering.h"
#include "estimate_parameters.h"
#include "OptionsAggregator.h"
#include "CLI11.hpp"
@@ -36,12 +35,16 @@ struct CompareOptions {
fs::path vcf_refs_file;
uint8_t verbosity { 0 };
float error_rate { 0.11 };
uint32_t rng_seed { 0 };
uint32_t genome_size { 5000000 };
uint32_t max_diff { 250 };
bool output_vcf { false };
bool illumina { false };
bool clean { false };
float min_absolute_gene_coverage { 3.0 };
float min_relative_gene_coverage { 0.05 };
float max_relative_gene_coverage { 100 };
bool binomial { false };
bool do_not_auto_update_params { false };
uint32_t max_covg { 300 };
bool genotype { false };
bool local_genotype { false };
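
Note on the three coverage fields added to `CompareOptions` above: the CHANGELOG describes them as controlling when a gene is filtered out for too low or too high coverage. How pandora applies them is not visible in this diff, so the following is only a sketch under stated assumptions. It assumes the relative thresholds are multiples of some reference coverage (for example a global estimate) and that the bounds are inclusive. The names `GeneCoverageThresholds` and `keep_gene` are hypothetical; the default values are copied from the struct.

```cpp
// Hypothetical container for the three thresholds; the defaults mirror the
// CompareOptions initialisers shown in the diff.
struct GeneCoverageThresholds {
    float min_absolute { 3.0f };
    float min_relative { 0.05f };
    float max_relative { 100.0f };
};

// Hypothetical filter: returns true when the gene should be kept.
// Assumption: relative thresholds scale a reference coverage value.
inline bool keep_gene(double gene_covg, double reference_covg,
    const GeneCoverageThresholds& t = GeneCoverageThresholds())
{
    if (gene_covg < t.min_absolute) {
        return false; // too little coverage in absolute terms
    }
    if (gene_covg < t.min_relative * reference_covg) {
        return false; // too little coverage relative to the reference
    }
    if (gene_covg > t.max_relative * reference_covg) {
        return false; // suspiciously high coverage relative to the reference
    }
    return true;
}
```

Under these assumptions, a gene with coverage `2` is dropped by the absolute floor regardless of the reference value, and a gene with coverage `4000` against a reference of `30` is dropped by the relative ceiling.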
54 changes: 0 additions & 54 deletions include/de_bruijn/graph.h

This file was deleted.

27 changes: 0 additions & 27 deletions include/de_bruijn/node.h

This file was deleted.

7 changes: 5 additions & 2 deletions include/denovo_discovery/discover_main.h
@@ -10,7 +10,6 @@
#include "utils.h"
#include "index.h"
#include "pangenome/pangraph.h"
#include "noise_filtering.h"
#include "estimate_parameters.h"

namespace fs = boost::filesystem;
@@ -23,13 +22,17 @@ struct DiscoverOptions {
uint32_t threads { 1 };
uint8_t verbosity { 0 };
float error_rate { 0.11 };
uint32_t rng_seed { 0 };
uint32_t genome_size { 5000000 };
uint32_t max_diff { 250 };
bool output_kg { false };
bool illumina { false };
bool clean { false };
bool binomial { false };
bool do_not_auto_update_params { false };
uint32_t max_covg { 600 };
float min_absolute_gene_coverage { 3.0 };
float min_relative_gene_coverage { 0.05 };
float max_relative_gene_coverage { 10 };
uint32_t min_cluster_size { 10 };
uint32_t max_num_kmers_to_avg { 100 };
bool keep_extra_debugging_files { false };
2 changes: 1 addition & 1 deletion include/estimate_parameters.h
@@ -21,6 +21,6 @@ int find_prob_thresh(std::vector<uint32_t>&);

uint32_t estimate_parameters(std::shared_ptr<pangenome::Graph> pangraph,
const fs::path& outdir, const uint32_t k, float& e_rate, const uint32_t covg,
bool& bin, const uint32_t& sample_id);
bool& bin, const uint32_t& sample_id, bool do_not_auto_update_params);

#endif
2 changes: 1 addition & 1 deletion include/fastaq.h
@@ -34,7 +34,7 @@ struct Fastaq {
const uint_least16_t, const std::string header = "");

void add_entry(
const std::string&, const std::string&, const std::string header = "");
const std::string&, const std::string&, const std::string &header = "");

void clear();
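
Note on the `add_entry` signature change above: taking `header` as `const std::string&` instead of `const std::string` avoids copying the caller's string, and the `""` default still compiles because a const reference can bind to a temporary. The sketch below illustrates the difference with two hypothetical functions that are not part of `Fastaq`. Each reports whether its parameter is the very object the caller passed.

```cpp
#include <string>

// By value: `header` is a fresh copy, so it is never the caller's object.
inline bool is_callers_object_by_value(
    const std::string* original, const std::string header = "")
{
    return &header == original;
}

// By const reference: `header` aliases the caller's string, no copy is made.
// With the default argument it binds to a temporary built from "".
inline bool is_callers_object_by_ref(
    const std::string* original, const std::string& header = "")
{
    return &header == original;
}
```

Given `std::string h = "read_1";`, the by-value call `is_callers_object_by_value(&h, h)` returns `false`, while `is_callers_object_by_ref(&h, h)` returns `true`.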

2 changes: 1 addition & 1 deletion include/forward_declarations.h
@@ -11,7 +11,7 @@
struct MinimizerHit;
typedef std::shared_ptr<MinimizerHit> MinimizerHitPtr;
class MinimizerHits;
typedef std::set<MinimizerHits> MinimizerHitClusters;
class MinimizerHitClusters;
class LocalPRG;
class KmerNode;
typedef std::shared_ptr<KmerNode> KmerNodePtr;
1 change: 0 additions & 1 deletion include/globals.h
@@ -9,6 +9,5 @@ class PandoraGlobals{
};

#define INDEXING_UPPER_BOUND_DEFAULT 10000000
#define ESTIMATED_INDEX_SIZE_DEFAULT 100000

#endif // PANDORA_GLOBALS_H