Merge pull request #3 from ZKI-PH-ImageAnalysis/dev

v0.2.0
ZKI-PH-ImageAnalysis · Aug 16, 2024 · 09df29f
2 parents 5a7974e + 38ba194

Showing 17 changed files with 3,786 additions and 249 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,9 @@
+*.ckpt
+*.slow5
+*.blow5
+*.pod5
+*.tsv
+
 # Scripts
 *.sh
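
Note on the new ignore patterns: the sketch below approximates how the five added `.gitignore` patterns match file paths. It relies on the gitignore rule that a pattern without a slash is compared against the file name at any directory depth. The helper `is_ignored` is hypothetical and not part of the repository; it ignores negation, directory-only patterns, and other gitignore features.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns added to .gitignore in this commit.
NEW_PATTERNS = ["*.ckpt", "*.slow5", "*.blow5", "*.pod5", "*.tsv"]


def is_ignored(path, patterns=NEW_PATTERNS):
    """Approximate gitignore matching for slash-free glob patterns.

    Only the final path component is compared, because a gitignore
    pattern without a slash applies at every directory level.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in ["runs/last.ckpt", "example.blow5", "README.md"]:
        print(candidate, is_ignored(candidate))
```

Under these patterns, model checkpoints, signal files, and event tables stay out of version control, while source files and documentation are unaffected.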
 

diff --git a/README.md b/README.md
@@ -12,44 +12,59 @@ Please cite the following publication if you use `seq2squiggle` in your work:
 
 ### Dependencies
 
-`seq2squiggle` runs under Python >= 3.11 and pytorch >= 2.0.1
+`seq2squiggle` requires Python >= 3.10. 
 
-### Installation with conda/mamba
-Proceed as follows to install `seq2squiggle`
+We recommend running `seq2squiggle` in a separate conda/mamba environment. This keeps the tool and its dependencies isolated from your other Python environments.
 
 ```
-mamba env create -f envs/seq2squiggle.yml
-mamba activate seq2squiggle
+conda create -n seq2squiggle-env python=3.10
+conda activate seq2squiggle-env 
+```
+
+### Install with pip
+```
+pip install seq2squiggle 
+```
+
+### Install from source
+```
+git clone https://github.com/ZKI-PH-ImageAnalysis/seq2squiggle.git
+cd seq2squiggle
+pip install . 
 ```
 
 ### Download training data and model weights
-**Not implemented yet**
 
-When running `seq2squiggle` in prediction mode, `seq2squiggle` requires compatible pretrained model weights to make predictions. The model file can then be specified using the `--model` command-line parameter. To assist users, if no model file is specified `seq2squiggle` will try to download and use a compatible model file automatically.
+`seq2squiggle` requires compatible pretrained model weights to make predictions, which can be specified using the `--model` command-line parameter.
 
+If you do not provide a model file, `seq2squiggle` will automatically attempt to download a compatible model file to ensure predictions can be made. 
 
 ## Predict signals from FASTA file
 `seq2squiggle` simulates artificial signals based on an input FASTX file. By default, the output is in SLOW5/BLOW5 format. Exporting to the new POD5 format is also supported, though BLOW5 is preferred for its stability. You will need to specify the path to the model through the configuration file.
 
-Generate 10,000 reads from a fasta file.
+Running `seq2squiggle` on a GPU is recommended to speed up inference. The tool also runs on CPU-only systems, but inference is slower.
+
+### Examples 
+
+Generate 10,000 reads from a fasta file:
 ```
-./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 -n 10000
+seq2squiggle predict example.fasta -o example.blow5 -n 10000
 ```
-Generate reads with a coverage of 30
+Generate reads with a coverage of 30:
 ```
-./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 -c 30
+seq2squiggle predict example.fasta -o example.blow5 -c 30
 ```
-Generate reads with a coverage of 30 and an average read length of 5,000
+Generate reads with a coverage of 30 and an average read length of 5,000:
 ```
-./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 -c 30 -r 5000
+seq2squiggle predict example.fasta -o example.blow5 -c 30 -r 5000
 ```
-Simulate signals from basecalled reads (each single read will be simulated)
+Simulate signals from basecalled reads (each single read will be simulated):
 ```
-./src/seq2squiggle/seq2squiggle.py predict example.fastq -o example.blow5 --read-input
+seq2squiggle predict example.fastq -o example.blow5 --read-input
 ```
-Export as pod5
+Export as pod5:
 ```
-./src/seq2squiggle/seq2squiggle.py predict example.fastq -o example.pod5 --read-input
+seq2squiggle predict example.fastq -o example.pod5 --read-input
 ```
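
Note on the coverage examples: the relationship between coverage (`-c`), average read length (`-r`), and the number of reads (`-n`) can be estimated with the usual sequencing formula, reads ≈ coverage × genome length / mean read length. The sketch below is an illustration under that assumption only. The function name and the 4.6 Mb genome length are hypothetical, and the README does not confirm that `seq2squiggle` computes read counts this way.

```python
import math


def reads_for_coverage(genome_length, coverage, mean_read_length):
    """Estimate how many reads give the requested average coverage.

    Total sequenced bases = coverage * genome_length, so dividing by the
    mean read length gives the read count (rounded up).
    """
    if genome_length <= 0 or mean_read_length <= 0 or coverage < 0:
        raise ValueError("lengths must be positive and coverage non-negative")
    return math.ceil(coverage * genome_length / mean_read_length)


if __name__ == "__main__":
    # Hypothetical 4.6 Mb genome with the values from `-c 30 -r 5000`.
    print(reads_for_coverage(4_600_000, 30, 5000))  # 27600
```

An estimate like this helps when choosing between `-n` and `-c` for a given reference.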
 
 
@@ -58,35 +73,35 @@ Export as pod5
 `seq2squiggle` supports different options for generating the signal data.
 By default, the noise sampler and duration sampler are used.
 
+### Examples
 
-Generate reads using both the noise sampler and duration sampler. 
+Generate reads using both the noise sampler and duration sampler: 
 ```
-./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5
+seq2squiggle predict example.fasta -o example.blow5
 ```
-
-Generate reads using the noise sampler with an increased factor and duration sampler
+Generate reads using the noise sampler with an increased factor and duration sampler:
 ```
-./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 --noise-std 1.5
+seq2squiggle predict example.fasta -o example.blow5 --noise-std 1.5
 ```
-Generate reads using a static normal distribution for the noise and duration sampler
+Generate reads using a static normal distribution for the noise and duration sampler:
 ```
-./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 --noise-std 1.5 --noise-sampling False
+seq2squiggle predict example.fasta -o example.blow5 --noise-std 1.5 --noise-sampling False
 ```
-Generate reads using only the noise sampler and a static normal distribution for the event length 
+Generate reads using only the noise sampler and a static normal distribution for the event length:
 ```
-./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 --duration-sampling False --ideal-event-length -1
+seq2squiggle predict example.fasta -o example.blow5 --duration-sampling False --ideal-event-length -1
 ```
-Generate reads using only the noise sampler and ideal event lengths 
+Generate reads using only the noise sampler and ideal event lengths:
 ```
-./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 --duration-sampling False --ideal-event-length 10.0
+seq2squiggle predict example.fasta -o example.blow5 --duration-sampling False --ideal-event-length 10.0
 ```
-Generate reads using a static normal distribution for the amplitude noise and ideal event lengths
+Generate reads using a static normal distribution for the amplitude noise and ideal event lengths:
 ```
-./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 --duration-sampling False --ideal-event-length 10.0 --noise-sampling False --noise-std 1.0
+seq2squiggle predict example.fasta -o example.blow5 --duration-sampling False --ideal-event-length 10.0 --noise-sampling False --noise-std 1.0
 ```
-Generate reads using no amplitude noise and ideal event lengths
+Generate reads using no amplitude noise and ideal event lengths:
 ```
-./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 --duration-sampling False --ideal-event-length 10.0 --noise-sampling False --noise-std -1
+seq2squiggle predict example.fasta -o example.blow5 --duration-sampling False --ideal-event-length 10.0 --noise-sampling False --noise-std -1
 ```
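
Note on the noise and duration options: the toy sketch below illustrates the general idea behind a fixed event length combined with static Gaussian amplitude noise. It is not the `seq2squiggle` implementation. It assumes that each event level is repeated for a fixed number of samples and that zero-mean normal noise with the given standard deviation is added, with a non-positive value disabling noise. The function `toy_signal`, its parameters, and the example levels are hypothetical.

```python
import random


def toy_signal(levels, event_length=10, noise_std=1.0, seed=0):
    """Expand event levels into a sample-level signal.

    Each level is repeated `event_length` times (a fixed event length),
    and Gaussian noise with standard deviation `noise_std` is added to
    every sample. A non-positive `noise_std` turns the noise off.
    """
    rng = random.Random(seed)  # seeded for reproducible output
    signal = []
    for level in levels:
        for _ in range(event_length):
            noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
            signal.append(level + noise)
    return signal


if __name__ == "__main__":
    # Two made-up event levels, three samples each, no noise.
    print(toy_signal([80.0, 95.0], event_length=3, noise_std=-1))
```

With noise disabled the output is a clean step function. Raising `noise_std` spreads the samples around each level, which is the qualitative effect that the examples with `--noise-std 1.5` and `--noise-sampling False` describe.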
 
 ## Train a new model
@@ -104,9 +119,9 @@ Additionally, we use a small script to standardize the event_noise column:
 
 To preprocess and train a model from scratch:
 ```
-./src/seq2squiggle/seq2squiggle.py preprocess events.tsv train_dir --max-chunks -1 --config my_config.yml
-./src/seq2squiggle/seq2squiggle.py preprocess events_valid.tsv valid_dir --max-chunks -1 --config my_config.yml
-./src/seq2squiggle/seq2squiggle.py train train_dir valid_dir --config my_config.yml --model last.ckpt
+seq2squiggle preprocess events.tsv train_dir --max-chunks -1 --config my_config.yml
+seq2squiggle preprocess events_valid.tsv valid_dir --max-chunks -1 --config my_config.yml
+seq2squiggle train train_dir valid_dir --config my_config.yml --model last.ckpt
 ```
 
 ## Acknowledgement

diff --git a/envs/seq2squiggle.yml b/envs/seq2squiggle.yml
@@ -1,3 +1,9 @@
+    *.ckpt
+    *.slow5
+    *.blow5
+    *.pod5
+    *.tsv
     # Scripts
     *.sh