Merge pull request #3 from ZKI-PH-ImageAnalysis/dev
v0.2.0
denisbeslic authored Aug 16, 2024
2 parents 5a7974e + 38ba194 commit 09df29f
Showing 17 changed files with 3,786 additions and 249 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -1,3 +1,9 @@
*.ckpt
*.slow5
*.blow5
*.pod5
*.tsv

# Scripts
*.sh

85 changes: 50 additions & 35 deletions README.md
@@ -12,44 +12,59 @@ Please cite the following publication if you use `seq2squiggle` in your work:

### Dependencies

`seq2squiggle` runs under Python >= 3.11 and pytorch >= 2.0.1
`seq2squiggle` requires Python >= 3.10.
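
The version requirement above can be checked before installing. The following is a minimal sketch, not part of `seq2squiggle`; the helper name `python_is_supported` is hypothetical and only the `3.10` minimum comes from the README.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the README


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version meets the minimum.

    Hypothetical helper for illustration; seq2squiggle does not ship it.
    """
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if python_is_supported():
        print("Python version OK")
    else:
        print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or newer is required")
```

Running this inside the freshly created conda environment confirms that the interpreter meets the requirement before `pip install` is attempted.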

### Installation with conda/mamba
Proceed as follows to install `seq2squiggle`
We recommend running `seq2squiggle` in a separate conda/mamba environment. This keeps the tool and its dependencies isolated from your other Python environments.

```
mamba env create -f envs/seq2squiggle.yml
mamba activate seq2squiggle
conda create -n seq2squiggle-env python=3.10
conda activate seq2squiggle-env
```

### Install with pip
```
pip install seq2squiggle
```

### Install from source
```
git clone https://github.com/ZKI-PH-ImageAnalysis/seq2squiggle.git
cd seq2squiggle
pip install .
```

### Download training data and model weights
**Not implemented yet**

When running `seq2squiggle` in prediction mode, `seq2squiggle` requires compatible pretrained model weights to make predictions. The model file can then be specified using the `--model` command-line parameter. To assist users, if no model file is specified `seq2squiggle` will try to download and use a compatible model file automatically.
`seq2squiggle` requires compatible pretrained model weights to make predictions, which can be specified using the `--model` command-line parameter.

If you do not provide a model file, `seq2squiggle` will automatically attempt to download a compatible model file to ensure predictions can be made.
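
The optional `--model` parameter can be illustrated with a small wrapper that only assembles the command line and does not run it. This is a sketch under assumptions: the function name `build_predict_command` and the file name `my_model.ckpt` are hypothetical, and only the `predict` subcommand, `-o`, and `--model` are taken from the README.

```python
import shlex


def build_predict_command(fasta, output, model=None):
    """Assemble a `seq2squiggle predict` call as an argument list.

    Hypothetical helper for illustration. `--model` is appended only when
    a path is given; according to the README, the tool otherwise tries to
    download a compatible model file on its own.
    """
    cmd = ["seq2squiggle", "predict", str(fasta), "-o", str(output)]
    if model is not None:
        cmd += ["--model", str(model)]
    return cmd


# Hypothetical file names, for illustration only.
print(shlex.join(build_predict_command("example.fasta", "example.blow5")))
print(shlex.join(
    build_predict_command("example.fasta", "example.blow5", model="my_model.ckpt")
))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once `seq2squiggle` is installed.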

## Predict signals from FASTA file
`seq2squiggle` simulates artificial signals based on an input FASTX file. By default, the output is in SLOW5/BLOW5 format. Exporting to the new POD5 format is also supported, though BLOW5 is preferred for its stability. You will need to specify the path to the model through the configuration file.

Generate 10,000 reads from a fasta file.
Running `seq2squiggle` on a GPU is recommended because it speeds up inference. The tool also works on CPU-only systems, though inference is slower.

### Examples

Generate 10,000 reads from a fasta file:
```
./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 -n 10000
seq2squiggle predict example.fasta -o example.blow5 -n 10000
```
Generate reads with a coverage of 30
Generate reads with a coverage of 30:
```
./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 -c 30
seq2squiggle predict example.fasta -o example.blow5 -c 30
```
Generate reads with a coverage of 30 and an average read length of 5,000
Generate reads with a coverage of 30 and an average read length of 5,000:
```
./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 -c 30 -r 5000
seq2squiggle predict example.fasta -o example.blow5 -c 30 -r 5000
```
Simulate signals from basecalled reads (each single read will be simulated)
Simulate signals from basecalled reads (each single read will be simulated):
```
./src/seq2squiggle/seq2squiggle.py predict example.fastq -o example.blow5 --read-input
seq2squiggle predict example.fastq -o example.blow5 --read-input
```
Export as pod5
Export as pod5:
```
./src/seq2squiggle/seq2squiggle.py predict example.fastq -o example.pod5 --read-input
seq2squiggle predict example.fastq -o example.pod5 --read-input
```
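
The relationship between the `-n`, `-c`, and `-r` options in the examples above can be made concrete with the usual coverage estimate, coverage ≈ (number of reads × mean read length) / genome length. The sketch below applies that textbook formula; it is not necessarily how `seq2squiggle` derives the read count internally, and the genome size is a made-up value.

```python
import math


def reads_for_coverage(genome_length, coverage, mean_read_length):
    """Estimate how many reads are needed to reach a target coverage.

    Uses coverage = reads * mean_read_length / genome_length, rounded up.
    Illustrative only; not taken from the seq2squiggle source code.
    """
    if genome_length <= 0 or mean_read_length <= 0 or coverage < 0:
        raise ValueError("lengths must be positive and coverage non-negative")
    return math.ceil(coverage * genome_length / mean_read_length)


# Hypothetical 4.6 Mb genome, coverage 30, mean read length 5,000.
print(reads_for_coverage(4_600_000, 30, 5_000))  # 27600
```

Under these assumptions, `-c 30 -r 5000` on a 4.6 Mb reference corresponds to roughly the same amount of data as `-n 27600 -r 5000`.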


@@ -58,35 +73,35 @@ Export as pod5
`seq2squiggle` supports different options for generating the signal data.
By default, the noise sampler and duration sampler are used.

### Examples

Generate reads using both the noise sampler and duration sampler.
Generate reads using both the noise sampler and duration sampler:
```
./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5
seq2squiggle predict example.fasta -o example.blow5
```

Generate reads using the noise sampler with an increased factor and duration sampler
Generate reads using the noise sampler with an increased factor and duration sampler:
```
./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 --noise-std 1.5
seq2squiggle predict example.fasta -o example.blow5 --noise-std 1.5
```
Generate reads using a static normal distribution for the noise and duration sampler
Generate reads using a static normal distribution for the noise and duration sampler:
```
./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 --noise-std 1.5 --noise-sampling False
seq2squiggle predict example.fasta -o example.blow5 --noise-std 1.5 --noise-sampling False
```
Generate reads using only the noise sampler and a static normal distribution for the event length
Generate reads using only the noise sampler and a static normal distribution for the event length:
```
./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 --duration-sampling False --ideal-event-length -1
seq2squiggle predict example.fasta -o example.blow5 --duration-sampling False --ideal-event-length -1
```
Generate reads using only the noise sampler and ideal event lengths
Generate reads using only the noise sampler and ideal event lengths:
```
./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 --duration-sampling False --ideal-event-length 10.0
seq2squiggle predict example.fasta -o example.blow5 --duration-sampling False --ideal-event-length 10.0
```
Generate reads using a static normal distribution for the amplitude noise and ideal event lengths
Generate reads using a static normal distribution for the amplitude noise and ideal event lengths:
```
./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 --duration-sampling False --ideal-event-length 10.0 --noise-sampling False --noise-std 1.0
seq2squiggle predict example.fasta -o example.blow5 --duration-sampling False --ideal-event-length 10.0 --noise-sampling False --noise-std 1.0
```
Generate reads using no amplitude noise and ideal event lengths
Generate reads using no amplitude noise and ideal event lengths:
```
./src/seq2squiggle/seq2squiggle.py predict example.fasta -o example.blow5 --duration-sampling False --ideal-event-length 10.0 --noise-sampling False --noise-std -1
seq2squiggle predict example.fasta -o example.blow5 --duration-sampling False --ideal-event-length 10.0 --noise-sampling False --noise-std -1
```
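
To give an intuition for what a fixed ("ideal") event length and a static normal noise distribution mean, the toy sketch below expands a list of current levels into a signal. It is not `seq2squiggle`'s model: the function `toy_squiggle` and the level values are invented for illustration, and treating a non-positive standard deviation as "no noise" merely mirrors the `--noise-std -1` convention from the examples above.

```python
import random


def toy_squiggle(levels, event_length=10, noise_std=1.0, seed=0):
    """Expand per-event current levels into a sampled signal.

    Each level is repeated `event_length` times (a fixed dwell), then
    Gaussian noise with standard deviation `noise_std` is added to every
    sample. A non-positive `noise_std` disables the noise entirely.
    Toy illustration only; not the seq2squiggle implementation.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    signal = []
    for level in levels:
        for _ in range(event_length):
            noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
            signal.append(level + noise)
    return signal


levels = [80.0, 95.5, 110.2]  # made-up values, not model output
clean = toy_squiggle(levels, event_length=10, noise_std=-1)
noisy = toy_squiggle(levels, event_length=10, noise_std=1.5)
print(len(clean), clean[0], clean[10])
print(len(noisy))
```

With three levels and an event length of 10, both signals contain 30 samples; the noise-free one is a pure step function, while the noisy one scatters around the same steps.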

## Train a new model
@@ -104,9 +119,9 @@ Additionally, we use a small script to standardize the event_noise column:

To preprocess and train a model from scratch:
```
./src/seq2squiggle/seq2squiggle.py preprocess events.tsv train_dir --max-chunks -1 --config my_config.yml
./src/seq2squiggle/seq2squiggle.py preprocess events_valid.tsv valid_dir --max-chunks -1 --config my_config.yml
./src/seq2squiggle/seq2squiggle.py train train_dir valid_dir --config my_config.yml --model last.ckpt
seq2squiggle preprocess events.tsv train_dir --max-chunks -1 --config my_config.yml
seq2squiggle preprocess events_valid.tsv valid_dir --max-chunks -1 --config my_config.yml
seq2squiggle train train_dir valid_dir --config my_config.yml --model last.ckpt
```
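
The three steps above have to run in order: preprocess the training events, preprocess the validation events, then train. A minimal sketch of scripting that order is shown below. The helper `training_commands` is hypothetical; the subcommands, flags, and file names are copied from the commands above, and nothing is executed until the lists are handed to `subprocess.run`.

```python
def training_commands(train_events, valid_events, config, checkpoint,
                      train_dir="train_dir", valid_dir="valid_dir"):
    """Return the preprocess/preprocess/train calls as argument lists.

    Hypothetical helper for illustration; it only assembles the commands
    shown in the README and does not run them.
    """
    common = ["--max-chunks", "-1", "--config", config]
    return [
        ["seq2squiggle", "preprocess", train_events, train_dir] + common,
        ["seq2squiggle", "preprocess", valid_events, valid_dir] + common,
        ["seq2squiggle", "train", train_dir, valid_dir,
         "--config", config, "--model", checkpoint],
    ]


commands = training_commands("events.tsv", "events_valid.tsv",
                             "my_config.yml", "last.ckpt")
for cmd in commands:
    print(" ".join(cmd))
```

To run the pipeline, each list can be passed to `subprocess.run(cmd, check=True)`, which stops at the first failing step.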

## Acknowledgement
131 changes: 0 additions & 131 deletions envs/seq2squiggle.yml

This file was deleted.
