Merge pull request #27 from jbloomlab/26-miscellaneous_plates
add `miscellaneous_plates` to count barcodes for additional plates
ckikawa authored Feb 15, 2024
2 parents c375109 + 0317b5a commit 2e80ff8
Showing 58 changed files with 5,186 additions and 346 deletions.
7 changes: 5 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,10 @@
# CHANGELOG

### version 2.0.1
- Update to `dms_variants` 1.5.0 (addresses [this issue](https://github.com/jbloomlab/seqneut-pipeline/issues/24))
### version 2.1.0
- Add an option to specify `miscellaneous_plates` which are plates that just have their barcodes counted (addresses [this issue](https://github.com/jbloomlab/seqneut-pipeline/issues/26)).

#### version 2.0.1
- Update to `dms_variants` 1.5.0 (addresses [this issue](https://github.com/jbloomlab/seqneut-pipeline/issues/24)).

## version 2.0.0
Full re-write that changes how configuration is specified to automatically do the QC, and uses a newer version of `neutcurve` that fits better. Completely backward-incompatible with version 1.*.
33 changes: 33 additions & 0 deletions README.md
@@ -423,6 +423,39 @@ sera_override_defaults:
The above means that for serum `M099d30` we override the `default_serum_qc_thresholds` to exclude virus ` A/Belgium/H0017/2022`, and for serum `Y044d30` we override the defaults to allow a greater fold-change from median for individual replicates, and compute the titer as `nt50`.
Anything not listed here gets handled by the defaults in `default_serum_titer_as` and `default_serum_qc_thresholds`.

### miscellaneous_plates
This is an optional key that can be used to specify plates for which you just want to count barcodes, and then analyze those counts outside the main pipeline.
This might be useful for library pooling or QC, for instance, or if you want to look at some failed plates that you don't actually want to fit curves for.

If you do not want to specify any miscellaneous plates, either leave this key out or set it to an empty dictionary (`{}`).

The key should look like this:

```
miscellaneous_plates:
  <plate_name_1>:
    date: <date>
    viral_library: <viral library>
    neut_standard_set: <standard set>
    samples_csv: <filename>
  <plate_name_2>:
    ...
```

The plate name is just the name assigned to the plate.
The `date`, `viral_library`, and `neut_standard_set` keys have the same meaning as for the plates specified under `plates`.

The `samples_csv` should specify the samples to analyze in a CSV that has columns named "well" and "fastq", and optionally other columns as well.
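
As a rough illustration of the `samples_csv` requirements, the sketch below checks for the "well" and "fastq" columns and for unique wells, the same conditions that `process_miscellaneous_plates` in `funcs.smk` enforces. It is not the pipeline's code: it uses the standard-library `csv` module rather than `pandas`, and the function name `read_samples_csv` and the FASTQ file names are hypothetical.

```python
import csv
import io


def read_samples_csv(handle):
    """Return a well -> fastq dict from an open samples CSV (illustrative helper)."""
    reader = csv.DictReader(handle)
    columns = set(reader.fieldnames or [])
    # Both columns are required; any additional columns are simply ignored here.
    if not {"well", "fastq"}.issubset(columns):
        raise ValueError("samples CSV lacks columns 'well', 'fastq'")
    wells = {}
    for row in reader:
        # Each well may be listed only once per plate.
        if row["well"] in wells:
            raise ValueError(f"non-unique entry {row['well']!r} in 'well' column")
        wells[row["well"]] = row["fastq"]
    return wells


# Hypothetical two-well CSV with one optional extra column.
example_csv = io.StringIO(
    "well,fastq,note\n"
    "B1,fastqs/example_B1.fastq.gz,no serum\n"
    "B2,fastqs/example_B2.fastq.gz,serum\n"
)
wells = read_samples_csv(example_csv)
print(wells)
```

Running a check like this before launching the pipeline can surface a malformed CSV early, although the pipeline performs its own validation when it parses the configuration.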

For each plate, the following files are created:

- `results/miscellaneous_plates/<plate_name>/<well>_counts.csv`: counts of each viral barcode in that well of that plate.
- `results/miscellaneous_plates/<plate_name>/<well>_invalid.csv`: counts of each invalid barcode in that well of that plate.
- `results/miscellaneous_plates/<plate_name>/<well>_fates.csv`: summary of the number of reads that are valid, and of each type of invalid read, in that well of that plate.
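
To make the naming scheme concrete, here is a minimal sketch that lists the files expected for one plate. The helper name `expected_outputs` is hypothetical; the path pattern and the three suffixes are the ones listed above, and the plate name and wells are taken from the test example.

```python
def expected_outputs(plate_name, wells):
    """List the three per-well output files expected for one miscellaneous plate."""
    suffixes = ["counts.csv", "invalid.csv", "fates.csv"]
    return [
        f"results/miscellaneous_plates/{plate_name}/{well}_{suffix}"
        for well in wells
        for suffix in suffixes
    ]


# Two wells of the test-example plate give six files in total.
paths = expected_outputs("random_plate_1", ["B1", "B2"])
for path in paths:
    print(path)
```

Each well therefore contributes exactly three files, so a plate with 36 wells, as in the test example, would be expected to yield 108 files.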


## Results of running the pipeline
The results of running the pipeline are put in the `./results/` subdirectory of your main repo.
We recommend using the `.gitignore` file in [./test_example/.gitignore] in your main repo to only track key results in your GitHub repo.
80 changes: 60 additions & 20 deletions docs/M099d0_titers.html
74 changes: 57 additions & 17 deletions docs/M099d30_titers.html
74 changes: 57 additions & 17 deletions docs/Y044d30_titers.html
80 changes: 60 additions & 20 deletions docs/Y154d182_titers.html
90 changes: 62 additions & 28 deletions docs/aggregate_qc_drops.html
264 changes: 165 additions & 99 deletions docs/process_plate11.html
272 changes: 177 additions & 95 deletions docs/process_plate2.html
2 changes: 1 addition & 1 deletion docs/titers.html

Large diffs are not rendered by default.

23 changes: 23 additions & 0 deletions funcs.smk
@@ -9,6 +9,29 @@ import functools
import os


def process_miscellaneous_plates(misc_plates_d):
    """Process the dictionary of miscellaneous_plates."""
    misc_plates = {}
    req_keys = {"viral_library", "neut_standard_set", "samples_csv"}
    for plate, plate_dict in misc_plates_d.items():
        misc_plates[plate] = {}
        if not req_keys.issubset(plate_dict):
            raise ValueError(f"miscellaneous_plate {plate} lacks {req_keys=}")
        misc_plates[plate]["viral_library"] = plate_dict["viral_library"]
        misc_plates[plate]["neut_standard_set"] = plate_dict["neut_standard_set"]
        samples = pd.read_csv(plate_dict["samples_csv"])
        if not {"well", "fastq"}.issubset(samples.columns):
            raise ValueError(
                f"{plate_dict['samples_csv']} lacks columns 'well', 'fastq'"
            )
        if len(samples) != samples["well"].nunique():
            raise ValueError(
                f"{plate_dict['samples_csv']} has non-unique entries in 'well' column"
            )
        misc_plates[plate]["wells"] = samples.set_index("well")["fastq"].to_dict()
    return misc_plates


def process_plate(plate, plate_params):
    """Process a plot from the configuration."""

35 changes: 35 additions & 0 deletions seqneut-pipeline.smk
@@ -43,6 +43,11 @@ samples = pd.concat(
assert samples["sample"].nunique() == samples["fastq"].nunique() == len(samples)
samples = samples.set_index("sample").to_dict(orient="index")

if "miscellaneous_plates" in config:
    miscellaneous_plates = process_miscellaneous_plates(config["miscellaneous_plates"])
else:
    miscellaneous_plates = {}


# --- Snakemake rules -------------------------------------------------------------------

@@ -247,10 +252,40 @@ rule build_docs:
        "scripts/build_docs.py"


rule miscellaneous_plate_count_barcodes:
    """Count barcodes for a well in a miscellaneous plate."""
    input:
        fastq=lambda wc: miscellaneous_plates[wc.misc_plate]["wells"][wc.well],
        viral_library=lambda wc: viral_libraries[
            miscellaneous_plates[wc.misc_plate]["viral_library"]
        ],
        neut_standard_set=lambda wc: neut_standard_sets[
            miscellaneous_plates[wc.misc_plate]["neut_standard_set"]
        ],
    output:
        counts="results/miscellaneous_plates/{misc_plate}/{well}_counts.csv",
        invalid="results/miscellaneous_plates/{misc_plate}/{well}_invalid.csv",
        fates="results/miscellaneous_plates/{misc_plate}/{well}_fates.csv",
    params:
        illumina_barcode_parser_params=config["illumina_barcode_parser_params"],
    conda:
        "envs/count_barcodes.yml"
    log:
        "results/logs/miscellaneous_plate_count_barcodes_{misc_plate}_{well}.txt",
    script:
        "scripts/count_barcodes.py"


seqneut_pipeline_outputs = [
    rules.aggregate_titers.output.titers,
    rules.aggregate_titers.output.pickle,
    rules.aggregate_qc_drops.output.plate_qc_drops,
    rules.aggregate_qc_drops.output.sera_qc_drops,
    rules.build_docs.output.docs,
    *[
        f"results/miscellaneous_plates/{plate}/{well}_{suffix}"
        for plate in miscellaneous_plates
        for well in miscellaneous_plates[plate]["wells"]
        for suffix in ["counts.csv", "invalid.csv", "fates.csv"]
    ],
]
4 changes: 4 additions & 0 deletions test_example/.gitignore
@@ -33,3 +33,7 @@ results/aggregated_titers/*
results/qc_drops/*
!results/qc_drops/plate_qc_drops.yml
!results/qc_drops/sera_qc_drops.yml

!results/miscellaneous_plates
results/miscellaneous_plates/*/*
!results/miscellaneous_plates/*/*counts.csv
7 changes: 7 additions & 0 deletions test_example/config.yml
@@ -114,3 +114,10 @@ sera_override_defaults:
<<: *default_serum_qc_thresholds
max_fold_change_from_median: 4
titer_as: nt50

miscellaneous_plates:
  random_plate_1:
    date: 2023-08-01
    viral_library: pdmH1N1_lib2023_loes
    neut_standard_set: loes2023
    samples_csv: data/miscellaneous_plates/random_plate_1.csv
37 changes: 37 additions & 0 deletions test_example/data/miscellaneous_plates/random_plate_1.csv
@@ -0,0 +1,37 @@
well,fastq
B1,fastqs/Plate2_Noserum2_S98_R1_001.fastq.gz
B2,fastqs/M099_d30_conc1_S106_R1_001.fastq.gz
B3,fastqs/M099_d30_conc2_S114_R1_001.fastq.gz
B4,fastqs/M099_d30_conc3_S122_R1_001.fastq.gz
B5,fastqs/M099_d30_conc4_S130_R1_001.fastq.gz
B6,fastqs/M099_d30_conc5_S138_R1_001.fastq.gz
B7,fastqs/M099_d30_conc6_S146_R1_001.fastq.gz
B8,fastqs/M099_d30_conc7_S154_R1_001.fastq.gz
B9,fastqs/M099_d30_conc8_S162_R1_001.fastq.gz
B10,fastqs/M099_d30_conc9_S170_R1_001.fastq.gz
B11,fastqs/M099_d30_conc10_S178_R1_001.fastq.gz
B12,fastqs/Plate2_Noserum10_S186_R1_001.fastq.gz
C1,fastqs/Plate2_Noserum3_S99_R1_001.fastq.gz
C2,fastqs/M099_d0_conc1_S107_R1_001.fastq.gz
C3,fastqs/M099_d0_conc2_S115_R1_001.fastq.gz
C4,fastqs/M099_d0_conc3_S123_R1_001.fastq.gz
C5,fastqs/M099_d0_conc4_S131_R1_001.fastq.gz
C6,fastqs/M099_d0_conc5_S139_R1_001.fastq.gz
C7,fastqs/M099_d0_conc6_S147_R1_001.fastq.gz
C8,fastqs/M099_d0_conc7_S155_R1_001.fastq.gz
C9,fastqs/M099_d0_conc8_S163_R1_001.fastq.gz
C10,fastqs/M099_d0_conc9_S171_R1_001.fastq.gz
C11,fastqs/M099_d0_conc10_S179_R1_001.fastq.gz
C12,fastqs/Plate2_Noserum11_S187_R1_001.fastq.gz
D1,fastqs/Plate2_Noserum4_S100_R1_001.fastq.gz
D2,fastqs/Y154_d182_conc1_S108_R1_001.fastq.gz
D3,fastqs/Y154_d182_conc2_S116_R1_001.fastq.gz
D4,fastqs/Y154_d182_conc3_S124_R1_001.fastq.gz
D5,fastqs/Y154_d182_conc4_S132_R1_001.fastq.gz
D6,fastqs/Y154_d182_conc5_S140_R1_001.fastq.gz
D7,fastqs/Y154_d182_conc6_S148_R1_001.fastq.gz
D8,fastqs/Y154_d182_conc7_S156_R1_001.fastq.gz
D9,fastqs/Y154_d182_conc8_S164_R1_001.fastq.gz
D10,fastqs/Y154_d182_conc9_S172_R1_001.fastq.gz
D11,fastqs/Y154_d182_conc10_S180_R1_001.fastq.gz
D12,fastqs/Plate2_Noserum12_S188_R1_001.fastq.gz
22 changes: 11 additions & 11 deletions test_example/results/aggregated_titers/titers.csv
@@ -2,7 +2,7 @@ serum,virus,titer,titer_bound,titer_sem,n_replicates,titer_as
M099d0,A/Bangladesh/2221/2021,217,interpolated,72.28,6,midpoint
M099d0,A/Bangladesh/3210810034/2021,198.1,interpolated,88.9,4,midpoint
M099d0,A/Bangladesh/8002/2021,269.4,interpolated,95.3,6,midpoint
M099d0,A/Bangladesh/8036/2021,296.7,interpolated,59.05,6,midpoint
M099d0,A/Bangladesh/8036/2021,292.7,interpolated,59.07,6,midpoint
M099d0,A/Belgium/H0017/2022,615,interpolated,31.96,6,midpoint
M099d0,A/Belgium/H0038/2022,863.8,interpolated,185.5,6,midpoint
M099d0,A/Brisbane/02/2018,636.5,interpolated,117.3,7,midpoint
@@ -19,18 +19,18 @@ M099d0,A/India-Pune-Nivcov2221170/2022,190.9,interpolated,11.18,4,midpoint
M099d0,A/India/Pun-NIV312851/2021,217.1,interpolated,57.87,6,midpoint
M099d0,A/Michigan/19/2021,369.9,interpolated,103.9,6,midpoint
M099d0,A/Michigan/45/2015,864.2,interpolated,110.5,8,midpoint
M099d0,A/Newcastle/2/2022,295.2,interpolated,58.12,6,midpoint
M099d0,A/Newcastle/2/2022,290.7,interpolated,58.56,6,midpoint
M099d0,A/Niger/10217/2021,692.2,interpolated,73.26,6,midpoint
M099d0,A/Nimes/871/2021,925.3,interpolated,211.2,6,midpoint
M099d0,A/Paris/30353/2021,618.1,interpolated,57.68,6,midpoint
M099d0,A/Paris/30353/2021,682.7,interpolated,59.3,6,midpoint
M099d0,A/Paris/31196/2021,879,interpolated,120.1,6,midpoint
M099d0,A/Perth/1/2022,350.7,interpolated,71.92,6,midpoint
M099d0,A/SouthAfrica/R14850/2021,431.9,interpolated,62.91,6,midpoint
M099d0,A/SouthAfrica/R16462/2021,754,interpolated,173.7,6,midpoint
M099d0,A/Sydney/43/2022,207.3,interpolated,16.18,5,midpoint
M099d0,A/Togo/0274/2021,770.9,interpolated,102.7,6,midpoint
M099d0,A/Togo/0274/2021,770.9,interpolated,104.3,6,midpoint
M099d0,A/Togo/0304/2021,895.7,interpolated,125.2,6,midpoint
M099d0,A/Togo/845/2020,852.5,interpolated,156.9,6,midpoint
M099d0,A/Togo/845/2020,852.5,interpolated,159.2,6,midpoint
M099d0,A/Utah/27/2022,204.8,interpolated,42.8,6,midpoint
M099d0,A/Washington/23/2020,403.6,interpolated,78.43,6,midpoint
M099d0,A/Wisconsin/588/2019,210.8,interpolated,68.73,5,midpoint
@@ -119,22 +119,22 @@ Y154d182,A/Ghana/2080/2020,2655,interpolated,306.7,3,midpoint
Y154d182,A/Hawaii/70/2019,3738,interpolated,509.2,4,midpoint
Y154d182,A/India-PUN-NIV328484/2021,273,interpolated,52.05,3,midpoint
Y154d182,A/India-Pune-Nivcov2221170/2022,175.3,interpolated,2.339,2,midpoint
Y154d182,A/India/Pun-NIV312851/2021,227.1,interpolated,73.54,3,midpoint
Y154d182,A/Michigan/19/2021,209.6,interpolated,68.03,3,midpoint
Y154d182,A/India/Pun-NIV312851/2021,235.1,interpolated,72.44,3,midpoint
Y154d182,A/Michigan/19/2021,209.6,interpolated,68.42,3,midpoint
Y154d182,A/Michigan/45/2015,6241,interpolated,2602,4,midpoint
Y154d182,A/Newcastle/2/2022,215.5,interpolated,37.81,3,midpoint
Y154d182,A/Niger/10217/2021,2855,interpolated,479,3,midpoint
Y154d182,A/Nimes/871/2021,3396,interpolated,590.4,3,midpoint
Y154d182,A/Norway/25089/2022,206.1,interpolated,61.5,3,midpoint
Y154d182,A/Norway/25089/2022,197.6,interpolated,62.76,3,midpoint
Y154d182,A/Paris/30353/2021,1877,interpolated,196.6,3,midpoint
Y154d182,A/Paris/31196/2021,4817,interpolated,1565,3,midpoint
Y154d182,A/Perth/1/2022,277.1,interpolated,32.92,3,midpoint
Y154d182,A/Perth/1/2022,272.6,interpolated,23.99,3,midpoint
Y154d182,A/SouthAfrica/R14850/2021,199.1,interpolated,4.83,3,midpoint
Y154d182,A/SouthAfrica/R16462/2021,3079,interpolated,93.92,3,midpoint
Y154d182,A/Sydney/43/2022,228.7,interpolated,40.81,3,midpoint
Y154d182,A/Sydney/43/2022,228.7,interpolated,40.82,3,midpoint
Y154d182,A/Togo/0274/2021,3465,interpolated,553.3,3,midpoint
Y154d182,A/Togo/0304/2021,3515,interpolated,458.7,3,midpoint
Y154d182,A/Togo/845/2020,1788,interpolated,284.9,3,midpoint
Y154d182,A/Utah/27/2022,227.5,interpolated,15.43,3,midpoint
Y154d182,A/Utah/27/2022,257.4,interpolated,17.18,3,midpoint
Y154d182,A/Washington/23/2020,623.5,interpolated,17.6,3,midpoint
Y154d182,A/Wisconsin/588/2019,1094,interpolated,262.2,3,midpoint
@@ -0,0 +1,121 @@
barcode,count
CGTTTAAACAATGAAG,1430
AGTGTCCCTAAGAGGC,951
TGAGGATAATCACAAG,737
CTGCACGAGAGACTTC,730
AACTATAGATCTAGAA,603
AAATAAGTACGCAAAT,601
GAAGAATGGTTTTCTG,593
GATCCGTACTTTGATT,557
GTTTGACAATCACTAC,545
TACGGACATTCTTAAC,540
TAATAAGCCAGCAAGA,536
TAAGCCATAAATCAAT,522
ATACCTCAACCTTGAA,509
GTCCGTTGATAAAGAG,498
ACAAAGTCTCGAGAAG,494
CGCCTAATGTTAATAA,484
ACAAAAGTACCTCTAC,475
TATATTAGTAACATAA,469
CCAGTTCCCTTCGATG,466
AGGTCAAGACCACAGG,460
AGCAGCCTGAAAATAT,455
GCAACGCCAAATAATT,448
GAAGAAACTATAACCA,442
GCATGGATCCTTTACT,434
TTGGGGAAATATATAA,427
CACCTAGGATCGCACT,411
CCTTTCTCAAAACATA,408
TCATATAAAGAAAAGG,406
GTAACATTATACGATT,404
ACAATGTGACTCACCC,393
TTGGGCACTAAATTAA,388
CCTCAAAATAACAAGC,381
AAACTAAAAGCAAGGG,378
GATCTAATAATACGGC,366
AGAGAAAAAACAGTGA,363
CAGCAAAAGCATCACC,359
GAACGATTGTAATTTT,350
GGATAAGAAAACTACT,344
CATCAACCGCCATTTC,343
CATGTGAATTCGCCCA,341
AACGTTAACAAATGAA,335
TCGAGAACACCCATAA,335
AGCCCCGTGAGAAGCA,331
GATAGAAATACCAGGA,331
AAGAGAAATATTCGCT,330
GACTATGGTCTAAAAA,328
ACATAAGAACCCTATA,322
ACCTTACTAAATCCTG,320
GTAAAGCAAATCCATT,320
GTTAAACGATCTATAG,314
GACACAGAACCCATGC,306
GTAGTGCATCATTGAT,302
TAGCAGATGTATCAAT,300
TGGAATCGTCACCGAT,300
ACGACATGATCAAACG,297
CGTAATACATTTAAGA,290
AACTCCGCAGACACTG,284
CAAATATATCTTCATG,284
GTACAAACCTGCAAAT,275
ATCCGATTTAAAGGCA,274
TACTCAACAAGATAAA,256
GACCAACTGTGGTACA,251
ACATGAATTCAGACGG,250
CTAGCAGATTGTATAA,248
GTTCCTTTAAGCCAAA,248
TTCACTAAGATTTCAT,248
ACATTTCCCTCGATAT,245
GTAGCTATAACTAATA,244
TTGAAAAAATCATAAA,241
ACGGGGCCCAGGTAAT,238
AAAAAACGCATGTAGA,236
GAGCTCTAAAGCAACA,235
TTGTCCCGAGACAACA,235
TACCGTATAATTAAAA,234
ACAGCAAATACTCACA,233
TAATGAGCTTTATGGT,230
TCTGTTCCGGCCCGAA,229
AAATCTACCGCATTAT,228
GACAGCAATGCATACA,226
CACCATTAAAGAGGTA,222
AGTAAACATGCATTGG,219
AACTCTAAAGATATAA,218
CAATTAGAAATACATA,214
TTATGATCTAAACAGA,214
AGAGATAATAACAAAA,211
GCCGGAGGGCATTTTC,208
TCCTTGTAATTCAACT,206
GAACTGGCGTCAATCA,204
GCCCAAGTAGGTGCAA,204
CAAGAACCCAAATATA,197
AATGAAAGTTAGCATT,195
GAGACGTACGAAATTA,194
ATAGAATCGCAAATTA,193
CCCTATGCTGCGTATT,190
TAGTATAATAGAGCAG,186
GACTCAATAATCACAC,183
AAAAAATTTATGACAA,178
TAATTACATTGCGGTT,173
AACCACCGAGTGACCG,166
CTATTAATCATGCAAA,160
CGGCGTATCGTTCACA,157
GGTCCATCTCAGATCG,157
TACCCTGCAAGCCACT,150
ATGTCCATAAAAAATA,146
TGGAAAAGATGTAATA,138
TTACCGTCTACGCATA,138
AACGAATGAATTTCTT,123
TAAATAACTCGTATTT,118
TCTGCTAAACTAAGTA,111
TTATCTGTAGAGCGCT,111
CGGATAAAAATGATAT,109
AACGACAAACAGTAAG,88
AATAAGTATACGGGAT,85
AGTCCTATCCTCAAAT,84
GCAATCCCGCAATTTG,73
GCTGCGCCTAACATAA,73
ACGGAATCCCCTGAGA,53
CAGTTCTGCGACCAGC,48
CTTTAAATTATAGTCT,35
CATACAGAGTTTGTTG,18