-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Update #20 kmeans and automated elbow method (with option to pre-defi…
…ne which cluster for speed)
- Loading branch information
Isaac Shamie
committed
Nov 1, 2020
1 parent
89eb0b1
commit bcf8449
Showing
28 changed files
with
709 additions
and
157 deletions.
There are no files selected for viewing
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+4 KB
Analysis/simulations/binomial_growth/results/sweep_v02/._growth_before_after.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
## Introduction | ||
In order to determine if we have a sufficient number of reads, cells, and power to detect CHIP expansion, we generated simulated data based on mitochondrial mutation rate, cell division rates, and clone sizes. | ||
We hope that this analysis will help answer two questions: | ||
a) Is the mutation rate of the MTs high enough to detect different clones? | ||
b) Can we properly distinguish clonal expansion in certain dominant CHIP clones from noise that may occur in the non-CHIP donor samples (unless other clones do expand, in that case we would want to learn about that as well). | ||
|
||
|
||
## Methods | ||
|
||
|
||
## Results | ||
|
||
|
||
## Future Directions |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
""" | ||
This script auto-generates the snakemake workflow docs. | ||
Adapted from https://github.com/vanheeringen-lab/seq2science/blob/master/docs/scripts/rule_description.py | ||
""" | ||
import yaml | ||
import os | ||
import re | ||
|
||
|
||
from src.config import ROOT_DIR | ||
os.chdir(ROOT_DIR) | ||
|
||
final_md = ( | ||
"""\ | ||
# Per rule explanation | ||
This is an automatically generated list of all supported rules, their docstrings, and command. At the start of each \ | ||
workflow run a list is printed of which rules will be run. And while the workflow is running it prints which rules are \ | ||
being started and finished. This page is here to give an explanation to the user about what each rule does, and for \ | ||
developers to find what is, and isn't yet supported. | ||
""" | ||
) | ||
|
||
path = "workflows/" | ||
rules_file = "simulation.snakefile" | ||
|
||
def get_dirty_docstrings(string): | ||
splitter = re.compile("rule (.*):[\s\S]*?\"\"\"([\s\S]*?)\"\"\"", re.MULTILINE) | ||
docstrings = {} | ||
for match in splitter.finditer(string): | ||
docstrings[match.group(1)] = match.group(2) | ||
return docstrings | ||
|
||
|
||
def cleanup_docstring(dirty): | ||
clean = {} | ||
for rule, docstring in dirty.items(): | ||
firstline = docstring.split("\n")[1] | ||
|
||
indentation = len(firstline) - len(firstline.lstrip()) | ||
docstring = docstring.replace(" " * indentation, "") | ||
docstring = docstring.replace(" " * (indentation - 4), "") | ||
docstring = docstring.strip("\n") | ||
clean[rule] = docstring | ||
|
||
return clean | ||
|
||
|
||
def cleanup_shell(dirty): | ||
clean = {} | ||
for rule, shell in dirty.items(): | ||
firstline = shell.split("\n")[1] | ||
|
||
indentation = len(firstline) - len(firstline.lstrip()) | ||
docstring = "\n".join([shell_line.replace(" " * indentation, "", 1) for shell_line in shell.split("\n")]) | ||
docstring = docstring.strip("\n") | ||
clean[rule] = docstring | ||
|
||
return clean | ||
|
||
|
||
def get_dirty_shell(string): | ||
splitter = re.compile("rule (.*):[\s\S]*?shell:[\s\S]*?\"\"\"[\s\S]([\s\S]*?)\"\"\"", re.MULTILINE) | ||
shell_cmds = {} | ||
for substring in string.split("\n\n\n"): | ||
for match in splitter.finditer(substring): | ||
shell_cmds[match.group(1)] = match.group(2) | ||
return shell_cmds | ||
|
||
|
||
all_rules_doc = {} | ||
all_rules_shell = {} | ||
#for rules_file in os.listdir(path): | ||
|
||
with open(path + rules_file, 'r') as file: | ||
text = file.read() | ||
shell_cmd = cleanup_shell(get_dirty_shell(text)) | ||
all_rules_shell.update(shell_cmd) | ||
|
||
docstrings = cleanup_docstring(get_dirty_docstrings(text)) | ||
all_rules_doc.update(docstrings) | ||
|
||
for rule in sorted(all_rules_doc.keys()): | ||
docstring = all_rules_doc[rule] | ||
|
||
final_md += f"#### {rule}\n" | ||
final_md += f"{docstring}\n" | ||
if rule in all_rules_shell: | ||
final_md += "```\n" | ||
final_md += f"{all_rules_shell[rule]}\n" | ||
final_md += "```\n" | ||
final_md += f"\n" | ||
|
||
with open("docs/workflow/all_rules.md", "w") as text_file: | ||
text_file.write(final_md) |
This file was deleted.
Oops, something went wrong.
This file was deleted.
Oops, something went wrong.
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
# Per rule explanation | ||
|
||
This is an automatically generated list of all supported rules, their docstrings, and command. At the start of each workflow run a list is printed of which rules will be run. And while the workflow is running it prints which rules are being started and finished. This page is here to give an explanation to the user about what each rule does, and for developers to find what is, and isn't yet supported. | ||
|
||
#### all | ||
Call variants using one of a few types. | ||
|
||
:type type = {'thresh', 'GMM', 'mgatk'} | ||
Modes: | ||
thresh: Sets a simple threshold for number of cells and reads a | ||
variant needs to be called a variant. | ||
GMM: Gaussian mixture model for quality of the variants. | ||
mgatk: This will use strand concordance, allele quality, and | ||
number of reads to determine the correct threshold. This only | ||
works information on aligned read strands in scATAC. | ||
|
||
Include a model for known variants, such as transversion | ||
|
||
#### model_call_lineages | ||
Looks at performance for cell-clone relationship | ||
and MTvariant-clone relationship | ||
|
||
If a tree and not cluster labels are given, will assume MT lineages | ||
are based on variants, | ||
|
||
#### model_lineage_growth_estimate | ||
Will estimate the growth rate based on the number of cells in | ||
the before and after experiments. | ||
|
||
#### performance_growth_estimate | ||
Compare the growth of the largest clones between the disease and | ||
non-disease samples. | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
bam_f: /data2/isshamie/mito_lineage/data/processed/2020_09_07_Croker_mito/CD34_mt_fiveprime/outs/possorted_genome_bam.bam | ||
bam_dir: /data2/isshamie/mito_lineage/data/processed/2020_09_07_Croker_mito/CD34_mt_fiveprime/outs/ | ||
outdir: results/ | ||
feature_barcodes: /data2/isshamie/mito_lineage/data/processed/2020_09_07_Croker_mito/CD34_mt_fiveprime/outs/filtered_feature_bc_matrix/barcodes.tsv |
Oops, something went wrong.