-
Notifications
You must be signed in to change notification settings - Fork 2
/
commands.txt
61 lines (44 loc) · 1.65 KB
/
commands.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
## Commands
This section lists command(s) run by sequenza workflow
* Running sequenza
Sequenza produces the most likely allele-specific copy numbers for given values of B-allele frequency and depth ratio
Preprocessing:
```
Rscript PREPROCESS_SCRIPT -s VARSCAN_SNP_FILE -c VARSCAN_CNV_FILE -y TRUE -p PREFIX
```
Prepearing data file using Varscan results:
```
set -euo pipefail
Rscript SEQUENZA_SCRIPT -s SEQZ_FILE -r REFERENCE -z GENOME_SIZE
-w WINDOW_SIZE
-g GAMMA
-p PREFIX
-l PLOIDY_FILE (Optional)
-f FEMALE_FLAG (Optional)
-t CANCER_TYPE (Optional)
-n MIN_READS_NORMAL (Optional)
-a MIN_READS_BAF (OPtional)
zip -qr PREFIX_results.zip sol* PREFIX*
```
Running analysis:
```
...
In this section Sequenza runs for a range of gamma values (fragment shown):
cellularity = []
ploidy = []
no_segments = []
for g in gammas:
print(g)
solutions = pd.read_table(os.path.join("gammas", g, "~{prefix}" + '_alternative_solutions.txt'))
row = solutions.loc[solutions['SLPP'].idxmax()]
cellularity.append(float(row['cellularity']))
ploidy.append(float(row['ploidy']))
path_seg = os.path.join("gammas", g, "~{prefix}" + '_Total_CN.seg')
no_segments.append(len(open(path_seg).readlines()) - 1)
gamma_solutions = pd.DataFrame({"gamma": gammas,
"cellularity": cellularity,
"ploidy": ploidy,
"no_segments": no_segments})
gamma_solutions.to_csv('gamma_solutions.csv', index=False)
...
```