Commit

refine oversized seed penalty
jluebeck committed May 24, 2023
1 parent cf0f61d commit 0a8a2ff
Showing 3 changed files with 11 additions and 11 deletions.
2 changes: 1 addition & 1 deletion PrepareAA.py
@@ -15,7 +15,7 @@
 import check_reference
 import cnv_prefilter

-__version__ = "0.1537.1"
+__version__ = "0.1537.2"

 PY3_PATH = "python3" # updated by command-line arg if specified
 metadata_dict = {} # stores the run metadata (bioinformatic metadata)
2 changes: 1 addition & 1 deletion README.md
@@ -6,7 +6,7 @@ Performs preliminary steps (alignment, seed detection, & seed filtering) require

 AmpliconSuite-pipeline supports hg19, GRCh37, GRCh38 (hg38), and mouse genome mm10 (GRCm38). The tool also supports analysis with a human-viral hybrid reference genome we provide, "GRCh38_viral", which can be used to detect oncoviral hybrid focal amplifications and ecDNA in cancers with oncoviral infections.

-**Current version: 0.1537.1**
+**Current version: 0.1537.2**

 [comment]: # (Versioning based on major_version.days_since_initial_commit.minor_version. Initial commit: March 5th, 2019)
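
The versioning comment in the README describes a `major_version.days_since_initial_commit.minor_version` scheme. As a rough illustration only, the sketch below assumes the middle field is a plain calendar-day difference from the stated initial commit date. The helper name `version_string` is hypothetical and not part of the repository, and May 20, 2023 is used simply because it is the date that falls 1537 days after March 5, 2019.

```python
from datetime import date

# Initial commit date as stated in the README comment.
INITIAL_COMMIT = date(2019, 3, 5)


def version_string(major, on_date, minor):
    """Build a major.days.minor string (hypothetical helper, for illustration)."""
    # Assumption: "days since initial commit" is a simple calendar-day difference.
    days = (on_date - INITIAL_COMMIT).days
    return "{}.{}.{}".format(major, days, minor)


print(version_string(0, date(2023, 5, 20), 2))  # prints 0.1537.2
```

Under this reading, a minor bump such as `.1` to `.2` leaves the day count untouched, which matches the change shown in this commit.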

18 changes: 9 additions & 9 deletions cnv_prefilter.py
@@ -159,23 +159,23 @@ def prefilter_bed(bedfile, ref, centromere_dict, chr_sizes, cngain, outdir):
         init_cns = arm2cns[a]
         med_cn = compute_cn_median(init_cns, arm2lens[a])
         for x in init_cns:
-            # ignore seeds over 30 Mbp
+            long_seed_region_penalty_mult = 1.
+            # ignore CN segments over 30 Mbp
             if x[2] - x[1] > 30000000:
                 continue

-            ccg = cngain
+            # penalize segments over 20 Mbp
+            elif x[2] - x[1] > 20000000:
+                long_seed_region_penalty_mult = 2.0
+
             continuous_high_hits = continuous_high_region_ivald[x[0]][x[1]:x[2]]
             if continuous_high_hits:
-                long_seed_region_penalty_mult = 1.
                 for y in continuous_high_hits:
-                    if y.end - y.begin > 20000000:
-                        long_seed_region_penalty_mult = max(3.0, long_seed_region_penalty_mult)
-
-                    elif y.end - y.begin > 10000000:
+                    # penalize seeds that overlap a high-CN region of 10 Mbp or more
+                    if y.end - y.begin > 10000000:
                         long_seed_region_penalty_mult = max(1.5, long_seed_region_penalty_mult)

-                ccg*=long_seed_region_penalty_mult
-
+            ccg = cngain * long_seed_region_penalty_mult
             if x[3] > med_cn + ccg - 2:
                 cn_filt_entries.append(x)
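
To make the effect of the revised penalty easier to see, here is a minimal standalone sketch of the post-commit logic. It is not the repository's code: the function names `seed_penalty_mult` and `passes_prefilter`, the constant names, and the use of a plain list of overlapping high-CN region lengths (in place of the interval-tree lookup) are all assumptions made for illustration, and the numeric inputs in the usage lines are arbitrary.

```python
# Thresholds taken from the diff; the constant names are illustrative.
MAX_SEGMENT_LEN = 30000000       # segments longer than this are skipped
LONG_SEGMENT_LEN = 20000000      # segments longer than this get a 2.0x penalty
LONG_HIGH_REGION_LEN = 10000000  # overlapping high-CN regions longer than this give 1.5x


def seed_penalty_mult(seg_len, high_region_lens):
    """Return the gain multiplier for one CN segment, or None if it is skipped."""
    if seg_len > MAX_SEGMENT_LEN:
        return None
    mult = 2.0 if seg_len > LONG_SEGMENT_LEN else 1.0
    for region_len in high_region_lens:
        if region_len > LONG_HIGH_REGION_LEN:
            # max() keeps the larger penalty, so a 2.0x segment penalty is not lowered
            mult = max(1.5, mult)
    return mult


def passes_prefilter(seg_cn, seg_len, high_region_lens, med_cn, cngain):
    """Apply the same comparison as the diff: CN > median + penalized gain - 2."""
    mult = seed_penalty_mult(seg_len, high_region_lens)
    if mult is None:
        return False
    return seg_cn > med_cn + cngain * mult - 2


# Arbitrary example values: arm median CN 2.0, gain threshold 4.5, segment CN 6.0.
print(passes_prefilter(6.0, 5000000, [], 2.0, 4.5))   # True: threshold is 4.5
print(passes_prefilter(6.0, 25000000, [], 2.0, 4.5))  # False: threshold is 9.0
```

Compared with the pre-commit version, the 3.0x multiplier for overlapping high-CN regions over 20 Mbp is gone, and a segment's own length now contributes a 2.0x multiplier once it exceeds 20 Mbp.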
