Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chimeric_alignments error during infer_breakpoint_graph #7

Open
Tina04021997 opened this issue Apr 15, 2024 · 8 comments
Open

chimeric_alignments error during infer_breakpoint_graph #7

Tina04021997 opened this issue Apr 15, 2024 · 8 comments

Comments

@Tina04021997
Copy link

Thank you all for developing CoRAL. I have a question regarding the error message during infer_breakpoint_graph step and wonder if there are any suggestions you could kindly have:

Error message:
Traceback (most recent call last):
File "/tscc/nfs/home/tiy002/software/CoRAL/src/infer_breakpoint_graph.py", line 1637, in
b2bn.find_amplicon_intervals()
File "/tscc/nfs/home/tiy002/software/CoRAL/src/infer_breakpoint_graph.py", line 219, in find_amplicon_intervals
self.find_interval_i(ai, ccid)
File "/tscc/nfs/home/tiy002/software/CoRAL/src/infer_breakpoint_graph.py", line 354, in find_interval_i
if i in self.chimeric_alignments_seg[chr]:
KeyError: 'chr1'

The last few lines of the log file:
[root:DEBUG] #TIME 332.1212 Updated amplicon interval ['chr12', 56855263, 57175262, -1]
[root:DEBUG] #TIME 332.1212 Updated amplicon interval ['chr12', 57555264, 57950263, -1]
[root:DEBUG] #TIME 332.1212 Updated amplicon interval ['chr12', 68545270, 69760270, -1]
[root:DEBUG] #TIME 332.1213 Updated amplicon interval ['chr12', 110075295, 110390294, -1]
[root:DEBUG] #TIME 332.1213 Begin processing amplicon interval 0
[root:DEBUG] #TIME 332.1213 Amplicon interval ['chr1', 766225, 1086326, -1]
[root:DEBUG] #TIME 332.1213 Start BFS on amplicon interval 0.
[root:DEBUG] #TIME 332.1213 BFS queue: [0]
[root:DEBUG] #TIME 332.1213 Next amplicon interval 0: ['chr1', 766225, 1086326, -1].
[root:DEBUG] #TIME 332.1214 Reset connected component ID to 0

CoRAL Command:
python3 /tscc/nfs/home/tiy002/software/CoRAL/src/infer_breakpoint_graph.py --lr_bam $PATH/TR14.mkdup.bam --seed $PATH/TR14_CNV_SEEDS.bed --output_prefix $PATH/TR14 --cnseg $PATH/TR14.mkdup.cns --log_fn TR14_trace.log

CNVkit command:
cnvkit.py batch -m wgs -r /tscc/nfs/home/tiy002/data_repo/GRCh38/GRCh38_cnvkit_filtered_ref.cnn -p 16 -d $PATH/cnvkit $PATH/TR14.mkdup.bam

Additional info:

  1. Both CNV_SEEDS.bed and cns files were generated via cnv_seed.py
  2. This is a tumor cell line long-read seq sample (PacBio) without matched normal
  3. Key steps for long-read seq data preprocessing (using PacBio tool smrtlink):
    a) BAM to FASTQ
    b) FASTQ alignment to BAM (PacBio CCS mode)
    c) Sort and index and perform mark duplicate
@MJoseMo
Copy link

MJoseMo commented Jul 24, 2024

@Tina04021997
I have a similar problem when running CoRaL.py reconstruct, I was wondering if you were able to fix it.
Traceback (most recent call last):

File «/ngs/coral_soft/CoRAL/src/CoRAL.py», line 150, in <module> reconstruct_mode(args) File «/ngs/coral_soft/CoRAL/src/CoRAL.py», line 23, in reconstruir_modo infer_breakpoint_graph.reconstruir(args) File «/ngs/coral_soft/CoRAL/src/infer_breakpoint_graph.py», line 1630, in reconstruir b2bn.find_amplicon_intervals() File «/ngs/coral_soft/CoRAL/src/infer_breakpoint_graph.py», line 224, in find_amplicon_intervals self.find_interval_i(ai, ccid) File «/ngs/coral_soft/CoRAL/src/infer_breakpoint_graph.py», line 362, in find_interval_i if i in self.chimeric_alignments_seg[chr]: KeyError: 'chr5'

@Jenny-Tamboli
Copy link

Hi Team CoRAL,

I have encountered almost similar issues as well while running reconstruct mode. I am using Nanopore reads aligned with NGMLR (which is in contrast to minimap2 mentioned in the article). The log file (error message attached) shows some key error corresponding to parsing the CIGAR string in bam files.

`Performing reconstruction with options:
mode: reconstruct
lr_bam: /data/cephfs-1/work/projects/ecdnapanc/carcinoma_celllines/Panc1_20230711_sorted.bam
cnv_seed: /data/cephfs-1/work/projects/ecdnapanc/CoRAL/outputs/seed_mode/Panc1_20230711_sorted_CNV_SEEDS.bed
output_prefix: /data/cephfs-1/work/projects/ecdnapanc/CoRAL/outputs/reconstruct_mode/Panc1_20230711_sorted
cn_seg: /data/cephfs-1/work/projects/ecdnapanc/carcinoma_celllines/cnvkit/calls/Panc1_20230711_sorted.call.cns
output_bp: False
skip_cycle_decomp: False
output_all_path_constraints: False
min_bp_support: 1.0
cycle_decomp_alpha: 0.01
cycle_decomp_time_limit: 7200
cycle_decomp_threads: None
postprocess_greedy_sol: False
log_fn: None

Traceback (most recent call last):
File "/data/cephfs-1/work/projects/ecdnapanc/CoRAL/src/CoRAL.py", line 162, in
reconstruct_mode(args)
File "/data/cephfs-1/work/projects/ecdnapanc/CoRAL/src/CoRAL.py", line 25, in reconstruct_mode
b2bn = infer_breakpoint_graph.reconstruct_graph(args)
File "/data/cephfs-1/work/projects/ecdnapanc/CoRAL/src/infer_breakpoint_graph.py", line 1356, in reconstruct_graph
b2bn.fetch()
File "/data/cephfs-1/work/projects/ecdnapanc/CoRAL/src/infer_breakpoint_graph.py", line 171, in fetch
self.chimeric_alignments[r] = cigar_parsing.alignment_from_satags(self.chimeric_alignments[r], rl)
File "/data/cephfs-1/work/projects/ecdnapanc/CoRAL/src/cigar_parsing.py", line 255, in alignment_from_satags
qs, qe, al = cigar2pos_ops[op](t[3], t[2], read_length)
KeyError: 'SMDMIMDMDMDMIMDMDMDMDMDMDMDMDMDMDMDMDMDMIMIMDMDMIMIMDMDMDMDMDMDMDMDMDMIMDMDMDMDMIMIMDMDMDMDMIMIMIMIMIMIMIMDMIMDMDMDMDMDMDMDMDMDMIMDMDMDMDMDMDMIMDMIMDMIMDMDMIMDMIMDMDMDMIMIMIMDMDMDMIMIMDMDMDMDMIMIMDMIMDMDMIMIMDMIMDMDMDMIMDMIMIMDMIMDMDMS'
`
I am also attaching log messages from infer_breakpoint_graph.log file.

`[root:INFO] Python version 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:21)
[GCC 9.4.0]

[root:INFO] #TIME 0.0046 Commandline: /data/cephfs-1/work/projects/ecdnapanc/CoRAL/src/CoRAL.py reconstruct --lr_bam /data/cephfs-1/work/projects/ecdnapanc/carcinoma_celllines/Panc1_20230711_sorted.bam --cnv_seed /data/cephfs-1/work/projects/ecdnapanc/CoRAL/outputs/seed_mode/Panc1_20230711_sorted_CNV_SEEDS.bed --output_prefix /data/cephfs-1/work/projects/ecdnapanc/CoRAL/outputs/reconstruct_mode/Panc1_20230711_sorted --cn_seg /data/cephfs-1/work/projects/ecdnapanc/carcinoma_celllines/cnvkit/calls/Panc1_20230711_sorted.call.cns
[root:DEBUG] #TIME 0.1048 Parsed 1 seed amplicon intervals.
[root:DEBUG] #TIME 0.1051 Seed interval ['chr19', 38636506, 40851628]
[root:INFO] #TIME 0.1051 Opened LR bam files.
[root:DEBUG] #TIME 0.1494 Total num LR copy number segments: 1533.
[root:DEBUG] #TIME 0.1497 Use 22 LR copy number segments.
[root:DEBUG] #TIME 0.1498 Total length of LR copy number segments: 12454852.
[root:DEBUG] #TIME 0.1498 Average LR copy number: -0.094312.
[root:INFO] #TIME 36.1310 LR normal cov = 13.925882.
[root:DEBUG] #TIME 36.1311 Reset min_cluster_cutoff to 13.925882.
[root:INFO] #TIME 36.1312 Completed parsing CN segment files.
[root:INFO] #TIME 746.4804 Fetched 375886 chimeric reads.
`

Could you please guide how to solve this issue?
Thanks in advance!

@MJoseMo
Copy link

MJoseMo commented Sep 24, 2024

Hi @Jenny-Tamboli, I think it could only be used with winnowmap. We aligned our data with winnowmap and printed different variables of the script infer_breakpoint_graph.py. With winnowmap, the read's name is printed (column 1) but not with pbmm2, maybe because of how winnowmap defines the CIGAR string. I have also tried this program by providing the seed and bed files and it also produces errors.

@Jenny-Tamboli
Copy link

Hi @MJoseMo, Thank you very much for the swift response. I will try using winnowmap. But if I understood it correctly, despite using winnowmap, you still got errors. Is it?

@mattjones315
Copy link
Collaborator

Hi all,

Thanks so much for using CoRAL and posting this issue!

Though in our study we used minimap2 for all alignments, there's no reason that alternative aligners will not work (such as winnowmap or NGMLR). I think that one issue could be the MarkDuplicates step - if you run CoRAL without this last filter (i.e., just alignment & sorting), does this fix the issue? If I understand your workflow correctly, you did not do any PCR amplifications I'm not sure marking/removing duplicates is necessary for this analysis. Apologies if I misunderstood something.

Best,
Matt

@MJoseMo
Copy link

MJoseMo commented Sep 29, 2024

Hi,
I initially assumed that aligment of long-read sequences was performed with winnowmap, as it is indicated in align_nanopore_reads.sh (script folder). However, I noticed that minimap is mentioned in the draft paper.
@Jenny-Tamboli in the software's description, it states that CNV and seed files could be provided without running the script call_cnvs.sh or the seed command. However, I encountered an error when these files are proportioned without running these previous steps.

@madagiurgiu25
Copy link

madagiurgiu25 commented Oct 6, 2024

Hi,

I got similar issue when using NGMLR (0.2.7) aligner. However with minimap2 (2.28-r1209) it worked. This is I think because for the SA:Z tag, minimap2 cleans the CIGAR and removes small INDELS artifacts in the long-read data, whereas NGMLR does not (in this case the SA:Z the full CIGAR is included). CoRAL expects a cleaned CIGAR for processing the SA reads.

@jluebeck
Copy link
Member

Hi everyone,

We have just released a testing version for CoRAL that supports both PacBio data and bams produced by aligners that make alternately formatted CIGARs (e.g. NGMLR). We would be greatful if you would be able to give it a try with the data you encountered that has issues. Please let me know if this gives errors for you and if so which aligner & long read technology you are using.

cd CoRAL
git fetch --all
git checkout pacbio_dev
git pull origin pacbio_dev

Thank you!
Jens

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants