invalid literal for int() with base 10: '1307=1' and UserWarning: Problem parsing transcript with ID 'transcript/10670' #28

Open
philge opened this issue Mar 10, 2022 · 3 comments

Comments

@philge

philge commented Mar 10, 2022

Hi,

I get errors like the ones below when I run TranscriptClean:
"Correcting transcripts...
invalid literal for int() with base 10: '1307=1'
invalid literal for int() with base 10: '15=1'
invalid literal for int() with base 10: '1094=1'
invalid literal for int() with base 10: '1094=1'
invalid literal for int() with base 10: '1093=1'
invalid literal for int() with base 10: '1509=1'
invalid literal for int() with base 10: '511=1'
invalid literal for int() with base 10: '91=15588'
invalid literal for int() with base 10: '77=19737'
invalid literal for int() with base 10: '77=19737'
.."

Also,
"/data_disk2/software/TranscriptClean-2.0.3/TranscriptClean.py:339: UserWarning: Problem parsing transcript with ID 'transcript/10670'
warnings.warn("Problem parsing transcript with ID '" +
/data_disk2/software/TranscriptClean-2.0.3/TranscriptClean.py:339: UserWarning: Problem parsing transcript with ID 'transcript/10345'
warnings.warn("Problem parsing transcript with ID '" +
/data_disk2/software/TranscriptClean-2.0.3/TranscriptClean.py:339: UserWarning: Problem parsing transcript with ID 'transcript/11633'
warnings.warn("Problem parsing transcript with ID '" +
/data_disk2/software/TranscriptClean-2.0.3/TranscriptClean.py:339: UserWarning: Problem parsing transcript with ID 'transcript/11869'
warnings.warn("Problem parsing transcript with ID '" +
/data_disk2/software/TranscriptClean-2.0.3/TranscriptClean.py:339: UserWarning: Problem parsing transcript with ID 'transcript/23980'
warnings.warn("Problem parsing transcript with ID '" +
/data_disk2/software/TranscriptClean-2.0.3/TranscriptClean.py:339: UserWarning: Problem parsing transcript with ID 'transcript/224'
warnings.warn("Problem parsing transcript with ID '" +"

Can you please help me to fix the issue?

Thanks
Philge

@fairliereese
Member

Hi,
I'm pretty sure that the "problem parsing transcript..." warnings are caused by the "invalid literal..." errors, but prior versions of TranscriptClean buried the stack trace of the thrown errors in a try / except block. If you install the latest version, I am certain it will still throw an error, but the error will be more informative. Would you be able to run it with the latest commits and paste the output from that run here? That should make it easier to debug.
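
As an illustration of how a try / except block can bury a stack trace, here is a minimal sketch. The names `parse_count`, `init_quietly`, and `init_verbosely` are hypothetical and are not TranscriptClean's code. The only point is that a generic warning drops the information about which line failed, while appending `traceback.format_exc()` keeps it.

```python
import traceback
import warnings


def parse_count(token):
    # Hypothetical stand-in for the parsing step that fails.
    return int(token)


def init_quietly(token, transcript_id):
    """Swallow the exception and emit only a generic warning."""
    try:
        return parse_count(token)
    except ValueError:
        warnings.warn("Problem parsing transcript with ID '"
                      + transcript_id + "'")
        return None


def init_verbosely(token, transcript_id):
    """Same recovery, but keep the stack trace in the warning text."""
    try:
        return parse_count(token)
    except ValueError:
        warnings.warn("Problem parsing transcript with ID '"
                      + transcript_id + "'\n" + traceback.format_exc())
        return None
```

With the second variant, the warning text also names the failing function and carries the original `ValueError` message.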

@TJiangBio

@fairliereese Hi, I have the same error and ran with the latest commits. Here's the output:

Traceback (most recent call last):
File "xxx/miniconda3/envs/isoseq/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "xxx/miniconda3/envs/isoseq/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "xxx/TranscriptClean/newest_version/TranscriptClean/TranscriptClean.py", line 576, in run_chunk
buffer_size=options.buffer_size)
File "xxx/TranscriptClean/newest_version/TranscriptClean/TranscriptClean.py", line 364, in batch_correct
options, refs)
File "xxx/TranscriptClean/newest_version/TranscriptClean/TranscriptClean.py", line 409, in correct_transcript
refs.sjAnnot)
File "xxx/TranscriptClean/newest_vestion/TranscriptClean/TranscriptClean.py", line 336, in transcript_init
transcript = Transcript(sam_fields, genome, sjAnnot)
File "xxx/newest_verstion/TranscriptClean/transcript.py", line 48, in __init__
self.NM, self.MD = self.getNMandMDFlags(genome)
File "xxx/TranscriptClean/newest_verstion/TranscriptClean/transcript.py", line 289, in getNMandMDFlags
operations, counts = self.splitCIGAR()
File "xxx/TranscriptClean/newest_verstion/TranscriptClean/transcript.py", line 126, in splitCIGAR
return splitCIGARstr(self.CIGAR) # alignTypes, counts
File "xxx/TranscriptClean/newest_verstion/TranscriptClean/transcript.py", line 586, in splitCIGARstr
counts = [int(i) for i in counts]
File "xxx/TranscriptClean/newest_verstion/TranscriptClean/transcript.py", line 586, in <listcomp>
counts = [int(i) for i in counts]
ValueError: invalid literal for int() with base 10: '130=20054'

Could you please help to fix this issue? Thanks.
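
The failing tokens all have the form `<number>=<number>`, which is what a splitter would produce if the CIGAR string uses the `=` (sequence match) operation from the SAM specification and the splitter does not recognize it. The sketch below is an assumption about the failure mode, not the actual `splitCIGARstr` code. `split_cigar_narrow` is a hypothetical splitter that only knows `M`, `I`, `D`, `N`, `S`, and `H`; `split_cigar_full` accepts all nine operations listed in the spec (`MIDNSHP=X`). The example CIGAR string is made up so that it yields the token from the traceback.

```python
import re


def split_cigar_narrow(cigar):
    """Hypothetical splitter that only recognizes M, I, D, N, S and H."""
    ops = re.findall("[MIDNSH]", cigar)
    counts = [tok for tok in re.split("[MIDNSH]", cigar) if tok]
    # Any '=' or 'X' stays glued to its neighbours, so int() fails here.
    return ops, [int(tok) for tok in counts]


def split_cigar_full(cigar):
    """Splitter that accepts every CIGAR operation in the SAM spec."""
    pairs = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    ops = [op for _, op in pairs]
    counts = [int(num) for num, _ in pairs]
    return ops, counts


# Made-up CIGAR: 130 matches, a 20054 bp intron, 45 matches,
# 1 mismatch, 12 matches.
example_cigar = "130=20054N45=1X12="
```

Calling `split_cigar_narrow(example_cigar)` raises `ValueError: invalid literal for int() with base 10: '130=20054'`, while `split_cigar_full(example_cigar)` returns the operations and counts separately. If this is indeed the cause, the input was probably aligned with extended CIGAR output enabled (for example minimap2's `--eqx` option).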

@fairliereese
Member

fairliereese commented Mar 27, 2023

Thanks for running with the newest commits. I now know which line is throwing the error but am still not entirely sure what's causing it. If you could send me a snippet of your input SAM file (one that still causes the error when you run it), that would be really helpful.

Alternatively, you can send me the CIGAR string of one of the transcripts that you know is causing the error.
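
One way to pull out candidate CIGAR strings is sketched below. It assumes a plain-text, tab-separated SAM file in which the read name is column 1 and the CIGAR string is column 6, as in the SAM specification. The function name `find_unusual_cigars` and the choice to flag anything outside `M`, `I`, `D`, `N`, `S`, `H` are assumptions made for illustration; they are not part of TranscriptClean.

```python
import re

# Assumption: flag any CIGAR that uses operations other than M, I, D, N, S, H.
CLASSIC_CIGAR = re.compile(r"^(\d+[MIDNSH])+$")


def find_unusual_cigars(sam_lines, limit=5):
    """Return up to `limit` (read name, CIGAR) pairs with other operations."""
    hits = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a complete alignment record
        name, cigar = fields[0], fields[5]
        # '*' means no CIGAR is available (e.g. unmapped read).
        if cigar != "*" and not CLASSIC_CIGAR.match(cigar):
            hits.append((name, cigar))
            if len(hits) >= limit:
                break
    return hits


# Example use (the file name is a placeholder):
# with open("input.sam") as handle:
#     for name, cigar in find_unusual_cigars(handle):
#         print(name, cigar, sep="\t")
```

The read names it prints can then be used to cut a small SAM snippet that still reproduces the error.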
