ValueError: invalid literal for int() with base 10 #11

Open
liupfskygre opened this issue Feb 5, 2022 · 4 comments


@liupfskygre

Hi, Ann,
I got an error when running metapop (installed from pip) with the following command:
metapop --input_samples ./bamfile --reference ./reference --norm tp-notp-166-metapop_ctfile.txt --threads 60

The installation should be fine, since I ran the toy dataset and it completed successfully.

The error info is below. Do you have any suggestions on how to fix it?

Thanks.
Pengfei

#error info
File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/site-packages/metapop/metapop_mine_reads.py", line 450, in do_mine_reads
res = access_read_ranges(selections_to_read, threads, output_directory)
File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/site-packages/metapop/metapop_mine_reads.py", line 202, in access_read_ranges
res = pool.map(read_one_range, ranges)
File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
ValueError: invalid literal for int() with base 10: 'KQGRI2_20_08_k141_904564'

@liupfskygre
Author

liupfskygre commented Feb 5, 2022

I reran the command and got the following errors, similar to the ones above:

Reference base at each position will be the consensus of all files.
Getting codon usage bias...
Finalizing SNPs...
Updating genes with consensus bases...
Updating genomes with consensus bases...
MetaPop SNP refinement finished at: 05/02/2022 11:40:48
Linking SNPs starting at: 05/02/2022 11:40:48...multiprocessing.pool.RemoteTraceback:

Traceback (most recent call last):
  File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/site-packages/metapop/metapop_mine_reads.py", line 143, in read_one_range
    leftmost = int(segs[3].decode())
ValueError: invalid literal for int() with base 10: 'TGLS2_1908_Scaff092085'


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/PTPE2/Software/miniconda3/envs/metapop/bin/metapop", line 8, in <module>
    sys.exit(main())
  File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/site-packages/metapop/metapop_main.py", line 300, in main
    linked_file = metapop.metapop_mine_reads.do_mine_reads(output_directory_base, threads)
  File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/site-packages/metapop/metapop_mine_reads.py", line 450, in do_mine_reads
    res = access_read_ranges(selections_to_read, threads, output_directory)
  File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/site-packages/metapop/metapop_mine_reads.py", line 202, in access_read_ranges
    res = pool.map(read_one_range, ranges)
  File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/PTPE2/Software/miniconda3/envs/metapop/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: invalid literal for int() with base 10: 'TGLS2_1908_Scaff092085'

@liupfskygre
Author

Hi, Ann,
I checked things in more detail.
I looked at metapop_mine_reads.py and see that segs[3] is assigned to ref_base (ref_base = segs[3]):

fh = open(file)
for line in fh:
	if line.endswith("True\n"):
		segs = line.strip().split("\t")
		#contig_pos = segs[0]
		contig = segs[1]
		pos = int(segs[2])
		ref_base = segs[3]
		source = segs[9]
		snps = segs[10]
		contig_gene = segs[11]
		#if OC == 1, strand = forward, else strand = reverse
		OC = int(segs[14])
		codon = int(segs[15])
		pos_in_codon = int(segs[16])

		linked_data[source][contig][contig_gene][codon][OC].append([pos, ref_base, snps, pos_in_codon])

fh.close()

I guess the file here refers to genic_snps.tsv in the MetaPop/07.Cleaned_SNPs directory, which has the following header, right?

contig_pos	contig	pos	ref_base	depth	a_ct	t_ct	c_ct	g_ct	source	snps	contig_gene	start	end	OC	codon	pos_in_codon	link

If so, ref_base = segs[3] should be a single base ('A', 'T', 'C', or 'G'), right?

In my case, it is something else.

And even with A, T, C, or G, int(segs[3]) would raise an error, e.g. int('T').

So, which file does this refer to, and how can this be fixed?

Thanks,
Pengfei

@metaGmetapop
Owner

Hi Pengfei - let me pass these errors on to Kenji. He's the mastermind behind the new code. We'll get back to you soon!

@KGerhardt

That line caused the same error for another user. The problem was that the mapping tool he had used, BBMap, took more information from the deflines of his reads than just the sequence ID, and the additional information contained whitespace.
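
A quick way to check whether your own alignments are affected is to look for whitespace in the read names. The sketch below is only an illustration and not part of MetaPop: names_with_whitespace is a hypothetical helper, and the example lines are invented SAM-style text of the kind `samtools view -h` prints.

```python
def names_with_whitespace(sam_lines):
    """Yield read names (first tab-separated field) that contain whitespace."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        qname = line.split("\t", 1)[0]
        if any(ch.isspace() for ch in qname):
            yield qname


# Invented example records (assumption: one header line, two alignments).
example = [
    "@SQ\tSN:contig_A\tLN:5000",
    "read_001 extra\t0\tcontig_A\t1234\t42\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
    "read_002\t0\tcontig_A\t2000\t42\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
]

flagged = list(names_with_whitespace(example))
print(flagged)
```

If any names are flagged, the whitespace-based split described in this comment will see more fields than expected for those reads.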

The split that creates segs in the mine_reads script is done by calling samtools, reading the output into Python, and splitting each line on whitespace. If there is more whitespace than expected, the read's position in the reference genome is shifted past the 4th position in the split line, so int() is applied to a different field, such as the reference name.

We have a new version of the code up already that fixes this problem. The split now happens on tabs (samtools output is tab-separated) instead of on any whitespace.
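
To illustrate the difference, here is a minimal sketch (not the actual MetaPop code). The record is a made-up SAM-style line whose read name contains one space, and leftmost_position is a hypothetical helper. The real script decodes bytes coming from samtools, while this uses plain strings for brevity.

```python
# Made-up SAM-style record: 11 tab-separated fields, where the read name
# (field 1) carries extra defline text separated by a space.
sam_line = (
    "read_001 extra\t0\tcontig_A\t1234\t42\t10M\t*\t0\t0\t"
    "ACGTACGTAC\tFFFFFFFFFF"
)


def leftmost_position(line, sep=None):
    """Return the 4th field (POS in SAM) as an int.

    sep=None splits on any run of whitespace; sep="\t" splits on tabs only.
    """
    segs = line.rstrip("\n").split(sep)
    return int(segs[3])


# Splitting on tabs keeps the read name in one piece, so index 3 is POS.
pos = leftmost_position(sam_line, sep="\t")

# Splitting on any whitespace breaks the read name in two, so index 3
# becomes the reference name and int() fails.
try:
    leftmost_position(sam_line)
    whitespace_error = None
except ValueError as exc:
    whitespace_error = str(exc)

print(pos)
print(whitespace_error)
```

With the tab split, the space inside the read name no longer changes which field lands at index 3.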
