rbp_eclip models do not accept files without 'chr' string #195

Open
PedroBarbosa opened this issue Oct 25, 2019 · 4 comments

@PedroBarbosa

Hi @Avsecz,

Using VCF, GTF and FASTA files whose chromosome names lack the 'chr' prefix throws the following error:

  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/bin/kipoi", line 8, in <module>
    sys.exit(main())
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/kipoi/__main__.py", line 105, in main
    command_fn(args.command, sys.argv[2:])
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/kipoi_veff/__main__.py", line 11, in cli_main
    kipoi_veff.cli.cli_main(command, raw_args)
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/kipoi_veff/cli.py", line 458, in cli_main
    command_fn(args.command, raw_args[1:])
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/kipoi_veff/cli.py", line 222, in cli_score_variants
    model_outputs=model_outputs)
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/kipoi_veff/snv_predict.py", line 795, in score_variants
    return_predictions=return_predictions)
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/kipoi_veff/snv_predict.py", line 620, in predict_snvs
    for i, batch in enumerate(tqdm(it)):
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/tqdm/std.py", line 1081, in __iter__
    for obj in iterable:
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/kipoi_utils/external/torch/data.py", line 154, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/kipoi_utils/external/torch/data.py", line 154, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/mnt/nfs/lobo/MCFONSECA-NFS/pedro.barbosa/.kipoi/models/rbp_eclip/dataloader.py", line 228, in __getitem__
    out['inputs']['seq'] = np.squeeze(self.seq_extractor([interval]), axis=0)
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/genomelake/extractors.py", line 26, in __call__
    self._extract(intervals, data, **kwargs)
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/genomelake/extractors.py", line 94, in _extract
    interval.stop)
  File "pysam/libcfaidx.pyx", line 303, in pysam.libcfaidx.FastaFile.fetch
KeyError: "sequence 'chr1' not present"
  0%|          | 0/29 [00:17<?, ?it/s]
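The traceback shows that the rbp_eclip dataloader's sequence extractor (genomelake, which calls `pysam.FastaFile.fetch`) asked the FASTA for 'chr1', although the FASTA only contains '1'. One way a dataloader could tolerate both naming conventions is to resolve each contig name against the names the FASTA actually contains before fetching. The sketch below is only an illustration: `resolve_contig` is a hypothetical helper, not part of kipoi, genomelake or pysam. With pysam, the available names could come from `pysam.FastaFile(path).references`.

```python
from typing import Iterable, Optional


def resolve_contig(name: str, available: Iterable[str]) -> Optional[str]:
    """Return `name` as it is spelled in `available`, trying the name
    as-is, then with the 'chr' prefix removed or added.

    Returns None if no spelling matches. Mitochondrial naming
    (M vs MT) is deliberately not handled. Hypothetical helper.
    """
    names = set(available)
    candidates = [name]
    if name.startswith("chr"):
        candidates.append(name[len("chr"):])
    else:
        candidates.append("chr" + name)
    for candidate in candidates:
        if candidate in names:
            return candidate
    return None


# Contig names in the style of the reporter's FASTA (no 'chr' prefix).
fasta_contigs = ["1", "2", "X", "Y", "MT"]
print(resolve_contig("chr1", fasta_contigs))
print(resolve_contig("chr22", fasta_contigs))
```

An exact match always wins, so files that already agree with the FASTA are unaffected; only a mismatched prefix triggers the fallback.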
@Avsecz
Contributor

Avsecz commented Oct 25, 2019

Hi, are you sure the VCF and GTF files also contain chromosome names without 'chr'?
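One quick way to answer this is to list the distinct values in the chromosome column of each file. A minimal sketch, assuming uncompressed VCF/GTF text where the chromosome is the first tab-separated column and header/comment lines start with '#'; `contig_names` is a hypothetical helper.

```python
from typing import Iterable, List


def contig_names(lines: Iterable[str]) -> List[str]:
    """Distinct values of the first tab-separated column (the chromosome
    in VCF and GTF), in order of first appearance. Header/comment lines
    starting with '#' and blank lines are skipped."""
    names = {}  # a dict keeps insertion order (guaranteed in Python 3.7+)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.setdefault(line.rstrip("\n").split("\t", 1)[0], None)
    return list(names)


# Example VCF content; the data line is the variant quoted later in this
# thread, assumed here to be tab-separated.
vcf_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t981984\t541161\tC\tT\t.\t.\t.\n",
]
print(contig_names(vcf_lines))
```

On real files an open handle can be passed directly, e.g. `contig_names(open("input.vcf"))`, or `gzip.open(path, "rt")` for a bgzipped VCF; any 'chr'-prefixed name shows up immediately in the result.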

@PedroBarbosa
Author

Hi,

Yes. The command is as follows:
kipoi veff score_variants rbp_eclip/$line -o ${4}_${line}_rbp_eclip.vcf --dataloader_args="{\"gtf_file\":\"$3\", \"fasta_file\":\"$2\"}" -i "$1"

Chromosomes in VCF:
1,11,12,14,15,16,17,2,20,21,22,3,4,5,6,7,8,9,X
Chromosomes in FASTA:
1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22, 3, 4, 5, 6, 7, 8, 9, MT, X, Y,
GL000192.1, GL000225.1, GL000194.1, GL000193.1, GL000200.1, GL000222.1, GL000212.1, GL000195.1, GL000223.1, GL000224.1, GL000219.1, GL000205.1, GL000215.1, GL000216.1, GL000217.1, GL000199.1, GL000211.1, GL000213.1, GL000220.1, GL000218.1, GL000209.1, GL000221.1, GL000214.1, GL000228.1, GL000227.1, GL000191.1, GL000208.1, GL000198.1, GL000204.1, GL000233.1, GL000237.1, GL000230.1, GL000242.1, GL000243.1, GL000241.1, GL000236.1, GL000240.1, GL000206.1, GL000232.1, GL000234.1, GL000202.1, GL000238.1, GL000244.1, GL000248.1, GL000196.1, GL000249.1, GL000246.1, GL000203.1, GL000197.1, GL000245.1, GL000247.1, GL000201.1, GL000235.1, GL000239.1, GL000210.1, GL000231.1, GL000229.1, GL000226.1, GL000207.1

Chromosomes in GTF:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, M,
GL000241.1, GL000193.1, GL000220.1, GL000237.1, GL000212.1, GL000220.1, GL000212.1, GL000220.1, GL000212.1, GL000202.1, GL000228.1, GL000199.1, GL000192.1, GL000220.1, GL000192.1, GL000195.1, GL000205.1, GL000195.1, GL000193.1, GL000205.1, GL000195.1, GL000204.1, GL000220.1, GL000193.1
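Comparing the primary chromosomes in the three lists above shows that every VCF chromosome also appears in the FASTA and in the GTF under the same un-prefixed name, and that none of the three files uses 'chr'. That points away from a mismatch between the inputs themselves. The only naming difference among the primary chromosomes is the mitochondrial contig (MT in the FASTA, M in the GTF). A short check, with the GL contigs left out for brevity:

```python
# Primary chromosomes as reported in this comment (GL contigs omitted).
vcf = "1,11,12,14,15,16,17,2,20,21,22,3,4,5,6,7,8,9,X".split(",")
fasta = [str(i) for i in range(1, 23)] + ["MT", "X", "Y"]
gtf = [str(i) for i in range(1, 23)] + ["X", "Y", "M"]

# VCF chromosomes missing from the FASTA / from the GTF.
print(sorted(set(vcf) - set(fasta)))
print(sorted(set(vcf) - set(gtf)))
# Names present in only one of FASTA and GTF.
print(sorted(set(fasta) ^ set(gtf)))
# Does any file use the 'chr' prefix?
print(any(name.startswith("chr") for name in vcf + fasta + gtf))
```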

Pedro

@Avsecz
Contributor

Avsecz commented Oct 25, 2019

Hm. Can you try removing the .fai file?
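Removing the .fai file presumably forces the index to be rebuilt from the current FASTA. A stale index (for example, one built from a 'chr'-prefixed copy of the FASTA) could list names that no longer match the headers. Since the first tab-separated column of a .fai file (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH) is the sequence name, the index can also be compared with the headers directly. A minimal sketch, assuming an uncompressed FASTA; `fai_names`, `header_names` and `stale_index` are hypothetical helpers.

```python
from typing import Iterable, List


def fai_names(fai_lines: Iterable[str]) -> List[str]:
    """Sequence names from a .fai index (first tab-separated column)."""
    return [line.split("\t", 1)[0] for line in fai_lines if line.strip()]


def header_names(fasta_lines: Iterable[str]) -> List[str]:
    """Sequence names from FASTA headers: the first word after '>'."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.append(words[0])
    return names


def stale_index(fasta_lines: Iterable[str], fai_lines: Iterable[str]) -> bool:
    """True if the index does not list exactly the FASTA's sequences, in order."""
    return header_names(fasta_lines) != fai_names(fai_lines)


# A tiny FASTA without 'chr', and an index left over from a 'chr' version.
fasta = [">1 dna:chromosome\n", "ACGT\n", ">2\n", "GGCC\n"]
old_fai = ["chr1\t4\t18\t4\t5\n", "chr2\t4\t26\t4\t5\n"]
print(stale_index(fasta, old_fai))
```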

@PedroBarbosa
Author

Same outcome. This is the first variant to be tested (I had to remove the INFO field, otherwise a different error appears):

1 981984 541161 C T . . .

I think that when kipoi generates the intervals surrounding the variant, it automatically adds 'chr' to the chromosome names in the BED file, and those names then do not exist in the GTF. When I add the 'chr' prefix to all files, it works (as opposed to other models, e.g. kipoiSplice4, which only worked without 'chr').

INFO [kipoi_veff.snv_predict] Using variant-centered sequence generation.
Using TensorFlow backend.
/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/keras/engine/saving.py:350: UserWarning: Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer.
  warnings.warn('Error in loading the saved optimizer '
  0%|          | 0/59 [00:00<?, ?it/s]
INFO:2019-10-26 15:37:17,147:genomelake] Running landmark extractors..
/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/concise/utils/position.py:55: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), ...) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.
  ("strand", gtf.strand)])
/home/pedro.barbosa/software/miniconda3/envs/kipoi-rbp_eclip/lib/python3.6/site-packages/concise/utils/position.py:62: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), ...) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.
  ("strand", gtf.strand)])
INFO:2019-10-26 15:37:33,903:genomelake] Done!
chr1	981934	982035	0
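The workaround mentioned in this comment (adding the 'chr' prefix to all files) can be scripted. It only works around the symptom; the printed interval above suggests the dataloader itself produces 'chr'-prefixed intervals. A minimal sketch, assuming uncompressed text files; `add_chr_prefix` is a hypothetical helper, not part of kipoi.

```python
from typing import Iterable, Iterator


def add_chr_prefix(lines: Iterable[str], fmt: str) -> Iterator[str]:
    """Yield `lines` with 'chr' prepended to chromosome names.

    fmt="fasta": only header lines ('>name ...') are changed.
    fmt="table": the first column of VCF/GTF data lines is changed;
    lines starting with '#' (including VCF ##contig headers) and blank
    lines pass through untouched. Names already starting with 'chr'
    are left alone, and MT is NOT renamed to chrM.
    """
    if fmt not in ("fasta", "table"):
        raise ValueError("fmt must be 'fasta' or 'table'")
    for line in lines:
        if fmt == "fasta":
            if line.startswith(">") and not line.startswith(">chr"):
                line = ">chr" + line[1:]
        elif line.strip() and not line.startswith(("#", "chr")):
            # The chromosome is the first column, i.e. the start of the line.
            line = "chr" + line
        yield line


vcf = ["#CHROM\tPOS\n", "1\t981984\n", "chrX\t100\n"]
fasta = [">1 dna:chromosome\n", "ACGT\n"]
print(list(add_chr_prefix(vcf, "table")))
print(list(add_chr_prefix(fasta, "fasta")))
```

After renaming the FASTA headers, the .fai index has to be regenerated, because it stores the old sequence names.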

@haimasree haimasree added the bug label Apr 9, 2021

3 participants