Update genome annotation file with multiple samples? #4

Open
jasonleongbio opened this issue Feb 3, 2023 · 1 comment · May be fixed by #28
Comments

@jasonleongbio

Hi, sorry for having posted so many issues recently.

I was wondering whether it is possible to update an existing genome annotation file with BAM files from multiple samples. I have several samples from the same species (a non-model organism), so I tried updating the annotation with the BAM file from one single-cell sample at a time. However, when I tried to update the already-updated GFF file with the BAM file from a different single-cell sample, I got the error message below. The chain was original GTF/GFF → updated version 1 → updated version 2, and it failed at the step from updated version 1 to updated version 2:

INFO     Iterating over reads to determine SPAT pileups:   0%|           | [20:53<?]
2023-02-01 22:28:29,771 - INFO - Merging SPAT outputs.
2023-02-01 22:28:30,058 - INFO - Creating gff db.
2023-02-01 22:28:30,060 - INFO - Calling peaks for forward strand with MACS2.
2023-02-01 22:28:30,069 - INFO - Calling peaks for reverse strand with MACS2.
2023-02-01 22:28:30,069 - INFO - Populating features
2023-02-01 22:28:30,069 - INFO - Populating features
2023-02-01 22:28:30,227 - INFO - Clearing cache.ons: 2000 features
Traceback (most recent call last):
  File "/~~~home_dir~~~/tools/miniconda3/envs/peaks2utr/bin/peaks2utr", line 8, in <module>
    sys.exit(main())
  File "/~~~home_dir~~~/tools/miniconda3/envs/peaks2utr/lib/python3.9/site-packages/peaks2utr/__init__.py", line 49, in main
    asyncio.run(_main())
  File "/~~~home_dir~~~/tools/miniconda3/envs/peaks2utr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/~~~home_dir~~~/tools/miniconda3/envs/peaks2utr/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/~~~home_dir~~~/tools/miniconda3/envs/peaks2utr/lib/python3.9/site-packages/peaks2utr/__init__.py", line 102, in _main
    db, _, _ = await asyncio.gather(
  File "/~~~home_dir~~~/tools/miniconda3/envs/peaks2utr/lib/python3.9/site-packages/peaks2utr/preprocess.py", line 120, in create_db
    await sync_to_async(gffutils.create_db)(gff_in, gff_db, force=True, verbose=True, disable_infer_genes=True, disable_infer_transcripts=True)
  File "/~~~home_dir~~~/tools/miniconda3/envs/peaks2utr/lib/python3.9/site-packages/asgiref/sync.py", line 448, in __call__
    ret = await asyncio.wait_for(future, timeout=None)
  File "/~~~home_dir~~~/tools/miniconda3/envs/peaks2utr/lib/python3.9/asyncio/tasks.py", line 442, in wait_for
    return await fut
  File "/~~~home_dir~~~/tools/miniconda3/envs/peaks2utr/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/~~~home_dir~~~/tools/miniconda3/envs/peaks2utr/lib/python3.9/site-packages/asgiref/sync.py", line 490, in thread_handler
    return func(*args, **kwargs)
  File "/~~~home_dir~~~/tools/miniconda3/envs/peaks2utr/lib/python3.9/site-packages/gffutils/create.py", line 1292, in create_db
    c.create()
  File "/~~~home_dir~~~/tools/miniconda3/envs/peaks2utr/lib/python3.9/site-packages/gffutils/create.py", line 507, in create
    self._populate_from_lines(self.iterator)
  File "/~~~home_dir~~~/tools/miniconda3/envs/peaks2utr/lib/python3.9/site-packages/gffutils/create.py", line 589, in _populate_from_lines
    self._insert(f, c)
  File "/~~~home_dir~~~/tools/miniconda3/envs/peaks2utr/lib/python3.9/site-packages/gffutils/create.py", line 530, in _insert
    cursor.execute(constants._INSERT, feature.astuple())
sqlite3.InterfaceError: Error binding parameter 11 - probably unsupported type.

The code I ran was:

peaks2utr --extend-utr -p 25 -o <GENOME_ANNOTATION>.update2.gff3 /path/to/<GENOME_ANNOTATION>.update1.gff3 /path/to/the/second/replicate/sample/outs/possorted_genome_bam.bam

Thank you so much

@haessar
Owner

haessar commented Feb 22, 2023

Interesting, I've not encountered this before. Allowing multiple BAM files as input is on the feature wish-list, so watch this space. In the meantime, can you verify that /path/to/<GENOME_ANNOTATION>.update1.gff3 is in the correct format (i.e. not an empty file)? This is one of the reasons we use genometools as a post-processing step (see issue #3): it validates that everything is correctly formatted. If you have access to genometools, could you try running "gt gff3 -sort -retainids -tidy /path/to/<GENOME_ANNOTATION>.update1.gff3" and check that no error is thrown?
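
If genometools is not at hand, a rough pre-check along these lines may help rule out an empty or obviously malformed intermediate file. This is only a minimal sketch, not part of peaks2utr or genometools: the function name `check_gff3` is hypothetical, and the checks (nine tab-separated columns, integer start and end, start not greater than end) cover only a small part of what "gt gff3" validates.

```python
import sys


def _to_int(text):
    """Return text as an int, or None if it is not a plain integer."""
    try:
        return int(text)
    except ValueError:
        return None


def check_gff3(path):
    """Hypothetical helper: return a list of (line_number, problem) tuples."""
    problems = []
    n_features = 0
    with open(path, encoding="utf-8") as handle:
        for lineno, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if line.startswith("##FASTA"):
                break  # embedded sequence section; no more feature lines
            if not line.strip() or line.startswith("#"):
                continue  # blank lines, comments and directives
            n_features += 1
            cols = line.split("\t")
            if len(cols) != 9:
                problems.append((lineno, f"expected 9 columns, found {len(cols)}"))
                continue
            start, end = _to_int(cols[3]), _to_int(cols[4])
            if start is None or end is None:
                problems.append((lineno, "start or end is not an integer"))
            elif start > end:
                problems.append((lineno, "start is greater than end"))
    if n_features == 0:
        # Line number 0 marks a file-level problem rather than a specific line.
        problems.append((0, "no feature lines found"))
    return problems


if __name__ == "__main__":
    # Usage: python check_gff3.py /path/to/annotation.gff3
    for lineno, problem in check_gff3(sys.argv[1]):
        print(f"line {lineno}: {problem}")
```

An empty result list only means that none of these basic checks failed; running the genometools command above remains the more thorough test.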

@haessar haessar linked a pull request Jan 12, 2024 that will close this issue