This repository has been archived by the owner on Apr 3, 2024. It is now read-only.

Merge pull request #6 from trstickland/master
Dockerfile and addition of optional commands
trstickland authored Jul 8, 2019
2 parents ec9f95c + 1d2a2a8 commit 2567fa4
Showing 11 changed files with 88 additions and 28 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -13,4 +13,4 @@ install:
env:
- GENOMETOOLS_PATH='/usr/bin/gt'
script:
- "./run_tests.sh"
- ". ./run_tests.sh"
31 changes: 31 additions & 0 deletions Dockerfile
@@ -0,0 +1,31 @@
FROM ubuntu:bionic

MAINTAINER [email protected]

ENV BASHRC /etc/bash.bashrc
ENV BUILD_DIR /gffmunger-build
ENV CONF_DIR /etc/gffmunger

RUN apt-get update -qq
RUN apt-get install -y genometools git python3 python-setuptools python3-biopython python3-pip

# RUN pip3 install dumper gffutils pyyaml

RUN grep GENOMETOOLS_PATH ${BASHRC} || bash -c "echo; echo 'export GENOMETOOLS_PATH=\"/usr/bin/gt\"'; echo" >> ${BASHRC}

COPY . ${BUILD_DIR}

COPY ./*config.yml ${CONF_DIR}/

RUN pip3 install ${BUILD_DIR} && \
bash -c "cd ${BUILD_DIR} && . ./run_tests.sh --verbose"

RUN bash -c "echo; echo 'alias gffmunger=\"gffmunger --config ${CONF_DIR}/gffmunger-config.yml\"'; echo" >> ${BASHRC}

VOLUME /var/data

CMD echo "Usage: docker run -v \`pwd\`:/var/data -it <IMAGE_NAME> bash" && \
echo "" && \
echo "This will place you in a shell with your current working directory accessible as /var/data." && \
echo "For help, type" && \
echo " gffmunger --help"
27 changes: 15 additions & 12 deletions README.md
@@ -1,4 +1,5 @@
# GFF munger

Munges GFF3 files exported from Chado

[![Build Status](https://travis-ci.org/sanger-pathogens/gffmunger.svg?branch=master)](https://travis-ci.org/sanger-pathogens/gffmunger)
@@ -18,7 +19,9 @@ Munges GFF3 files exported from Chado
* [Feedback/Issues](#feedbackissues)

## Introduction
Munges GFF3 files exported from Chado (http://www.gmod.org/) database to make them suitable for loading into WebApollo. Currently this involved transferring annotations from polypeptide features to the feature (e.g. mRNA) from which the polypeptide derives.
Munges GFF3 files exported from Chado (http://www.gmod.org/) database to make them suitable for loading into WebApollo.

Currently supports very few functions, but provides a possible framework for additional functionality.

## Installation
There are a number of ways to install GFF munger and details are provided below. If you encounter an issue when installing GFF munger please contact your local system administrator. If you encounter a bug please log it [here](https://github.com/sanger-pathogens/gffmunger/issues) or email us at [email protected].
@@ -36,25 +39,25 @@ Install GFF munger:

`conda install -c bioconda gffmunger`

### Debian/Ubuntu (Trusty/Xenial)
To install Python3 on Ubuntu, as root run:
```
apt-get update -qq
apt-get install -y git python3 python3-setuptools python3-biopython python3-pip
pip3 install git+git://github.com/sanger-pathogens/gffmunger.git
```
### Running the tests
The test can be run from the top level directory:
The test can be run from the top level directory:
```
./run_tests.sh
```
## Usage

## Synopsis

```
gffmunger [--input chado_export.gff3.gz] [--fasta chado_export.fasta] [--output webapollo_compatible.gff3] [--quiet|--verbose]
gffmunger [command1 ... commandN] [--input chado_export.gff3.gz] [--fasta chado_export.fasta] [--output webapollo_compatible.gff3] [--quiet|--verbose]
```

Without `--input`, will read from standard input; without `--output`, will write new GFF3 to standard output. If `--fasta` is not used, then will attempt to read FASTA data from the input GFF3 file.
### Commands

*move_polypeptide_annot* (default) transfers annotations from polypeptide features to the feature (e.g. mRNA) from which the polypeptide derives.

### Input/output options

Without `--input`, will read from standard input; without `--output`, will write new GFF3 to standard output. If `--fasta` is not used, then will read FASTA data (if present) from the input GFF3 file.
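
The input/output fallback described above can be sketched as follows. This is a minimal illustration, not gffmunger's actual code: the helper name `open_or_std` is hypothetical, and the sketch ignores gzip-compressed input and the `--fasta` option.

```python
import sys
from contextlib import contextmanager


@contextmanager
def open_or_std(path, mode):
    """Yield an open file for `path`, or stdin/stdout when `path` is None.

    Hypothetical helper: the standard streams are yielded as-is and are
    deliberately not closed when the block exits.
    """
    if path is None:
        yield sys.stdin if "r" in mode else sys.stdout
    else:
        with open(path, mode) as handle:
            yield handle
```

With a helper like this, the calling code can treat a named file and a standard stream identically, for example `with open_or_std(None, "w") as out: out.write("##gff-version 3\n")`.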

## License
GFF munger is free software, licensed under [GPLv3](https://github.com/sanger-pathogens/gffmunger/blob/master/LICENSE).
29 changes: 23 additions & 6 deletions gffmunger/GFFMunger.py
@@ -14,12 +14,16 @@
from pyfaidx import Fasta

class GFFMunger:

def __init__(self,options):

self.known_commands = ['move_polypeptide_annot', 'null']

# CLI options
if None == options:
# passing None allows the class to be constructed without options being defined; intended, for example, to permit tests
# *not* recommended for normal usage, for which the 'gffmunger' script is provided
self.commands = self.known_commands.copy()
self.verbose = False
self.quiet = False
self.novalidate = False
@@ -31,6 +35,7 @@ def __init__(self,options):
self.gt_path_arg = None
else:
# this should be the normal case
self.commands = options.commands
self.verbose = options.verbose
self.quiet = options.quiet
self.novalidate = options.no_validate
@@ -55,11 +60,18 @@ def setLogLevel(level):
else:
setLogLevel(logging.WARNING)

# check command(s)
for n,c in enumerate(self.commands):
if c in self.known_commands:
self.logger.info('Munge command '+str(n+1)+': '+c)
else:
raise ValueError('Munge command "'+c+'" not recognized')

# options from configuration file
config_filename = os.path.join(os.path.dirname(os.path.realpath(__file__)), '..', self.config_file)
try:
config_fh = open(config_filename, 'r');
self.config = yaml.load(config_fh)
self.config = yaml.safe_load(config_fh)
except Exception:
self.logger.critical("Can't read configuration file "+config_filename)
raise
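
The command check added in this hunk can be read in isolation as the sketch below. It assumes a standalone function, `check_commands`, which is a hypothetical name (the commit does the check inline in `__init__`); the command names and the error message are taken from the diff.

```python
import logging

# Command names as listed in self.known_commands in the diff.
KNOWN_COMMANDS = ("move_polypeptide_annot", "null")


def check_commands(commands, known=KNOWN_COMMANDS, logger=None):
    """Return the commands as a list if all are known; raise ValueError otherwise."""
    logger = logger or logging.getLogger(__name__)
    for position, command in enumerate(commands, start=1):
        if command not in known:
            raise ValueError('Munge command "%s" not recognized' % command)
        logger.info("Munge command %d: %s", position, command)
    return list(commands)
```

Failing on the first unknown command, before any file is read, keeps a typo on the command line from silently producing an unmodified GFF3.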
@@ -148,17 +160,22 @@ def run(self):
self.import_fasta(self.fasta_file_arg)
# read GFF3 metadata (and possibly other bits) into text buffer(s)
self.extract_GFF3_components(self.gff3_input_filename)
# transfer annotations from polypeptide features to the feature they derived from
self.move_annotations()

if 'move_polypeptide_annot' in self.commands:
self.logger.info('transferring polypeptide feature annotations')
# transfer annotations from polypeptide features to the feature they derived from
self.move_polypeptide_annotations()

# write new GFF3 to file or stdout
self.export_gff3()
# if GFF3 file was written, validate it if required
if self.output_file is not None and not self.novalidate:
self.validate_GFF3(self.output_file)
# whack temporary files

except Exception:
self.clean_up()
raise

self.clean_up()
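
The control flow of `run()` after this change (an optional munge step gated on the command list, then export, with clean-up on both the success and failure paths) can be sketched as below. The class `MungerSketch` and its stub methods are hypothetical stand-ins; the sketch also swaps the commit's `except Exception: clean_up(); raise` pattern for `try`/`finally`, which has the same effect here with a single clean-up call.

```python
class MungerSketch:
    """Hypothetical stand-in for GFFMunger that only records which steps ran."""

    def __init__(self, commands):
        self.commands = list(commands)
        self.steps = []

    def move_polypeptide_annotations(self):
        self.steps.append("move_polypeptide_annot")

    def export_gff3(self):
        self.steps.append("export")

    def clean_up(self):
        self.steps.append("clean_up")

    def run(self):
        try:
            # Only munge when the command was requested; 'null' skips this step.
            if "move_polypeptide_annot" in self.commands:
                self.move_polypeptide_annotations()
            self.export_gff3()
        finally:
            # Runs whether or not an exception was raised above.
            self.clean_up()
        return self.steps
```

Calling `MungerSketch(["null"]).run()` exports and cleans up without touching annotations, which is the behaviour the `null` command is described as providing.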


@@ -377,7 +394,7 @@ def append(new, buf=''):



def move_annotations(self):
def move_polypeptide_annotations(self):
"""moves annotations from the polypeptide feature to the feature from which it derives (e.g. mRNA)"""
num_polypeptide=0
# this list caches all modified Feature objects
Mode changed 100755 → 100644 (no content changes): gffmunger/tests/data/SAMPLE_INCL_FASTA.gff3.gz
Mode changed 100755 → 100644 (no content changes): gffmunger/tests/data/SMALL_SAMPLE_INCL_FASTA.gff3.gz
8 changes: 4 additions & 4 deletions gffmunger/tests/io_tests.py
@@ -49,7 +49,7 @@ def test_000_gff3_with_fasta_io(self):
with warnings.catch_warnings():
warnings.filterwarnings("ignore", "unclosed file <_io\.TextIOWrapper", ResourceWarning, "gffutils", 668 )
try:
gffmunger.move_annotations()
gffmunger.move_polypeptide_annotations()
except:
self.fail("Failed to process valid GFF3 file "+test_gff_file)
warnings.resetwarnings()
@@ -76,7 +76,7 @@ def test_020_gff3_i0(self):
self.assertIsInstance(newmunger.import_fasta(), pyfaidx.Fasta)
self.assertIsInstance(newmunger.import_fasta(test_fasta_file), pyfaidx.Fasta)
try:
newmunger.move_annotations()
newmunger.move_polypeptide_annotations()
except:
self.fail("Failed to process valid GFF3 file "+test_gff_no_fasta)
warnings.resetwarnings()
@@ -96,9 +96,9 @@ def test_050_gff_error_handling(self):
try:
oldloglevel = yet_another_munger.logger.level
yet_another_munger.logger.setLevel(logging.CRITICAL)
yet_another_munger.move_annotations()
yet_another_munger.move_polypeptide_annotations()
yet_another_munger.logger.setLevel(oldloglevel)
except AssertionError:
self.fail("AssertionError should not be raised by GFFMunger.move_annotations() when processing annotations in "+broken_gff_file)
self.fail("AssertionError should not be raised by GFFMunger.move_polypeptide_annotations() when processing annotations in "+broken_gff_file)
warnings.resetwarnings()
yet_another_munger.clean_up()
Mode changed 100755 → 100644 (no content changes): gffutils-dumper.py
Mode changed 100755 → 100644 (no content changes): run_tests.sh
14 changes: 11 additions & 3 deletions scripts/gffmunger
100755 → 100644
@@ -15,7 +15,7 @@ version = ''
try:
version = pkg_resources.get_distribution("gffmunger").version
except pkg_resources.DistributionNotFound:
version = '0.0.1'
version = '0.0.2'

# looking for this file
config_filename = 'gffmunger-config.yml'
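
The version lookup with a hard-coded fallback shown in this hunk can be sketched as below. Note the swap: the script uses `pkg_resources.get_distribution`, whereas this sketch uses the standard library's `importlib.metadata` (Python 3.8+) to stay self-contained. The function name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name, fallback):
    """Return the installed distribution's version, or `fallback` if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # e.g. running from a source checkout that was never pip-installed
        return fallback
```

For example, `installed_version("gffmunger", "0.0.2")` would report the installed version where one exists and `'0.0.2'` otherwise, which is why the fallback string has to be bumped alongside `setup.py`.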
@@ -28,9 +28,17 @@ if os.path.exists(config_sys_path):
if config_file_path is None or not os.path.exists(config_file_path):
config_file_path = os.path.normpath(os.path.join(os.path.dirname(os.path.realpath(__file__)),'..',config_filename))

parser = argparse.ArgumentParser( description = 'A munger of GFF files. Proper description to follow.',
usage = __file__+' [options]', formatter_class=argparse.ArgumentDefaultsHelpFormatter
parser = argparse.ArgumentParser( description = "Munges GFF files. Use one or more of the following commands:\n"# 80 chars --->|
+ " move_polypeptide_annot transfer annotations from polypeptides to the\n"
+ " feature (e.g. mRNA) they derive from\n"
+ " null do nothing\n",
#usage = __file__+' [command1 .. commandN] [options]',
#formatter_class = argparse.ArgumentDefaultsHelpFormatter,
formatter_class = argparse.RawTextHelpFormatter
)

parser.add_argument('commands', default = ['move_polypeptide_annot'], metavar='command', type=str, nargs='*', help = "Command(s) defining how the GFF should be munged")

parser.add_argument('--verbose', action='store_true', default = False, help = 'Turn on debugging [%(default)s]')
parser.add_argument('--quiet', '-q', action='store_true', default = False, help = 'Suppress messages & warnings [%(default)s]')
parser.add_argument('--no-validate', '-n', action='store_true', default = False, help = 'Do not validate the input GFF3 [%(default)s]')
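
How the new positional `commands` argument behaves can be checked with the reduced parser below. It is a sketch only: `build_parser` is a hypothetical name, and only the `commands` and `--verbose` arguments from the diff are reproduced. With `nargs='*'` on a positional, argparse falls back to `default` when no command is given.

```python
import argparse


def build_parser():
    """Reduced, hypothetical version of the gffmunger argument parser."""
    parser = argparse.ArgumentParser(description="Munges GFF files.")
    parser.add_argument(
        "commands",
        nargs="*",
        metavar="command",
        type=str,
        default=["move_polypeptide_annot"],
        help="Command(s) defining how the GFF should be munged",
    )
    parser.add_argument("--verbose", action="store_true", default=False)
    return parser


if __name__ == "__main__":
    # No command given: the default list is used.
    print(build_parser().parse_args([]).commands)
    # Explicit commands replace the default rather than extending it.
    print(build_parser().parse_args(["null", "--verbose"]).commands)
```

Because explicit commands replace the default, running `gffmunger null` performs no munging at all, matching the "do nothing" description in the help text.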
5 changes: 3 additions & 2 deletions setup.py
@@ -7,7 +7,7 @@
def read(fname):
return open(os.path.join(os.path.dirname(__file__), fname)).read()

version = '0.0.1'
version = '0.0.2'
if os.path.exists('VERSION'):
version = open('VERSION').read().strip()

@@ -27,7 +27,8 @@ def read(fname):
install_requires=[
'biopython >= 1.68',
#'pyfastaq >= 3.12.0'
'gffutils'
'gffutils', # no version requirements known; tested with 0.9
'pyyaml' # no version requirements known; tested with 5.1.1
],
license='GPLv3',
classifiers=[