GitHub - abiggerhammer/bbtools: Tools and other possibly useful things for use with BioBricks and other aspects of synthetic biology.

abiggerhammer / bbtools Public

Notifications You must be signed in to change notification settings
Fork 3
Star 6

Tools and other possibly useful things for use with BioBricks and other aspects of synthetic biology.

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
sbdtp		sbdtp
README		README
allbricks-04092009.tar.bz2		allbricks-04092009.tar.bz2
errors-04092009.log		errors-04092009.log
getattributes.py		getattributes.py
secondarygetattributes.py		secondarygetattributes.py

Repository files navigation

06/09/2009, 11:20am:
====================
fasta2datastore.py works, with robust logging. python fasta2datastore -h for instructions; note that you will need to
have AppEngine (i.e., whatever directory contains google/appengine/...) in your PYTHONPATH. 

06/09/2009:
===========
Working on a rather hackish little script, fasta2datastore.py, which will populate a datastore using the local
development server; this can then be exported and uploaded en masse to appspot. I don't recommend that anyone else use 
the approach I'm using, btw; I shouldn't be using it myself. :P

05/09/2009:
===========
sbdtp/properties.py defines a custom property, DNAProperty, for use with Google AppEngine. It stores DNA sequences using
Bio.Seq from biopython. Note that for some reason, Bio.Seq doesn't actually validate whether the sequence conforms to the
chosen Alphabet, but this property does take care of that, using its .validate() method.

AppEngine doesn't provide a facility for filtering on custom properties, but I'm going to modify the fulltext search tool
(see http://github.com/DocSavage/appengine-search/tree/master) so that we can search for substrings in a sequence, using
the .find() method of Bio.Seq.

04/09/2009:
===========
getattributes.py and secondarygetattributes.py are a couple of python scripts for scraping the DAS server at
partsregistry.org (or any DAS server, really). 

allbricks-DDMMYYYY.tar.bz2 is the features and sequence scraped for all bricks that these scripts could find on DDMMYYYY.

Usage:

$ python getattributes.py http://path/to/DAS/entry_points

This will pull the entry points list from the DAS server and then fetch the features and sequence for each entry point,
putting them into subdirectories (features/ and sequences/) of whatever directory you run the script in.

secondarygetattributes.py is BioBricks-specific. For some reason, not every valid BioBrick appears in the entry_points
list, but a lot of them appear as features. This script finds named BioBricks in features files and grabs their features
and sequences from the DAS server.

secondarygetattributes.py takes no arguments, just run the script.

Dependencies: lxml (easy_install lxml if your system has easy_install, http://codespeak.net/lxml if not)

Known issues:
 - lxml chokes on certain characters in text elements, e.g. '<' and '>'. This means that secondarygetattributes can't do
   anything with those files, so we may still be missing some data.
 - Not currently checking for any kind of HTTP errors, so any response other than 200 and the script dies. If this happens,
   just restart it; it won't bother to redownload anything that it already has both the features and sequence for.
 - Tags to use for identifying BioBrick names (e.g. SEGMENT and TYPE) are hardcoded; this should really be a commandline
   option.
 - Lots of duplicated code which really ought to be refactored out into a module.
 - Some form of commandline help / usage notes would be polite.