Tools for using Picard and GATK with the Google Genomics API.

jakeakopp/gatk-tools-java

 
 

gatk-tools-java

Tools for using Picard and GATK with the Genomics API.

  • Common classes for getting Reads from the GA4GH Genomics API and exposing them as a SAMRecord "Iterable" resource.

  • An implementation of a custom reader that can be plugged into Picard tools to read input data that is specified via a URL and served by the GA4GH API.

  • A set of shell scripts (src/main/scripts) that demonstrate how to run Picard tools with the GA4GH custom reader.

  • Requires htsjdk version 1.128 or later and a recent version of Picard (later than this commit: https://github.com/iliat/picard/commit/ebe987313d799d58b0673351b95d3ca91fed82bf).

  • You can download Picard from http://broadinstitute.github.io/picard/ and build it according to the instructions there.

Build:
To build with ant, run: ant gatk-tools-java-jar

Note that the examples below assume you have built with ant, which produces dist/gatk-tools-java-1.0.jar. They also assume that the picard folder sits side by side with the gatk-tools-java folder.

The typical command line would look like:

java -jar \
-Dsamjdk.custom_reader=https://www.googleapis.com/genomics,<reader factory class>,<location of gatk-tools-java jar> \
-Dga4gh.client_secrets=<location of client_secrets.json> \
dist/picard.jar <ToolName> \
INPUT=<input url>

For example:

java -jar \
-Dsamjdk.custom_reader=https://www.googleapis.com/genomics,com.google.cloud.genomics.gatk.htsjdk.GA4GHReaderFactory,\
`pwd`/dist/gatk-tools-java-1.0.jar \
-Dga4gh.client_secrets=client_secrets.json \
../picard/dist/picard.jar ViewSam \
INPUT=https://www.googleapis.com/genomics/v1beta2/readgroupsets/CK256frpGBD44IWHwLP22R4/

The test read group set used here is ex1_sorted.bam, which can be found in the testdata/ folder.
The data has been uploaded to the cloud project: https://console.developers.google.com/storage/browser/gatk-tools-java/.

The dataset id is 15448427866823121459 and the read group set id is CK256frpGBD44IWHwLP22R4.
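
In the example command, the samjdk.custom_reader value is made of three comma-separated parts: the URL prefix, the reader factory class, and the path to the gatk-tools-java jar. The sketch below assembles that value from variables and prints the resulting command instead of running it. The helper name build_custom_reader and the variable names are illustrative only and are not part of this project.

```shell
#!/bin/sh
# build_custom_reader URL_PREFIX FACTORY_CLASS JAR_PATH
# Illustrative helper (not part of gatk-tools-java): joins the three parts
# of the samjdk.custom_reader value with commas.
build_custom_reader() {
  printf '%s,%s,%s' "$1" "$2" "$3"
}

# Values copied from the example command in this README.
URL_PREFIX="https://www.googleapis.com/genomics"
FACTORY_CLASS="com.google.cloud.genomics.gatk.htsjdk.GA4GHReaderFactory"
JAR_PATH="$(pwd)/dist/gatk-tools-java-1.0.jar"

CUSTOM_READER="$(build_custom_reader "$URL_PREFIX" "$FACTORY_CLASS" "$JAR_PATH")"

# Print the command rather than running it, so the sketch can be tried
# without Picard or client_secrets.json in place.
echo java -jar \
  "-Dsamjdk.custom_reader=$CUSTOM_READER" \
  "-Dga4gh.client_secrets=client_secrets.json" \
  ../picard/dist/picard.jar ViewSam \
  "INPUT=https://www.googleapis.com/genomics/v1beta2/readgroupsets/CK256frpGBD44IWHwLP22R4/"
```

Removing the leading echo would run the command itself, under the same assumptions about folder layout as the example.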

To build with Maven, run mvn compile followed by mvn bundle:bundle.
Note that the Maven build produces gatk-tools-java-1.1-SNAPSHOT.jar.
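
Because the ant and Maven builds produce differently named jars, a wrapper script may want to pick whichever one exists. The following is a minimal sketch of that idea. The helper name first_existing is hypothetical, and the target/ directory is an assumption based on Maven's default output location; this README does not state where the Maven jar is written.

```shell
#!/bin/sh
# first_existing PATH...
# Hypothetical helper: prints the first argument that names an existing
# regular file, or returns non-zero if none of them exist.
first_existing() {
  for candidate in "$@"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# dist/ is where the ant build writes its jar. target/ is an ASSUMPTION
# (Maven's default output directory), not something this README states.
if GATK_TOOLS_JAR="$(first_existing \
    dist/gatk-tools-java-1.0.jar \
    target/gatk-tools-java-1.1-SNAPSHOT.jar)"; then
  echo "Using $GATK_TOOLS_JAR"
else
  echo "No gatk-tools-java jar found; build with ant or Maven first" >&2
fi
```

The selected path could then be used as the jar location in the -Dsamjdk.custom_reader value.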

  • For Picard tools that have not yet been instrumented to work with a custom reader, you can use Ga4GHPicardRunner. It is a wrapper around Picard tools that allows their INPUT arguments to be ga4gh:// URLs: it reads the data via the API and uses pipes to send it to the Picard tool.
