This repository has been archived by the owner on Mar 8, 2022. It is now read-only.

Picard ELC Library for concurrent DNA computing. Performance proposals.


GoodforGod/ConcurrentPicardELC


Custom Picard ELC Implementation

A set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats.

All credit goes to the Broad Institute [ORIGINAL REPO]

What and Why

The main goal was to redesign Estimated Library Complexity (ELC) from a sequential to a concurrent implementation. SortingCollection was also rewritten to be concurrent and thread-safe.

ELC Library Changes

There are three concurrent implementations: [STREAM], the cleanest and best-performing one, and two others whose code is less clean and which may perform worse.

The [EXECUTOR] version also contains the predicates, abstract collections, and other helper classes that are used by all of the concurrent implementations.

P.S. All helper and support classes and predicates were grouped in [EXECUTOR] intentionally.

Estimated Library Complexity [ORIGINAL]

Custom Estimated Library Complexity [STREAM | POOL | EXECUTOR]
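
As a rough illustration of what moving from a sequential to a concurrent implementation can look like, the sketch below processes independent groups of records three ways: in a plain loop, with a parallel stream, and with an ExecutorService. It is not code from this repository. The class `ConcurrentGroupsSketch`, the `countDuplicates` stand-in, and the assumption that groups can be processed independently are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: none of these names come from Picard or this repository.
final class ConcurrentGroupsSketch {

    /** Stand-in for per-group work: counts entries that repeat an earlier entry. */
    static long countDuplicates(List<String> group) {
        return group.size() - new HashSet<>(group).size();
    }

    /** Sequential baseline: one group after another on the calling thread. */
    static long sequential(List<List<String>> groups) {
        long total = 0;
        for (List<String> group : groups) {
            total += countDuplicates(group);
        }
        return total;
    }

    /** Stream style: the common fork-join pool spreads groups across cores. */
    static long streamed(List<List<String>> groups) {
        return groups.parallelStream()
                .mapToLong(ConcurrentGroupsSketch::countDuplicates)
                .sum();
    }

    /** Executor style: one task per group on a fixed-size thread pool. */
    static long withExecutor(List<List<String>> groups, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (List<String> group : groups) {
                Callable<Long> task = () -> countDuplicates(group);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get(); // waits for that task to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> groups = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("c", "c", "c"),
                Arrays.asList("d"));
        System.out.println(sequential(groups));      // prints 3
        System.out.println(streamed(groups));        // prints 3
        System.out.println(withExecutor(groups, 2)); // prints 3
    }
}
```

All three variants return the same total. They differ only in where the per-group work runs, which is the property a concurrent rewrite has to preserve.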

Sorting Collection

Sorting Collection [ORIGINAL]

Custom Sorting Collection [CUSTOM]

Other

An abstraction wrapper over PeekableIterator, used to read and filter sorted files asynchronously. [QueueProducer]
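
The general idea of reading and filtering on a background thread can be sketched as follows. This is a minimal illustration, not the repository's QueueProducer. It uses a plain `java.util.Iterator` as a stand-in for PeekableIterator, and the class name `AsyncFilteringProducer`, its constructor parameters, and the end-of-stream convention are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Predicate;

// Hypothetical sketch: the class and method names are illustrative and are
// not taken from the repository's QueueProducer.
final class AsyncFilteringProducer<T> {
    // Optional.empty() is the end-of-stream marker, so items must be non-null.
    private final BlockingQueue<Optional<T>> queue;

    AsyncFilteringProducer(Iterator<T> source, Predicate<T> filter, int capacity) {
        BlockingQueue<Optional<T>> q = new ArrayBlockingQueue<>(capacity);
        this.queue = q;
        Thread worker = new Thread(() -> {
            try {
                while (source.hasNext()) {
                    T item = source.next();
                    if (filter.test(item)) {
                        q.put(Optional.of(item)); // blocks while the queue is full
                    }
                }
                q.put(Optional.empty()); // signal that the source is exhausted
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop producing if interrupted
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive for the background reader
        worker.start();
    }

    /** Blocks for the next accepted item; an empty Optional means the source is exhausted. */
    Optional<T> take() throws InterruptedException {
        return queue.take();
    }

    /** Drains a small in-memory stand-in for a sorted file, keeping only even numbers. */
    static List<Integer> demo() {
        Iterator<Integer> sorted = Arrays.asList(1, 2, 3, 4, 5, 6).iterator();
        AsyncFilteringProducer<Integer> producer =
                new AsyncFilteringProducer<>(sorted, n -> n % 2 == 0, 2);
        List<Integer> accepted = new ArrayList<>();
        try {
            for (Optional<Integer> next = producer.take(); next.isPresent(); next = producer.take()) {
                accepted.add(next.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [2, 4, 6]
    }
}
```

The bounded queue gives back-pressure: the reader thread pauses when the consumer falls behind, so memory use stays capped at `capacity` items. The consumer should stop calling `take()` once it has seen the empty marker, since the marker is delivered only once.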

Performance Results

First Test

| Heap Size | BAM File Size | JDK | Processor |
|---|---|---|---|
| 256 MB | 464 MB | 1.8.0_121 (64-bit) | i3-4030u 1x2x2 |

Result

| Implementation Name | Average Time (ms) | Number of Iterations | MaxPairsInMemory |
|---|---|---|---|
| Default | 55541.009 | 5 | 169196 |
| ConcurrentExecutorELC | 37775.565 | 5 | 169196 |
| ConcurrentPoolELC | 39255.594 | 5 | 169196 |
| ConcurrentStreamedELC | 40355.021 | 5 | 169196 |
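  • The best custom implementation (ConcurrentExecutorELC) is about 47% faster than the default (55541.009 ms vs. 37775.565 ms, roughly a 1.47x speedup).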

Second Test

| Heap Size | BAM File Size | JDK | Processor |
|---|---|---|---|
| 4 GB | 16.9 GB | 1.8.0_121 (64-bit) | i3-4030u 1x2x2 |

Result

| Implementation Name | Average Time (ms) | Number of Iterations |
|---|---|---|
| Default | 1972577.978 | 5 |
| ConcurrentExecutorELC | 1253195.879 | 5 |
  • The custom implementation is 57.4% faster than the default (1972577.978 ms vs. 1253195.879 ms, roughly a 1.574x speedup).
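
The "faster" percentages in both results can be reproduced from the average times in the tables if "faster" is read as the ratio of default time to custom time, minus one. The sketch below shows that arithmetic alongside the alternative "wall-clock time saved" reading. The class `SpeedupSketch` and its method names are hypothetical helpers for illustration and are not part of the repository.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the repository.
final class SpeedupSketch {

    /** Throughput-style reading: (baseline / candidate - 1) * 100. */
    static double percentFaster(double baselineMs, double candidateMs) {
        return (baselineMs / candidateMs - 1.0) * 100.0;
    }

    /** Wall-clock reading: share of the baseline time that is no longer spent. */
    static double percentTimeSaved(double baselineMs, double candidateMs) {
        return (1.0 - candidateMs / baselineMs) * 100.0;
    }

    public static void main(String[] args) {
        // Average times (ms) taken from the two result tables.
        double[][] tests = {
                {55541.009, 37775.565},     // first test: Default vs. ConcurrentExecutorELC
                {1972577.978, 1253195.879}  // second test: Default vs. ConcurrentExecutorELC
        };
        for (double[] t : tests) {
            System.out.println(String.format(Locale.ROOT,
                    "%.1f%% faster, %.1f%% less wall-clock time",
                    percentFaster(t[0], t[1]), percentTimeSaved(t[0], t[1])));
        }
    }
}
```

With the table values, `percentFaster` gives roughly 47.0% for the first test and 57.4% for the second, while the same runs correspond to roughly 32% and 36% less wall-clock time.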

Building Picard

  • To build a fully-packaged, runnable Picard jar with all dependencies included, run:
    ./gradlew shadowJar
  • The resulting jar will be in build/libs. To run it, the command is:
    java -jar build/libs/picard.jar
    
    or
    
    java -jar build/libs/picard-<VERSION>-all.jar 
  • To build a jar containing only Picard classes (without its dependencies), run:
    ./gradlew jar
  • To clean the build directory, run:
    ./gradlew clean

Credits

[ORIGINAL REPO]
