BioDart is an experimental effort to develop a bioinformatics toolkit in pure Dart. The goal of this project is to leverage the performance, ease of use, and especially the portability of Dart to create tools that can be used across multiple platforms with minimal to no dependencies.
- Dart-only implementation to make the code easy to maintain and understand.
- Available everywhere Dart runs, including Linux, macOS, Windows, web, and mobile.
- Minimal dependencies to avoid dependency hell.
- Optimized for performance.
- Type-safe implementations of common bioinformatics algorithms and data structures.
Dart is a relatively modern language with a strong type system and a focus on performance. It is also easy to learn and use, and has a strong community behind it. This repo exists to try to ease the dependency hell that bioinformatics tools often face.
Important
The contents of this repo are still in the very early stages of development and are not yet ready for production use. Refer to the Roadmap Section for more information on the current state of the project.
Note
This project is currently maintained only by me, and its priorities follow the needs of my own projects, but feature requests and contributions are welcome. Feel free to get in contact if you would like to help with development.
In the spirit of the project, the only third-party dependency this package has is the readers package, which provides a source- and format-agnostic IO framework. That package is also designed and maintained by me and is available here.
Main module containing common genomics data structures and algorithms.
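As an illustration of the kind of type-safe structure this module is meant to hold, here is a minimal sketch of a genomic interval with an overlap check. The class shape, the field names, and the 0-based half-open coordinate convention are assumptions made for this example; they are not taken from biodart's actual API.

```dart
// Hypothetical sketch: names and the coordinate convention are assumptions,
// not biodart's actual API.
enum Strand { forward, reverse, unknown }

class GenomicRange {
  final String chromosome;
  final int start; // 0-based, inclusive
  final int end; // exclusive
  final Strand strand;

  GenomicRange(this.chromosome, this.start, this.end,
      {this.strand = Strand.unknown}) {
    if (start < 0 || end < start) {
      throw ArgumentError('Invalid range: $start-$end');
    }
  }

  /// Number of bases covered by the range.
  int get length => end - start;

  /// Two ranges overlap when they share a chromosome and at least one base.
  bool overlaps(GenomicRange other) =>
      chromosome == other.chromosome &&
      start < other.end &&
      other.start < end;
}

void main() {
  final a = GenomicRange('chr1', 100, 200);
  final b = GenomicRange('chr1', 150, 250, strand: Strand.forward);
  final c = GenomicRange('chr2', 150, 250);
  print(a.overlaps(b)); // true
  print(a.overlaps(c)); // false
  print(a.length); // 100
}
```

With half-open coordinates, adjacent ranges such as 100-200 and 200-300 do not overlap, which keeps length and overlap arithmetic free of off-by-one adjustments.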
Utility module to aid in benchmarking biodart features. It provides methods to download and cache large files. The module will be expanded in the near future to keep up with benchmarking and testing needs.
Module for reading Hi-C contact maps. Currently it supports only V8 files (format specification), though support for V9 and older formats is being actively worked on.
Warning
The hic package was completely broken by the paradigm change in the readers package. I am working on a fix.
Module for reading and writing FASTA files. This module is currently in a semi-usable state and is being actively worked on. See the subdirectory's example folder for more information.
Note
When dealing with the same task, biodart's implementation of sequence parsing appears to match BioPython's performance. For more information, please check the benchmark package and benchmark/scripts.
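To make the parsing task concrete, the sketch below parses FASTA text held in memory, splitting each header into an identifier and a description and joining multiline sequences. It uses only Dart's core libraries. `FastaRecord` and `parseFasta` are hypothetical names for this example and do not reflect biodart's actual API or its readers-based IO.

```dart
import 'dart:convert';

// Hypothetical record type for this sketch, not biodart's actual class.
class FastaRecord {
  final String id;
  final String description;
  final String sequence;
  FastaRecord(this.id, this.description, this.sequence);
}

/// Parses FASTA-formatted [text] into records.
/// Throws a [FormatException] if sequence data appears before any header.
List<FastaRecord> parseFasta(String text) {
  final records = <FastaRecord>[];
  String? header;
  final buffer = StringBuffer();

  // Emits the record collected so far, if any, and resets the buffer.
  void flush() {
    final h = header;
    if (h == null) return;
    final space = h.indexOf(' ');
    final id = space == -1 ? h : h.substring(0, space);
    final description = space == -1 ? '' : h.substring(space + 1).trim();
    records.add(FastaRecord(id, description, buffer.toString()));
    buffer.clear();
  }

  for (final rawLine in const LineSplitter().convert(text)) {
    final line = rawLine.trim();
    if (line.isEmpty) continue;
    if (line.startsWith('>')) {
      flush();
      header = line.substring(1).trim();
    } else if (header == null) {
      throw FormatException('Sequence data found before the first header');
    } else {
      // Multiline sequences are concatenated without separators.
      buffer.write(line);
    }
  }
  flush();
  return records;
}

void main() {
  const text = '>seq1 first example\nACGT\nTTGA\n>seq2\nGGCC\n';
  for (final r in parseFasta(text)) {
    print('${r.id}: ${r.sequence}');
  }
  // seq1: ACGTTTGA
  // seq2: GGCC
}
```

This version loads the whole input into a string for simplicity; a streaming design would be the natural choice for large genome files.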
- Fundamental data structures
  - GenomicRange, Strand, Chromosome implementations
  - GenomeReference implementation
- Format readers
  - FASTA
    - Header and sequence identifier parsing
    - Nucleotide sequence reading
    - Multiline sequence handling
    - Read Structure Validation
    - Full Unit Tests.
  - FASTQ
    - Header and sequence identifier parsing
    - Nucleotide sequence reading
    - Quality score parsing
    - Quality score validation
    - Full Unit Tests.
  - SAM/BAM: Roadmap not yet created
  - VCF: Roadmap not yet created
  - GFF: Roadmap not yet created
  - BED: Roadmap not yet created
  - [-] Hi-C formats broken
    - [-] V8 format support broken
    - Legacy version support
    - V9 format implementation
- FASTA
- Common algorithms
  - Sequence alignment: Roadmap not yet created
- Testing
  - Unit test coverage
  - Integration testing
  - Performance benchmarks
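
The quality score parsing and validation items in the FASTQ roadmap can be sketched as follows. This assumes the common Phred+33 encoding, where each quality character's code unit minus 33 gives the score. `decodePhred33` is a hypothetical helper written for this example, not part of biodart.

```dart
/// Decodes a Phred+33 quality string into integer scores.
/// Hypothetical helper for illustration, not biodart's actual API.
/// If [expectedLength] is given, the quality string must have exactly that
/// length, since FASTQ requires one quality character per base.
List<int> decodePhred33(String quality, {int? expectedLength}) {
  if (expectedLength != null && quality.length != expectedLength) {
    throw FormatException(
        'Quality length ${quality.length} does not match '
        'sequence length $expectedLength');
  }
  final scores = <int>[];
  for (final unit in quality.codeUnits) {
    // Valid characters are printable ASCII from '!' (33) to '~' (126).
    if (unit < 33 || unit > 126) {
      throw FormatException('Invalid quality character code: $unit');
    }
    scores.add(unit - 33);
  }
  return scores;
}

void main() {
  const sequence = 'ACGT';
  const quality = 'I5!~';
  print(decodePhred33(quality, expectedLength: sequence.length));
  // [40, 20, 0, 93]
}
```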