qcbc

qcbc is a Python package for quality control of synthetic barcode sequences used in orthogonal sequencing-based assays.

Installation

The latest release can be installed with

pip install qcbc

The development version can be installed with

pip install git+https://github.com/pachterlab/qcbc

Run qcbc on your own barcode list in Google Colab.

Usage

qcbc consists of five subcommands:

$ qcbc
usage: qcbc [-h] [--verbose] <CMD> ...

qcbc 0.0.2: Format sequence specification files

positional arguments:
  <CMD>
    ambiguous  find barcodes with shared subsequence
    content    compute base distribution (A,T,C,G counts/frequencies)
    homopolymer
               compute homopolymer distribution (length > 2)
    pdist      compute pairwise distance
    volume     compute size of barcode space

Barcode files are expected to contain two columns, the barcode sequence and a name associated with the barcode, separated by a tab. For example:

$ cat barcodes.txt
AGCAGTTACAG tag1
CTTGTACCCAG tag2

$ cat -t barcodes.txt
AGCAGTTACAG^Itag1
CTTGTACCCAG^Itag2

Note that cat -t file.txt displays each <tab> as ^I, so it can be used to verify that the file is set up correctly.
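The same format check can be done programmatically. The sketch below parses a tab-separated barcode file into (sequence, name) pairs and fails loudly on lines that do not have exactly two tab-separated fields. read_barcodes is a hypothetical helper for illustration, not part of qcbc's API, and the restriction to the bases A, C, G and T is an assumption.

```python
import io
from typing import List, TextIO, Tuple

# Assumption: barcodes use only the uppercase bases A, C, G, T.
VALID_BASES = set("ACGT")


def read_barcodes(handle: TextIO) -> List[Tuple[str, str]]:
    """Parse a two-column, tab-separated barcode file into (sequence, name) pairs.

    Hypothetical helper for illustration; this is not qcbc's own parser.
    """
    barcodes = []
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 tab-separated fields, got {len(fields)}"
            )
        seq, name = fields
        if not set(seq) <= VALID_BASES:
            raise ValueError(f"line {lineno}: unexpected character in {seq!r}")
        barcodes.append((seq, name))
    return barcodes


ok = io.StringIO("AGCAGTTACAG\ttag1\nCTTGTACCCAG\ttag2\n")
print(read_barcodes(ok))
# [('AGCAGTTACAG', 'tag1'), ('CTTGTACCCAG', 'tag2')]

bad = io.StringIO("AGCAGTTACAG tag1\n")  # a space instead of a tab
try:
    read_barcodes(bad)
except ValueError as err:
    print(err)  # line 1: expected 2 tab-separated fields, got 1
```

A space-separated file is the most common mistake this catches, which is the same problem cat -t makes visible.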

qcbc ambiguous: find barcodes with shared subsequence

Find barcodes that share subsequences of a given length.

qcbc ambiguous -l <length> <bc_file>
  • optionally, -rc can be used to check the reverse complement of the subsequences.
  • <length> corresponds to the subsequence length used to evaluate ambiguity between barcodes.
  • <bc_file> corresponds to the barcode file.

Examples

# check ambiguous barcodes by subsequences of length 3
$ qcbc ambiguous -l 3 barcodes.txt
CAG	tag1,tag1,tag2
TAC	tag1,tag2
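The idea behind ambiguous can be sketched by sliding a window of length k over every barcode and grouping barcode names by the subsequence (k-mer) they contain. qcbc's exact reporting rule and the way -rc adds reverse complements are not described here, so this sketch assumes a k-mer is reported when it occurs in at least two distinct barcodes, lists a name once per occurrence (as in "tag1,tag1,tag2" above), and ignores reverse complements. shared_kmers is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def shared_kmers(barcodes: List[Tuple[str, str]], k: int) -> Dict[str, List[str]]:
    """Map each length-k subsequence to the names of the barcodes containing it.

    Assumption: only k-mers present in at least two distinct barcodes are kept,
    and a name is repeated once per occurrence. Reverse complements are ignored.
    """
    hits: Dict[str, List[str]] = defaultdict(list)
    for seq, name in barcodes:
        for i in range(len(seq) - k + 1):
            hits[seq[i:i + k]].append(name)
    return {kmer: names for kmer, names in hits.items() if len(set(names)) > 1}


barcodes = [("AGCAGTTACAG", "tag1"), ("CTTGTACCCAG", "tag2")]
for kmer, names in sorted(shared_kmers(barcodes, 3).items()):
    print(kmer, ",".join(names), sep="\t")
# CAG	tag1,tag1,tag2
# TAC	tag1,tag2
```

On the example file this reproduces the two rows shown above: CAG occurs twice in tag1 and once in tag2, and TAC occurs once in each.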

qcbc content: compute base distribution

Compute the base distribution within each barcode.

qcbc content <bc_file>
  • optionally, specify --frequency to return the base distribution as fractions rather than counts.
  • optionally, specify --entropy to return the entropy of the base distribution relative to the maximum entropy.
  • <bc_file> corresponds to the barcode file.

Examples

$ qcbc content -e barcodes.txt
name	seq	ent
tag1	AGCAGTTACAG	0.67
tag2	CTTGTACCCAG	0.67
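A normalized base-composition entropy can be sketched as below: compute the fraction of each base, take the Shannon entropy in bits, and divide by log2(4) = 2 bits, the maximum for four bases. The choice of base-2 logarithms and this normalization are assumptions, and base_frequencies and normalized_entropy are hypothetical names, not qcbc functions.

```python
import math
from collections import Counter


def base_frequencies(seq: str) -> dict:
    """Fraction of each base (A, C, G, T) in the sequence."""
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "ACGT"}


def normalized_entropy(seq: str) -> float:
    """Base-2 Shannon entropy of the base composition divided by log2(4) = 2 bits."""
    freqs = base_frequencies(seq)
    # p * log2(1/p) is non-negative; absent bases (p == 0) contribute nothing.
    h = sum(p * math.log2(1 / p) for p in freqs.values() if p > 0)
    return h / math.log2(4)


for seq in ["AGCAGTTACAG", "CTTGTACCCAG"]:
    print(seq, base_frequencies(seq), round(normalized_entropy(seq), 2))
```

Both example barcodes have base counts of 4, 3, 2 and 2, which is why they receive identical scores. This sketch gives about 0.97 for each rather than the 0.67 reported above, so qcbc's exact entropy definition differs from the one assumed here; treat the sketch as an illustration of the idea, not a reimplementation.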

qcbc homopolymer: compute homopolymer distribution

Count the homopolymers (runs of a single repeated base) of length two or greater in each barcode.

qcbc homopolymer <bc_file>
  • <bc_file> corresponds to the barcode file.

Examples

$ qcbc homopolymer barcodes.txt
name	seq	homopolymer_length
tag1	AGCAGTTACAG	1,0,0,0,0,0,0,0,0,0
tag2	CTTGTACCCAG	1,1,0,0,0,0,0,0,0,0
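Reading the example output, the comma-separated column appears to hold the number of runs of length 2, 3, 4, and so on, up to the barcode length (10 entries for 11-base barcodes): tag1 has one TT run, and tag2 has a TT run and a CCC run. The sketch below reproduces that layout with itertools.groupby, which splits a string into maximal runs of equal characters. homopolymer_counts is a hypothetical name, and the column layout is inferred from the example rather than documented.

```python
from itertools import groupby
from typing import List


def homopolymer_counts(seq: str) -> List[int]:
    """Count maximal single-base runs, bucketed by run length.

    Entry i holds the number of runs of length i + 2, for lengths 2..len(seq).
    """
    counts = [0] * max(len(seq) - 1, 0)
    for _, run in groupby(seq):
        length = sum(1 for _ in run)
        if length >= 2:
            counts[length - 2] += 1
    return counts


for seq, name in [("AGCAGTTACAG", "tag1"), ("CTTGTACCCAG", "tag2")]:
    print(name, seq, ",".join(map(str, homopolymer_counts(seq))), sep="\t")
# tag1	AGCAGTTACAG	1,0,0,0,0,0,0,0,0,0
# tag2	CTTGTACCCAG	1,1,0,0,0,0,0,0,0,0
```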

qcbc pdist: compute pairwise distance

Compute the pairwise Hamming distance between barcodes.

qcbc pdist <bc_file>
  • optionally, -rc can be used to check the reverse complement of the barcodes.
  • <bc_file> corresponds to the barcode file.

Examples

$ qcbc pdist barcodes.txt
AGCAGTTACAG	tag1	CTTGTACCCAG	tag2	8.0
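The Hamming distance between two equal-length sequences is the number of positions at which they differ. A minimal sketch of the pairwise computation, assuming all barcodes have the same length and ignoring the -rc option, visits every unordered pair with itertools.combinations. hamming and pairwise_hamming are hypothetical names.

```python
from itertools import combinations
from typing import Iterator, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hamming(
    barcodes: List[Tuple[str, str]],
) -> Iterator[Tuple[str, str, str, str, int]]:
    """Yield (seq1, name1, seq2, name2, distance) for every unordered pair."""
    for (s1, n1), (s2, n2) in combinations(barcodes, 2):
        yield s1, n1, s2, n2, hamming(s1, s2)


barcodes = [("AGCAGTTACAG", "tag1"), ("CTTGTACCCAG", "tag2")]
for row in pairwise_hamming(barcodes):
    print(*row, sep="\t")
# AGCAGTTACAG	tag1	CTTGTACCCAG	tag2	8
```

The two example barcodes differ at 8 of their 11 positions and agree only on the final CAG, matching the distance qcbc reports (printed there as 8.0).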

qcbc volume: compute size of barcode space

Compute the fraction of barcode space occupied by the given barcodes.

qcbc volume <bc_file>
  • <bc_file> corresponds to the barcode file.

Examples

$ qcbc volume barcodes.txt
2 out of 4,194,304 possible unique barcodes representing 0.0000%
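For barcodes of length L over the four bases there are 4^L possible sequences, and the 4,194,304 above is exactly 4^11 for the 11-base example barcodes. The sketch below, which assumes all barcodes share one length, counts the unique sequences and divides by that total. barcode_space is a hypothetical name.

```python
from typing import List, Tuple


def barcode_space(barcodes: List[Tuple[str, str]]) -> Tuple[int, int, float]:
    """Return (unique barcodes, possible barcodes, fraction occupied).

    Assumption: all barcodes have the same length L, giving 4**L possibilities.
    """
    seqs = {seq for seq, _ in barcodes}
    lengths = {len(s) for s in seqs}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes all barcodes have the same length")
    total = 4 ** lengths.pop()
    return len(seqs), total, len(seqs) / total


barcodes = [("AGCAGTTACAG", "tag1"), ("CTTGTACCCAG", "tag2")]
n, total, frac = barcode_space(barcodes)
print(f"{n} out of {total:,} possible unique barcodes representing {frac:.4%}")
# 2 out of 4,194,304 possible unique barcodes representing 0.0000%
```

The true fraction is about 4.8e-5 percent, which rounds to 0.0000% at four decimal places.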

Contributing

Thank you for wanting to improve qcbc. If you encounter a bug in qcbc, please create an issue. The issue should contain:

  • the qcbc command you ran,
  • the error message, and
  • the qcbc and Python versions.

If you'd like to add assay sequence specifications or make modifications to the qcbc tool, please do the following:

  1. Fork the project.
# Press "Fork" at the top right of the GitHub page
  2. Clone the fork and create a branch for your feature
git clone https://github.com/<USERNAME>/qcbc.git
cd qcbc
git checkout -b cool-new-feature
  3. Make changes, add files, and commit
# make changes, add files, and commit them
git add path/to/file1.py path/to/file2.py
git commit -m "I made these changes"
  4. Push changes to GitHub
git push origin cool-new-feature
  5. Submit a pull request

If you are unfamiliar with pull requests, you can find more information on the GitHub help page.