From aae0201685c2b49c1afcf62f57faf989918a32c3 Mon Sep 17 00:00:00 2001 From: Rob Patro Date: Sun, 18 Feb 2024 19:53:06 -0500 Subject: [PATCH] update readme --- README.md | 58 ++----------------------------------------------------- 1 file changed, 2 insertions(+), 56 deletions(-) diff --git a/README.md b/README.md index 1f9a9fd..2c90a21 100644 --- a/README.md +++ b/README.md @@ -1,58 +1,4 @@ # oarfish: Accurate, fast, and versatile transcript quantification from long-read RNA-seq data -`oarfish` is a program, written in `rust`, for quantifying transcript-level expression from long-read (i.e. Oxford nanopore cDNA and direct RNA and PacBio) sequencing technologies. `oarfish` currently requires a sample of sequencing reads aligned to the *transcriptome* (not the genome). It handles multi-mapping reads through the use of probabilistic allocation via an expectation-maximization (EM) algorithm. Further, it employs many filters to help discard alignments that may reduce quantification accuracy. Currently, the set of filters applied in `oarfish` are directly derived from the [`NanoCount`](https://github.com/a-slide/NanoCount)[^Gleeson] tool; both the filters that exist, and the way their values are set (with the exception of the `--three-prime-clip` filter, which is not set by default in `oarfish` but is in `NanoCount`). - -Additionally, as a core component of `oarfish` we are developing novel coverage models that further improve quantification estimates by taking into account the coverage profile of the long mapped reads over the transcripts. A detailed writeup of these coverage models is currently being drafted. In the meantime, if you make use of `oarfish`, please cite this repository. - -The usage can be provided by passing `-h` at the command line. -``` -accurate transcript quantification from long-read RNA-seq data - -Usage: oarfish [OPTIONS] --alignments --output - -Options: - --quiet be quiet (i.e. don't output log messages that aren't at least warnings) - --verbose be verbose (i.e. output all non-developer logging messages) - -a, --alignments path to the file containing the input alignments - -o, --output location where output quantification file should be written - -t, --threads maximum number of cores that the oarfish can use to obtain binomial probability [default: 1] - -h, --help Print help - -V, --version Print version - -filters: - --filter-group - [possible values: no-filters, nanocount-filters] - -t, --three-prime-clip - maximum allowable distance of the right-most end of an alignment from the 3' transcript end [default: 4294967295] - -f, --five-prime-clip - maximum allowable distance of the left-most end of an alignment from the 5' transcript end [default: 4294967295] - -s, --score-threshold - fraction of the best possible alignment score that a secondary alignment must have for consideration [default: 0.95] - -m, --min-aligned-fraction - fraction of a query that must be mapped within an alignemnt to consider the alignemnt valid [default: 0.5] - -l, --min-aligned-len - minimum number of nucleotides in the aligned portion of a read [default: 50] - -n, --allow-negative-strand - allow both forward-strand and reverse-complement alignments - -coverage model: - --model-coverage apply the coverage model - -b, --bins number of bins to use in coverage model [default: 10] - -EM: - --max-em-iter - maximum number of iterations for which to run the EM algorithm [default: 1000] - --convergence-thresh - maximum number of iterations for which to run the EM algorithm [default: 0.001] - -q, --short-quant - location of short read quantification (if provided) -``` - -The input should be a `bam` format file, with reads aligned using `minimap2` against the _transcriptome_. That is, `oarfish` does not currently handle spliced alignment to the genome. Further, the output alignments should be name sorted (the default order produced by `minimap2` should be fine). - -The output is a tab separated file listing the quantified targets, as well as information about their length and other metadata. The `num_reads` column provides the estimate of the number of reads originating from each target. - - -### References - -[^Gleeson]: Josie Gleeson, Adrien Leger, Yair D J Prawer, Tracy A Lane, Paul J Harrison, Wilfried Haerty, Michael B Clark, Accurate expression quantification from nanopore direct RNA sequencing with NanoCount, Nucleic Acids Research, Volume 50, Issue 4, 28 February 2022, Page e19, [https://doi.org/10.1093/nar/gkab1129](https://doi.org/10.1093/nar/gkab1129) +Briefly, `oarfish` is a program, written in `rust`, for quantifying transcript-level expression from long-read (i.e. Oxford nanopore cDNA and direct RNA and PacBio) sequencing technologies. +For more details on `oarfish`, please refer to the [`oarfish` documentation](https://combine-lab.github.io/oarfish).