assemble_tag_contigs
assemble_tag_contigs assembles overlapping BED-type entries into tag contigs. An entry is only included if its clone count (assumed to be appended to the Q_ID field, e.g. ID_2 has a clone count of 2) is higher than the number of times the entry was mapped to the genome (assumed to be stored in the SCORE field). The tag contig score is the highest density of overlapping tags, i.e. the maximum, over all positions in the contig, of the summed clone counts of the tags covering that position (here assuming that the tags map uniquely):
tags        -----------                   Q_ID: ID_2
              -------------               Q_ID: ID_3
                  ---------------         Q_ID: ID_1
                             ------------ Q_ID: ID_2
tag contig  =============================
scores      22555566666444411333322222222
tag contig score = 6
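The score row can be reproduced with a short Python sketch. Each tag is written as a hypothetical (offset, length, clone_count) tuple relative to the start of the tag contig; the offsets were chosen so that the summed clone counts match the score row, and, like the diagram, the sketch assumes uniquely mapping tags.

```python
# Sketch: rebuild the per-position score row of the diagram.
# Tags are hypothetical (offset, length, clone_count) tuples with 0-based
# offsets relative to the start of the tag contig.


def score_track(tags):
    """Sum the clone counts of all tags covering each contig position."""
    length = max(offset + size for offset, size, _ in tags)
    track = [0] * length
    for offset, size, count in tags:
        for pos in range(offset, offset + size):
            track[pos] += count
    return track


tags = [
    (0, 11, 2),   # Q_ID: ID_2
    (2, 13, 3),   # Q_ID: ID_3
    (6, 15, 1),   # Q_ID: ID_1
    (17, 12, 2),  # Q_ID: ID_2
]

track = score_track(tags)
print("".join(str(s) for s in track))  # 22555566666444411333322222222
print("tag contig score =", max(track))  # tag contig score = 6
```

The tag contig score is simply the maximum of this track.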
Thus, tags are only assembled into tag contigs if their expression level is higher than their repetitiveness: a tag sequenced 10 times, i.e. with a clone count of 10, is only considered if it was found in the genome fewer than 10 times, i.e. if it has a SCORE below 10.
Each resulting tag contig is given a consecutive number prefixed with TC as its Q_ID (e.g. TC00000000, TC00000001).
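A minimal Python sketch of the inclusion rule and the contig naming follows. The helper names are hypothetical, the clone count is assumed to be the integer after the last underscore in Q_ID, and the eight-digit zero padding is taken from the example output rather than from a documented format.

```python
# Hypothetical helpers (not part of Biopieces) illustrating two rules:
# a tag is kept only if its clone count is higher than its SCORE, and
# each tag contig gets a consecutive Q_ID prefixed with "TC".


def clone_count(q_id: str) -> int:
    """Assumption: the clone count is the integer after the last "_" in Q_ID."""
    return int(q_id.rsplit("_", 1)[1])


def passes_filter(q_id: str, score: int) -> bool:
    """Keep a tag only if it was sequenced more often than it maps (SCORE)."""
    return clone_count(q_id) > score


def contig_id(n: int) -> str:
    """Zero-padded to eight digits, as in the example output (TC00000000)."""
    return f"TC{n:08d}"


print(passes_filter("ID_10", 9))   # True: 10 clones, 9 genomic hits
print(passes_filter("ID_10", 10))  # False: 10 clones, 10 genomic hits
print(contig_id(2))                # TC00000002
```

The comparison is strict, matching the rule that a clone count of 10 requires a SCORE below 10.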
... | assemble_tag_contigs [options]
[-? | --help] # Print full usage description.
[-C | --check] # Check the integrity of the records.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following BED entries in the file test.bed:
chr1 172975115 172975152 ID_3 10 +
chr1 134930538 134930571 ID_282 6 +
chr1 134930538 134930574 ID_934 6 +
chr1 173041573 173041606 ID_85 7 +
chr1 173041573 173041617 ID_12 5 +
chr1 173032543 173032573 ID_5 13 +
chr1 149524795 149524851 ID_982 593 +
chr1 149524796 149524852 ID_982 593 +
chr1 149524797 149524853 ID_982 593 +
chr1 149524798 149524854 ID_982 593 +
We can read these entries with read_bed, assemble them into tag contigs with assemble_tag_contigs, and output the tag contigs in BED format with write_bed:
read_bed -i test.bed | assemble_tag_contigs | write_bed -x
chr1 134930538 134930574 TC00000000 202 +
chr1 149524795 149524854 TC00000001 4 +
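chr1 173041573 173041617 TC00000002 14 +
The entries ID_3 (clone count 3, SCORE 10) and ID_5 (clone count 5, SCORE 13) are discarded because their clone counts do not exceed their SCORE; the remaining entries form three tag contigs.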
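The example can be reproduced end to end with the Python sketch below; it is an illustration, not the Biopieces implementation. It assumes the clone count is the integer after the last underscore in Q_ID, that only entries on the same chromosome and strand are merged, and that BED end coordinates are exclusive. It also weights each kept tag by its clone count divided by SCORE, truncated to an integer. That weighting is inferred from the example output, not documented: it gives 282 // 6 + 934 // 6 = 47 + 155 = 202, 4 × (982 // 593) = 4, and 85 // 7 + 12 // 5 = 12 + 2 = 14.

```python
# Sketch of tag contig assembly under the assumptions stated above
# (inferred SCORE weighting, strand-aware merging, half-open intervals).

EXAMPLE = """\
chr1 172975115 172975152 ID_3 10 +
chr1 134930538 134930571 ID_282 6 +
chr1 134930538 134930574 ID_934 6 +
chr1 173041573 173041606 ID_85 7 +
chr1 173041573 173041617 ID_12 5 +
chr1 173032543 173032573 ID_5 13 +
chr1 149524795 149524851 ID_982 593 +
chr1 149524796 149524852 ID_982 593 +
chr1 149524797 149524853 ID_982 593 +
chr1 149524798 149524854 ID_982 593 +
"""


def max_density(tags):
    """Highest summed weight at any position, via a sweep over interval ends."""
    events = []
    for start, end, weight in tags:
        events.append((start, 1, weight))  # at equal positions, ends (0) sort first
        events.append((end, 0, -weight))
    events.sort()
    best = running = 0
    for _, _, delta in events:
        running += delta
        best = max(best, running)
    return best


def assemble_tag_contigs(lines):
    kept = []
    for line in lines:
        chrom, start, end, q_id, score, strand = line.split()
        count, score = int(q_id.rsplit("_", 1)[1]), int(score)
        if count > score:  # expression must exceed repetitiveness
            kept.append((chrom, strand, int(start), int(end), count // score))

    # Merge overlapping entries per chromosome and strand.
    groups = []
    for chrom, strand, start, end, weight in sorted(kept):
        last = groups[-1] if groups else None
        if last and last["key"] == (chrom, strand) and start < last["end"]:
            last["end"] = max(last["end"], end)
            last["tags"].append((start, end, weight))
        else:
            groups.append({"key": (chrom, strand), "start": start, "end": end,
                           "tags": [(start, end, weight)]})

    # Number contigs in chromosome/start order, as in the example output.
    groups.sort(key=lambda g: (g["key"][0], g["start"]))
    return [
        (g["key"][0], g["start"], g["end"], f"TC{i:08d}",
         max_density(g["tags"]), g["key"][1])
        for i, g in enumerate(groups)
    ]


for row in assemble_tag_contigs(EXAMPLE.splitlines()):
    print("\t".join(str(field) for field in row))
```

Run as a script, it prints the same three tag contigs as the example output, as tab-separated BED lines. The sweep in max_density avoids allocating a per-base array, so it also works for long contigs.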
Martin Asser Hansen - Copyright (C) - All rights reserved.
November 2008
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
assemble_tag_contigs is part of the Biopieces framework.