assemble_tag_contigs

Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: assemble_tag_contigs

Description

assemble_tag_contigs assembles overlapping BED-type entries into tag contigs, provided that the clone count of each entry (assumed to be appended to the Q_ID field) is higher than the number of times the entry was mapped to the genome (assumed to be stored in the SCORE field). The tag contig score is the highest density of overlapping tags (assuming that these tags map uniquely); in the illustration below, each tag contributes its clone count to the density at every position it covers:

tags         ----------- Q_ID: ID_2
               ------------- Q_ID: ID_3
                   --------------- Q_ID: ID_1
                              ------------ Q_ID: ID_2
tag contig   =============================                 
scores       22555566666444411333322222222

tag contig score = 6
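The scoring in the illustration can be sketched in a few lines of Python. This is not the Biopieces implementation, only a minimal reading of the diagram: the coordinates 0 to 28 are read off the drawing, end positions are treated as inclusive, and the function name `density_profile` is made up for the example.

```python
def density_profile(tags):
    """Sum the clone counts of all tags covering each position.

    tags: list of (start, end, clone_count); end is treated as
    inclusive, which is an assumption made for this sketch.
    """
    lo = min(start for start, _, _ in tags)
    hi = max(end for _, end, _ in tags)
    profile = [0] * (hi - lo + 1)
    for start, end, count in tags:
        for pos in range(start, end + 1):
            profile[pos - lo] += count
    return profile


# Tags from the illustration, with hypothetical coordinates 0..28.
tags = [
    (0, 10, 2),   # Q_ID: ID_2
    (2, 14, 3),   # Q_ID: ID_3
    (6, 20, 1),   # Q_ID: ID_1
    (17, 28, 2),  # Q_ID: ID_2
]

profile = density_profile(tags)
print("".join(str(d) for d in profile))  # per-position density
print(max(profile))                      # tag contig score
```

The tag contig score is simply the maximum of the per-position profile.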

Thus, tags are only assembled into tag contigs if their expression level is higher than their repetitiveness: a tag sequenced 10 times, i.e. with a clone count of 10, will only be considered if it was found in the genome fewer than 10 times, i.e. if it has a SCORE below 10.
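As a rough sketch of this rule (not taken from the Biopieces source), the check could look as follows in Python. The helper names `clone_count` and `is_usable` are hypothetical, and the clone count is assumed to be the number after the last underscore of the Q_ID.

```python
def clone_count(q_id):
    """Return the clone count appended to a Q_ID such as 'ID_10'.

    Assumes the count is the integer after the last underscore.
    """
    return int(q_id.rsplit("_", 1)[1])


def is_usable(q_id, score):
    """A tag is kept only if its clone count exceeds its SCORE
    (the number of times it was mapped to the genome)."""
    return clone_count(q_id) > score


print(is_usable("ID_10", 9))   # clone count 10, mapped 9 times
print(is_usable("ID_10", 10))  # clone count 10, mapped 10 times
```

Note that the comparison is strict: a tag whose clone count equals its SCORE is discarded.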

A running number prefixed with TC is assigned as the Q_ID of each resulting tag contig.

Usage

... | assemble_tag_contigs [options]

Options

[-?         | --help]                #  Print full usage description.
[-C         | --check]               #  Check the integrity of the records.
[-I <file!> | --stream_in=<file!>]   #  Read input from stream file  -  Default=STDIN
[-O <file>  | --stream_out=<file>]   #  Write output to stream file  -  Default=STDOUT
[-v         | --verbose]             #  Verbose output.

Examples

Consider the following BED entries in the file test.bed:

chr1   172975115   172975152   ID_3    10   +
chr1   134930538   134930571   ID_282   6   +
chr1   134930538   134930574   ID_934   6   +
chr1   173041573   173041606   ID_85    7   +
chr1   173041573   173041617   ID_12    5   +
chr1   173032543   173032573   ID_5    13   +
chr1   149524795   149524851   ID_982 593   +
chr1   149524796   149524852   ID_982 593   +
chr1   149524797   149524853   ID_982 593   +
chr1   149524798   149524854   ID_982 593   +

We can read these entries with read_bed, assemble them into tag contigs with assemble_tag_contigs, and output the tag contigs in BED format with write_bed:

read_bed -i test.bed | assemble_tag_contigs | write_bed -x

chr1   134930538   134930574   TC00000000   202   +
chr1   149524795   149524854   TC00000001   4     +
chr1   173041573   173041617   TC00000002   14    +
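To see which entries end up in which tag contig, the filtering and merging steps can be sketched in Python. This is an illustration under stated assumptions, not the Biopieces code: the function name `assemble` is hypothetical, entries are assumed to be grouped by chromosome and strand, two entries are treated as overlapping when one begins at or before the end of the other, and the eight-digit TC numbering is inferred from the output above. The sketch reproduces only the coordinates and names; it makes no attempt to compute the score column.

```python
def assemble(entries):
    """Filter BED entries by clone count > SCORE and merge overlaps.

    entries: list of (chrom, beg, end, q_id, score, strand).
    Returns a list of (chrom, beg, end, name, strand).
    """
    # Keep entries whose clone count (suffix of Q_ID) exceeds SCORE.
    kept = [e for e in entries if int(e[3].rsplit("_", 1)[1]) > e[4]]
    # Sort so that overlapping entries become neighbours.
    kept.sort(key=lambda e: (e[0], e[5], e[1], e[2]))

    contigs = []
    for chrom, beg, end, _q_id, _score, strand in kept:
        last = contigs[-1] if contigs else None
        if last and last[0] == chrom and last[3] == strand and beg <= last[2]:
            last[2] = max(last[2], end)  # extend the current contig
        else:
            contigs.append([chrom, beg, end, strand])

    return [(c, b, e, "TC%08d" % i, s) for i, (c, b, e, s) in enumerate(contigs)]


entries = [
    ("chr1", 172975115, 172975152, "ID_3", 10, "+"),
    ("chr1", 134930538, 134930571, "ID_282", 6, "+"),
    ("chr1", 134930538, 134930574, "ID_934", 6, "+"),
    ("chr1", 173041573, 173041606, "ID_85", 7, "+"),
    ("chr1", 173041573, 173041617, "ID_12", 5, "+"),
    ("chr1", 173032543, 173032573, "ID_5", 13, "+"),
    ("chr1", 149524795, 149524851, "ID_982", 593, "+"),
    ("chr1", 149524796, 149524852, "ID_982", 593, "+"),
    ("chr1", 149524797, 149524853, "ID_982", 593, "+"),
    ("chr1", 149524798, 149524854, "ID_982", 593, "+"),
]

contigs = assemble(entries)
for contig in contigs:
    print(*contig, sep="\t")
```

ID_3 and ID_5 are dropped because their clone counts (3 and 5) do not exceed their SCORE values (10 and 13); the remaining eight entries collapse into three contigs.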

See also

read_bed

write_bed

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

[email protected]

November 2008

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

assemble_tag_contigs is part of the Biopieces framework.

http://www.biopieces.org
