TDF Format

Jim Robinson edited this page May 29, 2021 · 16 revisions

The ".tdf" file format is an indexed, compressed binary file format to support the display of numeric data. It was developed simultaneously with, and is similar in purpose to, the UCSC "bigWig" format. For most use cases we now recommend the "bigWig" format, as it is widely used across many tools, while "tdf" is limited in use to IGV.

TDF files are created with igvtools. Input formats include wig and bedgraph as well as the IGV-specific ".igv" and ".cn" formats. TDF files representing alignment coverage can also be created directly from .bam files using igvtools.

General layout

Header
Tiles
Datasets
Groups
Master Index

Header

Field Description Type Value
magic TDF magic number int TDF\0
version format version number int
indexPosition file position of index section int
indexSize size in bytes of index section int
headerSize size in bytes of the remainder of the header section int
nWindowFunctions number of window functions int
List of window functions
windowFunction window function name string mean median min max percentile2 percentile10 percentile90 percentile98 stddev count density
End list of window functions
trackType string
trackLine UCSC style track line string
nTracks number of tracks int
List of track names (n = nTracks)
trackName name of track string
End list
genomeId genome identifier (e.g. hg19) string
flags Flags int
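
The window function and track name lists in the header both follow a "count, then that many strings" pattern. The following is a minimal sketch of reading such a list, not the IGV implementation. It assumes little-endian 4-byte ints and null-terminated strings, neither of which is stated in the table above; the helper names read_string and read_string_list are hypothetical.

```python
import struct

def read_string(buf, offset):
    """Read one null-terminated string (assumed encoding); return (text, next_offset)."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("utf-8"), end + 1

def read_string_list(buf, offset):
    """Read an int count followed by that many strings."""
    (count,) = struct.unpack_from("<i", buf, offset)  # assumed little-endian int
    offset += 4
    items = []
    for _ in range(count):
        text, offset = read_string(buf, offset)
        items.append(text)
    return items, offset

# Synthetic bytes: nWindowFunctions = 2, followed by "mean" and "max".
sample = struct.pack("<i", 2) + b"mean\x00max\x00"
window_functions, next_offset = read_string_list(sample, 0)
print(window_functions, next_offset)
```

The returned offset marks where the next header field would begin, so the same helper could be reused for the track name list.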

Tiles

This section contains tiles of data. A tile represents a region of the genome at a specific zoom (resolution) level. Each tile is referenced by a tile index entry of a dataset.

Field Description Type Value
type tile format string fixedStep variableStep bed bedWithName
Remainder according to type

type=fixedStep

Field Description Type Value
nPositions Number of genomic positions int
start genomic start position (zero based) int
span genomic span for each data point float
List of data points. Track order first. (n= nTracks X nPositions)
datum data value for track and position float
End list
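
As an illustration of the fixedStep layout, here is a minimal sketch that decodes the fields following the type string. It assumes little-endian 4-byte ints and floats and an already decompressed tile, and it reads "track order first" as all positions of the first track, then all positions of the second, and so on. The function name parse_fixed_step and the rule for deriving each start position from start and span are assumptions.

```python
import struct

def parse_fixed_step(body, n_tracks):
    """Decode the fields that follow the type string of a fixedStep tile."""
    n_positions, start, span = struct.unpack_from("<iif", body, 0)
    offset = struct.calcsize("<iif")
    flat = struct.unpack_from(f"<{n_tracks * n_positions}f", body, offset)
    # "Track order first" is read here as: every position of track 0,
    # then every position of track 1, and so on.
    data = [list(flat[t * n_positions:(t + 1) * n_positions])
            for t in range(n_tracks)]
    # Assumption: data point i starts at start + i * span, truncated to an int.
    starts = [start + int(i * span) for i in range(n_positions)]
    return starts, span, data

# Synthetic tile body: 3 positions from 100, span 10, two tracks.
body = (struct.pack("<iif", 3, 100, 10.0)
        + struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
starts, span, data = parse_fixed_step(body, n_tracks=2)
print(starts)
print(data)
```

nTracks is not stored in the tile itself, so the caller has to supply it from the header.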

type=variableStep

Field Description Type Value
tileStart genomic position for start of tile int
span genomic span for each data point float
nPositions Number of genomic positions int
List of data start positions
start genomic start position (zero based) int
End list
List of data points. Track order first. (n= nTracks X nPositions)
datum data value for track and position float
End list

type=bed

Field Description Type Value
nPositions Number of genomic positions int
List of data start positions. (n= nPositions)
start genomic start position (zero based) int
End list
List of data end positions. (n= nPositions)
end genomic end position int
End list
nSamples Number of samples. Ignored
List of data points. Track order first. (n= nTracks X nPositions)
datum data value for track and position float
End list
Optional feature names (type = bedWithName)
List of feature names (n=nPositions)
name feature name string
End list
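
The bed and bedWithName layouts differ only in the trailing name list, so one decoder can cover both. The sketch below makes the same assumptions as any reader built from these tables alone: little-endian 4-byte ints and floats, null-terminated strings, and a decompressed tile. It also assumes nSamples is a 4-byte int, since no type is listed for it. The name parse_bed_tile is hypothetical.

```python
import struct

def read_string(buf, offset):
    """Read one null-terminated string (assumed encoding); return (text, next_offset)."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("utf-8"), end + 1

def parse_bed_tile(body, n_tracks, with_names=False):
    """Decode the fields that follow the type string of a bed or bedWithName tile."""
    (n,) = struct.unpack_from("<i", body, 0)
    offset = 4
    starts = list(struct.unpack_from(f"<{n}i", body, offset))
    offset += 4 * n
    ends = list(struct.unpack_from(f"<{n}i", body, offset))
    offset += 4 * n
    offset += 4  # nSamples: ignored; a 4-byte int is assumed (no type is listed)
    flat = struct.unpack_from(f"<{n_tracks * n}f", body, offset)
    offset += 4 * n_tracks * n
    data = [list(flat[t * n:(t + 1) * n]) for t in range(n_tracks)]
    names = []
    if with_names:  # only present when type = bedWithName
        for _ in range(n):
            name, offset = read_string(body, offset)
            names.append(name)
    return starts, ends, data, names

# Synthetic bedWithName body: two features, one track.
body = (struct.pack("<i", 2) + struct.pack("<2i", 10, 50)
        + struct.pack("<2i", 20, 80) + struct.pack("<i", 1)
        + struct.pack("<2f", 0.5, 1.5) + b"a\x00b\x00")
starts, ends, data, names = parse_bed_tile(body, n_tracks=1, with_names=True)
print(list(zip(starts, ends, names)), data)
```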

Dataset

A dataset is a container for tiles of data at a given zoom level. Tiles are referenced by file position.

Field Description Type Value
nAttributes Number of attributes int
List of attributes
key Attribute key string
value Attribute value string
End list
dataType ignored string
tileWidth Width of each tile in base pairs float
nTiles Number of tiles int
List of tile entries
position File position for start of tile long
size Size of tile in bytes int
End list
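
To show how a dataset's tile entries might be used, the sketch below decodes a dataset record and looks up the tile covering a genomic position. It assumes little-endian values with 8-byte longs and null-terminated strings. It also assumes tiles are stored in genomic order starting at position 0, each spanning tileWidth base pairs, which the table does not state. The names parse_dataset and tile_for_position are hypothetical.

```python
import struct

def read_string(buf, offset):
    """Read one null-terminated string (assumed encoding); return (text, next_offset)."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("utf-8"), end + 1

def parse_dataset(buf):
    """Decode a dataset record into (attributes, tile_width, tile entries)."""
    (n_attributes,) = struct.unpack_from("<i", buf, 0)
    offset = 4
    attributes = {}
    for _ in range(n_attributes):
        key, offset = read_string(buf, offset)
        value, offset = read_string(buf, offset)
        attributes[key] = value
    _data_type, offset = read_string(buf, offset)  # dataType is ignored
    tile_width, n_tiles = struct.unpack_from("<fi", buf, offset)
    offset += 8
    tiles = []
    for _ in range(n_tiles):
        position, size = struct.unpack_from("<qi", buf, offset)
        offset += 12
        tiles.append((position, size))
    return attributes, tile_width, tiles

def tile_for_position(pos, tile_width, tiles):
    """Return the (file position, size) entry covering pos, or None if out of range."""
    # Assumption: tile i covers [i * tile_width, (i + 1) * tile_width).
    index = int(pos // tile_width)
    return tiles[index] if 0 <= index < len(tiles) else None

# Synthetic dataset: no attributes, tile width 1000 bp, two tiles.
record = (struct.pack("<i", 0) + b"float\x00" + struct.pack("<fi", 1000.0, 2)
          + struct.pack("<qi", 500, 40) + struct.pack("<qi", 540, 60))
attributes, tile_width, tiles = parse_dataset(record)
print(tile_for_position(1500, tile_width, tiles))
```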

Group

A Group is a container of key-value pairs, essentially a dictionary. A TDF file can in theory have an arbitrary number of groups referenced from the group index. In practice only a single group, the "root group" with name "/", has been used. The root group contains metadata and statistics for the file as a whole. See below for common attributes.

Field Description Type Value
nAttributes Number of attributes int
List of attributes
key Attribute key string
value Attribute value string
End list

Master Index

Field Description Type Value
nDatasets Number of datasets int
List of datasets
name dataset name string
position dataset file position long
nBytes size of dataset in bytes int
End list
nGroups Number of groups int
List of groups
name name of group string
position file position of group int
nBytes size of group in bytes int
End list
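
The dataset portion of the master index can be decoded into a name-to-location map, as in the sketch below. Only the dataset list is handled. The sketch assumes little-endian values, an 8-byte long for position, and null-terminated strings. The function name parse_dataset_entries and the dataset names in the sample bytes are hypothetical.

```python
import struct

def read_string(buf, offset):
    """Read one null-terminated string (assumed encoding); return (text, next_offset)."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("utf-8"), end + 1

def parse_dataset_entries(index_bytes):
    """Return ({name: (file position, nBytes)}, offset where the group list starts)."""
    (n_datasets,) = struct.unpack_from("<i", index_bytes, 0)
    offset = 4
    entries = {}
    for _ in range(n_datasets):
        name, offset = read_string(index_bytes, offset)
        position, n_bytes = struct.unpack_from("<qi", index_bytes, offset)
        offset += 12
        entries[name] = (position, n_bytes)
    return entries, offset

# Synthetic index bytes with two made-up dataset names.
index_bytes = (struct.pack("<i", 2)
               + b"ds1\x00" + struct.pack("<qi", 1024, 200)
               + b"ds2\x00" + struct.pack("<qi", 1224, 150))
entries, groups_offset = parse_dataset_entries(index_bytes)
print(entries, groups_offset)
```

A reader would then seek to an entry's file position and read nBytes to obtain that dataset's record.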

Root group attributes

TDF files created with igvtools typically include the following attributes in the root group (group name = "/"). The data type for all attributes is "string".

Name Description
2nd Percentile 2nd percentile value of all data in this file
10th Percentile
90th Percentile
98th Percentile
Maximum
Mean
Median
Minimum
chromosomes Comma delimited list of all chromosomes/contig/sequence names in this file
maxZoom The maximum pre-computed zoom level
totalCount For alignment coverage files only - total number of alignments.
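
Because every root group attribute is stored as a string, a reader has to convert values before using them. The sketch below shows one way to do that for the attributes listed above. The function name summarize_root_attributes and the sample values are hypothetical, and the exact numeric formatting written by igvtools is an assumption, so the conversions are deliberately tolerant.

```python
STAT_KEYS = [
    "2nd Percentile", "10th Percentile", "90th Percentile", "98th Percentile",
    "Maximum", "Mean", "Median", "Minimum",
]

def summarize_root_attributes(attrs):
    """Convert string-valued root group attributes into typed values."""
    stats = {key: float(attrs[key]) for key in STAT_KEYS if key in attrs}
    # Comma-delimited list; empty items (e.g. from a trailing comma) are dropped.
    chromosomes = [name.strip() for name in attrs.get("chromosomes", "").split(",")
                   if name.strip()]
    # Assumption: maxZoom is integer-valued text; float() tolerates "7.0" as well.
    max_zoom = int(float(attrs["maxZoom"])) if "maxZoom" in attrs else None
    # totalCount is only present for alignment coverage files.
    total_count = float(attrs["totalCount"]) if "totalCount" in attrs else None
    return {"stats": stats, "chromosomes": chromosomes,
            "maxZoom": max_zoom, "totalCount": total_count}

# Made-up attribute values for illustration only.
example = {"Mean": "2.5", "Maximum": "10.0", "chromosomes": "chr1,chr2,", "maxZoom": "7"}
summary = summarize_root_attributes(example)
print(summary)
```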