sequence command
Nacho edited this page Jun 30, 2015
The 'sequence' command processes FastQ sequence files either locally or in a Hadoop cluster.
Assuming you are in the hpg-bigdata folder, type the following command to list the available sequence subcommands for the Hadoop scenario:
$ build/bin/hpg-bigdata.sh sequence
Usage: hpg-bigdata.sh sequence <subcommand> [options]
Subcommands:
convert Converts FastQ files to different big data formats such as Avro
stats Calculates different stats from sequencing data
For the local scenario, use the hpg-bigdata-local.sh script instead. Note that only the convert subcommand is available locally:
$ build/bin/hpg-bigdata-local.sh sequence
Usage: hpg-bigdata-local.sh sequence <subcommand> [options]
Subcommands:
convert Converts FastQ files to different big data formats such as Avro
convert
Converts FastQ files to big data formats such as Avro, following the GA4GH schema models.
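Before converting, it can help to confirm that the input really is four-line FastQ (header starting with @, sequence, separator starting with +, quality string of the same length as the sequence). The following is a minimal sketch, assuming plain, uncompressed FastQ with exactly four lines per record (no wrapped sequences); check_fastq is a hypothetical helper, not part of hpg-bigdata.

```bash
#!/usr/bin/env bash
# check_fastq FILE
# Hypothetical helper (not part of hpg-bigdata): checks that FILE looks like
# uncompressed, four-line FastQ before it is converted. Prints the number of
# records on success; reports the first malformed line on stderr and returns 1.
check_fastq() {
  awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { printf "line %d: header must start with @\n", NR > "/dev/stderr"; bad = 1; exit }
    NR % 4 == 2 { seq_len = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { printf "line %d: separator must start with +\n", NR > "/dev/stderr"; bad = 1; exit }
    NR % 4 == 0 && length($0) != seq_len { printf "line %d: quality length differs from sequence length\n", NR > "/dev/stderr"; bad = 1; exit }
    END {
      if (bad) exit 1
      # A line count that is not a multiple of 4 means the last record is cut short.
      if (NR % 4 != 0) { print "truncated final record" > "/dev/stderr"; exit 1 }
      print NR / 4
    }
  ' "$1"
}
```

For example, check_fastq build/data/test.fq prints the record count if the file is well formed and exits with status 1 otherwise.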
Hadoop scenario:
$ build/bin/hpg-bigdata.sh sequence convert -h
Usage: hpg-bigdata.sh sequence convert [options]
Options:
-x, --compression STRING Accepted values: snappy, deflate, bzip2, xz, null. Default: snappy [snappy]
-L, --log-level STRING Set the level log, values: debug, info, warning, error, fatal [info]
-h, --help This parameter prints this help [false]
--conf STRING Set the configuration file [null]
-v, --verbose BOOLEAN This parameter set the level of the logging [false]
* -i, --input STRING HDFS input file in FastQ format [null]
* -o, --output STRING HDFS output file to store the FastQ sequences according to the GA4GH/Avro model [null]
Example:
$ hadoop fs -mkdir /test
$ hadoop fs -copyFromLocal build/data/test.fq /test
$ hadoop fs -ls /test
Found 1 items
-rw-r--r-- 1 jtarraga supergroup 29290 2015-06-30 15:52 /test/test.fq
$ hadoop fs -mkdir /out
$ build/bin/hpg-bigdata.sh sequence convert -i /test/test.fq -o /out/test.fq.avro
...
...
$ hadoop fs -ls /out/test.fq.avro
Found 2 items
-rw-r--r-- 1 jtarraga supergroup 0 2015-06-30 15:54 /out/test.fq.avro/_SUCCESS
-rw-r--r-- 1 jtarraga supergroup 9912 2015-06-30 15:54 /out/test.fq.avro/part-r-00000.avro
Local scenario:
$ build/bin/hpg-bigdata-local.sh sequence convert -h
Usage: hpg-bigdata-local.sh sequence convert [options]
Options:
--conf STRING Set the configuration file [null]
-x, --compression STRING Accepted values: snappy, deflate, bzip2, xz, null. Default: snappy [snappy]
-v, --verbose BOOLEAN This parameter set the level of the logging [false]
-h, --help This parameter prints this help [false]
* -i, --input STRING Local input file in FastQ format [null]
-L, --log-level STRING Set the level log, values: debug, info, warning, error, fatal [info]
* -o, --output STRING Local output file to store the FastQ sequences according to the GA4GH/Avro model [null]
Example:
$ mkdir /tmp/out
$ build/bin/hpg-bigdata-local.sh sequence convert -i build/data/test.fq -o /tmp/out/test.fq.avro
$ ls -ltr /tmp/out/test.fq.avro
-rw-rw-r-- 1 jtarraga jtarraga 9924 jun 30 16:00 /tmp/out/test.fq.avro
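Locally, by contrast, the output is a single Avro file. To convert several FastQ files in one go, the local command can simply be called in a loop. A minimal sketch, assuming the inputs end in .fq; convert_all_local and the HPG_LOCAL variable are hypothetical names introduced here, not part of hpg-bigdata.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of hpg-bigdata): converts every *.fq file in
# IN_DIR to OUT_DIR/<name>.fq.avro with the local script.
# HPG_LOCAL can be overridden, e.g. HPG_LOCAL=echo to print the commands only.
HPG_LOCAL=${HPG_LOCAL:-build/bin/hpg-bigdata-local.sh}

# convert_all_local IN_DIR OUT_DIR
convert_all_local() (
  set -euo pipefail
  in_dir=$1 out_dir=$2
  shopt -s nullglob            # a directory with no *.fq files yields no iterations
  mkdir -p "$out_dir"
  for fq in "$in_dir"/*.fq; do
    "$HPG_LOCAL" sequence convert -i "$fq" -o "$out_dir/$(basename "$fq").avro"
  done
)
```

For example, convert_all_local build/data /tmp/out writes /tmp/out/test.fq.avro (and one .avro file per other .fq file in build/data), stopping at the first conversion that fails.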
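stats
Calculates different stats from sequencing data. This subcommand is only available in the Hadoop scenario.
Hadoop scenario: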
$ build/bin/hpg-bigdata.sh sequence stats -h
Usage: hpg-bigdata.sh sequence stats [options]
Options:
* -o, --output STRING Local output directory to save stats results in JSON format [null]
* -i, --input STRING HDFS input file containing the FastQ sequences stored according to the GA4GH/Avro model) [null]
-L, --log-level STRING Set the level log, values: debug, info, warning, error, fatal [info]
-h, --help This parameter prints this help [false]
--conf STRING Set the configuration file [null]
-k, --kmers INTEGER Compute k-mers (according to the indicated length) [0]
-v, --verbose BOOLEAN This parameter set the level of the logging [false]
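To illustrate what the -k/--kmers option counts, or to spot-check the stats on a small file, here is a minimal awk sketch that counts overlapping k-mers of length k in each read: a read of length L contributes L - k + 1 k-mers when L is at least k. It is an illustration under stated assumptions, not hpg-bigdata's implementation: it reads plain four-line FastQ rather than the Avro input, does not merge reverse complements, and skips k-mers containing N; whether the tool makes the same choices is not documented here. count_kmers is a hypothetical name.

```bash
#!/usr/bin/env bash
# count_kmers K FILE
# Hypothetical helper (not part of hpg-bigdata): counts overlapping k-mers of
# length K over the sequence lines (line 2 of every 4) of a plain FastQ file.
# A read of length L contributes L - K + 1 k-mers; k-mers containing N are skipped.
# Output: one "KMER<TAB>COUNT" line per distinct k-mer, sorted by k-mer.
count_kmers() {
  awk -v k="$1" '
    NR % 4 == 2 {
      seq = toupper($0)
      for (i = 1; i + k - 1 <= length(seq); i++) {
        kmer = substr(seq, i, k)
        if (kmer !~ /N/) counts[kmer]++
      }
    }
    END { for (kmer in counts) printf "%s\t%d\n", kmer, counts[kmer] }
  ' "$2" | LC_ALL=C sort
}
```

For instance, a single read ACGTACG with k = 3 yields the five k-mers ACG, CGT, GTA, TAC and ACG, so ACG is reported with a count of 2.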
Example:
$ mkdir /tmp/out-fastq-stats
$ build/bin/hpg-bigdata.sh sequence stats -i /out/test.fq.avro/part-r-00000.avro -o /tmp/out-fastq-stats/ --kmers 7
...
...
$ ls -ltr /tmp/out-fastq-stats/
total 8
-rw-r--r-- 1 jtarraga jtarraga 5813 jun 30 16:07 stats.json
$ cat /tmp/out-fastq-stats/stats.json
{"num_reads": 100, "num_A": 3662, "num_T": 3756, "num_G": 2567, "num_C": ...
...
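The stats.json fields shown above (num_reads, num_A, num_T, num_G, num_C) can be roughly cross-checked against the original FastQ file. A minimal sketch, assuming plain four-line FastQ and counting only A, C, G and T (case-insensitive); how hpg-bigdata treats other symbols such as N is not stated here, so small differences are possible. base_counts is a hypothetical helper, not part of hpg-bigdata.

```bash
#!/usr/bin/env bash
# base_counts FILE
# Hypothetical helper (not part of hpg-bigdata): counts reads and A/T/G/C bases
# in a plain four-line FastQ file, printed in the same order as the
# num_reads, num_A, num_T, num_G, num_C fields of stats.json.
base_counts() {
  awk '
    NR % 4 == 2 {
      reads++
      seq = toupper($0)
      n = length(seq)
      for (i = 1; i <= n; i++) counts[substr(seq, i, 1)]++
    }
    END {
      printf "num_reads=%d num_A=%d num_T=%d num_G=%d num_C=%d\n",
             reads, counts["A"], counts["T"], counts["G"], counts["C"]
    }
  ' "$1"
}
```

For example, base_counts build/data/test.fq prints one line that can be compared by eye with the first fields of stats.json.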