Update Readme

marouni · marouni · commit 002bbe562bf9 · 2014-11-02T14:30:56.000+01:00
diff --git a/README.md b/README.md
@@ -1,27 +1,24 @@
-pcap2seq
+Overview
 ========
 
-Converts pcap files to Hadoop sequence files.
-Pcap is a binary file format that stores network traffic capture (using tcpdump or wireshark). The pcap format consists of all the captured packets (up to a certain length) plus packet headers. 
+Converts [pcap files](http://wiki.wireshark.org/Development/LibpcapFileFormat) to Hadoop Sequence files.
 
-Processing pcap files directly with Hadoop is inefficent since pcap files are not splittable, so a single hadoop worker will work on a single file even if the fill spans multiple blocks.
+Processing pcap files with Hadoop MapReduce is inefficient since pcap files are not splittable, so a single Hadoop worker processes the whole pcap file even if the file spans multiple blocks.
 
-Converting pcap to sequence file format creates a splittable file that can be processed using multiple hadoop workers. 
+Converting pcap to sequence file format creates a splittable and compressible file that can be processed using multiple Hadoop workers.
 
-For more info about pcap file format : http://wiki.wireshark.org/Development/LibpcapFileFormat
 
 Build
 ========
-The project can be built with gradle.
+The project requires [Gradle](http://www.gradle.org/downloads).
 To build it, clone the repository then run :
-
-  gradle clean jar
-
+```
+gradle clean jar
+```
 Execute
 ========
 The build process creates a jar file in build/libs/
 
-
 Run the jar using hadoop binary with three arguments :
 
 1 - input pcap file (A local file on the machine)
@@ -32,8 +29,9 @@ Run the jar using hadoop binary with three arguments :
 For no compression set this argument to 'none'
 
 Example :
-
-  hadoop jar pcap2seq-1.2.jar file.pcap file.seq org.apache.hadoop.io.compress.BZip2Codec
+```
+hadoop jar pcap2seq-1.2.jar file.pcap file.seq org.apache.hadoop.io.compress.BZip2Codec
+```
 
 Converts file.pcap to file.seq with block-level compression using BZip2 (the codec named in the example command). The output file will be stored in HDFS.
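
Review note: the README text in this diff states that pcap files are not splittable. The sketch below illustrates why, assuming the classic libpcap layout described on the linked Wireshark wiki page: a 24-byte global header followed by records, each with a 16-byte header that carries the captured length. Records have no sync markers, so their boundaries can only be found by walking the file from the start. The names `iter_pcap_records` and `build_sample_pcap` are hypothetical and are not part of pcap2seq; this is an illustration, not the project's implementation.

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic libpcap magic number (microsecond timestamps)


def iter_pcap_records(stream):
    """Yield (ts_sec, ts_usec, packet_bytes) for each record in a classic pcap stream."""
    global_header = stream.read(24)
    if len(global_header) < 24:
        raise ValueError("truncated pcap global header")
    # The magic number reveals the byte order the capture was written in.
    if struct.unpack("<I", global_header[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", global_header[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    while True:
        record_header = stream.read(16)
        if len(record_header) < 16:
            return  # end of file (or a truncated trailing record)
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", record_header
        )
        # The next record starts only after incl_len bytes, so a reader
        # dropped at an arbitrary offset cannot locate a record boundary.
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return
        yield ts_sec, ts_usec, data


def build_sample_pcap(packets):
    """Build a tiny in-memory little-endian pcap for demonstration (hypothetical helper)."""
    out = io.BytesIO()
    # magic, version 2.4, thiszone, sigfigs, snaplen, linktype (1 = Ethernet)
    out.write(struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1))
    for ts_sec, payload in packets:
        out.write(struct.pack("<IIII", ts_sec, 0, len(payload), len(payload)))
        out.write(payload)
    out.seek(0)
    return out
```

Each yielded record is the kind of unit a converter could store as one entry in a sequence file, whose format does support splitting; how pcap2seq actually maps packets to keys and values is not shown in this diff.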