Commit 002bbe5

Update Readme
1 parent 4f15c94 commit 002bbe5

1 file changed, +11 -13 lines changed

README.md

@@ -1,27 +1,24 @@
-pcap2seq
+Overview
 ========
 
-Converts pcap files to Hadoop sequence files.
-Pcap is a binary file format that stores network traffic capture (using tcpdump or wireshark). The pcap format consists of all the captured packets (up to a certain length) plus packet headers.
+Converts [pcap files](http://wiki.wireshark.org/Development/LibpcapFileFormat) to Hadoop Sequence files.
 
-Processing pcap files directly with Hadoop is inefficent since pcap files are not splittable, so a single hadoop worker will work on a single file even if the fill spans multiple blocks.
+Processing pcap files with Hadoop MapReduce is inefficent since pcap files are not splittable, so a single hadoop worker processes the whole pcap file even if the file spans multiple blocks.
 
-Converting pcap to sequence file format creates a splittable file that can be processed using multiple hadoop workers.
+Converting pcap to sequence file format creates a splittable and compressable file that can be processed using multiple hadoop workers.
 
-For more info about pcap file format : http://wiki.wireshark.org/Development/LibpcapFileFormat
 
 Build
 ========
-The project can be built with gradle.
+The project requires [gradle](http://www.gradle.org/downloads)
 To build it, clone the repository then run :
-
-gradle clean jar
-
+```
+gradle clean jar
+```
 Execute
 ========
 The build process creates a jar file in build/libs/
 
-
 Run the jar using hadoop binary with three arguments :
 
 1 - input pcap file (A local file on the machine)
@@ -32,8 +29,9 @@ Run the jar using hadoop binary with three arguments :
 For no compression set this argument to 'none'
 
 Example :
-
-hadoop jar pcap2seq-1.2.jar file.pcap file.seq org.apache.hadoop.io.compress.BZip2Codec
+```
+hadoop jar pcap2seq-1.2.jar file.pcap file.seq org.apache.hadoop.io.compress.BZip2Codec
+```
 
 Converts file.pcap to file.seq with block level compression using GZIP. The output file will be stored in HDFS.

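Note on the "not splittable" claim in the README text above: in the classic libpcap layout, a 24-byte global header is followed by records that each carry a 16-byte header holding the captured length. A record's position is therefore only known after the previous record header has been read, which may be why a reader cannot start at an arbitrary block boundary. The following is a minimal Python sketch of that sequential walk, not code from pcap2seq; the function name `iter_pcap_records` is hypothetical, and only the classic microsecond-resolution format is assumed.

```python
import struct

GLOBAL_HEADER_LEN = 24   # magic, version major/minor, thiszone, sigfigs, snaplen, network
RECORD_HEADER_LEN = 16   # ts_sec, ts_usec, incl_len, orig_len


def iter_pcap_records(data: bytes):
    """Yield (ts_sec, ts_usec, payload) for each record in a classic pcap byte string.

    Hypothetical helper for illustration; handles only the 0xa1b2c3d4 magic.
    """
    if len(data) < GLOBAL_HEADER_LEN:
        raise ValueError("truncated pcap global header")

    # The magic number is written in the capturing host's byte order,
    # so reading it as little-endian reveals which order the file uses.
    magic = struct.unpack("<I", data[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")

    offset = GLOBAL_HEADER_LEN
    while offset + RECORD_HEADER_LEN <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + RECORD_HEADER_LEN]
        )
        offset += RECORD_HEADER_LEN
        payload = data[offset:offset + incl_len]
        if len(payload) < incl_len:
            raise ValueError("truncated packet record")
        # The next record starts only where this payload ends.
        offset += incl_len
        yield ts_sec, ts_usec, payload
```

Each yielded tuple could plausibly map to one key/value pair in a sequence file, though how pcap2seq actually chooses keys and values is not shown in this diff.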