README.md
+11-13
@@ -1,27 +1,24 @@
1
-
pcap2seq
1
+
Overview
2
2
========
3
3
4
-
Converts pcap files to Hadoop sequence files.
5
-
Pcap is a binary file format that stores network traffic capture (using tcpdump or wireshark). The pcap format consists of all the captured packets (up to a certain length) plus packet headers.
4
+
Converts [pcap files](http://wiki.wireshark.org/Development/LibpcapFileFormat) to Hadoop Sequence files.
6
5
7
-
Processing pcap files directly with Hadoop is inefficent since pcap files are not splittable, so a single hadoop worker will work on a single file even if the fill spans multiple blocks.
6
+
Processing pcap files with Hadoop MapReduce is inefficient since pcap files are not splittable, so a single Hadoop worker processes the whole pcap file even if the file spans multiple blocks.
8
7
9
-
Converting pcap to sequence file format creates a splittable file that can be processed using multiple hadoop workers.
8
+
Converting pcap to sequence file format creates a splittable and compressible file that can be processed using multiple Hadoop workers.
10
9
11
-
For more info about pcap file format : http://wiki.wireshark.org/Development/LibpcapFileFormat
12
10
13
11
Build
14
12
========
15
-
The project can be built with gradle.
13
+
The project requires [gradle](http://www.gradle.org/downloads).
16
14
To build it, clone the repository then run :
17
-
18
-
gradle clean jar
19
-
15
+
```
16
+
gradle clean jar
17
+
```
20
18
Execute
21
19
========
22
20
The build process creates a jar file in build/libs/
23
21
24
-
25
22
Run the jar using hadoop binary with three arguments :
26
23
27
24
1 - input pcap file (A local file on the machine)
@@ -32,8 +29,9 @@ Run the jar using hadoop binary with three arguments :
32
29
For no compression set this argument to 'none'
33
30
34
31
Example :
35
-
36
-
hadoop jar pcap2seq-1.2.jar file.pcap file.seq org.apache.hadoop.io.compress.BZip2Codec
32
+
```
33
+
hadoop jar pcap2seq-1.2.jar file.pcap file.seq org.apache.hadoop.io.compress.BZip2Codec
34
+
```
37
35
38
36
Converts file.pcap to file.seq with block level compression using BZip2 (the codec named in the example command). The output file will be stored in HDFS.
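
The overview's claim that pcap files are not splittable can be illustrated with a minimal sketch of a reader for the classic libpcap layout (a 24-byte global header followed by packet records, each with a 16-byte record header). This is not the pcap2seq implementation; the function name `read_pcap_records` is hypothetical, and the sketch assumes the classic format with the `0xa1b2c3d4` magic number, not pcapng. Each record's position is only known after reading the length field of the previous record, and the format has no sync markers, so a reader cannot start safely from an arbitrary block boundary.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic libpcap magic number (microsecond timestamps)


def read_pcap_records(stream):
    """Yield (ts_sec, ts_usec, orig_len, data) for each packet record.

    Hypothetical helper for illustration only; assumes a classic pcap
    stream opened in binary mode.
    """
    global_header = stream.read(24)
    if len(global_header) < 24:
        raise ValueError("truncated pcap global header")

    # The magic number tells us the byte order used by the writer.
    magic = struct.unpack("<I", global_header[:4])[0]
    if magic == PCAP_MAGIC:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")

    while True:
        record_header = stream.read(16)
        if len(record_header) < 16:
            return  # end of file (or a truncated trailing record)
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", record_header
        )
        # incl_len is the number of captured bytes that follow; the next
        # record header can only be located after skipping exactly that many.
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return
        yield ts_sec, ts_usec, orig_len, data
```

A sequence file, by contrast, embeds sync markers between groups of records, which is what lets multiple workers each start reading from the middle of the file.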