- Apache Maven - Version 3.3.9
- Apache Spark - Version 2.1.0
mvn package
The application runs in two phases:
- Analysis of the input dataset, which generates the seed graph and the probability distributions of the connection properties
- Data generation with either the Barabási–Albert or the Kronecker algorithm
$SPARK_HOME/bin/spark-submit csb.jar seed -a $DATA/dataset_01/alert -b $DATA/dataset_01/conn.log
The output will be:
- An aug.log file containing the join of the input log files
- A set of .ser files containing the probability distributions of the connection properties
- seed_vertices and seed_edges folders containing the serialized vertices and edges of the seed graph, respectively
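To make "probability distribution of a connection property" concrete, here is a minimal Python sketch. It is not the project's code: it simply counts how often each value of one property occurs, normalizes the counts, and draws a new value from the result. The property name `proto` and the sample records are hypothetical.

```python
from collections import Counter
import random

def empirical_distribution(values):
    """Map each distinct value to its relative frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

def sample(distribution, rng):
    """Draw one value according to the given probabilities."""
    values = list(distribution)
    weights = [distribution[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]

# Hypothetical connection records; "proto" is an illustrative property name.
records = [{"proto": "tcp"}, {"proto": "tcp"}, {"proto": "udp"}, {"proto": "tcp"}]
dist = empirical_distribution(r["proto"] for r in records)

# A synthetic edge could then receive a property value drawn from dist.
synthetic_value = sample(dist, random.Random(0))
```

A distribution of this kind is what allows properties to be assigned to synthetic edges in phase 2 so that they follow the frequencies observed in the input.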
Phase 2 consists of the following steps:
- Synthetic graph generation, which uses one of the algorithms listed below
- Properties generation, which can be skipped with the -x or --exclude-prop option
- Graph saving using Spark serialization methods (Neo4j serialization is under development)
- Veracity computation of degree, in-degree, out-degree and PageRank metrics
The following will run the Barabási–Albert algorithm with 10 iterations:
$SPARK_HOME/bin/spark-submit csb.jar synth ba -m all 10 0.2
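For readers unfamiliar with the algorithm, the following Python sketch shows the textbook preferential-attachment idea behind Barabási–Albert. It rests on assumptions: it adds one vertex per iteration and attaches it to `m` existing vertices, which may not match how this tool defines an iteration or interprets its command-line arguments. The function names and the seed edges are hypothetical.

```python
import random

def barabasi_albert_step(edges, new_vertex, m, rng):
    """Attach new_vertex to up to m distinct existing vertices,
    chosen with probability proportional to their degree."""
    # Each vertex appears in this list once per incident edge,
    # so a uniform pick from it is a degree-proportional pick.
    endpoints = [v for edge in edges for v in edge]
    distinct = len(set(endpoints))
    targets = set()
    while len(targets) < min(m, distinct):
        targets.add(rng.choice(endpoints))
    return edges + [(new_vertex, t) for t in sorted(targets)]

def grow(seed_edges, iterations, m, rng):
    """Grow the seed graph by one new vertex per iteration (an assumption)."""
    edges = list(seed_edges)
    next_id = max(v for e in edges for v in e) + 1
    for i in range(iterations):
        edges = barabasi_albert_step(edges, next_id + i, m, rng)
    return edges

# Hypothetical three-vertex seed graph.
seed = [(0, 1), (1, 2), (2, 0)]
graph = grow(seed, iterations=10, m=2, rng=random.Random(42))
```

Because well-connected vertices are picked more often, they accumulate further edges, which produces the heavy-tailed degree distribution the algorithm is known for.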
Note: the KronFit algorithm is currently under development, so a static version of the seed matrix (seed.mtx) for the provided dataset is included in the same folder as the dataset.
The following will run the stochastic Kronecker algorithm with 15 iterations using the seed.mtx matrix:
$SPARK_HOME/bin/spark-submit csb.jar synth kro -m all 15 $DATA/dataset_01/seed.mtx
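As a rough illustration of the stochastic Kronecker idea, the Python sketch below raises a small seed matrix of edge probabilities to its k-th Kronecker power and then samples each edge independently. It is a sketch under assumptions, not the tool's Spark implementation: the seed values are hypothetical and not those of seed.mtx, and k is kept at 3 so the full matrix fits in memory.

```python
import random

def kronecker(a, b):
    """Kronecker product of two square matrices given as lists of rows."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def kronecker_power(seed, k):
    """Multiply the seed matrix with itself k - 1 times (Kronecker product)."""
    result = seed
    for _ in range(k - 1):
        result = kronecker(result, seed)
    return result

def sample_edges(prob, rng):
    """Keep edge (i, j) with probability prob[i][j]."""
    return [(i, j)
            for i, row in enumerate(prob)
            for j, p in enumerate(row)
            if rng.random() < p]

# Hypothetical 2x2 seed of edge probabilities.
seed = [[0.9, 0.5], [0.5, 0.1]]
prob = kronecker_power(seed, 3)          # 8 x 8 probability matrix
edges = sample_edges(prob, random.Random(7))
```

The matrix side length grows as the seed size raised to the power k, so materializing it as done here does not scale; practical implementations sample edges without building the full matrix.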