
Running InfoSphere Streams benchmark

Zubair Nabi edited this page Apr 2, 2015 · 10 revisions

Before you begin, create the dataset required by the InfoSphere Streams benchmark: [Create dataset for InfoSphere Streams benchmark](Create dataset for InfoSphere Streams benchmark)

Overview

The StreamsEmailBenchmark project contains the InfoSphere Streams application that processes the emails.

Prerequisites

  1. Copy your serialized/compressed dataset (obtained using StreamsPrepareDataset) to StreamsEmailBenchmark/data

Note: Files must be named filename0.av through filename<parallelism-1>.av, numbered from zero.

For instance, to process two files in parallel, name them filename0.av and filename1.av.
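The naming convention can be checked before building. The following is a minimal sketch, not part of the project: the function names expected_names and check_dataset, and their arguments, are hypothetical and chosen only for illustration.

```shell
# Print the input file names expected for a given prefix and parallelism.
# Numbering starts at 0 and ends at parallelism - 1.
expected_names() {
    prefix=$1
    parallelism=$2
    i=0
    while [ "$i" -lt "$parallelism" ]; do
        echo "${prefix}${i}.av"
        i=$((i + 1))
    done
}

# Report every expected file that is missing from the data directory.
# Returns 0 if all files are present, 1 otherwise.
check_dataset() {
    dir=$1
    missing=0
    for name in $(expected_names "$2" "$3"); do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $dir/$name"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_dataset StreamsEmailBenchmark/data filename 2 prints nothing when filename0.av and filename1.av are both in place.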

Compile

To build the application, go to the root directory of StreamsEmailBenchmark and run make all PARALLELISM=<parallelism> at the command line, where <parallelism> is the number of input files to process in parallel.
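One way to avoid a mismatch between the build and the dataset is to derive the PARALLELISM value from the files present. This sketch assumes that the parallelism equals the number of .av files sharing a prefix in the data directory. The helpers count_inputs and build_cmd are hypothetical, and build_cmd only prints the make command rather than running it.

```shell
# Count the files in a directory that match <prefix>*.av.
count_inputs() {
    dir=$1
    prefix=$2
    n=0
    for f in "$dir/$prefix"*.av; do
        # Skip the unexpanded pattern when nothing matches.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Print the build command for the dataset found in the given directory.
build_cmd() {
    echo "make all PARALLELISM=$(count_inputs "$1" "$2")"
}
```

Run from the root directory of StreamsEmailBenchmark, build_cmd data filename prints the command to execute for the files currently in data.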

Execution

To run the application:

  1. Make sure a Streams instance has been created and started.
  2. Submit the job to the Streams instance: streamtool submitjob output/Main/Distributed/Main.adl -P filename=<input_file_name> -P windowTime=<flush_interval_for_metrics_in_secs> -P printWindowMetrics=<yes_or_no>
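As an illustration of how the placeholders are filled in, the sketch below assembles the submission command from three values. The helper submit_cmd is hypothetical, and the sample values (filename, 10, yes) are assumptions. The function prints the command instead of running it, so it can be inspected first.

```shell
# Print the streamtool command for the given submission parameters.
#   $1: input file name, $2: metrics flush interval in seconds,
#   $3: yes or no, whether to print window metrics
submit_cmd() {
    echo "streamtool submitjob output/Main/Distributed/Main.adl" \
         "-P filename=$1 -P windowTime=$2 -P printWindowMetrics=$3"
}

submit_cmd filename 10 yes
```

The printed line can then be run from the root directory of StreamsEmailBenchmark once the Streams instance is up.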

Results Collection

  • Metrics are written to stdout for standalone execution and to the logs for distributed execution.
  • CPU time can be obtained by visually inspecting the SPL graph in Streams Studio.