tinyield/parallel-sequences-benchmarks

Contents

  1. Overview
  2. Usage
  3. Benchmarks Overview
    1. All Match
    2. Distinct
    3. First
    4. FlatMap and Reduce
    5. SlowSin
    6. Max
  4. Performance Comparison

Overview

This document explains how to run the project on your local machine, gives a brief description of each benchmark's pipeline, and presents a performance comparison expressed as the gain relative to the sequential version of each algorithm.

Usage

To run these benchmarks on your local machine, first run:

mvn clean install

And then:

java -jar target/benchmarks.jar

Benchmarks Overview

All Match - allMatch(sequence, predicate)

Benchmarks the allMatch() operation when performed sequentially and in parallel with varying numbers of threads.

Collection Sizes: [5, 50, 500, 5000, 50000, 500000, 5000000, 50000000]
Active Processor Count: [1, 2, 4, 8]

Pipeline:

sequenceOfEvenIntegers -> allMatch(isEven)
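
For illustration only (this is not the repository's benchmark code, and the source and size below are assumptions), the pipeline shape can be sketched with the Java Stream API:

```java
import java.util.stream.IntStream;

public class AllMatchSketch {
    public static void main(String[] args) {
        int size = 50_000; // illustrative collection size, not a value taken from the benchmark
        // Sequential version of the pipeline: a source of even integers checked with allMatch(isEven).
        boolean sequential = IntStream.range(0, size)
                .map(i -> i * 2)
                .allMatch(n -> n % 2 == 0);
        // Same pipeline as a parallel stream.
        boolean parallel = IntStream.range(0, size)
                .map(i -> i * 2)
                .parallel()
                .allMatch(n -> n % 2 == 0);
        System.out.println(sequential + " " + parallel);
    }
}
```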

Distinct - distinct(sequence)

Benchmarks the distinct() operation when performed sequentially and in parallel with varying numbers of threads.

Collection Sizes: [5, 50, 500, 5000, 50000, 500000]
Active Processor Count: [1, 2, 4, 8]

Pipelines:

  • Distinct:
sequenceOfEvenIntegers -> distinct() -> count()
  • Distinct Parallel Collectors:
sequenceOfEvenIntegers -> distinct() -> collect() -> size()
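
Both pipelines can be sketched with plain Java streams as follows; the repeated-even-integer source and the use of Collectors.toList() are assumptions of this illustration, not necessarily what the benchmark uses:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class DistinctSketch {
    public static void main(String[] args) {
        int size = 5_000; // illustrative collection size
        // Distinct: count the distinct even values directly.
        long count = IntStream.range(0, size)
                .map(i -> (i * 2) % 100) // repeated even values, so distinct() actually removes duplicates
                .parallel()
                .distinct()
                .count();
        // Distinct & Collect: materialise the distinct values, then take the size of the collection.
        List<Integer> collected = IntStream.range(0, size)
                .map(i -> (i * 2) % 100)
                .parallel()
                .distinct()
                .boxed()
                .collect(Collectors.toList());
        System.out.println(count + " " + collected.size());
    }
}
```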

First - first(sequence, predicate)

Benchmarks the findFirst() operator. The benchmark was run with two types of sequences:

  1. One where the match would be found in the middle.
  2. One where the match would be found in the last element.

Collection Sizes: [5, 50, 500, 5000, 50000, 500000, 5000000, 50000000]
Active Processor Count: [1, 2, 4, 8]

Pipeline:

sequenceOfAllButOneEvenIntegers -> filter(isOdd) -> findFirst()
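
A minimal sketch of the pipeline, assuming a stream of even integers with a single odd value whose position selects the "middle" or "last element" variant:

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

public class FindFirstSketch {
    public static void main(String[] args) {
        int size = 50_000;        // illustrative collection size
        int oddIndex = size / 2;  // odd value in the middle; use size - 1 for the "last element" variant
        OptionalInt match = IntStream.range(0, size)
                .map(i -> i == oddIndex ? i * 2 + 1 : i * 2) // all even except the one at oddIndex
                .parallel()
                .filter(n -> n % 2 != 0)
                .findFirst();
        System.out.println(match.orElse(-1));
    }
}
```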

FlatMap and Reduce

Benchmarks the flatMap and reduce operations used in conjunction, when performed sequentially and in parallel with varying numbers of threads.

Collection Sizes: [5, 50, 500, 5000, 50000, 500000, 5000000, 50000000]
Active Processor Count: [1, 2, 4, 8]

Pipeline:

sequenceOfSequencesOfIntegers -> flatmap() -> reduce(sum)
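
A minimal sketch of the pipeline shape, assuming the sequence of sequences is represented as a list of small integer lists:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FlatMapReduceSketch {
    public static void main(String[] args) {
        int outer = 5_000; // illustrative number of inner sequences
        // A list of small integer lists stands in for the sequence of sequences.
        List<List<Integer>> nested = IntStream.range(0, outer)
                .mapToObj(i -> List.of(i, i + 1, i + 2))
                .collect(Collectors.toList());
        // Flatten the inner lists and reduce with addition, in parallel.
        int sum = nested.parallelStream()
                .flatMap(List::stream)
                .reduce(0, Integer::sum);
        System.out.println(sum);
    }
}
```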

SlowSin

Benchmarks a private method of Apache Commons Math 3.6, slowSin in the FastMathCalc class, which computes the sine of x for x between 0 and pi/4 using a Taylor expansion.

Collection Sizes: [5, 50, 500, 5000, 50000, 500000, 5000000, 50000000]
Active Processor Count: [1, 2, 4, 8]

Pipeline:

sequenceOfSequencesOfIntegers -> mapToDouble(slowSin) -> reduce(max)
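
Because slowSin is private to FastMathCalc, the sketch below substitutes a plain Taylor-series sine as a stand-in and assumes a flat source mapped into [0, pi/4]; neither detail is taken from the benchmark itself:

```java
import java.util.stream.IntStream;

public class SlowSinSketch {
    // Stand-in for the private FastMathCalc.slowSin: a plain Taylor expansion of sin(x).
    static double taylorSin(double x) {
        double term = x;
        double sum = x;
        for (int n = 1; n <= 10; n++) {
            term *= -x * x / ((2 * n) * (2 * n + 1));
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        int size = 5_000; // illustrative collection size
        // Map each element to a value in [0, pi/4], apply the expensive sine, and keep the maximum.
        double max = IntStream.range(0, size)
                .parallel()
                .mapToDouble(i -> taylorSin((i % 100) / 100.0 * (Math.PI / 4)))
                .reduce(Double.NEGATIVE_INFINITY, Double::max);
        System.out.println(max);
    }
}
```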

Max

Benchmarks the reduce operation used in conjunction with Integer::max, with Arrays.stream() and IntStream.of() as sources.

Collection Sizes: [5, 50, 500, 5000, 50000, 500000, 5000000, 50000000]
Active Processor Count: [1, 2, 4, 8]

Pipeline:

sourceOfIntegers -> reduce(max)
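
A minimal sketch of both source variants, assuming an int[] of ascending integers:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class MaxSketch {
    public static void main(String[] args) {
        int[] source = IntStream.range(0, 50_000).toArray(); // illustrative source of integers
        // Arrays.stream() as the source, reduced with Integer::max in parallel.
        int arraysMax = Arrays.stream(source)
                .parallel()
                .reduce(Integer.MIN_VALUE, Integer::max);
        // IntStream.of() over the same array, same reduction.
        int intStreamMax = IntStream.of(source)
                .parallel()
                .reduce(Integer.MIN_VALUE, Integer::max);
        System.out.println(arraysMax + " " + intStreamMax);
    }
}
```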

Performance Comparison

The results presented in the following table are the gain (or loss) attained by using Stream.parallel() compared to sequential processing of the same benchmark with the Stream API. In other words, the value for sequential processing is 1.0 for every benchmark, and all results presented here are relative to that baseline.

| Benchmark | Time Complexity | Threads | 5 Elements | 50 Elements | 500 Elements | 5000 Elements | 50000 Elements | 500000 Elements |
|---|---|---|---|---|---|---|---|---|
| All Match | Linear | 2 | -1.0 | -0.9 | -0.7 | -0.5 | 0.4 | 0.7 |
| | | 4 | -1.0 | -1.0 | -0.9 | -0.5 | 0.7 | 1.9 |
| | | 8 | -1.0 | -1.0 | -1.0 | -0.7 | 0.7 | 2.8 |
| Distinct | Linear | 2 | -0.9 | -0.9 | -0.9 | -0.8 | -0.7 | -0.9 |
| | | 4 | -1.0 | -1.0 | -0.8 | -0.7 | -0.7 | -0.8 |
| | | 8 | -1.0 | -1.0 | -0.8 | -0.7 | -0.6 | -0.8 |
| Distinct & Collect | Linear | 2 | -1.0 | -0.9 | -0.8 | -0.8 | -0.7 | -0.7 |
| | | 4 | -1.0 | -0.9 | -0.8 | -0.7 | -0.6 | -0.6 |
| | | 8 | -1.0 | -1.0 | -0.8 | -0.7 | -0.6 | -0.6 |
| First In the Middle | Linear | 2 | -1.0 | -1.0 | -0.9 | -0.8 | -0.5 | -0.4 |
| | | 4 | -1.0 | -1.0 | -1.0 | -0.8 | -0.3 | 0.1 |
| | | 8 | -1.0 | -1.0 | -1.0 | -0.8 | -0.3 | 0.3 |
| First In the End | Linear | 2 | -1.0 | -0.9 | -0.8 | -0.7 | 0.0 | 0.2 |
| | | 4 | -1.0 | -1.0 | -0.9 | -0.7 | 0.5 | 1.1 |
| | | 8 | -1.0 | -1.0 | -1.0 | -0.8 | 0.5 | 1.3 |
| Flatmap & Reduce | Linear | 2 | -0.9 | -0.7 | -0.5 | 0.0 | 0.2 | 0.2 |
| | | 4 | -1.0 | -0.9 | -0.6 | 0.2 | 0.2 | 0.3 |
| | | 8 | -1.0 | -1.0 | -0.7 | 0.1 | 0.2 | 0.3 |
| Arrays Max | Linear | 2 | -0.9 | -0.9 | -0.6 | -0.2 | 0.3 | 0.1 |
| | | 4 | -1.0 | -1.0 | -0.8 | -0.1 | 0.4 | 0.2 |
| | | 8 | -1.0 | -1.0 | -0.9 | -0.3 | 0.4 | 0.2 |
| IntStream Max | Linear | 2 | -1.0 | -0.9 | -0.7 | -0.4 | 0.0 | 0.3 |
| | | 4 | -1.0 | -1.0 | -0.9 | -0.7 | 0.1 | 0.5 |
| | | 8 | -1.0 | -1.0 | -1.0 | -0.8 | 0.0 | 0.6 |
| Slow Sin | Linear | 2 | -0.6 | -0.2 | 0.7 | 1.0 | 1.0 | 1.0 |
| | | 4 | -0.9 | -0.2 | 1.5 | 2.2 | 2.3 | 2.2 |
| | | 8 | -0.9 | -0.4 | 1.7 | 3.3 | 3.5 | 3.6 |

These benchmarks clearly show that Stream.parallel() is not always advantageous. It becomes more advantageous the more elements we are processing, the more computationally heavy the processing of each element is, and the more threads we have, although there also seems to be a threshold only beyond which it starts to be an advantage.

After investigating this point further, we came across an article by Brian Goetz titled "Optimize stream pipelines for parallel processing", namely the section about "The NQ Model", where N is the number of data elements and Q is the amount of work performed per element. That section describes how parallelization is more likely to yield performance gains the larger the product N*Q is, going as far as to say that N*Q > 10000 is needed to see any significant speedup. Even so, we did not observe this relation in pipelines where the computational cost per element was very low: for example, the Distinct pipelines still show a loss at 500,000 elements, where N*Q is well above 10,000. In other words, either that threshold is too low for some pipelines, or minimum values for N and for Q individually are also required.

Regarding the number of threads, increasing the thread count did not make the speedup appear at lower values of N*Q; it did, however, produce a larger speedup the more threads we had working on a pipeline that already benefited from parallelization.
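
The "Active Processor Count" axis implies that the number of worker threads is being controlled somewhere. One common way to do this for Stream.parallel() (an assumption for illustration, not necessarily how this repository does it) is to size the common ForkJoinPool or to run the terminal operation inside a dedicated pool:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class ParallelismSketch {
    public static void main(String[] args) throws Exception {
        // Option 1: cap the common pool used by Stream.parallel()
        // (must be set before the common pool is first touched).
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4");

        // Option 2: run the terminal operation inside a dedicated pool of the desired size;
        // parallel streams started from inside the pool use its workers (undocumented behaviour).
        Callable<Long> pipeline = () -> IntStream.range(0, 500_000)
                .parallel()
                .filter(n -> n % 2 == 0)
                .count();
        ForkJoinPool pool = new ForkJoinPool(4);
        long evens = pool.submit(pipeline).get();
        pool.shutdown();
        System.out.println(evens);
    }
}
```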
