Parallel Data Processing in PowerShell

SplitPipeline is a PowerShell v2.0+ module for parallel data processing. Its command Split-Pipeline splits the input, processes the parts in parallel pipelines, and outputs the results. It can work without collecting the whole input, so the input may be large or even infinite.

Quick Start

Step 1: Get and install.

SplitPipeline is distributed as the PowerShell Gallery module SplitPipeline. In PowerShell 5.0 or with PowerShellGet you can install it with this command:

Install-Module SplitPipeline

SplitPipeline is also available as the NuGet package SplitPipeline. Download it to the current location as the directory "SplitPipeline" by this command:

Invoke-Expression "& {$((New-Object Net.WebClient).DownloadString('https://github.com/nightroman/PowerShelf/raw/master/Save-NuGetTool.ps1'))} SplitPipeline"

Then copy the directory SplitPipeline to a PowerShell module directory (see $env:PSModulePath), normally like this:

C:/Users/<User>/Documents/WindowsPowerShell/Modules/SplitPipeline
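If it is unclear which module directories exist on a given machine, the following sketch lists them. The variable name `$modulePaths` is only illustrative, and the commented copy command assumes that the first entry is the per-user module directory and is writable, which may not hold everywhere.

```powershell
# List the directories PowerShell searches for modules, one per line.
# [IO.Path]::PathSeparator is ';' on Windows and ':' elsewhere.
$modulePaths = @($env:PSModulePath.Split([IO.Path]::PathSeparator) | Where-Object { $_ })
$modulePaths

# Assumption: the first entry is a writable per-user module directory.
# Copy-Item -Recurse .\SplitPipeline (Join-Path $modulePaths[0] 'SplitPipeline')
```

Check the printed list before copying and pick the directory that suits your setup.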

Alternatively, download it by NuGet tools or directly. In the latter case save it as ".zip" and unzip. Use the package subdirectory "tools/SplitPipeline".

Step 2: In a PowerShell command prompt import the module:

Import-Module SplitPipeline

Step 3: Take a look at help:

help about_SplitPipeline
help -full Split-Pipeline

Step 4: Try these three commands. They perform the same job, simulating a long but not processor-intensive operation on each item:

1..10 | . {process{ $_; sleep 1 }}
1..10 | Split-Pipeline {process{ $_; sleep 1 }}
1..10 | Split-Pipeline -Count 10 {process{ $_; sleep 1 }}

All three commands output the same numbers, 1 to 10, although Split-Pipeline does not guarantee the original order without the switch Order. The elapsed times differ, however. Let's measure them:

Measure-Command { 1..10 | . {process{ $_; sleep 1 }} }
Measure-Command { 1..10 | Split-Pipeline {process{ $_; sleep 1 }} }
Measure-Command { 1..10 | Split-Pipeline -Count 10 {process{ $_; sleep 1 }} }

The first command takes about 10 seconds.

Performance of the second command depends on the number of processors, which is used as the default split count. For example, with 2 processors it takes about 6 seconds.

The third command takes about 2 seconds. The number of processors is not very important for such sleeping jobs, but the split count is. Increasing it improves overall performance up to a point. For processor-intensive jobs, the split count normally should not exceed the number of processors.
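The timings above can be roughly anticipated with a little arithmetic. The sketch below is not part of SplitPipeline. The helper `Get-IdealElapsedSeconds` is a hypothetical name, and it assumes that items are spread evenly across the parallel pipelines and that splitting costs nothing, so it gives a lower bound only.

```powershell
# Hypothetical helper, not part of SplitPipeline: idealized elapsed time
# when items are spread evenly across parallel pipelines with no overhead.
function Get-IdealElapsedSeconds {
    param(
        [int]$ItemCount,
        [int]$SplitCount,
        [double]$SecondsPerItem = 1
    )
    # The busiest pipeline processes ceil(ItemCount / SplitCount) items in sequence.
    [math]::Ceiling([double]$ItemCount / $SplitCount) * $SecondsPerItem
}

Get-IdealElapsedSeconds -ItemCount 10 -SplitCount 1
Get-IdealElapsedSeconds -ItemCount 10 -SplitCount 2
Get-IdealElapsedSeconds -ItemCount 10 -SplitCount 10
```

The three calls return 10, 5, and 1. The measured times quoted above (about 10, 6, and 2 seconds) are somewhat higher for the parallel cases, presumably because starting and feeding the pipelines takes time.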
