Parallelism of FeatureUnion for xarray_filters #22

Open
PeterDSteinberg opened this issue Oct 11, 2017 · 0 comments
I just closed issue #10, which was about parameterizing chained MLDataset transformations, and am moving the FeatureUnion discussion from there to this separate issue.

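  • FeatureUnion in scikit-learn is a transformer that uses scikit-learn's parallelism (within one machine, via n_jobs) to run several transformers on the same input and concatenate their outputs, so each transformer contributes one or more columns of the resulting feature matrix.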
  • dask_searchcv has a FeatureUnion based on dask.distributed (single- or multi-node parallelism) that follows the same usage patterns.
  • FeatureUnion is important to the elm / xarray_filters goals because most of the rest of our parallelism relates to tools for multiple models, where a Pipeline-like instance is the embarrassingly parallel task being automated. Some important workflows for our climate science and satellite imagery use cases may be slow in the per-column processing step(s), which is where FeatureUnion can speed things up, e.g. a Pipeline that computes a histogram or fits a Gaussian process on each column individually as a preprocessing step.
  • Also note that FeatureUnion is associated with scikit-learn, so people generally think of it in ML contexts, but its approach to parallelism also has benefits outside of ML, e.g. preprocessing each column of a large array before visualization or summary stats. This is a documentation need for us however we wrap FeatureUnion in xarray_filters/elm: make sure it is explained for usage both inside and outside of ML contexts.
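For reference, the scikit-learn usage pattern described in the bullets above can be sketched roughly as follows. This is only a minimal illustration and not a proposal for the xarray_filters API: the synthetic array, the choice of StandardScaler and PCA as the two transformers, and the step names "scaled" and "pca" are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a feature matrix: 100 samples, 4 columns (assumed data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

# Each named transformer is fit on the same input; n_jobs=2 asks joblib
# to run the two fits in parallel on a single machine.
union = FeatureUnion(
    [("scaled", StandardScaler()), ("pca", PCA(n_components=2))],
    n_jobs=2,
)

# Outputs are concatenated column-wise: 4 scaled columns + 2 PCA components.
features = union.fit_transform(X)
print(features.shape)
```

Nothing here is ML-specific: the same pattern would apply to transformers that only compute per-column summaries before plotting, which is the non-ML usage the docs should cover.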
@PeterDSteinberg changed the title from "How does dask_searchcv's parallelism of FeatureUnion for xarray_filters" to "Parallelism of FeatureUnion for xarray_filters" on Oct 11, 2017