GaelVaroquaux edited this page Jul 12, 2011 · 5 revisions

Past sprints

1st April 2011


People present

Please add your skills, interests, or planned task to make it easier to organize the sprint and pair people on tasks. To share as much knowledge as possible, each task would ideally be pair-programmed by two people with different skills.

At Logilab, Paris (from 9H to 19H):

  • Gaël Varoquaux: task: code review, pair programming on specific task where needed.
  • Julien Miotte
  • Feth Arezki: could help with coding (w/ the logger?), LaTeX. Interested in learning about scikit.
  • Nelle Varoquaux: task: minibatch k-means
  • Fabian Pedregosa
  • Vincent Michel: task: code review, pair programming. Features: Ward's clustering.
  • Luis Belmar-Letelier
  • Thouis Jones: task: BallTree Cython wrapper, documentation, whatever.

At MIT, Boston:

  • Alexandre Gramfort: task: code review and pair programming
  • Demian Wassermann: task: Gaussian Processes with sparse data
  • Satra Ghosh: task: Ensemble Learning, random forests
  • Nico Pinto
  • Pietro Berkes

On IRC (from around 9am Brasília time, GMT-3):

  • Alexandre Passos: task: Dirichlet process mixture of Gaussian models (In progress)
  • Vlad Niculae: task: matrix factorization (In progress)
  • Marcel Caraciolo: task: help in docs and bug fixes (beginner in the project).

Paris coding Sprint, 8-9 Sept. 2010


INRIA research center in Saclay-Ile de France, and on IRC in channel #scikit-learn. Room to be determined.

Some ideas:

  • extend the tutorial with feature selection, cross-validation, etc.
  • design a Sphinx template for the main web page. [here] is a tentative design, but it was not translated into a Sphinx template.
  • Group lasso with coordinate descent in GLM module
  • Covariance estimators (Ledoit-Wolf) -> Regularized LDA
  • Add transform in LDA
  • PCA with fit + transform
  • preprocessing routines (center, standardize) with fit transform
  • K-means with Pybrain heuristic
  • Make Pipeline object work for real
  • FastICA
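
Several of the ideas above (PCA, preprocessing routines, the Pipeline object) rely on the same fit + transform convention: `fit` learns parameters from training data, and `transform` applies them to any data. A minimal sketch of that convention for the "center, standardize" item, assuming only NumPy. The class name `Standardizer` and its attributes are hypothetical, chosen for illustration, and are not the scikit's actual API:

```python
import numpy as np


class Standardizer:
    """Hypothetical preprocessing step: center and scale each column."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Learn per-feature statistics from the training data only.
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)
        # Avoid dividing by zero for constant features.
        self.std_ = np.where(std == 0.0, 1.0, std)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Reuse the statistics learned in fit; nothing is re-estimated here.
        return (X - self.mean_) / self.std_


X_train = np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
scaler = Standardizer().fit(X_train)
X_scaled = scaler.transform(X_train)
```

Returning `self` from `fit` allows chaining (`Standardizer().fit(X).transform(X)`), which is one way a pipeline could call each step in turn.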

Anything you can think of, such as:

  • Spectral Clustering + manifold learning (MDS/PCA, Isomap, Diffusion maps, tSNE)
  • Canonical Correlation Analysis
  • Kernel PCA
  • Gaussian Process regression

0.4 Coding Sprint, 16 & 17 June 2010


Channel #scikit-learn on IRC. If you do not have an IRC client or are behind a firewall, check out

Some ideas:

  • adapt the plotting features from the em module into gmm module.
  • incorporate more datasets: the diabetes dataset from the lars R package, featured datasets from , etc.
  • anything from the issue tracker.
  • extend the tutorial with feature selection, cross-validation, etc.
  • profile and improve the performance of the gmm module.
  • submit some new classifier
  • refactor the ann module (artificial neural networks) to conform to the API in the rest of the modules, or submit a new ann module.
  • make it compatible with Python 3 (shouldn't be hard now that there's a NumPy release for Python 3)
  • design a Sphinx template for the main web page. [here] is a tentative design, but it was not translated into a Sphinx template.
  • anything you can think of.

Documentation Week, 14-18 March 2010


Channel #learn on IRC. If you do not have an IRC client or are behind a firewall, check out

Possible Tasks:

  • Document our design choices (methods in each class, convention for estimated parameters, etc.). Most of this is in ApiDiscussion.
  • Documentation for neural networks (nonexistent)
  • Examples. We currently only have a few of them. Expand and integrate them into the web page.
  • Write a Tutorial.
  • Write a FAQ.
  • Documentation and Examples for Support Vector Machines. What's on the web is totally outdated. Integrate the documentation from gumpy, see ticket:27 (assigned: Fabian Pedregosa)
  • Review documentation.
  • Customize the Sphinx-generated HTML.
  • Create some cool images/logos for the web page.
  • Create some benchmark plots.

Code sprint in Paris, 3 March 2010

Terminated, see


  • Alexandre Gramfort
  • Olivier Grisel
  • Vincent Michel
  • Fabian Pedregosa
  • Bertrand Thirion
  • Gaël Varoquaux


Implement a few targeted functionalities for penalized regressions.

Target functionalities

  1. GLMnet
  2. Bayesian Regression (Ridge, ARD)
  3. Univariate feature selection function

Edouard: Most of the things we need are already in datamind; the main issue is to cut the dependence on FFF (nipy).
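
As an illustration of the "univariate feature selection function" target, here is a rough sketch that scores each feature independently of the others and keeps the best ones. The function name `select_k_best` and the choice of absolute Pearson correlation as the score are assumptions made for this example; the notes above do not say which score was planned:

```python
import numpy as np


def select_k_best(X, y, k):
    """Hypothetical helper: indices of the k features most correlated with y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Score each feature on its own (univariate): absolute Pearson correlation.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom = np.where(denom == 0.0, 1.0, denom)  # constant columns score 0
    scores = np.abs(Xc.T @ yc) / denom
    # Indices of the k best-scoring features, best first.
    return np.argsort(scores)[::-1][:k], scores


rng = np.random.RandomState(0)
X = rng.randn(100, 5)
# Only feature 2 carries signal in this toy target.
y = 3.0 * X[:, 2] + 0.1 * rng.randn(100)
selected, scores = select_k_best(X, y, k=1)
```

Because each feature is scored in isolation, this kind of selection is cheap but cannot detect features that are only informative in combination.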

Extras, if time permits:

  1. LARS

Proposed workflow

Pair programming:

  1. GLMNet (AG, OG)
  2. Bayesian regression (FP, VM)
  3. Feature selection (BT, GV)
  4. LARS: Whoever is finished first.

Place in the repository

  1. I think GLMNet goes well in scikits.learn.glm.

Edouard: The term GLM is confusing: in GLMNet the G means "generalized", but in neuroimaging people read it as "general", which is in fact a linear model.

  1. Bayesian regression: scikits.learn.bayes. It's short and explicit.

Edouard: Again the term Bayes might not lead to a clear organization of algorithms.

  1. Feature selection: featsel? selection? I'm not sure about this one.

AG : maybe univ?

Edouard: Maybe it is too early to decide the structure of the repository during your coding sprint. I think this organization should follow the discussions we had with Fabian, Gael and Bertrand. Below I tried to synthesize those discussions; however, it's just a proposal and many things are missing:

If there's code that we want to share and it does not fit into any of these schemes, it's ok to put it into sandbox/ (it does not yet exist)
