GaelVaroquaux edited this page Sep 9, 2011 · 5 revisions

Past sprints

23, 24 August 2011

We organized a coding sprint on the days before EuroSciPy 2011.

People and tasks

  • Olivier Grisel: review code (esp. related to Vlad's GSoC), doc improvements, maybe work on finalizing Power Iteration Clustering or the text feature extraction
  • Gael Varoquaux: merging pull requests
  • Vlad Niculae: merge remaining DictionaryLearning code, doc improvements, maybe work on SGD matrix factorization with someone?
  • Satra Ghosh: work on the ensemble/tree/random forest (only on the 24th)
  • Brian Holt: tree and random forest code, improve test coverage, doc improvements
  • Bertrand Thirion: reviewing GMM and related stuff or manifold learning (probably 24th only).
  • Ralf Gommers: work on joblib (only 24th, from ~12.00)
  • Vincent Michel: work on bi-clustering, doc improvements, code review.
  • Mathieu Blondel: multi-class reductions (only 24th, GMT+9)
  • Fabian Pedregosa: strong rules for coordinate descent, group lasso or related work, Py3k support.
  • Alexandre Gramfort: reviewing commits and sending negative comments to harass Fabian while he is away, because he kind of likes that.
  • Jean Kossaifi
  • Virgile Fritsch (only 24th): working on issues (pairwise distances, incompatibility with scipy 0.8, ...) and pull requests merging.

Places

  • In Paris: at ENS, in the physics department (24 rue Lhomond), probably in some classrooms on the 3rd floor.

SciPy 2011 sprint: July 15-16

Location: at the SciPy conference (Austin)

People and tasks

  • Gael Varoquaux: review code, merge
  • Marcel Caraciolo: review code, easy-fix issues.
  • David Warde-Farley: review

1st April 2011

Places

People present

Please add your skills/interests or planned task to make organizing the sprint and pairing people on tasks easier. To share knowledge as much as possible, it would be ideal to pair two people with different skills on each task.

At Logilab, Paris (from 9:00 to 19:00):

  • Gaël Varoquaux: task: code review, pair programming on specific task where needed.
  • Julien Miotte
  • Feth Arezki: could help with coding (w/ the logger?), LaTeX. Interested in learning about scikit.
  • Nelle Varoquaux: task: minibatch k-means
  • Fabian Pedregosa
  • Vincent Michel: task: code review, pair programming. Features: Ward's clustering.
  • Luis Belmar-Letelier
  • Thouis Jones: task: BallTree cython wrapper, documentation, whatever.

At MIT, Boston:

  • Alexandre Gramfort: task: code review and pair programming
  • Demian Wassermann: task: Gaussian Processes with sparse data
  • Satra Ghosh: task: Ensemble Learning, random forests
  • Nico Pinto
  • Pietro Berkes

On IRC (from around 9am Brasília time, GMT-3):

  • Alexandre Passos: task: Dirichlet process mixture of Gaussian models (in progress)
  • Vlad Niculae: task: matrix factorization (In progress)
  • Marcel Caraciolo: task: help with docs and bug fixes (beginner in the project).

Paris coding Sprint, 8-9 Sept. 2010

Place:

INRIA research center in Saclay-Ile de France, also in channel #scikit-learn, on irc.freenode.org. Room to be determined.

Some ideas:

  • extend the tutorial with feature selection, cross-validation, etc.
  • design a Sphinx template for the main web page (a tentative design is at http://www.flickr.com/photos/fseoane/4573612893/, but it was never translated into a Sphinx template).
  • Group lasso with coordinate descent in GLM module
  • Covariance estimators (Ledoit-Wolf) -> Regularized LDA
  • Add transform in LDA
  • PCA with fit + transform
  • preprocessing routines (center, standardize) with fit transform
  • K-means with Pybrain heuristic
  • Make Pipeline object work for real
  • FastICA

Anything you can think of, such as:

  • Spectral Clustering + manifold learning (MDS/PCA, Isomap, Diffusion maps, tSNE)
  • Canonical Correlation Analysis
  • Kernel PCA
  • Gaussian Process regression

0.4 Coding Sprint, 16 & 17 June 2010

Place:

channel #scikit-learn, on irc.freenode.org. If you do not have an IRC client or behind a firewall, check out http://webchat.freenode.net/

Some ideas:

  • adapt the plotting features from the em module into the gmm module.
  • incorporate more datasets: the diabetes dataset from the lars R package, featured datasets from http://archive.ics.uci.edu/ml/datasets.html, etc.
  • anything from the issue tracker.
  • extend the tutorial with feature selection, cross-validation, etc.
  • profile and improve the performance of the gmm module.
  • submit new classifiers.
  • refactor the ann module (artificial neural networks) to conform to the API in the rest of the modules, or submit a new ann module.
  • make it compatible with Python 3 (shouldn't be hard now that there's a NumPy release for Python 3).
  • design a Sphinx template for the main web page (a tentative design is at http://www.flickr.com/photos/fseoane/4573612893/, but it was never translated into a Sphinx template).
  • anything you can think of.

Documentation Week, 14-18 March 2010

Place:

channel #learn, on irc.freenode.org. If you do not have an IRC client or behind a firewall, check out http://webchat.freenode.net/

Possible Tasks:

  • Document our design choices (methods in each class, convention for estimated parameters, etc.). Most of this is in ApiDiscussion.
  • Documentation for neural networks (nonexistent)
  • Examples. We currently only have a few of them. Expand and integrate them into the web page.
  • Write a Tutorial.
  • Write a FAQ.
  • Documentation and examples for Support Vector Machines. What's on the web page is totally outdated. Integrate the documentation from gumpy, see ticket:27 (assigned: Fabian Pedregosa).
  • Review documentation.
  • Customize the Sphinx-generated HTML.
  • Create some cool images/logos for the web page.
  • Create some benchmark plots.

Code sprint in Paris, 3 March 2010

Finished; see http://fseoane.net/blog/2010/scikitslearn-coding-spring-in-paris/

Participants

  • Alexandre Gramfort
  • Olivier Grisel
  • Vincent Michel
  • Fabian Pedregosa
  • Bertrand Thirion
  • Gaël Varoquaux

Goals

Implement a few targeted functionalities for penalized regressions.

Target functionalities

  1. GLMnet
  2. Bayesian Regression (Ridge, ARD)
  3. Univariate feature selection function

Edouard: Most of the things we need are already in datamind; the main issue is to cut the dependency on FFF (nipy).

Extras, if time permits:

  1. LARS

Proposed workflow

Pair programming:

  1. GLMNet (AG, OG)
  2. Bayesian regression (FP, VM)
  3. Feature selection (BT, GV)
  4. LARS: Whoever is finished first.

Place in the repository

  1. I think GLMNet goes well in scikits.learn.glm.

Edouard: The term GLM is confusing: in GLMNet the G means "generalized", but in neuroimaging people understand "general", which is in fact a linear model.

  2. Bayesian regression: scikits.learn.bayes. It's short and explicit.

Edouard: Again the term Bayes might not lead to a clear organization of algorithms.

  3. Feature selection: featsel? selection? I'm not sure about this one.

AG: maybe univ?

Edouard: Maybe it is too early to decide the structure of the repository during your coding sprint. I think this organization should follow the discussions we had with Fabian, Gael and Bertrand. Below I tried to synthesize those discussions; however, it's just a proposal and many things are missing:

If there's code that we want to share and it does not fit into any of these schemes, it's ok to put it into sandbox/ (it does not yet exist)