fabianp edited this page Dec 17, 2011 · 161 revisions

Sprint planning

Granada, 19th-21st Dec 2011

We are organizing a coding sprint after the NIPS 2011 conference.

People, tasks and funding

For this sprint, we are trying to gather funding for contributors to fly in. Please list your name and who is funding your trip.

Places

Contributors might find the coding guidelines useful.

Tasks

Top priorities are merging pull requests, fixing easyfix issues, and improving documentation consistency.

In addition to the tasks listed below, it is useful to consider any issue in this list: https://github.com/scikit-learn/scikit-learn/issues

  • Merge in Randomized linear models (branch 'randomized_lasso' on GaelVaroquaux's github; Gael Varoquaux and Alex Gramfort are working on this)

Easy

  • Improve test coverage: Run 'make test-coverage' after installing the coverage module, find low-hanging fruit, and add tests. Try to test the logic, and not simply aim to increase the number of lines covered.
  • Py3k support: First test joblib on Python 3, then scikit-learn. Both generate Python 3-compatible sources, but these have not been tested.

Branch merging

Improving and merging existing pull requests is the number one priority: https://github.com/scikit-learn/scikit-learn/pulls

There is a lot of very good code lying there; it often just needs a small amount of polishing.

Not requiring expertise in machine learning

  • Rationalize images in documentation: we have 56 MB of generated images in the documentation (doc/_build/html/_images). First, we should save JPEGs instead of PNGs: this shrinks the directory to 45 MB (not a huge gain, granted). Second, the same file is often saved several times. We need to understand what is going on and fix that.
  • Affinity propagation using sparse matrices: the affinity propagation algorithm (scikits.learn.cluster.affinity_propagation_) should be able to work on sparse input affinity matrices without converting them to dense arrays. A good implementation should make this efficient on very large data.
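A minimal sketch of the kind of sparse input such an implementation could accept. The neighbour count and variable names here are illustrative, not part of the existing API: the idea is to keep only each sample's strongest affinities, so memory grows with the number of stored similarities rather than with n².

```python
import numpy as np
from scipy import sparse

rng = np.random.RandomState(0)
X = rng.randn(20, 3)

# The usual affinity for this algorithm: negative squared Euclidean distances
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
S_dense = -d2

# Keep only the 5 largest affinities per row (a k-nearest-neighbour graph)
keep = np.argsort(S_dense, axis=1)[:, -5:]
rows = np.repeat(np.arange(20), 5)
cols = keep.ravel()
S = sparse.csr_matrix((S_dense[rows, cols], (rows, cols)), shape=(20, 20))

# A sparse-aware affinity propagation would then update responsibilities and
# availabilities only on the stored entries of S, never densifying it.
```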

Machine learning tasks

  • Improve the documentation: You understand some aspects of machine learning. You can help make the scikit rock without writing a line of code: http://scikit-learn.org/dev/developers/index.html#documentation. See also Documentation-related issues in the issue tracker.
  • Text feature extraction (refactoring / API simplification) + hashing vectorizer: Olivier Grisel
  • Nearest Neighbors Classification/Regression: allowing more flexible Bayesian priors (currently only a flat prior is used); implementing alternative distance metrics: Jake Vanderplas

K-means improvements

Participants: @mblondel

  • Code clean up
  • Speed improvements: don't reallocate clusters, track clusters that didn't change, use the triangle inequality
  • L1 distance: use the L1 distance in the E-step and the median (instead of the mean) in the M-step
  • Fuzzy K-means: k-means with fuzzy cluster membership (not the same as GMM)
  • Move argmin and average operators to pairwise module (for L1/L2)
  • Support chunk size argument in argmin operator
  • Merge @ogrisel's branch
  • Add a score function (the opposite of the k-means objective)
  • Sparse matrices
  • fit_transform
  • more output options in transform (hard, soft, dense)
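The L1-distance and score-function items above can be sketched together as a small K-medians loop. This is an illustrative NumPy sketch, not scikit-learn's implementation; the function name and signature are made up for the example.

```python
import numpy as np

def k_medians(X, k, n_iter=10, seed=0):
    """Illustrative K-medians: L1 assignment step, median update step."""
    rng = np.random.RandomState(seed)
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # E-step: assign each point to the closest center in L1 distance
        d = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = d.argmin(axis=1)
        # M-step: the per-cluster median minimizes the within-cluster L1 cost
        for j in range(k):
            if np.any(labels == j):
                centers[j] = np.median(X[labels == j], axis=0)
    # Final assignment and score: the opposite (negative) of the objective,
    # so that higher is better, matching the scikit-learn score convention
    d = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
    labels = d.argmin(axis=1)
    score = -d[np.arange(len(X)), labels].sum()
    return centers, labels, score
```

Replacing the median with a weighted mean over soft memberships would give the fuzzy K-means variant mentioned above.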

Random projections

Participants: @mblondel

  • Merge random SVD PR
  • Merge sparse RP PR
  • Cython utils for fast and memory-efficient projection
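For the sparse RP item, a hedged NumPy/SciPy sketch of an Achlioptas-style sparse random projection matrix (a Cython version would avoid materializing the dense sign array). The function name and scaling choice are illustrative assumptions, not the PR's actual code.

```python
import numpy as np
from scipy import sparse

def sparse_random_matrix(n_features, n_components, seed=0):
    """Entries are -1, 0, +1 with probabilities 1/6, 2/3, 1/6, scaled so
    that pairwise distances are preserved in expectation."""
    rng = np.random.RandomState(seed)
    entries = rng.choice([-1.0, 0.0, 1.0],
                         size=(n_features, n_components),
                         p=[1 / 6., 2 / 3., 1 / 6.])
    return sparse.csr_matrix(entries * np.sqrt(3.0 / n_components))

R = sparse_random_matrix(100, 10, seed=0)
X = np.random.RandomState(1).randn(5, 100)
# Projecting sparse-times-sparse keeps everything memory-efficient
X_proj = sparse.csr_matrix(X).dot(R).toarray()
```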

Kernel Approximation

Participants: @amueller

  • Move to random projection module

Dictionary Learning

Participants: @vene

  • Fix (and document) alpha scaling
  • Merge SparseCoder pull request
  • Merge KMeansCoder pull request
  • Begin work on supervised image classification

Semisupervised learning

Participants: @larsmans

  • EM algorithm for Naive Bayes
  • Fix utility code to handle partially labeled data sets
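The EM-for-Naive-Bayes item can be sketched as follows (in the style of Nigam et al.'s semisupervised text classification): fit on the labeled subset, then alternate between soft-labeling the unlabeled rows and refitting with those posteriors as fractional counts. All names here are illustrative; the convention that -1 marks an unlabeled sample is an assumption for the example.

```python
import numpy as np

def em_nb(X, y, n_classes, n_iter=5, alpha=1.0):
    """Illustrative EM for multinomial Naive Bayes on partially labeled data.

    X: (n_samples, n_features) count matrix; y: labels, with -1 = unlabeled.
    """
    labeled = y >= 0
    resp = np.zeros((len(X), n_classes))      # class responsibilities
    resp[labeled, y[labeled]] = 1.0           # labeled rows are hard-assigned
    resp[~labeled] = 1.0 / n_classes          # unlabeled rows start uniform
    for _ in range(n_iter):
        # M-step: priors and smoothed feature probabilities from soft counts
        prior = resp.sum(axis=0) / resp.sum()
        counts = resp.T @ X + alpha
        theta = counts / counts.sum(axis=1, keepdims=True)
        # E-step: recompute responsibilities for the unlabeled rows only
        log_p = X @ np.log(theta).T + np.log(prior)
        p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        resp[~labeled] = p[~labeled]
    return prior, theta, resp
```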

More ambitious/long term tasks

  • Patch liblinear to have warm restart + LogisticRegressionCV.
    Comment (by Fabian): I tried this, take a look here: liblinear fork
  • Decision Tree (support boosted trees, loss matrix, multivariate regression)
  • Ensemble classifiers
    Comment (by Gilles): I plan to review @pprett's PR on Gradient Boosted Trees. I also want to implement parallel tree construction and prediction in the current forest-of-trees implementation.
  • Locality Sensitive Hashing, talk to Brian Holt
  • Fused Lasso
  • Group Lasso, talk to Alex Gramfort (by email)
  • Manifold learning: MDS, t-SNE (talk to DWF)
  • Bayesian classification (e.g. RVM)
  • Sparse matrix support in dictionary learning module

Accommodation

Some of us are planning to stay at a guest house in Granada to reduce hotel costs. If you are interested, add your name and arrival and departure dates below:

Name                 From     To
Olivier Grisel       Dec. 11  Dec. 21
Gael Varoquaux       Dec. 11  Dec. 21
David Warde-Farley   Dec. 18  Dec. 21
Alex Gramfort        Dec. 11  Dec. 21
Jake Vanderplas      Dec. 15  Dec. 22
Bertrand Thirion     Dec. 12  Dec. 20
Gilles Louppe        Dec. 18  Dec. 21
Mathieu Blondel      Dec. 18  Dec. 22
Lars Buitinck        Dec. 18  Dec. 22
Vlad Niculae         Dec. 18  Dec. 22
Andreas Mueller      Dec. 11  Dec. 22
Nicolás Della Penna  Dec. 18  Dec. 22
(add your name here)

Past sprints