Past sprints
We are organizing a coding sprint in the days before EuroScipy 2011.
- Olivier Grisel: review code (esp. related to Vlad's GSoC), doc improvements, maybe work on finalizing Power Iteration Clustering or the text feature extraction
- Gael Varoquaux: merging pull requests
- Vlad Niculae: merge remaining DictionaryLearning code, doc improvements, maybe work on SGD matrix fact. w/ someone?
- Satra Ghosh: work on the ensemble/tree/random forest (only on the 24th)
- Brian Holt: tree and random forest code, improve test coverage, doc improvements
- Bertrand Thirion: reviewing GMM and related stuff or manifold learning (probably 24th only).
- Ralf Gommers: work on joblib (only 24th, from ~12.00)
- Vincent Michel: work on bi-clustering, doc improvements, code review.
- Mathieu Blondel: multi-class reductions (only 24th, GMT+9)
- Fabian Pedregosa: strong-rules for coordinate descent, grouped lasso or related stuff, py3k support.
- Alexandre Gramfort: reviewing commits and sending negative comments to harass Fabian while he is away because he kind of likes that
- Jean Kossaifi
- Virgile Fritsch (only 24th): working on issues (pairwise distances, incompatibility with scipy 0.8, ...) and pull requests merging.
- In Paris: at ENS, in the physics department (24 rue Lhomond), probably in some classrooms on the 3rd floor.
Location: At the scipy conference (Austin)
- Gael Varoquaux: review code, merge
- Marcel Caraciolo: review code, easyfix issues.
- David Warde-Farley: review
- In Paris: at Logilab's (104 boulevard Blanqui, Paris) - Metro 6 - Glacière
- In Boston at MIT (36-537: 5th floor of building 36)
- On IRC (#scikit-learn on irc.freenode.net)
Please add your skills/interests or planned tasks, to facilitate the organization of the sprint and the pairing of people on tasks. To share knowledge as much as possible, it would be ideal to have two people with different skills pair-program on each task.
At Logilab, Paris (from 9H to 19H):
- Gaël Varoquaux: task: code review, pair programming on specific task where needed.
- Julien Miotte
- Feth Arezki: could help with coding (w/ the logger?), LaTeX. Interested in learning about scikit.
- Nelle Varoquaux: task: minibatch k-means
- Fabian Pedregosa
- Vincent Michel: task: code review, pair programming. features: ward's clustering.
- Luis Belmar-Letelier
- Thouis Jones: task: BallTree cython wrapper, documentation, whatever.
At MIT, Boston:
- Alexandre Gramfort: task: code review and pair programming
- Demian Wassermann: task: Gaussian Processes with sparse data
- Satra Ghosh: task: Ensemble Learning, random forests
- Nico Pinto
- Pietro Berkes
On IRC (from around 9am Brasília time, GMT-3):
- Alexandre Passos: task: dirichlet process mixture of gaussian models (In progress)
- Vlad Niculae: task: matrix factorization (In progress)
- Marcel Caraciolo: task: help in docs and bug fixes (beginner in the project).
Place:
INRIA research center in Saclay-Ile de France, also in channel #scikit-learn, on irc.freenode.org. Room to be determined.
Some ideas:
- extend the tutorial with feature selection, cross-validation, etc
- design a sphinx template for the main web page. A tentative design is available here: http://www.flickr.com/photos/fseoane/4573612893/ but it has not been translated into a sphinx template.
- Group lasso with coordinate descent in GLM module
- Covariance estimators (Ledoit-Wolf) -> Regularized LDA
- Add transform in LDA
- PCA with fit + transform
- preprocessing routines (center, standardize) with fit transform
- K-means with Pybrain heuristic
- Make Pipeline object work for real
- FastICA
Anything you can think of, such as:
- Spectral Clustering + manifold learning (MDS/PCA, Isomap, Diffusion maps, tSNE)
- Canonical Correlation Analysis
- Kernel PCA
- Gaussian Process regression
Place:
channel #scikit-learn, on irc.freenode.org. If you do not have an IRC client or are behind a firewall, check out http://webchat.freenode.net/
Some ideas:
- adapt the plotting features from the em module into gmm module.
- incorporate more datasets: the diabetes dataset from the lars R package, featured datasets from http://archive.ics.uci.edu/ml/datasets.html, etc.
- anything from the issue tracker.
- extend the tutorial with feature selection, cross-validation, etc
- profile and improve the performance of the gmm module.
- submit some new classifier
- refactor the ann module (artificial neural networks) to conform to the API in the rest of the modules, or submit a new ann module.
- make it compatible with python3 (shouldn't be hard now that there's a numpy python3 release)
- design a sphinx template for the main web page. A tentative design is available here: http://www.flickr.com/photos/fseoane/4573612893/ but it has not been translated into a sphinx template.
- anything you can think of.
Documentation Week, 14-18 March 2010
Place:
channel #learn, on irc.freenode.org. If you do not have an IRC client or are behind a firewall, check out http://webchat.freenode.net/
Possible Tasks:
- Document our design choices (methods in each class, convention for estimated parameters, etc.). Most of this is in ApiDiscussion.
- Documentation for neural networks (nonexistent)
- Examples. We currently only have a few of them. Expand and integrate them into the web page.
- Write a Tutorial.
- Write a FAQ.
- Documentation and Examples for Support Vector Machines. What's on the web is totally outdated. Integrate the documentation from gumpy, see ticket:27 (assigned: Fabian Pedregosa)
- Review documentation.
- Customize the sphinx generated html.
- Create some cool images/logos for the web page.
- Create some benchmark plots.
Terminated, see http://fseoane.net/blog/2010/scikitslearn-coding-spring-in-paris/
- Alexandre Gramfort
- Olivier Grisel
- Vincent Michel
- Fabian Pedregosa
- Bertrand Thirion
- Gaël Varoquaux
Goals
Implement a few targeted functionalities for penalized regressions.
Target functionalities
- GLMnet
- Bayesian Regression (Ridge, ARD)
- Univariate feature selection function
Edouard: Most of the things we need are already in datamind; the main issue is to cut the dependence on FFF (nipy).
Extras, if time permits:
- LARS
Proposed workflow
Pair programming:
- GLMNet (AG, OG)
- Bayesian regression (FP, VM)
- Feature selection (BT, GV)
- LARS: Whoever is finished first.
Place in the repository
- I think GLMNet goes well in scikits.learn.glm.
Edouard: The GLM term is confusing. In GLMNet the G means "generalized", whereas people in neuroimaging read it as "general", which is in fact a linear model.
- Bayesian regression: scikits.learn.bayes. It's short and explicit.
Edouard: Again the term Bayes might not lead to a clear organization of algorithms.
- Feature selection: featsel? selection? I'm not sure about this one.
AG: maybe univ?
Edouard: Maybe it is too early to decide the structure of the repository during your coding sprint. I think this organization should follow the discussions we had with Fabian, Gael and Bertrand. Below I have tried to synthesize those discussions; however, it's just a proposal and many things are missing:
If there's code that we want to share and it does not fit into any of these schemes, it's ok to put it into sandbox/ (it does not yet exist)