This repository has been archived by the owner on Dec 6, 2023. It is now read-only.

Support for online learning? #186

Open
MattWenham opened this issue Aug 17, 2018 · 3 comments

Comments

@MattWenham

How feasible would it be to implement online learning, e.g. a partial_fit() method, to allow an existing model to be modified with new data? This would also allow for out-of-core learning and streaming applications.

@jcrudy
Collaborator

jcrudy commented Aug 17, 2018

@DoctorRad Unfortunately, the nature of MARS makes online learning basically impossible. The reason for this is that the forward pass is a greedy step-wise search for new terms. If you add new data, you have no way of knowing that the earliest terms in your model would be unaffected.
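To illustrate the point about the greedy forward pass, here is a minimal pure-Python sketch. It is not py-earth's implementation: it fits only a single hinge term `max(0, x - knot)` by least squares, and the helper names (`hinge`, `fit_single_hinge`, `best_first_knot`) and the toy data are assumptions made up for this example. It shows that the knot chosen for the first term on an initial batch is no longer the best choice once a second batch arrives.

```python
def hinge(x, knot):
    """MARS-style hinge basis function max(0, x - knot)."""
    return max(0.0, x - knot)


def fit_single_hinge(xs, ys, knot):
    """Least-squares fit of y ~ a + b * hinge(x, knot); returns (a, b, sse)."""
    hs = [hinge(x, knot) for x in xs]
    n = len(xs)
    mean_h = sum(hs) / n
    mean_y = sum(ys) / n
    var_h = sum((h - mean_h) ** 2 for h in hs)
    if var_h == 0.0:
        b = 0.0  # hinge is constant on this data, so only the intercept is fit
    else:
        b = sum((h - mean_h) * (y - mean_y) for h, y in zip(hs, ys)) / var_h
    a = mean_y - b * mean_h
    sse = sum((y - (a + b * h)) ** 2 for h, y in zip(hs, ys))
    return a, b, sse


def best_first_knot(xs, ys):
    """One greedy step: try each observed x as a knot, keep the lowest SSE."""
    candidates = sorted(set(xs))
    return min(candidates, key=lambda k: fit_single_hinge(xs, ys, k)[2])


# Initial batch (toy data): a single kink at x = 2.
old_x = [0, 1, 2, 3, 4, 5]
old_y = [max(0, x - 2) for x in old_x]

# Later batch (toy data): a much steeper kink appears at x = 7.
new_x = [6, 7, 8, 9, 10]
new_y = [(x - 2) + 10 * max(0, x - 7) for x in new_x]

knot_before = best_first_knot(old_x, old_y)
knot_after = best_first_knot(old_x + new_x, old_y + new_y)

print("first knot on initial batch:", knot_before)
print("first knot on all data:     ", knot_after)
```

Because every later term in a real forward pass is chosen conditional on the terms already in the model, a change in the first term like this one can invalidate everything that follows it.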

However, it would be possible, in theory, to allow for model fitting to be resumed, and new terms added with new data, after the initial model fit. That might be worth doing in some cases, although eventually you would probably want to fit a new model on your entire data set.

Question: what problem are you trying to solve with online learning? Perhaps I can suggest a workaround, although you might also be better off just using a method that allows for online learning.
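One simple workaround, in line with the remark above about eventually refitting on the entire data set, is to buffer each incoming batch and refit from scratch. The sketch below is a hypothetical wrapper, not part of py-earth: `RefitOnNewData` and `MeanModel` are names invented for this example, and the `partial_fit()` it exposes is not true online learning, since memory and fit time grow with every batch. It assumes only that the wrapped model has scikit-learn style `fit(X, y)` and `predict(X)` methods.

```python
class RefitOnNewData:
    """Hypothetical wrapper: keeps all data seen so far and refits on each batch."""

    def __init__(self, make_model):
        self.make_model = make_model  # zero-argument factory for a fresh model
        self.X = []
        self.y = []
        self.model = None

    def partial_fit(self, X_batch, y_batch):
        # Append the new batch, then fit a brand-new model on everything.
        self.X.extend(X_batch)
        self.y.extend(y_batch)
        self.model = self.make_model()
        self.model.fit(self.X, self.y)
        return self

    def predict(self, X):
        return self.model.predict(X)


class MeanModel:
    """Stand-in estimator for the demo: always predicts the mean of y."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


# With py-earth installed, the factory could instead be `lambda: Earth()`
# after `from pyearth import Earth` (assumption: not exercised here).
learner = RefitOnNewData(MeanModel)
learner.partial_fit([[0.0], [1.0]], [1.0, 3.0])
learner.partial_fit([[2.0]], [5.0])
prediction = learner.predict([[10.0]])
print(prediction)
```

This keeps the calling code shaped like a streaming loop, so a genuinely online estimator can be swapped in later without changing the caller.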

@jcrudy
Collaborator

jcrudy commented Aug 17, 2018

@DoctorRad Regarding out-of-core learning, it is theoretically possible to build a MARS implementation that operates across a cluster, but it would be a substantial undertaking and there would still need to be some central coordination node doing a good amount of work. Shared-memory parallelism is much more feasible, but it is not implemented in py-earth (except perhaps for some of the BLAS operations, depending on your environment).

@MattWenham
Author

@jcrudy Thanks for your feedback. I suspected that it was largely not possible as I couldn't conceive of a way that it could be done, but thought you might have better ideas.

I am currently using py-earth as a tool to help me learn Python data science, so it's only really toy problems for now. However, the regression problem I am considering at the moment has a mixture of continuous and ordinal variables with a considerable amount of missing data, which is what attracted me to MARS.
