As the example below shows, a user presenting a table for training a model cannot present new data for prediction with a different ordering of the table columns:
using MLJ, MLJFlux, Random  # Random provides bitrand

N = 1000
X = (x1 = rand(Float32, N), x2 = randn(Float32, N), x3 = categorical(rand('a':'c', N)))
y = categorical(bitrand(N))
model = MLJFlux.NeuralNetworkBinaryClassifier(epochs = 10, builder = MLJFlux.MLP(; hidden = (5, 4)), batch_size = 100)
mach = machine(model, X, y)
fit!(mach)

# this errors
predict(mach, (x3 = X.x3, x1 = X.x1, x2 = X.x2))

# this is false!
all(predict(mach, (x2 = X.x2, x1 = X.x1, x3 = X.x3)) .≈ predict(mach, X))
Here is my response from the original post:
Mmm. I think this kind of implicit assumption (that the columns of tables are ordered, and that they are presented in a consistent order) is everywhere in MLJ, and probably elsewhere. [Transferring this issue to MLJ].
One could either try to allow tables to be presented in any column order, or throw a warning when the original order is violated. Personally, I think the latter would be sufficient. If MLJ had a generic data front end for dealing with tables, apart from Tables.matrix, which discards the feature names, this could be an easy fix either way. But a lot of interfaces just don't save the feature names.
I'd support some kind of resolution, but it's a big ask to adapt across the ecosystem.
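The two options mentioned above (accept any column order, or warn when the training order is violated) can be sketched in plain Julia. This is only an illustration under stated assumptions: tables are taken to be NamedTuples of vectors, and `align_columns` is a hypothetical helper, not an existing MLJ or Tables.jl function.

```julia
# Hypothetical helper, not part of MLJ: compare the columns of `Xnew` with the
# column names recorded at training time, then either reorder or warn.
function align_columns(Xnew::NamedTuple, trained_names::Tuple{Vararg{Symbol}};
                       reorder::Bool = true)
    new_names = keys(Xnew)
    new_names == trained_names && return Xnew   # same order: nothing to do
    if Set(new_names) != Set(trained_names)
        throw(ArgumentError("expected columns $trained_names, got $new_names"))
    end
    if reorder
        # Option 1: silently restore the training order.
        return NamedTuple{trained_names}(Xnew)
    else
        # Option 2: leave the data untouched, but tell the user.
        @warn "Column order differs from training" expected = trained_names got = new_names
        return Xnew
    end
end

X = (x1 = [1.0, 2.0], x2 = [3.0, 4.0], x3 = ['a', 'b'])
trained_names = keys(X)                      # what a model would record at fit time

Xperm = (x3 = X.x3, x1 = X.x1, x2 = X.x2)    # same data, different column order
Xfixed = align_columns(Xperm, trained_names) # columns back in training order
```

Either branch requires the interface to store the feature names at fit time, which is exactly what many interfaces currently do not do.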
This is a problem that other users have also opened issues about (e.g. #1023, but I think there are more).
As a user (and as a contributor as well), the fact that the input into an MLJ machine is a Tables.jl-compatible table made me assume that machines would treat it as tabular data, i.e. use column names. It personally caught me off guard that they don't, and I doubt that I'm the only one.
What makes this more confusing is that some MLJ models do use column names, e.g. those in MLJGLMInterface.jl.
> I'd support some kind of resolution, but it's a big ask to adapt across the ecosystem.
I see the point - there are a lot of models out there, and requiring them to use column keys is not going to work.
Maybe there could be an extra model trait in MMI of whether or not a model uses column keys, so that an example like the one above can be part of the test suite for those models.
Otherwise there is always FeatureSelector in MLJModels, which is great.
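To make the trait suggestion concrete, here is a rough sketch of how such a trait and the accompanying test could look. Everything in it is an assumption for illustration: `uses_column_names`, the two toy model types, `toy_predict` and `is_column_order_invariant` are hypothetical names, and nothing here exists in MLJModelInterface today.

```julia
# Hypothetical trait: `true` if a model matches columns by name, not position.
uses_column_names(::Type) = false                       # conservative default
uses_column_names(model) = uses_column_names(typeof(model))

struct PositionalModel end   # toy model: combines columns by position
struct NamedModel end        # toy model: looks columns up by name
uses_column_names(::Type{NamedModel}) = true

toy_predict(::PositionalModel, X::NamedTuple) = values(X)[1] .+ 2 .* values(X)[2]
toy_predict(::NamedModel, X::NamedTuple) = X.x1 .+ 2 .* X.x2

# Generic check a test suite could run for models that declare the trait:
# predictions must not change when the columns are presented in reverse order.
function is_column_order_invariant(model, X::NamedTuple)
    colnames = keys(X)
    Xrev = NamedTuple{reverse(colnames)}(X)
    return toy_predict(model, Xrev) == toy_predict(model, X)
end

X = (x1 = [1.0, 2.0], x2 = [10.0, 20.0])
for model in (PositionalModel(), NamedModel())
    # Only models that claim to use column names are held to the invariance test.
    if uses_column_names(model)
        @assert is_column_order_invariant(model, X)
    end
end
```

Models that leave the trait at its default are not tested, so existing interfaces would not need to change.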
Over at MLJFlux, @tiemvanderdeure has pointed out the following issue that is actually MLJ generic.