
get_input_dim is unable to calculate a model's input dimension if the model config's data_input.predictor_columns is empty #316

Open
smithcommajoseph opened this issue Feb 21, 2024 · 4 comments

@smithcommajoseph
Collaborator

The re-emergence of this bug has kicked off a new round of work; here is the current state of things after a productive debugging session with @haneslinger. This ticket is more of a conversation starter, and we'll implement whatever we agree upon.

Background
get_input_dim uses the given config's data_input.predictor_columns as the starting point for its calculations. Therefore, loading a pre-trained model whose config contains empty predictor_columns (more specifically, an empty array) leads to a mismatch between the shape the underlying torch model expects and the shape it receives.
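A minimal sketch of the failure mode. The `get_input_dim` below is a stand-in for Wattile's real function, not its actual implementation:

```python
# Illustrative sketch only; Wattile's real get_input_dim is more involved,
# but it starts from the same config field.
def get_input_dim(configs):
    # The input dimension is derived from the configured predictor columns.
    return len(configs["data_input"]["predictor_columns"])

# Config loaded from a pre-trained model whose predictor_columns is empty:
configs = {"data_input": {"predictor_columns": []}}

# Reports 0 input features, while the underlying torch model was trained on
# some positive number of predictors, so the shapes no longer match.
print(get_input_dim(configs))  # 0
```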

The recommended short-term fix
We should ensure that the configurations of pre-trained models contain predictor columns.

The long-term fix
The team should discuss, and we'll settle on an approach that works for everyone.

@stephen-frank
Member

stephen-frank commented Feb 21, 2024

Ok, that makes sense. A couple of questions...

  1. In Wattile 0.2.0 (i.e. for a model that is trained today), does configs["data_input"]["predictor_columns"] get automatically populated/updated when the model is first trained?
  2. Wattile exports predictors_target_config.json here, after training. We discussed having a trained model that has an existing predictors_target_config.json lock the predictor columns for all future predictions to the set used for the initial training. Would that require updating configs["data_input"]["predictor_columns"]? If so, when (and where) should that happen?

In the meantime, it sounds like the fix would be to go through and copy predictor column names from predictors_target_config.json into configs["data_input"]["predictor_columns"], either manually or by having SkySpark do that automatically. Do I have that right?
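That copy step could look roughly like the following. The helper name is hypothetical, and treating predictors_target_config.json as a dict of column specifications is an assumption drawn from this thread, not confirmed Wattile behavior:

```python
import json

# Hypothetical helper for the interim fix: copy the predictor column names
# recorded at training time into the runtime config. The JSON layout of
# predictors_target_config.json (a mapping of column specifications) is
# assumed, not confirmed.
def restore_predictor_columns(configs, spec_path="predictors_target_config.json"):
    with open(spec_path) as f:
        column_specification = json.load(f)
    configs["data_input"]["predictor_columns"] = list(column_specification.values())
    return configs
```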

@stephen-frank
Member

So I added the following code block to the SkySpark function that handles prediction and it works!

  // Update configs with stored predictor columns
  session
    .pyExec("configs['data_input']['predictor_columns'] = list(column_specification.values())")


I guess that answers the short-term fix. I'll get this fix into the nrelWattileExt codebase tomorrow sometime. (Right now it is running in a local version of the function on the SkySpark server. By design, the local copy overrides what is included in nrelWattileExt.)

I would still like to discuss the long-term fix of how to "lock in" the predictor columns after initial training. We can do it via this thread or at our next meeting.

@smithcommajoseph
Collaborator Author

Answering your two questions...

  1. @haneslinger, we discussed this and my impression was that configs['data_input']['predictor_columns'] were populated during training. I can find where we write the config to .json, but I can't locate where we add the predictor_columns to that config we write. I'm probably being dense here, but I don't have a clear answer on this question yet.
  2. Looking at it, yeah, it seems that we could rework some areas and add the column data to configs['data_input']['predictor_columns']. I wonder, would it be worth just using predictors_target_config.json as the source of truth for the predictor columns rather than duplicating these data in another file?
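For comparison, a sketch of that "single source of truth" idea, where the input dimension is derived directly from predictors_target_config.json instead of being duplicated into configs. The function name and the file's layout (a mapping keyed by column) are assumptions:

```python
import json

# Hypothetical alternative: never copy the columns into configs at all;
# read the training-time specification whenever the input dimension is
# needed, so there is exactly one place the column set can drift from.
def get_input_dim_from_spec(spec_path="predictors_target_config.json"):
    with open(spec_path) as f:
        column_specification = json.load(f)
    return len(column_specification)
```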

@stephen-frank
Member

To frame this differently, I think in the longer term we talked about...

  1. configs.json is only used for initial training, after which all the configuration parameters of the model are stored embedded within the model as metadata.
  2. For all activities other than initial training (i.e. re-training with additional data, validation, prediction), the embedded configuration data should be used. This would include the predictor columns.

I don't have strong feelings about exactly how this happens, except that it should not require end users to modify the configuration settings themselves to make everything line up.
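One possible shape for that long-term behavior, sketched with hypothetical names (the thread says only that the config should be embedded with the model as metadata; these file and function names are illustrative, not Wattile's actual API):

```python
import json
from pathlib import Path

# Hypothetical sketch: persist the full training-time config next to the
# model artifacts at initial training...
def save_model_config(model_dir, configs):
    Path(model_dir).mkdir(parents=True, exist_ok=True)
    (Path(model_dir) / "embedded_config.json").write_text(json.dumps(configs))

# ...and have every post-training activity (re-training, validation,
# prediction) read the embedded copy back, predictor_columns included, so
# users never have to hand-edit configs to make the shapes line up.
def load_model_config(model_dir):
    return json.loads((Path(model_dir) / "embedded_config.json").read_text())
```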
