Differences between latent and edge energy MLP dimensions and layer dimensions #28
-
I don't think I fully get the differences between the latent and edge energy MLP dimensions and the layer dimensions. Here is an example I use:
Sorry, this is like four questions in one; I thought they kind of fit together. This all comes, of course, from my being a materials-science user rather than someone who actually puts together PyTorch models.
-
Hi @apoletayev, thanks for your interest in our code!

Almost---`env_embed_multiplicity` is the size of the features in the tensor track, while the various MLP options set the feature size in the scalar track. In an upcoming update this will be named in a much clearer way...

No constraints. `num_layers` is entirely independent of the width of the scalar features in those layers. In general, the `*_mlp_latent_dimensions` options specify the hidden-layer widths; the code knows it needs the input width of the two-body latent to have the dimension of the initial edge embedding, which could for example be 3 + 3 + 8 = 14 for 3 atom types and 8 radial basis functions. Then the actual MLP that will be built would have dimensions [14, 64].

Yes, each entry in a `*_mlp_latent_dimensions` list is the width of one hidden layer.

Yes; the input width of the final edge energy MLP is automatically set to the feature width output by the last Allegro layer, which is the output of the final latent MLP, which is set by the last entry of `latent_mlp_latent_dimensions`.

The MLPs are entirely scalar and correspond to the block labeled "latent" in the paper's schematic. Each Allegro layer contains one tensor product update and arbitrarily many layers of scalar neural networks in the latent, as determined by the hyperparameters discussed above.
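The width inference described above can be sketched in plain Python. This is illustrative only, not the actual Allegro implementation; `chain_mlp_dims` is a hypothetical helper, and the `[64]` / `[32]` hidden widths are example values, not defaults.

```python
def chain_mlp_dims(input_width, hidden_dims):
    """Prepend the automatically inferred input width to the
    user-specified hidden-layer widths."""
    return [input_width] + list(hidden_dims)

num_types = 3   # atom types
num_basis = 8   # radial basis functions

# Initial edge embedding: two one-hot species encodings plus the
# radial basis, so the two-body latent input width is inferred as:
two_body_input = num_types + num_types + num_basis       # 3 + 3 + 8 = 14
two_body_mlp = chain_mlp_dims(two_body_input, [64])      # [14, 64]

# The final edge energy MLP's input width is the output width of the
# last latent MLP, i.e. the last entry of its dimension list:
latent_hidden = [64, 64]
edge_eng_mlp = chain_mlp_dims(latent_hidden[-1], [32])   # [64, 32]
```

Here the user only ever writes the hidden widths (`[64]`, `[32]`); every input width is chained from the previous block automatically.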
-
Hi @apoletayev, to add to this, make sure to set a 3-hidden-layer MLP for …