Add MPS GRUCell for efficiency #1508

HendricksJudy · 2024-10-21T17:06:56Z

Implement MPS-specific GRUCell and update GRU class for efficiency.

GRUCell Implementation:
- Add a new GRUCell class in python/mlx/nn/layers/recurrent.py with MPS-specific optimizations.
- Define the input shape and hidden state shape for the GRUCell.
- Implement the forward pass for the GRUCell with MPS-specific optimizations.
GRU Class Update:
- Update the GRU class in python/mlx/nn/layers/recurrent.py to use the new GRUCell for improved performance on MPS.
- Define the input shape and hidden state shape for the GRU class.
- Implement the forward pass for the GRU class using the GRUCell.
Documentation:
- Update docs/src/python/nn/layers.rst to include the new GRUCell class.
Tests:
- Add tests for the new GRUCell class in python/tests/test_nn.py to ensure correctness and performance improvements.
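
For readers following the description, here is a minimal sketch of what a GRU cell step computes and how a GRU layer can be built on top of it. It is plain Python rather than MLX, and it is not the code in this PR: the function names (`gru_cell`, `gru_layer`) and parameter names (`Wxr`, `bhn`, and so on) are hypothetical, and the gate equations assume the common PyTorch-style formulation.

```python
import math


def _matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def _sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gru_cell(x, h, p):
    """One GRU step for a single, unbatched input vector.

    x: input vector of length input_size
    h: previous hidden state of length hidden_size
    p: dict of parameters (hypothetical names, chosen for this sketch):
       Wxr, Wxz, Wxn: hidden_size x input_size matrices
       Whr, Whz, Whn: hidden_size x hidden_size matrices
       br, bz, bn, bhn: bias vectors of length hidden_size
    """
    # Reset gate r and update gate z.
    r = [_sigmoid(a + b + c)
         for a, b, c in zip(_matvec(p["Wxr"], x), _matvec(p["Whr"], h), p["br"])]
    z = [_sigmoid(a + b + c)
         for a, b, c in zip(_matvec(p["Wxz"], x), _matvec(p["Whz"], h), p["bz"])]
    # Candidate state: the reset gate scales the hidden-state contribution.
    hn = [a + b for a, b in zip(_matvec(p["Whn"], h), p["bhn"])]
    n = [math.tanh(a + b + ri * c)
         for a, b, ri, c in zip(_matvec(p["Wxn"], x), p["bn"], r, hn)]
    # Blend the candidate with the previous hidden state.
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def gru_layer(xs, h0, p):
    """Apply gru_cell over a sequence and return the hidden state at every step."""
    h, out = h0, []
    for x in xs:
        h = gru_cell(x, h, p)
        out.append(h)
    return out
```

The layer is just the cell applied in a loop, so factoring the step out changes how the code is organized but not the arithmetic performed per time step.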

Love MLX : )

Fixes ml-explore#1500

thegodone · 2024-10-22T04:31:17Z

This is a cool update; it partially fixes #1500. Is it possible to make a fully "GPU" version, optimized with MPS core functions, similar to the full CUDA version in TF/PyTorch?

awni · 2024-10-22T04:38:46Z

Maybe I'm missing something, but I'm not following at all why splitting the "cell" out from the GRU layer is faster, or where "MPS" is involved. Could you please explain and/or share some benchmarks?

I don't think we need a GRUCell, but if your implementation is faster it should work just as well to keep it in the layer itself.

thegodone · 2024-10-22T05:21:28Z

GRUCell is required by graph neural network models like AttentiveFP, which is one of the best for molecules. See pytorch/aten/src/ATen/native/RNN.cpp.

HendricksJudy · 2024-10-22T09:02:30Z

> This is a cool update; it partially fixes #1500. Is it possible to make a fully "GPU" version, optimized with MPS core functions, similar to the full CUDA version in TF/PyTorch?

I apologize for my misunderstanding; I will try to make a fully "GPU" version optimized with MPS core functions.

thegodone · 2024-10-22T10:14:36Z

Could you please document it, to guide me in adding other functions?

awni closed this Oct 22, 2024
