This is feedback from trying to implement gru/lstm on CoreML, prompted by #689.
The biases and weights are stacked together for the forward and backward directions when the direction is bidirectional; similarly, the activations are passed as an array instead of as distinct parameters.
I think it's more explicit and cleaner to follow CoreML's design, which:
- passes the weights and biases for each direction separately when it's bidirectional, and
- passes the activations separately as recurrent_activation, cell_activation, and activation.

What do you think? (A rough sketch comparing the two shapes follows below.)
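For concreteness, here is a sketch of the two shapes, not proposed IDL. The first call follows the current bidirectional layout; the option names in the second call (backwardWeight, forwardBias, recurrentActivation, etc.) are made up purely for illustration of the CoreML-style alternative.

```js
// Today (bidirectional lstm): forward and backward are stacked along a leading
// numDirections dimension, and the three activations arrive as one array.
// weight:               [2, 4 * hiddenSize, inputSize]
// recurrentWeight:      [2, 4 * hiddenSize, hiddenSize]
// bias / recurrentBias: [2, 4 * hiddenSize]
const outputs = builder.lstm(input, weight, recurrentWeight, steps, hiddenSize, {
  bias, recurrentBias,
  direction: 'both',
  activations: [builder.sigmoid(), builder.tanh(), builder.tanh()],
});

// Hypothetical CoreML-style alternative: one operand per direction (no leading
// numDirections dimension) and one named option per activation.
const outputs2 = builder.lstm(input, forwardWeight, forwardRecurrentWeight, steps, hiddenSize, {
  backwardWeight, backwardRecurrentWeight,
  forwardBias, backwardBias, forwardRecurrentBias, backwardRecurrentBias,
  direction: 'both',
  recurrentActivation: builder.sigmoid(),
  cellActivation: builder.tanh(),
  activation: builder.tanh(),
});
```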
This also helps unblock the lstm/gru implementation on CoreML from depending on the outcome of the MLConstantOperand discussion.
Thank you for the implementation findings. Let me digest those fields again (ironically, I have a short-term memory when it comes to LSTM's details 😅).
No, it would actually do the opposite for the DML backend. If the weights/biases are passed separately for each direction, then the DML backend has to do another concatenation to combine the forward and backward weights/biases. (The previous concatenation combines the bias and the recurrent bias.) It looks like CoreML prefers separate operands for each direction while DML prefers them as a whole...
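To illustrate the DML-side concern, here is a sketch only (the real work happens in the backend's C++, and the per-direction operand names are hypothetical): with per-direction operands, the backend would have to rebuild the stacked layout itself, on top of the bias + recurrent-bias concatenation it already does.

```js
// Assumed per-direction shapes (illustrative):
//   forwardWeight, backwardWeight: [4 * hiddenSize, inputSize]
// The DML path consumes them stacked as [2, 4 * hiddenSize, inputSize], so it
// would have to reshape each to [1, 4 * hiddenSize, inputSize] and concatenate
// along axis 0 -- in addition to the existing bias/recurrentBias concatenation.
const stackedWeight = builder.concat(
  [builder.reshape(forwardWeight, [1, 4 * hiddenSize, inputSize]),
   builder.reshape(backwardWeight, [1, 4 * hiddenSize, inputSize])],
  0);
```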
Separate weights and biases might also help simplify the emulation code, e.g. the Chromium TFLite backend, which currently needs to slice each tensor out of the combined one before using it.
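Conversely, this is the kind of per-direction slicing the stacked layout forces on an emulation path today. Again just a sketch of the shape arithmetic, assuming the usual [numDirections, 4 * hiddenSize, ...] layout; the Chromium TFLite backend does the equivalent in C++.

```js
// Stacked input: weight is [2, 4 * hiddenSize, inputSize].
// Pull out one [4 * hiddenSize, inputSize] tensor per direction.
const forwardWeight = builder.reshape(
  builder.slice(weight, [0, 0, 0], [1, 4 * hiddenSize, inputSize]),
  [4 * hiddenSize, inputSize]);
const backwardWeight = builder.reshape(
  builder.slice(weight, [1, 0, 0], [1, 4 * hiddenSize, inputSize]),
  [4 * hiddenSize, inputSize]);
// With per-direction operands in the API, these slices and reshapes go away.
```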
Hi @huningxin @fdwr, what do you think about this now? It would unblock the CoreML implementation of lstm and gru by avoiding the need for constant folding or decomposition.