This is feedback from trying to implement gru/lstm on CoreML, driven by #689.
Currently, when the operator is bidirectional, the biases and weights for the forward and backward directions are stacked together; similarly, the activations are passed as an array instead of as distinct named parameters.
I think it would be more explicit and cleaner to follow CoreML's design, which:
- Passes the bias and weights for each direction separately when the operator is bidirectional
- Passes the activations as separate named parameters: `recurrent_activation`, `cell_activation`, and `activation`
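To illustrate the difference, here is a minimal sketch of what a CoreML backend has to do today: slice the direction-stacked weight tensor apart before it can build per-direction layers. The function name, shapes, and data below are illustrative assumptions, not the actual WebNN or CoreML API.

```python
# Hypothetical sketch: a WebNN-style op stacks weights along a
# numDirections dimension; a CoreML backend must split them into the
# separate forward/backward tensors CoreML expects.
def split_directions(stacked):
    """stacked: a list of per-direction weight blocks, e.g.
    [forward_block, backward_block] for a bidirectional op."""
    if len(stacked) == 1:
        # Unidirectional: only a forward block is present.
        return {"forward": stacked[0]}
    forward, backward = stacked
    return {"forward": forward, "backward": backward}

# Toy stand-in for a stacked weight tensor of shape [numDirections=2, ...].
stacked_weights = [[0.1, 0.2], [0.3, 0.4]]
per_direction = split_directions(stacked_weights)
```

Under the proposed design, the caller would pass the forward and backward tensors directly, and this splitting step (and the matching concatenation on the caller's side) would disappear.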
What do you think?
This would also unblock the lstm/gru implementation on CoreML from depending on the outcome of the MLConstantOperand discussion.