Replies: 3 comments 3 replies
-
If I’m not mistaken, you’re suggesting introducing the softmax inside the loss function. Right? Feel free to create a PR and see if it works :) What do you think?
-
On the other hand, I should update the numpy biaa to the new API. Right? If so, I’ll create a PR and would appreciate it if you could review it.
-
Another thing: when the library is mature, I was thinking of writing an article about it and submitting it to a journal. Would you like to join?
-
Now we have a softmax function to approximately "project" arrays onto the unit simplex, and there is also an entropic descent method that uses softmax to perform optimization on the unit simplex (https://ieeexplore.ieee.org/abstract/document/10213413).
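Just to illustrate the simplex behaviour (a minimal PyTorch sketch, not code from the library), row-wise softmax sends any real-valued matrix to one whose rows lie on the unit simplex:

```python
import torch

# Row-wise softmax: every entry becomes non-negative and each row sums to 1,
# so the rows are valid points on the unit simplex.
A_free = torch.randn(4, 3)            # unconstrained parameters
A = torch.softmax(A_free, dim=1)      # rows now lie on the simplex

assert (A >= 0).all()
assert torch.allclose(A.sum(dim=1), torch.ones(4))
```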
However, I still feel there is a more straightforward way to use softmax: integrating it into the loss function, as is commonly done for neural networks. Define the loss in terms of unconstrained parameters $A'$ and $B'$ by setting $A = \sigma(A')$ and $B = \sigma(B')$, where $\sigma$ is the row-wise softmax function; then we only need to optimize $A'$ and $B'$.
All we need to do is define the loss function in PyTorch and create an optimizer object over $A'$ and $B'$; PyTorch can do everything else. This would enable us to use advanced optimizers like SGD and Adam to help avoid local minima, and to optimize $A$ and $B$ simultaneously and automatically. It could be especially suitable for large datasets. A sketch of the idea is shown below.
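Here is a rough sketch of what I mean, in PyTorch. The reconstruction $X \approx A B X$ below is only a placeholder in the style of plain archetypal analysis; biaa's actual reconstruction and loss may differ, but the softmax parameterization and the optimizer setup would look the same:

```python
import torch

# Sketch of the softmax-in-the-loss idea.
# Placeholder model: X ≈ A @ B @ X (plain archetypal analysis style);
# the real biaa loss would go in its place.

torch.manual_seed(0)
n, d, k = 100, 10, 4
X = torch.randn(n, d)

# Unconstrained parameters; the simplex constraints are enforced by softmax.
A_free = torch.zeros(n, k, requires_grad=True)
B_free = torch.zeros(k, n, requires_grad=True)

opt = torch.optim.Adam([A_free, B_free], lr=0.05)

for step in range(500):
    A = torch.softmax(A_free, dim=1)   # rows of A on the unit simplex
    B = torch.softmax(B_free, dim=1)   # rows of B on the unit simplex
    loss = torch.norm(X - A @ B @ X) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()

print(loss.item())
```

Because $A'$ and $B'$ are unconstrained, any first-order optimizer works out of the box, and the simplex constraints on $A$ and $B$ are satisfied by construction at every step.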