Implement Adam optimizer #115
Comments
Milan FYI, I found one Fortran implementation of Adam at |
Thanks for the link to NN_MOD. I'd like to work on Adam first. I think it's easier to implement than batch norm, and it will drive the much-needed refactor for optimizers in general (rather than them being hardcoded in the network % train subroutine). Would you like to contribute the linear layer here as a PR? As I understand it, it's just a dense layer but without an activation. Are you just using a dense layer but with a "no-op" activation function (i.e. y = x)? |
Milan,
Thanks for working on this. I'm using Keras for now, but I'm slowly moving up to some very large training sets, so it would be nice to be able to use neural-fortran.
Re. linear activation.
The linear activation function is just a passthrough (basically ReLU without the MAX test). The derivative is just 1.0,
i.e.
pure function linear(x) result(y)
  real, intent(in) :: x
  real :: y
  y = x
end function linear
pure function linear_prime(x) result(dy)
  real, intent(in) :: x
  real :: dy
  dy = 1.0
end function linear_prime
Most of the examples I've seen where folks are doing regression rather than classification have used a linear output layer.
Re. batch normalization and dropout. When you feel you are up to doing one or the other I favor batch normalization over dropout. I use it first as a scaling pass prior to calling the first dense layer. I prefer this to doing a MINMAX scaling on the input data prior to training because it keeps the original unscaled data somewhat intact. Here is what I've implemented in Keras for the model to give you an idea of how I stack layers in a Sequential model.
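# Imports assumed (tf.keras); they were not part of the original snippet
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization, Dense, Dropout

# ibatchnorm, idropout, nhidden, nneurons, and actfun are assumed to be defined earlier
model = Sequential()
if ibatchnorm > 0:
    # Normalize the raw inputs before the first dense layer
    model.add(BatchNormalization())
for i in range(nhidden):
    model.add(Dense(nneurons, activation=actfun))
    if ibatchnorm > 0:
        model.add(BatchNormalization())
    if idropout > 0:
        model.add(Dropout(0.1))
# Linear (pass-through) output layer for regression
model.add(Dense(1, activation='linear'))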
What I'm trying to build is a surrogate for CFD calculations to predict loads on buildings. My training datasets will eventually approach 100K or more samples. Some initial tests I've run with 3 inputs (xyz position) of probe locations on the target buildings used around 3600 samples. So far, batch normalization (in the sequence shown above) using 4 layers with 500 neurons per layer and Adam as the optimizer is giving the best results. It works a lot better than using just dropout layers. Also, as reported by other researchers, mixing dropout with batch normalization doesn't help (and can sometimes hurt).
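For readers comparing the two input-scaling approaches mentioned above, here is a minimal sketch in plain Python of min-max scaling versus the training-time forward pass of batch normalization on a single feature. The function names are illustrative only and do not come from Keras or neural-fortran; `gamma`, `beta`, and `eps` stand in for the learnable scale, the learnable shift, and the small stabilizing constant. At inference time a real batch-norm layer would use running averages instead of the batch statistics, which this sketch does not cover.

```python
import math


def min_max_scale(values):
    """Rescale a list of floats to the [0, 1] range (done once, before training)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one feature over a batch to zero mean and (nearly) unit variance.

    gamma and beta would be learned during training; they are fixed here.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


if __name__ == "__main__":
    batch = [1.0, 2.0, 3.0]
    print(min_max_scale(batch))
    print(batch_norm(batch))
```

With batch normalization as the first layer, the scaling lives inside the model, so the stored training data can stay in its original units.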
|
@Spnetic-5, would you like to tackle this one next? I forgot whether you have a WIP implementation of Adam or AdaGrad? |
Yes, the Adam optimizer implementation is in progress; I'll make a PR soon. |
Done by #150. |
Proposed by @rweed in #114.
Paper: https://arxiv.org/abs/1412.6980
Currently, the optimizers module is only a stub, and the only available optimizer (SGD) is hardcoded in the network % train method, with the weight updates propagating all the way down to individual concrete layer implementations. Some refactoring is needed to decouple the weight updates from concrete layer implementations and to allow defining optimizer algorithms in their own concrete types.
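As a rough illustration of the decoupling described above, the sketch below keeps each optimizer in its own type and lets the training loop hand it flat lists of parameters and gradients. It is written in Python for brevity and is not the neural-fortran design: the class names, the `minimize` method, and the `fit_quadratic` helper are all hypothetical. The Adam update follows the algorithm in the linked paper (arXiv:1412.6980), using the paper's default hyperparameters.

```python
import math


class SGD:
    """Plain stochastic gradient descent; holds no state besides the step size."""

    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate

    def minimize(self, params, grads):
        # Update the parameter list in place
        for i, g in enumerate(grads):
            params[i] -= self.learning_rate * g


class Adam:
    """Adam with bias-corrected first and second moment estimates."""

    def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
        self.learning_rate = learning_rate
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.m = None  # first moment (running mean of gradients)
        self.v = None  # second moment (running mean of squared gradients)
        self.t = 0     # time step, used for bias correction

    def minimize(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            m_hat = self.m[i] / (1.0 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1.0 - self.beta2 ** self.t)
            params[i] -= self.learning_rate * m_hat / (math.sqrt(v_hat) + self.epsilon)


def fit_quadratic(optimizer, steps):
    """Minimize f(x) = x**2 from x = 1 with any optimizer exposing minimize()."""
    params = [1.0]
    for _ in range(steps):
        grads = [2.0 * params[0]]  # df/dx
        optimizer.minimize(params, grads)
    return params[0]


if __name__ == "__main__":
    print(fit_quadratic(SGD(learning_rate=0.1), 5))
    print(fit_quadratic(Adam(learning_rate=0.1), 5))
```

Because the moment estimates live in the optimizer object rather than in the layers, the training loop does not change when one optimizer is swapped for another; the layers only need to expose their parameters and gradients.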