How to add an L1 penalty to the loss function for Neural ODEs?

caiwenxuan (registered member)
6 days ago

I've been experimenting with this and looking at some other neural ODE (NODE) implementations (this one in particular), and have adjusted my cost function so that it is:

function cost_fnct(param)
    prob = ODEProblem(model, u0, tspan, param)
    prediction = Array(concrete_solve(prob, Tsit5(), p = param, saveat = trange))

    loss = Flux.mae(prediction, data)
    penalty = sum(abs, param)
    return loss + lambda*penalty
end

where lambda is the tuning parameter, using the definition that the L1 penalty is the sum of the absolute values of the parameters. Then, for training:
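As a quick sanity check (toy numbers, not from the post), `sum(abs, param)` computes exactly that sum of absolute values:

```julia
# Toy parameter vector (illustrative values only)
param = [1.0, -2.0, 0.5]

# L1 penalty: sum of the absolute values of the parameters
penalty = sum(abs, param)   # |1.0| + |-2.0| + |0.5| = 3.5
```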

lambda = 0.01
resinit = DiffEqFlux.sciml_train(cost_fnct, p, ADAM(), maxiters = 3000)
res = DiffEqFlux.sciml_train(cost_fnct, resinit.minimizer, BFGS(initial_stepnorm = 1e-5))

where p is initially just my parameter "guesses", i.e., a vector of ones with the same length as the number of parameters I am attempting to fit.

If you're looking at the first link I had in the original post (here), you can redefine the loss function to add this penalty term and then define lambda before the callback function and subsequent training:

lambda = 0.01
callback_func = function ()
    loss_value = cost_fnct()
    println("Loss: ", loss_value)
    println("\nLearned parameters: ", p)
end

fparams = Flux.params(p)
Flux.train!(cost_fnct, fparams, data, optimizer, cb = callback_func);

None of this, of course, includes any sort of cross-validation and tuning parameter optimization! I'll go ahead and accept my response to my question because it's my understanding that unanswered questions get pushed to encourage answers, and I want to avoid clogging the tag, but if anyone has a different solution, or wants to comment, please feel free to go ahead and do so.
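For anyone who does want to add tuning, here is a minimal sketch of a grid search over lambda. `fit_and_score` is a hypothetical stand-in for "train with this lambda and return the held-out validation loss"; here it is a dummy quadratic so the sketch runs on its own, but in practice it would call DiffEqFlux.sciml_train with the candidate lambda and score the fitted parameters on held-out data:

```julia
# Hypothetical sketch (not from the original post): pick the lambda with
# the lowest held-out score.
fit_and_score(lambda) = (lambda - 0.01)^2      # dummy stand-in for real training

candidates = (1e-3, 1e-2, 1e-1)
best_lambda = argmin(fit_and_score, candidates)   # requires Julia >= 1.7
```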
