
Can I use SQP (sequential quadratic programming) in scipy for neural network regression optimization?

As the title says: after training and testing my neural network model in Python,

can I use the SQP functionality in scipy to optimize a neural network regression problem?

For example, I am using three features as input: temperature, humidity, and wind speed, to predict energy usage in some area.

So I use a neural network to model the relationship between these inputs and the output. Now I want to find the point of lowest energy usage and the input features at that point (i.e. what temperature, humidity, and wind speed they are). This is just an example, so it may sound unrealistic.

As far as I know, not many people use scipy for neural network optimization, but due to some limitations, scipy is the best optimization tool I have at the moment (p.s.: I can't use cvxopt).
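To make it concrete, this is roughly what I have in mind (just a sketch; `model` stands for my already-trained network, and the feature bounds are made-up values):

```python
import numpy as np
from scipy.optimize import minimize

def predicted_usage(x):
    # scipy passes a 1-D array; most regressors expect a 2-D batch
    # `model` is the trained network (placeholder here)
    return float(model.predict(x.reshape(1, -1))[0])

x0 = np.array([20.0, 50.0, 3.0])            # initial guess: temperature, humidity, wind speed (made-up)
bounds = [(-10, 40), (0, 100), (0, 20)]     # plausible ranges for each feature (made-up)

res = minimize(predicted_usage, x0, method='SLSQP', bounds=bounds)
print(res.x, res.fun)                       # feature values giving the lowest predicted usage
```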

Can someone give me some advice? I would really appreciate it!

Chiao Wei Yeh asked Sep 05 '25 03:09


1 Answer

Sure, that's possible, but your question is too broad to give a complete answer, as all the details are missing.

But: SLSQP is not the right tool!

  • There is a reason NN training is dominated by first-order methods like SGD and all its variants:
    • Gradients are fast to compute and easy to evaluate in mini-batch mode (not paying for the full gradient; less memory)
    • Stochastic gradient descent has a very different convergence theory, which is usually much better suited to large-scale problems
    • In general: fast iteration speed (e.g. time per epoch), while possibly needing more epochs for full convergence
  • NN training is unconstrained continuous optimization
    • SLSQP is a very general optimizer able to tackle constraints, and you will pay for that (in performance and robustness)
    • L-BFGS is actually the only second-order tool (that I have seen) sometimes used for this, and it is also available in scipy (see the sketch after this list)
      • It's a bound-constrained optimizer (no general constraints, unlike SLSQP)
      • It approximates the inverse Hessian, so memory usage is greatly reduced compared to BFGS and also SLSQP
    • Both methods are full-batch methods (as opposed to the online/mini-batch nature of SGD)
      • They also use line searches or something similar, which means fewer hyper-parameters to tune: no learning rates!
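
A minimal sketch of what the L-BFGS-B route in scipy looks like for full-batch training of a tiny network (the data, architecture, and loss here are placeholders, not your model):

```python
import numpy as np
from scipy.optimize import minimize

# Toy illustration only: fit a single-hidden-layer network's weights with
# L-BFGS-B (full-batch, no learning rate). X and y are placeholder data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # 3 features, e.g. temperature/humidity/wind speed
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

n_in, n_hidden = X.shape[1], 8

def unpack(w):
    W1 = w[:n_in * n_hidden].reshape(n_in, n_hidden)
    b1 = w[n_in * n_hidden:n_in * n_hidden + n_hidden]
    W2 = w[-n_hidden - 1:-1]
    b2 = w[-1]
    return W1, b1, W2, b2

def loss(w):
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)                     # full-batch forward pass
    pred = h @ W2 + b2
    return np.mean((pred - y) ** 2)

w0 = rng.normal(scale=0.1, size=n_in * n_hidden + 2 * n_hidden + 1)
res = minimize(loss, w0, method='L-BFGS-B')      # gradients approximated by finite differences here
print(res.fun)
```

This is exactly the full-batch, line-search-driven workflow described above: no learning rate to tune, but every function evaluation touches the whole dataset.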

I think you should stick to SGD and its variants.

If you want to go for the second-order approach: learn from sklearn's implementation, which uses L-BFGS.
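
For instance, a minimal sketch with sklearn's MLPRegressor, which lets you switch between the first-order solvers and L-BFGS (the data below are placeholders):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Placeholder data with 3 features, standing in for temperature/humidity/wind speed.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# solver='lbfgs' gives the full-batch quasi-Newton route;
# solver='adam' or 'sgd' gives the usual first-order, mini-batch route.
mlp = MLPRegressor(hidden_layer_sizes=(16,), solver='lbfgs', max_iter=500)
mlp.fit(X, y)
print(mlp.score(X, y))
```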

sascha answered Sep 07 '25 17:09