Types of solvers and when to use them

Main message

For nonlinear and linear systems there are various solvers that can converge your system. The best solver to use is highly problem dependent but the goal of this lecture is to be able to understand a reasonable starting configuration for your systems.

from IPython.display import YouTubeVideo; YouTubeVideo('GxKgda5nMU8', width=1024, height=576)

Brief introduction to solvers

The goal of this lesson isn’t to introduce the math or theory behind solvers, but more to give a brief introduction to the different classes of solvers and when you would practically want to use one over the other. Throughout this lesson, I will reference other resources that go into much more detail about the solvers themselves and how they operate on systems.

I highly recommend you read Section 3.6 from Martins and Ning’s book Engineering Design Optimization before proceeding with this lesson. That section concisely lays out the main differences between solver types. To start, I’m going to share Fig. 3.3 from the book. This figure shows the categories of solvers relevant for MDO. Some solver types, such as the fixed point iteration methods, are useful for solving both linear and nonlinear systems.

One main distinction between solver types I want to highlight is if they are direct or iterative. Direct solvers factorize the linear system in a single step to obtain the solution whereas iterative solvers iterate through state values until convergence. Direct solvers are only available for linear systems. For small problems, direct linear solvers make the most sense, but for larger (and sparse) systems, iterative solvers are computationally beneficial.

Fixed point iterative solvers pass data between systems until convergence is reached. They do not require gradient information. You can imagine component A giving data to component B, and then B gives data to A, and this process repeats. Fixed point solvers are maybe the most intuitive because limited mathematics are required to understand how convergence occurs.

For any solver setup, a hybrid or nested approach is also possible. What I mean by this is that you could perform n iterations of Gauss-Seidel before using a Newton solver to really drive down the residual. Additionally, a nested solver hierarchy might be advantageous based on the amount of coupling and complexity within your model. [[Nested nonlinear solvers]] goes into more detail about those setups. For computationally expensive problems that you solve often, it’s useful to determine the best solver setup, which might include hybrid approaches.

Linear solvers

In practical terms, linear solvers obtain the gradient information for your system by solving the linear system \(Au=b\). Please do not solve this system by finding the inverse of \(A\) because it is computationally expensive. Instead, if you want to solve the system directly, use one of the factorization methods from the solver tree figure. For large and sparse linear systems, iterative methods are usually the best option.

You only need a linear solver if you need gradient information for your system. This would be relevant if you are performing gradient-based optimization or using a nonlinear Newton solver as Newton’s methods require derivative information. When in doubt, include a linear solver in your system to avoid headaches when solving nonlinear systems.

Nonlinear solvers

Solving general nonlinear systems is no easy feat and is the cause for many issues during the MDAO process. Due to the coupling within complicated system models, nonlinear solvers become extremely important to understand and implement correctly. To be clear, if you have any sort of coupling – explicit components that are backwards coupled, or any implicit components – you need a nonlinear solver. If you do not have a solver converging your model, your output data will be wrong as it will not capture any interdisciplinary backwards coupling.

As noted in Section 3.6 of Engineering Design Optimization, nonlinear methods can solve any algebraic system of equations that can be written as \(r(u)=0\). Because of this, nonlinear systems range from simple to solve to extremely challenging. An example of a tough system to converge is the performance of a turbine engine due to the intricate and coupled elements within the system.

In general, I’d suggest starting with a Newton solver and seeing if it can converge your system. If you do not have efficient derivatives, then nonlinear block Gauss-Seidel is probably the best starting point. From there, you can tinker with solver settings based on your model’s convergence.

Solvers within OpenMDAO

OpenMDAO has a wide array of nonlinear and linear solvers implemented for you to use, as listed on this doc page. One of the most beautiful facets of OpenMDAO is that it tracks the gradient information through any linear or nonlinear solver so you don’t have to worry about what including these solvers mean for your gradient computation complexity. Instead of you needing to unroll a while loop and track the derivatives throughout, OpenMDAO does this behind the scenes when converging your systems.

When creating a model in OpenMDAO, assume that you need both nonlinear and linear solvers until you confirm otherwise. Looking at an N2 diagram (more info here) will help you determine if there’s coupling within the model and where solvers should be located to resolve that coupling. It’s easy to accidentally only put a nonlinear solver on your model and not solve the linear system, which leads to bad derivative information. Avoid this problem by either converging your linear system or verifying that a solver is not needed.

Types of nonlinear solvers within OpenMDAO

You have access to Gauss-Seidel and Jacobi nonlinear solvers within OpenMDAO. Generally, you should use Gauss-Seidel because it has better convergence properties than Jacobi. Use Jacobi if you are parallelizing your model across the solve.

OpenMDAO has both Newton and Broyden solvers as well. Broyden is a quasi-Newton approach that approximates the inverse of the Jacobian instead of solving the linear system, which might reduce computational cost in some cases. That being said, you should probably start with a Newton solver if you’re not sure about which to use.

When using a Newton solver, it might be helpful to first perform a single iteration of Gauss-Seidel to help give your model better initial state guesses. OpenMDAO has a built-in option for the Newton solver called solve_subsystems that if set to True will iterate through the system once before beginning the Newton solve, resulting in a hybrid solver approach. I generally recommend setting solve_subsystems=True as it might result in more robust convergence, but it certainly varies problem-to-problem.

Examples using solvers in OpenMDAO

We now show a few example problems that use solvers within OpenMDAO.

Sellar problem example

This example comes from the Sellar doc page and is a simple coupled analytic system. We first construct the two components and the coupled group in OpenMDAO.

import openmdao.api as om
import numpy as np


class SellarDis1(om.ExplicitComponent):
    def setup(self):
        self.add_input('z', val=np.zeros(2))
        self.add_input('x', val=0.)
        self.add_input('y2', val=1.0)
        self.add_output('y1', val=1.0)
        self.declare_partials('*', '*', method='fd')

    def compute(self, inputs, outputs):
        z1 = inputs['z'][0]
        z2 = inputs['z'][1]
        x1 = inputs['x']
        y2 = inputs['y2']

        outputs['y1'] = z1**2 + z2 + x1 - 0.2*y2

class SellarDis2(om.ExplicitComponent):
    def setup(self):
        self.add_input('z', val=np.zeros(2))
        self.add_input('y1', val=1.0)
        self.add_output('y2', val=1.0)
        self.declare_partials('*', '*', method='fd')

    def compute(self, inputs, outputs):
        z1 = inputs['z'][0]
        z2 = inputs['z'][1]
        y1 = inputs['y1']

        outputs['y2'] = y1**.5 + z1 + z2

class SellarMDA(om.Group):
    def setup(self):
        cycle = self.add_subsystem('cycle', om.Group(), promotes=['*'])
        cycle.add_subsystem('d1', SellarDis1(), promotes_inputs=['x', 'z', 'y2'],
                            promotes_outputs=['y1'])
        cycle.add_subsystem('d2', SellarDis2(), promotes_inputs=['z', 'y1'],
                            promotes_outputs=['y2'])

        cycle.set_input_defaults('x', 1.0)
        cycle.set_input_defaults('z', np.array([5.0, 2.0]))

        self.add_subsystem('obj_cmp', om.ExecComp('obj = x**2 + z[1] + y1 + exp(-y2)',
                                                  z=np.array([0.0, 0.0]), x=0.0),
                           promotes=['x', 'z', 'y1', 'y2', 'obj'])

        self.add_subsystem('con_cmp1', om.ExecComp('con1 = 3.16 - y1'), promotes=['con1', 'y1'])
        self.add_subsystem('con_cmp2', om.ExecComp('con2 = y2 - 24.0'), promotes=['con2', 'y2'])

Now, let’s take a look at what happens when you run this system with no solver.

prob = om.Problem()
model = prob.model

model.add_subsystem('sellar_mda', SellarMDA())
prob.setup()

om.n2(prob, outfile="coupled_no_solver.html", display_in_notebook=False)
prob.run_model()
prob.model.list_outputs();
5 Explicit Output(s) in 'model'

varname     val           
----------  --------------
sellar_mda
  cycle
    d1
      y1    [27.8]        
    d2
      y2    [12.27257053] 
  obj_cmp
    obj     [30.80000468] 
  con_cmp1
    con1    [-24.64]      
  con_cmp2
    con2    [-11.72742947]


0 Implicit Output(s) in 'model'

The outputs above are not correct as we have not resolved the \(y2\) coupling within the system. These next cases show the same system with solvers and different solver settings applied. Depending on your problem setup, certain settings might enhance convergence speed or robustness.

prob = om.Problem()
model = prob.model

model.add_subsystem('sellar_mda', SellarMDA())
prob.setup()

# Nonlinear Block Gauss Seidel is a gradient free solver
nlbgs = model.sellar_mda.nonlinear_solver = om.NonlinearBlockGS()
nlbgs.options["iprint"] = 2  # change the print to show the residuals
nlbgs.options["maxiter"] = 50  # limit the maximum number of solver iterations
nlbgs.options["atol"] = 1.e-9  # set the absolute tolerance for convergence

om.n2(prob, outfile="coupled_with_solver.html", display_in_notebook=False)
prob.run_model()
prob.model.list_outputs();
==========
sellar_mda
==========
NL: NLBGS 1 ; 50.5247285 1
NL: NLBGS 2 ; 3.91711889 0.0775287471
NL: NLBGS 3 ; 0.0758730639 0.00150170157
NL: NLBGS 4 ; 0.00150052731 2.96988693e-05
NL: NLBGS 5 ; 2.96633139e-05 5.87104865e-07
NL: NLBGS 6 ; 5.86406806e-07 1.16063327e-08
NL: NLBGS 7 ; 1.15925306e-08 2.2944271e-10
NL: NLBGS 8 ; 2.29166181e-10 4.53572316e-12
NL: NLBGS Converged
5 Explicit Output(s) in 'model'

varname     val           
----------  --------------
sellar_mda
  cycle
    d1
      y1    [25.58830237] 
    d2
      y2    [12.05848815] 
  obj_cmp
    obj     [28.58830817] 
  con_cmp1
    con1    [-22.42830237]
  con_cmp2
    con2    [-11.94151185]


0 Implicit Output(s) in 'model'

If you’re looking for better convergence when using a nonlinear block Gauss-Seidel solver, using Aitken relaxation might help.

prob = om.Problem()
model = prob.model

model.add_subsystem('sellar_mda', SellarMDA())
prob.setup()

# Nonlinear Block Gauss Seidel is a gradient free solver
nlbgs = model.sellar_mda.nonlinear_solver = om.NonlinearBlockGS()
nlbgs.options["iprint"] = 2  # change the print to show the residuals
nlbgs.options["maxiter"] = 50  # limit the maximum number of solver iterations
nlbgs.options["atol"] = 1.e-9  # set the absolute tolerance for convergence
nlbgs.options["use_aitken"] = True
 
prob.run_model()
prob.model.list_outputs();
==========
sellar_mda
==========
NL: NLBGS 1 ; 50.5247285 1
NL: NLBGS 2 ; 3.91711889 0.0775287471
NL: NLBGS 3 ; 0.0744313595 0.00147316694
NL: NLBGS 4 ; 2.9722743e-05 5.88281104e-07
NL: NLBGS 5 ; 1.16801147e-08 2.31176199e-10
NL: NLBGS 6 ; 2.21703753e-10 4.38802462e-12
NL: NLBGS Converged
5 Explicit Output(s) in 'model'

varname     val           
----------  --------------
sellar_mda
  cycle
    d1
      y1    [25.58830237] 
    d2
      y2    [12.05848815] 
  obj_cmp
    obj     [28.58830817] 
  con_cmp1
    con1    [-22.42830237]
  con_cmp2
    con2    [-11.94151185]


0 Implicit Output(s) in 'model'

Next, we try a Newton solver and see how many iterations convergence takes for this problem.

prob = om.Problem()
model = prob.model

model.add_subsystem('sellar_mda', SellarMDA())
prob.setup()

newton = model.sellar_mda.nonlinear_solver = om.NewtonSolver(solve_subsystems=True)
newton.options["iprint"] = 2  # change the print to show the residuals
newton.options["maxiter"] = 50  # limit the maximum number of solver iterations
newton.options["atol"] = 1.e-9  # set the absolute tolerance for convergence
 
prob.run_model()
prob.model.list_outputs();
==========
sellar_mda
==========
NL: Newton 0 ; 2.25451411 1
NL: Newton 1 ; 5.8309261e-05 2.58633383e-05
NL: Newton 2 ; 0.000862032337 0.000382358369
NL: Newton 3 ; 1.6727336e-05 7.41948606e-06
NL: Newton 4 ; 1.27465221e-08 5.65377791e-09
NL: Newton 5 ; 6.78410217e-09 3.00911942e-09
NL: Newton 6 ; 1.2648016e-10 5.61008509e-11
NL: Newton Converged
5 Explicit Output(s) in 'model'

varname     val           
----------  --------------
sellar_mda
  cycle
    d1
      y1    [25.58830237] 
    d2
      y2    [12.05848815] 
  obj_cmp
    obj     [28.58830817] 
  con_cmp1
    con1    [-22.42830237]
  con_cmp2
    con2    [-11.94151185]


0 Implicit Output(s) in 'model'

For more robust (and more expensive) convergence, adding a line search to your Newton method might help.

prob = om.Problem()
model = prob.model

model.add_subsystem('sellar_mda', SellarMDA())
prob.setup()

newton = model.sellar_mda.nonlinear_solver = om.NewtonSolver(solve_subsystems=True)
newton.options["iprint"] = 2  # change the print to show the residuals
newton.options["maxiter"] = 50  # limit the maximum number of solver iterations
newton.options["atol"] = 1.e-9  # set the absolute tolerance for convergence
newton.linesearch = om.ArmijoGoldsteinLS()
 
prob.run_model()
prob.model.list_outputs();
==========
sellar_mda
==========
NL: Newton 0 ; 2.25451411 1
NL: Newton 1 ; 5.8309261e-05 2.58633383e-05
NL: Newton 2 ; 1.02607101e-06 4.55118472e-07
NL: Newton 3 ; 2.025957e-08 8.98622456e-09
NL: Newton 4 ; 4.00021349e-10 1.77431291e-10
NL: Newton Converged
5 Explicit Output(s) in 'model'

varname     val           
----------  --------------
sellar_mda
  cycle
    d1
      y1    [25.58830237] 
    d2
      y2    [12.05848815] 
  obj_cmp
    obj     [28.58830816] 
  con_cmp1
    con1    [-22.42830237]
  con_cmp2
    con2    [-11.94151185]


0 Implicit Output(s) in 'model'
prob = om.Problem()
model = prob.model

model.add_subsystem('sellar_mda', SellarMDA())
prob.setup()

newton = model.sellar_mda.nonlinear_solver = om.NewtonSolver(solve_subsystems=False)
newton.options["iprint"] = 2  # change the print to show the residuals
newton.options["maxiter"] = 50  # limit the maximum number of solver iterations
newton.options["atol"] = 1.e-9  # set the absolute tolerance for convergence
newton.options["rtol"] = 1.e-15  # set the absolute tolerance for convergence

prob.run_model()
prob.model.list_outputs();
==========
sellar_mda
==========
NL: Newton 0 ; 36.8229305 1
NL: Newton 1 ; 12.2837727 0.333590307
NL: Newton 2 ; 6.0606187 0.164588169
NL: Newton 3 ; 2.14967189 0.0583786207
NL: Newton 4 ; 0.227624537 0.00618159756
NL: Newton 5 ; 0.0642920936 0.00174597982
NL: Newton 6 ; 0.00611849532 0.000166159924
NL: Newton 7 ; 0.0015263487 4.14510382e-05
NL: Newton 8 ; 0.00018041202 4.89944765e-06
NL: Newton 9 ; 3.73164117e-05 1.01340146e-06
NL: Newton 10 ; 5.04139962e-06 1.36909245e-07
NL: Newton 11 ; 9.37031206e-07 2.54469482e-08
NL: Newton 12 ; 1.36709744e-07 3.71262531e-09
NL: Newton 13 ; 2.3929104e-08 6.49842468e-10
NL: Newton 14 ; 3.64867603e-09 9.90870628e-11
NL: Newton 15 ; 6.17305318e-10 1.67641551e-11
NL: Newton Converged
5 Explicit Output(s) in 'model'

varname     val           
----------  --------------
sellar_mda
  cycle
    d1
      y1    [25.58830237] 
    d2
      y2    [12.05848815] 
  obj_cmp
    obj     [28.58830817] 
  con_cmp1
    con1    [-22.42830237]
  con_cmp2
    con2    [-11.94151185]


0 Implicit Output(s) in 'model'

Circuit example

Our next example looks at a different problem from the OpenMDAO doc page – the circuit example.

We first set up this example with purposefully bad guesses before redoing the problem with better guesses for the initial states. This shows the importance of providing good guesses for the Newton solver.

import numpy as np
import openmdao.api as om
from openmdao.test_suite.scripts.circuit_analysis import Circuit

p = om.Problem()
model = p.model

model.add_subsystem('circuit', Circuit())

p.setup()

nl = model.circuit.nonlinear_solver = om.NewtonSolver(solve_subsystems=False)

nl.options['iprint'] = 2

# set some poor initial guesses so that we don't converge
p.set_val('circuit.I_in', 0.1, units='A')
p.set_val('circuit.Vg', 0.0, units='V')
p.set_val('circuit.n1.V', 10.)
p.set_val('circuit.n2.V', 1e-3)

p.run_model()

p.model.list_outputs();
=======
circuit
=======
NL: Newton 0 ; 2.53337743 1
NL: Newton 1 ; 6.97216645e+152 2.75212306e+152
NL: Newton 2 ; 2.56496626e+152 1.01246906e+152
NL: Newton 3 ; 9.43616587e+151 3.72473748e+151
NL: Newton 4 ; 3.47143851e+151 1.37028082e+151
NL: Newton 5 ; 1.27709554e+151 5.04107884e+150
NL: Newton 6 ; 4.69826271e+150 1.8545451e+150
NL: Newton 7 ; 1.72842766e+150 6.822622e+149
NL: Newton 8 ; 6.35865288e+149 2.50995087e+149
NL: Newton 9 ; 2.33926287e+149 9.23377165e+148
NL: Newton 10 ; 8.60583345e+148 3.39698039e+148
NL: NewtonSolver 'NL: Newton' on system 'circuit' failed to converge in 10 iterations.
3 Explicit Output(s) in 'model'

varname  val             
-------  ----------------
circuit
  R1
    I    [0.09997694]    
  R2
    I    [2.30559559e-05]
  D1
    I    [0.]            


2 Implicit Output(s) in 'model'

varname  val         
-------  ------------
circuit
  n1
    V    [9.9976944] 
  n2
    V    [9.76713485]
import numpy as np
import openmdao.api as om
from openmdao.test_suite.scripts.circuit_analysis import Circuit

p = om.Problem()
model = p.model

model.add_subsystem('circuit', Circuit())

p.setup()

nl = model.circuit.nonlinear_solver = om.NewtonSolver(solve_subsystems=False)
nl.options['iprint'] = 2
nl.options['maxiter'] = 25

# set some good initial guesses so that we do converge
p.set_val('circuit.I_in', 0.01, units='A')
p.set_val('circuit.Vg', 0.0, units='V')
p.set_val('circuit.n1.V', 1.)
p.set_val('circuit.n2.V', 0.5)

p.run_model()

p.model.list_outputs();
=======
circuit
=======
NL: Newton 0 ; 2.63440686 1
NL: Newton 1 ; 10.2098095 3.8755629
NL: Newton 2 ; 3.75604877 1.42576639
NL: Newton 3 ; 1.38179632 0.524518949
NL: Newton 4 ; 0.508340098 0.192961879
NL: Newton 5 ; 0.187006627 0.0709862358
NL: Newton 6 ; 0.0687916876 0.0261127803
NL: Newton 7 ; 0.0253013258 0.00960418306
NL: Newton 8 ; 0.00930113345 0.00353063666
NL: Newton 9 ; 0.00341421321 0.00129600832
NL: Newton 10 ; 0.00124785642 0.000473676423
NL: Newton 11 ; 0.000450327474 0.000170940746
NL: Newton 12 ; 0.000156619343 5.94514635e-05
NL: Newton 13 ; 4.89476723e-05 1.85801491e-05
NL: Newton 14 ; 1.12684364e-05 4.27740931e-06
NL: Newton 15 ; 1.12181954e-06 4.2583382e-07
NL: Newton 16 ; 1.45058216e-08 5.50629509e-09
NL: Newton 17 ; 2.7662285e-12 1.05003845e-12
NL: Newton Converged
3 Explicit Output(s) in 'model'

varname  val             
-------  ----------------
circuit
  R1
    I    [0.00996331]    
  R2
    I    [3.66901571e-05]
  D1
    I    [3.66901571e-05]


2 Implicit Output(s) in 'model'

varname  val         
-------  ------------
circuit
  n1
    V    [0.99633098]
  n2
    V    [0.62942941]