What solver convergence looks like

Main message

A system is converged when its residuals are within a specified tolerance of 0. How this is achieved depends on which solver you use, but generally you want your residuals to decrease in as little computational time as possible.

from IPython.display import YouTubeVideo; YouTubeVideo('M3AIQdvz3c4', width=1024, height=576)

Video transcript available on YouTube and here.

The basic idea of convergence

Solver convergence means that the system’s coupling or implicit relationships are resolved to the specified tolerance. Basically, the states (variables) within the system are at steady-state values for the given inputs. This means that the outputs of the system correctly correspond to the inputs. For any solver, we want the residual values of the states to be 0:

\[ R(x) = 0 \]

but really it’s

\[ \lVert R(x) \rVert < \text{tol} \]

where \(\text{tol}\) is some small number reasonably close to 0, maybe 1e-4 or 1e-8, depending on your problem.
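To make this concrete, here is a minimal sketch of the idea using a scalar Newton solver on the hypothetical residual \(R(x) = \cos(x) - x\) (this equation is just an illustration, not part of the circuit example later in this lesson). The loop keeps taking Newton steps until the residual magnitude drops below the tolerance:

```python
import math

def newton_solve(x, tol=1e-8, maxiter=25):
    """Newton iterations on the residual R(x) = cos(x) - x."""
    for i in range(maxiter):
        r = math.cos(x) - x          # residual at the current state
        if abs(r) < tol:             # converged: residual within tolerance
            return x, i
        drdx = -math.sin(x) - 1.0    # derivative of the residual
        x -= r / drdx                # Newton update
    raise RuntimeError("failed to converge")

x, iters = newton_solve(1.0)
print(x, iters)   # x satisfies cos(x) = x to within tol
```

Printing `abs(r)` inside the loop would show exactly the kind of shrinking residual history that OpenMDAO prints for its solvers, as shown later in this lesson.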

Convergence is relevant for both nonlinear and linear solvers. In oversimplified terms, you are converging the states of the system for nonlinear solvers, whereas for linear solvers you are converging the derivatives of the system. See [[Nonlinear vs linear solvers]] for a more detailed look. For many people it’s easier to think about nonlinear solvers converging states, so I’d suggest picturing those rather than linear solvers for this lesson.

Your solver might converge part of your model sooner than another part due to the magnitudes of the states and residuals. An example would be a solver that’s converging the mass of an aircraft as well as its angle of attack. Without any scaling, the mass has a much larger magnitude than the angle of attack, so the solver would prioritize resolving the mass residual over the angle-of-attack residual. This is a motivating case for scaling your state variables when using complicated solver setups. This is related to, but distinct from, optimizer scaling.
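A small sketch of why this matters, using made-up residual and reference values for the mass and angle-of-attack states (the numbers are purely illustrative):

```python
import math

# Hypothetical residuals for an aircraft mass state (~1e4 kg)
# and an angle-of-attack state (~1e-1 rad)
r = {"mass": 50.0, "alpha": 0.05}

# Unscaled, the combined residual norm is dominated entirely by mass,
# even though alpha is relatively much further from converged
norm = math.hypot(r["mass"], r["alpha"])
print(norm)   # ~50.0

# Scale each residual by a reference magnitude for its state
ref = {"mass": 1e4, "alpha": 1e-1}
r_scaled = {k: r[k] / ref[k] for k in r}
norm_scaled = math.hypot(r_scaled["mass"], r_scaled["alpha"])
print(r_scaled)      # mass: 0.005 (0.05% off), alpha: 0.5 (50% off)
print(norm_scaled)   # now reflects the state that actually needs work
```

Unscaled, the solver “sees” almost nothing but the mass residual; after scaling, the angle-of-attack residual correctly dominates because it is proportionally much further from its converged value.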

To be clear, the idea of solver convergence is different than that of optimizer convergence. They both use the word convergence and a similar mathematical idea, but practically they’re quite different. You can think of solvers as a certain type of optimization; for a solver you’re trying to get the residuals to 0, but for an optimization you’re trying to find where the gradients are 0. See [[Newton’s method math]] and [[Unconstrained optimization using a Newton solver]] for more details and a more mathy view.
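One way to see the connection: an unconstrained optimization can be recast as a solver problem by treating the gradient as the residual. Here is a tiny sketch on a hypothetical quadratic objective (not from this lesson’s examples), minimizing \(f(x) = (x - 3)^2 + 1\) by driving \(R(x) = f'(x)\) to 0 with Newton’s method:

```python
# "Optimization as a solve": the residual is the gradient of f
def fprime(x):
    return 2.0 * (x - 3.0)   # gradient of f(x) = (x - 3)**2 + 1

def fsecond(x):
    return 2.0               # second derivative (constant for a quadratic)

x = 0.0
while abs(fprime(x)) > 1e-10:
    x -= fprime(x) / fsecond(x)   # Newton step on the gradient
print(x)   # the minimizer of f, where the gradient residual is 0
```

For this quadratic, a single Newton step lands exactly on the minimum, which is why Newton-type methods appear on both the solver and optimizer sides of this story.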

If your model is explicit and feed-forward, in that there are no feedback loops or implicit states at all, you do not need a nonlinear solver or a notion of solver convergence. Most complicated engineering models have some sort of feedback. If you’re not certain about the nature of your model, see Understanding N2 diagrams for how to visualize and better understand your feedback loops.
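The distinction can be sketched with two toy models (the component functions and coefficients here are made up for illustration). A feed-forward chain is resolved in one pass; a feedback loop needs iteration:

```python
# Feed-forward: each output follows from the inputs in a single
# sweep, so no solver is needed
def comp_a(x):
    return 2.0 * x

def comp_b(a):
    return a + 1.0

y = comp_b(comp_a(3.0))   # one pass fully resolves the model

# Feedback: b depends on a AND a depends on b, so one pass is not
# enough -- iterate (here, simple fixed-point) until the states
# stop changing
a = 0.0
for _ in range(50):
    b = 0.5 * a + 1.0   # b needs a
    a = 0.3 * b + 2.0   # a needs b
print(a, b)             # converged coupled states
```

The second loop is exactly the situation where a nonlinear solver (fixed-point, Newton, etc.) and a convergence tolerance enter the picture.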

How to tell when something is converged

Determining when something is converged might be challenging. It fully depends on the problem you’re solving and what you’re doing with the results. If you’re performing gradient-based optimization of high-fidelity CFD, you generally want a pretty tight tolerance. If you just need rough numbers or large-scale trends for an analysis, you might be able to get away with a looser tolerance. [[What solver tolerance should I use]] goes into much more detail about this.

Convergence in the terminal

In OpenMDAO, solvers print out their convergence state based on their iprint value. You should generally have solver convergence printing on and logged to a file when doing analysis and optimization. This allows you to determine how a system is converging and how it’s meeting the tolerances you’ve set. Here’s an example output from OpenMDAO which comes from the Python code below.

=======
circuit
=======
NL: Newton 0 ; 2.63440686 1
NL: Newton 1 ; 10.2098095 3.8755629
NL: Newton 2 ; 3.75604877 1.42576639
NL: Newton 3 ; 1.38179632 0.524518949
NL: Newton 4 ; 0.508340098 0.192961879
NL: Newton 5 ; 0.187006627 0.0709862358
NL: Newton 6 ; 0.0687916876 0.0261127803

The two numbers on each line are the absolute and relative residual norms, respectively. You can set the tolerance for both of these values and accept convergence from either metric. There could be a system where you care about quantities on the order of 1e10 or on the order of 1e-5, so the relative and absolute tolerances you need might differ greatly.
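The relative residual is just the absolute residual norm divided by its value at iteration 0, which is why iteration 0 always shows a relative residual of 1. We can check this against the log above:

```python
abs_res_0 = 2.63440686   # iteration 0 absolute residual norm from the log
abs_res_1 = 10.2098095   # iteration 1 absolute residual norm from the log

rel_res_1 = abs_res_1 / abs_res_0
print(rel_res_1)         # matches the 3.8755629 printed at iteration 1
```

This is also why a relative tolerance alone can be misleading when the initial guess is already nearly converged: the starting norm in the denominator is tiny, so the ratio barely moves even as the absolute residual becomes negligible.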

Here are some helpful OpenMDAO docs pages relevant to the discussion of solver convergence:

Example circuit problem

We now look at an example problem where we need to use a solver to resolve implicit coupling, running it twice with different initial guesses. This problem comes from the OpenMDAO doc site.

First, we provide reasonable guesses to the Newton solver and see convergence:

import openmdao.api as om
from openmdao.test_suite.scripts.circuit_analysis import Circuit

p = om.Problem()
model = p.model

model.add_subsystem('circuit', Circuit())

p.setup()

nl = model.circuit.nonlinear_solver = om.NewtonSolver(solve_subsystems=False)
nl.options['iprint'] = 2
nl.options['maxiter'] = 25

p.set_val('circuit.I_in', 0.01, units='A')
p.set_val('circuit.Vg', 0.0, units='V')
p.set_val('circuit.n1.V', 1.)
p.set_val('circuit.n2.V', 0.5)

p.run_model()

p.model.list_outputs();
=======
circuit
=======
NL: Newton 0 ; 2.63440686 1
NL: Newton 1 ; 10.2098095 3.8755629
NL: Newton 2 ; 3.75604877 1.42576639
NL: Newton 3 ; 1.38179632 0.524518949
NL: Newton 4 ; 0.508340098 0.192961879
NL: Newton 5 ; 0.187006627 0.0709862358
NL: Newton 6 ; 0.0687916876 0.0261127803
NL: Newton 7 ; 0.0253013258 0.00960418306
NL: Newton 8 ; 0.00930113345 0.00353063666
NL: Newton 9 ; 0.00341421321 0.00129600832
NL: Newton 10 ; 0.00124785642 0.000473676423
NL: Newton 11 ; 0.000450327474 0.000170940746
NL: Newton 12 ; 0.000156619343 5.94514635e-05
NL: Newton 13 ; 4.89476723e-05 1.85801491e-05
NL: Newton 14 ; 1.12684364e-05 4.27740931e-06
NL: Newton 15 ; 1.12181954e-06 4.2583382e-07
NL: Newton 16 ; 1.45058216e-08 5.50629509e-09
NL: Newton 17 ; 2.7662285e-12 1.05003845e-12
NL: Newton Converged
3 Explicit Output(s) in 'model'

varname  val             
-------  ----------------
circuit
  R1
    I    [0.00996331]    
  R2
    I    [3.66901571e-05]
  D1
    I    [3.66901571e-05]


2 Implicit Output(s) in 'model'

varname  val         
-------  ------------
circuit
  n1
    V    [0.99633098]
  n2
    V    [0.62942941]

Then, we purposefully provide “bad” initial guesses for the states in the Newton solver, which causes it to fail to converge. From this, we can see an example of what the residuals look like when they don’t converge and how OpenMDAO presents that information.

p = om.Problem()
model = p.model

model.add_subsystem('circuit', Circuit())

p.setup()

nl = model.circuit.nonlinear_solver = om.NewtonSolver(solve_subsystems=False)
nl.options['iprint'] = 2

# set some poor initial guesses so that we don't converge
p.set_val('circuit.I_in', 0.1, units='A')
p.set_val('circuit.Vg', 0.0, units='V')
p.set_val('circuit.n1.V', 10.)
p.set_val('circuit.n2.V', 1e-3)

p.run_model()

p.model.list_outputs();
=======
circuit
=======
NL: Newton 0 ; 2.53337743 1
NL: Newton 1 ; 6.97216645e+152 2.75212306e+152
NL: Newton 2 ; 2.56496626e+152 1.01246906e+152
NL: Newton 3 ; 9.43616587e+151 3.72473748e+151
NL: Newton 4 ; 3.47143851e+151 1.37028082e+151
NL: Newton 5 ; 1.27709554e+151 5.04107884e+150
NL: Newton 6 ; 4.69826271e+150 1.8545451e+150
NL: Newton 7 ; 1.72842766e+150 6.822622e+149
NL: Newton 8 ; 6.35865288e+149 2.50995087e+149
NL: Newton 9 ; 2.33926287e+149 9.23377165e+148
NL: Newton 10 ; 8.60583345e+148 3.39698039e+148
NL: NewtonSolver 'NL: Newton' on system 'circuit' failed to converge in 10 iterations.
3 Explicit Output(s) in 'model'

varname  val             
-------  ----------------
circuit
  R1
    I    [0.09997694]    
  R2
    I    [2.30559559e-05]
  D1
    I    [0.]            


2 Implicit Output(s) in 'model'

varname  val         
-------  ------------
circuit
  n1
    V    [9.9976944] 
  n2
    V    [9.76713485]

Feel free to modify this code and try different solver options to see how it affects convergence.
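One modification worth trying on the diverging case above is adding a line search to the Newton solver, which limits how far each Newton step can move the states. This is only a sketch of the option settings, added after the `NewtonSolver` is created in the second script; whether it rescues these exact initial guesses is something to experiment with:

```python
# Attach an Armijo-Goldstein line search to the Newton solver from
# the script above, and print its iteration history too
nl.linesearch = om.ArmijoGoldsteinLS()
nl.linesearch.options['iprint'] = 2
```

Raising `maxiter`, loosening the tolerances, or enabling `solve_subsystems` are other levers that change the convergence behavior you see in the printed residuals.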