Common ways to compute derivatives

Main message

There are many ways to compute partial derivatives: finite-differencing, complex-step, analytically by hand, or through algorithmic differentiation. The best method depends on your problem formulation, but the best implementation usually involves an intelligent mix of these methods.

from IPython.display import YouTubeVideo; YouTubeVideo('RG0dwwG_IzY', width=1024, height=576)

Video transcript available on YouTube and here.

Finite difference

Finite difference (FD) is the simplest and least invasive way to compute derivatives, but it does not produce accurate or efficient results. FD is when you evaluate a system at a certain point, then perturb that system and reevaluate it. The change in function value is then divided by the perturbation to get an approximated slope or derivative of the system. This is shown mathematically as:

\[\frac{\partial f}{\partial x} = \frac{f(x+h) - f(x)}{h} \]

where h is the perturbation size. The accuracy of the FD approximation varies with h and is highly dependent on model, problem formulation, and location within the design space. Martins and Ning discuss this in great detail in Sec. 6.4 within Engineering Design Optimization. This textbook section also explains forward first-order FD (shown above), as well as central FD, and other higher-order methods. It also does a great job explaining the advantages and disadvantages of FD.

If you use FD in your optimization, I highly recommend reading Sec. 6.4 as it is so comprehensive and concise. Spending 15 minutes reading and understanding it will save you hours and hours of model and optimization debugging.

For the purposes of this course, I’m going to try to dissuade you from using FD. It is tempting because it appears to work for any model, even when that model is a blackbox. However, it scales very poorly with the number of design variables and does not produce reliably accurate gradients. Additionally, as model complexity increases, FD becomes computationally intractable. For instance, we are solving coupled design and mission optimization problems at NASA Glenn with hundreds if not thousands of design variables. Using FD would make this computationally impossible.

That being said, if you’re just starting with a blackbox model and need to approximate derivatives without adding any developer cost, FD is the way to go. If you have knowledge about or access to the model and know that it is able to handle complex numbers, you should use the complex step method.

OpenMDAO has the built-in ability to perform finite difference for any part of your model or the entire system. Here is the doc page on FD for components (aka computing partial derivatives) and here is the doc page on FD for groups (aka computing semi-total derivatives).

The following two examples come from the OpenMDAO doc pages and are recreated here so you can see how FD can be used in your models. In the first example, we use FD to compute the partial derivatives of the component’s outputs with respect to the inputs. The main two lines here are the declare_partials calls in setup_partials where we have method='fd' set to use FD to compute the derivatives.

import numpy as np

import openmdao.api as om

class FDPartialComp(om.ExplicitComponent):
    def setup(self):
        self.add_input('x', shape=(4,))
        self.add_input('y', shape=(2,))
        self.add_input('y2', shape=(2,))
        self.add_output('f', shape=(2,))

    def setup_partials(self):
        self.declare_partials('f', 'y*', method='fd')
        self.declare_partials('f', 'x', method='fd')

    def compute(self, inputs, outputs):
        f = outputs['f']

        x = inputs['x']
        y = inputs['y']

        f[0] = x[0] + y[0]
        f[1] = np.dot([0, 2, 3, 4], x) + y[1]

model = om.Group()
model.add_subsystem('example', FDPartialComp())

problem = om.Problem(model=model)
problem.setup()
problem.run_model()

# Here, we call `compute_totals`, but because the model is only one component,
# the partials are one in the same as the totals.
totals = problem.compute_totals(['example.f'], ['example.x', 'example.y'])
print(totals)
{('example.f', 'example.x'): array([[ 1., -0., -0., -0.],
       [-0.,  2.,  3.,  4.]]), ('example.f', 'example.y'): array([[ 1., -0.],
       [-0.,  1.]])}

Our next example features two components. Instead of using FD at the component level to compute the partials, we use FD at the group level to compute the total derivatives.

class CompOne(om.ExplicitComponent):

    def setup(self):
        self.add_input('x', val=0.0)
        self.add_output('y', val=np.zeros(25))
        self._exec_count = 0

    def compute(self, inputs, outputs):
        x = inputs['x']
        outputs['y'] = np.arange(25) * x
        self._exec_count += 1

class CompTwo(om.ExplicitComponent):

    def setup(self):
        self.add_input('y', val=np.zeros(25))
        self.add_output('z', val=0.0)
        self._exec_count = 0

    def compute(self, inputs, outputs):
        y = inputs['y']
        outputs['z'] = np.sum(y)
        self._exec_count += 1

prob = om.Problem()
model = prob.model

model.set_input_defaults('x', 0.0)

model.add_subsystem('comp1', CompOne(), promotes=['x', 'y'])
comp2 = model.add_subsystem('comp2', CompTwo(), promotes=['y', 'z'])

model.linear_solver = om.ScipyKrylov()
model.approx_totals()

prob.setup()
prob.run_model()

of = ['z']
wrt = ['x']
derivs = prob.compute_totals(of=of, wrt=wrt)
print(derivs)
{('z', 'x'): array([[300.]])}

Using total derivative FD is useful when you have a large model with many subgroups and components, but you’re only interested in the sensitivity of a few input variables into the model.

An example where this is useful is wind turbine design when you are just controlling the turbine blade twist profile. If you don’t have derivatives for your model, you could choose to FD all partials derivatives and have OpenMDAO chain rule them together, or you could FD the total derivatives. Because there are many variables in the components and subgroups, performing FD for all of them would be more expensive than performing FD at the total level. Thus, approximating the total derivatives at the top group level is more efficient in that case.

Complex-step

The complex-step (CS) method is a much more accurate version of the FD method. It works by perturbing a system in the complex plane to obtain derivatives. Because the perturbation occurs in a plane orthogonal to the real values of the function, the outputted float representations of the derivative are not hindered by the limitations of subtractive cancellation. This allows CS to produce numerically exact derivatives when using a small enough step size, such as 1j*1.e-60j.

\[\frac{\partial f}{\partial x} = \frac{Im[f(x + ih)]}{h} \]

Please check out Sec. 6.5 within Engineering Design Optimization or the complex-step paper by Martins in 2003 for more comprehensive and theoretical details.

The biggest practical consideration is that you need a complex-safe model. What I mean by this is that your model has to fully propagate complex numbers through it correctly. If you have an external solver in Fortran or C, for instance, and it’s assuming all numbers are real-typed, then any perturbation in the complex space will not be reflected throughout the model. Thus, complex-step is best used on graybox models where we have some information about what’s going on behind the scenes. If you’re not able to verify if a model is complex-safe, try using the complex-step method as compared to FD to see if the derivatives are roughly the same, accounting for the inaccuracies introduced by FD.

Note: if you are using ExecComps in OpenMDAO, the derivatives are automatically computed using CS due to the accuracy and relatively low computational cost.

Just like with FD, OpenMDAO has built-in support for CS as detailed in this doc page. The following code snippet uses CS to compute the total derivatives of the model. Syntactically, it’s very similar to FD, you simply say 'cs' instead of 'fd'. However, you need to make sure that your model is fully complex safe. I suggest checking your CS derivatives against FD derivatives by using the check_partials command to ensure that the derivatives are accurate.

prob = om.Problem()
model = prob.model
model.set_input_defaults('x', 0.0)

model.add_subsystem('comp1', CompOne(), promotes=['x', 'y'])
model.add_subsystem('comp2', CompTwo(), promotes=['y', 'z'])

model.linear_solver = om.ScipyKrylov()
model.approx_totals(method='cs')

prob.setup()
prob.run_model()

of = ['z']
wrt = ['x']
derivs = prob.compute_totals(of=of, wrt=wrt)
print(derivs)
{('z', 'x'): array([[300.]])}

Analytically or by hand

Analytic derivatives are generally the most computationally efficient option, though they take some time to implement. When you compute derivatives by hand you are paying high developer cost to reduce computational cost by obtaining exact derivatives. This is really where your understanding of calculus and differential equations comes into play. It’s not necessarily easy to compute derivatives by hand, but if you know you’re going to be performing many optimizations with a model or it will be used in many multidisciplinary studies, it’s often quite worthwhile.

People have different best practices for computing derivatives by hand. Some people sit down with a pad of paper, others use WolframAlpha or another symbolic differentiation engine. All methods are valid as long as they result in accurate and efficient derivatives – though that is no small feat.

Sometimes it’s as easy as doing the power rule across a function; other times you have to do some pretty advanced tensor calculus. Depending on how you construct and set up your model, your derivative computations can be made easier to save your sanity. For instance, if you have one 1000-line component, it would probably be very challenging to analytically compute derivatives. However, if you have 10 100-line components, you might be able to more easily compute the partial derivatives for each of those smaller computation amounts. OpenMDAO will then combine the partial derivatives together behind the scenes, which is just fantastic.

I certainly can’t cover all the ins and outs of computing derivatives by hand. It sometimes takes months or years of hands-on experience to get a good feel for computing derivatives of complicated models. That being said, many of our topics regarding model differentiation discuss this, especially [[How to structure your code to be easily differentiable]], [[Advanced ways to compute derivatives]], and [[How to avoid non-differentiable functions]]. Additionally, [[Derivatives of vector valued functions]] and [[Computing derivatives of implicit functions]] are also helpful related topics.

In the following example we provide the analytic partial derivatives of the outputs with respect to the inputs for a simple explicit component.

class MyComp(om.ExplicitComponent):
    def setup(self):
        self.add_input('x1', 3.0)
        self.add_input('x2', 5.0)

        self.add_output('y', 5.5)

        self.declare_partials(of='*', wrt='*')

    def compute(self, inputs, outputs):
        outputs['y'] = 3.0 * inputs['x1'] + 4.0 * inputs['x2'] * inputs['x1']

    def compute_partials(self, inputs, partials):
        partials['y', 'x1'] = 3 + 4 * inputs['x2']
        partials['y', 'x2'] = 4 * inputs['x1']

prob = om.Problem()

prob.model.add_subsystem('comp', MyComp())

prob.set_solver_print(level=0)

prob.setup()
prob.run_model()

data = prob.check_partials(compact_print=True)
------------------------
Component: MyComp 'comp'
------------------------

'<output>' wrt '<variable>' | calc mag.  | check mag. | a(cal-chk) | r(cal-chk)
-------------------------------------------------------------------------------

'y'        wrt 'x1'         | 2.3000e+01 | 2.3000e+01 | 1.2260e-09 | 5.3304e-11
'y'        wrt 'x2'         | 1.2000e+01 | 1.2000e+01 | 1.8754e-09 | 1.5628e-10

#######################################################
Sub Jacobian with Largest Relative Error: MyComp 'comp'
#######################################################

'<output>' wrt '<variable>' | calc mag.  | check mag. | a(cal-chk) | r(cal-chk)
-------------------------------------------------------------------------------
'y'        wrt 'x2'         | 1.2000e+01 | 1.2000e+01 | 1.8754e-09 | 1.5628e-10

Algorithmic differentiation

Algorithmic differentiation (AD) is also known as automatic differentiation. AD is where a computer does the differentiation for you based on code you provide for the model. This can be done via source code transformation or operator overloading. Sec. 6.6 within Engineering Design Optimization goes into much more detail as well.

Some programming languages and packages exist to help perform AD on your models, including PyTorch, Jax, Tapenade, Julia, and other packages. It would be fantastic if AD was a cure-all reliable method of producing derivatives, but it often has limitations in terms of what code can be converted. There are many potential pitfalls, including code structure, variable use, cost of computing derivatives, lack of support for operations, etc. However, if AD works for your model, then you get the benefits of analytic derivatives without the huge developer cost! Isn’t that great?

This is something that the OpenMDAO developer team has considered throughout its development, though it is not natively implemented in OpenMDAO. You can use it for specific components or groups by using an AD method and hooking it up yourself to the compute_partials method for the components.