Fitting tabular data using smooth curve fits video transcript
Today we’ll be talking about adding curve fits in a differentiable way. What I mean by this is: when you have a set of tabular or discrete data, you want to apply a curve fit across that data so that you can do design optimizations around it.
My thesis statement here is that we need to use a smooth and continuous curve fit for this data. This will allow us to use gradient-based optimizers and Newton solvers for solving this model. When I say curve, please think of it as a general term that includes curves, surfaces, and hypersurfaces; this way we can handle the n-dimensional case. For most of the examples here we will be looking at a one-dimensional curve so that we can visualize it easily and understand what’s going on.
I’ll first show you the simplest, and arguably incorrect, way to set up a curve fit, and then show you how to do this in a better way so that it is well posed for gradient-based optimization. Additionally, I’ll give you a real-world example and explain some of the ramifications of what the different curve fits mean for that example.
This topic falls firmly within the modeling category, but it also touches on the optimization and differentiation realms.
So first let me explain why using a piecewise linear fit is generally a bad idea. Let’s say we have some one-dimensional data here. On the x-axis we have x, and on the other axis we have f of x, the output of this function, and we see from this one-dimensional case that we can connect the points using a piecewise linear fit. This means just drawing lines between the individual points and interpolating linearly between them. This is the simplest option because you don’t need any knowledge about the physics between those points, and you don’t have to know about the nonlinearity between them either. It’s very simple; it’s naive.
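To make that concrete, here is a minimal sketch in Python of a piecewise linear fit. The tabulated points are made up for illustration (they are not the data shown in the video); `np.interp` simply draws straight lines between them.

```python
# A minimal sketch of a piecewise linear fit of 1D tabular data.
# The tabulated points below are made up for illustration only.
import numpy as np

x_data = np.array([0.0, 2.0, 3.0, 5.0, 7.0, 10.0])
f_data = np.array([4.0, 1.5, 0.8, 2.2, 3.0, 1.0])

def f_linear(x):
    """Interpolate linearly between the tabulated points."""
    return np.interp(x, x_data, f_data)

print(f_linear(2.5))  # value on the straight line between x = 2 and x = 3
```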
But an issue arises when you try to use this curve fit in a gradient-based optimization. Let me show you. Let’s zoom in on this point here, which happens to be the minimum of this function. We see that the gradient, or the derivative, as we sweep across changes instantly at this point. That’s bad news for a gradient-based optimizer, and the reason is that if the optimizer is sitting on one side of this point and attempts to move towards the minimum, it will step across that point. As you saw from that sweep, going from left to right the derivative switches instantly. The gradient-based optimizer is looking for where the derivative, or gradient, is zero, because that suggests there’s an extremum there, in this case a minimum. However, because the derivative changes instantly, the optimizer cannot succeed. It says, “I’m not sure what to do here.”
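You can see this jump numerically by evaluating one-sided slopes on either side of a tabulated point. This is a sketch using the same made-up data as above, so the numbers are illustrative only.

```python
# One-sided slopes of the piecewise linear fit at a tabulated point.
# Made-up data, for illustration only.
import numpy as np

x_data = np.array([0.0, 2.0, 3.0, 5.0, 7.0, 10.0])
f_data = np.array([4.0, 1.5, 0.8, 2.2, 3.0, 1.0])
f_linear = lambda x: np.interp(x, x_data, f_data)

eps = 1e-6
x_kink = 3.0  # one of the tabulated points
slope_left = (f_linear(x_kink) - f_linear(x_kink - eps)) / eps
slope_right = (f_linear(x_kink + eps) - f_linear(x_kink)) / eps
print(slope_left, slope_right)  # the slope changes instantly across the point
```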
So instead I suggest using a smooth and continuous curve fit. There are many different nonlinear splines and other differentiable curve fits that you can use; for this example we happen to be using the Akima curve fit. As you can see, when we sweep from left to right now, the derivative changes very smoothly. That nice tangent line changes in a continuous way.
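Here is a sketch of the same made-up data fit with SciPy’s Akima interpolator. The derivative of the fit is itself a smooth function that can be evaluated anywhere, which is exactly what a gradient-based optimizer needs.

```python
# An Akima spline fit of the same made-up table, along with its derivative.
import numpy as np
from scipy.interpolate import Akima1DInterpolator

x_data = np.array([0.0, 2.0, 3.0, 5.0, 7.0, 10.0])
f_data = np.array([4.0, 1.5, 0.8, 2.2, 3.0, 1.0])

f_akima = Akima1DInterpolator(x_data, f_data)
df_akima = f_akima.derivative()  # derivative of the spline fit

# The slope now varies continuously through the tabulated point at x = 3.
print(df_akima(3.0 - 1e-6), df_akima(3.0), df_akima(3.0 + 1e-6))
```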
Let’s take a look at what this means in an actual optimization though. Let’s go back to this piecewise linear fit. If I start at x equals 4 here and I ask an SLSQP optimizer, a gradient-based optimizer, to find the minimum point of this function from 0 to 10, let’s see what happens. First it moves to the left because of the slope there and settles down a little bit at x equals 3. But now it doesn’t know what to do. It knows that the minimum should be where the derivative is zero. But at that point the derivative is not zero; it’s a positive value on one side and a negative value on the other, and it changes instantly. So now the gradient-based optimizer is flailing. It’s moving all around the design space as best it can, trying to find where the function value is lower and the derivative is closer to zero. On the right-hand side of this graph we can see that the derivative is closer to zero, and that’s why it tries to move over there multiple times. But the actual value of f of x is lower at x equals 3 than at x equals 10. Eventually, after 211 iterations, the optimizer stops. This is a large number of iterations for a 1D problem, and all of this trouble is really just because of the discontinuity in the derivative: the piecewise linear fit is C0 but not C1.
However if we perform the same optimization using the Akima spline, because of its smooth and differentiable nature, the optimizer very nicely settles into the actual optimum here. We see that here the derivative is zero or near zero and additionally it’s the lowest value for f of x that the function can see. This is wonderful. This is what you want to see in your gradient-based optimizations.
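Here is a sketch of that comparison using SciPy’s SLSQP implementation on the made-up data from the earlier sketches. Because the data is illustrative, the exact iteration counts will not match the ones quoted above, but the qualitative behavior, struggling on the piecewise linear fit and converging cleanly on the Akima fit, is what we’d expect to see.

```python
# Minimize the piecewise linear and Akima fits with SLSQP, starting from
# x = 4 on the bounds [0, 10]. Made-up data, for illustration only.
import numpy as np
from scipy.interpolate import Akima1DInterpolator
from scipy.optimize import minimize

x_data = np.array([0.0, 2.0, 3.0, 5.0, 7.0, 10.0])
f_data = np.array([4.0, 1.5, 0.8, 2.2, 3.0, 1.0])
akima = Akima1DInterpolator(x_data, f_data)

def f_linear(x):
    """Piecewise linear fit (C0 only: the derivative jumps at data points)."""
    return float(np.interp(x[0], x_data, f_data))

def f_smooth(x):
    """Akima spline fit (C1: the derivative is continuous)."""
    return float(akima(x[0]))

for name, func in [("piecewise linear", f_linear), ("Akima", f_smooth)]:
    result = minimize(func, x0=[4.0], method="SLSQP", bounds=[(0.0, 10.0)])
    print(f"{name}: x* = {result.x[0]:.4f}, "
          f"f(x*) = {result.fun:.4f}, iterations = {result.nit}")
```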
Now, this is kind of an academic or contrived case. Here the optimizer has just x as the input and y, or f of x, as the output, so we can very clearly see what is going on. We can debug this behavior. We understand what is happening.
In a real case, for an actual complex model such as an aircraft design problem, we might have many, many different blocks, and if you use a linear curve fit within one of these blocks it will be much more challenging to identify where the issue is coming from when your optimization does not converge. I’ve been focusing on gradient-based optimizers, but you also need smooth and differentiable curve fits when you are using a Newton solver, because the Newton solver uses derivative information to help converge the model. I’ll talk more about this when we explain what a nonlinear solver does and how it operates. If you have not seen that lecture yet, please check it out.
And then I have a few more notes and one specific example here. This figure is courtesy of Laurens Voet from MIT. He was doing engine design work and was using a piecewise linear fit for some of the thermodynamic data. However, he found that the optimization did not converge well. This is a pretty telling plot, and I don’t need to get into all of its details, but I want to show you that different parts of the engine model were greatly affected by the choice of interpolation method.
On the left-hand side here we see that the required fan pressure ratio does not differ greatly between the two interpolation methods. However, the thrust-specific fuel consumption does differ greatly near the design point when you use different interpolation methods. Additionally, the derivatives, especially around the design point, become discontinuous if you use a piecewise linear function here. The optimizer would not have accurate information about the function values or the derivatives when modeling this engine. Instead of using a piecewise linear fit, Laurens changed to a Lagrange fit, which provided C2 continuity in the space. This allowed the optimizer to much more easily traverse the smoother, more continuous design space. As you can see on the right-hand side, the orange line here is much smoother than the blue line from the piecewise linear fit. This is just one example where using a simple fit for your real-world tabulated data produces a poorly posed optimization problem.
This brings me to my main takeaway message: if you’re using gradient-based optimizers or a Newton solver, you must fit a curve to your data in a differentiable way. There’s a Python notebook that accompanies this lecture; please give it a look and try using some differentiable curve fits in your own cases. Thank you very much for watching.