Derivatives of vector-valued functions
Hello everybody, welcome back to a fantastic installment in our Practical MDO series. Today we're talking about derivatives of vector-valued functions. I'll introduce what I mean by all these words in just a moment, but know that getting the derivatives of vector-valued functions doesn't have to be scary; it's not impossible. You can think of vector-valued functions simply as a conglomeration of many different scalar-valued functions, but it often behooves you to look at the entire system, examine the arrays as a whole, and look for patterns in their derivatives. I'll get more into that later.
This lesson falls firmly under the differentiation subtopic of our course.
First, let me introduce what vector-valued functions are. Strictly speaking, a vector-valued function is one whose output is not just a single scalar value; it outputs multiple values. For the purposes of this lecture we will consider functions with inputs and outputs of arbitrary size.
First, let me flash up a pretty simple model here. We have a scalar input and a scalar output. This would be something like y = x^2: x is a scalar, y is a scalar, fantastic, very easy. There are many models that are relevant in engineering that are just scalar-to-scalar functions. However, there are also many models in engineering where the inputs may be something else; they may be higher-dimensional than just a scalar. Here I'm showing a case where the number of inputs, n_x, equals six, so we have six inputs coming into this model and one output. Now, like I said earlier, this is not strictly a vector-valued function. But of course these models crop up in engineering problems, so we must know how to take derivatives of a model like this.
Here's an example of a true vector-valued function. Here we have n_x = 6 and n_f = 4, so there are six inputs and four outputs to this model. I'm just flashing up some examples here with the corresponding arrows; we could also have two inputs and four outputs. We want to be able to handle any arbitrary case of any size or dimensionality. Additionally, when I say vectors here, I'm speaking generally. I don't just mean n-by-1 vectors; I mean n-by-m arrays as well, or any kind of multi-dimensional tensor. For extremely complex problems it's very easy to get into multi-dimensional input and output spaces.
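To make the shapes concrete, here is a minimal NumPy sketch of a hypothetical vector-valued function with n_x = 6 inputs and n_f = 4 outputs. The linear map and all the names here are purely illustrative assumptions, not something from the slides.

```python
import numpy as np

# A hypothetical vector-valued function with n_x = 6 inputs and n_f = 4 outputs,
# here just a linear map f(x) = A @ x so the shapes are easy to see.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))    # shape (n_f, n_x)

def f(x):
    return A @ x                   # x has shape (6,), f(x) has shape (4,)

x = rng.standard_normal(6)
print(f(x).shape)                  # (4,)
# For this linear model the Jacobian df/dx is simply A, with shape (4, 6).
```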
Let's take a look at a few simple examples. Here we have two vectors, a and b, and they exist in three-dimensional space despite being drawn in 2D here. That means that a and b each have x, y, and z components. If we want to compute the angle between them, theta, we have our trusty formula from our Algebra 2, pre-calc, or geometry days. I don't know when you first encountered it, but the formula is theta = arccos((a · b) / (|a| |b|)). What this means is that we have six inputs, the x, y, and z components of a and b, and one output. Again, this is not technically a vector-valued function, but we still need to understand how to take derivatives of this one output with respect to six inputs, so that's why I bring it up here. It's also a very common and understandable case; we all have some intuition for geometry with vectors.
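If you want to play with this example, here is a minimal NumPy sketch of the angle formula, plus a simple forward-difference check of the gradient of the single output theta with respect to the six inputs. The step size and function names are illustrative assumptions, not something prescribed in the lecture.

```python
import numpy as np

def angle(a, b):
    """Angle between two 3-vectors: theta = arccos(a.b / (|a| |b|))."""
    return np.arccos(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def angle_gradient_fd(a, b, h=1e-7):
    """Forward-difference gradient of theta w.r.t. [ax, ay, az, bx, by, bz]."""
    x0 = np.concatenate([a, b])
    f0 = angle(x0[:3], x0[3:])
    grad = np.zeros(6)
    for i in range(6):
        x = x0.copy()
        x[i] += h
        grad[i] = (angle(x[:3], x[3:]) - f0) / h
    return grad

a = np.array([1.0, 0.0, 0.0])
b = np.array([1.0, 1.0, 0.0])
print(angle(a, b))             # ~0.7854 rad (45 degrees)
print(angle_gradient_fd(a, b)) # 1-by-6 row of partial derivatives
```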
Let's take a look at a maybe more exciting case. I have an 80-pound Lab-Pyrenees mix; her name's Honey, and she's fantastic. She would love it if I were constructing a tennis ball launcher like the one shown here. If we consider the physics of a tennis ball launcher in two dimensions, we can think of it as a model with multi-dimensional input and multi-dimensional output. Let's say that we have the ability to control the speed of the tennis ball at the exit, v, as well as the theta value shown here, the angle of launch. If we then consider our output to be the velocity of the tennis ball, we can break it down into x and y velocities: v cos(theta) for the x-velocity and v sin(theta) for the y-velocity. So in this case we have two inputs and two outputs; this is a vector-valued function. If we had this tennis ball launcher as a portion of a model and we wanted to do gradient-based optimization, we would have to know how to compute the derivatives of these two outputs with respect to the two inputs.
Now, here it's really not that complicated; we can do this by hand symbolically. I just wanted to show an exciting and simple case of launching some tennis balls in the backyard.
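As a quick illustration, here is a minimal NumPy sketch of that launcher model and its 2-by-2 Jacobian worked out by hand, as the lecture suggests. The function names and sample values are assumptions for demonstration only.

```python
import numpy as np

def launcher_velocities(v, theta):
    """Outputs: vx = v*cos(theta), vy = v*sin(theta)."""
    return np.array([v * np.cos(theta), v * np.sin(theta)])

def launcher_jacobian(v, theta):
    """2x2 Jacobian d(vx, vy)/d(v, theta), derived symbolically by hand."""
    return np.array([
        [np.cos(theta), -v * np.sin(theta)],   # d(vx)/dv, d(vx)/dtheta
        [np.sin(theta),  v * np.cos(theta)],   # d(vy)/dv, d(vy)/dtheta
    ])

print(launcher_velocities(20.0, np.radians(45.0)))
print(launcher_jacobian(20.0, np.radians(45.0)))
```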
So that's kind of a rudimentary understanding of what vector-valued functions are. Let's take a real quick look at some of the math behind them. I'm going to flash up a lot of notation here, but then I'm going to spell it out, so please do not worry. The derivatives of vector-valued functions are known as Jacobians. A Jacobian is an array of partial derivatives. In the case when f is a scalar and x is a scalar, ∂f/∂x is just a scalar, so the Jacobian is 1-by-1. However, in the most general sense, a Jacobian has the shape n_f by n_x, or the number of functions of interest by the number of inputs. You can also think of this as the number of outputs by the number of inputs. The Jacobian structure appears as shown here on the right. Now, like I said before, you could think about each one of these entries individually, just as a scalar derivative with respect to a scalar, but it often pays to look at the Jacobian as a whole. Usually the values in an array are related to each other, and you can compute a large chunk of the Jacobian at a time. Additionally, it would be kind of ridiculous in terms of Python coding if you treated each one of these entries as a scalar and hand-computed them; that would be a huge mess. So a lot of the tips and tricks that I will show here will be focused on treating these as arrays. This lets you use array math and linear algebra, and then you can really save some time during your gradient computations.
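Here is a small sketch of what "treating the Jacobian as an array" can look like in NumPy. The element-wise function f_i = x_i^2 is an assumed example, chosen because its Jacobian has an obvious pattern: it is diagonal, so one vectorized statement fills it in.

```python
import numpy as np

n = 5
x = np.linspace(1.0, 5.0, n)

# Element-wise function f_i = x_i**2: each output depends on only one input,
# so the n x n Jacobian is diagonal with entries 2 * x_i.
f = x ** 2
jac = np.diag(2.0 * x)               # whole Jacobian in one array operation

# Compare with the entry-by-entry (scalar) view: same values, much more code.
jac_scalar = np.zeros((n, n))
for i in range(n):
    jac_scalar[i, i] = 2.0 * x[i]

print(np.allclose(jac, jac_scalar))  # True
```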
Let me just connect some dots here. Let's go back to this visualization of the model, and say we have four inputs and two outputs. If we were to take the general form of the Jacobian and write it for this case, we would see that we have a 2-by-4 array. Because we have two functions of interest (outputs) and four inputs, we need to compute eight separate entries to fill in the entire Jacobian array. When performing gradient-based optimization in OpenMDAO, we need to compute the Jacobian. You can either compute it symbolically or analytically and tell OpenMDAO what the Jacobian is, or you can use an approximation like finite differencing or the complex-step method to get the Jacobian.
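To make that concrete, here is a minimal OpenMDAO sketch of the tennis ball launcher component, assuming the openmdao package (imported as om); the component and variable names are illustrative. It shows both options mentioned above: providing analytic partials in compute_partials, or asking OpenMDAO to approximate them via declare_partials with method='fd' or method='cs'.

```python
import numpy as np
import openmdao.api as om

class Launcher(om.ExplicitComponent):
    """Tennis ball launcher: 2 inputs (v, theta) -> 2 outputs (vx, vy)."""

    def setup(self):
        self.add_input('v', val=10.0)
        self.add_input('theta', val=0.5)
        self.add_output('vx')
        self.add_output('vy')

    def setup_partials(self):
        # Option 1: declare that we will provide analytic partials below.
        self.declare_partials(of=['vx', 'vy'], wrt=['v', 'theta'])
        # Option 2 (instead): let OpenMDAO approximate the Jacobian, e.g.
        # self.declare_partials('*', '*', method='fd')   # or method='cs'

    def compute(self, inputs, outputs):
        outputs['vx'] = inputs['v'] * np.cos(inputs['theta'])
        outputs['vy'] = inputs['v'] * np.sin(inputs['theta'])

    def compute_partials(self, inputs, partials):
        v, th = inputs['v'], inputs['theta']
        partials['vx', 'v'] = np.cos(th)
        partials['vx', 'theta'] = -v * np.sin(th)
        partials['vy', 'v'] = np.sin(th)
        partials['vy', 'theta'] = v * np.cos(th)

prob = om.Problem()
prob.model.add_subsystem('launcher', Launcher())
prob.setup()
prob.run_model()
# Compares the analytic partials against a finite-difference approximation.
prob.check_partials(compact_print=True)
```

Whichever route you take, check_partials is a handy sanity check that the Jacobian you hand to the optimizer actually matches the model.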
There are astoundingly many details to Jacobian computation; it turns out you can get a PhD in this topic. The work can focus on computing the Jacobian more quickly, getting a more exact Jacobian, or using sparse partial derivatives to really decrease the memory costs and solve much, much larger problems. We'll get a little bit into that later.
For now, just know that when I talk about getting the derivatives of vector-valued functions, we are trying to compute the Jacobian. This lesson was all about introducing the idea of vector-valued functions and how to get their derivatives. These derivatives come in the form of arrays known as Jacobians, which you can think of as the first-order derivative information for the outputs with respect to the inputs. This is just the first of many lessons focused on computing Jacobians, computing them efficiently, and using them in gradient-based design optimization. Using this lesson as a foundation, we'll continue on this topic and get into the nitty-gritty details in other lectures.
As always please hit those like and subscribe buttons, and thank you very much for watching. Guys, gals, and non-binary pals; take care!