How to debug solvers video transcript

Hey everybody, welcome back to a new episode of Practical MDO. Today we’re talking about how to debug solvers. I’ve got many requests for this lesson in particular because debugging solvers can be such a pain. So let’s talk about it.

The process of optimization notoriously pushes your systems to their limits. Even if your model converges well through a parameter sweep or at the initial point, optimizers often push models into weird points in the design space. This might create a very challenging system for the solver to tackle. Luckily I’ll present a series of debugging steps in this lecture that you can do for your model to hopefully resolve any sort of issues you have with solver convergence. This firmly falls within the modeling focus of our Practical MDO course.

So these first two bullet points are really focused on setting the stage, and then the third bullet point is the meat of this lecture. First we’ll talk about what types of solvers you’re using, and then we’ll set expectations. We’ll ask, “should you expect convergence for this setup?” I’ll have a few considerations there, and then lastly I have a ten-step checklist for solver debugging, again focused on OpenMDAO, but it should be applicable to any kind of framework or system that you’re evaluating.

So first let’s discuss what types of solvers you’re using within your model. This figure comes from chapter three in the Engineering Design Optimization book by Martins and Ning and we’ve previously examined this, specifically in the “types of solvers and when to use them” lecture. I want to revisit this here because there are different considerations for how to debug your solver behavior depending on which type of solver you’re using. For fixed point iteration methods like Jacobi or Gauss-Seidel you might be able to use a relaxation method such as the Aitken relaxation method to help you reach convergence. On the other hand if you’re using a Newton solver there are all sorts of tips and tricks to reach convergence. If you’re having problems with a run-of-the-mill Newton solver you can try adding a linesearch, either a bounds-enforcing or an Armijo-Goldstein linesearch. Or you can add solve_subsystems and set that to True to do one iteration of Gauss-Seidel first before attempting the Newton solve. There are many other specific settings and variations that are solver dependent. However, most of the tips that I have in this checklist are not dependent on the type of solver that you have.
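To make the linesearch idea concrete, here is a minimal sketch in plain Python rather than OpenMDAO. It assumes a scalar toy residual, atan(x), and a simple backtracking rule with an Armijo-type sufficient-decrease test; the function name `newton_solve` and every constant in it are illustrative assumptions, not OpenMDAO's linesearch implementation.

```python
import math

def newton_solve(residual, derivative, x0, use_linesearch, maxiter=50, atol=1e-10):
    """Scalar Newton iteration; returns (x, iterations, converged)."""
    x = x0
    for iteration in range(maxiter):
        r = residual(x)
        if abs(r) < atol:
            return x, iteration, True
        step = -r / derivative(x)  # full Newton step
        alpha = 1.0
        if use_linesearch:
            # Halve the step until the residual shows a sufficient decrease.
            while alpha > 1e-8 and abs(residual(x + alpha * step)) > (1.0 - 1e-4 * alpha) * abs(r):
                alpha *= 0.5
        x += alpha * step
        if abs(x) > 1e12:  # treat a runaway state as divergence
            return x, iteration + 1, False
    return x, maxiter, False

residual = math.atan
derivative = lambda x: 1.0 / (1.0 + x * x)

# Same starting point, with and without the linesearch.
x_plain, n_plain, ok_plain = newton_solve(residual, derivative, 3.0, use_linesearch=False)
x_ls, n_ls, ok_ls = newton_solve(residual, derivative, 3.0, use_linesearch=True)
print(f"plain Newton converged: {ok_plain}")
print(f"Newton with linesearch converged: {ok_ls}, x = {x_ls:.2e}")
```

From this starting point the full Newton step overshoots further on every iteration, while the shortened steps bring the state back into the region where Newton behaves well.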

So my next point is: “should you expect convergence?” This might sound like a tongue-in-cheek question but I actually mean it. It’s very easy to set up a model or an analysis in a way that a solver cannot handle. You might have all your states defaulted to zero and then you’re asking the solver to converge this and it’s nowhere near the converged point. This would be like asking me to do a backflip. I just can’t! It’s impossible! You gotta set it up for success. So we want to give the solver some reasonable initial guesses for the states. This is especially important for Newton-type methods but it’s also handy for block Gauss-Seidel methods.

It’s not just initial guesses that we’re concerned about, though. You might also be asking the solver to model a system that doesn’t make physical sense. An example would be if you’re doing aerostructural wing design and you accidentally put an extremely huge force on the wing. This would cause the wing to deflect a huge amount and all of a sudden the model is no longer valid. Your solver might diverge or explode, if you will. We often see this when the residuals increase drastically instead of decreasing with your solver setup. So it definitely pays to double check and triple check your model to make sure it’s behaving as expected. You can print out the inputs and outputs, you can look at the state values, and I’ll guide you through how to do that in part of the checklist.

All right so here we are. The cat’s meow. The coup de grâce. The checklist for solver debugging in OpenMDAO. So as I said this is going to be kind of focused on using OpenMDAO as the framework for debugging these solvers but you can apply this to any sort of setup that you have.

I’m going to flash up the 10 tips in order here. If you’re looking at these and you feel overwhelmed: don’t. We’re going to go through this together. We’re going to take our time. I’m going to show you some fantastic examples for each one of these. Heck, I don’t even want you to read all this together. But if you want to freeze this and kind of look at it, I welcome that. Additionally this is all spelt out in much more detail in the accompanying Python notebook and I welcome you to read through that as well for more information.

I’ll now break these down one by one. They’re roughly in order of what you should do first to what you should do last, based on the complexity or the developer cost of doing this. For example, it’s very easy to change some solver settings and rerun your model and see how it goes. But it might be more challenging to reorganize your model or to try a different solver hierarchy because this might take a lot of developer time.

So my first suggestion, the easiest one, is to just try using more solver iterations. For very simple systems, like very cheap ones, this is a no-brainer. If it takes two seconds instead of 0.2 seconds to run 200 iterations instead of 20, and then it actually converges, yeah, definitely do that. The default number of iterations within OpenMDAO is 10, so for some systems that’s fine. For other systems that definitely will not be enough. Now if you’re dealing with a much more expensive computation, like maybe you have a computational fluid dynamics solver or some kind of coupled system with FEA in the loop, just using more iterations probably isn’t the best solution for you.

To use more iterations in OpenMDAO it’s relatively straightforward. Simply change your solver options. For example here you can say maxiter equals three. Maybe that’s enough for your system, maybe it’s not. If it’s not, try increasing it. For example here we set it to 20. If I scroll down here, we see in the example, okay, one, two, three iterations that fail to converge. Then we see, okay, if we gave it 20 it converges in eight iterations. So again for very cheap systems feel free to try more iterations here. It may be the simplest way to get your solver to converge. Wouldn’t that be great?
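The effect of the iteration limit can be sketched without any framework. The two "disciplines" below are made-up equations, and `gauss_seidel` is a hypothetical helper, so the iteration counts will not match the ones shown in the lecture's notebook.

```python
import math

def gauss_seidel(maxiter, atol=1e-6):
    """Iterate a toy two-equation coupled system; returns (iterations, converged)."""
    y1, y2 = 1.0, 1.0
    for iteration in range(1, maxiter + 1):
        y1_new = math.cos(y2)          # hypothetical discipline 1
        y2_new = 0.5 * y1_new + 0.5    # hypothetical discipline 2, uses the fresh y1
        change = abs(y1_new - y1) + abs(y2_new - y2)
        y1, y2 = y1_new, y2_new
        if change < atol:
            return iteration, True
    return maxiter, False

for limit in (3, 20):
    iterations, converged = gauss_seidel(limit)
    print(f"iteration limit {limit}: converged={converged} after {iterations} iterations")
```

With the tight limit the loop stops before the change between iterations is small enough; raising the limit lets the same loop finish.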

So now if more iterations don’t help you, I highly recommend using solver debug printing in OpenMDAO. This will tell you the states that are coming into your solver when it fails. So here’s one example of that. If you take a look at the options here, we simply say debug_print equals True. Then we also have err_on_non_converge set to True. So that means if the solver doesn’t converge, OpenMDAO will throw an error and say “hey, I didn’t converge here,” so let’s take a look at what’s going on. The debug print means that, okay, once that error happens we will save all of the states that are coming into the solver. In this case the states are pretty innocuous, they’re not that bad. If we just gave more iterations to the solver it’d probably be able to converge this system. But in other cases these states might look terrible or there might be a NaN in them or you might be feeding in something from your model in a way that you don’t expect. So it’s a really good way of looking at the inputs and outputs that are going directly into the solver. To be clear, these states are the exact states that the solver is seeing at the beginning of its convergence cycle. In a more complex model or something that’s being optimized where you don’t have direct access to the states as they’re going in, this is very useful to be able to look at those states.
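The idea behind this kind of debug printing can be sketched in a few lines of plain Python. This is not how OpenMDAO implements it; `solve_with_debug`, `SolverNotConverged`, and the toy update rule are all hypothetical names invented for the illustration.

```python
import json

class SolverNotConverged(RuntimeError):
    """Raised when the iteration limit is reached (hypothetical helper)."""

def solve_with_debug(update, states, maxiter=10, atol=1e-8, err_on_non_converge=True):
    """Fixed-point loop that remembers the states it was handed at entry."""
    entry_states = dict(states)  # snapshot before the first iteration
    for _ in range(maxiter):
        new_states = update(states)
        change = max(abs(new_states[k] - states[k]) for k in states)
        states = new_states
        if change < atol:
            return states
    if err_on_non_converge:
        # Report the entry states so they can be inspected or replayed.
        raise SolverNotConverged("states at solver entry: " + json.dumps(entry_states))
    return states

def update(s):
    # Toy coupled pair; the zero starting temperature mimics a bad default.
    return {"temperature": 0.9 * s["temperature"] + 30.0, "flow": 0.5 * s["flow"] + 1.0}

message = None
try:
    solve_with_debug(update, {"temperature": 0.0, "flow": 0.0}, maxiter=10)
except SolverNotConverged as err:
    message = str(err)
    print(message)
```

Seeing a temperature of zero in the reported entry states is the kind of hint that points toward a poor starting value rather than a broken model.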

Okay, my next tip is to check your data connections within your model. You can do this a few different ways and I’ve talked about them in a few other lectures, but I highly suggest using the N2 diagram to take a look, or you can use the view_connections command-line tool to also take a look in a list, in a table format. I like the N2 because it’s very graphical. Here I can see, okay, we have a cycle here, we have this backwards coupling, this is going as I expected, this is going this way, maybe my solver setup is bad, maybe I see something that isn’t connected but it should be connected. You can imagine there are all sorts of different ways that your variables could be mismatched and you’re not certain about how that is until you actually see it right here. Again this is a pretty simple case here for the Sellar problem, but for more advanced cases it really pays to zoom in and zoom out on your model and make sure everything is hooked up as you expect.
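The two things worth looking for in a connection view, feedback connections and inputs that nothing feeds, can be sketched on a hand-written model description. The component names, variable names, and data layout below are assumptions for illustration only; they are not read from an OpenMDAO model.

```python
# Hypothetical model description: components in execution order, plus
# (source, target) connections written as "component.variable".
order = ["d1", "d2", "objective"]
inputs = ["d1.y2", "d2.y1", "objective.y1", "objective.y2"]
connections = [("d1.y1", "d2.y1"), ("d2.y2", "d1.y2"), ("d1.y1", "objective.y1")]

position = {name: index for index, name in enumerate(order)}

def component(path):
    """Return the component part of a 'component.variable' path."""
    return path.split(".")[0]

# Feedback connections run from a later component back to an earlier one;
# these are the couplings that need a solver.
feedback = [(s, t) for s, t in connections if position[component(s)] > position[component(t)]]

# Inputs that no connection feeds are candidates for a forgotten hookup.
connected_targets = {target for _, target in connections}
unconnected = [name for name in inputs if name not in connected_targets]

print("feedback connections:", feedback)
print("unconnected inputs:", unconnected)
```

In this made-up model the check flags one feedback coupling and one input that was never hooked up.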

My next tip is to improve the initial guess for your state values. It’s too easy to set up a model and say “okay solver, here, give it a shot, good luck.” But if all your default state values are 0 or 1 they may not be close to the actual converged answer that you’re trying to find. If you are able, think about the physical system that you’re trying to model and give some reasonable guesses. For example if I know that I’m doing an aerostructural wing design problem in steady state cruise flight, the wings will be under some loading and that loading will be close to the weight of the aircraft. So I might as well seed the initial state of the solver with that kind of loading on the wing. Across the board, anytime you’re looking at a physical system it really pays to give good initial guesses. This is much more important for Newton solvers but it also helps for block Gauss-Seidel solvers so they can reach convergence faster. In the event of a very expensive compute system, this also saves a lot of computational time. And especially in an optimization context all that computational time saved really adds up.

Now in this example I’m using the circuit example case that’s present in a few other lectures. And first we just try to solve it without setting any initial values. Then we try to solve it using some good initial guesses within the coupled system. I scroll down here and take a look at the residual output in the first case where you don’t provide good guesses. My gosh, these, these residuals are pretty high, it’s not converging well. I don’t really like the look of this. However in the case where I provide some good guesses, okay, these residuals are more reasonable, it does converge well in just 17 steps. I like the look of that. So again here for any solver, but especially Newton solvers, it’s very important to give very good initial guesses.
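A single-node stand-in shows the same sensitivity to the starting point. This is only a sketch in the spirit of the circuit example: the resistor, diode, and source values are illustrative assumptions and are not taken from the lecture's notebook.

```python
import math

# Illustrative values: resistor [ohm], diode saturation current [A],
# thermal voltage [V], and source current [A].
R, I_S, V_T, I_IN = 100.0, 1e-15, 0.025875, 0.1

def residual(v):
    # Current balance at the node: resistor current + diode current - source.
    return v / R + I_S * (math.exp(v / V_T) - 1.0) - I_IN

def derivative(v):
    return 1.0 / R + I_S / V_T * math.exp(v / V_T)

def newton(v0, maxiter=50, atol=1e-10):
    """Plain Newton on the node voltage; returns (v, iterations, converged)."""
    v = v0
    for iteration in range(maxiter):
        r = residual(v)
        if abs(r) < atol:
            return v, iteration, True
        v -= r / derivative(v)
    return v, maxiter, False

v_bad, n_bad, ok_bad = newton(0.0)    # default-style guess
v_good, n_good, ok_good = newton(0.8)  # guess near a typical diode voltage
print(f"guess 0.0 V: converged={ok_bad}")
print(f"guess 0.8 V: converged={ok_good}, v={v_good:.4f} V")
```

Starting from zero, the first Newton step lands far up the diode's exponential curve and the solver then creeps back in tiny steps until it runs out of iterations, while the physically motivated guess converges within the limit.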

This is a tip that I added based on user feedback: try checking the bounds on the state values in your model. If you do have bounds on your states, your solver needs to respect those. If you’re seeing your solver plateau or not make any progress, it may be hitting the bounds on those state values. Take a look at those bounds and see if you can open up the space for your solver in some way.
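A plateau caused by a bound is easy to reproduce on a toy problem. The sketch below assumes a scalar residual x**2 - 4 and a simple clip of each update to the bounds; `bounded_newton` is a hypothetical helper, not an OpenMDAO linesearch.

```python
def bounded_newton(x0, lower, upper, maxiter=20, atol=1e-10):
    """Newton on r(x) = x**2 - 4 with each update clipped to [lower, upper]."""
    x = x0
    history = []
    for _ in range(maxiter):
        r = x * x - 4.0
        history.append(abs(r))
        if abs(r) < atol:
            return x, history, True
        x = x - r / (2.0 * x)
        x = min(max(x, lower), upper)  # enforce the state bounds
    return x, history, False

# The root is at x = 2, which the tight upper bound excludes.
x_tight, hist_tight, ok_tight = bounded_newton(1.0, lower=0.1, upper=1.5)
x_wide, hist_wide, ok_wide = bounded_newton(1.0, lower=0.1, upper=5.0)
print(f"upper bound 1.5: converged={ok_tight}, last residuals={hist_tight[-3:]}")
print(f"upper bound 5.0: converged={ok_wide}, x={x_wide:.6f}")
```

A residual history that repeats the same value while the state sits exactly on a bound is the signature to look for.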

Now my next tip is a special tip only for fixed point iteration methods, and it’s to try adding Aitken relaxation. This is just one easy fix: in your solver setup you can say use_aitken equals True, and then your exact same setup will begin to use this Aitken-based relaxation method. This is especially helpful for tightly coupled models. It may or may not help convergence for your system, but it’s so easy to just set this to True and try converging it. For the simple Sellar problem it does shave off a few iterations, so why not save that computational time. In some other cases it may make a diverging system actually converge, so that can be extremely helpful for robustness.
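For intuition, here is a minimal scalar sketch of Aitken relaxation written in plain Python. It assumes the common form where the relaxation factor is updated from two successive residuals; the clamping limits and the two toy maps are illustrative choices, and the code is not taken from OpenMDAO.

```python
import math

def fixed_point(g, x0, use_aitken, maxiter=100, atol=1e-10):
    """Scalar fixed-point iteration x = g(x); returns (x, iterations, converged)."""
    x, omega, r_prev = x0, 1.0, None
    for iteration in range(maxiter):
        r = g(x) - x  # fixed-point residual
        if abs(r) < atol:
            return x, iteration, True
        if use_aitken and r_prev is not None and r != r_prev:
            # Aitken update of the relaxation factor from two successive residuals.
            omega = -omega * r_prev / (r - r_prev)
            omega = min(max(omega, 0.1), 1.5)  # illustrative safeguards
        x, r_prev = x + omega * r, r
    return x, maxiter, False

# Case 1: a map that already converges; Aitken needs fewer iterations.
x_plain, n_plain, ok_plain = fixed_point(math.cos, 1.0, use_aitken=False)
x_aitken, n_aitken, ok_aitken = fixed_point(math.cos, 1.0, use_aitken=True)
print(f"cos map: plain {n_plain} iterations, Aitken {n_aitken} iterations")

# Case 2: a map whose plain iteration diverges (slope magnitude above 1).
steep = lambda x: 5.0 - 1.5 * x
_, _, ok_steep_plain = fixed_point(steep, 0.0, use_aitken=False)
x_steep, _, ok_steep_aitken = fixed_point(steep, 0.0, use_aitken=True)
print(f"steep map: plain converged={ok_steep_plain}, Aitken converged={ok_steep_aitken}")
```

With `use_aitken=False` the factor stays at 1 and the loop is ordinary fixed-point iteration, so the two runs differ only in the relaxation.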

Now my next tip is all about Newton methods. It’s to consult this fantastic doc page which goes into much more detail on how to debug your Newton solver. Here it goes through the two broad reasons why Newton solvers might fail. The first is that the linear solver isn’t able to solve for an update step from the derivatives. Again, Newton’s method needs accurate derivatives to converge the system. And then the next, and this might be kind of obvious, is that the nonlinear solver is not able to find a solution. This doc page then goes into steps about what to do and is very good about linking to other parts of the OpenMDAO documentation to spell out the problems that you’re seeing. I highly recommend, if you’re ever using a Newton solver of any sort, reading through this top to bottom. It’s not that long, it’s actually pretty short, and it links to a lot of relevant resources that are very helpful. It gives you the mindset of using a Newton solver.
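Since Newton's method depends on accurate derivatives, one quick sanity check is to compare an analytic derivative against a finite-difference estimate. The sketch below is a framework-free illustration with a deliberately wrong derivative; `check_derivative` and the test function are hypothetical and stand in for whatever derivative-checking tool your framework provides.

```python
import math

def check_derivative(func, deriv, x, step=1e-6):
    """Return (analytic, finite_difference, relative_error) at x."""
    analytic = deriv(x)
    fd = (func(x + step) - func(x - step)) / (2.0 * step)  # central difference
    rel_err = abs(analytic - fd) / max(abs(fd), 1e-30)
    return analytic, fd, rel_err

func = lambda x: x * math.sin(x)
good_deriv = lambda x: math.sin(x) + x * math.cos(x)
bad_deriv = lambda x: math.sin(x) + math.cos(x)  # deliberate bug: dropped the factor x

_, _, err_good = check_derivative(func, good_deriv, 2.0)
_, _, err_bad = check_derivative(func, bad_deriv, 2.0)
print(f"correct derivative, relative error: {err_good:.1e}")
print(f"buggy derivative, relative error:   {err_bad:.1e}")
```

A relative error near the finite-difference noise level suggests the derivative is fine; an error of order one or larger points to a bug worth fixing before touching solver settings.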

Okay, we’re back with my next tip and it’s to try reorganizing your model to minimize the number of subsystems within the solver loop. So these begin to be harder to implement. For some of the earlier tips I said “okay, you just flip this flag to true or you can change the number of iterations.” These begin to discuss model hierarchy, model setup, and solver setup as well. So what I mean by this is if you have a solver at just the top level and your coupling is actually only a few components or groups somewhere within your entire model, you could change the way your model is set up to only have that solver at that lower level, where it’s necessary. This could save some computational cost and make it much easier to debug your system. This is kind of an extreme case, but there was one time I was debugging a wind turbine optimization and we had a solver at the top level, when in reality the coupling was only for a very small part of the system. We were able to move that solver to the smaller part of the system and much more quickly debug solving the system.

Here in this example I show an outputted N2 diagram from the pyCycle engine modeling code. If I scroll down here we have three top level groups; the design group, the off design full power, and the off design part power group. Within each one of these groups there is a coupled system. Now I’ve highlighted the final one here. However, within each one of these groups there’s only coupling within that lone group. If we put a solver at the top level here, it would have unnecessary overhead because it’s trying to converge everything at the same time. Instead we can put a solver just on this group, just on this group, and just on this group as well. And that’s exactly what we’ve done here; we see the Newton solver on the right, one, two, three different Newton solvers.

If you haven’t yet moved your solvers exactly to where they need to be within your system, I highly recommend you do this. One, it just makes it that much faster computationally, and two, it may lead to better robustness within your software setup. If we asked one solver to converge all three of these at the same time as opposed to one at a time, it might not be as good. And that kind of leads us naturally into my next tip.
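The overhead of a solver that wraps more than the coupled part can be sketched by counting function calls. Everything here is a made-up toy: `aero` and `struct` form the cycle, `cost` is a downstream component with no feedback, and the `solver_at_top` flag switches between iterating over all three or over the coupled pair only.

```python
calls = {"aero": 0, "struct": 0, "cost": 0}

def aero(deflection):          # hypothetical coupled component
    calls["aero"] += 1
    return 10.0 + 0.3 * deflection

def struct(load):              # hypothetical coupled component
    calls["struct"] += 1
    return 0.5 * load

def cost(load, deflection):    # downstream component, no feedback into the cycle
    calls["cost"] += 1
    return load + 2.0 * deflection

def run(solver_at_top, maxiter=100, atol=1e-10):
    """Converge the aero/struct cycle; returns (total, call counts)."""
    for name in calls:
        calls[name] = 0
    deflection = 0.0
    for _ in range(maxiter):
        load = aero(deflection)
        new_deflection = struct(load)
        if solver_at_top:
            total = cost(load, new_deflection)  # re-run on every solver iteration
        converged = abs(new_deflection - deflection) < atol
        deflection = new_deflection
        if converged:
            break
    if not solver_at_top:
        total = cost(load, deflection)  # run once, after the cycle has converged
    return total, dict(calls)

total_top, calls_top = run(solver_at_top=True)
total_local, calls_local = run(solver_at_top=False)
print("solver at top level:", calls_top)
print("solver on the coupled pair only:", calls_local)
```

Both arrangements give the same answer here, but the downstream component runs once instead of once per iteration, which matters when that component is the expensive one.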

My next tip is to try using a nested solver hierarchy. So what this means is if you just have one top level solver and it’s not doing its job, try to resolve subsystems by converging the states at lower levels before passing them up to the top level solver. I’m going to scroll back up here because the pyCycle N2 model actually does a great job showing this. Because as I mentioned we have a Newton solver for this group right here, but then we also have a sub-Newton solver right here. So this Newton solver converges some of the flow properties within the engine. And this passes converged states up to the next level, and then the next level, where there’s another Newton solver.

So again maybe you wouldn’t see great performance if we didn’t converge these flow properties; maybe this would crash or NaN out, but because we have a nested solver hierarchy it gives more robust convergence. Now again this might take a little bit of time to architect or to understand the best way to do this for your model, so that’s why it’s so low down on this list. However, it becomes very powerful, not just because of computational cost savings, but because of how robust your model can be. It can begin to converge in ways that it wouldn’t be able to without this nested solver hierarchy.
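A nested hierarchy can be sketched as an outer loop that calls a fully converged inner solve on every pass. The equations below are invented for illustration (they are not pyCycle's), and `inner_newton` and `outer_solve` are hypothetical names.

```python
def inner_newton(x, y0, maxiter=20, atol=1e-12):
    """Converge the inner state y from y**3 + y - x = 0 for a fixed x."""
    y = y0
    for _ in range(maxiter):
        r = y**3 + y - x
        if abs(r) < atol:
            break
        y -= r / (3.0 * y**2 + 1.0)
    return y

def outer_solve(maxiter=50, atol=1e-10):
    """Outer fixed-point loop on x; each pass first converges y at the lower level."""
    x, y = 1.0, 0.0
    for iteration in range(1, maxiter + 1):
        y = inner_newton(x, y)   # warm-start from the previous inner solution
        x_new = 1.0 + 0.5 * y    # hypothetical outer coupling
        if abs(x_new - x) < atol:
            return x_new, y, iteration, True
        x = x_new
    return x, y, maxiter, False

x, y, outer_iterations, converged = outer_solve()
print(f"converged={converged} in {outer_iterations} outer iterations: x={x:.6f}, y={y:.6f}")
```

The outer loop only ever sees a consistent inner state, which is the property that tends to make nested setups more robust, at the price of extra inner iterations.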

Now my last tip here is one of those really kind of ticky tack ones. It’s to try removing some states from the solver loop. And what I mean by this is to simplify the coupling of the system that you’re trying to solve. Take for example an aerostructural wing. If you’re trying to make sure that everything is hooked up correctly, that it can converge, try making your wing infinitely stiff. Make your elastic moduli for the wing very, very large. Then when forces hit the wing the mesh displacement for the structural side of things would not be big. If your model still crashes or still fails after that you know something big is wrong with your model. This is because in the event of an infinitely stiff wing, there should be no displacements, that should be very easy for the solver to say “okay, we have these forces, it’s not changing the displacement of the wing at all; I’m fully converged, thank you.”
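The stiff-wing check can be illustrated with a toy fixed-point loop. The load and deflection formulas, the stiffness values, and the name `aerostructural_toy` are all assumptions made up for this sketch; they only mimic the structure of an aerostructural coupling.

```python
def aerostructural_toy(stiffness, maxiter=50, atol=1e-10):
    """Toy coupling: load depends on deflection, deflection depends on load."""
    deflection = 0.0
    for iteration in range(1, maxiter + 1):
        load = 1000.0 * (1.0 + 2.0 * deflection)  # hypothetical aero model
        new_deflection = load / stiffness          # hypothetical structural model
        if abs(new_deflection - deflection) < atol:
            return new_deflection, iteration, True
        deflection = new_deflection
    return deflection, maxiter, False

d_soft, n_soft, ok_soft = aerostructural_toy(stiffness=1500.0)
d_stiff, n_stiff, ok_stiff = aerostructural_toy(stiffness=1e15)
print(f"soft wing:  converged={ok_soft}")
print(f"stiff wing: converged={ok_stiff} after {n_stiff} iteration(s), deflection={d_stiff:.1e}")
```

If the nearly rigid case converges right away while the flexible one does not, the trouble is in the strength of the coupling rather than in how the components are wired together.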

A more complex case, which is pretty interesting in my opinion, is in the case of floating offshore wind. In the past I’ve been trying to design a floating wind system. You can imagine a wind turbine that’s floating and is experiencing wave and wind effects; there’s a lot going on here. The rotors are whipping around, and they have some amount of aeroelasticity as well, which affects the multi-body dynamics going on. If you’ve already had a model and you know it’s good for land-based wind and you’re trying to make sure that it works for offshore wind, maybe it makes sense to first turn off the states. Freeze them in place. And what I mean by freezing the states is to remove the consideration of them from the coupling. So in the case of this floating offshore wind turbine, the idea would be to remove the wave effects. This would remove the swaying, surging, or pitching motion influenced by the waves and allow you to debug the rest of your model. Maybe then you learn that, okay, my rotor model is actually causing the issue because I removed the waves; it’s not the ocean model, it’s something else within this system. This last step here is really more applicable to very complicated systems, when you have a lot going on and you have multiple different types of states. When you have different physical meaning behind the states in your model, it really pays off to turn them off and on and make sure that everything is working. In the past this has allowed me to very precisely isolate the problem subsystems within a more complex system.

So these tips are not the end-all be-all, but it’s a great way to start debugging your model. Additionally every problem is different so maybe some of these will work and some of these will not work for your model.

All right, so we went through the checklist and we talked about it. Let’s acknowledge that solver convergence is sometimes tricky business. It’d be great if it just worked out and every solver converged exactly how you want it to on the first try. I can’t remember a time that’s happened for me. So I suggest following this checklist, put some order to the madness, go through this step by step, use this kind of detailed process, and keep notes as you’re doing it. I can’t stress enough that keeping a log of what you tried and what hasn’t worked and what has worked is really powerful. Additionally, six months later when you run into the same issue you can take a look at what you did in the past to help resolve it.

So let me know in the comments if there are other methods that have worked well for you to resolve some of this tricky solver convergence. As always make sure to mash those like and subscribe buttons and thank you very much for watching.