Just a quick update on our progress with Vulkan and Metal. We last spoke about this on the live feed a few weeks ago, but we’re a little further along. Here’s the summary:
- Plane-Maker and Airfoil Maker run in Vulkan and Metal.
- X-Plane runs in Vulkan and Metal up to the main menu (i.e. the app starts) but can’t yet fly or show scenery.
- The Vulkan and Metal code runs on Mac, Windows and Linux.
- The Vulkan code runs on Nvidia, AMD and Intel drivers.
- All shaders are ported.
- All of the zoo animals (abstractions around part of the graphics engine) are now complete. We killed off the last 2 or 3 since the live feed.
For the last two weeks, Sidney and I have been working to port all rendering passes to the new code, so we don’t have to go through OpenGL to fly the aircraft. I lost a few days dealing with this:
That turned out to be a single if statement that got reversed deep in the mesh drawing code during one of the porting steps, resulting in some of the runway lights being replaced with random pieces of … heck, I never figured out what the wrong mesh was, just that it came from some other unrelated part of the sim and changed as the camera moved. To make matters worse, the error only appeared on the Mac, so we couldn’t use Windows OpenGL debugging tools to find the issue.
The silver lining is that once we are all Vulkan/Metal, we’ll have at least four separate debugging tools to go after bugs like this.
We don’t have measurements of performance yet; once we can sit in an aircraft in X-Plane running Metal or Vulkan, we can at least get some initial performance numbers.
I’m headed to the annual Game Developers’ Conference in San Francisco next week.
If any folks from the flight sim community are going to be there as well, I’d love to meet up and talk shop—hit me up on Twitter (@TylerAYoung), or send me an email (my email is my first name at X-Plane.com).
Posted in News by Tyler Young
Thanks to Michael Minnhaar’s unceasing work (and willingness to nag us 😉 ) official builds of WED 2.0r1 are now available. This calls for a celebratory gif!
Please note that this is a major update and the minimum system requirements have now changed to: 64 bit versions of Windows 7/10, OSX 10.9, or Linux with GLIBC 2.23.
We’ve posted X-Plane 11.32 release candidate two – it contains very few changes from release candidate one. If you think RC2 changed your framerate from RC1 (for better or worse), it is imaginary.
The big item for RC2 is that X-Plane now uses our own replicating METAR servers. NOAA weather was plagued by 404s when the METARs the server posted didn’t match the date/time scheme X-Plane expected. Our own server should serve the latest weather we have, whatever that is.
Over the last few weeks we have spent a tremendous amount of developer time investigating reports of instability, crashes and performance problems, and the results have been quite unsatisfying. We really haven’t found a series of smoking guns we could fix to improve stability. We have learned some things about X-Plane’s performance and stability though. The rest of this post gets into the weeds; if you tune out (and I won’t fault you if you do) the TL;DR is: please turn on our anonymous analytics, and click “send” if you get the crash report form. The more gathered data we get about crashes, the better shot we have at addressing the issues.
Crash Rates and Plugins
X-Plane’s overall crash rate (all causes) is approximately 14% – which is to say, for all of our users using analytics, for every 100 times they launch X-Plane, the sim quits in a way we did not expect or want 14 times. This number has been remarkably stable – it’s not a ton different between 11.26, 11.30, 11.31 or the 11.32 beta.
The sim quitting on purpose because of bad content is not considered a crash. For example, if you load a DSF with a missing .ter file, the sim will refuse to proceed and quit. This failure mode is user hostile in that you can’t fly, but it’s not a crash – the refusal to load is the code working according to design. While I would like to make this code less hostile to users, it’s also worth noting that these cases are ones where the author of the scenery pack would have been able to fix this if they had loaded their own work even just once. That is, they are caused by an add-on that should not have shipped.
(There is an in-between area where an add-on is mis-installed because it has a library dependency that the user hasn’t met. This is a deployment problem that really needs to be solved, but it’s orthogonal to true app crashes.)
We categorize crashes into plugin and non-plugin crashes; starting with 11.32 we actually get a statistical picture of this. A crash is a plugin crash (and you see the “we crashed because of a plugin: XSquawkBox” or whatever) if the sim crashed while executing code on behalf of a plugin or inside the plugin on the main thread.
There are a bunch of cases where plugins do not get correctly tagged – in particular, we regularly see crashes on random worker threads spawned by plugins; since we don’t know whose thread it is, we can’t blame the plugin. For example, the FF A320 crashing inside CEF on a worker thread is registered as an X-Plane crash (and we see it in our auto-reporting view) but it’s not our code and there’s nothing we can do about it.
One thing I’d like to do in future patches is improve diagnostics. The rate of actual blamed plugin crashes appears so far to be quite low, and given that we do see uncaught plugin crashes in our data on a regular basis, I think this is a case where add-on authors can only fix what they can see. If we can attribute all plugin crashes to plugins, then the plugin authors can catch their own bugs. Better diagnostics also helps a user remove a troublesome add-on in the case where that would help.
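To make the attribution problem concrete, here is a minimal sketch of how "whose code is running" could be tracked per thread. This is an illustration only, not X-Plane's actual crash reporter: `PluginScope`, `g_current_plugin` and `blame_for_crash_on_this_thread` are hypothetical names. The host sets a thread-local tag while it calls into a plugin; a thread the plugin spawned itself never gets the tag, so a crash there cannot be blamed on anyone.

```cpp
#include <string>
#include <thread>

// Hypothetical: the plugin whose code is currently executing on this thread,
// or nullptr if the host does not know. thread_local gives each thread its own copy.
thread_local const char* g_current_plugin = nullptr;

// RAII guard the host would wrap around every call into a plugin.
class PluginScope {
public:
    explicit PluginScope(const char* name) : prev_(g_current_plugin) {
        g_current_plugin = name;
    }
    ~PluginScope() { g_current_plugin = prev_; }
    PluginScope(const PluginScope&) = delete;
    PluginScope& operator=(const PluginScope&) = delete;

private:
    const char* prev_;  // restored on exit so nested calls unwind correctly
};

// What a crash handler running on the current thread could report.
std::string blame_for_crash_on_this_thread() {
    return g_current_plugin ? std::string("plugin: ") + g_current_plugin
                            : std::string("unattributed");
}

// Simulates a worker thread spawned by a plugin: the tag set on the main
// thread is not visible there, so the crash would be unattributed.
std::string blame_on_worker_thread() {
    std::string result;
    std::thread worker([&result] { result = blame_for_crash_on_this_thread(); });
    worker.join();
    return result;
}
```

Inside a `PluginScope` the main thread reports the plugin by name, while `blame_on_worker_thread()` still reports "unattributed", which is the gap described above.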
“Known” Crashes
We have a few cases where X-Plane hits an error condition and deals with it by crashing. This is pretty bad, drives up our crash rate, and is something we need to fix to be less user hostile. For example, if a PNG file is bad (either corrupt contents or the sector on disk that backs it has gone bad) then X-Plane’s response is usually to mysteriously crash. Besides being rude, the crash gives an end user no idea which file is bad, and thus no way to fix it. X-Plane ships with something like 9000+ PNG files and over 2500 DDS files, not counting add-ons, so if we don’t tell you which file is bad, you’re not going to find it by poking around.
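As a sketch of the friendlier behavior, a loader can check a file before decoding it and name the offending file in the error. The function below is a hypothetical helper, not X-Plane code, and it only verifies the fixed 8-byte PNG signature; real corruption deeper in the file would need a full decode to detect.

```cpp
#include <array>
#include <cstddef>
#include <fstream>
#include <string>

// Returns an empty string if the file starts with the PNG signature,
// otherwise an error message that names the file so a user can find it.
std::string check_png_header(const std::string& path) {
    // The PNG signature: 137 80 78 71 13 10 26 10.
    static const std::array<unsigned char, 8> kSignature = {
        0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return "Cannot open texture file: " + path;
    }

    std::array<char, 8> header{};
    in.read(header.data(), static_cast<std::streamsize>(header.size()));
    if (in.gcount() != static_cast<std::streamsize>(header.size())) {
        return "Texture file is truncated: " + path;
    }

    for (std::size_t i = 0; i < header.size(); ++i) {
        if (static_cast<unsigned char>(header[i]) != kSignature[i]) {
            return "Texture file is not a valid PNG: " + path;
        }
    }
    return "";
}
```

The point of the design is that the path travels with the error, so the message can go to the log or an alert instead of ending in an anonymous crash.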
FMOD sound bank incompatibilities are another example – if you have two aircraft sharing a byte-wise copy of the same FMOD data (e.g. command-D duplicate the C172) loaded at once, our FMOD loading code fails and then registers as a crash. The code is working exactly as we designed it, but the design isn’t robust enough. As we’ve learned, in the real world users duplicate aircraft (and their FMOD packs) on disk all the time.
The crashes in this category are things where we’re being user-unfriendly and better code would make these problems go away or leave users with a way to actually fix them. But they’re not a case of “weird stuff inside the sim blew up.”
Persistent Stability Problems
In the crash data we do also see a few persistent stability problems. There’s some kind of crash in the ATC system that we’ve seen for a very long time but we don’t know how to reproduce. It’s a case where if enough users do enough random stuff, we hit an edge case in the ATC system that isn’t handled correctly. The solution here is to embed more diagnostics at the crash site until we can understand it from the reports we get from users. Please hit “send” when you crash – don’t worry about filling in the fields – it’s the report itself that we need.
We also see a lot of crashes inside the OpenGL drivers, from all of NVidia, AMD and Intel. Because the IHVs don’t share symbols and source code with us, we really can’t tell what went wrong in these cases.
My hope is that with Vulkan we’ll have better options for in-driver crashes. With Vulkan, they redesigned the error checking model: error checking is a feature you enable (via a configuration option at app startup) that brings a layer of code in on top of the driver to check what the app is doing. With error checking off we get the fastest framerate, and with error checking on we get slow framerate (more error checking means more slow) but some really great diagnostics.
(To put this in perspective: when Sidney ran the Vulkan version of Airfoil-Maker on Linux with the wrong driver installed and no error checking, it rebooted his entire Window server! So no error checking really means: no error checking.)
Since error checking is optional and selectable when the app runs, we could put an option into X-Plane to run in “safe mode” – if a user is hitting persistent stability problems in the driver, that user could turn on validation and possibly capture an error in X-Plane itself that would otherwise just be “the driver crashed.”
Running Out of Memory
I’ve worked with a few users to try to track down the out of memory problems we’ve heard about, and there isn’t an obvious pattern here. Some users report running out of memory in 11.30, but when put back on 11.26, find that they still run out of memory. We get more out-of-GPU-memory complaints on 11.30, but that might be because a few cases in 11.26 that were crashes due to running out of memory now report the problem in an orderly way. In 11.26 they were just mysterious crashes with a “send” form.
In the cases we’ve seen, the user running out of memory was often…actually running out of memory – that is, the surprising thing is not the crash but that X-Plane ever worked at all on those settings. The fundamental problem we face is that we have no visibility into what the OpenGL driver is doing with GPU memory. The OpenGL driver tries to manage memory no matter how much we ask for, and if it fails, we don’t know what went wrong.
The good news is: we have much better options for Vulkan. With Vulkan, we manage memory, which means we know what’s going on, and we can take steps to avoid out of memory crashes. If we do run out of memory, it should be for much more obvious reasons. We’re still analyzing what we can do about memory with Metal, but the choices should still be better than OpenGL.
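To illustrate what "we manage memory" buys, here is a minimal sketch of an app-side budget tracker. It is not X-Plane's allocator; `GpuMemoryBudget` and its methods are hypothetical. Because the app makes every allocation itself, it can refuse a request in an orderly way (and, say, drop texture resolution) before the driver ever fails.

```cpp
#include <cstdint>

// Tracks how much GPU memory the app has handed out against a fixed budget.
class GpuMemoryBudget {
public:
    explicit GpuMemoryBudget(std::uint64_t budget_bytes) : budget_(budget_bytes) {}

    // Reserves bytes if they fit; returns false instead of over-committing,
    // so the caller can react (lower settings, evict, report) rather than crash.
    bool try_reserve(std::uint64_t bytes) {
        if (bytes > budget_ - used_) {  // used_ never exceeds budget_, so no underflow
            return false;
        }
        used_ += bytes;
        return true;
    }

    // Returns bytes to the budget; clamps at zero to stay consistent.
    void release(std::uint64_t bytes) {
        used_ = (bytes > used_) ? 0 : used_ - bytes;
    }

    std::uint64_t used() const { return used_; }
    std::uint64_t available() const { return budget_ - used_; }

private:
    std::uint64_t budget_;
    std::uint64_t used_ = 0;
};
```

With a 1024-byte budget, reserving 512 succeeds, a further 600 is refused, and `available()` still reports 512, which is the kind of visibility OpenGL does not give us.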
The only advice I can offer now if you are seeing persistent memory crashes is: turn your settings down or use fewer add-ons. If you push X-Plane to the limits of your hardware, it may work for a while and then fail.
Performance
Sidney has looked at a lot of performance data from users who reported low framerate, and in almost every case, the performance has been as-expected. The most common case we see is users with relatively low single-core CPU performance hitting low framerate at high rendering settings while their GPU is bored. To put some numbers on this, if your CPU’s single-core Geekbench score is down around 2000, you are almost certainly CPU bound, way at the bottom of what’s okay for X-Plane, and a new GPU won’t help.
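A rough way to see why a new GPU won't help a CPU-bound machine: when CPU and GPU work overlap, the frame time is approximately the larger of the two. The sketch below uses that simplified model; the struct, the function names and the example timings are all hypothetical, not measurements from X-Plane.

```cpp
#include <algorithm>

enum class Bottleneck { Cpu, Gpu };

// Per-frame cost on each side, in milliseconds.
struct FrameTiming {
    double cpu_ms;
    double gpu_ms;
};

// Whichever side takes longer per frame is the one limiting framerate.
Bottleneck find_bottleneck(const FrameTiming& t) {
    return t.cpu_ms >= t.gpu_ms ? Bottleneck::Cpu : Bottleneck::Gpu;
}

// Simplified model: CPU and GPU run in parallel, so the slower side sets the pace.
double estimated_fps(const FrameTiming& t) {
    return 1000.0 / std::max(t.cpu_ms, t.gpu_ms);
}

// Estimated framerate after swapping in a GPU that is gpu_speedup times faster.
double estimated_fps_with_faster_gpu(const FrameTiming& t, double gpu_speedup) {
    const FrameTiming upgraded{t.cpu_ms, t.gpu_ms / gpu_speedup};
    return estimated_fps(upgraded);
}
```

For a frame that costs 40 ms of CPU and 12 ms of GPU, the model gives 25 fps, and doubling GPU speed still gives 25 fps because the CPU side has not changed.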
As of now, X-Plane cannot use large numbers of cores (e.g. a 32-core machine is useless) and gets only limited performance boosts for framerate with multiple cores under some circumstances. This is something we are working to change in the future, but it’s not going to change quickly. If you are looking to improve performance with hardware, single core speed is still the most important metric.*
In a few rare cases we saw performance that was disproportionately bad compared to what we’d expect from an old CPU. We’re trying to gather more data from these users but the case is rare enough that we haven’t gotten a useful report yet. If we find a smoking gun, we can act on it.
Like memory, Vulkan will help with diagnosing performance. With Vulkan, more of the code is written by us and the Vulkan code we run has very predictable timing. So when we get complaints about performance, we’ll be in a much better position to understand what is slow and why.
What’s Next
I’m actually not sure what the next patch will be, but we do have a bunch of bug fixes to 11.30 waiting to go out once we have stabilization under control. I also have a pile of bugs that I have not yet fixed that are high on my priority list where something in 11.26 stopped working in 11.30. So if you have filed a bug that’s not fixed, we have not forgotten about it – it’s either next to come out or possibly on the short list.
* Yes, we realize that this dependence on single core speed is bad. It’s just going to take time to move to Vulkan and then offload the single thread.
We have received reports in the past about X-Plane 11’s seaplane dynamics when the aircraft is in the water.
If you would like to test seaplane water dynamics with an experimental build, please email Austin directly – he has some fixes for seaplane behavior and he is looking for early testing.
Update: fearless leader^H^H^H^H^H^H^H^H^H^HAustin says he is only looking for testers who have experience flying a real world small float plane with pontoons, so that he can get test feedback from someone who can validate the reality of the physics.
(You will need to install a custom build from him on top of X-Plane 11.31 to do this.)
Update 2: fearless leader^H^H^H^H^H^H^H^H^H^HAustin says he has enough testers. If you’re not getting builds now, you’ll have to wait for a public beta.
This blog post is more or less a “stern talking to” for plugin developers, but before I go there, I want to acknowledge that we (Laminar) screwed up the docs here in a way I didn’t even realize until working on a bug report. A decade ago when the plugin SDK was young, we did have clear docs that the plugin APIs were not at all thread-safe. Sandy and I were also on the plugin dev email list and we’d administer a Scottish-style beating to anyone who even started typing the first few letters of “threading”.
However, this document was lost in the migration from the old XSquawkBox server to developer.x-plane.com. I’m working with Jennifer on new docs now, and I apologize for the thrash any plugin developer gets hit with from not knowing the threading guidelines.
With that in mind: the X-Plane plugin API is not thread-safe. You can only call plugin APIs on the thread that called you. No exceptions!
The plugin SDK was invented for X-Plane 6. At the time, multi-CPU and multi-core hardware was totally unavailable to the flight sim community. Apple hadn’t released the dual-G5 yet, and the Pentium D hadn’t come out yet. There was no point in thinking about multi-core because there weren’t multiple cores.
The plugin guidelines were therefore set up very simply: the API is single threaded; call us back on the thread we call you. The SDK internally has no locks or handling of re-entrancy and has no model to cope with resource sharing or data integrity across threads.
Furthermore, X-Plane’s crash detection system is not meant to categorize other-thread crashes, so you can’t easily tell that your plugin is crashing the sim. You might even crash a different plugin, or some internal part of X-Plane.
One more way we’re not thread safe that might not be obvious: when you read a dataref, you are executing code from another plugin. You can’t do this on a thread if the XPLM isn’t thread-safe, but you also can’t do this if the other plugin isn’t thread safe.
Stopping the Bleeding
In X-Plane 11.30 we made a change to stabilize the sim: we added code to actively detect and ignore plugin calls to the XPLMTerrainProbe APIs from background threads. We did this after seeing automatically reported crashes that turned out to be due to plugins calling the SDK terrain probe API from worker threads. By ignoring the call, we avoid the crash.
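The detect-and-ignore approach can be sketched in a few lines. This is an illustration of the general technique, not the actual XPLM code: `remember_api_thread`, `on_api_thread` and `probe_terrain_height` are hypothetical stand-ins, and the probe itself returns a placeholder value.

```cpp
#include <thread>

// The one thread allowed to call the API; recorded once at startup.
std::thread::id g_api_thread;

void remember_api_thread() { g_api_thread = std::this_thread::get_id(); }

bool on_api_thread() { return std::this_thread::get_id() == g_api_thread; }

// Hypothetical stand-in for an SDK entry point. A call from any other thread
// is ignored: it returns false and touches no shared state.
bool probe_terrain_height(double lat, double lon, double* out_height_m) {
    if (!on_api_thread()) {
        return false;  // no-op instead of racing with the sim
    }
    (void)lat;
    (void)lon;
    *out_height_m = 0.0;  // placeholder result; a real probe would query the scenery
    return true;
}

// Simulates a plugin making the same call from a worker thread.
bool probe_from_worker_thread() {
    bool ok = true;
    double height = 0.0;
    std::thread worker([&] { ok = probe_terrain_height(0.0, 0.0, &height); });
    worker.join();
    return ok;
}
```

After `remember_api_thread()` runs on the main thread, a direct call succeeds and `probe_from_worker_thread()` returns false, so the bad call fails quietly rather than corrupting state.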
My plan is to start doing this for all plugin calls at a patch point in the next few months. The problem here is that there’s basically no such thing as a benign threading bug – the table stakes here are the complete destabilization of the sim.
If you are a plugin author and you are using background threads to call XPLM APIs, please stop doing this now. Please plan to fix this in your plugin as soon as possible. The change will probably not make things any worse – my current idea is to no-op the calls, just like we did with terrain probes. But if your plugin is using these async calls and sometimes succeeding but sometimes crashing, you’re going to stop seeing the crashing and the sometimes lucky “success”.
I’m working with Tyler and Jennifer on a docs update now – hopefully this week all of the docs should be completely consistent. But they’re going to say what I’ve said above: no calls to the XPLM on worker threads.
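For plugins that do need background threads, one common pattern is to keep all SDK calls on the sim's thread and have workers hand results over through a locked queue. The sketch below shows the pattern only; `MainThreadQueue` is a hypothetical helper, and in a real plugin `drain()` would be called from a flight loop callback (for example one registered with `XPLMRegisterFlightLoopCallback`).

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Worker threads post jobs; only the sim's thread runs them.
class MainThreadQueue {
public:
    // Safe to call from any thread. The job must not run here.
    void post(std::function<void()> job) {
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.push(std::move(job));
    }

    // Call only from the thread that is allowed to use the SDK.
    // Runs every queued job and returns how many ran.
    int drain() {
        std::queue<std::function<void()>> ready;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(ready, jobs_);  // take the jobs, release the lock before running them
        }
        int count = 0;
        while (!ready.empty()) {
            ready.front()();
            ready.pop();
            ++count;
        }
        return count;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> jobs_;
};
```

A worker that finishes a network request posts a lambda containing the dataref writes or other SDK calls; nothing touches the SDK until the next `drain()` on the sim's thread. Swapping the queue out under the lock keeps the lock from being held while jobs run, so a job can safely post more work.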
Will We Ever Be Thread-Safe?
Sandy and I did some work back in the XPLM 2.0 days to start making the SDK partly thread-safe. This work was not completed, but the idea was to at least make a small number of APIs callable from worker threads. For example, by making flight loop callbacks schedulable from worker threads, a plugin could “wake up” SDK code from an async IO callback. That idea still makes some sense and we may get there someday.
For expensive tasks, we’ve already made API changes to address the underlying performance problems. For example, you can’t load an object synchronously from a thread, but you can ask us to load it asynchronously and call you back, and we use one of our threads to offload the work.
Some APIs I expect to never be thread safe. For example, we can’t sanely provide a threaded API to datarefs because we can’t promise that the plugin on the other side of the call is thread-safe. Given that a dataref read function can call other datarefs or other arbitrary plugin code, the opportunity for dead-locks is limitless.
Tyler’s Fever Dream
I should mention something that Tyler has been looking at as a future SDK initiative. This is somewhere between pie-in-the-sky and a fever-dream: plugins running asynchronously in separate processes.
The idea is to have a version of the main SDK APIs that are “fundamentally asynchronous” (i.e. the response comes later and that’s baked into the contract). This would allow plugins to run in another process, with results being communicated via IPC. Out-of-process plugins would have a bunch of advantages:
- You could write a plugin in pretty much any language we can write a binding for. Right now, plugins must fit into an unmanaged DLL inside X-Plane; this would allow the full “async API” to be used in .net or even Matlab. The requirement would only be that the plugin “app” environment be able to host unmanaged C code. This is a much less difficult requirement to meet.
- Plugins would be isolated by process boundaries; if a plugin crashes, the sim keeps running. Plugins can’t go around trashing each other’s memory.
- Use of multiple cores for plugin CPU processing is basically free, because the plugins can execute in parallel.*
- Plugins can each have their own CEF instance – or any other library that doesn’t like to be multiply instantiated in a single process.
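To show what "the response comes later" means in code, here is a minimal sketch of the plugin side of an asynchronous request. Nothing here is a planned API: `AsyncSimClient`, its methods and the value name `sim/example/value` are hypothetical, and the IPC transport is left out entirely. Each request gets an id, the callback is parked until a reply with that id arrives, and the caller never blocks.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

class AsyncSimClient {
public:
    using Reply = std::function<void(double)>;

    // Queues a request and returns immediately. The id would travel with the
    // IPC message so the reply can be matched to its callback.
    std::uint32_t request_value(const std::string& name, Reply on_reply) {
        const std::uint32_t id = next_id_++;
        pending_[id] = std::move(on_reply);
        outbox_.emplace_back(id, name);  // stand-in for writing to a pipe or socket
        return id;
    }

    // Called when a reply arrives from the sim process. Returns false for an
    // unknown or already answered id.
    bool deliver_reply(std::uint32_t id, double value) {
        auto it = pending_.find(id);
        if (it == pending_.end()) {
            return false;
        }
        Reply callback = std::move(it->second);
        pending_.erase(it);  // erase first so the callback may issue new requests
        callback(value);
        return true;
    }

    std::size_t pending_count() const { return pending_.size(); }

private:
    std::uint32_t next_id_ = 1;
    std::unordered_map<std::uint32_t, Reply> pending_;
    std::vector<std::pair<std::uint32_t, std::string>> outbox_;
};
```

A plugin would call `request_value("sim/example/value", callback)` and carry on; the callback fires only when `deliver_reply` is invoked with the matching id, which is the contract a cross-process API would have to bake in.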
One of the major hurdles to implementing an API like this was sharing graphics across the process boundary. As it turns out, we have to solve this problem for Vulkan anyway – the same APIs that share surfaces between Metal/Vulkan and OpenGL are IPC-friendly from day 1. So a 2-d plugin OpenGL window in Vulkan doesn’t have to be in X-Plane’s process.
I don’t see async plugins ever replacing 100% of in-process plugins, and this isn’t a plan to kill off the current C API. But I do think out-of-process plugins would be a much better fit for a wide range of plugin tasks.
* Users familiar with game programming will appreciate one of the fundamental dangers here: plugins in other processes might steal oversubscribed cores from the sim itself, causing stutters and unreliable framerate. Tyler and I have talked about that a little bit, and have some ideas for taking plugin background work into account and giving plugins ways to opt in and say “please run this background work for me at a time that won’t hose framerate.”
EDIT: See the recording of the Q&A session here on YouTube!
We’ve been posting about this on social media for a bit, but realized we hadn’t talked about it here.
Today at 11 am Eastern (16:00 UTC; click here for time zone math) we’ll be doing another live Q&A on our YouTube channel. We’ll be taking questions in the YouTube comments, but if you can’t make it live, we’ll try to answer a few questions from the comments on this post.
In case you missed the first, second, and third (part 1 & part 2) rounds of this, this is a streaming broadcast featuring:
- Austin Meyer, owner & creator of X-Plane
- Ben Supnik, desktop product manager
- Chris Serio, mobile product manager
- Alex Unruh, art director
- …and a handful of other special X-Plane friends.
Posted in News by Tyler Young
X-Plane 11.32 is available as an opt-in public beta for Laminar and Steam users. If you are seeing the sim randomly crash more frequently than before the X-Plane 11.30 series, please try this beta.
NOAA dropped non-HTTPS access to weather data today, causing Real Weather to fail; this is fixed in this beta build. The NOAA issue should not affect any weather add-ons, nor will the fix.
Edit: 11.32 release candidate 1 appears to fix only part of the problem; upper winds are all “zero”. I’m traveling now so it will be a few days before this is fixed.
Most of the crashes we’ve seen have been the GPU driver failing to get us memory. We don’t know if 11.32 will help, but we have tried a change to how we work with the driver that is more conservative and closer to 11.26, which we hope will be more stable.
Edit: there have been very few auto-reported crashes with 11.32 – fewer than the number of “it crashed” blog comments! Remember to hit “send” on the auto-reporter if X-Plane crashes; you don’t need to enter any data, just hitting send captures your Log file and where the sim crashed, which is what we need most.
Inevitably after a large update to X-Plane like 11.30, new bugs go undetected during the beta process. We do a quick update to try to kill off these bugs as quickly as we can, e.g. X-Plane 11.11, 11.26, 11.31.
Yesterday we shipped 11.31. Unfortunately this isn’t the end of X-Plane 11.30 bugs, and in two cases, 11.31 appears to have introduced new problems.
We are working now on X-Plane 11.32, and our rough plan is:
- Fix all the really serious bugs (crashes, performance so bad you can’t fly) and ship that ASAP.
- Fix the rest of the lingering 11.30 bugs.
- Take a moment to question life choices.
If you are seeing crashes in X-Plane 11.30 or 11.31, the most useful thing you can do is to auto-report them, preferably with your email address in the report, so that we can contact you to run special builds.
The rest of this post is an update on the state of some of these bugs.
Driver Bugs
Older Intel OpenGL drivers contained a bug in their pre-processor that caused them to reject our HDR shadowing shaders. I rewrote the shaders to work around this bug for 11.31, and the rewrite has exposed a bug in OS X OpenGL drivers from 10.10.5. I have already fixed this and confirmed the fix with users who still run 10.10.5, so this bug fix will ship in 11.32 for sure. In the meantime, turn off HDR to work around the problem.
Also, if you are a Mac user running 10.10.5, consider updating to a newer Mac OS!
Weather Crash
The most mysterious crash we see is new to 11.31 – a mysterious crash in the weather code. This crash is mysterious to us because nothing changed anywhere near this code from 11.30 to 11.31. The crash reports also don’t make a ton of sense – Sidney and I spent a few hours last night staring at disassembly and being baffled.
I have been contacting users who auto-reported this crash, and fortunately the response to running some test builds for this has been quite positive. I’m hoping to narrow down the change that caused this so that we can wrap our heads around what went wrong.
This weather crash is the one I am most concerned about because it is both unrelated to anything we changed and introduced in 11.31 – a tiny release designed to stabilize, not destabilize the sim. I don’t have a work-around at this time because I don’t have anything like causal steps to reproduce.
Please do not contact me with “I have a crash, can I help” – if we haven’t seen your crash report, there’s no way for you to know if the crash you have is this one or something else. If your email address is in your auto-reports, we can ping you.
Running Out of Mapped GPU Memory
The largest source of instability we’ve seen recently comes from 11.30, and it’s the GPU not being able to provide X-Plane with mapped memory. Since we radically changed the rendering engine in 11.30 (as part of our port toward Vulkan) I am not surprised to see a major GPU problem, but it is still a top priority to fix it. This bug is equally common in 11.30 and 11.31, appears to affect AMD and NVidia Windows users (but perhaps AMD more – we’re not sure), but isn’t something we see on our lab machines. Sidney and I have some ideas on how to at least work around the problem so that people can fly.
Performance Problems
We’ve heard a lot of chatter about performance problems and complaints about performance loss, and we are collecting detailed performance reports from users so we can see what’s going on. Since the performance tests are automated, it’s relatively quick for us to gather this data.
So far, while we have seen a lot of mediocre performance (and mediocre hardware!), we have not yet measured via tests the kind of “catastrophic” performance problems that one might expect from the amount of complaining on forums, etc. That doesn’t mean there isn’t a serious problem out there, it just means we haven’t seen it.
For performance, my view is: if we find something truly awful (e.g. the sim used to run at 25 fps and now runs at 5 fps after the update), we’ll go fix it. I don’t want to make anyone’s copy of X-Plane impossible to use.
But if someone is seeing a 5-10% loss of performance, at this point it’s better to ignore it, because the same engineers who would analyze and fix the performance problems (Sidney and myself) are the ones who are doing the Vulkan port, and the Vulkan port gives us a better shot at fixing performance than trying to beat more performance out of the OpenGL driver stack. In fact, even analyzing the problem is more possible with Vulkan, because the OpenGL drivers do a number of expensive things that they give us no visibility into.
To give you an example of what I mean: we captured performance data from a user with an older Ryzen 16-core CPU and a GeForce GTX 1080. Our analysis showed: 20-30 fps in the highest fps test (basically everything maxed out), 50-60 fps in medium settings, and 60-70 fps at the lowest settings. At all times, the GeForce GTX 1080 was not maxed out and the CPU was the bottleneck.
Here’s the thing: the older Ryzen CPU has a single-core Geekbench score about as good as my 2014 iMac – that is, it’s not a top tier CPU, it’s old, and it was never optimized for single threaded performance. The user’s system is unbalanced for X-Plane (a lot more GPU than needed for that CPU), and 20-30 fps with everything maxed out is all I would expect from a CPU of that performance level.
So there’s no sign that the machine is underperforming our expectations. And the obvious thing to do to make the system faster is to be more multi-core (since the machine has 16 cores). And that means: focus on Vulkan.
So when it comes to performance, I’m going to beg patience and try to not lose momentum on the Vulkan/Metal port. In the long term it’s a better way to help everyone go faster. If we find something truly bad though, we will go and investigate more.
I don’t know what the ETA is for 11.32, but my plan is days, not weeks. The crash bugs are our top priority right now.
X-Plane 11.31 is available to test – run our installer, update X-Plane, and check “get betas”. Full list of bug fixes here.
X-Plane 11.31 contains bug fixes that we could get done quickly, that almost made it into 11.30, and that were high priority, e.g. crashing on some Intel GPUs is fixed, and the external visuals don’t randomly lose sync.
We do still have some other fixes to get in at a later time. For example, there are a number of particle replay bugs where X-Plane isn’t saving the data needed to replay the particle effect; we will patch those in a separate patch where we can add more data to the .rep/.sit structures.
Some users have reported performance bugs, and we are gathering data and looking into them, but I’m not treating them as a five-alarm fire. We’re at a point where we are making rapid progress on the Vulkan port, and I don’t want that progress to grind to a halt as we investigate OpenGL performance problems; we already know that the long term solution to OpenGL performance problems is going to be Vulkan, not stabbing the OpenGL code repeatedly with a fork in the hope that it’s better behaved.
If we find something blatantly wrong with the OpenGL code in 11.30, we’ll fix it, but when it comes to ensuring performance, the very fact that the engine is OpenGL and not Vulkan limits us. At this point the IHVs are making their best performance analysis tools for Vulkan and Metal, not OpenGL, and Vulkan provides an API where the driver’s performance is deterministic. (What we’ve seen so far is differing OpenGL performance for basically the same hardware drivers.)
X-Plane 11.30 was a really big code update to X-Plane – it had a major update on our route to re-writing the rendering engine, hence all of the rendering bugs we’re fixing. Over the next few updates I think we’ll have less code change as we stabilize, paired with art updates. We’ll take gateway airports and we have some landmarks ready to go.
Posted in News by Ben Supnik