We were discussing a particularly exasperated-sounding bug report on one of the internal Slack channels when I realized that this might not be obvious: a crash with the error message “pipeline must not be null” is one error message that covers a whole category of bugs. We fixed one major case (sky colors were broken) in b1 and added one major case (custom billboard lights on aircraft) in b2 – conservation of pipeline bugs!
Null pipelines are a new category of crash in X-Plane 11.50, so here are a few notes on what this error is and what you can do to help us fix them (and what you don’t need to bother with).
What Is a Pipeline?
A pipeline is just the Vulkan and Metal term for a shader (plus some extra gak (1)) that we use to do our drawing.
X-Plane 11.41 would ask the OpenGL driver to build shaders as it needed them, and then the driver would turn those GL shaders into hardware pipelines on the fly as it got presented with different scenarios.
Not 11.50. We build everything up front. Vulkan has two rules:
- Using a pipeline is fast.
- Building a pipeline is not fast.
This is a great pair of rules for us – it means if we build our pipelines at load time, we are not going to have stutters mid-frame.
Why Are We Crashing?
There is one down-side to the 11.50 way of doing things: if we don’t build all of the pipelines we need up front during load, then when it comes time to draw, we’re toast. That’s what a “pipeline must not be null” error is – it just means the loading code did not create the pipeline the drawing code needs.
Why not just build every pipeline we could ever possibly need? Load time. X-Plane can build hundreds of thousands of pipelines depending on rendering settings, scenery packs, custom aircraft, etc. We actually did “just build everything” early in our development process and the sim could take half an hour to load.
So we try to build only the pipelines we need. If we build too many, we slow load, and if we build too few, you see this error.
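To make the build-at-load, look-up-at-draw idea concrete, here is a minimal sketch in Python. It is not X-Plane's code: the `PipelineKey` fields, the `PipelineCache` class and the shader names are all hypothetical, and a string stands in for a compiled hardware pipeline. It only shows why a pipeline the loader did not predict ends in an error rather than a stutter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineKey:
    """Hypothetical key: a shader plus the fixed-function state baked in with it."""
    shader: str
    blend: str
    vertex_format: str


class PipelineCache:
    def __init__(self):
        self._built = {}

    def build(self, key):
        # Slow step, done at load time only. The string is a stand-in
        # for a compiled hardware pipeline.
        self._built.setdefault(
            key, f"pipeline<{key.shader}/{key.blend}/{key.vertex_format}>"
        )

    def get(self, key):
        # Fast step: a plain lookup at draw time, with no fallback compile.
        pipeline = self._built.get(key)
        if pipeline is None:
            raise RuntimeError(f"pipeline must not be null: {key}")
        return pipeline


cache = PipelineCache()
terrain = PipelineKey("terrain", "opaque", "pos_normal_uv")
billboard = PipelineKey("billboard_light", "additive", "pos_color")

cache.build(terrain)       # the loader predicted this one
print(cache.get(terrain))  # drawing works

try:
    cache.get(billboard)   # the loader missed this one
except RuntimeError as err:
    print(err)
```

The design choice shown here is that `get` never compiles anything: a miss is reported loudly instead of being hidden behind a mid-frame build.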
What Do You Do When You See This Error?
On Windows and Linux, it’s really easy: close the alert box and when the auto crash report form comes up, please press “send”. Don’t bother with your email or a message; everything we need to kill this bug is already in the auto report! (Jennifer’s edit: please DO include your email address with any auto report if you want us to be able to confirm we have your specific report! This is the only way we have of identifying who it came from.)
The good news is: the auto crash reports for the pipeline crashes are insanely easy to find and fix.
Mac users: if you see one of these, we need the Apple crash report – please send it in a bug report.
(1) for the plugin developers that know some OpenGL: a pipeline is basically a GL program (shader) plus a bunch of the fixed-function state that goes with it: blending, depth/stencil, vertex format, FBO format, and some rando stuff thrown in.
The idea is to have the pipeline contain so much information that there is no risk that the driver has to build two hardware shaders for one Vulkan shader (to cope with other fixed function state) no matter how weird the hardware is.
On lots of actual hardware, much of the pipeline state is not actually in the shader, but some surprising things, like vertex format, often are.
I’m curious about something. From the article I understand that you try to predict what you need, but if this fails you get an error. If I understand this correctly, is this really a viable solution for the future?
From an amazing little piece of software on Linux called DXVK (a D3D9-11 to Vulkan translator) I know that caching pipelines is a viable option. Obviously DXVK has no other option, but building up a cache at runtime eventually eliminates all stutter and there is no risk of ever failing due to a non-existent pipeline. This could even be made hybrid by populating part of the cache up front and leaving the not so common stuff to build at runtime.
Steam actually made this a core feature of the Steam client (on Linux anyway) by distributing pipeline caches (and shader caches for specific drivers) as product updates. This might be a little excessive though.
I may be misunderstanding things though.
I’m not sure which part of the strategy you are asking about.
We _do_ cache compiled pipelines where we can, and that’s why second loads are faster than first-loads. But they do have to get compiled once, somewhere, and we do have to have some idea of what the total universe of pipelines is that is not “all possible ones” because that one is in the millions.
And we can’t just have them online because we can’t ensure that we have access to all GPUs and driver versions that might ever run X-Plane.
My question was about the “Why Are We Crashing?” section. That’s the part that seems strange to me. With the ability to build and cache pipelines on the fly, I still don’t understand why a missing pipeline would lead to more than a one-time stutter. Unless the crash is deliberate in order to get the information, in which case the only thing that makes sense to me would be to find the most frequently occurring ones. But crashing doesn’t seem like the best solution for that. So basically the paragraph failed to explain its headline.
The crash is deliberate to get the information. If we have code to build them on the fly and keep flying, we have two bugs: a stutter bug AND a missing pipeline.
>> So we try to build only the pipelines we need. If we build too many, we slow load, and if we build too low, you see this error.
This doesn’t sound like something where you are 100% certain that you will get all pipelines you need before you start a scene once bugs are ironed out. To me this sounds like “I have a slider where on one side you get fast load time with frequent crashes and on the other long load with barely any crashes.”
Can you, with the right code, 100% predict all the pipelines for a given scenario? Does changing rendering settings during flight force pipeline builds? Is XP11 static in the sense that no 3rd party thing can pull in objects during the scene that weren’t anticipated?
With all the complexity and extendability that XP11 undoubtedly requires, I am amazed that you can predetermine all possible pipelines that are needed during load time.
Yes, yes and yes.
1. With the right code, we can predict 100% of the pipelines. It is definitely deterministic.
2. Settings changes force a rebuild, so there’s no edge case there.
3. Third party content simply changes the subset of all possible pipelines that we will use – that universe is bounded and finite and NOT “sized” by content – e.g. the total pipeline count is calculable and known in the unextended sim and doesn’t change when third party content is added.
Basically for each material introduced in ANY art asset anywhere, we _may_ build one or more pipelines as needed if they don’t already exist, based on the cross-product of the current rendering settings and that material. We de-dupe, e.g. your 400th orthophoto texture isn’t making new pipelines, it’s the same as the other 399.
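The de-duplication described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the sim's actual logic: the material tuples, the pass names in `variants` and the `pipelines_for` helper are all made up for the example.

```python
from itertools import product


def pipelines_for(materials, settings_variants):
    """Return the de-duplicated set of (material, variant) pipeline keys."""
    return set(product(materials, settings_variants))


# Hypothetical content: 400 orthophoto tiles that all share one material
# description, plus a single aircraft material.
orthophotos = [("terrain_shader", "opaque")] * 400
aircraft = [("aircraft_shader", "alpha_blend")]

# Hypothetical variants implied by the current rendering settings.
variants = ["shadow_pass", "main_pass"]

needed = pipelines_for(orthophotos + aircraft, variants)
print(len(needed))  # 4
```

Two distinct materials times two variants gives four pipelines, no matter how many art assets reuse the same material.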
Could you add a date and time at the beginning of a blog post entry? I appreciate that this blog exists.
I think it is useful to teach Mac users how to locate the “crash report” as it is not common knowledge. The “crash reports” can be found by running the “Console” and then selecting “Crash Reports”, then finding the latest X-Plane crash. The report can be sent by clicking “Share” and then “Mail” or by copying the whole text into a text editor and then saving as a text file. The Share/Mail method creates an xml file, while the text method creates an ordinary text file; if Laminar can decode the xml file, and I believe you can, it is easier to share than to go through the process of creating a text file.
Hi Ben, thank you for taking the time to explain these things as you go along, it is interesting and informative to read and understand. I have no programming knowledge whatsoever so I can only consider myself a user of technology and not a contributor in any way.
I know you’re a very busy man indeed but I hope you don’t mind me just asking a question about X-Plane, I’d rather hear it straight from the horse’s mouth so to speak!
I completely understand the improvements that vulkan brings to the GPU and I’m very grateful to have them, but my question is what if any optimisation options exist for the CPU?
On my system the CPU frame time is always very high and always in the red, compared to the GPU which is very low and always in the green. Is there anything that can be done from a programming perspective to improve things on the CPU side or is it purely a hardware-only solution?
Many thanks and regards, Andy.
Wait, this is exactly backward. Vulkan brings optimizations almost entirely to the CPU, not the GPU.
The GPU’s performance is mostly limited by the shaders we write, the rendering passes we choose to render, how big the framebuffers we render to are, and sometimes the blending options. This stuff has changed a little bit in Vulkan, but not much. To the extent that it’s better, it’s only better because with much better tools, we can see what we’re doing more easily.
The CPU’s performance is where the win is – the CPU spends some time in X-Plane code and some time in the Vulkan or OpenGL driver building up instructions to send to the GPU to execute. The cost of building up the instructions was really big in OpenGL – seeing this take 30, 40, or 50% of a frame was not uncommon. These costs are _much_ lower in Vulkan, allowing the CPU to render a frame in less time.
We have also fixed a number of CPU performance problems in our own code – once we got Vulkan running it became obvious where our worst pain points were, so we fixed those too. The result is big improvements in CPU time.
The same is true for Metal as well – same dynamic – faster CPU frame time because less time in the driver making command lists, no real change in GPU time.
So if your machine was CPU bound (and many users who have a nice graphics card are) Vulkan is a big win. That’s why we see users with Vegas going “wow” – that GPU was bored waiting for CPU instructions via the slow AMD GL driver. With Vulkan, the CPU can tell the GPU what to do fast.
I have seen some Mac users go “this isn’t faster at all” and this isn’t surprising. Apple doesn’t ship a lot of GPU power relative to their display resolutions, so if you run X-Plane on a Mac at the MAX screen res, odds are the GPU can’t keep up. E.g. my 5K iMac runs pretty well at, say, 1440p, but there’s no way this GPU can run at 5K with any kind of perf. So my _suspicion_ is that these users are GPU compute power limited and that hasn’t gotten any better. We’ll know more when we do more serious perf analysis of user machines.
There is still more we can do with the CPU in the future – there is more code that can be optimized, and now that we have a driver that is more multi-thread friendly, we can start doing some of the build-the-frame work on multiple threads. We are NOT doing this for 11.50 but we do have more room for improvement.
IF you aren’t seeing CPU time get better, that’s something we can profile later in the beta.
Thanks Ben, I really appreciate you taking so much time to compose such a detailed response, it’s much appreciated.
I’m sorry but I misunderstood completely, I was under the impression that vulkan was bringing improvements to the way the GPU was utilised, my bad!
I’m a vr user with a reverb and I’m really struggling to get a smooth experience running the orbx uk scenery. I installed VR FPS so I could see what is happening in game and I observe that the GPU has a very quick response time whereas the cpu has a very slow response time compared with the GPU which is why I assumed that it was the CPU which was slowing down the experience.
I have a 2080ti with 11gb, 32 ram and my intel is clocked to 4.8ghz.
I’m running the reverb at default resolution (2160×2160) any less and the image is too blurred.
I know this is still in beta so maybe I’m just already getting the best that my hardware can offer.
I’ll see how things progress with the betas moving forward.
Thank you for your time and assistance, especially impressive considering it’s a Saturday! 🙂
Cheers.
While experience with Vulkan has been positive, I too am seeing slower CPU response times than GPU: I run a 3x1080p monitor setup with 70 FOV on each. Fully understand that this is pushing my system to its limits. I observed the CPU (9700k 4.9) was slower than the GPU by a factor of 3-4. CPU would be in 0.0441 while GPU (1660Ti) times are around 0.0115. See some blurry textures but I think that will get fixed eventually, but I do see lower utilization of the GPU in terms of VRAM in the VRAM profiler. (Device total 3.82; device used 1.77; device unused 2.05; 20 fps). Looking forward to future releases!
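As a rough check on the numbers in this comment, the sketch below assumes CPU and GPU work overlap perfectly, so the slower of the two sets the frame time. That is a simplification, and `estimated_fps` is a made-up helper, but it shows why this setup reads as CPU bound.

```python
def estimated_fps(cpu_seconds, gpu_seconds):
    # Assumption: CPU and GPU run fully in parallel, so the slower one
    # determines how long each frame takes.
    frame_seconds = max(cpu_seconds, gpu_seconds)
    return 1.0 / frame_seconds


cpu, gpu = 0.0441, 0.0115  # frame times quoted in the comment above
print(round(estimated_fps(cpu, gpu), 1))  # 22.7
print("CPU bound" if cpu > gpu else "GPU bound")  # CPU bound
```

The estimate of about 22.7 fps is in the same range as the 20 fps reported, and a faster GPU would not change it.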
Hi Ben
So what do you do if the auto-crash reporter does not show due to a bug in itself?
Very seldom have I seen this auto-respond feature
Most of the time if the crash reporter doesn’t show it’s because a plugin crashed. We DO sometimes have the crash reporter fail because the crash is early at startup – in that case we have to get a mini dump from the user directly and it’s very time consuming.
Hi
i don’t know why, but after crashes i don’t get the crash reporter.
Is there a setting i have broken ?
Thank you
No. If the crash is plugin-related we don’t auto-report.
What is the ”official” recommendation for Threaded Optimization with Vulkan? Sorry, if this has been covered earlier.
I don’t think ‘threaded optimization’ does anything with Vulkan.
Why do you crash the software instead of doing a fallback, workaround or just “catching” the error?
Usually because:
1. We cannot fall back – by the time we crash things have gone so wrong that it’s too late to recover and/or
2. We totally expect to fix the issue so the most useful thing is to capture the crashes so we can fix them until there are none.
Pipelines are definitely in this second category! Every pipeline error we get via the crash reporter is fixable – I found four this morning and three of them are fixed for b4 — still working on the last one.
There are crashes in this first category; “why can’t you just catch the crash and work around” is actually a very complicated subject in computer science, but the short answer is: when a program crashes you have to restart it because the thing that _caused_ it to crash is unknown and has _already_ screwed the process up enough that you need the clean slate to get back to not crashing.
For those at home who are programmers and find this answer surprising, there was a cppcon talk by someone who worked on fighter plane systems…their view of reliability is instructive.
I believe this is the link you are referring to:
https://www.youtube.com/watch?v=sRe77Mdna0Y
Very informative indeed.
Cheers,
Erwin
Yep that’s the one. One of the things that makes my thinking about code different now from twenty years ago is that I used to think I could ‘reason’ about why my program had crashed.
I now understand that if my program has crashed, _by definition_ one of my assumptions about the program operation is _wrong_, and therefore there’s a fair chance that whatever code I write to respond ‘gently’ to this crash is going to do something very different from what I intended.
So systems that must be robust need a hard barrier inside of which the earth can be scorched, e.g. a process restart when the OS provides very strong process-to-process protections. Erlang is designed around this (or so Tyler tells me) – the VM itself provides guarantees about the execution of the code such that you can kill a light-weight task and get the same “everyone else is fine” guarantees.
With C++, within the process once your invariant is broken, there is no limit to the broken. Maybe a thread crashed _while doing an allocation_. Now your heap and allocators are all broken, which means all of your STL containers are all broken, so is std::string, it goes on. You eventually realize that only restarting the process is going to work.
As a side note, someone else mentioned having async XLua – I think Philipp’s preferred architecture: add-on plugins _not in the same process_ is actually the future of a lot of plugin code. If you’re going to do the work to have a plugin talk to the sim asynchronously (to get multi-threading) you might as well put it in another process and get the crash-proof win. 🙂
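The out-of-process idea can be illustrated with Python's standard `subprocess` module. This is only a sketch of the isolation principle, not a proposal for how X-Plane plugins would be hosted: `run_plugin` and the "plugin" code strings are hypothetical.

```python
import subprocess
import sys


def run_plugin(code):
    """Run hypothetical plugin code in its own interpreter process."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return result.returncode, result.stdout.strip()


good = run_plugin("print('gear down')")
bad = run_plugin("raise RuntimeError('plugin bug')")

print(good)         # (0, 'gear down')
print(bad[0] != 0)  # True: the plugin process died
print("host still running")
```

The failing plugin takes down only its own process; the host sees a non-zero exit code and can decide whether to restart it.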
On my first day at Bell Labs last century, my mentor explained to me two types of systems: One is like banks — you’d rather have them shutdown than be wrong. The other is like phone systems, they can make a few mistakes, but can never be down. XP contains bits of both, I suppose.
Keep up the great work guys!
Oh that’s a really good analogy.
The fascinating thing is: the WAY the phones can make a few mistakes but never be down IS by being like banks…just not all at the same time.
Someone’s gotta dig up the “hello Bob” Erlang video now. 🙂 Sidney??
https://youtu.be/xrIjfIjssLE
Hello Mike…
Hello Joe.
zOMG.
“As a side note, someone else mentioned having async XLua…. ”
I’ll take that as a green light.
XTlua first build, with good tail winds, should be on the org “plugin devs” subforum around next weekend.
“there was a cppcon talk by someone who worked on fighter plane systems…their view of reliability is instructive.”
Sounds interesting! Is there a link, or how to google to find it? 🙂
You mentioned that you had early development versions that built all the pipelines. Is there a command line switch that will gen all the pipelines? After all we are beta testers ;o helping you guys with these issues (no one is just flying right?). If we get the null error and then run it with all pipelines (-allpipelines, knowing that it will take a long time) and see if that “fixes” things. Maybe go back and do a -regenpipelines to go back to the default number of pipelines. Or do you just get the crash report and add the missing pipeline to the list of default ones created at startup?
My guess is the on-the-fly option is non-functional – it’s been off for months and probably won’t compile now since we’ve refactored the pipeline system a few times. We don’t really _need_ more info – once you auto-report a null pipeline crash, we’re golden – the auto-report has your log file, and the log file lists the pipeline we wanted as well as the ones we have and the parent shader. The crash report has the call stack up to the crash which pretty much always identifies what was drawing. These are the _easiest_ bugs to triage that we face in Vulkan/Metal.
So they’re very user-visible and annoying, but they’re gonna get got very fast.
I wonder: Wouldn’t most users be happier with this strategy:
Provide a “dummy pipeline” that can be used if some pipeline is found missing.
Also, when a pipeline is missing, add it to a queue of pipelines to compile, and compile them in the background. So next time it is needed, it’s there.
I think most users would prefer a possible stutter over a program crash.
Happiness is not the goal of the beta program.
Killing as many bugs as possible as fast as possible to get the sim to a shippable state is the goal of the beta program.
If I thought we might be stuck with null pipeline bugs forever because this was some kind of unfixable problem, then proxy-and-compile in the background might make sense – this strategy is used in some engines where it is basically impossible to predict work-load ahead of time. (There’s a Nintendo emulator that does this.)
In our case, it doesn’t make sense to spend the engineering time to build that widget – we aren’t going to need it.
So are you saying x-plane knows what it needs to compile ahead of time, and can *guarantee* it has enough time to compile it all? – Not just ‘just in time’ but ‘just in plenty of time’?
Yes in two ways.
1. Before the start of the flight, the start of the flight is _held off_ until all compilation finishes.
2. During flight, everything new loads in the background, and isn’t allowed to be used until compilation completes too.
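The two rules above can be sketched with a thread pool. This is a toy model under assumptions, not the sim's loader: the `Scenery` class, its method names and the sleep that stands in for loading and compiling are all invented for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_and_compile(name):
    time.sleep(0.05)  # stands in for disk load plus pipeline compilation
    return f"{name}: ready to draw"


class Scenery:
    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._pending = {}
        self.drawable = {}

    def request(self, name):
        # Start loading in the background; nothing is drawable yet.
        self._pending[name] = self._pool.submit(load_and_compile, name)

    def wait_for_all(self):
        # Rule 1: before the flight starts, block until everything is built.
        for name in list(self._pending):
            self.drawable[name] = self._pending.pop(name).result()

    def publish_finished(self):
        # Rule 2: during flight, only fully prepared items become drawable.
        for name in [n for n, f in self._pending.items() if f.done()]:
            self.drawable[name] = self._pending.pop(name).result()


scenery = Scenery()
scenery.request("start_tile")
scenery.wait_for_all()            # the flight is held off until this returns

scenery.request("next_tile")      # requested mid-flight
while "next_tile" not in scenery.drawable:
    scenery.publish_finished()    # called once per simulated frame
    time.sleep(0.01)

print(sorted(scenery.drawable))   # ['next_tile', 'start_tile']
```

The frame loop never waits on a compile: it keeps drawing what is already in `drawable` and picks up new items only once they are complete.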
Does 2. mean motion-stalls have been swapped for slight scenery-pop, or slight delay of lighting changes? (in the realms of ‘you ain’t gonna notice’, I imagine)
Can I also guess that these null-pipeline errors are finding the parts of code that assume assets never need to be waited for, and you just need to add ‘wait till it’s ready’ code?
It’s only a scenery pop if you were flying SO fast that you got to the new region before all processing is done. And that was already true because DSFs load in the background, same with autogen. So it’s possible you could see scenery come in by flying _slightly_ slower but I expect this is mostly a problem for fighter plane pilots.
The null pipeline errors are just bugs – mismatches between the pipelines used and the pipelines generated.
“1. Before the start of the flight, the start of the flight is _held off_ until all compilation finishes.”
On a somewhat related note: Would it be possible to have an option some day to _NOT_ hold off until everything is there, but to start the flight as soon as possible? That is basically as soon as the sim is able to render the user plane and the mesh of the single DSF the user plane is on (so it isn’t suspended in air)?
I know this sounds weird at first look, but my reasoning is this: on one hand, initial loading can take really long with photo scenery on an HDD, but on the other hand, the first few minutes of a flight are usually spent staring at the panel flipping switches. So the sim could use this time, during which the user probably doesn’t care much about the outside world anyway, to finish loading, instead of holding everything up.
I’m amazed by the amount of homework being done by the development team, and curious about the general programming problem (being a middleware programmer myself)…
How far can the logic for pre-compiling ahead of time (especially on the first load) go or predict?
Say a user installs a new add-on (for example a new scenery in Australia, on the far side of the planet) which has its own set of textures and shaders.
Is it that when we start X-Plane and it detects that new add-on, a trigger will run the pre-compilation?
Keep in mind that X-Plane does not know what airport or flight plan the user intends.
Before your flight starts, we preload everything during that preload time. So at t=0 everything is there.
Anything that is added to scenery _during_ the flight is loaded in the background, and the GPU prep (compilation of shaders, loading up of textures in VRAM, bla bla bla) is also done in the background. So that background load is not considered ‘done and ready to draw’ until it’s ready all the way to the GPU.
Thanks yet again Ben. Excellent work. I have a 10 year old PC with upgraded everything and the increase in FPS is huge. Never seen anything like it before. Can turn sliders right up. Stability, stutters etc, no problems. You are on a winner here. One thing though, I do get some error messages with Orbx photoscenery. Is there any issue with Vulkan and photoscenery? Apologies for using pipeline topic to raise photoscenery! Cheers.
I have been checking out the Microprofile on my system (iMac Pro). Renders are generally about 33ms (about 30fps), but I see many seemingly periodic spikes to 50ms caused by a “Swapchain acquire” that takes about 18ms. Any idea why this happens?
I would love to see a blog article that explains the different stages in the Microprofile!
That means we are stuck waiting on the window manager to give us another buffer to draw into to show you.
Experimenting with Ubuntu 18.04, I found a workaround to remove most of the input lag I’m experiencing under Vulkan.
When logging into my system, I choose the “Unity” window manager (i.e. click on the gear and choose “Unity” from the menu).
Once logged in, I run the “CompizConfig Settings Manager”.
Click on the “Composite” plugin icon.
Uncheck the “Unredirect Fullscreen Windows” option. Make sure “Detect Refresh Rate” is on.
Click the “Back” button.
Close the window.
Note this only works with the “Unity” window manager, as “CompizConfig Settings Manager” is unique to that window manager.
P.S. – If you don’t have the “CompizConfig Settings Manager” installed, you can install it with this terminal command:
sudo apt-get install compizconfig-settings-manager
Thank you so much, I have also the input lag, and I’ll try that ASAP. I’m on Ubuntu 18.04 Unity anyway. Cheers!
It works !
Thanks a lot
Upgrading to “Ubuntu 19.10” with the “nvidia-driver-440” installed and logging in with the “Plasma” window manager finally fixed the input lag for me.
You know what is funny:
Before Vulkan I had 40 FPS with OpenGL, now I have 40 FPS with Vulkan and 60 FPS with OpenGL, same settings, same conditions (Zibo 738, at EGLL, 13:00, weather clear)
Ben, can you give your thoughts on these Vulkan optimised Nvidia drivers? Good news or not so?
https://developer.nvidia.com/vulkan-driver
I don’t understand the question. This just looks like a list of the latest NVidia drivers, and they have Vulkan support.
Yes, sorry Ben. I thought these were specific creations for use specifically with Vulkan, optimised somehow, outside the normal evolution.
Thanks again for your team’s continuing efforts and for these enlightening blog updates.
Kev,
Are you running the Vulkan-driver? It would be nice to know if it performs same, better, worse than the Game or Studio drivers, and if not, what does this driver do that it would be a separate driver offering?
Dear Ben,
We all know you guys are busy with this upgrade to make a better sim for us all. I was wondering whether Lasso from Bitsum would be more useful in my case: I have 4 actual cores and 4 virtual cores, so I could leave 1 core for all other tasks while the other 7 concentrate on X-Plane and Vulkan with priority. I have been enjoying all the visual goodness, keep up all the magic. Captain Robert Phelps ATP “jagjet”
You could try it, but there’s a chance that it makes things worse, not better. If you use it and your FPS dies, we don’t want to hear about it. 😉
Ben,
Speaking of shaders and Vulkan, I came across this NVIDIA site and drivers and would like to know what graphics drivers your team used to develop Vulkan for X-Plane and whether or not this driver might provide some level of enhanced performance? Thanks.
https://developer.nvidia.com/vulkan-driver
We developed on whatever was the latest published driver from NVidia – there’s one standard ‘tip’ to their dev. It changed over time since Vulkan has been going on for a while.
The driver doesn’t really enhance performance – it is just part of performance. Both the NV and AMD Vulkan drivers work really really well.
Nice to hear! I mean: It’s much better than hearing: “We’ve discovered quite a lot Vulkan errors we have to work around”
Ben, why was the decision made to use Vulkan and not DX12? Is the step to use DX12 much higher?
Vulkan gets us Windows, Linux and Android; DX12 gets us only Windows. Also the SPIR-V tool-chain from Vulkan plays well with GL and Metal — they’re starting to have this for DX HLSL I _think_ but we sort of get the shader side of Vulkan for free.
Given that the driver is now abstracted, we could write a DX12 back-end – I see no indication that it would be better than Vulkan.
The OpenGL driver offers 16x antialiasing for non-HDR (medium on the visuals slider). Is there a reason why Vulkan doesn’t?
I’m actually getting this error message in Plane Maker as well. I’ve submitted a bug report.
Ben,
I opened the .acf file of the Falcon7X from the org in Plane Maker and when I hit the Space Bar to open it in WireFrame I get this dialog box: “Pipeline must not be nullptr pipeline Please report this to Laminar Research.” [OK]
Should I file a BUG REPORT and include the .acf file?
Thanks,
Scott