A commenter asked if we were planning to use a GPGPU API in X-Plane 10 or beyond. I really can’t answer for the far future, but I can say that we aren’t planning to use GPGPU for X-Plane 10. This post will explain a little bit about what GPGPU is and why we haven’t jumped on it yet.
Every new technology has an adoption cost. So the question of GPGPU isn’t just “will it help?” but “is it a better use of our time than the alternatives?” For example, do we spend time coding GPGPU, or do we spend time optimizing the existing code to run faster on all hardware? But this post is about GPGPU itself.
GPGPU stands for General-Purpose computing on Graphics Processing Units – the Wikipedia article is a bit technical, but the short of it is: graphics cards have become more and more programmable, and they are enormously powerful. GPGPU technologies let you write programs that run on the GPU but do something other than draw graphics.
There are two major APIs for writing GPGPU programs: OpenCL and CUDA. OpenCL is designed to be an open standard and is heavily backed by Apple and ATI; CUDA is NVidia-specific. (At least, I don’t think you can get CUDA to run on other GPUs.) I believe that NVidia does support OpenCL on their hardware. (There is a third compute option, DirectCompute, which is part of DX11, but that is moot for X-Plane because we don’t use Windows-only technologies.)
If that seemed confusing as hell, well, it is. The key to understanding the alphabet soup is that there are API standards (which essentially define a language for how a program talks to hardware) and then there are actual pieces of hardware that make applications that use that language fast. For drawing, there are two APIs (OpenGL and Direct3D) and there are GPUs from 2+ companies (ATI, NVidia, and those other companies whose GPUs we make fun of) that implement the APIs with their drivers.
The situation is the same for GPGPU as for graphics: there are two APIs (CUDA and OpenCL) and there is a bunch of hardware (from ATI and NVidia) that can run some of those APIs.*
So the question then is: why don’t we use a GPGPU API like OpenCL to speed up X-Plane’s physics model? If we used OpenCL, then the physics model could run on the GPU instead of on the CPU.
There are two reasons why we don’t use OpenCL for the physics engine:
OpenCL and CUDA programs aren’t like “normal” programs – see the sketch after this list for a taste of what OpenCL code looks like. We can’t just pick up and move the flight model to OpenCL; in fact, most of what goes on in the flight model is not code that OpenCL would be particularly good at running.
For a GPGPU program to be fast, it has to be running on the GPU. That’s where the win would be: moving work from the poor CPU to the nice fast GPU. But…we’re already using the GPU – for drawing!
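To give a flavor of what “not like normal programs” means, here is a minimal sketch of an OpenCL kernel – my own illustration, not code from X-Plane. GPGPU code is written as tiny “kernels” that apply the same math to every element of a huge array in parallel; there is no single thread of control stepping through a simulation:

```
/* A hypothetical OpenCL kernel - not X-Plane code.  Each invocation
   handles exactly one array element; the GPU runs thousands of
   invocations in parallel to chew through the whole buffer. */
__kernel void integrate(__global float* pos,
                        __global const float* vel,
                        const float dt)
{
    size_t i = get_global_id(0);  /* which element is this invocation? */
    pos[i] += vel[i] * dt;        /* same math applied to every element */
}
```

A flight model – full of branches, special cases, and serial dependencies from one step to the next – doesn’t decompose into that shape easily.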
And this gets to the heart of the problem. The vast majority of the cost of the flight model comes from interaction with the scenery – a data structure that isn’t particularly GPU-friendly at this point. Those interactions are also not very expensive in the bigger picture of X-Plane, particularly when the AI aircraft are threaded.
The biggest chunk of CPU time is being spent drawing the scenery. So to make X-Plane faster, what we really need to do is move the graphics work from the CPU to the GPU – more time spent on the GPU and less time on the CPU for each frame of drawing we run through.
And the answer for why we don’t use OpenCL or CUDA for that should be obvious: we already have OpenGL!
So to summarize: CUDA and OpenCL let you run certain kinds of mathematically intense programs on the GPU instead of the CPU. But X-Plane’s flight model isn’t that expensive for today’s computers. X-Plane spends its time drawing, so we need to move more of the rendering engine to the GPU, and we can do that using OpenGL.
* Technically, your CPU can run OpenGL via software rendering. The results look nice, but aren’t fast enough to run a program like X-Plane. Similarly, OpenCL programs can be run on the CPU too.
A while ago I wrote two posts trying to explain why we would use real physics for the AI planes. Looking back over the comments, I think my message missed the mark for a number of readers. The basic idea was this:
It’s quicker to get the new ATC system done by using the existing physics model than by inventing a brand new, parallel “fake physics” model for AI planes. So using real physics lets us focus on other features.
The physics model is not more expensive than a fake physics model would be; the few things that actually take CPU time in the real physics model are things the fake model would have to do anyway: checking for ground collisions, etc.
In other words, using real physics doesn’t hurt the schedule and it doesn’t hurt fps. I followed that up with a bunch of talk about how you incrementally build a complex piece of software, and off we went.
What I didn’t do is make the argument for why real physics might be better than fake physics. So: I made a video of the 777 taxiing and taking off.
Some disclaimers: this isn’t a marketing video, it’s what was on my machine at this instant. This is an older v9 plane with v9 scenery*. Version 10 is in development and has plenty of weird stuff going on, and the AI still needs a lot of tuning. Anyway:
With the video compression, calm conditions, v9 airplane, etc. it’s a bit tough to see what’s going on here, but I’ve seen a few AI takeoffs as I run X-Plane in development and it seems to me that the real physics model provides a nuance and depth to how the planes move that would be impossible to duplicate with a “fake” AI (e.g. move the plane forward by this much per frame). When the airport isn’t flat, the plane sways depending on its landing gear, weight, and wheelbase. The plane turns based on its rotational inertia, easing into the turn (and skidding if Austin dials in too much tiller). When the plane accelerates, the rate of acceleration takes into account drag, engine performance, and wind.
* Except for Logan – that’s George Grimshaw’s excellent KBOS version 2 – version 3 is payware and I hope they’ll some day bring it to X-Plane. Unfortunately there is some Z-thrash in the conversion.
If I could have a nickel for every time I get asked “should I buy X for X-Plane 10”, well, I’d at least have enough nickels to buy a new machine. But what new machine would I buy? What hardware will be important for X-Plane 10?
The short answer is: I don’t know, it’s too soon. The reason it’s too soon is because we have a lot of the new technology for version 10 running, but there’s still a lot of optimization to be done.
As I have posted before, the weakest link in your rendering pipeline is what limits framerate. But what appears to be the weakest link now in our in-development builds of X-Plane 10 might turn out not to be the weakest link once we optimize everything. I don’t want to say “buy the fastest clocked CPU you can” if it turns out that, after optimization, CPU is not the bottleneck.
One thing is clear: X-Plane 10 will be different from X-Plane 9 in how it uses your hardware. There has been a relatively straight line from X-Plane 6 to 7 to 8 of being bottlenecked on single-core CPU performance; GPU fill rate has stayed ahead of X-Plane pixel shaders (with the possible exception of massive multi-monitor displays on graphics cards that were never meant for this use). X-Plane 10 introduces enough new technology (instancing, significantly more complex pixel shaders, deferred rendering) that I don’t think we can extrapolate.
To truly tune a system for X-Plane 10, I fear you may need to wait until users are running X-Plane 10 and reporting back results. We don’t have the data yet.
I can make two baseline recommendations though, if you are putting together a new system and can’t wait:
Make sure your video card is “DirectX 11 class”. (This confuses everyone, because of course X-Plane uses OpenGL. I am referring to its hardware capabilities.) This means a Radeon HD 5000 or higher, or an NVidia GeForce 400 or higher. DirectX 11 cards all do complete hardware instancing (something X-Plane 10 will use) and they have other features (like tessellation) that we hope to use in the future. We’re far enough into DX11 that these cards can be obtained at reasonable prices.
Get at least a quad-core CPU. It won’t be a requirement, but we have been pushing to get more work onto more cores in X-Plane 10; I think we’ll start to see a utilization point where it’s worth it. The extra cores will help you run with more autogen during flight, cut down load time, and allow you to run smoother AI aircraft with the new ATC system.
Finally, please don’t ask me what hardware you need to buy to set everything to maximum; I’ve tried to cover that here and here.
I’m a bit behind on posting; I’ll try to post an update on scenery tools in the next few days. In the meantime, another “you see the strangest things when debugging pixel shaders” post.
(My secret plan: to drive down expectations by posting shader bugs. When you see X-Plane 10 without any wire-frames, giant cyan splotches, or three copies of the airplane, it’ll seem like a whole new sim even without the new features turned on!)
Hint: it might not be what you think! Vertex count isn’t usually the limiting factor on frame-rate (usually the problem is fill-rate, that is, how many pixels on screen get fiddled with, or CPU time spent talking to the GPU about changing attributes and shaders). But because vertex count isn’t usually the problem, it’s an area where an author might tend to “go a little nuts”. It’s fairly easy to add more vertices in a high-powered 3-d modeling program, and they seem free at first. But eventually, they do have a cost.
Vertex costs are divided into two broad categories based on where your mesh lives. Your mesh might live in VRAM (in which case the GPU draws the mesh by reading it from VRAM), or it might live in main memory (in which case the GPU draws the mesh by fetching it from main memory over the PCIe bus). Fortunately it’s easy to know which case you have in X-Plane (there’s a code sketch of the two cases right after this list):
For OBJs, meshes live in VRAM! (Who knew?)
For everything else, they live in main memory. This includes the terrain, forests, roads, facades, you name it.
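To make the two cases concrete, here is a hypothetical sketch in OpenGL terms – not X-Plane’s actual rendering path, just the general idea using GL 1.5-era calls:

```
#include <GL/gl.h>

/* Hypothetical sketch - not X-Plane's actual code. */
void draw_both_ways(const float * verts, int num_verts)
{
    glEnableClientState(GL_VERTEX_ARRAY);

    /* Case 1: the mesh lives in VRAM.  Upload it once into a buffer
       object; GL_STATIC_DRAW hints the driver to keep it on the card,
       so later draws read it straight from VRAM. */
    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, num_verts * 3 * sizeof(float),
                 verts, GL_STATIC_DRAW);
    glVertexPointer(3, GL_FLOAT, 0, (const void *) 0);
    glDrawArrays(GL_TRIANGLES, 0, num_verts);

    /* Case 2: the mesh lives in main memory.  With no buffer bound,
       the pointer is a client-side array, and the vertices cross the
       PCIe bus on every single draw call. */
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glVertexPointer(3, GL_FLOAT, 0, verts);
    glDrawArrays(GL_TRIANGLES, 0, num_verts);
}
```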
Meshes In VRAM
If a mesh is in VRAM, the cost of drawing it is relatively unimportant. My 4870 can draw just under 400 million triangles per second – and it’s probably limited by communication to the GPU. And ATI has created two new generations of cards since the 4870.
Furthermore, mesh draw costs are only paid when they are drawn, so with some careful LOD you can get away with the occasional “huge mesh” – the GPU has the capacity if not everyone tries to push a million vertices at once. (Obviously a million vertices in an autogen house that is repeated 500 times is going to cause problems.)
But there is a cost here, and it is the VRAM itself! A mesh costs 32 bytes per vertex (plus 4 bytes per index), so our million-vertex mesh is going to eat at least 32 MB of VRAM. That’s not inconsequential; for a user with a 256 MB card, we just used up 1/8th of all VRAM on a single mesh.
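Where do those 32 bytes come from? A plausible layout – my illustration; X-Plane’s real internal format may differ – is a position, a normal, and one set of texture coordinates, all stored as 4-byte floats:

```
/* A plausible 32-byte vertex - illustrative only. */
typedef struct {
    float xyz[3];   /* position            - 12 bytes */
    float nrm[3];   /* normal              - 12 bytes */
    float uv[2];    /* texture coordinates -  8 bytes */
} mesh_vertex;      /* 12 + 12 + 8 = 32 bytes total   */
```

One million of those is 32 MB before we even count the 4-byte indices.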
One note about LOD here: the vertex cost of drawing is a function of what is actually drawn, so if we have a million-vertex high LOD mesh and a thousand-vertex low LOD mesh, we only burn a huge chunk of our vertex budget when the high LOD is actually drawn; the rest of the time, the low LOD costs almost nothing.
But the entire mesh must be in VRAM to draw either LOD! A resource only has to be in VRAM while it is being drawn, but textures and meshes go into VRAM as a whole – all LODs at once. So we only save our 32 MB of VRAM by not drawing the object at all (e.g. when it is farther away than the farthest LOD).
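In OBJ8 terms, a two-LOD object looks something like this hypothetical fragment (ATTR_LOD takes the near and far viewing distances in meters, and TRIS references a range of the object’s index table):

```
# hypothetical OBJ8 fragment - two LODs for one object
ATTR_LOD 0 1000
# near mesh: 200,000 triangles (600,000 indices)
TRIS 0 600000
ATTR_LOD 1000 4000
# far mesh: 1,000 triangles
TRIS 600000 3000
```

Both meshes sit in VRAM whenever the object is drawn at all, no matter which LOD is in use.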
Meshes in Main Memory
For anything that isn’t an object, the mesh lives in main system memory and is transferred over the PCIe bus each time it needs to be drawn. (This is sometimes called “AGP memory” because this technique first became possible when the AGP slot was introduced.) Here we have a new limitation: we can run out of capacity to transfer data over the PCIe bus.
Let’s go back to our mesh: our million-vertex mesh takes around 32 MB, and it will have to be transferred over the bus each time we draw. At 60 fps that’s over 1.8 GB of data per second. A 16x PCIe 2.0 slot only has 8 GB/second of total bandwidth from the computer to the graphics card, so we just ate about 25% of the bus with our one mesh! (In fact, the real situation is quite a bit worse; on my Mac Pro, even with simple performance-test apps, I can’t push much more than 2.5 GB/second to the card, so we’ve really used 75% of our budget.)
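If you want to check my numbers, the back-of-the-envelope arithmetic looks like this (decimal units, same assumptions as above):

```
#include <stdio.h>

/* Bus-budget arithmetic for the million-vertex example above. */
int main(void)
{
    double mesh_gb = 1000000.0 * 32.0 / 1e9;   /* 0.032 GB per frame   */
    double per_sec = mesh_gb * 60.0;           /* ~1.92 GB/s at 60 fps */
    printf("bus traffic: %.2f GB/sec\n", per_sec);
    printf("of the 8 GB/sec PCIe 2.0 x16 spec: %.0f%%\n",
           per_sec / 8.0 * 100.0);
    printf("of ~2.5 GB/sec measured:           %.0f%%\n",
           per_sec / 2.5 * 100.0);
    return 0;
}
```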
On the bright side, storage in main memory is relatively plentiful, so if we don’t draw our mesh, there’s not a huge penalty. Careful LOD can keep the total number of vertices emitted low.
Take-Away Points
Non-OBJ vertex count is significantly more expensive than OBJ vertex count.
OBJ meshes take up VRAM; the high LOD takes up VRAM even when the low LOD is in use.
To reduce the VRAM cost of OBJ meshes, limit the total vertex count across all of the object’s LODs.
I don’t want to say anything and risk Murphy’s Law, but it looks like the CRJ will see the light of day after all.
I always enjoy seeing third party add-ons that really show what the rendering engine is capable of. Also, it’s good to know that Javier brushes his teeth. 🙂
More dubious screen-shots of in-development pixel shaders gone bad. This one was taken while working on full-screen anti-aliasing for X-Plane’s deferred renderer.
Deferred renderers cannot use the normal hardware-accelerated full-screen anti-aliasing (FSAA) that you’re used to in X-Plane 9; hardware FSAA smooths polygon edges as the geometry is rasterized, but in a deferred renderer the final lighting happens later, in a full-screen pass over a G-buffer where those edges are already just pixels. (This problem isn’t specific to X-Plane – most new first-person shooter games now use deferred rendering, so presentations from game conferences are full of work-around tricks.)
It looks like we will have a few anti-aliasing options for when X-Plane is running with deferred rendering (which is what makes global lighting possible): a 4x super-sampled image (looks nice, hurts fps), a cheaper edge-detection algorithm, and possibly also FXAA.
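As an illustration of the first option: the resolve pass for 4x super-sampling is conceptually tiny – render the scene at twice the width and height, then average each 2x2 block of texels down to one screen pixel. A hypothetical GLSL fragment shader for the resolve (not X-Plane’s actual shader) might look like this:

```
// Hypothetical GLSL sketch - resolve a 4x super-sampled scene.
// Assumes the vertex shader wrote gl_TexCoord[0].
uniform sampler2D scene_2x;   // scene rendered at 2x width and height
uniform vec2      texel;      // 1.0 / size of the big framebuffer

void main()
{
    vec2 uv = gl_TexCoord[0].st;  // center of the 2x2 texel block
    vec4 c  = texture2D(scene_2x, uv + vec2(-0.5, -0.5) * texel)
            + texture2D(scene_2x, uv + vec2( 0.5, -0.5) * texel)
            + texture2D(scene_2x, uv + vec2(-0.5,  0.5) * texel)
            + texture2D(scene_2x, uv + vec2( 0.5,  0.5) * texel);
    gl_FragColor = c * 0.25;      // average of the four samples
}
```

The catch is that everything upstream of the resolve – fill rate, VRAM, shading work – quadruples, which is why this option hurts fps.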
No discussion of OSM would be complete without a discussion of licensing and copyright. OSM makes this particularly complex because the project is in the process of changing its license from CC BY-SA to ODbL. For a full discussion, I recommend the OSM Wiki, but the short version is:
CC BY-SA (Creative Commons Attribution-ShareAlike) is a license that says that anyone can use a work and modify it, as long as attribution to the original author is maintained and the terms don’t change for the derived work. This isn’t particularly surprising; it’s a typical “open source” license, similar to the GPL.
ODbL (Open Database License) is a slight twist: with the ODbL, anyone can use the original data, but if you modify the data, you have to give back your changes. You don’t have to make your derived works open source though. (So under the ODbL if you build custom scenery out of OSM, your scenery doesn’t have to be open-source licensed, but if you ‘fix’ any roads, you need to submit the fixes back to OSM.)
OSM is CC BY-SA now and is working to switch to ODbL. Basically, their lawyers realized that CC BY-SA is great for images and text but doesn’t actually provide solid legal protection for databases (which is what OSM is). The ODbL will protect OSM as what it is – a “database”. Since OSM is a huge open project, the license change is going to take a long time, and lots of people will post lots of rants on lots of mailing lists in the process.
Here’s our plan: we are going to make the v10 global scenery abide by both the spirit and the legal requirements of both licenses. (At least, I hope we are going to try to do this. I am not a lawyer and wouldn’t mind if this was all a lot simpler.)
The version 10 DSFs will be CC BY-SA. This means that you can modify and redistribute the global scenery DSFs. In the past, we haven’t officially supported this, but we’ve allowed it to go on in the community. With version 10, modifying DSFs will be officially okay. (Please note that copying the sim DVDs will not be okay. You can modify our DSFs, but we are not inviting anyone to sell pirated DVDs! X-Plane is still a commercial product!)
We are encouraging all users to improve OSM directly in the OSM database – we will not accept modified DSFs as a way to fix roads. (This helps us comply with the ODbL.)
All of the scenery tools used to create the global scenery will be open source.
This last point is important because we do a lot of processing to the raw OSM data before we create the DSFs. Technically, we are required by the ODbL to “give back” those changes, but the truth is that I don’t think anyone really wants the hundreds of GB of temporary files we create as we process. So instead we will give back the tools that do that processing, so people can recreate our processed database as desired. From my discussions with OSM community members, this is apparently an acceptable way to “give back” our changes.
I should say that nothing is guaranteed here. Heck, it’s even possible that OSM will change its license in a way that screws up the whole global scenery project before we ship. (This is highly unlikely – I’m just saying that there is legal uncertainty with OSM that we haven’t had to deal with when using other data sources.) But I think we’ll be okay; X-Plane’s use of OSM (to create a mash-up of OSM plus other data sources like SRTM elevation to create a derived copyrightable work like a DSF) is definitely one of the use cases that OSM wants to make possible.
In my previous post I provided a brief description of how we’re going to use OpenStreetMap data in X-Plane 10. How do you get involved? Map your area; improve the quality of OpenStreetMap where you live.
If roads or water bodies are missing, incomplete, or wrong, fix them! If they’re missing information, add it.
Please respect OpenStreetMap’s community. We are new guests in their house. Take a little time to learn about OSM and how their work proceeds.
Get in touch with local OSM chapters for your area; there may be non-X-Plane OSM users working in your area already.
Please create high quality mapping data, not just data that is useful for X-Plane. For example, X-Plane does not use street name information, but nearly everyone else who uses OSM does. So when you add a missing street, please add its name too, not just the road type.
When working on OSM, please try to make OSM match reality – don’t worry about what “looks good” in X-Plane. OSM should match real life, and X-Plane should do its best to recreate this view. Please do not hack OSM data to make X-Plane look better!!!! (I can’t think of any case where you’d want to do this – the scenery creation process works best when the data is accurate.)
A brief note to users in the United States: the US is a little bit different from other OSM countries because we have more free data than Europe. As a result, the US OSM data has been “seeded” with imports of data like TIGER and NHD. (For more on this, I recommend Steve Coast’s SOTM.US keynote video.) The result is that while unmapped areas in other countries tend to be empty, unmapped areas in the US are often filled in with data that is present but not particularly good.
So if you live in the US, take a look at your home town. Some of the most common problems are: incorrect road types, incorrect one-way information, missing bridges, and missing water bodies. To meet the level of quality that OSM already has in Europe (have you seen what the Germans have mapped?!?!), the imported free data needs a scrubbing by real human beings who know the area.