This blog post is more or less a “stern talking to” for plugin developers, but before I go there, I want to acknowledge that we (Laminar) screwed up the docs here in a way I didn’t even realize until working on a bug report. A decade ago when the plugin SDK was young, we did have clear docs that the plugin APIs were not at all thread-safe. Sandy and I were also on the plugin dev email list and we’d administer a Scottish-style beating to anyone who even started typing the first few letters of “threading”.
However, this document was lost in the migration from the old XSquawkBox server to developer.x-plane.com. I’m working with Jennifer on new docs now, and I apologize for the thrash any plugin developer gets hit with from not knowing the threading guidelines.
With that in mind: the X-Plane plugin API is not thread-safe. You can only call plugin APIs on the thread that called you. No exceptions!
The plugin SDK was invented for X-Plane 6. At the time, multi-CPU and multi-core hardware was totally unavailable to the flight sim community. Apple hadn’t released the dual-G5 yet, and the Pentium D hadn’t come out either. There was no point in thinking about multi-core because there weren’t multiple cores.
The plugin guidelines were therefore set up very simply: the API is single threaded; call us back on the thread we call you. The SDK internally has no locks or handling of re-entrancy and has no model to cope with resource sharing or data integrity across threads.
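If you want to catch violations of the "call us back on the thread we call you" rule in your own code, one option is a small debug check. The sketch below is only an illustration: `MainThreadGuard` and its methods are hypothetical names, not part of the XPLM. It assumes you call `capture()` once from a callback the sim delivers to you, and then call `require_main_thread()` before touching the SDK.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper, NOT part of the XPLM: remembers the thread the sim
// called the plugin on, so later code can check it is still on that thread.
class MainThreadGuard {
public:
    // Call once from a callback delivered by the sim (assumption: that
    // callback runs on the thread the sim uses for all plugin callbacks).
    void capture() { main_id_ = std::this_thread::get_id(); }

    // True only when invoked on the captured thread. Before capture() has
    // run this is always false, because a default thread id matches no thread.
    bool on_main_thread() const {
        return std::this_thread::get_id() == main_id_;
    }

    // Debug-build tripwire to place in front of any SDK call.
    void require_main_thread() const { assert(on_main_thread()); }

private:
    std::thread::id main_id_{};
};
```

A check like this costs one thread-id comparison per call, so it is cheap enough to leave in debug builds while you hunt down stray worker-thread calls.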
Furthermore, X-Plane’s crash detection system is not meant to categorize other-thread crashes, so you can’t easily tell that your plugin is crashing the sim. You might even crash a different plugin, or some internal part of X-Plane.
One more way we’re not thread safe that might not be obvious: when you read a dataref, you are executing code from another plugin. You can’t do this on a thread if the XPLM isn’t thread-safe, but you also can’t do this if the other plugin isn’t thread safe.
Stopping the Bleeding
In X-Plane 11.30 we made a change to stabilize the sim: we added code to actively detect and ignore plugin calls to the XPLMTerrainProbe APIs from background threads. We did this after seeing automatically reported crashes that turned out to be due to plugins calling the SDK terrain probe API from worker threads. By ignoring the call, we avoid the crash.
My plan is to start doing this for all plugin calls at a patch point in the next few months. The problem here is that there’s basically no such thing as a benign threading bug – what is at stake is the complete destabilization of the sim.
If you are a plugin author and you are using background threads to call XPLM APIs, please stop doing this now. Please plan to fix this in your plugin as soon as possible. The change will probably not make things any worse – my current idea is to no-op the calls, just like we did with terrain probes. But if your plugin is using these async calls and sometimes succeeding but sometimes crashing, you’re going to stop seeing the crashing and the sometimes lucky “success”.
I’m working with Tyler and Jennifer on a docs update now – hopefully this week all of the docs should be completely consistent. But they’re going to say what I’ve said above: no calls to the XPLM on worker threads.
Will We Ever Be Thread-Safe?
Sandy and I did some work back in the XPLM 2.0 days to start making the SDK partly thread-safe. This work was not completed, but the idea was to at least make a small number of APIs callable from worker threads. For example, by making flight loop callbacks schedulable from worker threads, a plugin could “wake up” SDK code from an async IO callback. That idea still makes some sense and we may get there someday.
For expensive tasks, we’ve already made API changes to address the underlying performance problems. For example, you can’t load an object synchronously from a thread, but you can ask us to load it asynchronously and call you back, and we use one of our threads to offload the work.
Some APIs I expect to never be thread safe. For example, we can’t sanely provide a threaded API to datarefs because we can’t promise that the plugin on the other side of the call is thread-safe. Given that a dataref read function can call other datarefs or other arbitrary plugin code, the opportunity for dead-locks is limitless.
Tyler’s Fever Dream
I should mention something that Tyler has been looking at as a future SDK initiative. This is somewhere between pie-in-the-sky and a fever-dream: plugins running asynchronously in separate processes.
The idea is to have a version of the main SDK APIs that are “fundamentally asynchronous” (e.g. the response comes later and that’s baked into the contract). This would allow plugins to run in another process, with results being communicated via IPC. Out-of-process plugins would have a bunch of advantages:
- You could write a plugin in pretty much any language we can write a binding for. Right now, plugins must fit into an unmanaged DLL inside X-Plane; this would allow the full “async API” to be used in .NET or even MATLAB. The requirement would only be that the plugin “app” environment be able to host unmanaged C code. This is a much less difficult requirement to meet.
- Plugins would be isolated by process boundaries; if a plugin crashes, the sim keeps running. Plugins can’t go around trashing each other’s memory.
- Use of multiple cores for plugin CPU processing is basically free, because the plugins can execute in parallel.*
- Plugins can each have their own CEF instance – or any other library that doesn’t like to be multiply instantiated in a single process.
One of the major hurdles to implementing an API like this was sharing graphics across the process boundary. As it turns out, we have to solve this problem for Vulkan anyway – the same APIs that share surfaces between Metal/Vulkan and OpenGL are IPC-friendly from day 1. So a 2-d plugin OpenGL window in Vulkan doesn’t have to be in X-Plane’s process.
I don’t see async plugins ever replacing 100% of in-process plugins, and this isn’t a plan to kill off the current C API. But I do think out-of-process plugins would be a much better fit for a wide range of plugin tasks.
* Users familiar with game programming will appreciate one of the fundamental dangers here: plugins in other processes might steal oversubscribed cores from the sim itself, causing stutters and unreliable framerate. Tyler and I have talked about that a little bit, and have some ideas for taking plugin background work into account and giving plugins ways to opt in and say “please run this background work for me at a time that won’t hose framerate.”
So are you saying it is still okay to do async work, as long as all interaction with the X-Plane SDK is done on the main/default thread?
Precisely. You have to ‘marshal’ your results back to the main thread yourself, e.g. with a thread-safe message queue that you read without blocking on the main thread every frame or every few frames.
Yup, this is the approach I’m taking with a plugin I’m working on that converts incoming UDP OSC messages into command executions: one thread listens for the messages, interprets them, and pushes an abstraction of the appropriate command (literally the GoF command pattern, I realized as I was writing this) onto the thread-safe queue, and the X-Plane thread pops them off that queue and executes them during the flight loop callback.
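For anyone who wants to see the shape of that queue, here is a minimal sketch in C++. Everything in it is an assumption for illustration: `CommandQueue`, `start_listener` and the counter are hypothetical names, not SDK code. The idea is that workers only ever call `push()`, and your flight loop callback calls `drain()` on the main thread, where the queued commands are free to call the XPLM.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper, NOT part of the XPLM SDK.
class CommandQueue {
public:
    using Command = std::function<void()>;

    // Safe to call from any thread (e.g. a network listener).
    void push(Command cmd) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(std::move(cmd));
    }

    // Call only on the main thread, e.g. from a flight loop callback.
    // It never waits for new work: the lock is held just long enough to
    // swap out whatever is queued, then the commands run unlocked.
    std::size_t drain() {
        std::queue<Command> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(batch, pending_);
        }
        std::size_t executed = 0;
        while (!batch.empty()) {
            batch.front()();  // runs on the caller's (main) thread
            batch.pop();
            ++executed;
        }
        return executed;
    }

private:
    std::mutex mutex_;
    std::queue<Command> pending_;
};

// Example producer: a worker that touches only the queue, never the SDK.
// The counter is modified only when the commands run inside drain().
std::thread start_listener(CommandQueue& queue, int& executed_count) {
    return std::thread([&queue, &executed_count] {
        for (int i = 0; i < 3; ++i) {
            queue.push([&executed_count] { ++executed_count; });
        }
    });
}
```

Swapping the whole queue out under the lock keeps the time the main thread can be held up by a worker very short, and it means a command that pushes more commands cannot stall the current frame.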
I’m surprised: We have particle systems, but no thread-safe plugin API.
Maybe the plugin API is not multi-threaded because once X-Plane was more or less a one-man-show. 😉
But AFAIK Windows uses LPCs (Local Procedure Calls, like RPCs, but local) to manage distributing the load on multiple threads. When Linux went SMP, it used one big fat mutex lock, so only one thread could enter the kernel at a time. Over time they had a few locks per subsystem, and (AFAIK) today the kernel is massively parallel inside. I know almost nothing about MacOS, but if it’s still based on Mach, it should have message passing too. In Linux there also exist some message passing interfaces.
Maybe that’s the road to go: Instead of calling code directly, establish some local communication port where threads can drop their requests and pick up their responses, while on the other side the “X-Plane server” can decide whether to process the requests one-by-one or in parallel. If you go from a client (plugin) / server (X-Plane) model to a fully symmetric model of processing requests (plugins can also process requests and send responses), you could go gradually from single-threaded to multi-threaded (as I see things).
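A plugin can already approximate this "port" idea inside its own code today. The sketch below is hypothetical (`RequestPort` is not an SDK type, and this is not how X-Plane works internally): workers post a request and receive a `std::future`, and the main thread services the queue, for example once per flight loop callback, which is the only place the request body would be allowed to touch the XPLM.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical "port", NOT part of the XPLM: workers drop requests here and
// pick up responses through futures; the main thread does the actual work.
class RequestPort {
public:
    // Any thread: enqueue a request and get a future for its result.
    std::future<double> post(std::function<double()> request) {
        std::packaged_task<double()> task(std::move(request));
        std::future<double> result = task.get_future();
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
        return result;
    }

    // Main thread only: run everything queued so far, fulfilling the futures.
    void service() {
        std::queue<std::packaged_task<double()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(batch, tasks_);
        }
        while (!batch.empty()) {
            batch.front()();  // executes the request on the servicing thread
            batch.pop();
        }
    }

private:
    std::mutex mutex_;
    std::queue<std::packaged_task<double()>> tasks_;
};
```

One caveat with this design: a worker that blocks in `get()` only wakes up once the main thread has called `service()`, so the main thread must never wait on that worker (for example by joining it) before servicing the port, or the two will deadlock.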
This is a great thread…erm, post. 😉
Thanks for giving the SDK some love and a mention in the blog. A lot of this is a bit over my head, and I’ve never been guilty of even trying to create a worker thread. So I’m safe!
Even XPSDK 3.x is getting long in the tooth, though, and there are many things that could be enhanced. Philipp explained the reasons for not updating XPLMNavigation back at FlightSimExpo, but for the developer community to keep bringing out add-ons that fit the current state of the sim, the SDK really needs to be brought up to date – not just in multi-threading. Given Tyler’s dreams (or nightmares?) it does seem like LR is not leaving the SDK in the dust, and that’s encouraging. Thanks for that!
Matlab?! Bless ya’ll’s hearts (and those of any grad students in that wake). “Let them eat VBA”!
Doesn’t the flight loop wait for each plugin to finish? Serially? When I inject a sleep(1) into a plugin, FPS = 1 as well. Ergo, if the master plugin thread forces a join after spawning a batch in which each worker is known to invoke a different API call, where is the harm?
I suppose another way of asking is: is the worker that loops through the plugins not the only concurrently running thread in the xplane universe even though what my eyes see on the screen doesn’t update?
The flight loop does not _wait_ for plugins (where by “wait” I mean sleep the thread by blocking on an OS sync mechanism like a semaphore). Plugins are executed _synchronously_ on the main thread non-concurrently to the FM, so your plugin callback’s execution time is part of the “critical path” of the main thread through the sim loop.
There are concurrent worker threads in X-Plane and they are able to do a variety of background work, some of which is picked up within the frame and some of which is picked up after several frames. Plugins do not have access to this mechanism, except when calling XPLMLoadObjectAsync – in that case, the work of loading and building the OBJ file is done on this background pool and picked up later.
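To make the "critical path" point concrete, here is a toy model of a frame. It is only a sketch under stated assumptions: `run_frame` is a hypothetical stand-in, not X-Plane's actual sim loop. It runs each callback one after another on the calling thread, so the frame cannot finish sooner than the sum of the callback times.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for one sim frame: every plugin callback runs
// synchronously, in turn, on the calling thread. Returns the elapsed time.
std::chrono::milliseconds run_frame(
        const std::vector<std::function<void()>>& callbacks) {
    const auto start = Clock::now();
    for (const auto& callback : callbacks) {
        callback();  // no concurrency: the next one starts when this returns
    }
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        Clock::now() - start);
}
```

With one callback that sleeps for 20 ms and another that sleeps for 30 ms, the frame takes at least 50 ms, which is why a `sleep(1)` in a single plugin drags the whole sim down to roughly 1 fps.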
Does this mean we may or may not make XPLM calls inside a flight loop callback now?
You _may_ call any XPLM call from inside any XPLM callback, because the callbacks are always delivered on the main thread. The rule of the XPLM is “call us only on the thread on which we call you”, and in practice that thread has always been the main thread. We are probably never going to change that because UI callbacks naturally come from the main thread, so if we have to pick one, that’s the sane one to pick.
So from a flight loop callback, you can call XPLM routines to read/write datarefs, create windows, etc.
Way back in the day, there were _further_ restrictions, e.g. if you registered a flight loop from inside another flight loop callback, the XPLM would intermittently blow up; we had a lot of fine-print guidelines. All of that was fixed in XPLM 2 for X-Plane 9; for v9, 10 and 11, you can call any other XPLM call from any XPLM callback. (I think this even includes deleting an object from within its own callback.)
Unfortunately, an asynchronous interface causes a lot of extra load on the processor. I experience this daily with FSX/Prepar3D SimConnect. As the developer of the plugin FSTramp, I am even thinking about partially abandoning the SimConnect interface and switching to the simple FSX gauge interface to read and write variables. This is similar to XPLM and therefore fast.
What makes you say that an asynchronous interface would cause “a lot of extra” load on the processor?
The achieved frame rate is crucial. If I deactivate SimConnect in my plugin, the frame rate increases by one frame per second. Since my plugin costs 1.5 frames per second in total, that’s a big part. By contrast, the share taken by the map window, drawn at 15 Hz, is negligible.
Right – but this is a data point specific to SimConnect. There’s nothing inherent in the SimConnect design that would make it fundamentally more expensive than synchronous plugins.
I hope XPUIPC isn’t using background threads to make API calls as it’s no longer developed, so unlikely to get an update/fix, and several useful tools rely on it.
Out-of-process plugins are a great idea! No more plugin interference and incompatibilities! Please, please, make it happen!! I always wanted to write a plugin that runs in a separate process, but the inability to share a GL context prevented me from doing so.