Really Really Really Really Boring Stuff Archives - X-Plane Developer
https://developer.x-plane.com
Developer resources for the X-Plane flight simulator

Have You Heard the Good News About Elixir?
Mon, 18 Jan 2021

The post Have You Heard the Good News About Elixir? appeared first on X-Plane Developer.

[This post is a “behind the scenes” look at the tech that makes up the X-Plane massive multiplayer (MMO) server. It’s only going to be of interest to programming nerds—there are no takeaways here for plugin devs or sim pilots.]

[Update: If you’re interested in hearing more, I was on the ThinkingElixir podcast talking about this stuff.]

In mid-2020, we launched massive multiplayer on X-Plane Mobile. This broke a lot of new ground for us as an organization. We’ve had peer-to-peer multiplayer in the sim for a long time, but never server-hosted multiplayer. That meant there were a lot of technical decisions to make, with no constraints imposed by existing code.

Requirements

We had a few goals from the start:

  1. The server had to be rock solid. We didn’t want a tiny error processing some client update to bring down the whole server for everyone connected.
  2. We wanted a single shared world[1]. Functionally, this means the language/framework we chose would need to have a really good concurrency story, because it would need to scale to tens of thousands of concurrent pilots.
  3. We wanted quick iteration times. We couldn’t be sure how well MMO would be received by users, so we wanted the initial investment in it to be just enough to validate the idea.
  4. It needed to be fast. Multiplayer has a “soft real time” constraint, so we needed to be able to service all clients consistently and on time. (Quantitatively, this means our 99th percentile response times matter a lot more than the mean or median.)
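To see why the tail matters more than the average, here is a small C sketch using made-up numbers (not measurements from the X-Plane server): 100 hypothetical update timings, 98 of them at 10 ms and two at 500 ms. It uses the nearest-rank definition of a percentile. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile: the smallest sample such that at least p% of
 * all samples are <= it. Sorts `samples` in place. */
double percentile(double *samples, size_t n, unsigned p) {
    assert(n > 0 && p >= 1 && p <= 100);
    qsort(samples, n, sizeof *samples, cmp_double);
    size_t rank = ((size_t)p * n + 99) / 100;  /* ceil(p * n / 100), 1-based */
    return samples[rank - 1];
}

/* Arithmetic mean of n samples. */
double mean(const double *samples, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += samples[i];
    return sum / (double)n;
}
```

With those hypothetical timings, the mean is 19.8 ms and the median is 10 ms, both comfortably fast, while the 99th percentile is 500 ms: 2% of updates take 50 times longer than the typical one, and neither the mean nor the median shows it.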

Choosing a Language

From those requirements, we could draw a few immediate conclusions:

  • The requirements for stability and fast iteration time ruled out C++ (or, God help us, C). Despite having a lot of institutional knowledge about those languages, they’re slower to develop in than modern “web” languages, and a single null pointer will bring down the entire system. (Ask me how I know. 😉 )
  • The speed & scalability requirements ruled out a lot of modern web languages like Ruby, where the model for scaling up is generally “just throw more servers at it.” We didn’t want to (forever!) pay the development cost of synchronizing multiple machines across a data center—that’s a drag on both dev time and client latency.

This eventually led me to a few top contenders:

  • Rust
  • Go
  • Elixir

Each of these languages has a solid concurrency story. Rust would probably be the fastest & most scalable, at the cost of developer productivity. But Elixir had one major thing that neither Rust nor Go could touch: fault tolerance built into the very core of the platform.

Elixir has this concept of running code in lightweight, separate “processes.” These are emphatically not OS processes: under the hood, they’re just a data structure in the Erlang/Elixir VM (called the BEAM). One of the core ideas of Elixir processes is that they’re expendable: a crash in one process doesn’t affect other processes, unless those processes explicitly depend on the crashing one. So, consider a process tree structured like this (apologies for my ASCII art):

           UDP Server                ______________________
          /    |    \               | Spatial Data Store 1 |
         /     |     \               ----------------------
        /      |      \              ______________________
       /       |       \            | Spatial Data Store 2 |
 Client 1  Client 2 ... Client n     ----------------------
                                              ...
                                     ______________________
                                    | Spatial Data Store n |
                                     ----------------------

A crash in the Client 1 process affects only that client’s connection—not Client 2, nor the UDP server itself. Likewise, a crash in Data Store 1 (in our case, we partition the data in memory based on each plane’s spatial location) doesn’t affect the data in any other data store.

(Of course, a crash in the base UDP Server would still destroy all client connections—there’s no getting around that, so we try to minimize the work the UDP server itself does.)
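The post doesn’t say how planes are assigned to spatial data stores. Purely as a hypothetical illustration, a fixed latitude/longitude grid would do it: each cell maps to one store, so a crash in one store loses only the planes in that cell. The 10° cell size and the function name below are assumptions, not X-Plane’s actual scheme.

```c
#include <assert.h>

/* Hypothetical: map a position to a grid cell index in
 * [0, (360 / cell_deg) * (180 / cell_deg)). Each index would name one store. */
int spatial_partition(double lat_deg, double lon_deg, int cell_deg) {
    assert(cell_deg > 0 && 180 % cell_deg == 0);
    int cols = 360 / cell_deg;
    int rows = 180 / cell_deg;

    /* Shift longitude into [0, 360) so that -180 and +180 land in the same column. */
    double lon = lon_deg + 180.0;
    while (lon < 0.0) lon += 360.0;
    while (lon >= 360.0) lon -= 360.0;

    /* Clamp latitude to [-90, 90], then shift it into [0, 180]. */
    double lat = lat_deg < -90.0 ? -90.0 : (lat_deg > 90.0 ? 90.0 : lat_deg);
    double lat_shifted = lat + 90.0;

    int col = (int)(lon / cell_deg);
    int row = (int)(lat_shifted / cell_deg);
    if (row >= rows) row = rows - 1;  /* the North Pole belongs to the top row */
    if (col >= cols) col = cols - 1;  /* guard against rounding at the 360 boundary */
    return row * cols + col;
}
```

With 10° cells there are 36 × 18 = 648 stores; a plane over Seattle (47.6° N, 122.3° W) lands in cell 473. A real server would also have to handle planes crossing cell boundaries and lookups that span neighboring cells, which this sketch ignores.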

This makes an Elixir server extremely fault tolerant. And that’s paid off for us in practice. In the last 30 days, we’ve had ~2,000 crashes in client connections (usually because of garbled UDP packets)—each of these required the client to reconnect behind the scenes, but it didn’t affect any other clients. To date, we’ve never had a crash high enough in the process tree to disconnect multiple clients or lose data, and I don’t really expect we will.

The other thing this process architecture makes possible is fair scheduling of clients against each other: if you have 10,000 clients, and one of them for whatever reason takes 10 seconds to process an update, that client won’t be allowed to bogart a hardware thread: it’ll be preempted after a small, fixed budget of work (the BEAM counts “reductions,” roughly function calls) so another process can be scheduled. That makes it a lot easier for us to keep response times consistent even in the face of unexpected issues in the wild.
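The fairness idea can be shown with a deliberately simplified model: round-robin scheduling where each process gets a fixed budget of work units per turn before it goes to the back of the line. This is not the BEAM’s real scheduler (which counts “reductions,” runs one scheduler thread per core, and balances work between them); the struct, budget, and numbers here are hypothetical.

```c
#include <assert.h>

/* A toy "process": just an amount of remaining work. */
typedef struct {
    const char *name;
    int work_left;    /* units of work still to do */
    int finished_at;  /* simulated time when it completed */
} ToyProc;

/* Round-robin with a per-turn budget: each runnable process does at most
 * `budget` units, then yields to the next one. Returns the total elapsed time. */
int run_round_robin(ToyProc *procs, int n, int budget) {
    assert(budget > 0);
    int clock = 0;
    int runnable = 0;
    for (int i = 0; i < n; i++) {
        assert(procs[i].work_left >= 0);
        procs[i].finished_at = 0;
        if (procs[i].work_left > 0) runnable++;
    }
    while (runnable > 0) {
        for (int i = 0; i < n; i++) {
            ToyProc *p = &procs[i];
            if (p->work_left == 0) continue;  /* already done */
            int slice = p->work_left < budget ? p->work_left : budget;
            p->work_left -= slice;
            clock += slice;
            if (p->work_left == 0) {
                p->finished_at = clock;
                runnable--;
            }
        }
    }
    return clock;
}
```

With one slow client needing 1,000 units and two quick ones needing 3 each, a budget of 4 lets the quick clients finish at times 7 and 10. Run-to-completion (a budget of 1,000 or more) would make them wait until 1,003 and 1,006. The total work is the same either way; preemption only changes who waits.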

The result of all this is that we can support thousands of clients on a single off-the-shelf cloud VM instance, with great reliability. Developer productivity has never been better, either—I went into this knowing zero Elixir, and by the time I had worked through the official Getting Started tutorial, I felt confident enough in the language to dive in.

Surprises with Elixir: The Bad

This wouldn’t be an honest post-mortem if I didn’t talk about the ways in which Elixir didn’t live up to its hype.

First, despite all the tools the Elixir ecosystem has to support multi-node distributed systems (i.e., a cluster of servers), this is never going to be easy if you need synchronization between them. I’m not aware of any platform that does it better, but this is something the Elixir community kind of oversells. Everybody wants to talk about multi-node clusters, but the reality is that (at least at our scale), we didn’t actually need multi-node support, and it would have been utterly foolish to pay the very high dev costs to build it in from the start. If we ever need to support orders of magnitude more concurrent pilots, we’ll do so by moving to a bare-metal, 64-core machine or something… not by spinning up dozens of 4-core VMs.

The same goes for zero-downtime deployments. Again, the community loves to talk about this, and it is indeed really cool that the BEAM supports it. (I don’t know of any other platform where this is possible!) But just like multi-node clusters, there’s a very high dev cost to making this work, and you probably don’t need it. In our case, we’re just doing blue/green deploys: we migrate client traffic from the old server to the new one when you start your next flight.

The biggest shortcoming we encountered in practice was Elixir’s package ecosystem. To be fair to Elixir, it’s actually way broader than reading comments on the internet had led me to believe, but it still doesn’t hold a candle to NPM or pip (both for better and for worse). This meant I had to implement the UDP protocol we use for game state sync (RakNet) from scratch[2]. That was time consuming, but not too terrible. (Of course, I come from the C++ world, where “implement it from scratch” is the default!)

The last pain point I had was with IDE integration. As somebody who uses JetBrains exclusively for all my development work (C++, Objective-C, Android, Python, PHP, Node.js, etc.), it pains me that there’s not a first-party Elixir IDE. The community intellij-elixir plugin is really good for a community plugin, but in no way will it make you think it’s natively supported. Booting up the debugger can take literally minutes on our project—the debugger is effectively useless, and I use test harnesses or logger debugging almost exclusively.

Surprises with Elixir: The Good

There were a few really amazing things I encountered in working with Elixir that I didn’t really expect from just reading about it on the internet.

  1. Elixir’s support for integrating with Python, C, and other languages that can talk to C wound up being really valuable. This is great for leveraging libraries written in other languages (though it’s not really suitable for use in our real-time updates due to the inherent cost of marshalling data between the two languages). This let me use a METAR parser written in Python from within Elixir, without having to do painful things like call a system process, ask the Python script to write to disk, then read from disk.
  2. Pattern matching (and more broadly, the general principle in functional languages of working on the “shape” of the data rather than explicit strong types) is intoxicating. It just leads to such simple, straightforward code! This is one of those things that once you experience it, you start conceiving of all programming problems in these terms, and it’s hard to go back to a language without it.
  3. It’s so nice to have an all-Elixir stack. I’ve written web apps in other languages (PHP, Node, a bit of Ruby), so I was very used to depending on external technologies for core functionality: HTTP servers, caching layers, Cron jobs, etc. Saša Jurić’s outstanding talk The Soul of Erlang and Elixir has a slide that really sums up how Elixir can serve as a web service unto itself. Now, to be clear, is Elixir’s version of these tools as fully featured as the alternative, standalone version? Probably not. But for X-Plane’s use case, we’ve not found any shortcomings, and without being an expert in Redis/Cron/PM2/whatever, I couldn’t actually tell you what Elixir’s version of this stuff is lacking. And that’s the point, really: instead of needing expertise in a bunch of different tools, you can learn one (i.e., Elixir) really well.

Want to Get Started with Elixir?

If all this is intriguing enough to make you want to dive into Elixir, I can recommend a few resources:

  1. The best place to start is the Saša Jurić talk linked above. This gives an overview of the philosophy of Elixir (and Erlang, which it’s built on). It’s a great introduction to the high-level concepts you’ll build everything else on top of.
  2. Next, go through the official Getting Started tutorial. I’ve never seen a language’s first-party documentation as good as Elixir’s. You could honestly read this alone and have enough knowledge to write production services.
  3. Saša Jurić’s book Elixir in Action. I’ve read most of this for a deeper dive into the language, and while it’s not necessarily required reading beyond the official tutorial, I did find it valuable.

Thoughts, questions, comments? You can drop them in the comments below, or hit me up on Twitter.

[1] Long term, we might actually want to split the world’s traffic into multiple servers (e.g., one for Europe, one for the Americas, etc.), since no amount of technical tricks can eliminate the latency of sending a packet from, say, Sydney to New York. For the initial release, though, we could deal with the latency, and we wanted the option of hosting tens of thousands of players on a single server.

[2] We recently open sourced the RakNet protocol implementation—you can find it in the X-Plane GitHub. The README gives a good overview of the full MMO server’s architecture, too: each client connection is a stateful Elixir process, acting asynchronously on a client state struct; clients asynchronously schedule themselves to send updates back to the user.

XPLMInstance: Two Tricks
Sat, 25 Apr 2020

The post XPLMInstance: Two Tricks appeared first on X-Plane Developer.

This post is just targeted at plugin developers who are modernizing their object drawing – if you don’t write plugin code, the Cincinnati Zoo has been showing their animals on YouTube – it’ll be a lot more entertaining than this post. (An XPLMInstance cannot tunnel down two feet in fifteen seconds – one point for the zoo animals.)

XPLMInstance creates a persistent object inside X-Plane that is visible in the 3-D world. It changes how you draw from “run some drawing code every frame” to “tell X-Plane that there is a thing and update its data every now and then.”

Instancing is actually a lot easier than draw callbacks! But there are two tricky gotchas:

1. You must create the custom DataRefs for your OBJ’s animation before you load the object itself with the SDK. (If the DataRefs do not exist at load time, the animations are disabled as “unresolved to any DataRef”.)

2. When you create the instance, make sure your custom DataRefs are on the list of DataRefs for that instance.

Here’s the really baffling thing: if you create the custom DataRef and then add it to the instance’s list, your DataRef callbacks will not be called.

Wha?

Here’s the trick: the DataRef you register is a global identifier, allowing the object to refer to what it wants to listen to. That’s why you have to create the DataRef – so that the identifier exists.

But when you create an instance, each instance gets its own memory holding a separate copy of those DataRefs’ values.

For example, let’s say you have a truck with four DataRefs, and you make five instances. X-Plane allocates 20 slots (four DataRefs times five instances) to store five copies of each DataRef’s values.

The instances never look at the DataRef itself. They only look at their local copies. That’s why, when you push different data to each instance with XPLMInstanceSetPosition, each instance animates with its own values: each instance looks at its own local data.

This is also why you won’t see your DataRef callbacks called (unless you use DataRefEditor or some other tool). The object rendering engine isn’t looking at the DataRefs themselves, it’s looking at the local copies.

In other words, XPLMInstance turns DataRefs from the pull model you are used to (X-Plane pulls on your read function to get the value) into a push model (you push values into the instance’s memory with XPLMInstanceSetPosition).

This implies two things about your add-on:

  • It doesn’t really matter what your DataRef read functions do – they can just return zero, and
  • You can’t use tools like DataRefEditor or DataRefTool to debug your animations. (That didn’t work well in legacy code either, but it really won’t work now.)

If you try the obvious optimization of not creating your custom DataRefs at all (“hey, no one calls them”), you will find that animation just stops working. This is because we need the DataRef to exist as that global identifier to match your instance data with the animations of the object itself.

One last note: if your old code used sim/graphics/animation/draw_object_x/y/z to determine which object was being animated (from inside a plugin “get” function) you do not need to do this anymore. Because each instance has its own local copies and your DataRef function isn’t called, this technique is obsolete.

In summary:

  • You must register custom DataRefs.
  • Their callbacks can just return 0; the renderer never calls them.
  • Always list your custom DataRefs for animation when you create an instance.
  • Do not use draw_object_x/y/z; use XPLMInstanceSetPosition to give each instance its own animation values.
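The rules summarized above are easier to see in a toy model. The C below is not the X-Plane SDK and not how X-Plane is implemented internally; every type and function in it (ToyObject, toy_load_object, and so on) is hypothetical. It mimics the behavior the post describes: animations are matched to DataRefs by name when the object loads, each instance gets a private slot per listed DataRef, and drawing reads those slots rather than calling any accessor. In the real SDK the corresponding steps are registering the DataRef (for example with XPLMRegisterDataAccessor), loading the OBJ with XPLMLoadObject, calling XPLMCreateInstance with a NULL-terminated array of DataRef names, and calling XPLMInstanceSetPosition with a float array in the same order as that list.

```c
#include <assert.h>
#include <string.h>

#define MAX_REFS 8

/* Toy registry of DataRef names that currently exist (stands in for registering a DataRef). */
static const char *g_registered[MAX_REFS];
static int g_n_registered = 0;

void toy_register_dataref(const char *name) {
    assert(g_n_registered < MAX_REFS);
    g_registered[g_n_registered++] = name;
}

static int toy_dataref_exists(const char *name) {
    for (int i = 0; i < g_n_registered; i++)
        if (strcmp(g_registered[i], name) == 0) return 1;
    return 0;
}

/* Loading resolves each animation's DataRef name once; unresolved ones stay disabled. */
typedef struct {
    const char *anim_ref[MAX_REFS];
    int enabled[MAX_REFS];
    int n_anims;
} ToyObject;

ToyObject toy_load_object(const char *const *anim_refs, int n_anims) {
    assert(n_anims <= MAX_REFS);
    ToyObject obj = {0};
    obj.n_anims = n_anims;
    for (int i = 0; i < n_anims; i++) {
        obj.anim_ref[i] = anim_refs[i];
        obj.enabled[i] = toy_dataref_exists(anim_refs[i]);  /* trick 1: must exist before load */
    }
    return obj;
}

/* Each instance owns one float per DataRef listed at creation time. */
typedef struct {
    const ToyObject *obj;
    int slot_for_anim[MAX_REFS];  /* index into values[], or -1 if not listed */
    float values[MAX_REFS];
    int n_values;
} ToyInstance;

ToyInstance toy_create_instance(const ToyObject *obj, const char *const *listed, int n_listed) {
    assert(n_listed <= MAX_REFS);
    ToyInstance inst = {0};
    inst.obj = obj;
    inst.n_values = n_listed;
    for (int a = 0; a < obj->n_anims; a++) {
        inst.slot_for_anim[a] = -1;
        for (int s = 0; s < n_listed; s++)  /* trick 2: wire anims to listed names */
            if (strcmp(obj->anim_ref[a], listed[s]) == 0) inst.slot_for_anim[a] = s;
    }
    return inst;
}

/* Push model: copy one float per listed DataRef into this instance's slots. */
void toy_set_values(ToyInstance *inst, const float *data) {
    memcpy(inst->values, data, (size_t)inst->n_values * sizeof(float));
}

/* What drawing sees for animation `a`: this instance's private copy, never an accessor call.
 * In this toy, a disabled or unlisted animation just stays at 0. */
float toy_anim_value(const ToyInstance *inst, int a) {
    if (!inst->obj->enabled[a] || inst->slot_for_anim[a] < 0) return 0.0f;
    return inst->values[inst->slot_for_anim[a]];
}
```

For example, after registering a made-up DataRef such as myplugin/truck/wheel_angle, loading an object that animates on it, and creating two instances that list it, pushing 30 into one and -15 into the other gives each instance its own angle. Loading the object before registering its DataRef, or leaving the DataRef off the instance’s list, leaves the animation frozen, matching the two gotchas above.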

Linux users: Please don’t run X-Plane with sudo!
Sat, 21 Oct 2017

The post Linux users: Please don’t run X-Plane with sudo! appeared first on X-Plane Developer.

TL;DR: Running X-Plane with sudo is a bad idea. Instead, create proper udev rules (per this and this).

During the 11.10 beta, I’ve gotten a lot of bug reports from Linux users who report that their keyboard is being recognized as a joystick. This is… sort of a bug, but mostly intentional.

(If you’re not a Linux user, this won’t apply to you… but it will bore you! 😉 )

Background: What changed?

On Linux, prior to X-Plane 11.10, we were very picky about what USB devices we considered to be a joystick: we required a device to present a so-called “absolute” axis (in contrast to a “relative” axis like a mouse uses). The downside of this is that it prevented home cockpit builders from creating button-only hardware.

So, in 11.10 and beyond, we relaxed the requirements: if a USB device presents us with any axis, button, or hat switch, we’ll treat it like a joystick.

The problem with this policy seems obvious: keyboards have “buttons”! Like, 104+ of them!

The reason we didn’t worry about this is that the keyboard is only accessible (as a USB device) to programs running as root. So long as X-Plane runs as a normal user, it doesn’t even have the option of treating the keyboard as a joystick.
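The policy change, and why running as root is what exposes the keyboard, fits in a few lines of C. The struct and function names are hypothetical, and this is a paraphrase of the rules described above, not X-Plane’s actual device code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what a USB input device reports. */
typedef struct {
    int absolute_axes;  /* e.g. yoke, throttle */
    int relative_axes;  /* e.g. mouse movement */
    int buttons;
    int hats;
    bool root_only;     /* device readable only by root (as keyboards typically are) */
} ToyDevice;

/* Before 11.10: only devices with an absolute axis counted as joysticks. */
bool is_joystick_pre_11_10(const ToyDevice *d) {
    assert(d != NULL);
    return d->absolute_axes > 0;
}

/* 11.10 and later: any axis, button, or hat switch is enough. */
bool is_joystick_11_10(const ToyDevice *d) {
    assert(d != NULL);
    return d->absolute_axes > 0 || d->relative_axes > 0 || d->buttons > 0 || d->hats > 0;
}

/* A device the process can't open is never considered at all. */
bool treated_as_joystick_11_10(const ToyDevice *d, bool running_as_root) {
    assert(d != NULL);
    bool visible = running_as_root || !d->root_only;
    return visible && is_joystick_11_10(d);
}
```

A button-only cockpit panel fails the old test and passes the new one. A keyboard passes the new test too, but as a normal user it’s never visible in the first place; run as root, and it becomes a 104-button “joystick.”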

Why do people run as root?

The impetus for running as root (via sudo) is simple: if your Linux distro doesn’t recognize your joystick hardware as something that should be available to normal applications, running as root is a brute-force way to let X-Plane use your joystick.

Let me say emphatically: This is a bad idea.

Especially with early, buggy betas, running as root makes it possible for X-Plane to do way more damage to your system than would ever be possible as a normal user. Consider the unlikely—but possible!—scenario where somebody made a typo in the code which inadvertently tries to delete a system folder. There are two possible outcomes here:

  • If you’re running as a normal user: Nothing happens. The operating system refuses to let X-Plane hurt your system.
  • If you’re running as root: The operating system silently obeys. You curse X-Plane for breaking your system.

Running X-Plane as root is like giving a blank check to every cashier you buy something from—it’s way more power than they need to do their job, and it’s liable to burn you at some point!

The Right Way™ to let X-Plane use your joystick

As described in the latter half of this old dev blog post, you don’t have to run with sudo. Instead, you can create udev rules to tell your operating system to let normal applications use your joystick. The GUI tool linked at the end of that post makes it even easier.

(Some users found the instructions there confusing; this post on the Org might help.)

Remember that after you create your rules, you can even submit them to your distro to make life easier for other flight simmers!

There’s one hitch: after running as root, your file permissions (especially your prefs) may have gotten screwed up. This can be fixed from the terminal by making your normal user account the owner of your X-Plane directory, like this:

$ sudo chown -R <username>:<username> /path/to/X-Plane/

(So, in my case, my username is tyler, and X-Plane is installed to ~/Documents/X-Plane/, so I’d run $ sudo chown -R tyler:tyler ~/Documents/X-Plane/.)

Now, to those of you who have been running as root… “go, and sin no more”! 😉

X-Plane 11.05, 11.10, and My Mostly Dead Hard Drive
Sat, 19 Aug 2017

The post X-Plane 11.05, 11.10, and My Mostly Dead Hard Drive appeared first on X-Plane Developer.

TL;DR version: my iMac’s fusion drive “lost its marbles” right before I went on vacation. This has delayed cutting an 11.05 release candidate 2 with a few scenery fixes, but we should get to it next week. In parallel, we’re working furiously to get all of the code locked down for 11.10.

Everything else that follows is really, really, really, really boring. I’m writing it only because some of my co-workers watched this slow-motion car crash and tightened up their backup game a bit. If my drive failure can shake you out of complacency, read on.

Basically: my iMac is my main development machine, and the data is backed up and/or duplicated in a bunch of different places: a USB time machine archive, a Backblaze cloud backup (both are “full machine”), DropBox for virtually all of my documents, and my work for Laminar is kept on Laminar’s source control servers. Data loss was never a huge risk here.

Time loss, however, is a real risk! My goal was to lose as little work time to fixing my machines as possible. So my plan was: restore from time machine disk backup, request a cloud backup restore via hard drive, return the hard drive. The total cost would be a few hours of disk copying and less than an hour of my time. My development machine would be usable for new work while waiting for the cloud backup to arrive.

This has not gone as well as I had hoped! You can learn from my fail here — a few notes.

  1. Your backup might as well not be a backup if you have not checked that the backup contains the data you think it contains. It turns out that both the cloud backup and time machine backup were missing files!  I’m very lucky that they weren’t missing the same files.
  2. Time machine sometimes decides not to back stuff up. OS X has a hidden per-file/directory attribute that can exclude a file from backup without showing it in the Time Machine UI!  Once you check your time machine backup and find a folder is missing, from terminal you can do tmutil isexcluded <file path> to see if the file has been explicitly excluded.  If it is, tmutil removeexclusion <file path> fixes this.
  3. Backblaze ships with a bunch of file exclusions too – mostly designed to not archive stuff that isn’t your data. But beware – stuff you care about might not be on the list. (For example, virtual disks in a virtual machine are excluded by default.)  I had to add back .iso files to the backup list. Backblaze backups are also not bootable. This is something I can live with, but always read the fine print about what’s in the backup.
  4. The Backblaze data restore has been very slow – over ten days for less than half a terabyte and it’s still “in progress”.* While they haven’t exceeded the maximum restore time they advertise, it’s slow enough that the delay matters.
  5. One other note on Backblaze: I saw major performance problems on my iMac while Backblaze was running, even when a backup was not running (since they were scheduled for overnight). I do not think this is necessarily Backblaze’s fault – it may be a problem with CoreStorage (which “runs” the fusion drive) or even a fault with my drives. From what I can tell, cloud backup exacerbated it by putting a lot more file traffic on my system.
  6. A possible danger if (like me) you keep documents on DropBox to have them everywhere: when I restored my iMac from Time Machine, I was exposing DropBox to my data from a week ago. I didn’t wait to see if DropBox would figure out what happened; I unlinked my iMac while it was offline after the restore, then re-established DropBox and let it download my data. Better safe than sorry.
  7. I have been backing up to portable 2.5″ USB drives because they’re cheap and really convenient, but they have a down-side: the mechanisms can easily fail and take your whole backup down. I have five of these drives and one has failed in a three year period.
  8. I’m really unhappy with CoreStorage, to the point where I would not recommend a fusion drive anymore. CoreStorage is an Apple virtual-volume technology (similar to soft-RAID) that makes one small SSD and one large HDD look like a single unified volume, with some of the data “cached” on the SSD for performance. CoreStorage is a lot newer than HFS, so when things go wrong, most disk utilities you would go to just don’t work.

I actually ended up in a state where (after wasting almost an entire day) I could see my data, but only in single-user mode with a read-only file system. I might have been able to copy the data off directly, but I chose to format the drive and restore from the backup to save more of my time and get back to coding X-Plane. My suggestion for developers getting iMacs: get an internal SSD (whatever storage size you can afford) and supplement it with a fast external hard drive over Thunderbolt.

Going forward, I am replacing the portable backup drives with a Synology NAS RAID device – this gets me high performance, high capacity backup (about 10 TB) with redundant drives. I picked HGST drives because they’ve had a good track record for reliability. With a large network attached storage server, I can have all of my machines backing up in the house all of the time, and have that be the primary way of getting my data back. I’m keeping cloud backup as a last-resort-the-house-burned-down kind of thing.
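The first lesson in the list above (check that the backup actually contains your data) is the one most worth automating. Here is a minimal POSIX C sketch, assuming the backup, or a restore of it, can be browsed as an ordinary directory tree that mirrors the original layout. It walks the source tree with nftw() and reports regular files that are missing from the backup or differ in size; it does not compare contents, follow symlinks, or check directories. count_backup_problems is a hypothetical name.

```c
#define _XOPEN_SOURCE 700  /* for nftw() */
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

static const char *g_backup_root;
static size_t g_src_len;
static long g_problems;

/* Called by nftw() for every entry under the source root. */
static int check_entry(const char *path, const struct stat *sb, int type, struct FTW *ftwbuf) {
    (void)ftwbuf;
    if (type != FTW_F) return 0;  /* regular files only; FTW_PHYS reports symlinks as FTW_SL */

    const char *rel = path + g_src_len;  /* path relative to the source root, e.g. "/notes.txt" */
    char mirror[4096];
    int n = snprintf(mirror, sizeof mirror, "%s%s", g_backup_root, rel);
    struct stat st;
    if (n < 0 || (size_t)n >= sizeof mirror) {
        printf("PATH TOO LONG  %s\n", rel);
        g_problems++;
    } else if (stat(mirror, &st) != 0) {
        printf("MISSING        %s\n", rel);
        g_problems++;
    } else if (st.st_size != sb->st_size) {
        printf("SIZE DIFFERS   %s\n", rel);
        g_problems++;
    }
    return 0;  /* keep walking */
}

/* Returns the number of problems found, or -1 if the source tree can't be walked.
 * Pass both roots without a trailing slash. */
long count_backup_problems(const char *src_root, const char *backup_root) {
    assert(src_root != NULL && backup_root != NULL);
    g_backup_root = backup_root;
    g_src_len = strlen(src_root);
    g_problems = 0;
    if (nftw(src_root, check_entry, 16, FTW_PHYS) != 0) return -1;
    return g_problems;
}
```

Pointed at a source folder and the matching folder in a mounted backup or restore, it prints one line per problem and returns the count; zero means every file in the source has a same-sized counterpart in the backup. A fuller check would compare checksums, and it is worth running against a cloud restore before you actually need one.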

If my cloud backup hasn’t shipped Monday, I will rebuild the setup I use to cut builds by hand (it’ll take a few hours but it’s doable) and we’ll cut 11.05r2 that way. If the drive comes, I can get the last of my data back and we’ll get to 11.05r2 the easy way. Either way, we’ll get things moving again.

* I opted for a hard drive restore, which should have one day of shipping time, instead of a download; a smaller restore based on download made clear that the transfer speeds would be slower than FedEx for that quantity of data.
