Sunday, September 4, 2016

Chromium Gamepad API and Shared Memory

Ever wondered how Chromium implements gamepads, with all of their oddball characteristics like different key mappings, bugs and sometimes really long polling times? Well, here goes...

Chromium is broken into at least 3 different chunks for Gamepad input. The first chunk is the service in the browser process which surfaces Gamepad data. The browser process has access to system-level devices like gamepads, while the renderer (or content) process doesn't. This means at least one process jump is necessary to push the data down.

The second chunk is the renderer itself. This chunk has to read data, preferably in a fast, non-blocking way, and then provide the data to the JavaScript APIs. The data reaches script in 2 different ways. First, a Gamepad can be delivered through the gamepadconnected and gamepaddisconnected events, where it arrives on the event object. The second way to get a gamepad is to call navigator.getGamepads() and retrieve the array of them. This may not return anything useful if the user hasn't interacted with the Gamepad at least once, though. This is to ensure that a website can't use the Gamepad as another way to fingerprint you (find out information specific to you that can be used to reassociate your session across navigations, even across different sites).

The third chunk is the gamepads themselves, known as providers. These connect to the actual device or system-level APIs and might take some time depending on the API and drivers involved.

So this gives us our latency options. We want the Gamepad data to be up to date, and for this we have to poll. Chromium polls every 16ms or every 10ms depending on which branch you are in. This polling interval only affects how quickly the browser process has the latest data available; the data still needs to travel to the renderer process.

There is a lot of data, it turns out. You can have up to 4 gamepads connected (kind of arbitrary, but good enough for now), though we'll discuss later why this is soon to be insufficient. Each gamepad can also have tons of buttons and axes that need to be reported. This gets even more interesting with matrices for location and orientation if we start to discuss motion controllers.
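To make "a lot of data" concrete, here is a rough sketch of what one frame of gamepad state could look like when it has to live in shared memory. Every name and cap below (GamepadsSketch, kMaxButtons, kMaxAxes and so on) is made up for illustration and is not taken from Chromium's headers. The one real constraint it tries to show is that the layout uses fixed-size arrays and no pointers, so it can be copied byte for byte between processes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// All caps are assumptions for illustration, not Chromium's real values.
constexpr std::size_t kMaxGamepads = 4;  // the "up to 4" from the text
constexpr std::size_t kMaxButtons = 32;  // assumed cap
constexpr std::size_t kMaxAxes = 16;     // assumed cap

struct ButtonSketch {
  bool pressed;
  double value;  // analog value, e.g. how far a trigger is pulled
};

struct GamepadSketch {
  bool connected;
  std::uint64_t timestamp;
  std::uint32_t axes_length;     // how many entries of axes[] are in use
  double axes[kMaxAxes];
  std::uint32_t buttons_length;  // how many entries of buttons[] are in use
  ButtonSketch buttons[kMaxButtons];
};

// One full "frame": the state of every gamepad slot.
struct GamepadsSketch {
  GamepadSketch items[kMaxGamepads];
};

// No pointers and no constructors, so a plain byte copy is a valid copy.
static_assert(std::is_trivially_copyable<GamepadsSketch>::value,
              "frame must be safe to copy byte for byte");

constexpr std::size_t FrameBytes() { return sizeof(GamepadsSketch); }

// Records one axis value, growing axes_length to cover the index.
inline void SetAxis(GamepadSketch& pad, std::uint32_t index, double value) {
  assert(index < kMaxAxes);  // fixed-size storage, nothing to reallocate
  pad.axes[index] = value;
  if (index >= pad.axes_length) pad.axes_length = index + 1;
}
```

How big FrameBytes() comes out depends entirely on the caps you pick, and it grows quickly once you add identifier strings or pose matrices for motion controllers.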

Passing all of this data down through an IPC channel would create latency. It's also a lot of data, so it might even be a bottleneck in the IPC system if there are "many listeners". The push model isn't a good idea here, so could we do better? We could try to pull the data only when the page asks for it. That might be better, but that too might be happening at a high rate, even if the data hasn't "changed". You may also have to do a blocking IPC in this case since you are requesting a response. It could be async, but then you get even more latency and your data is always from the "last frame" instead of the current frame.

Shared memory works great here, and this is how Chromium has implemented a more optimal behavior. With shared memory we can lock and update/read as we see fit. This could create some problems: anytime you take a lock bad things CAN happen, but would they? Let's go back and think about why we moved away from IPC. It was because there might be too many consumers of the data (in this case, readers), and if the readers are always locking, they may never allow a write. We have to do even better.

What guarantees do we need? First, we need to guarantee that the writer can write when it wants to without too much blocking. Second, the readers need to be able to read without too much blocking. But most importantly, the readers must not get "corrupted" data. They need a full frame of data, and the data is pretty large, maybe as large as say 1k, so we need to ensure that we got it all and that the writer didn't interfere in any way.

Turns out we can lock for the writer and optimistically read for the reader and then do a post validation that our read was "complete". This is implemented in Chromium as a OneWriterSeqLock though comments and a TODO indicate there are other options. I won't go into those other options since I've not read that far into the code yet. So how does this guy work?

The lock and the underlying data are wrapped together in a GamepadHardwareBuffer, and that is wrapped again by a GamepadSharedBufferImpl (which wraps the shared memory handle). We then synchronize on the lock object through operations such as WriteBegin and WriteEnd, which just forward to the underlying lock implementation. Every time the writer starts a write, it increments a version counter. An odd value means the write is in progress and thus our readers should wait. When the writer is done, it increments the same counter again. An even value means the write is done. To make this safe across threads and processes, the version lives in an atomic, so reads and writes of the version are synchronized. You can see this all happening in the GamepadProvider class.
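The writer side of that protocol can be sketched in a few lines. This is my own minimal version built on std::atomic, not Chromium's code: the class and struct names (SeqLockWriterSketch, HardwareBufferSketch), the payload size and the PublishFrame helper are all hypothetical, and only the WriteBegin/WriteEnd names come from the description above. The payload is stored as relaxed atomic words so the sketch stays free of data races under the C++ memory model.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the lock described above (writer half only).
class SeqLockWriterSketch {
 public:
  // Marks a write as in progress: the version becomes odd.
  void WriteBegin() {
    const std::uint32_t v = version_.load(std::memory_order_relaxed);
    assert((v & 1u) == 0 && "only one writer may be active");
    version_.store(v + 1, std::memory_order_relaxed);
    // Keep the payload stores that follow from moving above the odd version.
    std::atomic_thread_fence(std::memory_order_release);
  }

  // Publishes the write: the version becomes even again.
  void WriteEnd() {
    const std::uint32_t v = version_.load(std::memory_order_relaxed);
    assert((v & 1u) == 1 && "WriteEnd without WriteBegin");
    version_.store(v + 1, std::memory_order_release);
  }

  std::uint32_t version() const {
    return version_.load(std::memory_order_acquire);
  }

 private:
  std::atomic<std::uint32_t> version_{0};
};

constexpr std::size_t kWords = 8;  // assumed payload size, illustration only

// Lock and payload side by side, as they would sit in the shared region.
struct HardwareBufferSketch {
  SeqLockWriterSketch lock;
  std::atomic<std::uint32_t> words[kWords] = {};
};

// What a polling step might do: bracket the payload update with the lock.
inline void PublishFrame(HardwareBufferSketch& buffer, std::uint32_t value) {
  buffer.lock.WriteBegin();
  for (auto& word : buffer.words) {
    word.store(value, std::memory_order_relaxed);
  }
  buffer.lock.WriteEnd();
}
```

Note that the writer never waits on anyone. It bumps the counter twice per update, so the version goes 0, 1, 2, 3, 4 and is even whenever the payload is stable.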

The read operation then comes in two parts. First a ReadBegin to retrieve the version in synchronized memory. The possible return values are either an odd value or an even value. If odd, then the ReadBegin yields in some manner and waits until the writer is done. This assumes the writer is pretty fast and that it isn't writing at an insane refresh rate.

If even, then the ReadBegin returns the value and that is our loop version. We then read the data in shared memory. This should either be the memory associated with our current version or some future version. Remember, we aren't locking and so the writer could have incremented the version and begun writing to the memory after we started our copy.

Once our copy is done, we do a ReadRetry and pass in our value. We read the current value from the shared atomic, compare to our current version, and make sure they match. If they do, then it means the values we read from shared memory were those that were committed and matched our version number. If the version number has since changed, then it means our read may have data from more than one version and we should discard it and try to read again. This is all implemented in the GamepadSharedMemoryReader class. 
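Here is roughly what that optimistic read looks like in code. As before this is a sketch under my own assumptions, not the code in GamepadSharedMemoryReader: SeqLockSketch, SharedFrameSketch, Snapshot, TryReadOnce and ReadFrame are hypothetical names, and the bounded retry count is something I added so a reader can give up instead of spinning forever. Only ReadBegin, ReadRetry, WriteBegin and WriteEnd are named after the operations described above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

class SeqLockSketch {
 public:
  // Reader: yield while a write is in flight, then return the even version.
  std::uint32_t ReadBegin() const {
    for (;;) {
      const std::uint32_t v = version_.load(std::memory_order_acquire);
      if ((v & 1u) == 0) return v;
      std::this_thread::yield();
    }
  }

  // Reader: true if the version moved since ReadBegin (copy may be torn).
  bool ReadRetry(std::uint32_t begin_version) const {
    // Keep the payload loads before this point ahead of the re-check.
    std::atomic_thread_fence(std::memory_order_acquire);
    return version_.load(std::memory_order_relaxed) != begin_version;
  }

  void WriteBegin() {
    const std::uint32_t v = version_.load(std::memory_order_relaxed);
    version_.store(v + 1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
  }

  void WriteEnd() {
    const std::uint32_t v = version_.load(std::memory_order_relaxed);
    version_.store(v + 1, std::memory_order_release);
  }

 private:
  std::atomic<std::uint32_t> version_{0};
};

constexpr std::size_t kWords = 8;  // assumed payload size, illustration only

struct SharedFrameSketch {
  SeqLockSketch lock;
  std::atomic<std::uint32_t> words[kWords] = {};
};

struct Snapshot {
  std::uint32_t words[kWords];
};

// One optimistic attempt: copy first, validate afterwards.
inline bool TryReadOnce(const SharedFrameSketch& shared, Snapshot& out) {
  const std::uint32_t version = shared.lock.ReadBegin();
  for (std::size_t i = 0; i < kWords; ++i) {
    out.words[i] = shared.words[i].load(std::memory_order_relaxed);
  }
  return !shared.lock.ReadRetry(version);
}

// Retries until a consistent copy is obtained or the attempts run out.
inline bool ReadFrame(const SharedFrameSketch& shared, Snapshot& out,
                      int max_attempts = 10) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    if (TryReadOnce(shared, out)) return true;
  }
  return false;  // caller keeps its previous snapshot
}
```

The reader never writes to the shared region, which is why any number of readers can run at once. The price is that a read can be wasted: if a write lands between ReadBegin and ReadRetry, the copy is thrown away and taken again.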

That wraps up our interrogation of the Gamepad API and how it shares data. There are no limitations on the number of clients in this case. Any client can wake up at any time and decide to read, and the writer can always increment the version. There is very little contention other than the memory ordering requirements induced by the use of the atomic types. Note that these atomic types are implemented under the covers by Chromium and are not simply std::atomic.

So, is this interesting? Is getting Chromium brain dumps from a passerby like myself helpful to anyone else?

1 comment:

  1. Yes, this was an interesting read and I would enjoy reading more of your observations. Thank you for explaining the synchronization strategy and the version counter's evenness as semaphore.
