You can see the head of Brandon's branch in experimental here. You can also see some of the log history for the most recent changes. Unfortunately, the "source of truth" for my diagrams is no longer visible after many rounds of rebasing, squashing, merging and otherwise keeping up with all of the Blink changes getting rolled into the branch.
I'll lead with the diagram and then talk about some of the broad structures. It's a big diagram, so expand it out to full size so you can really see what is going on.
In broad strokes the architecture is split into Browser, Renderer and GPU processes. The Oculus runtime is shown in its own process since it is responsible for its own compositing and time warp functionality and has its own IPC.
Browser
The Browser process starts with a VRService and VRDeviceManager. These guys are how the browser wraps up all of the devices on your system. Each device is exposed via a VRDeviceProvider derivation, which encapsulates the logic to both enumerate and poll the devices.
To communicate, the VRDisplay that you see in the Renderer process expects a VRDisplayPtr to be returned over Mojo across the IPC boundary, so all communication between the providers and the manager is done via Mojo-compliant structures.
While there are a lot of boxes, this is conceptually simple. To implement a new device all you have to do is implement a new provider, plug in code that knows how to enumerate your device, and communicate your device details back using the prescribed interface. The heavy lifting of communicating across the processes and controlling access to devices is all done in the hosting layers.
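To make that concrete, here is a minimal sketch of the provider pattern. It is standalone, simplified C++ rather than the actual Chromium code; the class names echo VRDeviceProvider and VRDeviceManager, but the interfaces, the FakeDeviceProvider and the plain VRDisplayInfo stand-in are all illustrative assumptions.

```cpp
// Illustrative sketch only -- not the real Chromium interfaces.
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for the Mojo-defined display info structure.
struct VRDisplayInfo {
  std::string display_name;
  unsigned index = 0;
};

// Each platform backend derives from this and plugs in the code that knows
// how to enumerate and poll its own devices.
class VRDeviceProvider {
 public:
  virtual ~VRDeviceProvider() = default;
  virtual void Initialize() = 0;
  virtual void GetDevices(std::vector<VRDisplayInfo>* devices) = 0;
};

// The manager owns every provider and is the single place the service asks
// when a renderer wants the list of displays.
class VRDeviceManager {
 public:
  void RegisterProvider(std::unique_ptr<VRDeviceProvider> provider) {
    provider->Initialize();
    providers_.push_back(std::move(provider));
  }
  std::vector<VRDisplayInfo> GetAllDevices() {
    std::vector<VRDisplayInfo> devices;
    for (auto& provider : providers_)
      provider->GetDevices(&devices);
    return devices;
  }

 private:
  std::vector<std::unique_ptr<VRDeviceProvider>> providers_;
};

// A toy provider for a hypothetical headset.
class FakeDeviceProvider : public VRDeviceProvider {
 public:
  void Initialize() override {}
  void GetDevices(std::vector<VRDisplayInfo>* devices) override {
    devices->push_back({"Fake HMD", 1});
  }
};

int main() {
  VRDeviceManager manager;
  manager.RegisterProvider(std::make_unique<FakeDeviceProvider>());
  for (const auto& d : manager.GetAllDevices())
    std::cout << d.index << ": " << d.display_name << "\n";
}
```

The real providers hand back Mojo structures rather than plain structs, but the shape of "manager owns providers, providers enumerate devices" is the same idea.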
To bootstrap, the VRService is registered with each Renderer using the Mojo service interface registration logic.
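A toy version of that registration step might look like the following; the registry class, the string name and the factory lambda are illustrative stand-ins for the Mojo interface registry, not the real Chromium APIs.

```cpp
// Illustrative only: register a VRService factory with a per-frame interface
// registry so each Renderer can request the service by name.
#include <functional>
#include <iostream>
#include <map>
#include <string>

struct VRServiceImpl {
  VRServiceImpl() { std::cout << "VRService instance bound for a frame\n"; }
};

class InterfaceRegistry {
 public:
  void AddInterface(const std::string& name, std::function<void()> factory) {
    factories_[name] = std::move(factory);
  }
  // Invoked when a renderer asks for the interface by name.
  void BindRequest(const std::string& name) { factories_.at(name)(); }

 private:
  std::map<std::string, std::function<void()>> factories_;
};

int main() {
  InterfaceRegistry registry;
  // Browser side: expose the VR service to each renderer frame.
  registry.AddInterface("VRService", [] { VRServiceImpl service; });
  // Renderer side: a request arrives and the factory runs.
  registry.BindRequest("VRService");
}
```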
Renderer
The Renderer process is a bit more interesting: it contains all of the objects that talk to JavaScript and all of the V8 bindings. This is the layer that communicates composition details to the GPU process to make sure that submitFrame gets the texture from the WebGL surface to the device.
When a page requests displays using navigator.getVRDisplays, the Mojo client/service architecture kicks into high gear. There is some interesting bouncing around after you build a VRController (this is the client side of the VRService in the Browser process) to get to the service through the frame's interface provider. This is pretty common for all services that bridge this gap, and so I now consider this hookup boilerplate.
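In miniature, and with the Mojo pipe reduced to a plain callback, that renderer-side hookup looks roughly like this. NavigatorVR and VRController are the real class names; everything else (the VRServiceHandle alias, the lambda standing in for the browser side) is a hypothetical simplification.

```cpp
// Illustrative only: the renderer-side "bouncing around" in miniature.
// A real VRController binds a Mojo pointer through the frame's interface
// provider; here that pipe is just a callback.
#include <functional>
#include <iostream>
#include <string>

// Stand-in for the bound end of the pipe to the browser-side VRService.
using VRServiceHandle = std::function<void(const std::string& request)>;

class VRController {
 public:
  explicit VRController(VRServiceHandle service) : service_(std::move(service)) {}
  void GetDisplays() { service_("GetDisplays"); }

 private:
  VRServiceHandle service_;
};

class NavigatorVR {
 public:
  explicit NavigatorVR(VRServiceHandle service) : controller_(std::move(service)) {}
  // What navigator.getVRDisplays() in JavaScript ultimately lands on.
  void GetVRDisplays() { controller_.GetDisplays(); }

 private:
  VRController controller_;
};

int main() {
  // Pretend this handle came from the frame's interface provider.
  NavigatorVR navigator_vr([](const std::string& request) {
    std::cout << "browser-side VRService received: " << request << "\n";
  });
  navigator_vr.GetVRDisplays();
}
```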
One curiosity at this level, and maybe an optimization for the future, is that each NavigatorVR will get its own VRController, which means that the Browser may in turn communicate data to the same Renderer process multiple times. Normally for this type of architecture we try to keep these channels 1:1 and then multicast within the process.
Once the service connection is live, the rest of the work happens mostly in VRDisplay, which in turn exposes device information through a handful of other required platform objects and properties. In the future, VRFrameData will replace most of these extra interfaces, some of the enums, and so on. Things get simpler in WebVR 1.1, and it's a great time to fix this stuff and deprecate older interfaces since we don't have broad adoption and upgrading to the new API is fairly trivial.
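For reference, the data that WebVR 1.1's VRFrameData consolidates is roughly a timestamp, a pose, and per-eye view and projection matrices. Here is that bundle sketched as a plain struct; the field names follow the spec, but the struct itself is not the actual Blink type.

```cpp
// Illustrative sketch of the per-frame bundle that WebVR 1.1's VRFrameData
// consolidates, replacing several smaller interfaces.
#include <array>

struct VRPose {
  std::array<float, 4> orientation{};      // quaternion
  std::array<float, 3> position{};
  std::array<float, 3> linear_velocity{};
  std::array<float, 3> angular_velocity{};
};

struct VRFrameData {
  double timestamp = 0.0;
  std::array<float, 16> left_projection_matrix{};
  std::array<float, 16> left_view_matrix{};
  std::array<float, 16> right_projection_matrix{};
  std::array<float, 16> right_view_matrix{};
  VRPose pose;
};
```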
GPU
The GPU-to-Renderer communications use an old-school IPC mechanism specific to the GLES2 implementation. Any operation we need to happen in the GPU process needs a new method and a bunch of auto-generated code. There were only 5 such methods during my original documenting step, and they mainly related to creating, destroying and submitting frames to the VR-specific compositor.
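I won't reproduce the generated code, but the shape of those methods is roughly the following; the interface and method names here are hypothetical, not the real command buffer entry points.

```cpp
// Hypothetical shape of the handful of VR-related GLES2 command buffer
// entry points; the real methods and their auto-generated plumbing live in
// Chromium's command buffer code and use different names.
#include <cstdint>

using VRCompositorId = uint32_t;
using TextureId = uint32_t;

class VRCommandBufferInterface {
 public:
  virtual ~VRCommandBufferInterface() = default;
  // Create/destroy the device-specific VR compositor in the GPU process.
  virtual VRCompositorId CreateVRCompositor() = 0;
  virtual void DestroyVRCompositor(VRCompositorId compositor) = 0;
  // Hand a finished WebGL texture to the compositor for presentation.
  virtual void SubmitVRFrame(VRCompositorId compositor, TextureId texture) = 0;
};
```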
This entire interface, I've heard, may soon be replaced by Mojo, so digging into it doesn't seem like a very good use of time, but if there is enough interest I could share my notes here as well. It is somewhat interesting how the structures are built, copied, and so on; it is basically an entire GPU command buffer implementation specific to communicating between the Renderer and the GPU.
Consequently, any VR implementation will need to implement a compositor. This is how the generic submitFrame call can communicate information to the device specific graphics stack which may take in a texture or be implemented as a swap chain. There are also device specific configurations such as whether or not to enable deformation or asynchronous time warp.
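A sketch of what such a per-device compositor abstraction might look like follows; the interface and the config fields are assumptions for illustration, not the actual Chromium class.

```cpp
// Hypothetical sketch of a per-device VR compositor abstraction. Each VR
// runtime (Oculus, OpenVR, ...) would implement submission on top of its own
// texture or swap-chain mechanism.
#include <cstdint>

struct VRCompositorConfig {
  bool enable_deformation = true;     // lens-distortion correction
  bool enable_async_timewarp = true;  // device-side reprojection
};

class VRCompositor {
 public:
  virtual ~VRCompositor() = default;
  virtual bool Initialize(const VRCompositorConfig& config) = 0;
  // Called for each submitFrame(): hand the eye texture to the runtime,
  // which may copy it into its own swap chain or consume it directly.
  virtual void SubmitFrame(uint32_t texture_id) = 0;
  virtual void Shutdown() = 0;
};
```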
As of this diagram, a potential problem is that the device is being accessed from two different processes: first in the Browser, where there isn't a sandbox that prevents talking to the various device services, and second in the GPU, where we potentially instantiate a second instance of the VR runtime to submit frames. Because of this dual nature, "bad things"(tm) can happen, and there is ongoing discussion and design within Chromium about how to unify all of this into a single location. You could imagine solutions such as allowing compositing in the Browser process, since this only implies doing a texture copy or swap, or perhaps moving the VRService into the GPU process and getting the data from the single VR compositor source. A third option is to move all of this into yet another process, which is more complicated but great for Chrome's architecture.
Thankfully, no matter which approach happens in the end, it most likely won't affect the Renderer code very much. It's an implementation detail rather than something that dramatically affects the shape of the API. It will affect the performance and quality of the VR implementation to some extent, but that is always something to be finely tuned for best results.
Future Thoughts
I'll probably reserve any future discussions in this area until the code stabilizes a bit more. There are a lot of cool little changes coming up that I'm interested to see. For instance, the VRDisplay has to carry more data; in fact, to support certain compositors there might be device-specific data that has to be sent through. Today, since the VR provider can be talked to from two locations, it's not possible to efficiently share this data, so we'll have to figure something out. Shared memory, maybe?
The security model around device access and frame focus will also be cool to talk about once all of that code lands. Currently there are bits and pieces available, but I don't see a fully cohesive picture. Lots of fun here!