A Mobile VR Friendly Render Pipeline
This article is fundamentally a collection of some interesting rendering techniques to increase the quality of graphics possible using the mobile chipsets in standalone VR headsets.
Contents:
- Intro & Background
- Tiled Renderer Hardware
- HDR using 8-Bit
- Post Processing Effects
- Low-Poly Obfuscation
- Lighting and Materials
- Shadows
- Future Work
- Render Loop Summary
- Helpful Links
Intro
I am a cinematographer / filmmaker with a strong background in audio visual product development and a keen interest in aesthetics and conjuring narrative experiences… both passive and interactive. With the rise of virtual production bringing real-time rendering techniques, typically reserved for games, into the film production pipeline and fully immersive Virtual Reality experiences now accessible to a wider audience, I decided to use the “quieter” period that started in March 2020 to rekindle my old interest in computer graphics by trying out some game engines. As is often the case when I dive into a topic, I quickly came across some potentially unnecessary limitations holding things back, which in this case was the visual fidelity of the current VR game titles… so I just had to dive much further down the rabbit hole into the world of graphics programming and game engine development.
Though I may never be in a position to integrate all of these techniques into my own game engine, I thought it might be a good idea to collect them in one place and, who knows, they may still be useful to some fellow developers & enthusiasts out there…
Why bother?
Now that there are freely available, incredibly powerful, fully featured game engines out there for independent developers, complete with their own tools and customisable pipelines, why would anyone be interested in creating their own?
Specialisation: Fundamentally, the big engines are designed to cater to a wide user base of content types and genres on as big a range of devices as possible. Consequently, compromises have to be made regarding available features and performance, which limits your options when you are targeting something very specific or are intending to implement innovative solutions to problems.
Bloat: Multifunctional systems within systems within systems inevitably lead to bloat. Games are all about performance, even more so in VR. If you can reduce the complexity of your codebase you can improve runtime performance, reduce memory requirements and speed up compile times significantly.
Control: Neither of the two main engine options provide easy direct coal-face access to the render pipeline… Unity is closed-source and the scriptable options are still too limiting, whereas Unreal has to be compiled from source every time you want to add a custom shader!
I can understand why they have followed these paths, however I must follow my own…
I feel that a niche market such as mobile VR is stuck between the needs of high-end AAA pipelines and those designed for low-end Mobile & 2D applications, whereas the hardware already available is capable of so much more.
Using my background in cinematography, I have identified some key areas that imho could greatly improve the visual fidelity and realism of environments… techniques which are now pretty common in PC/Console games, but where the big engines do not support them in an efficient way for mobile.
Tiled Renderers
Mobile chipsets work in a fundamentally different way to desktop hardware. The frame to be rendered is split up into a grid of tiles and all of the drawing is done independently for each specific tile, which is then copied back to the relevant buffer in main memory once it is complete – this “resolve” process is comparatively very slow… but it is very power/heat efficient to deal with smaller chunks of memory in this fashion.
This means that each tile only has access to info contained within itself, anything outside of that may as well not exist. (And in truth they often don’t until all tiles have been drawn and resolved) It also means that you can potentially discard info that does not need to be retained at the end of drawing a tile… eg. Depth/Stencil buffer (though actually depth will eventually prove useful for us later on)
The main reason for this is to eliminate the need for holding enormous amounts of data in multiple full screen memory buffers, sapping battery and generating vast amounts of heat. Instead, the GPU has a small amount of very fast “on-chip” memory, just enough for the tile in question.
This makes a traditional deferred lighting rendering pipeline (the default of many desktop-focussed engines/pipelines for the last decade) with post-processing effects and numerous buffers constantly copying data around, hugely inefficient and completely impractical.
However there are some features available that can be comparatively less expensive on this hardware due to the fast on-chip tile memory:
MSAA Auto Resolve – As all geometry is drawn in a single pass, each pixel can easily contain up to 4 samples (depending on coverage) which are automatically resolved to a single value that is copied to main memory without needing an additional intermediate buffer.
Early-Z & Depth Query – To save on unnecessary shading calculations, the GPU will test if the pixel to be shaded will pass/fail the depth test before shading. This hardware also allows us to use a GLES extension (shader frame buffer fetch) to query the previous value held in the depth buffer for the pixel we are currently shading.
Alpha Blending – Fixed function alpha blending has immediate quick access to the pixel beneath it rather than reading back from a buffer in main memory.
What this leads to is a total change in methodology where anything other than geometry that will be drawn (including post effects, tonemapping values etc.) must be created and/or made available to the main drawing pass before it starts, so that they can simply be drawn on a full screen quad in the main pass or included in the shader calculations, rather than using a blit/shader operation to copy from one buffer to another. (As this would incur one or more full screen round trips to main memory, which is exactly what we are trying to avoid)
HDR pipeline using 8-bit Render Targets (Luminance in alpha)
The final framebuffers on this hardware only support 8 bits per channel (RGBA8), so there is little point in using High Dynamic Range (HDR) render targets (e.g. 16-bit), as this would once again transfer around unnecessarily large quantities of data, costing too much bandwidth, and it would eventually need to be downconverted to LDR anyway.
An 8 bit buffer can only store 2⁸ = 256 discrete levels of intensity. Considering that the sun is approximately 1000x brighter than a standard lightbulb, this rather limits our ability to create a more realistic simulation of how light behaves in the real world.
However, we can still do our shader lighting calculations using enough precision for a higher dynamic range and then simply convert them to the lower range before output. Even using “half” precision 16 bit float values in the shader gives us ~65,000 intensity levels which is plenty.
The process to convert HDR values to LDR values is usually called Tonemapping. The most basic way to do this is called Reinhard tonemapping and uses the following formula:
Vout = Vin / (Vin + 1)
This will simply map the range of 0->Infinity to the range 0->1. This works well for the lower brightness values (ie. below 1) which get mapped to the first half of the range (0->0.5) but all brighter values get significantly compressed into the remaining 0.5->1 range. This can lead to the need to manually tweak the brightness values of light sources to maintain a good look throughout environments with a mixture of very bright and dark areas. (It can be very useful for testing, but we can do better)
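In GLSL this is a one-liner (a sketch; hdrColour here stands for the linear HDR result of the lighting calculations):
vec3 ldrColour = hdrColour / (hdrColour + vec3(1.0)); //maps 0->infinity to 0->1 per channel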
An alternative is to automatically adjust the compression curve using an exposure parameter… But where do we get this value from?
What we ideally need is an average brightness (luminance) of the entire screen.
We can calculate the CIE weighted luminance for a given pixel because we already know its linear HDR colour, using this formula:
Y = 0.2126 * R + 0.7152 * G + 0.0722 * B
We start with a default exposure value of 0.5. Seeing as we are doing all drawing in one pass, we can then save the Y value to the alpha channel of the main framebuffer output and then take a fixed number of samples from it to calculate an average for the whole screen before rendering the next frame.
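As a rough sketch, the end of the main-pass fragment shader might look something like this (hdrColour and ldrOutput are illustrative names for the linear HDR lighting result and the tonemapped, gamma-corrected colour described below):
float Y = dot(hdrColour, vec3(0.2126, 0.7152, 0.0722)); //CIE weighted luminance of the linear HDR colour
FragColor = vec4(ldrOutput, clamp(Y, 0.0, 1.0)); //luminance is clamped by the 8-bit alpha channel anyway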
You may be wondering why not sample the colour values instead and calculate the luminance from those… well the luminance buffer serves another very important purpose that will become clear in the Bloom section.
(I currently create a 1×1 buffer and sample 25 points into it in a 5×5 grid. I then use glReadPixels to read it out to the CPU so I only have to do my exposure calculation once, then pass it back into all relevant shaders as a single uniform variable.)
GLubyte* exposurePixel = (GLubyte*)malloc(1 * 1 * sizeof(GLubyte) * 4); //single RGBA8 pixel
glReadPixels(0, 0, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, exposurePixel); //read back the 1x1 average luminance target
float exposureSample = (GLfloat)exposurePixel[0] / 255.0f; //normalise 0-255 to 0-1
free(exposurePixel); //or allocate once at startup and reuse each frame
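For completeness, the averaging pass that fills the 1×1 target could be as simple as the following fragment shader sketch (lastFrameTex is an illustrative name for the previous frame's colour buffer):
float sum = 0.0;
for (int y = 0; y < 5; ++y)
    for (int x = 0; x < 5; ++x)
        sum += texture(lastFrameTex, (vec2(x, y) + 0.5) / 5.0).a; //luminance stored in alpha, 5x5 grid across the screen
FragColor = vec4(vec3(sum / 25.0), 1.0); //written to the 1x1 target, then read back with glReadPixels above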
This average is mixed into the old exposure value at a small percentage per frame. This works because we want the exposure value to change slowly over time like the iris in our eye does. Otherwise the screen will flicker wildly as we move the camera around. (The mix percentage can be used to control the speed of exposure change)
float exposure = mix(lastExposure, newExposure, 0.02); //changes speed of exposure reaction
The exposure based tonemapping can then be applied using the following formula:
vec3 toneMappedCol = vec3(1.0) - exp(-result * exposure);
One last thing we should do is gamma correction, to move the linear colour space values to sRGB for display on the screen. This is done by raising the colour to the power of 1/2.2:
finalOutput = pow(toneMappedCol, vec3(1.0 / 2.2));
And then we’re done right? …not quite, this is just the beginning of our pipeline!
Post-Processing Effects
Reprojection
One of the main objectives here is reducing the amount of data passing across the memory bus. This means that whenever we save something back to main memory, we’d like to re-use that information if at all possible in future. However the camera is constantly moving, so pixels from previous frames will not be in the same location in the new frame. But we already know where the camera was in the previous frame so, as long as we keep a copy of the old View matrix, we can reproject data from one to the other.
Obviously objects move around too, so there will be some disparity between the frames, but for low frequency effects like bloom and volumetrics the disparity is within acceptable limits, especially for camera rotations, which are the dominant form of camera movement in VR.
This same principle is basically how asynchronous time warp works for VR in the first place… but that is all done by the HW vendor’s runtime compositor, which is outside of our control.
When we render an effect to a full screen quad at the near plane (or preferably a single oversized triangle – we’ll get to that later) we reproject the corner vertices to their locations in the previous frame in the vertex shader and the rasterizer will interpolate them for us by the time we get to the fragment shader. We then convert these to ScreenSpace UV coords for sampling any buffers created in the previous frame.
vec2 reproject_UV(){
//NB: could reduce matrix multiplications by doing this on the CPU, but it's only 6 vertices for a quad, so no big deal...
//Transform to worldspace
vec4 WS_newPos = invView * invProjection * vec4(2.0 * aTexCoord - 1.0, 1.0, 1.0); //max depth is +1 because the NDC +Z axis faces into the screen (in other spaces +Z points out of the screen)
vec3 WS_Pos = WS_newPos.xyz / WS_newPos.w;
vec3 newPos = WS_Pos.xyz; //could do some other vertex calcs here if needed
//Convert back to NDC
vec4 NDC_pos = projection * oldView * vec4(newPos, 1.0); //using the "oldView" matrix to reproject into the previous frame
vec2 newTexCoord = NDC_pos.xy / NDC_pos.w; //perspective divide
newTexCoord = 0.5 * newTexCoord + 0.5; //NDC -1..1 to UV 0..1
return newTexCoord;
}
Bloom & Dual-filtered blur (Luminance in alpha prevents feedback)
This is where it all started for me. I feel that when properly used, bloom adds so much to the aesthetic of a scene that I was unwilling to accept its absence… So knowing almost nothing about graphics programming, I set off on this journey.
My first port of call was Shadertoy to test out the efficient blur shader, and once I had got that working, I then followed learnOpenGL.com from start to finish to learn how to build a test environment… and it just carried on from there…
Bloom traditionally samples the fully rendered HDR scene into a new buffer using a high-pass filter to remove values below a certain threshold. This new buffer is then blurred (often separated into horizontal and vertical stages to reduce the sample count from N² to N+N per stage, where N is the filter edge size). This bloom texture is then merged on top of the original scene in another full screen pass. Lots of buffers, lots of memory etc etc.
So the core idea behind our solution is to re-use the previous frame to create the bloom texture. This comes with a few issues though…
LDR Bloom
LDR bloom can be done by setting the high pass filter to values above, say, 0.9 in the LDR frame. However this tends to affect objects that would be close to white anyway, e.g. snow, which creates a sort of over-the-top dreamy look (characteristic of a specific era of early 2000s gaming when it was heavily overused). We want to avoid this and only apply it to truly intensely bright areas, e.g. light sources.
Our frame is no longer HDR (we tonemapped it for display), so we need to recover how bright the colour was before we tonemapped it.
Wait… Didn’t we save the CIE luminance values to the alpha channel? Yes! We can use this to reconstruct an approximation of the HDR colour value. It is an approximation because we compressed both ranges down to 8 bits, which means we will potentially get banding when we re-multiply the values.
This doesn’t end up mattering because we are going to heavily blur the results. The compromise we do make is that the bloom result will be less saturated than if we were doing a true HDR setup. (Which we can actually compensate for by artificially adjusting the saturation in the shader) Overall the results still look surprisingly good.
//Attenuate previous frames brightness values using bloom threshold
float maskedLum = smoothstep(clamp(threshold - 0.1, 0, 1), 1.0, oldFrame.a);
//Calculate Bloom
const float gamma = 2.2;
const float desaturate = -0.8; //negative values increase saturation
vec3 bloom = pow(oldFrame.rgb, vec3(gamma)); //convert back to linear colour
bloom *= maskedLum; //mask bright regions based on luminance (inc. threshold)
bloom = vec3(mix(bloom, vec3(oldFrame.a), desaturate)); //modify saturation
Feedback Loop
If we overlay the scene onto itself every frame, won’t that cause a feedback loop? Yes it will… any areas that make it past the high pass filter will reinforce themselves and get brighter and brighter. Not good.
This is really where our luminance buffer comes into its own. We simply don’t write the luminance of the post processing quad to the alpha channel of the main frame buffer. The RGB values are blended with the buffer affecting its colour, but we skip writing the alpha value so the luminance remains the same. This breaks the feedback loop because a brightly lit area of bloom will not increase the luminance stored in the buffer.
To do this we use a cool feature of fixed function alpha blending called glBlendFuncSeparate that allows us to independently control which values end up in the RGB and alpha channels of the destination buffer.
We set it to the following:
glBlendFuncSeparate(GL_ONE_MINUS_DST_COLOR, GL_ONE, GL_ZERO, GL_DST_ALPHA);
This means our RGB values will blend together and slowly approach 1 as the brightness tends toward infinity. Whereas our alpha (luminance) values remain unchanged.
Efficient Blur
How do we get a wide enough radius blur without taking a crazy number of samples?
I looked into box filters, Gaussian curves, separable blurs, rolling averages, Kawase blurs etc. etc. until my head was about to explode… and after much research I eventually stumbled across a presentation by Marius Bjorge (Arm) that took inspiration from the Kawase blur and separated it into two asymmetric sampling patterns; one for downscaling and one for upscaling.
Asymmetric "dual filtered" blur leverages the bilinear texture filtering hardware to do a lot of the sample averaging for you (like creating mipmaps, but without the distinctive sharp cross pattern) and is exceedingly efficient while producing excellent results.
All this requires is a pyramid buffer chain starting at ¼ resolution (1/16 area) which downscales, then upscales using the following sampling patterns:
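As a reference, here is a sketch of the two sampling patterns as I understand them from that presentation (halfpixel is a half-texel offset for the destination buffer; names are illustrative):
//Downsample: 5 taps (centre weighted x4)
vec4 dualFilterDown(sampler2D tex, vec2 uv, vec2 halfpixel)
{
    vec4 sum = texture(tex, uv) * 4.0;
    sum += texture(tex, uv - halfpixel.xy);
    sum += texture(tex, uv + halfpixel.xy);
    sum += texture(tex, uv + vec2(halfpixel.x, -halfpixel.y));
    sum += texture(tex, uv - vec2(halfpixel.x, -halfpixel.y));
    return sum / 8.0;
}
//Upsample: 8 taps (diagonals weighted x2)
vec4 dualFilterUp(sampler2D tex, vec2 uv, vec2 halfpixel)
{
    vec4 sum = texture(tex, uv + vec2(-halfpixel.x * 2.0, 0.0));
    sum += texture(tex, uv + vec2(-halfpixel.x, halfpixel.y)) * 2.0;
    sum += texture(tex, uv + vec2(0.0, halfpixel.y * 2.0));
    sum += texture(tex, uv + vec2(halfpixel.x, halfpixel.y)) * 2.0;
    sum += texture(tex, uv + vec2(halfpixel.x * 2.0, 0.0));
    sum += texture(tex, uv + vec2(halfpixel.x, -halfpixel.y)) * 2.0;
    sum += texture(tex, uv + vec2(0.0, -halfpixel.y * 2.0));
    sum += texture(tex, uv + vec2(-halfpixel.x, -halfpixel.y)) * 2.0;
    return sum / 12.0;
}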
We can adjust the amount of bloom blur by further offsetting the sampling locations, within a reasonable limit, before artefacts start to appear. (Usually somewhere around 3-4 texels, depending on how many steps there are in the pyramid)
Spare fourth channel
Seeing as we are not going to be writing the alpha channel from the bloom to the fullscreen quad rendered in the main pass, it means we have a spare channel that we can use to blur something else… Did I hear someone say, “soft shadows?” (We’ll come to that later in the shadows section)
Texture Samplers… only 16? Can we make use of the extras to reduce latency in simple shaders?
As far as I understand, a texture sampler is a hardware feature of modern GPUs that can be thought of like a telescope that can “see” a 2×2 pixel area of the current mipmap level of the texture they are assigned to, at the given coordinates. If you are sampling using bilinear filtering, the sampler will return the interpolated average of the 2×2 area, weighted by the texture coordinate location within the area. (But each of the individual values in the 2×2 area will end up in the cache)
Each of these "telescopes" operates (more or less) independently, so maxing out and using all 16 samplers will take the same amount of time as a single sampler (but will use more memory bandwidth to send back the info.)
This gives us the opportunity to optimise away some of our operations in shaders that sample the same texture multiple times… like our efficient blur. On the way down it takes 5 samples, and on the way up it takes 8… but we can simply assign the same texture to multiple samplers so that these take the same time as a single sample!
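On the CPU side this is just a case of binding the same texture object to several texture units and pointing a different sampler uniform at each one (a sketch; blurSourceTex, blurProgram and the sampler array name are illustrative):
glUseProgram(blurProgram);
for (int i = 0; i < 5; ++i)
{
    glActiveTexture(GL_TEXTURE0 + i); //units 0..4
    glBindTexture(GL_TEXTURE_2D, blurSourceTex); //same texture object on every unit
}
//in the shader: uniform sampler2D srcTex[5]; each tap reads from a different element
GLint loc = glGetUniformLocation(blurProgram, "srcTex");
GLint units[5] = {0, 1, 2, 3, 4};
glUniform1iv(loc, 5, units);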
Raymarched Volumetrics
Now this one is a real holdover from my experience with cinematography; in film, one of the cheapest and most effective ways to add atmosphere to a suitably-lit scene is to use a haze machine to gently scatter light – it provides crucial depth cues and can really bring scenes to life.
The majority of this method came from an article on the Arm developer website. Basically we once again work in our ¼ resolution buffer and march along rays in View space, performing lighting calculations at each step. (The main advantage of using View space is that the camera sits at the origin (0,0,0) and the view axis points straight into the screen.) This does mean that we're going to need to use the graphics programmer's old friend, linear algebra, to translate points between spaces.
I should mention that linear algebra (matrix maths) is something that I never properly studied and was only introduced to it about a week before dropping out of University many years ago… Consequently, it’s one area that is really challenging for me, so if you struggle with it too, you’re not alone.
The really clever part of the process is how we use a small blue noise texture to offset the samples along the ray for each fragment to gain a much higher spatial resolution. This noise is then mostly eliminated by re-using the same blur pyramid that we already constructed for the bloom.
We do however need to be able to terminate a ray early if it is obscured by scene geometry. Otherwise we would be able to see light scattered in rooms behind walls that are directly in front of us. For this we need some knowledge of the scene. This is where it becomes necessary to keep the previous frame’s depth buffer information (when normally we would discard it) and reproject it so it mostly lines up with the new frame.
The Asynchronous SpaceWarp technique, which appeared around a year after I started this project, requires the depth buffer to reproject previous frames, so it seems the big boys have also decided this info is worth the resolve cost. An interesting Qualcomm patent also appeared that would allow one to resolve a lower-than-full resolution version of a tile's depth buffer to main memory. This would be ideal for effects like these, calculated at ¼ resolution. I have chased the hardware manufacturers about when this feature will be available in their drivers via a GLES/VK extension, but they said they weren't implementing it… yet!
Light Propagation in Participating Media
In traditional 3D rendering, we treat the space between objects as a vacuum where light travels unhindered. The fundamental principle behind volumetric fog is that a percentage of any light travelling through a point in space is directed towards the camera by bouncing off particles hovering in the air; we call this “in-scattering.” If we know the incoming light directions and colours etc. we can calculate this at each sample point along the ray using the following Schlick phase function approximation for Mie scattering:
pₛ(x, θ) = (1 − k²) / (4π(1 − k·cosθ)²)
Tweaking these values to get good results can be a little fiddly, but once it works, it can look really good. One choice you can make to simplify the calculations is whether the scattering is directional or not (isotropic vs anisotropic), which is controlled by the parameter "k". This allows you to bias more scattering in a certain direction, e.g. towards the camera, if you have the performance budget for the extra calculations.
//calculate in-scattering using Schlick phase function approximation
float tau = 0.1; //fog "thickness" value
float k = -0.5; //positive favours back-scatter …negative favours forward-scatter …zero is omnidirectional
float cosTheta = dot(L, -V); //invert View direction because phase function vectors should point away from sample position
float denom = 1.0 - (k * cosTheta);
inScatter = lightIntensity * tau * vec3((1.0 - (k * k)) / (4.0 * 3.14159265 * denom * denom));
inScatter *= exp(-tau * length(fragPosV)); //Beer-Lambert attenuation due to fog (inverse square would be cheaper but not pbr)
Now obviously there is a limit to how many samples/calculations we can do along a ray; even at ¼ resolution, things add up quite quickly, especially if you have more than a few lights.
Not that we have talked about them yet, but another way that we can reduce the number of calculations is by using any existing shadowmaps. If we sample the shadowmap for a given light at the beginning of the current loop iteration, we can exit early if the current point is in shadow and skip the lighting calculations altogether. (This may not provide a huge benefit, as texture samples are slow compared to ALU instructions.)
The original article has a clustered forward shading data structure set up to divide the camera frustum into discrete 3D volumes and a list of lights are assigned to each visible cluster using compute shaders. This can save on unnecessary lighting calculations and eliminate those for invisible clusters or those with no lights.
This test implementation currently uses traditional forward shading and iterates over a small number of lights – see the “future work” section for more info on this. The limit is currently set around 5-10 samples per ray with a max distance of 50-100m, though these values can be tweaked around, as mentioned previously.
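Putting those pieces together, the per-fragment loop ends up looking roughly like the sketch below. The helper names (sampleShadow, inScatterAt), the blue noise setup and reprojectedSceneDepth are illustrative placeholders for the machinery described above, and the constants sit in the ranges just mentioned.
const int NUM_STEPS = 8; //5-10 samples per ray
const float MAX_FOG_DIST = 50.0; //50-100m cutoff
vec3 rayDir = normalize(fragPosV); //View space: the camera sits at the origin
float maxDist = min(length(fragPosV), MAX_FOG_DIST);
float stepLen = maxDist / float(NUM_STEPS);
float jitter = texture(blueNoiseTex, gl_FragCoord.xy / noiseTexSize).r; //blue noise offset per fragment
vec3 samplePos = rayDir * stepLen * jitter;
vec3 inScatterTotal = vec3(0.0);
for (int i = 0; i < NUM_STEPS; ++i)
{
    if (-samplePos.z > reprojectedSceneDepth) break; //terminate early behind scene geometry
    if (sampleShadow(samplePos) > 0.5) //1.0 = lit, 0.0 = shadowed: skip the lighting maths when shadowed
        inScatterTotal += inScatterAt(samplePos) * stepLen; //Schlick phase + Beer-Lambert as above
    samplePos += rayDir * stepLen;
}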
Attenuation & Linear Depth Fog
Now we must consider the other half of the equation that we have not yet accounted for… loss of light due to “out-scattering.” Ie. Some light travelling between an object and the camera will bounce off the dust particles in random directions. As we are supposedly trying to keep things physically based (PBR) we should basically be darkening pixels based on the distance a ray travels through the fog as well…
However there is no need to do this as part of the volumetric calculation. The attenuation is simply based on distance (for uniform fog), so it can be done when rendering any geometry's fragments in the main pass. We already know each fragment's depth value during shading, so we can use it to darken them accordingly.
While we are calculating this, we might as well also add a simple linear depth fog at the same time, because it will (to a limited extent) hide the fact that our volumetrics have a cutoff distance, and the calculations are basically already done. We can tint this fog to any colour of our choosing. (If we had a light probe setup, we could even consider tinting the fog from the probe colours, though this might lead to some strange effects… could maybe work? Hmmmm)
//Depth Fog
//---------
float LOG2 = 1.442695;
float fogDensity = 0.01;
vec3 fogColor = fs_in_dirLightTS.color * sunIntensity * 0.01;
float z = linearDepth;
float fogFactor = exp2( -fogDensity * fogDensity * z * z * LOG2 );
fogFactor = clamp(fogFactor, 0.0, 1.0);
result = mix(fogColor, result, fogFactor);
Lastly, we fade this effect out on our skybox as we look further above the horizon, using a simple dot product between the skybox texture sampling vector and the World "Up" axis. Otherwise the skybox would always be completely fogged, as it is rendered at the max depth value.
float horizon = pow(1.0 - clamp(dot(TexCoord, vec3(0,1,0)), 0, 1), 6);
result = mix(result, fogColor, horizon);
NB: This fog is of course based on the perpendicular (view-space Z) depth rather than the true World space distance to the viewer, which may be unacceptable to some people, as it can make objects in the distance fade in and out of the fog unrealistically as the camera moves around. To combat this, I tend to keep the effect quite subtle, just enough to simulate the scattering of ambient light in the atmosphere (more like Rayleigh scattering)… rather than going full-on "Silent Hill." Alternatively, see the contact shadows section about recovering the distance from depth.
Potential Improvements
The current method for the main volumetric calculation iterates along a ray in a simple loop for each fragment… This may not be a very good use of the highly parallel nature of the GPU. It may be more efficient to render a series of fullscreen passes, one for each sample along the ray. Though this would increase the number of samples of the blue noise texture, it may also increase cache coherency for sampling the shadowmap… hmmm, more testing to be done.
Alternatively we could target specific areas of the screen by rendering proxy geometry (quads for point lights and cones for spotlights) with this same volumetric shader. Targeting these smaller areas could concentrate fewer samples much closer to where they are needed, so we could potentially remove lots of unnecessary calculations.
I did experiment with some of the other techniques mentioned in the original article (feedback loop, depth/luma aware blur) but found that the change to the final result didn’t necessarily warrant the increased complexity; ie, the visual improvements were marginal at best… or significantly worse!
Low-Poly Obfuscation
The guidelines set out for mobile VR apps stated that we should be aiming for approx 10,000 Vertices/triangles per frame and ideally around 100 draw calls. A large number of titles have therefore gone down the route of a very simple colourful cartoon-like aesthetic with minimal textures on relatively low poly assets.
The next section outlines some of the possible techniques available to make lower poly counts less obvious and create much richer and more detailed environments.
One of the things to consider is that, though texture fetches are slow, the hardware has systems to hide this latency, especially if info for the texture fetch (eg. Coordinates) is known before the fragment shader runs. The hardware we’re targeting also has 16 texture sampler units, all of which support bilinear filtering and, more importantly, can be used simultaneously.
Iterative Parallax mapping (distance attenuation)
The first technique actually kind of breaks the “no dependent texture fetches” rule because it uses several lookups into a height map to offset the texture coordinates for the current fragment. This creates the illusion of a bumpy surface below the original polygon.
Alternative techniques like “Steep parallax mapping” and “Parallax offset mapping” use far too many texture fetches to be practical on this hardware, however the iterative technique can produce quite good results with only 3-4 iterations.
We can also simplify the technique slightly by leaving out the division by the view vector's z component; this limits the maximum offset ("offset limiting"), preventing large height map values from turning everything into a swimmy mess of pixels at very shallow angles.
The effect is most noticeable and effective close to the camera at relatively steep angles, so the next optimization we can do is to reduce the number of iterations in the distance where the effect is barely noticeable. A useful trick for this is to use the mipmap level that the texture sampler uses for a given fragment. We can query this by using another GLES extension called EXT_texture_query_lod.
If we simply use this to control the number of steps, there will be a visible seam between two levels where the number of iterations changes. To overcome this we just set a desired min and max Lod value and fade out the intensity (height bias) of the effect between these thresholds. Once beyond the max Lod we can set the number of steps to just 1 which basically cancels the effect. As long as the range over which the effect fades out is not too small (at least 2 mip levels), and the camera is not moving too fast, this is quite effective.
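As a rough sketch (with the height stored in the normal map's alpha channel and lod obtained via the extension above; heightScale, minLod and maxLod are illustrative tuning values):
float fade = 1.0 - smoothstep(minLod, maxLod, lod); //fade the effect out over at least 2 mip levels
float bias = heightScale * fade;
int numSteps = (fade > 0.0) ? 4 : 1; //a single iteration (with zero bias) in the distance cancels the effect
vec2 uv = texCoord;
for (int i = 0; i < numSteps; ++i)
{
    float h = texture(normalHeightMap, uv).a; //height in alpha
    uv = texCoord - TangentViewDir.xy * (1.0 - h) * bias; //no division by TangentViewDir.z = offset limiting
}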
Another trick we can introduce using one additional height map texture lookup is a horizon-based shadow for a single dynamic light source, but we’ll look at that later.
Height blended multi-layer terrain shader (Splatmap / vertex colours)
So we all know that one of the best ways to reduce storage space requirements (especially when you have large 3D surface areas to cover) is to use seamless tiled textures. This can unfortunately lead to very obvious repeating patterns on walls and floors… even more noticeable on outdoor terrain.
A technique that goes a long way to improve this is to define a splat map where each colour channel represents a different material. (Each of which can have a different scale – which really helps to hide the repetition.) This can be fed to the shader either as a texture or as vertex attributes, depending on the complexity of the underlying geometry. (Ie. a single quad floor plane for a whole room will not have enough vertices to make the effect convincing.)
In our PBR workflow, a material often uses three RGBA textures:
- Albedo (RGB) + Transparency (A)
- Normals (RGB) + Height (A)
- Roughness (R) + Metallic (G) + Occlusion/Cavity (B) + Emission (A)
This means only three of the available sixteen texture samplers are used… and we can assign others to additional sets for other materials.
For any given fragment we are shading, we sample both sets, and interpolate between them using the relative height map values. This produces much more realistic looking results than a simple linear fade region at the boundary between two areas.
IMAGE: SIMPLE VS HEIGHT FADE
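The blend itself can use the well-known height-based weighting trick, sketched here for two layers (the splat weights come from the splatmap/vertex colours; blendRange is an illustrative parameter controlling how sharp the transition is):
vec3 heightBlend(vec3 col1, float h1, float w1, vec3 col2, float h2, float w2, float blendRange)
{
    float peak = max(h1 + w1, h2 + w2) - blendRange;
    float b1 = max(h1 + w1 - peak, 0.0);
    float b2 = max(h2 + w2 - peak, 0.0);
    return (col1 * b1 + col2 * b2) / (b1 + b2); //weighted towards whichever material "pokes through" highest
}
//usage: albedo = heightBlend(albedo1, height1, splat.r, albedo2, height2, splat.g, 0.2);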
A neat trick for gaining an “extra” material without additional texture lookups is to incorporate a wetness parameter, as this only needs to modify the roughness, normal and colour of the underlying material. We simply apply this to the lower areas of the height maps (below a tweakable threshold) to create the appearance of puddles.
//puddle & wetness calculations
//-----------------------------
vec3 sedimentColor = vec3(0.3, 0.3, 0.2);
//simple version based on heightmap
float waterHeight = vertexColor.b + waterBias;
//soft transition region size controlled by multiplier eg. 1/0.2 = 5 (simpler maths)
float accumulatedWater = clamp((waterHeight - height) * 5.0, 0.0, 1.0);
roughness = max(0.05, mix(roughness, 0.05, accumulatedWater)); //limit minimum roughness to control highlight size
albedo *= mix(1.0, 0.3, accumulatedWater); //darken albedo as it gets wetter
albedo = mix(albedo, sedimentColor, clamp(0.3 * (1 - height) * (accumulatedWater - 0.2), 0, accumulatedWater)); //deeper water will turn to sediment colour
//albedo = mix(albedo, sedimentColor, clamp((accumulatedWater - 0.9), 0, 1));
normal = mix(normal, vec3(0,0,1), vec3(accumulatedWater)); //currently hardcoded as the tangent space "up" vector
If we like, we can send these values to the shaders as uniforms, allowing us to change the wetness and puddle depths dynamically during gameplay… the first building block of a full weather system.
In future work I would like to investigate another cool technique called “texture bombing” which looks to hide the transition between repeats by manipulating texture coordinates within a given tile to break up the straight edge.
Blending props with terrain: Dual-source Alpha vs Alpha-to-Coverage (w/ Bayer matrix)
What if we want to have geometry intersecting the floor? (eg. Mounds of dirt or debris) Normally this will create an obvious straight line where the two polygons overlap. Can’t we fade out the region where they intersect? Well, yes we can… and we actually have a couple of options on how to do it, depending on how much overdraw we are willing to accept.
Dual Source Alpha
The simplest way is to render the prop objects as transparent and modify their alpha values at the intersection – very similar to the technique used for soft particles. A common way to do this is to use the World Space height and fade out pixels below a given distance above the floor… e.g. the last few centimetres. As long as you don't stick your head right at ground level, you don't typically notice the fade.
Remember though that we use the alpha channel for our Luminance value… so any of these transparent objects would no longer contribute to the bloom effect. This may or may not be an issue for some objects, but there is actually a function called “Dual Source Blending” that allows us to output two final values from our shader – one controls the transparency, while the other controls the value that is actually written into the framebuffer. So we can have our cake and eat it!
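A sketch of what that might look like, assuming the EXT_blend_func_extended extension (output and variable names are illustrative):
//fragment shader
#extension GL_EXT_blend_func_extended : require
layout(location = 0, index = 0) out vec4 outColour; //the value actually written to the framebuffer (RGB + luminance)
layout(location = 0, index = 1) out vec4 outFade; //second output: its alpha is only used as the blend factor
//inside main():
outColour = vec4(litColour, luminance); //keep luminance in alpha so the object still contributes to bloom
outFade = vec4(intersectionFade); //0 = fully faded at the intersection, 1 = opaque
//CPU side:
glBlendFunc(GL_SRC1_ALPHA_EXT, GL_ONE_MINUS_SRC1_ALPHA_EXT); //blend colour (and luminance) using the second output's alpha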
The main drawback of this technique is that it means all fragments for the prop objects will be alpha blended, which increases overdraw, because you typically render transparent objects after opaque ones, in back to front order.
Alpha to Coverage
The next technique is most often used for rendering foliage. (Where traditional alpha blending causes problems with depth buffer sorting)
It takes the alpha transparency output from your shader and converts it into the nearest number of coverage samples that will be stored in your MSAA tile memory. Let’s say we are using 4x MSAA; ie. Each fragment can store up to 4 samples. If an object is 50% opaque, it will only store 2 of the samples of this new object (the other 2 samples already held there remain unchanged). So when the fragment is eventually resolved to a single value, the 4 samples are merged and the final colour is a 50/50 mix.
This provides us with 5 levels of potential transparency (0 – 4) which works well for foliage as the edge transition is usually quite quick, spanning only a couple of pixels. However our transition area could be quite large so we will see clear banding using only 5 levels. To mitigate this, we can use an older technique developed for the print world… dithering!
IMAGE: Bayer matrix pic and explanation
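A sketch of the dither applied to the alpha before alpha-to-coverage kicks in (with 4x MSAA one coverage step is 1/4, so the per-pixel nudge spans roughly ±1/8):
//CPU side (once): glEnable(GL_SAMPLE_ALPHA_TO_COVERAGE);
const float bayer4x4[16] = float[16](
     0.0,  8.0,  2.0, 10.0,
    12.0,  4.0, 14.0,  6.0,
     3.0, 11.0,  1.0,  9.0,
    15.0,  7.0, 13.0,  5.0);
ivec2 p = ivec2(gl_FragCoord.xy) % 4;
float threshold = (bayer4x4[p.y * 4 + p.x] + 0.5) / 16.0; //0..1 ordered dither value per pixel
alpha = clamp(alpha + (threshold - 0.5) * 0.25, 0.0, 1.0); //nudge alpha between adjacent coverage levels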
The disadvantage of this technique is that it can only use the first alpha value to control the coverage, so unfortunately it breaks our luminance buffer bloom pipeline for objects using this technique.
If anyone from Meta Reality Labs, Qualcomm or Khronos is reading this… a GLES extension for “Dual Source Alpha to Coverage” would be a fantastic addition to the drivers… just saying!
Pixel Depth Offset (Early-Z friendly depth writes)
When researching this, I found it rather difficult to find any definitive information on how to implement PDO directly in a shader. (Lots of info/tutorials on how to enable and use it in Unreal and how their implementation supposedly doesn’t work with shadows). So this is only the implementation that seemed to behave as expected for me and may not reflect exactly how it is supposed to work.
The principle behind this technique is to alter the value that is written to the depth buffer for a given fragment, so that it breaks up the hard line where two triangles intersect. (usually between areas of the ground and static props).
The problem here comes if we try to modify a fragment’s depth within a shader using gl_FragDepth, as it will effectively disable the Early-Z test, which is not ideal for reducing potential overdraw. This is because logically we cannot normally know if a fragment’s depth will increase or decrease before the shader for the next primitive runs.
However in our case, we know that Parallax mapping is an illusion created below a surface, so we know that the depth will only ever increase. So, using another GLES extension, GL_EXT_conservative_depth, with the "greater than" layout qualifier, we can make a promise to the hardware that the shader will only ever increase the depth value, which still allows the Early-Z test to be performed beforehand. We declare it in the shader like this:
layout (depth_greater) out float gl_FragDepth;
Now we need to actually calculate the new depth value.
If we imagine that we are looking straight along the world Z (forward) axis at a flat surface directly in front of the camera, the heightmap's lowest values would be offset by the maximum amount away from us. If we then rotate the plane 90 degrees about the X axis so it is lying flat with its normal pointing at the sky, from our perspective it becomes a line and there would be no change to the depth values. (The height map values now lie along the Y (up) axis.)
So we can use a simple clamped dot product between the view direction and the geometric normal of the plane to control the magnitude of the pixel depth offset. (This can be done in any space you like, but our lighting calcs are done in tangent space, so we already have those vectors without further calculation)
Then we take this linear distance and add it to the linear depth of the current fragment (which we’ve conveniently already calculated for the depth fog).
Once we have this new linear (distance + offset) value, we can convert it back to a post-projective depth value using the near and far plane from our projection matrix and output it as our new gl_FragDepth.
//Pixel Depth Offset from heightmap
//---------------------------------
float NdotV = clamp(dot(vec3(0,0,1), abs(TangentViewDir)), 0, 1); //ensure depth is only offset along geometry normal
float PDOdepth = NdotV * pdoScale * (1.0 - height);
PDOdepth += linearDepth;
float Depth = (1.0/PDOdepth - 1.0/near_plane) / (1.0/far_plane - 1.0/near_plane);
gl_FragDepth = Depth;
Lighting & Materials
Physically based rendering has really taken off over the last few years… I was genuinely astounded how far rasterization has come when I first saw how close to the path-traced results Blender’s Eevee renderer could get… Meanwhile, the rise of virtual production and affordable camera tracking is pushing many industries further into fully real-time rendered photo-realistic techniques that are captured in-camera with minimal or even no post-production.
One of the recommendations for mobile VR was to bake as much of your lighting as possible… where, in the major engines, it is saved into a lightmap texture for static objects and into some sort of probes (usually using spherical harmonics) for dynamic objects. This method, as implemented in the major engines, unfortunately leads to only diffuse baked lighting and relies on the reflection probes to supply the specular. My personal opinion is that the end result looks incredibly flat and throttles the realism of many mobile VR titles.
Directional lightmap techniques do exist, but neither of the main engines seems to incorporate the AHD (Ambient Highlight Direction) method. The best examples and information I've found on these techniques come from Ready at Dawn's presentations on "The Order: 1886"… however the studio is now owned by Meta, so more recent examples of their tech would most likely be found in the Lone Echo games… which look phenomenal, but are PCVR only and use a proprietary engine. The same applies to Half-Life: Alyx… another phenomenal looking PCVR-only game making heavy use of pre-rendered AHD lightmaps and running on a proprietary engine.
To make any headway in this area would require me to write an entire lightmap generator (basically a full CPU path-tracer… or more likely a global-ray-bundle GPU renderer) from scratch and then convert/encode the results to an AHD lightmap and Spherical-Gaussian probes… that would be a serious project… maybe someday, but for now I’m kind of stuck testing with simple environment cubemaps and a few realtime lights.
Line Light Source
Speaking of real-time light sources, there are usually only three types traditionally used in the vast majority of engines. These are:
- Directional (sun / moon – infinitely far away, parallel rays)
- Point (light bulb – shines evenly in all directions)
- Spot (torch – limited cone shape, with soft edge falloff)
However, I felt that it wouldn’t take that much more calculation to introduce at least one more type:
- Line (fluorescent / LED tube – fixed length)
To implement this, we use a technique called the “representative point” method, whereby we rotate the light vector for a given fragment so that it points to the closest point on the surface of our light’s shape, and then do the calculations for a normal point light. This can be done for a multitude of light shapes, but in our case we are doing a simple line using a start and end point combined with some neat trigonometry tricks to simplify the calculations and make it more efficient.
The method is explained in excellent detail here: https://www.elopezr.com/rendering-line-lights/
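At its heart it is just a closest-point-on-segment calculation; the article above derives a more optimised trigonometric form, but a straightforward sketch (with illustrative names) looks like this:
vec3 closestPointOnLine(vec3 fragPos, vec3 lineStart, vec3 lineEnd)
{
    vec3 line = lineEnd - lineStart;
    float t = clamp(dot(fragPos - lineStart, line) / dot(line, line), 0.0, 1.0);
    return lineStart + line * t; //representative point: treated as a point light from here on
}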
The best example of this technique I have seen implemented in mobile VR is the laser pointer in the Red Matter game series, which uses a custom version of the Unreal engine that the folks at Vertical Robot modified and compiled themselves.
Cheaper PBR realtime lighting Calculations
The full PBR lighting calculations used in desktop and offline applications (first developed by Pixar) require quite a lot of operations; however, I found a really interesting 2015 Siggraph presentation which proposes a cheaper BRDF than the traditional Smith/GGX (it combines the Geometry & Fresnel terms) and produces excellent results.
One other advantage of this simplified method is that it does not require a LUT for the BRDF lobe shape… which means not only does it require fewer operations, but one less texture lookup per fragment as well… which is always a bonus!
IMAGE: SPECULAR HIGHLIGHT COMPARISON WITH GGX
Channel Packing & matching textures with ASSIMP
If we accept that most objects are going to use a realistic PBR shader, utilising the “metallic” workflow, then they are going to need information about the following:
- RGB – Colour (albedo)
- A – Transparency (fully opaque for this material)
- RGB – Normals
- A – Height
- R – Roughness
- G – Metalness (zero values for this material)
- B – Ambient Occlusion / Cavity
- A – Emission (zero values for this material)
Breaking these into the three groups shown above conveniently allows us to pack all the information into the four available channels of three RGBA8 textures. Which means we only use three of the sixteen samplers for a single material layer.
The tricky part comes when passing this information from our 3D modelling package to our render engine… as, if we want to avoid writing our own model importer, we must get the info through the "Open Asset Import Library" (ASSIMP), which, during import, does its best to guess which materials/textures match up to its own internal data types.
So far I've had the best luck using the now somewhat outdated and inefficient OBJ format, as this uses an external MTL text file to reference which textures are for what. This means you can manually tweak the texture "types" so that ASSIMP will understand them.
The main crux of the matter is that ASSIMP expects an OBJ to have a pre-PBR set of texture types… eg. Diffuse, Specular, Bump etc. which have now been mostly replaced with the newer Albedo, Normals, Rough/Metal/AO.
So we have to trick it by setting them as follows in the MTL file:
Albedo + Transparency -> “Diffuse” -> map_Kd
Rough/Metal/AO/Emission -> “Specular” -> map_Ks
Normal + Height -> “Bump” -> map_Bump
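So a material entry in the MTL file ends up looking something like this (file names purely illustrative):
newmtl ground_material
map_Kd   ground_albedo.png   # Albedo (RGB) + Transparency (A)
map_Ks   ground_rmae.png     # Roughness (R) + Metal (G) + AO (B) + Emission (A)
map_Bump ground_normal.png   # Normals (RGB) + Height (A)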
Currently it is a pain to have to do this manually (especially as it seems to be highly case-sensitive), but I found other formats like FBX and glTF have a whole host of other issues to sort out (missing / inverted backface geometry etc.) and this actually works, so for now we're sticking with it.
Once we have done this, we can then set the relevant texture ID slots to match these import types in our renderer’s model class.
When first building this test bed I originally followed along with the learnOpenGL tutorials which start off doing all lighting calculations in World space… and I confess, in a competition between getting Tangent Space Lighting, Volumetric Raymarching and ASSIMP to work, I don’t think I could tell you which was the biggest pain in the @$$!
Shadows
Contact-Hardening Soft Shadows using blurred Penumbra maps
I researched so many papers on soft shadows and tried quite a few techniques to get to where I am now… especially ones that could make use of the extra channel in the blur chain I already have available… Exponential shadowmaps were probably the most interesting, but suffer from the shadow becoming fainter, the closer the object gets to the occluder, which is almost the exact opposite from reality. Almost all other techniques required either additional depth buffers (eg. Variance) or higher bit depth than the default 24bit.
Some of the original VR developer guidelines suggested that realtime shadows were probably too much of a performance hit for use on these chipsets, but if we know we are going to make the shadows softer, there are some things we can do to minimise their overhead, right from the start.
- Render lower resolution shadowmaps eg. 512×512
- Use lower LOD models when rendering shadowmaps
- Only render dynamic objects into shadowmaps
The details of using sampler2DShadow to take advantage of hardware accelerated percentage closer filtering have been covered extensively elsewhere, so we'll skip a lot of this… But what we do need, if we are to take additional samples for creating soft shadows, is some way of determining which fragments are within the penumbra region of the shadow, so that we only take additional samples for these specific fragments.
We are going to take advantage of the spare channel in our blur chain for this. First we run a simple edge detection filter to create a black and white representation of the edges of the geometry in the shadowmap. The shader will make pixels white where there are large depth disparities between neighbouring pixels. This only requires 5 samples in a cross pattern, so has pretty good cache coherency and runs on a ¼ resolution buffer so does not use excessive bandwidth. This is done in our brightmask shader and is saved to the alpha channel of our original bloom texture.
(Larger kernels can be used by allocating additional samplers, without significant additional latency overhead, seeing as the brightmask shader only samples a couple of textures)
IMAGE: EDGE DETECT KERNEL(S)
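For reference, the edge detect itself is only a handful of lines (a sketch; the depth-disparity threshold is a tweakable):
float edgeDetect(sampler2D shadowMap, vec2 uv, vec2 texel, float threshold)
{
    float c  = texture(shadowMap, uv).r;
    float dx = max(abs(c - texture(shadowMap, uv + vec2( texel.x, 0.0)).r),
                   abs(c - texture(shadowMap, uv + vec2(-texel.x, 0.0)).r));
    float dy = max(abs(c - texture(shadowMap, uv + vec2(0.0,  texel.y)).r),
                   abs(c - texture(shadowMap, uv + vec2(0.0, -texel.y)).r));
    return step(threshold, max(dx, dy)); //white where neighbouring depths differ strongly
}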
Once this has passed down/up the blur chain we wind up with a blurred representation of the edges of our shadow casters. We sample this using the same texture coordinates as our shadowmap to determine if the current fragment falls within this penumbra region. If it doesn’t, we know it is either fully lit or fully in shadow and only a single shadow sample is necessary.
If the fragment falls within the penumbra region then we initiate a two stage process for soft shadows:
- First we need to know the distance between the current fragment and the occluder to set the number of samples (and the sample radius). We can’t just sample the shadowmap once because fragments in the outer penumbra region may fall outside the hard shadow border, so the depth of the occluder cannot be determined… ie. the shadowmap sample depth would match the fragment. So we take 4 samples in a cross pattern using a normal sampler2D and take the minimum of these values as our occluder depth. Now that we have our occluder depth we can calculate a ratio to control the sampling qty and radius.
(min_depth – current_depth) / current_depth
Using a simple min does lead to some over-softening artefacts where near and far occluders overlap. I have since switched to using the randomised disc pattern for this step as well to calculate an average. The gradient noise function is already being generated for the main shadow samples, so can be reused.
- The next stage is to take up to a max number of samples into the shadowmap (16 in our case) using a Vogel disc pattern. This uses a different rotation for each fragment, controlled using randomised gradient noise generated in the shader from its screen space position "gl_FragCoord."
It may be better for cache coherency to use a fixed grid sample pattern and vary it in discrete steps, (eg. 2×2, 3×3, 4×4) like the original paper suggested. However the method used here allows for any number of samples up to the max so is more flexible and also less likely to create noticeable transitions between softness levels.
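For completeness, here is a sketch of those two pieces; the noise shown is Jimenez' interleaved gradient noise as a stand-in for whichever gradient noise function you prefer, and 2.39996 is the golden angle in radians:
float gradientNoise(vec2 screenPos) //interleaved gradient noise, driven by gl_FragCoord.xy
{
    return fract(52.9829189 * fract(dot(screenPos, vec2(0.06711056, 0.00583715))));
}
vec2 vogelDiskSample(int index, int sampleCount, float rotation)
{
    float r = sqrt((float(index) + 0.5) / float(sampleCount));
    float theta = float(index) * 2.39996 + rotation; //golden angle spiral
    return r * vec2(cos(theta), sin(theta));
}
//in the shadow loop: offset = vogelDiskSample(i, numSamples, 6.28 * gradientNoise(gl_FragCoord.xy)) * radius;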
There is still some tweaking to be done to balance the amount of blur in the mipchain (which controls the size of the penumbra region) with the sample radius in the shader… otherwise we will end up with noticeable edges of faint outer penumbra switching to 100% lit regions. (This seems to be less of an issue on the inner penumbra)
We have however successfully reduced both the number of fragments which need additional shadowmap samples and apportioned the relevant number of samples only to the fragments that need them… while simultaneously creating soft shadows that harden based on distance to the occluder.
Local Cubemap Shadows
I found the next method in another presentation made by Arm. It uses a similar concept to a light cookie, where a texture is used to control the intensity of a light source; the end result normally works very much like a video projector.
Where this method differs is that, while a light cookie is fixed to the light source, this technique uses a local cubemap texture fixed to the environment, allowing the light source to move independently. The environment's opacity value is stored in the 4th channel of reflection probes when they are created. (Or even in a new RGBA cubemap if you would like the option of coloured shadows.) The example Arm showed used a directional light outside of a room to cast the shadow of a window frame onto the walls/floor inside.
One of the main advantages of the technique is that it uses precomputed cubemaps with mipmaps and trilinear filtering, so soft shadows are actually less expensive than hard shadows (as they sample lower resolution mips). Softness can also be controlled using a distance parameter to create contact hardening, just like we did for our penumbra shadows.
For any given fragment we can’t just use the alpha channel value from when we sample the cubemap for reflections, because that doesn’t point towards the light source. We need to use our to-light direction vector, then perform a local correction on it, to compensate for the fact that the cubemap may not have been created at the centre of the environment it represents.
We do this by calculating the intersection point of our light vector and the environment’s axis-aligned bounding box (AABB). Then we calculate the vector from the cubemap origin to this point, which is used as our sampling direction. This is all done in World space.
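A sketch of that local correction (all in World space; boxMin/boxMax describe the environment's AABB and cubemapPos is where the probe was captured – illustrative names):
vec3 localCorrect(vec3 toLightDir, vec3 fragPosWS, vec3 boxMin, vec3 boxMax, vec3 cubemapPos)
{
    //intersect the (normalised) direction with the AABB: take the nearest "exit" plane
    vec3 invDir = 1.0 / toLightDir; //NB: components of exactly 0 need special handling
    vec3 tMax = max((boxMin - fragPosWS) * invDir, (boxMax - fragPosWS) * invDir);
    float t = min(min(tMax.x, tMax.y), tMax.z);
    vec3 hitPointWS = fragPosWS + toLightDir * t;
    return hitPointWS - cubemapPos; //corrected cubemap sampling vector
}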
The technique opens up all sorts of interesting possibilities for enriching environments with external light sources eg. Vehicles whizzing by basement windows… dungeons with rotating cages… noir-inspired old elevators… underground trains rocketing through stations… luminous fairies in woodland glades… lightning flashes in the mad professor’s laboratory.
Heightmap Shadows
This is a very specific case, but it can really sell the effect of an uneven surface if a light source is very close to it. E.g. in VR we might want to add a light source to our teleport destination pointer, throw a flare onto the ground a small distance from us, or perhaps have a flickering camp fire in a rocky cave.
Once again we are going to use our heightmap to take a single additional sample at a fixed distance in the direction of the light source. (If you have multilayer materials, you’ll need to take one sample for each heightmap and blend them as before)
If the light vector in tangent space is normalised, its z component basically indicates height/angle above the horizon:
0 = sunset
1 = midday (zenith)
(…While its x & y components indicate UV offset in texture space)
If we take a new sample (h2) a fixed distance from the original point (h1) we can treat this point as a new horizon. (The fixed distance should ideally be half of the illusory vertical displacement due to parallax mapping – this will lead to full shadowing halfway up a slope)
The new horizon height as viewed from h1 is:
max(h1 – h2, 0);
If the z component of the light vector is lower than this horizon then the point is “self-shadowed.”
Rather than using “if” statements in shaders, we can use step functions… or better yet smoothstep functions to ramp in/out of the shadowed areas. This both removes branching in our shaders and softens the shadows at the same time!
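Putting that together for a single layer, following the same convention as above (sampleDist and softness are illustrative tuning values; lightDirTS is the normalised tangent space light vector):
float h1 = height; //height already sampled for the parallax mapping
float h2 = texture(normalHeightMap, uv + lightDirTS.xy * sampleDist).a;
float horizon = max(h1 - h2, 0.0); //new horizon as seen from the current point
float selfShadow = smoothstep(horizon - softness, horizon + softness, lightDirTS.z); //0 = shadowed, 1 = lit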
Like parallax mapping, this technique is most effective if the changes in the heightmap are gradual.
This is obviously a very crude approximation of raymarching, but for small distances (and with a suitable amount of smoothing) it can look quite convincing.
Screen Space Contact Shadows
The next effect is an attempt to make objects appear more grounded in their environment. This is often done using various ambient occlusion/obscurance techniques which, after a lot of testing, didn’t seem to provide a good cost/benefit proposition… ie. The effect was too subtle to warrant the additional processing it required. (An additional separate buffer taking many depth samples to calculate occlusion)
Though many titles attempt to use AO as a sort of magic bullet to create a crude approximation of global illumination, the effect does not actually behave the way light does in the real world. Its original incarnation was actually for a vast open world game with lots of foliage in relatively even outdoor lighting (Crysis), where light maps would not have been a viable solution.
Contact shadows on the other hand are hard directional shadows that are raymarched through the depth buffer from the current fragment towards a light source looking for occluders. To limit the number of required samples and potential visual artefacts, they are best used over small distances to fill in finer details that shadowmaps struggle to capture eg. details of our Pixel-depth offset terrain shader.
The effect once again requires a crude representation of the scene to search for occluders and so we will be using our depth buffer from the previous frame and reprojecting the results.
The standard way is raymarching in View space. (same as for volumetrics) If we do this naively, it leads to us sampling the depth buffer and then transforming every fragment position to view space using lots of matrix multiplication and inverse operations… which we would very much like to avoid doing in a fragment shader if possible.
(The second way is to transform our lighting vector instead so that we can march the reprojected depth texture directly in screen space. The non-linearity of the depth buffer and post-projected spaces like the screen makes this rather tricky… hence why, currently, I have not managed to get this working.)
So for now, View space it is… but we can do a lot of the pre-calculation in the vertex shader or on the CPU to make this significantly cheaper.
Seeing as we are rendering to a full screen quad, we can convert the corner coordinates to view space positions on the far plane, which will then be interpolated by the rasterizer to give us a “ViewRay” direction once we get to the fragment shader.
We also use our non-linear depth buffer value (ClipSpace.Z/ClipSpace.W mapped to the 0-1 range) to recover the linear ViewSpace.Z value using parts of the projection matrix passed in as uniforms.
// Assuming a standard perspective projection with the depth buffer mapped to the 0-1 range:
float ProjectionA = cameraFar / (cameraFar - cameraNear);
float ProjectionB = (-cameraFar * cameraNear) / (cameraFar - cameraNear);
// Invert depth = ProjectionA + ProjectionB / viewZ to recover the linear view space depth:
float linearDepth = ProjectionB / (sampleDepth - ProjectionA);
The last part is the clever bit (and the one that can easily catch you out). The ViewRay gives us the straight-line vector from the camera to our fragment position, but the depth buffer stores the perpendicular depth along the camera’s forward axis, so we can’t simply multiply the ViewRay by the linearised depth.
IMAGE: straight grid (planes of constant depth) vs curved grid (constant straight-line distance from the camera).
However, the dot product between the normalised ViewRay and the camera’s forward vector (0,0,-1 in view space) gives us the cosine of the angle between the two. Dividing our linearised depth by this (or, equivalently, rescaling the ViewRay so its z component equals -1 before multiplying by the depth) gives us what we need: our View space position to start our raymarching.
(There are other ways to do this like clamping the ViewRay Z value, or projecting onto the far plane… but this is the one that worked for me and somehow seems more intuitive… as it is similar to how I control my Pixel Depth Offset magnitude anyway)
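As a minimal sketch of that reconstruction in the fragment shader (viewRay is the interpolated far-plane direction from the vertex shader, sampleDepth the raw value from the depth texture, and ProjectionA/ProjectionB the uniforms from above; the names are illustrative):
// sampleDepth: non-linear value read from the depth texture (0-1 range assumed)
float linearDepth = ProjectionB / (sampleDepth - ProjectionA);
vec3 rayDir = normalize(viewRay);
// Cosine of the angle between the ray and the camera's forward axis (0,0,-1):
float cosAngle = dot(rayDir, vec3(0.0, 0.0, -1.0));
// Convert the perpendicular depth to a distance along the ray, then move along it:
vec3 rayStartposV = rayDir * (linearDepth / cosAngle);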
Similarly we can pre-calculate our view space lighting vectors / positions too… and so far I have had the best results with a directional light… which is also the simplest as the toLight vector is the same for all fragments!
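A minimal sketch of the march itself for the directional case, reusing the linearisation from above (toLightV, prevDepthTex, projectionMatrix, maxRayDistance and depthBias are illustrative names/tunables, not fixed parts of the pipeline):
const int NUM_STEPS = 4;
float stepLength = maxRayDistance / float(NUM_STEPS);
vec3 rayPosV = rayStartposV;
float shadow = 1.0;
for (int i = 0; i < NUM_STEPS; i++)
{
    rayPosV += toLightV * stepLength; // march towards the light in view space
    // Project back to screen space to sample the (previous frame's) depth buffer:
    vec4 clipPos = projectionMatrix * vec4(rayPosV, 1.0);
    vec2 uv = (clipPos.xy / clipPos.w) * 0.5 + 0.5;
    float sceneDepth = ProjectionB / (texture(prevDepthTex, uv).r - ProjectionA);
    // If the scene surface is closer to the camera than the ray point, the light is blocked.
    // (A thickness check could be added to avoid shadows from distant occluders.)
    if (sceneDepth < -rayPosV.z - depthBias)
        shadow = 0.0;
}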
As we did with our volumetrics, we offset the rayStartposV using blue noise, which greatly increases our spatial sampling density to (mostly) eliminate stepping artefacts when using low sample counts (eg. 4) …but how do we remove the noise?
The naive approach would be to take multiple samples from this buffer when rendering our main geometry pass… however we can save some bandwidth by reprojecting the history of this shadow buffer onto itself (similar to practices widely adopted in Temporal Anti-Aliasing algorithms). The main advantage of this approach is that it increases the temporal sampling density and creates softer/smoother shadows, especially if the camera is moving slowly. (We can also sub-pixel jitter the frustum to compensate for when the camera is static… though in VR this is probably not going to happen very often.)
However we need to be careful of ghosting and occlusion/disocclusion artefacts. In TAA the most common approach is to use some form of neighbourhood clamping/clipping… but this would require access to a small region of the newly calculated samples… which don’t exist yet. What we do have is the depth buffer from frame n-1, which we can sample in our main pass to compare to the depth of the fragment we are shading… though personally I found this creates more noticeable artefacts than the mild shadow ghosting from history feedback.
Also, seeing as we are calculating this on the previous frame’s depth buffer and reprojecting it, we need to account for missing information at the edges of the screen as the camera moves. Probably the least distracting way to achieve this (IMHO) is to set the texture sampling properties for the history to clamp the texture to the edge value. This does result in some smearing, but if we also use the camera motion vector as the mix factor between the history and the newly calculated (noisy) values, they are updated faster and you don’t really notice the noise if the camera is moving. Once the camera is static again, the samples will be smoothed/updated as normal.
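A sketch of that history blend, assuming worldPos has been reconstructed from the depth buffer, shadowHistoryTex is sampled with clamp-to-edge, cameraMotionUV is the per-fragment camera motion vector, and baseBlend/motionScale are tuning constants (all names illustrative):
// Reproject the current fragment into the previous frame's shadow buffer:
vec4 prevClip = prevViewProjMatrix * vec4(worldPos, 1.0);
vec2 historyUV = (prevClip.xy / prevClip.w) * 0.5 + 0.5; // clamp-to-edge covers off-screen samples
float history = texture(shadowHistoryTex, historyUV).r;
// Favour the history while the camera is still; favour the fresh (noisy) result as it moves:
float blend = clamp(baseBlend + motionScale * length(cameraMotionUV), 0.0, 1.0);
float shadow = mix(history, newShadow, blend);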
The main limitation of this technique is that our motion vectors are calculated from the camera, so it works best for static geometry. Any moving object will blur/disappear in the shadow buffer. The same applies to very fast moving light sources… the buffer will not update quickly enough. This is why TAA implementations render dynamic objects into a velocity buffer… however that is going to take up too much bandwidth for us.
Overall, I am not too concerned about the limitations as I remember that lots of the shadows for dynamic objects in Half Life: Alyx did not update in realtime… they faded/stippled in over time after you dropped an object and it became static. (Though further testing has led me to conclude that was more of an efficient shadowmap/mask update method for stationary light sources)
Waste of Time So Far
SSAO – the effect was expensive to render, requiring many samples, and the overall end result was much too subtle to be worth the extra cost… plus it is a very crude attempt at GI and does not really reflect how light behaves in the real world.
Future Work
- Profiling / Performance metrics
- ImGUI
- Fullscreen Quad vs Single Triangle cache coherency
- Soft particles (disable depth writes) – EXT framebuffer fetch
- Decals (view space vs. Clipped mesh?)
- Box projected reflections (using AABB?)
- Local cubemap ie. reflection probe generation
- Switch volumetrics to alpha blend proxy geometry? (Two sided depth?) …or wait for clusters?
- Clustered forward shading
- Directional light maps (global ray bundles)
- Spherical Gaussian light probes
- Non-linear IBL roughness for mipmap selection. Unity uses: mip = roughness * (1.7 - 0.7 * roughness) * maxLOD. (Also need to implement a smoother convolution for mip generation than bilinear… eg. cosine, box, Gaussian, Kawase)
Render Loop Summary:
- Update Lights, Process Input, Update Matrices etc.
- Calculate Exposure – Sample a 5×5 grid from the previous frame into a 1×1 texture then glReadPixels the value on the CPU side
- Draw Shadowmap(s) – Eventually only dynamic objects
- Calc PostProcessing – ¼ res Volumetrics & Bloom (RGB) + Penumbra Maps (A) both sample ShadowMap directly but reproject previous frame’s Depth & RGB*A textures.
- Process Blur Chain – 4 steps using asymmetric sampling patterns on downscale vs upscale.
- Draw scene geometry – Shaders write luminance into alpha (for HDR, Bloom & Exposure)
- Draw the skybox – currently 8-bit but could go to 16-bit for HDRI environment map support
- Draw PostProcessing Effects – single quad additively blended (but does not write to Alpha)
- Final Resolve to Main memory
Resources:
- Mobile VR Rendering Guidelines – https://developer.oculus.com/blog/pc-rendering-techniques-to-avoid-when-developing-for-mobile-vr/
- Reverse Reprojection – https://gfx.cs.princeton.edu/pubs/Nehab_2007_ARS/NehEtAl07.pdf
- Volumetric Fog – https://community.arm.com/arm-community-blogs/b/graphics-gaming-and-vr-blog/posts/clustered-volumetric-fog
- Line Light Source – https://www.elopezr.com/rendering-line-lights/
- Local Cubemap Shadows – https://community.arm.com/arm-community-blogs/b/graphics-gaming-and-vr-blog/posts/dynamic-soft-shadows-based-on-local-cubemap
- Screen Space Contact Shadows – https://panoskarabelas.com/posts/screen_space_shadows/