The emerging range of 3D virtual reality headsets, such as the Oculus Rift, Google Cardboard, Samsung GearVR, HTC Vive and Sony's Project Morpheus, provides unprecedented levels of immersion and presence, in devices that are affordable and soon to be widely available to the public. Two of these devices, the Google Cardboard and Samsung GearVR, are powered by a consumer mobile phone; the remaining three are powered by external computers or games consoles with dedicated graphics cards.
Developing software applications for these devices presents a different set of challenges when compared to normal 3D applications. The graphics card processing requirements are higher, the perceived visual quality is lower, and, due to the immersive nature of the devices, failure to maintain a high frame rate with low levels of latency not only interferes with the user's experience, but can also cause physiological side effects, such as motion sickness and dizziness. In addition, as physical movements within the real world are more closely mirrored in the virtual world, the physical constraints of both real and virtual worlds need to be considered, both for user safety and to enhance the experience. This article is the first in a series looking at the development of, and interactions within, immersive virtual environments. Starting with the current limitations of the hardware, the article describes how to overcome some of these limitations and how to get the most out of the current generation of devices.
A key factor in successful immersive virtual environments is the illusion of presence. When this illusion is achieved, the environment tricks the brain into thinking that what is being perceived is actually the real world. The user knows that it isn't real, and knows that they are wearing a headset, but at a low level their brain does not. This means that psychological responses such as anxiety, fear and excitement can be induced in the virtual environment. We know from movies that all these emotional responses can also be induced by a 2D screen. However, in an immersive 3D environment, these responses can be significantly heightened, as the brain can more easily be convinced that what it is experiencing is real.
Low resolution – high performance
The new generation of virtual reality headsets work by having the screen very close to the user's eyes, using lenses to focus. As the visual display is so close to the eye and magnified, the effective resolution is very small when compared with modern monitors. This means that the image can appear pixelated and blocky. This is commonly referred to as a screen-door effect, and it gives the feeling of looking at the scene through a semi-translucent barrier. Due to the immersive nature of the display, and the fact that the image displayed to the user needs to change quickly whenever they move their head, a very high frame rate is needed. The current Oculus Rift development kit recommends a frame rate of 75 frames per second (fps), which means that the graphics card has to re-draw the scene 75 times per second, per eye – so 150 times every second. Normal computer monitors can display at most 60 frames per second, so the demand on the graphics card will be at least 2.5 times greater. This frame-rate requirement is only going to get higher, with the upcoming HTC Vive headset reporting a 90 fps target and Sony's Morpheus 120 fps.
Even though the perceived resolution is quite low, the actual resolution is still quite high, and requires high performance graphics cards to maintain a high frame rate. As the resolution and refresh rates of the headsets become higher, so will the required GPU performance. This in itself poses a problem for the industry, as the most powerful commercial graphics cards may not yet be powerful enough to allow for high quality rendering of complex scenes at such a high frame rate. The top of the range cards will do a reasonable job, but will initially be prohibitively expensive for most people, and certainly more expensive than the VR headsets themselves. In order to reach the widest audience, application developers have to ensure that their application consistently runs at as high a frame rate as possible, on the widest range of hardware, while at the same time providing a good experience.
Virtual reality headsets are notorious for causing nausea in users. Whether a user feels sick or not is dependent on a number of factors, and some people are more susceptible than others. Put simply, this occurs when what a person sees does not match up with what their brain expects to see. For example, if you move your head to the side and what your eyes see does not change in the way it normally does – in the way that the brain expects it to – then this can very quickly lead to headaches and sickness. Other factors that can cause discomfort are poor resolution, unresponsiveness or jerkiness, low frame rates and juddering. Flashing lights, fast changing scenes and in-game constraints that prevent movement can also cause issues.
When developing games and other interactive applications that traditionally have cut-scenes, static menus and fixed images, it is important to take this into consideration, and provide the user with some suitable feedback when they move. For example, a menu could be displayed as a semi-transparent overlay onto a simple virtual room, with the menu fixed in place but the room moving in the background. Cut-scenes could be recorded as character animation sequences instead of static videos, so the user will still be able to look around as if they are a part of the scene, or they could be displayed on a screen in front of the user, so it will be like watching a TV within the virtual room.
Although the new generation of VR headsets all provide a full 360 degree view based on head orientation, they are not all capable of detecting subtle lateral head movements. Without this, when you move your head side to side, or lean forward, the headset is unable to detect the movement and the displayed image remains the same. This not only breaks the illusion of presence, but can also cause motion sickness. This is a key limitation of the mobile phone based solutions, such as Google Cardboard and Samsung GearVR. The issue has been tackled in different ways by the computer-driven solutions. The Oculus Rift tracks the lateral movement of the user using an IR camera, placed in front of the user and synchronised with the headset, to combine the rotational and lateral movement of the head. This is designed to detect subtle movements, not to track the user moving around the room. Morpheus works in a similar way, using the PlayStation Camera to track both head orientation and movement. The HTC Vive stands out in this regard, as it is the only whole-room solution. Using wall-mounted sensors, it can track movement within a wide area.
Leaving the HTC Vive aside for the moment, the solution implemented in the Oculus Rift and Morpheus leads to another issue: the way that this movement is interpreted once inside a virtual environment. By moving your head to the side, it is possible to move the camera position in the 3D world outside the range of traditional in-game constraints. Although the tracking is designed to compensate for small head movements, it can actually cover quite a wide area, so if the user physically moves to the side, the virtual camera will move accordingly. Depending on the type of application, this may or may not be a serious issue. If, for example, the user is flying through an open scene with no physical object constraints, then it will not matter. However, if the user's character is physically constrained in some way, for example sitting in a chair, the range of possible motion could move the camera so that the person is out of their seat. This could be made worse if it allowed them to move through a wall. Traditionally, the physical movement of a character is constrained by the movement controls, whereas the orientation and positional offset of the Oculus Rift headset is applied afterwards, and only alters the view. This means that the constraints placed on character movement also need to be enforced when the user moves their head. It may result in a break from the illusion of presence if the user's head moves in the real world but is constrained within the virtual world.
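One straightforward way to enforce such constraints is to clamp the tracked positional offset against the character's allowed movement volume before applying it to the camera. The sketch below, in Python for clarity, uses a hypothetical axis-aligned box as the constraint; the function names and the seat bounds are illustrative assumptions, not part of any headset SDK.

```python
# Illustrative sketch: clamp the headset's positional offset so the virtual
# camera stays inside the character's allowed movement volume (e.g. a seat).
# The box-shaped constraint and all names here are assumptions.

def clamp_head_offset(offset, bounds):
    """Clamp a tracked (x, y, z) head offset to an axis-aligned box.

    offset: (x, y, z) lateral movement reported by the headset, in metres.
    bounds: ((min_x, max_x), (min_y, max_y), (min_z, max_z)) allowed ranges.
    """
    return tuple(min(max(v, lo), hi) for v, (lo, hi) in zip(offset, bounds))

# A seated character might plausibly allow ~20 cm of lean in each direction:
SEAT_BOUNDS = ((-0.2, 0.2), (-0.1, 0.3), (-0.2, 0.2))

# The user leans half a metre to the left -- beyond the seat's range,
# so the camera offset is held at the edge of the allowed volume:
camera_offset = clamp_head_offset((-0.5, 0.0, 0.1), SEAT_BOUNDS)
# camera_offset == (-0.2, 0.0, 0.1)
```

In a real engine the constraint would more likely come from the same collision query used for character movement, but the principle – constrain the tracked offset, not just the controls – is the same.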
This is particularly disturbing when the user's view is positioned on a virtual avatar, so that when the user looks down, they see a body and hands. When lateral movement is applied directly, the user can become completely disconnected from their body. One possible solution is, instead of offsetting the camera position by the lateral movement, to link it to a rotation of the hip, thereby offsetting the camera position implicitly (assuming a bone structure where the camera is fixed above the neck, and all bones are linked correctly). This ensures that the camera remains fixed on the neck, without losing the benefits of lateral movement. It also means that the amount the head moves in the virtual environment can be partially constrained based on the skeletal kinematics of the character.
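The hip-rotation idea can be sketched with a little trigonometry: if the camera sits rigidly at the top of a torso of known length, rotating the hip by asin(offset / torso length) reproduces roughly the tracked lateral offset while keeping the camera attached to the neck. The function below is a minimal illustration under that assumption; the torso length and the 30 degree clamp are made-up example values.

```python
import math

def hip_angle_for_offset(lateral_offset, torso_length,
                         max_angle=math.radians(30)):
    """Convert a tracked lateral head offset into a hip rotation angle.

    Rotating the hip by asin(offset / torso_length) moves a head rigidly
    attached to the top of the torso sideways by roughly the tracked
    offset, so the camera stays fixed on the neck. The angle is clamped
    so the avatar cannot bend further than the skeleton plausibly allows,
    which is how skeletal kinematics end up constraining head movement.
    """
    ratio = max(-1.0, min(1.0, lateral_offset / torso_length))
    angle = math.asin(ratio)
    return max(-max_angle, min(max_angle, angle))

# Leaning 10 cm with a 60 cm torso bends the hip by about 9.6 degrees:
angle = hip_angle_for_offset(0.1, 0.6)
```

A full implementation would drive the hip bone of the avatar's rig with this angle each frame, letting the animation system propagate the motion up the spine to the camera.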
For devices which are not designed for wide range tracking, there is currently no built-in way to distinguish between a small head movement and the user physically moving themselves. However, there are a few options that could be used to make this distinction. Video and depth camera tracking, such as that provided by the Microsoft Kinect, could be used to distinguish between simple head movement and physical movement of the user. The ability to detect this distinction could potentially be built into Sony's Morpheus when it is released. Other alternatives such as the Sixense positional tracking controllers could also be used. The latter is potentially easier to integrate into a VR headset, and will in the long term provide the most complete range of motion, which could even be used to replace the built-in position tracking with its limited range. Alternatives also include joint and positional tracking using real time motion capture suits, such as the PrioVR, or constraining physical movement using walking platforms such as the Virtuix Omni.
By extending the range in which a person's movement is tracked, a new problem is introduced: how to stop the user from inadvertently colliding with the now hidden real world. By extending the amount of real movement the virtual world can detect, you now run the risk of the user moving too far in the real world and injuring themselves. When immersed in a virtual environment, you completely lose track of where you are in the real world – how close you are to the desk, the screen, and other obstacles in the room such as chairs and tables. This is not such an issue if you remain seated, but if you stand up and start walking around, it can become a problem. The HTC Vive provides in-game mechanisms for showing the walls of the room, but does require the room to be empty of obstacles, which may not be feasible for most people.
When developing any virtual environment, it is important to get the right balance between quality and performance. As already mentioned, higher frame rates will increase the realism and reduce the risk of nausea. Even with state-of-the-art graphics cards, this is not always achievable with complex scenes. Many of the scene partitioning and optimisation techniques that are used in modern 3D applications can still be used. However, in addition to standard scene optimisation techniques, there are additional steps that can be taken to help improve performance on virtual headsets. These optimisations were applied to our Virtual Forest simulation, to enable fast visualisation of a massive point cloud data set with over 350 million points.
Single-step scene culling. Scene culling refers to the removal of objects from the scene before the graphics card is told to render them to the screen. Ordinarily, the scene is culled from the perspective of the position and orientation of the virtual camera, removing objects that are outside the camera's view frustum. However, for 3D headsets, there are two virtual cameras, one for each eye, at slightly different positions within the scene. This means that the scene is effectively rendered twice, from slightly different positions, and that scene culling would normally be applied twice per render loop. Instead, by extending the view frustum used for culling and placing its starting point in between and slightly behind the two eye cameras, the objects that are not visible to either eye can be accurately culled in a single step.
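The geometry behind this can be sketched in two dimensions (the horizontal plane). With two eyes separated by the interpupillary distance and sharing the same forward direction and half field of view, a single frustum with the same half-angle encloses both eye frusta if its apex is moved back along the centre line by (ipd / 2) / tan(half_fov). The Python below is a simplified illustration under those assumptions – a real implementation would test bounding volumes against all planes of a 3D frustum, and the function names are our own.

```python
import math

def combined_cull_origin(ipd, half_fov):
    """Distance behind the eye baseline for the single culling frustum.

    Pulling the apex back by (ipd / 2) / tan(half_fov) lets one frustum,
    with the same half-angle as each eye, enclose both eye frusta.
    """
    return (ipd / 2.0) / math.tan(half_fov)

def in_combined_frustum(point, ipd, half_fov):
    """Single visibility test replacing two per-eye frustum tests (2D).

    point: (x, z) position in head space -- x to the right, z forward.
    """
    setback = combined_cull_origin(ipd, half_fov)
    x, z = point
    # The apex of the combined frustum sits at (0, -setback); a point is
    # inside if it is in front of the apex and within the half-angle cone.
    return z > -setback and abs(x) <= (z + setback) * math.tan(half_fov)

# With a 64 mm eye separation and a 90 degree horizontal field of view,
# the culling origin only needs to move back by about 3 cm:
setback = combined_cull_origin(0.064, math.radians(45))
```

Because the setback is so small, the combined frustum culls almost exactly the same set of objects as the two per-eye tests would, at half the cost.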
View dependent render quality. Ordinarily, complex 3D scenes are broken down into collections of objects, where each object has multiple versions with different levels of detail. If an object is close to the virtual camera, then a high quality version of the object is used. If the object is further from the camera, then a lower quality version can be used without having a large impact on the visual quality perceived by the user. This reduces the complexity of the scene at any one time and can significantly increase the frame rate. When using a VR headset, the area at the centre of the user's vision is where the user will focus most. Objects at the edge of the view, and especially the corners, are going to be distorted and hidden due to the "fish-eye" distortion that is applied to compensate for the lenses. Therefore, in addition to reducing the quality of objects further from the camera, the quality of objects at the edges can also be reduced. This means that only objects directly in front of the user need to be high quality, and all others can be reduced based not only on distance from the camera, but also on distance from the centre of the current camera view. This optimisation can be calculated at the same time as the scene is culled, and areas on the periphery can be marked so that lower levels of detail are displayed.
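A minimal way to combine the two factors is to normalise both the distance from the camera and the angular distance from the view centre, and let the worse of the two drive the level-of-detail choice. The sketch below is illustrative only – the thresholds, the number of levels and the use of a simple max() blend are all assumptions, and a production system would tune these per scene.

```python
import math

def select_lod(distance, angle_from_centre, num_levels=4,
               max_distance=100.0, max_angle=math.radians(50)):
    """Pick a level of detail (0 = highest quality) for an object.

    Quality is reduced both with distance from the camera and with
    angular distance from the centre of the view, so only objects the
    user is likely looking at directly get the most detailed version.
    Thresholds and weights here are illustrative guesses.
    """
    d = min(distance / max_distance, 1.0)        # 0 = near .. 1 = far
    a = min(angle_from_centre / max_angle, 1.0)  # 0 = centre .. 1 = edge
    cost = max(d, a)                             # worse of the two factors
    return min(int(cost * num_levels), num_levels - 1)

# A nearby object straight ahead gets the full-detail version:
lod_centre = select_lod(10.0, 0.0)        # -> 0
# The same object near the edge of the view drops to the lowest level:
lod_edge = select_lod(10.0, math.radians(49))
```

As the text notes, the angle term can be evaluated during the culling pass, since the direction to each object relative to the view axis is already available there.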
Motion based adaptive degradation. In reality, when a person turns their head, they do not see the world as clearly and in as much detail as they do when their head is still. We can take advantage of this, and link the render quality to the speed at which the head is moving. If the user is turning their head quickly, the quality of the objects being rendered to the screen can be reduced, thereby increasing the frame rate for a smoother experience when turning. Depending on the type of scene being rendered, techniques such as level of detail and reduction in point cloud density can be used without the user noticing.
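One simple mapping is a quality scale that stays at full quality below some angular speed, then falls off linearly towards a floor. The sketch below is one possible shape for such a curve; the speed thresholds and the minimum quality are invented example values, not figures from our simulation.

```python
def quality_scale(angular_speed, full_quality_below=0.5,
                  min_quality=0.4, max_speed=5.0):
    """Scale render quality down as head angular speed (rad/s) rises.

    Below `full_quality_below` rad/s the scene renders at full quality;
    the scale then falls linearly to `min_quality` at `max_speed`.
    The resulting factor might drive point cloud density or LOD bias.
    All thresholds here are illustrative, untuned values.
    """
    if angular_speed <= full_quality_below:
        return 1.0
    t = min((angular_speed - full_quality_below) /
            (max_speed - full_quality_below), 1.0)
    return 1.0 - t * (1.0 - min_quality)

# A slow glance keeps full quality; a fast head turn drops towards 40%:
slow = quality_scale(0.2)   # -> 1.0
fast = quality_scale(10.0)  # -> 0.4
```

In practice the angular speed comes from the headset's rotational tracking, and it is worth smoothing it over a few frames so the quality does not visibly flicker as the head starts and stops.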
This combination of techniques results in a higher frame rate and a smoother experience, without any perceived reduction in visual quality.
The technical landscape of virtual reality is changing rapidly, and new devices and ideas are emerging to completely change the way we think about and use 3D technology. These new technologies have applications in areas far more diverse than just games, and can provide new ways of working, communicating, interacting with people and understanding problems. These devices bring their own challenges, and we need to develop new sets of standards and best practices to keep up with the rapid changes in front of us.