Mixed Reality Lecture Notes

Mixed Reality for Immersive Experience

Immersive Technologies

  • Immersive technologies include:
    • Augmented Reality (AR)
    • Virtual Reality (VR)
    • Mixed Reality (MR)
    • Virtual Environment

Mixed Reality (MR)

  • MR combines the real environment with virtual reality and augmented reality.
  • Reference: Milgram and Kishino, "A Taxonomy of Mixed Reality Visual Displays," IEICE Transactions on Information and Systems, 1994.

Lecture Coverage

  • Marker-based Augmented Reality (AR)
  • Markerless Augmented Reality (AR)
  • Visual SLAM

Marker-based Augmented Reality

  • Process:
    • Video stream from camera.
    • Image converted to binary image.
    • Black marker identified.
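
The binarisation step in this process can be sketched as a fixed-threshold operation on a grayscale frame. This is only an illustration under stated assumptions, not the method of any particular AR library: the function name `to_binary`, the threshold value of 100, and the tiny 4x4 frame are all hypothetical.

```python
def to_binary(gray, threshold=100):
    """Return a binary image: 1 where a pixel is darker than the threshold
    (a candidate marker pixel), 0 elsewhere."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


# Hypothetical 4x4 grayscale frame (0 = black, 255 = white)
# containing a dark 2x2 patch that stands in for the black marker.
frame = [
    [200, 210, 205, 220],
    [215,  20,  30, 210],
    [205,  25,  15, 200],
    [220, 210, 215, 205],
]

binary = to_binary(frame)

# Collect (row, column) positions of the dark pixels.
dark_pixels = [(r, c)
               for r, row in enumerate(binary)
               for c, value in enumerate(row) if value]
print(dark_pixels)
```

A fixed threshold is the simplest choice; under uneven lighting an adaptive threshold would likely be needed instead.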

Marker Detection

  • Why is it easy to detect the marker?
    • Simple computation.
    • Relies on edge and corner detection.
    • Surface color discontinuity.
    • Illumination discontinuity.

Marker Pose Calculation

  • Position and orientation of the marker relative to the camera are calculated.
  • T = {P, R}
  • 3D Transformation = {position and orientation}
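
One common way to hold T = {P, R} in code is a 4x4 homogeneous matrix. The sketch below assumes R is a 3x3 rotation matrix and P a 3-vector; the helper names (`make_transform`, `apply_transform`, `rotation_z`) and the example values are hypothetical.

```python
import math


def rotation_z(angle_rad):
    """3x3 rotation about the Z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def make_transform(position, rotation):
    """Pack position P and rotation R into a 4x4 homogeneous matrix T."""
    px, py, pz = position
    return [
        rotation[0] + [px],
        rotation[1] + [py],
        rotation[2] + [pz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def apply_transform(T, point):
    """Map a 3D point from marker coordinates into camera coordinates."""
    h = [point[0], point[1], point[2], 1.0]
    return tuple(sum(T[i][j] * h[j] for j in range(4)) for i in range(3))


# Hypothetical pose: marker 0.5 m in front of the camera, rotated 90 degrees about Z.
R = rotation_z(math.pi / 2)
P = (0.0, 0.0, 0.5)
T = make_transform(P, R)

# A marker corner 4 cm along the marker's X axis, expressed in camera coordinates.
corner_in_camera = apply_transform(T, (0.04, 0.0, 0.0))
print(corner_in_camera)
```

The same T is what positions and orients a virtual object: each object vertex is passed through `apply_transform` before rendering.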

Virtual Object Rendering

  • Use the transformation (T) of the marker to position and orient the 3D virtual object.
  • T = {P, R}
  • The virtual object is rendered in the video frame.
  • Marker registration.
  • Augmentation at the marker origin.
  • Marker tracking.

Coordinates for Marker Tracking

  • Marker coordinates: (X_m, Y_m, Z_m)
  • Camera coordinates: (X_c, Y_c, Z_c)
  • Ideal screen coordinates: (x_c, y_c)
  • Image distortion function: maps ideal screen coordinates to observed screen coordinates.
  • Observed screen coordinates: (x_d, y_d)
  • Registration error: incorrect pose (localisation and orientation) estimation during the tracking process.
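
The chain of coordinate systems above, and the resulting registration error, can be sketched as follows. This is a simplified illustration: the one-coefficient radial distortion model, the coefficient `k1`, the use of normalised (unit focal length) screen coordinates, and all numeric values are assumptions, not taken from the lecture.

```python
import math


def marker_to_camera(R, t, p_m):
    """Rigid transform from marker coordinates (X_m, Y_m, Z_m)
    to camera coordinates (X_c, Y_c, Z_c)."""
    return tuple(sum(R[i][j] * p_m[j] for j in range(3)) + t[i] for i in range(3))


def project_ideal(p_c):
    """Perspective division to ideal (normalised) screen coordinates (x_c, y_c)."""
    X, Y, Z = p_c
    return (X / Z, Y / Z)


def distort(xy, k1):
    """Assumed one-coefficient radial distortion: ideal -> observed (x_d, y_d)."""
    x, y = xy
    scale = 1.0 + k1 * (x * x + y * y)
    return (x * scale, y * scale)


def registration_error(observed, predicted):
    """Screen-space distance between where a point is seen and where the
    estimated pose says it should be."""
    return math.hypot(observed[0] - predicted[0], observed[1] - predicted[1])


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
true_t = (0.0, 0.0, 1.0)    # actual marker position, 1 m from the camera
est_t = (0.01, 0.0, 1.0)    # estimated pose, off by 1 cm along X
corner = (0.1, 0.1, 0.0)    # a marker corner in marker coordinates

observed = distort(project_ideal(marker_to_camera(IDENTITY, true_t, corner)), k1=-0.1)
predicted = distort(project_ideal(marker_to_camera(IDENTITY, est_t, corner)), k1=-0.1)
err = registration_error(observed, predicted)
print(err)
```

A non-zero `err` is what the viewer perceives as the virtual object sitting slightly off its intended place.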

Advantages of Marker-based AR

  • Easy to use and implement.
  • Efficient and real-time performance (low latency).
  • Feature-based tracking, which is very stable.

Disadvantages of Marker-based AR

  • If the camera moves away from the marker, the virtual content disappears.
  • Detection fails when light reflects strongly off the marker (glare).
  • Marker must have strong borders and contrast.
  • Detection fails when the marker is occluded.

Image-based Augmented Reality

  • Using the marker as an image.
  • Feature detection algorithm.
  • Revisits marker-based AR, with a focus on continuous tracking and tracking stability.

Challenges in Image-based AR

  • Continuous tracking and tracking stability are challenging.
    • The tracker must keep continuous track of feature points from each frame to the next.
    • It must also keep continuous track of the image pose over time, so that outliers can be detected (pose calculation/pose estimation).
    • If the frame rate is slow, the pose may change significantly between frames (augmentation “jumps”).
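
One simple way to detect pose outliers over time is to gate each new pose against the last accepted one. The sketch below is an assumed, minimal scheme rather than the method of any specific tracker; `filter_pose_jumps`, the 5 cm threshold, and the sample positions are hypothetical, and only the position part of the pose is checked.

```python
import math


def filter_pose_jumps(positions, max_jump):
    """Accept a pose only if its position lies within max_jump of the
    last accepted position; return accepted poses and rejected frame indices."""
    accepted = [positions[0]]
    rejected = []
    for index, position in enumerate(positions[1:], start=1):
        if math.dist(position, accepted[-1]) <= max_jump:
            accepted.append(position)
        else:
            rejected.append(index)
    return accepted, rejected


# Hypothetical per-frame marker positions in metres; frame 2 is a bad estimate.
track = [
    (0.00, 0.0, 1.0),
    (0.01, 0.0, 1.0),
    (0.50, 0.3, 1.0),
    (0.02, 0.0, 1.0),
]

accepted, rejected = filter_pose_jumps(track, max_jump=0.05)
print(rejected)
```

This also shows why a slow frame rate is a problem: genuine motion between frames grows, so a fixed gate starts rejecting valid poses and the threshold has to be relaxed.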

Image-based AR Process

  • Video stream from camera.
  • Continuous feature detection.
  • Pose calculation.
  • Use the transformation (T) of the marker to position and orient the 3D virtual object.
  • Target image registration.
  • The virtual object is rendered in the video frame.
  • Augmentation in real world.
  • T = {P, R}

Marker-less Augmented Reality

  • Optical Tracking
    • Marker tracking (e.g. ARToolKit square markers or known features in an image).
    • Available for more than 10 years.

Marker-less AR - Optical Tracking Types

  • Unprepared tracking: tracking in unknown environment (e.g. visual SLAM tracking).
  • SLAM (Simultaneous Localization and Mapping): this is a very important problem in mobile robotics.

Visual SLAM

  • Early SLAM system (1986-now).
  • Using computer vision and sensors.
  • Using cameras only, such as stereo view.
  • MonoSLAM (single camera) developed in 2007.

Visual SLAM Steps

  • Step 1: Tracking a set of points through camera frames.
  • Step 2: Using these tracks to triangulate their 3D position.
  • Step 3: Simultaneously use the estimated point locations to calculate the camera pose from which they could have been observed.
  • Observing enough points makes it possible to solve for both structure and motion (scene structure and camera path).
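
Step 2 (triangulation) is easiest to see in the rectified stereo case, where the two viewpoints are a known distance apart. The sketch below assumes a rectified stereo pair with a known baseline and focal length; in monocular SLAM the relative camera motion would itself have to be estimated first. The function name and all parameter values are hypothetical.

```python
def triangulate_rectified(u_left, u_right, v, focal_px, baseline_m, cx, cy):
    """Recover a 3D point (in the left camera frame) from one tracked point
    seen at pixel column u_left in the left image and u_right in the right."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: point at infinity or bad match")
    Z = focal_px * baseline_m / disparity   # depth from disparity
    X = (u_left - cx) * Z / focal_px        # back-project the pixel column
    Y = (v - cy) * Z / focal_px             # back-project the pixel row
    return (X, Y, Z)


# Hypothetical values: 500 px focal length, 10 cm baseline,
# principal point at (320, 240), and a 20 px disparity.
point = triangulate_rectified(
    u_left=340.0, u_right=320.0, v=240.0,
    focal_px=500.0, baseline_m=0.1, cx=320.0, cy=240.0,
)
print(point)
```

Depth is inversely proportional to disparity, so distant points (small disparity) are triangulated much less precisely than nearby ones.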

Challenges for Visual SLAM

  • Assumes the camera moves through an unchanged (static) scene.
  • Not suitable for person tracking or gesture tracking.
  • Outdoor tracking is challenging.

Mixed Reality Components using HoloLens Example

  • See-through display.
    • Aspect Ratio: 3:2
    • Resolution: 2K
    • Display Rate: 120-240Hz
  • Depth camera
  • Image sensors
  • Short and long-throw IR illuminators
  • Inertial Measurement Unit (IMU)
  • Light Engine
  • Color video camera
  • 4 gray-scale cameras
  • See-through Holographic Lens

Sensors Calibration

  • Intrinsic properties (optical centre, scaling):
    \begin{bmatrix} f_x & 0 & C_x \\ 0 & f_y & C_y \end{bmatrix}
    • Estimates the camera parameters.
  • Extrinsic properties (camera rotation and translation):
    \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}
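
For orientation, the standard pinhole projection model shows how the intrinsic and extrinsic parameters combine. The equation is not stated in the notes; the scale factor s, the pixel coordinates (u, v), the world point (X_w, Y_w, Z_w), and the bottom row (0, 0, 1) of the intrinsic matrix are the usual textbook additions and should be read as assumptions here.

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} f_x & 0 & C_x \\ 0 & f_y & C_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_1 \\
r_{21} & r_{22} & r_{23} & t_2 \\
r_{31} & r_{32} & r_{33} & t_3
\end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
```

Reading right to left: the extrinsic matrix moves the world point into the camera frame, and the intrinsic matrix then scales by the focal lengths and shifts by the optical centre to give pixel coordinates after division by s.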

Transformation Matrix in Mixed Reality

\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ 0 & 0 & 0 & 1 \end{bmatrix}

  • HoloLens Coordinate System (Zw, Yw)
  • World Coordinate System (yw)
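
Moving between a device coordinate system and a world coordinate system uses a 4x4 matrix of this form and its inverse. The sketch below assumes the upper-left 3x3 block is a pure rotation (a rigid transform), which allows a cheap closed-form inverse; the names `world_from_device` and `invert_rigid` and the pose values are hypothetical and do not come from the HoloLens API.

```python
def invert_rigid(T):
    """Invert a rigid 4x4 transform [R t; 0 1] as [R^T, -R^T t; 0 1]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    t = [T[i][3] for i in range(3)]
    new_t = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return [Rt[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul4(A, B):
    """Multiply two 4x4 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform_point(T, p):
    """Apply a 4x4 transform to a 3D point."""
    h = list(p) + [1.0]
    return tuple(sum(T[i][j] * h[j] for j in range(4)) for i in range(3))


# Hypothetical device pose in the world: rotated 90 degrees about Y,
# located at (1.0, 1.6, 2.0) metres.
world_from_device = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 1.6],
    [-1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 1.0],
]
device_from_world = invert_rigid(world_from_device)

p_device = (0.0, 0.0, 1.0)                        # 1 m along the device Z axis
p_world = transform_point(world_from_device, p_device)
round_trip = transform_point(device_from_world, p_world)
print(p_world, round_trip)
```

Multiplying `world_from_device` by `device_from_world` with `matmul4` gives the identity matrix, which is a quick sanity check on any pair of such transforms.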

Spatial Mapping

  • Definition: the process of a mixed reality device mapping the real space, for the device to create an understanding of it.
  • A mesh is created that lays over the real environment. A mesh looks like a series of triangles placed together, like a fishing net.
  • This is done through computational geometry and computer vision (visual SLAM).
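
A spatial-mapping mesh of this kind is typically stored as a list of vertices plus a list of triangles that index into it. The sketch below is a generic illustration, not the HoloLens data format; the 1 m x 1 m floor patch, the Y-up convention, and the helper names are assumptions.

```python
import math

# Four corners of a hypothetical 1 m x 1 m floor patch (Y is up).
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (1.0, 0.0, 1.0),
    (0.0, 0.0, 1.0),
]
# Two triangles, each a triple of indices into `vertices`.
triangles = [(0, 2, 1), (0, 3, 2)]


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normal_and_area(triangle):
    """Unit normal and area of one mesh triangle."""
    v0, v1, v2 = (vertices[i] for i in triangle)
    n = cross(sub(v1, v0), sub(v2, v0))
    length = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    return tuple(c / length for c in n), 0.5 * length


normals_areas = [normal_and_area(t) for t in triangles]
total_area = sum(area for _, area in normals_areas)
print(normals_areas, total_area)
```

Triangle normals are what let an application classify surfaces, for example treating patches whose normal points up as floor or table candidates.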

Spatial Mapping Usage

  • Visualisation and navigation: to position and display the virtual object correctly, and to grant the virtual object/agent/character the ability to navigate around the space.
  • Physics and occlusion: to perform physics simulation (e.g. the virtual object can bounce across the floor) and to let real surfaces hide virtual objects.
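
Occlusion can be illustrated with a per-pixel depth comparison: a virtual pixel is drawn only where it is nearer to the viewer than the real surface recovered by spatial mapping. This is a minimal sketch on a single image row; the function name, the string "colours", and the depth values are hypothetical.

```python
def composite_row(camera_row, real_depth, virtual_row, virtual_depth):
    """Composite one image row. A None entry in virtual_depth means there is
    no virtual content at that pixel."""
    out = []
    for cam, d_real, virt, d_virt in zip(camera_row, real_depth,
                                         virtual_row, virtual_depth):
        visible = d_virt is not None and d_virt < d_real
        out.append(virt if visible else cam)
    return out


# Hypothetical row: a wall at 2.0 m and a pillar at 0.8 m, with a hologram
# placed at 1.5 m overlapping both.
camera_row = ["wall", "wall", "pillar", "pillar"]
real_depth = [2.0, 2.0, 0.8, 0.8]
virtual_row = [None, "hologram", "hologram", None]
virtual_depth = [None, 1.5, 1.5, None]

result = composite_row(camera_row, real_depth, virtual_row, virtual_depth)
print(result)  # ['wall', 'hologram', 'pillar', 'pillar']
```

The hologram shows in front of the wall but is hidden behind the nearer pillar, which is the visual cue that makes it appear to sit inside the real scene.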

Mapping Recognition

  • The process of mapping, registration, and recognition of non-static elements of the real world, which allows communication between the real world and virtual objects.
  • The user's hands are recognised and interpreted as left- and right-hand skeletal models.
  • Five colliders are attached to the five fingertips of each hand skeletal model.
  • Microsoft HoloLens

Collider Details

  • The collider is a sphere collider, which can be visually rendered to provide better cues for near targeting.
  • The sphere's diameter should match the thickness of the index finger to increase touch accuracy.
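
A fingertip sphere collider touching a virtual object can be sketched as a sphere-versus-box overlap test. This is a generic geometric check, not code from any MR toolkit; the 16 mm fingertip diameter, the 10 cm cube, and the function name are assumptions.

```python
def sphere_touches_box(center, radius, box_min, box_max):
    """True if a sphere overlaps an axis-aligned box: clamp the sphere centre
    to the box and compare the remaining distance with the radius."""
    closest = [min(max(c, lo), hi) for c, lo, hi in zip(center, box_min, box_max)]
    dist_sq = sum((c - q) ** 2 for c, q in zip(center, closest))
    return dist_sq <= radius * radius


# Hypothetical fingertip collider: 16 mm diameter, so 8 mm radius.
FINGERTIP_RADIUS = 0.008

# Hypothetical 10 cm virtual cube.
box_min = (0.0, 0.0, 0.0)
box_max = (0.1, 0.1, 0.1)

near = sphere_touches_box((0.105, 0.05, 0.05), FINGERTIP_RADIUS, box_min, box_max)
far = sphere_touches_box((0.120, 0.05, 0.05), FINGERTIP_RADIUS, box_min, box_max)
print(near, far)  # True False
```

The radius directly sets how early a touch registers, which is why matching the sphere to the real finger thickness affects touch accuracy.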

Interaction Models

  • Direct interaction, where 10 collidable fingertips are used, can cause unexpected and unpredictable collisions.
  • 3D object manipulation using a bounding box.
  • The bounding box provides better depth perception through its proximity shader.
  • Gaze and head interactions (eye and head tracking).
  • Voice-based interaction
  • Microsoft HoloLens
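
Gaze or head-based targeting of a bounding box is commonly implemented as a ray cast. The sketch below uses the standard slab method for a ray against an axis-aligned box; it is not taken from the lecture or from any toolkit, and the function name, head position, and box placement are hypothetical.

```python
def ray_hits_box(origin, direction, box_min, box_max):
    """Return the distance along the ray to the box entry point, or None if
    the ray misses (slab method for axis-aligned boxes)."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of faces: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near


# Hypothetical setup: head at the origin, a bounding box 1 m straight ahead (+Z).
head = (0.0, 0.0, 0.0)
box_min = (-0.1, -0.1, 1.0)
box_max = (0.1, 0.1, 1.2)

looking_at_box = ray_hits_box(head, (0.0, 0.0, 1.0), box_min, box_max)
looking_away = ray_hits_box(head, (1.0, 0.0, 0.0), box_min, box_max)
print(looking_at_box, looking_away)  # 1.0 None
```

Because the ray carries no collider volume, this style of interaction avoids the accidental collisions that can occur with ten fingertip colliders.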

References

  • Rokhsaritalemi, Somaiieh, Abolghasem Sadeghi-Niaraki, and Soo-Mi Choi. "A review on mixed reality: Current trends, challenges and prospects." Applied Sciences 10.2 (2020): 636.
  • Speicher, Maximilian, Brian D. Hall, and Michael Nebeling. "What is mixed reality?." Proceedings of the 2019 CHI conference on human factors in computing systems. 2019.
  • Kruijff, Ernst, J. Edward Swan, and Steven Feiner. "Perceptual issues in augmented reality revisited." 2010 IEEE International Symposium on Mixed and Augmented Reality. IEEE, 2010.