Study Notes on Pose Estimation and Tracking

Pose Estimation and Tracking

Introduction to Camera Calibration

  • Camera calibration must precede any camera-based visual tracking tasks.

  • Previous discussions involved camera calibration and registration.

Augmented Reality (AR) Applications

  • Real-time pose tracking keeps virtual content registered frame by frame.

  • Registration aligns virtual objects with real-world counterparts.

  • AR requires tracking technologies for user poses or display poses.

    • Registration: Positioning and orienting virtual objects relative to real-world objects.

    • Tracking: Continuously updating that alignment as the camera or objects move.

  • Frame-by-frame pose estimation intertwines estimation and tracking in marker-based AR.

Objectives of the Chapter

  • Different pose estimation techniques and tracking types will be explored.

  • Focus on:

    • Stationary, sensor-based, optical, and hybrid tracking techniques.

    • Mechanical, electromagnetic, and ultrasound tracking systems.

    • Mobile sensor tracking (GPS, Wi-Fi, magnetometer, gyroscope).

3.1 POSE ESTIMATION

  • Definition: Pose estimation determines the relative pose between a camera 'C' and an object 'A'.

  • The camera's pose consists of its position and orientation in the world frame.

  • Pose estimation involves calculating the camera's relative pose in six degrees of freedom (6DOF).

    • Example: A bike constrained to a track can only move forward or backward along it, giving one degree of freedom (1DOF).

    • Rigid bodies in space have 6DOF: three translations (X, Y, Z) and three rotations (around X, Y, Z).

  • In AR, 6DOF pose relates the camera to real-world objects.

Absolute vs. Relative Pose
  • Absolute pose: Calculated with respect to a fixed real-world position.

  • Relative pose: Calculated between the camera and the marker, regardless of where either sits in the world.

  • Application Dependent: Whether absolute or relative pose is required depends on the specific application context.

    • Example: A city-design application using paper markers needs only relative pose; once the direction of gravity matters, absolute pose becomes essential.

6DOF Pose Calculation
  • Libraries such as ARToolKit can estimate pose from camera input alone, without additional sensors.

  • The transformation between camera and marker is vital for augmented visualization.

    • Black-and-white square marker pose estimation typically requires four corner points to create a transformation matrix.

    • The overall transformation enables virtual objects to align correctly with their respective markers.

3.1.1 POSIT Algorithm
  • Definition: POSIT stands for POS (Pose from Orthography and Scaling) with ITerations; it estimates position and orientation in 3D.

  • The algorithm iterates to find the 6DOF via a minimum of four non-coplanar points.

    • Center: The object's origin (0, 0, 0) serves as the common reference point; the coordinates of each selected point feed into the transformation-matrix calculation.

  • Outputs:

    • 3 x 1 translation vector.

    • 3 x 3 rotation matrix (together forming a transformation matrix).

  • Transformation Application: Multiplying the transformation matrix by real-world object coordinates shifts the model from world to camera coordinates, facilitating augmented views.
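The world-to-camera transformation described above can be sketched in a few lines of NumPy. The rotation, translation, and points below are made-up values for illustration, not outputs of an actual POSIT run:

```python
import numpy as np

# Hypothetical POSIT-style outputs: a 3x3 rotation (90 degrees about Z)
# and a 3x1 translation placing the object 5 units in front of the camera.
R = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 1.]])
t = np.array([[0.], [0.], [5.]])

# Combine into a 3x4 transformation matrix [R | t].
T = np.hstack([R, t])

# World-coordinate points (e.g., marker corners), in homogeneous form.
world_pts = np.array([[1., 0., 0., 1.],
                      [0., 1., 0., 1.]]).T   # shape (4, N)

# Multiplying by T moves the points from world to camera coordinates.
cam_pts = T @ world_pts
print(cam_pts.T)   # each row is a point in the camera frame
```

Once points are in the camera frame, projecting them with the intrinsic matrix yields the pixel positions at which virtual content should be drawn.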

3.1.2 Pose Estimation from Homography
  • Definition: Homography involves mapping points between two planes.

    • A point P in one image maps to the corresponding point P' in another image via a 3 x 3 homography matrix H.

  • The relation is expressed as P' = HP (in homogeneous coordinates, up to scale), and extends to warping entire images.

  • Mapping 3D points to 2D image points involves the intrinsic camera parameters, which account for perspective projection.

  • Homography Application: In marker-based tracking, the four marker corners lie on a common plane, so their image positions define a homography.

    • Given the camera parameters, the homography can be decomposed to recover the extrinsic pose; homographies from multiple views also aid in camera calibration.
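A homography can be estimated from four or more point correspondences with the Direct Linear Transform (DLT). The sketch below uses made-up correspondences (a unit square scaled and shifted) and verifies P' = HP on a new point; libraries such as OpenCV provide equivalent routines (e.g., findHomography):

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src via DLT (4+ point pairs)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the constraint matrix.
        A.append([x, y, 1, 0, 0, 0, -u*x, -u*y, -u])
        A.append([0, 0, 0, x, y, 1, -v*x, -v*y, -v])
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)       # null vector of A, reshaped
    return H / H[2, 2]             # normalize so H[2, 2] = 1

# Four corners of a unit square and their (hypothetical) positions in a second view.
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(2, 1), (4, 1), (4, 3), (2, 3)]   # square scaled by 2 and shifted by (2, 1)

H = dlt_homography(src, dst)

# Check P' = HP (in homogeneous coordinates, up to scale) on a new point.
P = np.array([0.5, 0.5, 1.0])
Pp = H @ P
print(Pp[:2] / Pp[2])   # approximately [3. 2.]
```

The perspective divide at the end is what makes the relation hold only "up to scale": homogeneous coordinates that differ by a nonzero factor describe the same image point.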

3.1.3 Pose Estimation from Vanishing Points
  • Definition: A vanishing point is the image point where the projections of parallel 3D lines appear to converge.

  • Vanishing points can be expressed both as directions in world coordinates and as points in the captured image, which helps establish object orientation during pose estimation.

  • Because a vanishing point's image location depends only on the direction of the corresponding parallel lines relative to the camera, the 3D orientation can be inferred from its measured position.
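Locating a vanishing point reduces to intersecting two image lines, which is compact in homogeneous coordinates: the line through two points is their cross product, and the intersection of two lines is the cross product of the lines. The image points below are invented for illustration:

```python
import numpy as np

# Image points (homogeneous) on the projections of two parallel 3D lines,
# e.g., the two rails of a track receding into the distance (made-up values).
a1, a2 = np.array([0., 0., 1.]), np.array([2., 4., 1.])   # first line
b1, b2 = np.array([6., 0., 1.]), np.array([4., 4., 1.])   # second line

# Line through two points = cross product of the points;
# intersection of two lines = cross product of the lines.
l1 = np.cross(a1, a2)
l2 = np.cross(b1, b2)
vp = np.cross(l1, l2)

print(vp[:2] / vp[2])   # vanishing point in image coordinates: [3. 6.]
```

The direction from the camera center through this image point (via the inverse intrinsic matrix) gives the 3D direction of the parallel lines, which is exactly the orientation cue used in pose estimation.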

3.1.4 Pose Estimation from Epipolar Geometry
  • Concept Overview: Two images taken from different camera positions yield the relative translation and rotation between them through epipolar geometry.

  • The relationship between corresponding points in the two images is described by epipolar lines.

  • The essential matrix, derived from the image projections, constrains where corresponding features can appear and assists in locating them in 3D space.
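The essential matrix can be written as E = [t]x R, where [t]x is the skew-symmetric cross-product matrix of the translation. Every correct correspondence between normalized image points then satisfies the epipolar constraint x2^T E x1 = 0. A numeric check, with an assumed relative motion:

```python
import numpy as np

# Assumed relative motion between two views: small rotation about Y, translation t.
theta = np.deg2rad(10)
R = np.array([[np.cos(theta), 0., np.sin(theta)],
              [0., 1., 0.],
              [-np.sin(theta), 0., np.cos(theta)]])
t = np.array([1.0, 0.0, 0.2])

# Essential matrix E = [t]x R, with [t]x the skew-symmetric matrix of t.
tx = np.array([[0., -t[2], t[1]],
               [t[2], 0., -t[0]],
               [-t[1], t[0], 0.]])
E = tx @ R

# Project one 3D point into both cameras (normalized image coordinates).
X = np.array([0.3, -0.2, 4.0])
x1 = X / X[2]                  # first camera at the origin
Xc2 = R @ X + t                # same point in the second camera's frame
x2 = Xc2 / Xc2[2]

# The epipolar constraint holds for every correct correspondence.
print(x2 @ E @ x1)   # approximately 0
```

In practice E is estimated the other way around, from many such correspondences (e.g., the eight-point algorithm), and then decomposed to recover R and t up to the scale of the translation.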

3.1.5 Pose Estimation from Perspective-n-Point (PnP) Problem
  • Overview: The Perspective-n-Point (PnP) problem calculates pose from 2D-3D point correspondences using the known parameters of a calibrated camera.

  • Methods: Widely adopted solvers include Direct Least Squares (DLS), P3P, and EPnP, which trade off efficiency and accuracy for different spatial configurations.
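The quantity all of these solvers drive to zero is the reprojection error between observed 2D points and the 3D points projected through a candidate pose. The sketch below shows only that objective, not any particular solver; the intrinsics, pose, and points are assumed values:

```python
import numpy as np

# Assumed intrinsics of a calibrated camera.
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])

# Candidate pose: identity rotation, object 2 units in front of the camera.
R = np.eye(3)
t = np.array([[0.], [0.], [2.]])

# Known 3D points and their observed 2D images (here synthesized from the true pose).
pts3d = np.array([[0., 0., 0.],
                  [0.5, 0., 0.],
                  [0., 0.5, 0.],
                  [0.5, 0.5, 0.5]])

def project(pts, K, R, t):
    """Project 3D world points to pixel coordinates with K [R | t]."""
    cam = R @ pts.T + t            # world -> camera frame
    img = K @ cam                  # camera frame -> homogeneous pixels
    return (img[:2] / img[2]).T    # perspective divide

observed = project(pts3d, K, R, t)

# PnP solvers search for the (R, t) that minimizes this reprojection error.
residual = np.linalg.norm(project(pts3d, K, R, t) - observed)
print(residual)   # 0.0, since the candidate pose is the true one
```

With n >= 3 correspondences (plus disambiguation) the pose is determined; real solvers such as EPnP reach the minimum in closed form or with a short iterative refinement.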

3.1.6 Pose Estimation from Visual Odometry and SLAM
  • Visual Odometry: Estimates camera movement relative to a starting pose from a sequence of images, tracking the 6DOF path over time.

    • Monocular versus stereo visual odometry affects setup complexity, scale recovery, and accuracy in real-time environments.

  • Simultaneous Localization and Mapping (SLAM): Challenging because the environment must be mapped at the same time as the camera is localized within that map.
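Visual odometry accumulates the 6DOF path by chaining frame-to-frame transforms: multiplying the running global pose by each new relative transform. A minimal sketch with two invented motion steps:

```python
import numpy as np

def to_T(R, t):
    """Pack a rotation R and translation t into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical frame-to-frame estimates: a 90-degree rotation about Z and
# two unit translations along x (made-up values, not real odometry output).
Rz = np.array([[0., -1., 0.],
               [1., 0., 0.],
               [0., 0., 1.]])
steps = [to_T(np.eye(3), [1., 0., 0.]),
         to_T(Rz, [1., 0., 0.])]

# Chain the relative transforms to track the global pose.
pose = np.eye(4)
for T in steps:
    pose = pose @ T

print(pose[:3, 3])   # accumulated position: [2. 0. 0.]
```

Because each relative estimate carries a small error, the chained product drifts over time; this drift is what SLAM's mapping and loop-closure machinery exists to correct.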

3.1.7 Robust Pose Estimation
  • Utilizes RANSAC to separate inliers from outliers while fitting a model to observed data; powerful for real-world applications plagued by noise.
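RANSAC's loop is the same regardless of the model being fit: sample a minimal set, hypothesize a model, count the points that agree, and keep the hypothesis with the most inliers. A self-contained sketch on synthetic 2D data (line fitting stands in for pose fitting; the data and thresholds are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 80 points near the line y = 2x + 1, plus 20 gross outliers.
x = rng.uniform(0, 10, 80)
inliers = np.column_stack([x, 2 * x + 1 + rng.normal(0, 0.05, 80)])
outliers = rng.uniform(0, 20, (20, 2))
pts = np.vstack([inliers, outliers])

best_model, best_count = None, 0
for _ in range(200):                      # RANSAC iterations
    i, j = rng.choice(len(pts), 2, replace=False)   # minimal sample: 2 points
    (x1, y1), (x2, y2) = pts[i], pts[j]
    if x1 == x2:
        continue
    a = (y2 - y1) / (x2 - x1)             # hypothesized slope
    b = y1 - a * x1                       # hypothesized intercept
    # Count points within a residual threshold of the hypothesis.
    count = np.sum(np.abs(pts[:, 1] - (a * pts[:, 0] + b)) < 0.2)
    if count > best_count:                # keep the hypothesis with most inliers
        best_model, best_count = (a, b), count

a, b = best_model
print(f"fit: y = {a:.2f}x + {b:.2f} with {best_count} inliers")
```

In robust pose estimation the minimal sample is a few 2D-3D correspondences (e.g., P3P inside the loop) and the residual is reprojection error, but the consensus logic is identical.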

3.1.8 Pose Estimation Using a Square Marker
  • A systematic approach detects square markers in each captured frame in real time.

    • Steps involve identifying the marker, estimating its transformation, and fine-tuning the estimated pose to ensure stability in AR environments.
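Once the four corners yield a homography H from the marker plane to the image (Section 3.1.2), the classic planar-pose step used by ARToolKit-style trackers decomposes K^-1 H into the first two rotation columns and the translation. The intrinsics and ground-truth pose below are assumed values used to synthesize H for the check:

```python
import numpy as np

# Assumed intrinsics.
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])

# Ground-truth pose used to synthesize H (marker plane at z = 0, so the
# third rotation column drops out): H ~ K [r1 r2 t].
R = np.eye(3)
t = np.array([0., 0., 4.])
H = K @ np.column_stack([R[:, 0], R[:, 1], t])

# Recover pose: the columns of K^-1 H are r1, r2, t up to a common scale.
M = np.linalg.inv(K) @ H
scale = np.linalg.norm(M[:, 0])          # r1 must be a unit vector
r1, r2, t_est = M[:, 0] / scale, M[:, 1] / scale, M[:, 2] / scale
r3 = np.cross(r1, r2)                    # complete the rotation matrix
R_est = np.column_stack([r1, r2, r3])

print(np.round(t_est, 3))   # recovered translation: [0. 0. 4.]
```

With noisy corners, r1 and r2 come out only approximately orthonormal, which is why real trackers follow this step with the pose refinement mentioned above.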

3.2 POSE TRACKING IN AR

  • Tracking measures camera motion relative to real-world markers to update overlaid virtual content dynamically.

    • Virtual elements must move in step with the real objects to which they are registered.

3.4 MOBILE SENSOR-BASED TRACKING

  • Tracking through various mobile sensor technologies like GPS, compass, accelerometers, and gyroscopes.

  • Application Examples: These sensors supplement camera-based tracking with position and orientation data, enabling AR interactions outdoors and over larger areas.

Conclusion

  • Understanding how pose estimation and tracking are accomplished forms the backbone of effective AR applications. Tracking technologies, whether grounded in markers or natural features, must be applied with nuance to achieve practical effectiveness in dynamic environments.