Detailed Notes on Real-Time Hand Gesture Recognition Project

Overview of Real-Time Hand Gesture Recognition

  • Team Introduction: Aditi, Gloria, Kunal
  • Focus: Exploring, experimenting, debugging for a final project on real-time hand gesture recognition.
  • Scope of the Presentation:
    • Review original research.
    • Discuss dataset utilized.
    • Improvements made to the model.
    • Results achieved.
    • Real-world applications and ethical considerations in AI.

Importance of Hand Gesture Recognition

  • Significance: Vital tool for enhancing human-computer interaction in various applications:
    • Virtual reality navigation
    • Surgical robot control
    • Sign language interpretation
  • Challenges in Accuracy:
    • Dynamic Nature of Gestures: Variance in gestures among individuals (e.g., shape, speed, angle).
    • Real-Time Performance Requirement: Latency can disrupt the user experience or hinder assistive technologies.
    • Environmental Variability: Impact of lighting, background movement, and occlusions on model performance.

The Central Challenge

  • Question Addressed: How can we improve accuracy and generalizability of hand gesture recognition models using accessible resources while ensuring efficiency for real-time use?
  • Reference Research: Influential NVIDIA paper (2016): "Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D CNNs."
  • Modeling Framework: Two-stage deep learning framework:
    • Gesture Detector: Shallow 3D CNN for gesture detection in video streams.
    • Gesture Classifier: Deep ResNeXt-101 network for gesture classification.
    • Key Feature: Early classification allows gesture detection before completion.
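The early-classification idea above can be sketched as a sliding window over the frame stream, where a cheap detector gates a heavier classifier. This is only an illustration: detector_score and classifier_probs below are hypothetical placeholders standing in for the paper's shallow 3D CNN and ResNeXt-101 networks, and the "frames" are reduced to scalar summaries for brevity.

```python
import numpy as np

def detector_score(clip):
    """Hypothetical stand-in for the shallow 3D CNN detector:
    returns a gesture/no-gesture score for a short clip."""
    return float(clip.mean() > 0.5)  # placeholder logic

def classifier_probs(clip, num_classes=5):
    """Hypothetical stand-in for the deep classifier:
    returns class probabilities for a clip."""
    idx = int(clip.mean() * (num_classes - 1)) % num_classes
    p = np.full(num_classes, 0.05)
    p[idx] = 1.0
    return p / p.sum()  # placeholder logic

def early_classify(stream, window=8, det_thresh=0.5, cls_thresh=0.6):
    """Slide over the frame stream; run the heavy classifier only while
    the cheap detector fires, and emit a label as soon as the classifier
    is confident -- i.e. possibly before the gesture finishes."""
    for t in range(window, len(stream) + 1):
        clip = stream[t - window:t]
        if detector_score(clip) < det_thresh:
            continue
        probs = classifier_probs(clip)
        if probs.max() >= cls_thresh:
            return t, int(probs.argmax())  # frame index, predicted class
    return None  # no confident gesture found
```

The key property shown is that the classifier can commit at frame t, before the full gesture window has played out, which is what makes the two-stage design suitable for real-time use.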

Dataset Utilized and Results

  • Datasets: EgoGesture dataset and NVIDIA Dynamic Hand Gesture dataset, each with thousands of samples.
  • Performance Metrics: Model achieved over 90% classification accuracy and reduced inference times.
  • Limitations Noted: Requires extensive computational resources, and the full source code is not publicly available.

Implementation Using Keras ImageDataGenerator

  • Baseline Model: Initial setup using the IPN Hand dataset, with over 4,000 labeled gestures (video segments paired with annotations).
  • Early Model Limitations: No gesture detection stage or data augmentation; the model struggled with overfitting and lacked temporal modeling.
  • Data Augmentation Strategy: Used the Keras ImageDataGenerator class to apply transformations such as:
    • Random rotations.
    • Brightness variation.
    • Horizontal flips.
    • Width and height shifts.
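The transformations listed above can be configured in ImageDataGenerator roughly as follows; this is a minimal sketch, and the ranges, image sizes, and class count are illustrative choices, not the project's exact settings:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation matching the transformations listed above;
# the ranges here are illustrative, not the project's exact values.
datagen = ImageDataGenerator(
    rotation_range=15,            # random rotations (degrees)
    brightness_range=(0.8, 1.2),  # brightness variation
    horizontal_flip=True,         # horizontal flips
    width_shift_range=0.1,        # width shifts (fraction of width)
    height_shift_range=0.1,       # height shifts (fraction of height)
    rescale=1.0 / 255,            # normalize pixel values
)

# Example: stream augmented batches from an in-memory array of frames.
frames = np.random.randint(0, 256, size=(32, 64, 64, 3)).astype("float32")
labels = np.random.randint(0, 14, size=(32,))  # illustrative class count
batches = datagen.flow(frames, labels, batch_size=8)
x_batch, y_batch = next(batches)  # one augmented batch per iteration
```

Because the transformations are applied on the fly, every epoch sees a slightly different version of each frame, which is what drives the regularization effect described in the next section.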

Model Improvement via Data Augmentation

  • Training Accuracy Results:
    • Baseline accuracy: 49%
    • After augmentation: 53.4% training accuracy, with validation accuracy rising from 50% to 57.7%.
  • Stability Observed: Smoother training loss curves indicated the model was learning rather than merely memorizing the training data.
  • Goal Achieved: Addressed overfitting and improved generalization without altering model architecture.

Strengths and Limitations of Data Augmentation

  • Strengths:
    • Easy implementation, no structural changes needed.
    • Enhanced model generalization and reduced overfitting, especially in early training stages.
  • Limitations:
    • Cannot substitute for complex temporal modeling necessary for capturing motion.
    • Accuracy plateaued with excessive augmentation.

Ethical Considerations in Gesture Recognition

  • Privacy Concerns: Use of always-on cameras raises surveillance issues.
  • Bias Risk: Inadequate diversity in datasets may lead to unequal performance among different users.
  • Potential Misuse: Spoofing gestures to trigger unintended actions poses risks.
  • Environmental Effects: Continuous real-time processing increases energy consumption, and with it carbon emissions and device heat.

Recommendations for Ethical Implementation

  • Transparency: Inform users about when and how their gestures are tracked.
  • Diverse Datasets: Train models on varied data to minimize bias.
  • Energy Efficiency: Utilize edge devices for processing.

Project Takeaways

  • Technical Insights:
    • Smooth setup of CNN and data loader in Google Colab.
    • A simple data augmentation implementation yielded a meaningful improvement in results.
  • Challenges Faced:
    • Raw data matching and label formatting required debugging.
  • Future Directions:
    • Explore full video clips and more advanced architectures (e.g., 3D CNNs, LSTMs).
    • Work towards real-time systems operational on low-cost devices.
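The 3D-CNN direction mentioned above could look roughly like the sketch below: convolving over (time, height, width) lets the model capture motion that frame-wise 2D convolutions cannot. The layer sizes, clip length, and class count are illustrative assumptions, not a tested design:

```python
import numpy as np
from tensorflow.keras import layers, models

NUM_CLASSES = 14  # illustrative; match the dataset's gesture classes

# A small 3D CNN over 16-frame RGB clips: kernels span the time axis,
# so the network can respond to motion patterns, not just static poses.
model = models.Sequential([
    layers.Input(shape=(16, 64, 64, 3)),  # (frames, height, width, channels)
    layers.Conv3D(16, kernel_size=3, activation="relu", padding="same"),
    layers.MaxPooling3D(pool_size=2),
    layers.Conv3D(32, kernel_size=3, activation="relu", padding="same"),
    layers.MaxPooling3D(pool_size=2),
    layers.GlobalAveragePooling3D(),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Forward pass on a dummy batch of clips to confirm the output shape.
dummy = np.random.rand(2, 16, 64, 64, 3).astype("float32")
probs = model.predict(dummy, verbose=0)
```

A model like this is heavier than the 2D baseline, which is why the low-cost-device goal above would likely require pruning, quantization, or a smaller backbone.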

Conclusion

  • Project Results:
    • A 2D CNN enhanced with data augmentation outperformed the baseline model on the IPN Hand dataset.
    • Observed stability in training and reduced overfitting while utilizing lightweight approaches.
  • Future Goals:
    • Assess capabilities of advanced architectures and optimization techniques for improved gesture recognition solutions.