Detailed Notes on a Real-Time Hand Gesture Recognition Project
Overview of Real-Time Hand Gesture Recognition
- Team Introduction: Aditi, Gloria, Kunal
- Focus: Exploring, experimenting, and debugging for a final project on real-time hand gesture recognition.
- Scope of the Presentation:
  - Review of the original research.
  - Discussion of the dataset utilized.
  - Improvements made to the model.
  - Results achieved.
  - Real-world applications and ethical considerations in AI.
Importance of Hand Gesture Recognition
- Significance: A vital tool for enhancing human-computer interaction in various applications:
  - Virtual reality navigation
  - Surgical robot control
  - Sign language interpretation
- Challenges in Accuracy:
  - Dynamic nature of gestures: gestures vary among individuals (e.g., shape, speed, angle).
  - Real-time performance requirement: latency can disrupt the user experience or hinder assistive technologies.
  - Environmental variability: lighting, background movement, and occlusions all affect model performance.
The Central Challenge
- Question Addressed: How can we improve accuracy and generalizability of hand gesture recognition models using accessible resources while ensuring efficiency for real-time use?
- Reference Research: Influential NVIDIA paper (02/2016): "Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks."
- Modeling Framework: A two-stage deep learning framework:
  - Gesture detector: a shallow 3D CNN that detects gestures in the video stream.
  - Gesture classifier: a deep ResNeXt-101 network that classifies the detected gesture.
- Key Feature: Early classification lets the system label a gesture before it completes.
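The detect-then-classify idea can be sketched in a few lines. This is only an illustration of the control flow, not the paper's networks: `detector`, `classifier`, and the window length are stand-ins chosen for the example.

```python
# Sketch of the two-stage pipeline: a cheap detector watches a sliding
# window of frames and only invokes the heavier classifier when a gesture
# is likely present. Both models here are illustrative stubs.
from collections import deque

WINDOW = 8  # frames per sliding window (illustrative choice)

def detector(window):
    # Stand-in for the shallow 3D CNN: flag a gesture when mean
    # per-pixel activity in the window exceeds a threshold.
    activity = sum(sum(frame) / len(frame) for frame in window) / len(window)
    return activity > 0.5

def classifier(window):
    # Stand-in for the deep ResNeXt-101 classifier.
    return "swipe-left"

def stream_labels(frames):
    window, labels = deque(maxlen=WINDOW), []
    for frame in frames:
        window.append(frame)
        if len(window) == WINDOW and detector(window):
            # Early classification: the classifier fires on the window
            # seen so far, before the full gesture has finished.
            labels.append(classifier(window))
    return labels

# Synthetic "video": low-activity frames followed by high-activity ones.
quiet = [[0.1] * 4] * 8
active = [[0.9] * 4] * 8
print(stream_labels(quiet + active))  # prints four 'swipe-left' labels
```

The point of the split is efficiency: the expensive classifier runs only on the windows the cheap detector flags, which is what makes real-time operation feasible.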
Dataset Utilized and Results
- Datasets: The EgoGesture dataset and the NVIDIA Dynamic Hand Gesture dataset, together comprising thousands of samples.
- Performance Metrics: The model achieved over 90% classification accuracy with inference times low enough for real-time use.
- Limitations Noted: The approach requires extensive computational resources, and the full source code is unavailable.
Implementation Using the Keras ImageDataGenerator
- Baseline Model: Initial setup using the IPN Hand dataset, which contains over 4,000 labeled gestures (video segments paired with annotations).
- Early Model Limitations: No gesture detection stage or data augmentation; struggled with overfitting and lacked temporal modeling.
- Data Augmentation Strategy: Used the Keras ImageDataGenerator to apply transformations such as:
  - Random rotations.
  - Brightness variation.
  - Horizontal flips.
  - Width and height shifts.
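The transformations above map directly onto ImageDataGenerator parameters. A minimal sketch follows; the specific ranges (15 degrees, 10% shifts, 0.8–1.2 brightness) are illustrative assumptions, not the project's tuned values, and the random frames stand in for extracted gesture frames.

```python
# Hedged sketch: the four augmentations listed above, expressed via the
# (legacy) Keras ImageDataGenerator API. Parameter values are illustrative.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=15,            # random rotations up to +/- 15 degrees
    brightness_range=(0.8, 1.2),  # random brightness scaling
    horizontal_flip=True,         # mirror frames left-right
    width_shift_range=0.1,        # horizontal shifts up to 10% of width
    height_shift_range=0.1,       # vertical shifts up to 10% of height
)

# Dummy RGB frames stand in for frames extracted from gesture clips.
frames = np.random.rand(8, 64, 64, 3).astype("float32")
labels = np.zeros(8)

# flow() yields augmented batches; labels pass through unchanged.
batch_x, batch_y = next(datagen.flow(frames, labels, batch_size=8))
print(batch_x.shape)  # (8, 64, 64, 3)
```

Because the transformations are applied on the fly per batch, the model effectively never sees the exact same frame twice, which is what drives the regularization effect described next.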
Model Improvement via Data Augmentation
- Training Accuracy Results:
  - Baseline training accuracy: 49%.
  - After augmentation: 53.4%, with validation accuracy rising from 50% to 57.7%.
- Stability Observed: Improved stability in training loss curves indicated better learning without mere memorization of training data.
- Goal Achieved: Addressed overfitting and improved generalization without altering model architecture.
Strengths and Limitations of Data Augmentation
- Strengths:
  - Easy to implement; no structural changes needed.
  - Enhanced model generalization and reduced overfitting, especially in early training stages.
- Limitations:
  - Cannot substitute for the temporal modeling needed to capture motion.
  - Accuracy plateaued with excessive augmentation.
Ethical Considerations in Gesture Recognition
- Privacy Concerns: Use of always-on cameras raises surveillance issues.
- Bias Risk: Inadequate diversity in datasets may lead to unequal performance among different users.
- Potential Misuse: Spoofing gestures to trigger unintended actions poses risks.
- Environmental Effects: Continuous real-time processing increases energy use, raising carbon emissions and risking device overheating.
Recommendations for Ethical Implementation
- Transparency: Inform users clearly about when and how their gestures are tracked.
- Diverse Datasets: Train models on varied data to minimize bias.
- Energy Efficiency: Utilize edge devices for processing.
Project Takeaways
- Technical Insights:
  - Smooth setup of the CNN and data loader in Google Colab.
  - A simple data augmentation implementation yielded a significant improvement in results.
- Challenges Faced:
  - Matching raw data to labels and formatting annotations required debugging.
- Future Directions:
  - Explore full video clips and more advanced architectures (e.g., 3D CNNs, LSTMs).
  - Work toward real-time systems that run on low-cost devices.
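The "smooth setup of the CNN" mentioned above can be illustrated with a minimal Keras model of the kind a Colab baseline typically uses. This is a sketch, not the project's exact architecture: the input size, layer widths, and the 13-class output are assumptions chosen for the example.

```python
# Hedged sketch of a small 2D CNN baseline for per-frame gesture
# classification. Layer sizes and class count are illustrative.
from tensorflow.keras import layers, models

NUM_CLASSES = 13  # illustrative; adjust to the label set actually used

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),        # one RGB frame at a time
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)  # (None, 13)
```

A 2D model like this classifies frames independently, which is exactly the limitation noted earlier: swapping the `Conv2D` layers for `Conv3D` (or adding an LSTM over per-frame features) is the natural route to the temporal modeling listed under future directions.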
Conclusion
- Project Results:
  - A 2D CNN enhanced with data augmentation outperformed the baseline model on the IPN Hand dataset.
  - Training was more stable and overfitting was reduced while the approach stayed lightweight.
- Future Goals:
  - Assess advanced architectures and optimization techniques for improved gesture recognition solutions.