2. Processing Real-Time Data
Real-Time Stream Processing
- Real-time stream processing involves analyzing data as it is generated, enabling immediate actions and insights.
- Example: An online gaming company collects player interaction data in real-time and uses it to enhance in-game experiences and increase player engagement.
Streaming Pipeline Workflow
- Producers:
- Data sources (e.g., game interactions) that generate and push data onto a stream.
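- Stream:
- A buffer that temporarily holds the incoming records so that consumer applications can read them.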
- Consumer Applications:
- Applications that retrieve data from the stream, process it, and deliver results to a destination (e.g., the game).
- Architecture depends on:
- Data acquisition methods.
- Processing requirements.
- Destination of results.
- Required speed of results.
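The producer, stream, and consumer roles above can be sketched in a few lines. This is a minimal in-memory illustration, not an AWS API: the `stream` deque, the `produce` and `consume` functions, and the score threshold of 100 are all hypothetical names and values chosen for the example.

```python
from collections import deque

# Hypothetical in-memory stand-in for a stream; a managed service would
# store the records durably instead of in a local deque.
stream = deque()

def produce(events):
    """Producer: push each generated event onto the stream."""
    for event in events:
        stream.append(event)

def consume(destination):
    """Consumer: read records in arrival order, process them, deliver results."""
    while stream:
        record = stream.popleft()
        # Example processing step (assumed rule): reward scores of 100 or more.
        result = {"player": record["player"], "reward": record["score"] >= 100}
        destination.append(result)

game_events = [
    {"player": "ana", "score": 120},
    {"player": "ben", "score": 40},
]
delivered = []  # stands in for the destination, e.g. the game
produce(game_events)
consume(delivered)
print(delivered)
```

In a real pipeline the producer and consumer would run as separate processes, and the choice of stream service would follow from the four factors listed above.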
Amazon Kinesis Family of Services
- Includes:
- Amazon Kinesis Data Firehose
- Amazon Kinesis Data Streams
- Amazon Kinesis Video Streams
- Additional services:
- Amazon Managed Service for Apache Flink
- Amazon Managed Streaming for Apache Kafka (Amazon MSK)
Amazon Kinesis Data Firehose (renamed Amazon Data Firehose in 2024)
- Captures, transforms, and delivers streaming data to AWS storage destinations (e.g., S3, Redshift).
- Minimal developer overhead.
- Offers near real-time delivery: records are buffered briefly and written to the destination in batches.
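Delivery is "near" real time because Firehose batches incoming records and flushes a batch once a size or time threshold is reached. The sketch below imitates that behavior in plain Python. It is not the Firehose API: the `BufferedDelivery` class, its parameters, and the use of a record count in place of a byte size are assumptions made for illustration.

```python
import time

class BufferedDelivery:
    """Hypothetical stand-in for a delivery stream that batches records."""

    def __init__(self, deliver, max_records=3, max_age_seconds=60.0,
                 clock=time.monotonic):
        self.deliver = deliver              # callback that receives each batch
        self.max_records = max_records      # assumed stand-in for a size limit
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self.buffer = []
        self.first_record_at = None

    def put(self, record):
        """Buffer one record and flush if a threshold has been reached."""
        if not self.buffer:
            self.first_record_at = self.clock()
        self.buffer.append(record)
        self.flush_if_due()

    def flush_if_due(self):
        """Deliver the buffered batch when it is full or old enough."""
        if not self.buffer:
            return
        too_many = len(self.buffer) >= self.max_records
        too_old = self.clock() - self.first_record_at >= self.max_age_seconds
        if too_many or too_old:
            self.deliver(list(self.buffer))
            self.buffer.clear()
            self.first_record_at = None

batches = []
firehose_like = BufferedDelivery(batches.append, max_records=3)
for i in range(7):
    firehose_like.put({"event_id": i})
print(len(batches), len(firehose_like.buffer))
```

With seven records and a batch limit of three, two full batches are delivered and one record stays buffered until the age threshold triggers a flush. Larger thresholds mean fewer, bigger writes to the destination at the cost of more delay.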
Amazon Kinesis Data Streams
- Provides tools for building custom producer and consumer applications.
- Supports multiple consumers processing data from the same stream concurrently.
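Multiple consumers can share one stream because reading a record does not remove it; each consumer keeps its own position. The sketch below shows the idea with a hypothetical in-memory log. `InMemoryStream`, `Consumer`, and the two handlers are illustrative names, not part of any Kinesis SDK, and the consumers are polled one after another here rather than truly concurrently.

```python
class InMemoryStream:
    """Hypothetical append-only log; records are kept after being read."""

    def __init__(self):
        self.records = []

    def put(self, record):
        self.records.append(record)
        return len(self.records) - 1  # position of the new record

class Consumer:
    """Reads from a shared stream while tracking its own read position."""

    def __init__(self, name, stream, handler):
        self.name = name
        self.stream = stream
        self.handler = handler
        self.position = 0  # independent of every other consumer

    def poll(self):
        """Process all records added since this consumer last polled."""
        new_records = self.stream.records[self.position:]
        self.position = len(self.stream.records)
        return [self.handler(r) for r in new_records]

stream = InMemoryStream()
for score in (10, 25, 5):
    stream.put({"score": score})

# Two consumers with different purposes read the same three records.
leaderboard = Consumer("leaderboard", stream, lambda r: r["score"])
fraud_check = Consumer("fraud_check", stream, lambda r: r["score"] > 20)

leaderboard_results = leaderboard.poll()
fraud_results = fraud_check.poll()
print(leaderboard_results)
print(fraud_results)
```

Because each consumer owns its position, adding a third consumer later does not affect the first two.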
Amazon Managed Service for Apache Flink
- Apache Flink is a framework for stream processing.
- Enables real-time analytics on streaming data without separate consumer applications.
- Processing occurs as data passes through the stream.
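A common form of processing on data as it passes through a stream is a windowed aggregation. The sketch below counts events per fixed, non-overlapping (tumbling) time window in plain Python. It does not use the Apache Flink API; the function name, the `(timestamp, key)` event shape, and the sample events are assumptions for illustration.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, key) for fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

# Hypothetical (timestamp in seconds, action) pairs from a game.
events = [(0, "jump"), (3, "jump"), (7, "shoot"), (12, "jump"), (14, "jump")]
window_counts = tumbling_window_counts(events, window_seconds=10)
print(window_counts)
```

A stream-processing framework would emit each window's result as the window closes instead of collecting everything first, but the grouping logic is the same.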
Amazon Kinesis Video Streams
- Designed for capturing and processing large volumes of live video.
- Suitable for playback, machine learning, and analytics applications.
Amazon Managed Streaming for Apache Kafka (Amazon MSK)
- Based on Apache Kafka, an open-source distributed event streaming platform.
- Requires more infrastructure management than Kinesis Data Streams.
- Suitable for teams already using Apache Kafka.
Considerations When Choosing a Service
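- Stream-processing complexity varies by use case, from simple delivery into storage to custom applications with several consumers, so no single service fits every case.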
- Goal: Understand the distinguishing features of each service and their applicability to specific use cases.
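As a study aid, the distinguishing features in these notes can be written as a small rule of thumb. The function below is a hypothetical helper, not official AWS guidance: its name, its flags, and the order in which the rules are checked are assumptions, and real decisions usually weigh cost, latency, and team skills as well.

```python
def suggest_service(*, live_video=False, already_uses_kafka=False,
                    needs_inline_analytics=False, needs_custom_consumers=False):
    """Map a few yes/no requirements to the service these notes associate with them."""
    if live_video:
        return "Amazon Kinesis Video Streams"
    if already_uses_kafka:
        return "Amazon MSK"
    if needs_inline_analytics:
        return "Amazon Managed Service for Apache Flink"
    if needs_custom_consumers:
        return "Amazon Kinesis Data Streams"
    # Default (assumed): simple capture and delivery with minimal developer overhead.
    return "Amazon Kinesis Data Firehose"

print(suggest_service(needs_custom_consumers=True))
print(suggest_service())
```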