Real-time stream processing involves analyzing data as it is generated, enabling immediate actions and insights.
Example: An online gaming company collects player interaction data in real time and uses it to enhance in-game experiences and increase player engagement.
Streaming Pipeline Workflow
Producers:
Data sources (e.g., game interactions) that generate and push data onto a stream.
Stream:
A conduit that carries data from producers to consumers, decoupling the two.
Consumer Applications:
Applications that retrieve data from the stream, process it, and deliver results to a destination (e.g., the game).
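To make the producer, stream, and consumer roles concrete, here is a minimal in-memory sketch in Python. The Stream class, the producer and consumer functions, and the event fields (player, action) are hypothetical illustrations, not a Kinesis or Kafka API. A real stream would be a managed, durable service.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Stream:
    """Hypothetical in-memory stream: an append-only, ordered log of records."""
    records: List[dict] = field(default_factory=list)

    def put(self, record: dict) -> int:
        # Append the record and return its position (offset) in the log.
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, position: int) -> List[dict]:
        # Return every record at or after the given position.
        return self.records[position:]


def producer(stream: Stream, events: List[dict]) -> None:
    # A producer pushes each game interaction onto the stream as it happens.
    for event in events:
        stream.put(event)


def consumer(stream: Stream, deliver: Callable[[dict], None]) -> None:
    # A consumer reads records, processes them, and delivers results to a destination.
    for record in stream.read_from(0):
        if record["action"] == "level_complete":
            deliver({"player": record["player"], "reward": "bonus_coins"})


stream = Stream()
producer(stream, [
    {"player": "p1", "action": "jump"},
    {"player": "p2", "action": "level_complete"},
])
results: List[dict] = []
consumer(stream, results.append)  # here the "destination" is just a list
print(results)  # [{'player': 'p2', 'reward': 'bonus_coins'}]
```

The producer and consumer never call each other directly; they share only the stream, which is what lets either side be replaced or scaled independently.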
The pipeline architecture depends on:
Data acquisition methods.
Processing requirements.
Destination of results.
Required speed of results.
Amazon Kinesis Family of Services
Includes:
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
Additional Services:
Amazon Managed Service for Apache Flink
Amazon Managed Streaming for Apache Kafka (Amazon MSK)
Amazon Kinesis Data Firehose
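Captures, transforms, and delivers streaming data to destinations such as Amazon S3 and Amazon Redshift.
Fully managed, so it requires minimal developer overhead: there are no consumer applications to write.
Offers near real-time (rather than real-time) delivery, because it buffers incoming records and delivers them in batches.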
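A rough model of why Firehose delivery is near real-time: records are held in a buffer and delivered as one batch once a size or time threshold is reached. The DeliveryBuffer class and its max_bytes and max_seconds parameters are hypothetical. They mirror the idea of Firehose buffering but are not the service's API or configuration names.

```python
from typing import Callable, List, Optional


class DeliveryBuffer:
    """Hypothetical model of size/time-based buffering before delivery.

    max_bytes and max_seconds are illustrative thresholds, not Firehose API parameters.
    """

    def __init__(self, max_bytes: int, max_seconds: float,
                 deliver: Callable[[List[bytes]], None]) -> None:
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.deliver = deliver
        self.batch: List[bytes] = []
        self.size = 0
        self.first_arrival: Optional[float] = None

    def add(self, record: bytes, now: float) -> None:
        # Start the timer when the first record of a new batch arrives.
        if self.first_arrival is None:
            self.first_arrival = now
        self.batch.append(record)
        self.size += len(record)
        # Flush when either the size threshold or the time threshold is reached.
        # This sketch checks time only when a record arrives; a real service
        # would also flush on a background timer.
        if self.size >= self.max_bytes or now - self.first_arrival >= self.max_seconds:
            self.flush()

    def flush(self) -> None:
        # Hand the whole batch to the destination, then start an empty batch.
        if self.batch:
            self.deliver(self.batch)
        self.batch, self.size, self.first_arrival = [], 0, None


delivered: List[List[bytes]] = []
buf = DeliveryBuffer(max_bytes=10, max_seconds=60, deliver=delivered.append)
buf.add(b"abcd", now=0)   # 4 bytes buffered
buf.add(b"efgh", now=1)   # 8 bytes buffered
buf.add(b"ijkl", now=2)   # 12 bytes >= 10: deliver a batch of 3 records
buf.add(b"mn", now=3)     # a new batch starts at t=3
buf.add(b"op", now=70)    # 67 s since the batch started >= 60: deliver
print(len(delivered))  # 2
```

No record is delivered the instant it arrives; latency is bounded by the thresholds, which is the trade-off behind "near real-time".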
Amazon Kinesis Data Streams
Provides tools (e.g., the Kinesis Producer Library and Kinesis Client Library) for building custom producer and consumer applications.
Supports multiple consumers processing data from the same stream concurrently.
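A simplified sketch of two Kinesis Data Streams ideas: records are routed to shards by an MD5 hash of their partition key, and each consumer tracks its own read position, so several consumers can read the same records without interfering. ShardedStream, Consumer, and the even split of the hash space across shards are assumptions for illustration, not the Kinesis SDK.

```python
import hashlib
from typing import Dict, List, Tuple

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


class ShardedStream:
    """Hypothetical sketch: records are routed to shards by an MD5 hash of the partition key."""

    def __init__(self, shard_count: int) -> None:
        self.shards: List[List[Tuple[str, str]]] = [[] for _ in range(shard_count)]

    def shard_for(self, partition_key: str) -> int:
        # Map the 128-bit hash onto equal-sized hash key ranges, one per shard.
        hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
        return hash_value * len(self.shards) // HASH_SPACE

    def put_record(self, partition_key: str, data: str) -> Tuple[int, int]:
        shard_id = self.shard_for(partition_key)
        self.shards[shard_id].append((partition_key, data))
        # Return (shard, position); the position stands in for a sequence number.
        return shard_id, len(self.shards[shard_id]) - 1


class Consumer:
    """Each consumer keeps its own read position per shard, so consumers never interfere."""

    def __init__(self, name: str, stream: ShardedStream) -> None:
        self.name = name
        self.stream = stream
        self.positions: Dict[int, int] = {i: 0 for i in range(len(stream.shards))}

    def poll(self) -> List[str]:
        # Read everything new in each shard, then advance this consumer's position only.
        out: List[str] = []
        for shard_id, records in enumerate(self.stream.shards):
            start = self.positions[shard_id]
            out.extend(data for _, data in records[start:])
            self.positions[shard_id] = len(records)
        return out


stream = ShardedStream(shard_count=2)
for player, action in [("p1", "jump"), ("p2", "shoot"), ("p1", "duck")]:
    stream.put_record(partition_key=player, data=f"{player}:{action}")

analytics = Consumer("analytics", stream)
rewards = Consumer("rewards", stream)
analytics_records = analytics.poll()
rewards_records = rewards.poll()
# Both consumers see all three records; reading does not remove them for the other.
print(sorted(analytics_records) == sorted(rewards_records))  # True
```

Because both of p1's records hash to the same shard, they keep their relative order for every consumer.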
Amazon Managed Service for Apache Flink
Apache Flink is a framework for stream processing.
Enables real-time analytics on streaming data without building and hosting separate consumer applications.
Processing occurs as data passes through the stream.
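Processing data as it passes through the stream can be illustrated with a tumbling (fixed, non-overlapping) window count. This plain-Python generator is a conceptual stand-in, not the Flink API. It assumes events arrive in timestamp order, whereas Flink itself handles out-of-order events using watermarks.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Count events per player in fixed, non-overlapping windows.

    events: (timestamp_seconds, player_id) pairs, assumed to arrive in timestamp order.
    Yields (window_start, counts) each time a window closes.
    """
    current_start: Optional[int] = None
    counts: Counter = Counter()
    for timestamp, player in events:
        window_start = timestamp - timestamp % window_seconds
        if current_start is not None and window_start != current_start:
            # A later event closes the previous window: emit its result immediately.
            yield current_start, dict(counts)
            counts = Counter()
        current_start = window_start
        counts[player] += 1
    if current_start is not None:
        # Emit the final window when the input ends.
        yield current_start, dict(counts)


events = [(0, "p1"), (3, "p2"), (7, "p1"), (12, "p1"), (25, "p2")]
results = list(tumbling_window_counts(events, window_seconds=10))
print(results)
# [(0, {'p1': 2, 'p2': 1}), (10, {'p1': 1}), (20, {'p2': 1})]
```

Each window's result is produced as soon as the stream moves past it, rather than after all data has been collected.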
Amazon Kinesis Video Streams
Designed for capturing and processing large volumes of live video.
Suitable for playback, machine learning, and analytics applications.
Amazon Managed Streaming for Apache Kafka (Amazon MSK)
Based on Apache Kafka, an open-source distributed event streaming platform.
Requires more infrastructure management than Kinesis Data Streams.
Suitable for teams already using Apache Kafka.
Considerations When Choosing a Service
The complexity of stream processing varies by use case.
Goal: Understand the distinguishing features of each service and their applicability to specific use cases.