2. Processing Real-Time Data

Real-Time Stream Processing

  • Real-time stream processing involves analyzing data as it is generated, enabling immediate actions and insights.
  • Example: An online gaming company collects player interaction data in real-time and uses it to enhance in-game experiences and increase player engagement.

Streaming Pipeline Workflow

  • Producers:
    • Data sources (e.g., game interactions) that generate and push data onto a stream.
  • Stream:
    • The conduit that carries data from producers to consumer applications.
  • Consumer Applications:
    • Applications that retrieve data from the stream, process it, and deliver results to a destination (e.g., the game).
  • Architecture depends on:
    • Data acquisition methods.
    • Processing requirements.
    • Destination of results.
    • Required speed of results.
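
The producer, stream, and consumer roles above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `deque` is a hypothetical stand-in for a real stream, and the names `produce`, `consume`, and `action_counts` are invented for this example rather than taken from any AWS API.

```python
from collections import deque
from typing import Deque, Dict

# Hypothetical in-memory stand-in for a stream; a real pipeline would use
# a managed streaming service instead of a local deque.
stream: Deque[Dict[str, str]] = deque()


def produce(player_id: str, action: str) -> None:
    """Producer: push one game-interaction event onto the stream."""
    stream.append({"player_id": player_id, "action": action})


def consume(destination: Dict[str, int]) -> int:
    """Consumer: drain the stream and count events per player in `destination`."""
    processed = 0
    while stream:
        event = stream.popleft()
        player = event["player_id"]
        destination[player] = destination.get(player, 0) + 1
        processed += 1
    return processed


# Usage: three events flow from the producer, through the stream, to the consumer.
produce("p1", "jump")
produce("p2", "shoot")
produce("p1", "run")
action_counts: Dict[str, int] = {}
handled = consume(action_counts)
```

Here `action_counts` plays the role of the destination. In a real architecture, the acquisition method, processing logic, destination, and latency target would each shape how these three pieces are built.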

Amazon Kinesis Family of Services

  • Includes:
    • Amazon Kinesis Data Firehose
    • Amazon Kinesis Data Streams
    • Amazon Kinesis Video Streams
  • Additional services:
    • Amazon Managed Service for Apache Flink
    • Amazon Managed Streaming for Apache Kafka (Amazon MSK)

Amazon Kinesis Data Firehose

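  • Captures, transforms, and delivers streaming data to destinations such as Amazon S3 and Amazon Redshift.
  • Fully managed, so it requires minimal developer overhead.
  • Offers near real-time (rather than real-time) delivery, because records are buffered briefly before being written to the destination.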
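
One way to picture why delivery is near real-time is a buffer that flushes when it either fills up or gets too old. The sketch below is a toy model, not the Firehose API: the class `BufferedDelivery`, its parameters, and the use of a record count in place of a byte-size limit are all assumptions made for illustration. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
from typing import Callable, List, Optional


class BufferedDelivery:
    """Toy buffer: flush when a record-count or age limit is reached."""

    def __init__(
        self,
        deliver: Callable[[List[str]], None],
        max_records: int = 3,          # assumed stand-in for a size limit
        max_age_seconds: float = 60.0,  # assumed buffering interval
    ) -> None:
        self.deliver = deliver
        self.max_records = max_records
        self.max_age_seconds = max_age_seconds
        self.buffer: List[str] = []
        self.oldest_ts: Optional[float] = None

    def put(self, record: str, now: float) -> None:
        """Add a record; flush if the buffer is full or its oldest record is too old."""
        if not self.buffer:
            self.oldest_ts = now
        self.buffer.append(record)
        too_many = len(self.buffer) >= self.max_records
        too_old = now - self.oldest_ts >= self.max_age_seconds
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Deliver the buffered records as one batch and reset the buffer."""
        if self.buffer:
            self.deliver(list(self.buffer))
            self.buffer.clear()
            self.oldest_ts = None


# Usage: the first batch flushes on count, the second on age.
batches: List[List[str]] = []
firehose_like = BufferedDelivery(batches.append, max_records=3, max_age_seconds=60.0)
for record, ts in [("a", 0.0), ("b", 1.0), ("c", 2.0), ("d", 10.0), ("e", 75.0)]:
    firehose_like.put(record, now=ts)
```

Larger limits mean fewer, bigger writes to the destination at the cost of higher latency, which is the trade-off behind "near real-time".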

Amazon Kinesis Data Streams

  • Provides tools for building custom producer and consumer applications.
  • Supports multiple consumers processing data from the same stream concurrently.

Amazon Managed Service for Apache Flink

  • Based on Apache Flink, an open-source framework for stream processing.
  • Enables real-time analytics on streaming data without building a separate consumer application.
  • Processing occurs as data passes through the stream.
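
Processing data as it passes through the stream often means aggregating over time windows. The following is a minimal plain-Python sketch of a tumbling-window count, a common stream-analytics pattern. It does not use the Apache Flink API; the function name `tumbling_window_counts` and the sample events are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[float, str]], window_seconds: float
) -> Dict[Tuple[int, str], int]:
    """Count events per (window index, key) using fixed, non-overlapping windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        # Each event belongs to exactly one window, chosen by its timestamp.
        window_index = int(timestamp // window_seconds)
        counts[(window_index, key)] += 1
    return dict(counts)


# Usage: (timestamp in seconds, action) pairs grouped into 10-second windows.
events = [(0.5, "jump"), (3.0, "jump"), (9.9, "shoot"), (10.0, "jump"), (12.4, "shoot")]
result = tumbling_window_counts(events, window_seconds=10)
```

A stream-processing framework would typically compute such windows incrementally and also handle late or out-of-order events, which this sketch ignores.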

Amazon Kinesis Video Streams

  • Designed for capturing and processing large volumes of live video.
  • Suitable for playback, machine learning, and analytics applications.

Amazon Managed Streaming for Apache Kafka (Amazon MSK)

  • Based on Apache Kafka, an open-source distributed event streaming platform.
  • Although managed, typically involves more cluster-level configuration and management (e.g., of brokers) than Kinesis Data Streams.
  • Suitable for teams already using Apache Kafka.

Considerations When Choosing a Service

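  • Stream-processing complexity varies by use case, from simply delivering data to storage to building custom applications with multiple consumers.
  • Goal: Understand the distinguishing features of each service and match them to the requirements of a specific use case.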