What is Apache Storm?
A distributed, fault-tolerant framework for real-time computation that processes data streams from sources like Kafka, Kinesis, and RabbitMQ.
What are the main Storm Concepts?
Topology, Stream, Spouts, Bolts, Tasks.
What is a Topology in Apache Storm?
A graph of computations that defines how data flows and is processed across the cluster.
What is a Stream in Apache Storm?
An unbounded sequence of tuples (data records) that flows through the topology.
What are Spouts in Apache Storm?
Components that act as sources of streams, emitting tuples into the topology.
What are Bolts in Apache Storm?
Components that process incoming tuples, performing operations like filtering, aggregating, or joining data.
What are Tasks in Apache Storm?
Instances of spouts and bolts that run in parallel; each task is executed by an executor thread inside a worker process.
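To make the relationship between these concepts concrete, here is a minimal sketch in plain Python. It does not use Storm's API: `sentence_spout`, `split_bolt`, `upper_bolt`, `apply_bolt`, and `run_topology` are hypothetical names, and the "stream" is a short finite list rather than an unbounded source.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# A record here is a plain Python tuple; this only imitates Storm's tuples.
Record = Tuple[str, ...]


def sentence_spout() -> Iterator[Record]:
    """Hypothetical spout: the source that emits tuples into the topology."""
    for sentence in ["storm processes streams", "bolts process tuples"]:
        yield (sentence,)


def split_bolt(record: Record) -> Iterable[Record]:
    """Hypothetical bolt: splits a sentence tuple into one tuple per word."""
    for word in record[0].split():
        yield (word,)


def upper_bolt(record: Record) -> Iterable[Record]:
    """Hypothetical bolt: upper-cases the word in each incoming tuple."""
    yield (record[0].upper(),)


def apply_bolt(bolt: Callable[[Record], Iterable[Record]],
               stream: Iterable[Record]) -> Iterator[Record]:
    """Feed every tuple of a stream through one bolt."""
    for record in stream:
        yield from bolt(record)


def run_topology(spout: Callable[[], Iterator[Record]],
                 bolts: List[Callable[[Record], Iterable[Record]]]) -> List[Record]:
    """Wire a spout to a linear chain of bolts and collect the output."""
    stream: Iterable[Record] = spout()
    for bolt in bolts:
        stream = apply_bolt(bolt, stream)
    return list(stream)


print(run_topology(sentence_spout, [split_bolt, upper_bolt]))
```

A real topology is a graph rather than a linear chain, and each spout or bolt would run as many parallel tasks; the sketch keeps everything in one thread for readability.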
What are the different Stream Groupings in Storm?
Shuffle, Fields, All, Global, and Direct groupings.
What are the main components of a Storm Cluster?
Nimbus, Supervisor, ZooKeeper.
What is Nimbus in a Storm Cluster?
The master node that manages topologies and distributes tasks to supervisors.
What is the role of the Supervisor in Storm?
Executes worker processes that run spouts and bolts as part of the topology.
What is ZooKeeper used for in a Storm Cluster?
Coordinates the cluster by managing configuration, synchronization, and leader election.
What is Spark Streaming?
A high-throughput, fault-tolerant stream processing component of Apache Spark.
What are DStreams in Spark Streaming?
Sequences of Resilient Distributed Datasets (RDDs) representing data from specific time intervals.
What are the primary sources for Spark Streaming?
Kafka, HDFS, custom connectors, and other streaming data sources.
How are DStreams similar to RDDs in Spark?
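A DStream is a sequence of RDDs, one per batch interval, and it supports many of the same transformations (such as map, filter, and reduceByKey), so streaming data can be processed in a batch-like way.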
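The idea of cutting a stream into per-interval batches can be sketched in plain Python. This is only an illustration, not Spark's API: `discretize` is a hypothetical helper, each inner list stands in for one RDD, and the timestamps are made-up sample data.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def discretize(events: List[Tuple[float, Any]],
               batch_interval: float) -> List[List[Any]]:
    """Group (timestamp, value) events into consecutive fixed-length batches.

    Intervals with no events still produce an (empty) batch, so the
    result has one entry per interval up to the last event.
    """
    buckets: Dict[int, List[Any]] = defaultdict(list)
    for timestamp, value in events:
        buckets[int(timestamp // batch_interval)].append(value)
    if not buckets:
        return []
    return [buckets.get(i, []) for i in range(max(buckets) + 1)]


events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (3.5, "d")]
batches = discretize(events, batch_interval=1.0)
print(batches)                    # [['a', 'b'], ['c'], [], ['d']]

# The same batch-style operation is then applied to every batch in turn.
print([len(b) for b in batches])  # [2, 1, 0, 1]
```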
What are the two types of DStream Transformations?
Stateless Transformations and Stateful Transformations.
What are Window Operations in Spark Streaming?
Operations that compute over sliding data windows, allowing aggregation and analysis within specified time frames.
Name some Window Operations in Spark Streaming.
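window, countByWindow, reduceByWindow, reduceByKeyAndWindow, countByValueAndWindow. (updateStateByKey is often listed alongside these, but it is a stateful transformation rather than a window operation.)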
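A sliding window is defined by a window length and a slide interval. The sketch below imitates that behavior over a list of micro-batches in plain Python. The function names echo Spark's operations but are hypothetical re-implementations, and both parameters are measured in number of batches (an assumption made for simplicity) rather than in seconds.

```python
from functools import reduce
from typing import Any, Callable, Iterator, List


def window(batches: List[List[Any]], window_length: int,
           slide_interval: int) -> Iterator[List[Any]]:
    """Yield the merged contents of each sliding window over the batches."""
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        yield [item for batch in batches[start:end] for item in batch]


def count_by_window(batches: List[List[Any]], window_length: int,
                    slide_interval: int) -> List[int]:
    """Number of elements in each window."""
    return [len(w) for w in window(batches, window_length, slide_interval)]


def reduce_by_window(batches: List[List[Any]], func: Callable[[Any, Any], Any],
                     window_length: int, slide_interval: int) -> List[Any]:
    """Fold each non-empty window with a binary function."""
    return [reduce(func, w)
            for w in window(batches, window_length, slide_interval) if w]


batches = [[1, 2], [3], [4, 5], [6]]
print(list(window(batches, 3, 2)))                         # [[1, 2, 3], [3, 4, 5, 6]]
print(count_by_window(batches, 3, 2))                      # [3, 4]
print(reduce_by_window(batches, lambda a, b: a + b, 3, 2)) # [6, 18]
```

Because the window (3 batches) is longer than the slide (2 batches), consecutive windows overlap, which is why the value 3 appears in both.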
What is Apache Flink?
A framework for real-time, stateful stream processing that supports both bounded and unbounded data streams.
What are the main APIs provided by Apache Flink?
DataStream, DataSet, Table, CEP, Gelly, FlinkML.
What are Streaming Dataflows in Apache Flink?
Directed Acyclic Graphs (DAGs) consisting of sources, transformations, and sinks.
What deployment options does Flink Architecture support?
Local, cluster, and cloud deployments.
What libraries does Apache Flink provide?
Libraries for graph processing, machine learning, and event processing.
How does Apache Flink handle stateful stream processing?
By maintaining and managing state information across events.
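As a rough sketch of what "state across events" means, the plain-Python function below keeps a per-key counter that survives from one event to the next. It is not Flink code: `running_counts` is a hypothetical name, and a dictionary stands in for Flink's managed keyed state.

```python
from typing import Dict, Iterable, List, Tuple


def running_counts(keys: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit (key, count so far) for every incoming event.

    `state` plays the role of keyed state: each result depends on all
    earlier events with the same key, not only on the current event.
    """
    state: Dict[str, int] = {}
    output: List[Tuple[str, int]] = []
    for key in keys:
        state[key] = state.get(key, 0) + 1
        output.append((key, state[key]))
    return output


print(running_counts(["a", "b", "a", "a", "b"]))
# [('a', 1), ('b', 1), ('a', 2), ('a', 3), ('b', 2)]
```

A stateless operation such as a filter could process each event in isolation; the counter cannot, which is why the engine has to store and recover this state.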
What makes Apache Flink suitable for both bounded and unbounded data streams?
Its flexible architecture supporting batch and real-time processing paradigms.
Compare Apache Storm and Spark Streaming.
Storm focuses on real-time computation with a topology-based approach; Spark Streaming utilizes DStreams for micro-batch processing.
Compare Apache Flink with Spark Streaming.
Flink provides true stream processing with low latency; Spark Streaming uses micro-batching.
What is the primary advantage of using Apache Flink for real-time analytics?
Its ability to handle both batch and stream processing seamlessly.
What is a Tuple in Apache Storm?
A data record emitted by spouts and processed by bolts.
How does Shuffle Grouping work in Storm?
Distributes tuples randomly across the target bolt's tasks so that each task receives roughly the same number of tuples.
What is Field Grouping in Storm?
Partitions the stream by the values of the specified fields, so tuples with the same field values always go to the same task.
What is the purpose of All/Global Grouping in Storm?
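All grouping replicates each tuple to every task of the bolt; Global grouping sends the entire stream to a single task (the one with the lowest ID).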
What does Direct Grouping enable in Storm?
Allows the sender to specify the exact bolt instance for each tuple.
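The routing rules of the groupings can be compared side by side in a small plain-Python sketch. Each function returns the list of task indices that would receive a tuple. These are hypothetical helpers, not Storm's implementation: the CRC32 hash and the seeded random generator are stand-ins chosen to keep the example deterministic.

```python
import random
from typing import List, Tuple
from zlib import crc32


def shuffle_grouping(tup: Tuple, num_tasks: int, rng: random.Random) -> List[int]:
    """Pick one task at random, which spreads load roughly evenly."""
    return [rng.randrange(num_tasks)]


def fields_grouping(tup: Tuple, num_tasks: int, field_index: int) -> List[int]:
    """Hash the chosen field so equal values always reach the same task."""
    key = str(tup[field_index]).encode("utf-8")
    return [crc32(key) % num_tasks]


def all_grouping(tup: Tuple, num_tasks: int) -> List[int]:
    """Replicate the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(tup: Tuple, num_tasks: int) -> List[int]:
    """Send everything to a single task (index 0 stands for the lowest ID)."""
    return [0]


def direct_grouping(tup: Tuple, num_tasks: int, chosen_task: int) -> List[int]:
    """The producer names the receiving task explicitly."""
    return [chosen_task]


rng = random.Random(42)  # seeded only to make the demo repeatable
same_task = (fields_grouping(("user1", 5), 4, 0)
             == fields_grouping(("user1", 9), 4, 0))
print(same_task)                   # True: same key, same task
print(all_grouping(("x",), 3))     # [0, 1, 2]
print(global_grouping(("x",), 3))  # [0]
```

Fields grouping is the one to reach for when a bolt keeps per-key state (for example a word count), since it guarantees that all tuples for a key meet at the same task.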
What ensures fault tolerance in Apache Storm?
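Supervisors restart worker processes that die, Nimbus and the Supervisors are fail-fast and keep their state in ZooKeeper, and tuple acknowledgment lets spouts replay tuples that were not fully processed.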
What is the main use case for Spark Streaming?
High-throughput, fault-tolerant stream processing for real-time data analytics.
How do DStreams achieve fault tolerance in Spark Streaming?
By using RDD lineage information and checkpointing.
What is checkpointing in Spark Streaming?
A mechanism to save the state of DStreams to reliable storage.
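The save-and-recover cycle behind checkpointing can be sketched in plain Python. This is an illustration of the idea only, not Spark's checkpoint mechanism: `checkpoint`, `restore`, and `count_batch` are hypothetical helpers, and a local JSON file stands in for reliable storage such as HDFS.

```python
import json
import os
import tempfile
from typing import Dict, List


def checkpoint(state: Dict[str, int], path: str) -> None:
    """Write state to a temp file, then rename it so a crash never leaves a half-written file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def restore(path: str) -> Dict[str, int]:
    """Load the last saved state, or start empty if none exists."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_batch(state: Dict[str, int], batch: List[str]) -> Dict[str, int]:
    """Update running word counts with one batch."""
    for word in batch:
        state[word] = state.get(word, 0) + 1
    return state


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "counts.json")
    checkpoint(count_batch(restore(path), ["a", "b", "a"]), path)

    # Simulate a restart: in-memory state is gone, so reload from the file.
    recovered = restore(path)
    print(count_batch(recovered, ["b"]))  # {'a': 2, 'b': 2}
```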
How does Apache Flink achieve high performance in stream processing?
Through advanced scheduling, efficient state management, and support for event-time processing.
What is CEP in Apache Flink?
Complex Event Processing API for detecting patterns in event streams.
What is Gelly in Apache Flink?
Flink’s API for graph processing.
What is FlinkML?
Flink’s machine learning library for scalable algorithms.
How do Flink's Streaming Dataflows differ from traditional batch processing?
They process data continuously as it arrives.
What is the Driver in a Spark Cluster?
The program that creates a SparkContext to coordinate task execution.
What is the role of the Cluster Manager in Spark?
Allocates resources (such as executors) across the cluster; the driver then schedules tasks on those executors.
What are Executors in Spark?
Processes on worker nodes that run application code.
What is the Driver Program in Spark?
The process that runs the main function of the application.
How does Apache Flink support multi-tenancy?
By managing resources and isolating jobs from different users.
What is the primary difference between Apache Storm and Apache Flink?
Storm focuses on unbounded stream processing; Flink supports both bounded and unbounded with advanced state management.
What is event-time processing in Apache Flink?
Processing events based on the time they occurred.
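The difference event time makes shows up when events arrive out of order. The plain-Python sketch below assigns each event to a tumbling window using the timestamp carried in the event rather than its arrival position. It is not Flink code: `tumbling_event_time_counts` is a hypothetical name, the sample events are made up, and watermarks and late-data handling are left out entirely.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def tumbling_event_time_counts(events: List[Tuple[int, str]],
                               window_size: int) -> Dict[int, int]:
    """Count events per tumbling window, keyed by window start time.

    Each event is (event_time, payload). The window is derived from
    event_time, so arrival order does not affect the result.
    """
    counts: Dict[int, int] = defaultdict(int)
    for event_time, _payload in events:
        window_start = (event_time // window_size) * window_size
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# Listed in arrival order; the event times are deliberately out of order.
arrivals = [(12, "a"), (3, "b"), (7, "c"), (15, "d"), (9, "e")]
print(tumbling_event_time_counts(arrivals, window_size=10))
# {0: 3, 10: 2}
```

Sorting `arrivals` differently would give the same counts, which is the property that processing by arrival time cannot offer.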
What is a DAG in the context of Apache Flink?
A directed acyclic graph that represents the flow of data through sources, transformations, and sinks.
What are the main advantages of using Apache Flink for real-time analytics?
True stream processing, robust state management, support for complex event processing, and the flexibility to handle both bounded and unbounded data streams.