Chapter 9

66 Terms

1

What are the primary source types for data acquisition in big data systems?

Data sources publish big data in batches, in micro-batches, or as streaming real-time data.

2

How is velocity defined in the context of data acquisition?

The speed at which data is generated and how frequently it is produced.

3

What characterizes high velocity data?

Real-time or streaming data generated and processed continuously.

4

What are the two main ingestion mechanisms for data?

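Push and pull mechanisms. In push-based ingestion the source sends data to the system as it is produced; in pull-based ingestion the consumer fetches data from the source. The choice between them is driven by the data consumer.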
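
The difference between the two mechanisms can be sketched in a few lines of Python. This is a minimal in-process model, not any real ingestion tool; `PushSource` and `PullSource` are hypothetical names. In the push case the source initiates delivery; in the pull case records wait until the consumer polls for them.

```python
from collections import deque
from typing import Callable, Deque, List


class PushSource:
    """Producer-driven: the source calls each consumer as soon as data exists."""

    def __init__(self) -> None:
        self.subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self.subscribers.append(callback)

    def emit(self, record: str) -> None:
        # Push: the source initiates delivery to every registered consumer.
        for callback in self.subscribers:
            callback(record)


class PullSource:
    """Consumer-driven: records wait at the source until the consumer asks."""

    def __init__(self) -> None:
        self.buffer: Deque[str] = deque()

    def emit(self, record: str) -> None:
        self.buffer.append(record)

    def poll(self, max_records: int) -> List[str]:
        # Pull: the consumer decides when, and how much, to fetch.
        batch: List[str] = []
        while self.buffer and len(batch) < max_records:
            batch.append(self.buffer.popleft())
        return batch


pushed: List[str] = []
push_source = PushSource()
push_source.subscribe(pushed.append)
push_source.emit("reading-1")

pull_source = PullSource()
for i in range(3):
    pull_source.emit(f"reading-{i}")
first_batch = pull_source.poll(max_records=2)
```

With pull, the consumer controls its own pace, which is why the consumer's needs usually decide which mechanism fits.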

5

What is Kafka?

A high-throughput distributed messaging system used for building real-time data pipelines and streaming applications.

6

In Kafka, what is a broker?

A Kafka server that manages topics, persists messages to disk, and handles partitioning and replication.

7

What is a topic in Kafka?

A stream of messages of a particular type, similar to tables in databases.

8

How does Kafka store messages?

On disk using partitioned commit logs.

9

What roles do producers and consumers play in Kafka?

Producers publish messages to topics, while consumers subscribe to topics and process messages.

10

What is a partition in Kafka?

A division of a Kafka topic that allows messages to be consumed in parallel, maintaining an ordered and immutable sequence of messages.

11

How does Kafka achieve parallel consumption of messages?

By dividing topics into multiple partitions and allowing multiple consumers to read from different partitions simultaneously.
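
A rough sketch of how keyed messages are spread across partitions. Assumption: CRC32 from `zlib` stands in for the hash; Kafka's default partitioner hashes keys with murmur2, so real partition numbers will differ. The property the sketch illustrates is that every message with the same key lands in the same partition, so order is preserved per key while different partitions can be read in parallel.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def choose_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    return zlib.crc32(key) % num_partitions


def partition_records(records: List[Tuple[str, str]],
                      num_partitions: int) -> Dict[int, List[str]]:
    partitions: Dict[int, List[str]] = defaultdict(list)
    for key, value in records:
        # Same key, same partition: per-key order survives the split.
        partitions[choose_partition(key.encode(), num_partitions)].append(value)
    return dict(partitions)


records = [("sensor-a", "a1"), ("sensor-b", "b1"), ("sensor-a", "a2"), ("sensor-c", "c1")]
by_partition = partition_records(records, num_partitions=3)
```

Ordering is guaranteed only within a partition, not across the whole topic.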

12

What is the role of a partition leader in Kafka?

For each partition, one broker acts as the leader and handles read and write operations for that partition.

13

What are replicas in Kafka?

Followers that replicate the data of the leader to ensure fault tolerance.

14

Describe Kafka's publish-subscribe messaging framework.

Producers publish messages to topics, and consumers subscribe to those topics to receive messages.

15

What is an offset in Kafka?

A unique sequence ID assigned to each message within a partition.
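
A toy model of a single partition shows how offsets work: each appended message gets the next sequential ID, and a consumer reads from an offset onward and remembers where to resume. `PartitionLog` is a hypothetical class, not Kafka's storage code.

```python
from typing import List, Tuple


class PartitionLog:
    """Toy append-only log for a single partition (illustrative only)."""

    def __init__(self) -> None:
        self._messages: List[bytes] = []

    def append(self, message: bytes) -> int:
        # The new message's offset is its position in the log; it never changes.
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset: int, max_messages: int) -> List[Tuple[int, bytes]]:
        # Consumers ask for messages starting at a given offset.
        end = min(offset + max_messages, len(self._messages))
        return [(i, self._messages[i]) for i in range(offset, end)]


log = PartitionLog()
offsets = [log.append(m) for m in (b"login", b"click", b"logout")]
batch = log.read(offset=1, max_messages=10)
next_offset = batch[-1][0] + 1  # where this consumer resumes next time
```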

16

What is a consumer group in Kafka?

A group of consumers that work together to consume messages from one or more topics, ensuring each message is processed by only one consumer in the group.
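
The "one consumer per message" guarantee comes from giving each partition to exactly one member of the group. Kafka ships several assignment strategies (range, round-robin, sticky); the function below is a simplified round-robin sketch, and `assign_round_robin` is a hypothetical name, not Kafka's implementation.

```python
from typing import Dict, List


def assign_round_robin(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    # Each partition goes to exactly one consumer in the group, so within the
    # group every message in that partition is processed by a single consumer.
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    ordered = sorted(consumers)
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


assignment = assign_round_robin(partitions=[0, 1, 2, 3, 4], consumers=["c1", "c2"])
```

If a group has more consumers than partitions, the extra consumers receive nothing and stay idle, which is why the partition count caps a group's parallelism.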

17

How does Kafka handle log storage?

Messages are stored in append-only, ordered, and immutable logs.

18

What is log compaction in Kafka?

A cleanup process that removes obsolete records by retaining at least the latest message for each key in a topic partition's log.
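
The core idea of compaction fits in a short function: keep only the newest record per key, leaving surviving records in their original order with their original offsets. This is a simplified sketch; real Kafka compacts in the background, skips the active segment, and eventually removes tombstones (records with a null value), whereas this version keeps them. The keys and values are made-up sample data.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, str, Optional[str]]  # (offset, key, value); None marks a tombstone


def compact(log: List[Record]) -> List[Record]:
    # Remember the offset of the latest record seen for each key.
    latest: Dict[str, int] = {}
    for offset, key, _ in log:
        latest[key] = offset
    # Keep only those latest records, preserving order and offsets.
    return [rec for rec in log if latest[rec[1]] == rec[0]]


log = [
    (0, "user-1", "alice@old.example"),
    (1, "user-2", "bob@example"),
    (2, "user-1", "alice@new.example"),
    (3, "user-2", None),
]
compacted = compact(log)
```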

19

What are log segments in Kafka?

Files that together make up a topic partition's log; each partition is stored as a directory of segment files.

20

What determines when log segments are deleted in Kafka?

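Under the delete cleanup policy, segments are deleted once the partition's log exceeds a configured size limit or a segment's data is older than the configured retention time.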
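
A sketch of a size- and time-based delete policy, assuming segments are listed oldest first and the newest (active) segment is never deleted. The parameters `retention_ms` and `retention_bytes` echo Kafka's `retention.ms` and `retention.bytes` topic settings, but the loop below only roughly mirrors Kafka's behavior and is not its actual code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    base_offset: int
    size_bytes: int
    newest_timestamp_ms: int  # timestamp of the newest message in the segment


def apply_delete_policy(segments: List[Segment], now_ms: int,
                        retention_ms: int, retention_bytes: int) -> List[Segment]:
    """Return the segments that survive; `segments` is ordered oldest first."""
    kept = list(segments)
    total = sum(s.size_bytes for s in kept)
    # Delete from the start of the log; the last (active) segment is always kept.
    while len(kept) > 1:
        oldest = kept[0]
        too_old = now_ms - oldest.newest_timestamp_ms > retention_ms
        too_big = total > retention_bytes
        if not (too_old or too_big):
            break
        total -= oldest.size_bytes
        kept.pop(0)
    return kept


segments = [Segment(0, 500, 1_000), Segment(100, 500, 5_000), Segment(200, 200, 9_000)]
survivors = apply_delete_policy(segments, now_ms=10_000, retention_ms=6_000, retention_bytes=800)
```

Here the first segment is removed for age; the remaining 700 bytes are under the size limit, so deletion stops.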

21

What is Amazon Kinesis?

A managed service for ingesting, processing, and analyzing real-time streaming data on AWS.

22

What are Kinesis Data Streams?

A service for ingesting and processing streaming data in real time.

23

What are Firehose Delivery Streams in Kinesis?

A service that captures, transforms, and loads (ETL) streaming data into destinations such as S3, Redshift, and Splunk.

24

What does Kinesis Analytics do?

Runs continuous SQL queries on streaming data from Kinesis Data Streams and Firehose Delivery Streams.

25

What are Kinesis Video Streams used for?

Streaming live video from devices to the AWS cloud for real-time video processing and batch-oriented analytics.

26

What is AWS IoT?

A service for collecting and managing data from Internet of Things (IoT) devices.

27

What is the Device Gateway in AWS IoT?

Enables IoT devices to securely communicate with AWS IoT.

28

What is the Device Registry in AWS IoT?

Maintains resources and information associated with each IoT device.

29

What is a Device Shadow in AWS IoT?

Maintains the state of a device as a JSON document, allowing applications to interact with devices even when they are offline.
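
A shadow document keeps a "desired" state (set by applications) alongside a "reported" state (set by the device). The sketch below computes the difference the device must apply when it reconnects. The `state`/`desired`/`reported` layout follows the AWS IoT shadow format, but the delta logic here is a simplified, flat-keys-only assumption, and the `power` and `target_temp` fields are hypothetical.

```python
import json
from typing import Any, Dict

_MISSING = object()  # distinguishes "key absent" from "value is None"


def shadow_delta(shadow: Dict[str, Any]) -> Dict[str, Any]:
    """Desired keys whose value differs from, or is missing in, the reported state."""
    desired = shadow.get("state", {}).get("desired", {})
    reported = shadow.get("state", {}).get("reported", {})
    return {k: v for k, v in desired.items() if reported.get(k, _MISSING) != v}


shadow_json = """
{
  "state": {
    "desired":  {"power": "on",  "target_temp": 21},
    "reported": {"power": "off", "target_temp": 21}
  }
}
"""
delta = shadow_delta(json.loads(shadow_json))
```

An application can set `"power": "on"` while the device is offline; once the device reconnects, it acts on the delta and then updates its reported state.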

30

What does the Rules Engine in AWS IoT do?

Lets you define rules that process incoming device messages and act on them, for example by routing them to other AWS services.

31

What is Apache Flume?

A distributed system for collecting, aggregating, and moving large amounts of data from various sources to a centralized data store.

32

What is a checkpoint file in Flume?

Keeps track of the last committed transactions, acting as a snapshot for data reliability.

33

What are the main components of Flume Architecture?

Source, Channel, Sink, and Agent.

34

What is a Source in Flume?

The component that receives or polls data from external sources.

35

What is a Channel in Flume?

A buffer that holds events received from the source until the sink consumes them, decoupling the two.

36

What is a Sink in Flume?

Drains data from the channel to the final data store.

37

What is an Agent in Flume?

A collection of sources, channels, and sinks that moves data from external sources to destinations.
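
The source, channel, sink, and event roles can be sketched as a single-process Python pipeline. This is a toy model, not Flume's Java API: `ToyAgent`, the `origin` header, and the sample log lines are hypothetical. The `Event` class mirrors Flume's event shape of a byte payload plus optional string headers.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Iterable, List


@dataclass
class Event:
    body: bytes                                            # payload
    headers: Dict[str, str] = field(default_factory=dict)  # optional attributes


class ToyAgent:
    """Source, channel, and sink in one process; not Flume's Java API."""

    def __init__(self) -> None:
        self.channel: Deque[Event] = deque()  # buffers events between source and sink
        self.store: List[Event] = []          # stands in for the final store (e.g. HDFS)

    def run_source(self, lines: Iterable[str]) -> None:
        # Source: wrap each external record in an event and put it on the channel.
        for line in lines:
            self.channel.append(Event(body=line.encode(), headers={"origin": "web.log"}))

    def run_sink(self) -> None:
        # Sink: drain the channel and write each event to the store.
        while self.channel:
            self.store.append(self.channel.popleft())


agent = ToyAgent()
agent.run_source(["GET /index.html", "POST /login"])
agent.run_sink()
```

In Flume itself, the source-to-channel and channel-to-sink handoffs happen inside channel transactions, which this sketch omits.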

38

What is an Event in Flume?

A unit of data flow, consisting of a byte payload (the body) and optional headers (string key-value attributes).

39

Name the types of Flume Channels.

Memory channel, File channel, JDBC channel, and Spillable Memory channel.

40

What is a Memory Channel in Flume?

Stores events in memory for fast access.

41

What is a File Channel in Flume?

Stores events in files on the local filesystem for durability.

42

What is a JDBC Channel in Flume?

Stores events in an embedded Derby database for durable storage.

43

What is a Spillable Memory Channel in Flume?

Stores events in an in-memory queue and spills to disk when the queue is full.
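
The spill behavior can be sketched with two queues: a bounded in-memory one and an overflow one standing in for disk. `SpillableChannel` is a hypothetical class, and a plain `deque` replaces the file-backed overflow. One design point worth noting: once anything has spilled, new events keep going to the overflow until it drains, so events still come out in first-in, first-out order.

```python
from collections import deque
from typing import Deque


class SpillableChannel:
    """Sketch: bounded in-memory queue that overflows to a 'disk' queue when full."""

    def __init__(self, memory_capacity: int) -> None:
        self.memory_capacity = memory_capacity
        self.memory: Deque[str] = deque()
        self.disk: Deque[str] = deque()  # stands in for the file-backed overflow

    def put(self, event: str) -> None:
        # Keep spilling while the overflow is non-empty so FIFO order is preserved.
        if self.disk or len(self.memory) >= self.memory_capacity:
            self.disk.append(event)
        else:
            self.memory.append(event)

    def take(self) -> str:
        # Memory always holds the oldest events, so drain it first.
        if self.memory:
            return self.memory.popleft()
        return self.disk.popleft()


ch = SpillableChannel(memory_capacity=2)
for e in ["a", "b", "c", "d"]:
    ch.put(e)
```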

44

What is Apache Sqoop?

A tool for importing data from relational databases into Hadoop Distributed File System (HDFS), Hive, or HBase, and exporting data back to RDBMS.

45

How does Sqoop import data?

By launching multiple map tasks to transfer data as delimited text files, binary Avro files, or Hadoop sequence files.
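
To let several map tasks work in parallel, the import is split into ranges of a numeric column, and each task runs its own query for one range. The sketch below shows that idea under stated assumptions: integer keys, contiguous inclusive ranges, and hypothetical `orders`/`order_id` names. It is not Sqoop's exact boundary arithmetic.

```python
from typing import List, Tuple


def integer_splits(min_id: int, max_id: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide [min_id, max_id] into contiguous inclusive ranges, one per map task."""
    span = max_id - min_id + 1
    size, extra = divmod(span, num_mappers)
    splits: List[Tuple[int, int]] = []
    start = min_id
    for i in range(num_mappers):
        # The first `extra` ranges take one more ID each so that nothing is lost.
        end = start + size + (1 if i < extra else 0) - 1
        if start <= end:
            splits.append((start, end))
        start = end + 1
    return splits


def split_queries(table: str, column: str, splits: List[Tuple[int, int]]) -> List[str]:
    # One query per map task; each task transfers only its own key range.
    return [f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} <= {hi}"
            for lo, hi in splits]


splits = integer_splits(1, 10, 4)
queries = split_queries("orders", "order_id", splits)
```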

46

What are Hadoop Sequence Files?

A binary file format specific to Hadoop for storing sequences of key-value pairs.

47

What is Apache Avro?

A serialization framework that provides rich data structures, a compact binary data format, and container files for data storage and processing.

48

What is RabbitMQ?

A message broker that implements the Advanced Message Queuing Protocol (AMQP) for exchanging messages between systems.

49

What is the Advanced Message Queuing Protocol (AMQP)?

A protocol that defines the exchange of messages between systems, specifying roles like producers, consumers, and brokers.

50

In RabbitMQ, what are producers and consumers?

Producers publish messages to exchanges, and consumers receive messages from queues based on bindings and routing rules.
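
Routing through an exchange can be illustrated with a toy model of an AMQP direct exchange, where a message goes to every queue bound with exactly its routing key. `DirectExchange` and the queue and key names are hypothetical; this is not the RabbitMQ client API. Other exchange types (fanout, topic, headers) apply different matching rules.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set


class DirectExchange:
    """Toy model of a direct exchange: route by exact binding key."""

    def __init__(self) -> None:
        self.bindings: DefaultDict[str, Set[str]] = defaultdict(set)  # key to queues
        self.queues: Dict[str, List[str]] = {}

    def declare_queue(self, name: str) -> None:
        self.queues.setdefault(name, [])

    def bind(self, queue: str, routing_key: str) -> None:
        self.bindings[routing_key].add(queue)

    def publish(self, routing_key: str, body: str) -> None:
        # Producers never write to queues directly; the exchange copies the message
        # to each queue bound with a matching key (unmatched messages are dropped here).
        for queue in sorted(self.bindings.get(routing_key, set())):
            self.queues[queue].append(body)


ex = DirectExchange()
ex.declare_queue("errors")
ex.declare_queue("all_logs")
ex.bind("errors", "error")
ex.bind("all_logs", "error")
ex.bind("all_logs", "info")
ex.publish("error", "disk full")
ex.publish("info", "service started")
ex.publish("debug", "cache miss")
```

Consumers then read from `errors` or `all_logs`; the `debug` message matches no binding and goes nowhere.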

51

What is ZeroMQ?

A high-performance messaging library that provides tools to build custom messaging systems without requiring a message broker.

52

What messaging patterns does ZeroMQ support?

Request-Reply, Publish-Subscribe, Push-Pull, and Exclusive Pair.

53

What is RestMQ?

A message queue based on a simple JSON-based protocol using HTTP as the transport, organized as REST resources.

54

How do producers interact with RestMQ?

By making HTTP POST requests with data payloads to publish messages to queues.
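
A producer's publish call can be sketched with Python's standard `urllib`, building an HTTP POST whose form-encoded body carries the payload. The base URL, port, the `/q/<queue>` path, and the `value` field name are all assumptions for illustration; check the RestMQ documentation for its actual resource layout. The request is only built, not sent, so the sketch runs offline.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_request(base_url: str, queue: str, payload: str) -> Request:
    # Hypothetical layout: one REST resource per queue under /q/<queue>.
    body = urlencode({"value": payload}).encode()
    return Request(f"{base_url}/q/{queue}", data=body, method="POST")


req = build_publish_request("http://localhost:8888", "sensor-readings", '{"temp": 22.5}')
# urllib.request.urlopen(req) would actually send it to a running queue server.
```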

55

What is Amazon SQS?

A scalable and reliable hosted queue service that stores messages for distributed applications.

56

What are the two types of queues in Amazon SQS?

Standard queues and FIFO (First-In-First-Out) queues.

57

What are the characteristics of Standard Queues in Amazon SQS?

Guarantees message delivery but not order (ordering is best-effort). Supports a nearly unlimited number of transactions per second. Operates on an at-least-once delivery model, so a message may occasionally be delivered more than once.
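
Because a standard queue may deliver the same message twice, consumers are usually written to be idempotent. A minimal sketch, assuming each delivery carries a stable message ID: `consume_idempotently` is a hypothetical helper, and the in-memory `seen` set would need to be durable storage in a real system.

```python
from typing import Callable, Iterable, List, Set, Tuple


def consume_idempotently(deliveries: Iterable[Tuple[str, str]],
                         handle: Callable[[str], None]) -> int:
    """Process each message ID once even if the queue delivers it more than once."""
    seen: Set[str] = set()  # a real system would persist this, e.g. in a database
    processed = 0
    for message_id, body in deliveries:
        if message_id in seen:
            continue  # duplicate from at-least-once delivery: skip it
        handle(body)
        seen.add(message_id)
        processed += 1
    return processed


results: List[str] = []
deliveries = [("m1", "charge $5"), ("m2", "charge $7"), ("m1", "charge $5")]
count = consume_idempotently(deliveries, results.append)
```

FIFO queues address the same problem on the sending side instead, by discarding duplicates that share a deduplication ID within a short window.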

58

What are the characteristics of FIFO Queues in Amazon SQS?

Ensures messages are received in the exact order they were sent. Supports up to 3,000 messages per second with batching or 300 messages per second without batching. Provides exactly-once processing.

59

What are Connectors in the context of messaging systems?

Interfaces that allow data to be published to and consumed from messaging queues, often exposing REST web services or other protocols.

60

How does a REST-based Connector work?

Producers publish data using HTTP POST requests with data payloads; the connector processes each request and writes the data to the sink.

61

What is a WebSocket-based Connector?

A connector that uses full-duplex communication, allowing continuous data exchange without setting up new connections for each message.

62

What is an MQTT-based Connector?

A connector that uses MQTT, a lightweight publish-subscribe messaging protocol designed for constrained devices and therefore well suited to IoT applications.

63

In MQTT-based systems, what are the main entities?

Publisher, Broker/Server, and Subscriber.

64

What is the role of a Publisher in MQTT?

Publishes data to topics managed by the broker.

65

What does the Broker/Server do in MQTT?

Manages topics and forwards published data to the clients subscribed to those topics.

66

What is the role of a Subscriber in MQTT?

Receives data from topics to which it has subscribed.
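
The publisher, broker, and subscriber roles fit together as follows. Subscribers register topic filters with the broker, and the broker forwards each published message to every subscriber whose filter matches. The matching below follows MQTT's wildcard rules (`+` matches one level, `#` matches the remaining levels, including the parent) but omits details such as special handling of `$`-prefixed topics. `Broker`, the client IDs, and the topics are hypothetical, and this sketch delivers at most once per client even when several filters match.

```python
from typing import Dict, List


def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT-style matching with '+' (one level) and '#' (all remaining levels)."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # matches this level and everything below it
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class Broker:
    def __init__(self) -> None:
        self.subscriptions: Dict[str, List[str]] = {}  # client ID to topic filters
        self.inboxes: Dict[str, List[str]] = {}

    def subscribe(self, client: str, topic_filter: str) -> None:
        self.subscriptions.setdefault(client, []).append(topic_filter)
        self.inboxes.setdefault(client, [])

    def publish(self, topic: str, payload: str) -> None:
        # Forward to each client that has at least one matching filter.
        for client, filters in self.subscriptions.items():
            if any(topic_matches(f, topic) for f in filters):
                self.inboxes[client].append(f"{topic}: {payload}")


broker = Broker()
broker.subscribe("dashboard", "home/+/temperature")
broker.subscribe("logger", "home/#")
broker.publish("home/kitchen/temperature", "21.5")
broker.publish("home/door", "open")
```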
