QM457-CH6

Flashcards for review of Big Data Processing Concepts lecture.

12 Terms

1

Parallel Data Processing

Involves the simultaneous execution of multiple sub-tasks that collectively comprise a larger task to reduce execution time, typically within a single machine with multiple processors or cores.
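
A minimal sketch of this idea in Python, assuming a toy task (summing a large list) split into four sub-tasks that run on separate cores of one machine; the function name and chunk count are illustrative, not from the lecture:

```python
# Parallel data processing on a single multi-core machine:
# one larger task (summing a list) is split into sub-tasks
# that execute simultaneously on separate cores.
from multiprocessing import Pool

def sum_chunk(chunk):
    # Each sub-task sums only its own slice of the data.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]  # four sub-tasks for four cores
    with Pool(processes=4) as pool:
        partial_sums = pool.map(sum_chunk, chunks)
    print(sum(partial_sums))  # same answer as sum(data), computed in parallel
```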

2

Distributed Data Processing

Similar to parallel data processing but always achieved through physically separate machines networked together as a cluster, applying the divide-and-conquer principle.

3

Hadoop

An open-source framework for large-scale data storage and processing that runs on commodity hardware; it serves as a de facto industry platform for Big Data solutions and implements the MapReduce processing framework.

4

Batch Processing

Also known as offline processing; data is processed in batches, which usually imposes delays and results in high-latency responses. Typically involves large quantities of data with sequential reads/writes.

5

Transactional Processing

Also known as online processing; data is processed interactively without delay, resulting in low-latency responses. Typically involves small amounts of data with random reads and writes.

6

Cluster

Provides the mechanism that enables distributed data processing with linear scalability; ideal for Big Data processing because large datasets can be divided and processed in parallel.

7

MapReduce

A widely used implementation of a batch processing framework; it is highly scalable and reliable, is based on the divide-and-conquer principle, and provides built-in fault tolerance and redundancy.
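
A toy, pure-Python sketch of the map/shuffle/reduce pattern (word count); this only illustrates the divide-and-conquer idea, not Hadoop's actual API, and the function names are assumptions:

```python
# MapReduce pattern in miniature: map each record to (key, value)
# pairs, group the pairs by key (shuffle), then reduce each group.
from collections import defaultdict

def map_phase(line):
    return [(word, 1) for word in line.split()]

def reduce_phase(key, values):
    return key, sum(values)

lines = ["big data big cluster", "data cluster data"]

# Map: runs independently on each input split.
mapped = [pair for line in lines for pair in map_phase(line)]

# Shuffle: group intermediate pairs by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: aggregate each group into a final result.
counts = dict(reduce_phase(k, v) for k, v in groups.items())
print(counts)  # {'big': 2, 'data': 3, 'cluster': 2}
```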

8

Task Parallelism

Parallelization of data processing by dividing a task into sub-tasks and running each sub-task on a separate processor, generally on a separate node in a cluster.
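
A minimal local sketch of task parallelism: three different sub-tasks (min, max, and sum over the same dataset) run concurrently. In a cluster each sub-task would generally run on a separate node; here separate local processes stand in for nodes, and the function names are assumptions:

```python
# Task parallelism: different sub-tasks of a larger job run
# concurrently, each on its own worker process.
from concurrent.futures import ProcessPoolExecutor

def compute_min(data): return min(data)
def compute_max(data): return max(data)
def compute_sum(data): return sum(data)

if __name__ == "__main__":
    data = list(range(1_000_000))
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(fn, data)
                   for fn in (compute_min, compute_max, compute_sum)]
        results = [f.result() for f in futures]
    print(results)  # [0, 999999, 499999500000]
```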

9

Data Parallelism

Parallelization of data processing by dividing a dataset into multiple sub-datasets and processing each sub-dataset in parallel.
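
A minimal sketch of data parallelism, for contrast with task parallelism: the same operation is applied to every sub-dataset in parallel; the cleaning function and the two-way split are illustrative assumptions:

```python
# Data parallelism: one dataset is divided into sub-datasets and the
# identical transformation is applied to each partition in parallel.
from concurrent.futures import ProcessPoolExecutor

def clean_partition(records):
    # The same operation runs on every partition.
    return [r.strip().lower() for r in records]

if __name__ == "__main__":
    records = ["  Alpha", "BETA ", " Gamma", "DELTA  "]
    partitions = [records[:2], records[2:]]  # divide the dataset
    with ProcessPoolExecutor() as pool:
        cleaned = [row for part in pool.map(clean_partition, partitions)
                   for row in part]
    print(cleaned)  # ['alpha', 'beta', 'gamma', 'delta']
```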

10

Realtime Mode (Big Data)

Data is processed in memory as it is captured, before being persisted to disk. Response time generally ranges from sub-second to under a minute. Also called event or stream processing.

11

Speed, Consistency, Volume (SCV) Principle

States that a distributed data processing system can be designed to support only two of the following three requirements: Speed, Consistency, and Volume.

12

Event Stream Processing (ESP)

During ESP, an incoming stream of events, generally from a single source and ordered by time, is continuously analyzed. The analysis can occur via simple queries or the application of algorithms that are mostly formula-based.
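
A minimal sketch of ESP in Python, assuming a single time-ordered stream of sensor-style events analyzed with a simple formula-based query (a rolling average over the last three values); the event fields and window size are illustrative assumptions:

```python
# Event stream processing: events arrive in time order from one
# source and are analyzed continuously, in memory, as they arrive.
from collections import deque

WINDOW = 3
window = deque(maxlen=WINDOW)

def on_event(event):
    # Called once per incoming event, in order of arrival.
    window.append(event["value"])
    return sum(window) / len(window)  # simple formula-based analysis

stream = [{"time": t, "value": v} for t, v in enumerate([10, 12, 11, 15, 14])]
for event in stream:
    print(event["time"], on_event(event))  # rolling average updated per event
```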