QM457-CH6

Description and Tags

Flashcards for review of Big Data Processing Concepts lecture.

12 Terms

1

Parallel Data Processing

Involves the simultaneous execution of multiple sub-tasks that collectively comprise a larger task to reduce execution time, typically within a single machine with multiple processors or cores.

2

Distributed Data Processing

Similar to parallel data processing but always achieved through physically separate machines networked together as a cluster, applying the divide-and-conquer principle.

3

Hadoop

An open-source framework for large-scale data storage and processing compatible with commodity hardware, serving as a de facto industry platform for Big Data solutions and implementing the MapReduce processing framework.

4

Batch Processing

Also known as offline processing. Data is processed in batches, which usually imposes delays and results in high-latency responses; batch workloads typically involve large quantities of data with sequential reads and writes.

5

Transactional Processing

Also known as online processing. Data is processed interactively without delay, resulting in low-latency responses; transactional workloads typically involve small amounts of data with random reads and writes.
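
To contrast the two modes, here is a minimal Python sketch. It assumes a hypothetical in-memory list of Order records standing in for stored data: the batch job (batch_total) scans every record sequentially and produces one aggregate result at the end, while the transactional function (transactional_update) reads and updates a single record by key, which corresponds to a small random read and write. All names here are illustrative, not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical record type used only for this illustration.
    order_id: int
    amount: float


def batch_total(orders: list[Order]) -> float:
    """Batch (offline) style: sequentially scan the whole dataset
    and return a single aggregate once every record is processed."""
    total = 0.0
    for order in orders:
        total += order.amount
    return total


def transactional_update(index: dict[int, Order], order_id: int, new_amount: float) -> Order:
    """Transactional (online) style: look up one record by key (random read),
    update it in place (random write), and return immediately."""
    order = index[order_id]
    order.amount = new_amount
    return order


orders = [Order(1, 10.0), Order(2, 25.5), Order(3, 4.5)]
index = {o.order_id: o for o in orders}

print(batch_total(orders))            # 40.0
transactional_update(index, 2, 30.0)  # touches a single record only
print(batch_total(orders))            # 44.5
```

The batch function's cost grows with the size of the whole dataset, whereas the transactional update touches one record regardless of how many are stored.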

6

Cluster

Provides the mechanism to enable distributed data processing with linear scalability, ideal for Big Data processing as large datasets can be divided and processed in parallel.

7

MapReduce

A widely used implementation of a batch processing framework that is highly scalable and reliable. It is based on the divide-and-conquer principle and provides built-in fault tolerance and redundancy.
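
The divide-and-conquer idea behind MapReduce is often illustrated with word counting. The following is a minimal single-process Python sketch of the map, shuffle and reduce stages, not Hadoop's API: the function names map_phase, shuffle, reduce_phase and word_count are hypothetical, and the input splits are plain strings. On a real cluster, each split would be mapped on a different node, and the framework would handle the shuffle and re-run failed tasks.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) key-value pair for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle and sort: group all intermediate values by key, ordered by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # Each split is mapped independently, so on a cluster these calls could run in parallel.
    intermediate = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


splits = ["big data big clusters", "data processing with clusters"]
print(word_count(splits))
# {'big': 2, 'clusters': 2, 'data': 2, 'processing': 1, 'with': 1}
```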

8

Task Parallelism

Parallelization of data processing by dividing a task into sub-tasks and running each sub-task on a separate processor, generally on a separate node in a cluster.
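
As a minimal illustration, assume three hypothetical sub-tasks (compute_min, compute_max, compute_mean) that together make up a larger "summarize the dataset" task. The sketch below submits each different sub-task to its own worker. It uses Python's concurrent.futures.ThreadPoolExecutor so the example stays self-contained; the threads stand in for separate processors or cluster nodes. For CPU-bound work in CPython, a ProcessPoolExecutor (or a real cluster) would be needed for truly simultaneous execution.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def compute_min(data: list[float]) -> float:
    return min(data)


def compute_max(data: list[float]) -> float:
    return max(data)


def compute_mean(data: list[float]) -> float:
    return mean(data)


def summarize(data: list[float]) -> dict[str, float]:
    # Task parallelism: different sub-tasks run concurrently, one per worker,
    # all operating on the same input.
    sub_tasks = {"min": compute_min, "max": compute_max, "mean": compute_mean}
    with ThreadPoolExecutor(max_workers=len(sub_tasks)) as pool:
        futures = {name: pool.submit(fn, data) for name, fn in sub_tasks.items()}
        # Wait for every sub-task and combine the results into one answer.
        return {name: future.result() for name, future in futures.items()}


print(summarize([4.0, 8.0, 15.0, 16.0, 23.0, 42.0]))
# {'min': 4.0, 'max': 42.0, 'mean': 18.0}
```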

9

Data Parallelism

Parallelization of data processing by dividing a dataset into multiple sub-datasets and processing each sub-dataset in parallel.
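
Unlike task parallelism, data parallelism applies the same operation to different partitions of the data and then combines the partial results. A minimal Python sketch, assuming a simple even split into a hypothetical num_partitions chunks and a thread pool standing in for separate processors or nodes (as before, CPU-bound Python code would need a process pool or a cluster for a real speed-up). The helper names partition, sum_of_squares and parallel_sum_of_squares are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data: list[int], num_partitions: int) -> list[list[int]]:
    """Divide the dataset into num_partitions roughly equal sub-datasets."""
    size, remainder = divmod(len(data), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        # The first `remainder` chunks each take one extra element.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk: list[int]) -> int:
    """The same operation, applied independently to every sub-dataset."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data: list[int], num_partitions: int = 4) -> int:
    chunks = partition(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_results = list(pool.map(sum_of_squares, chunks))
    # Combine the partial results into the final answer.
    return sum(partial_results)


data = list(range(1, 11))
print(partition(data, 3))             # [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]
print(parallel_sum_of_squares(data))  # 385
```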

10

Realtime Mode (Big Data)

Data is processed in-memory as it is captured, before being persisted to disk. Response time generally ranges from sub-second to under a minute. Also called event or stream processing.

11

Speed, Consistency, Volume (SCV) Principle

States that a distributed data processing system can be designed to support only two of the following three requirements: Speed, Consistency, and Volume.

12

Event Stream Processing (ESP)

During ESP, an incoming stream of events, generally from a single source and ordered by time, is continuously analyzed. The analysis can occur via simple queries or the application of algorithms that are mostly formula-based.
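
A minimal sketch of the simple-query style of ESP, assuming hypothetical time-ordered temperature events from a single sensor. Each event is analyzed in memory as it arrives, and an alert is produced whenever the average over a sliding window of the most recent readings exceeds a threshold. The Event type, the detect_high_average function, and the window_size and threshold values are all illustrative assumptions, not part of any specific ESP engine.

```python
from collections import deque
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp: int  # events are assumed to arrive already ordered by time
    value: float


def detect_high_average(
    events: Iterable[Event], window_size: int = 3, threshold: float = 30.0
) -> Iterator[tuple[int, float]]:
    """Continuously analyze the stream: keep only the last window_size readings
    in memory and yield (timestamp, average) whenever the windowed average
    exceeds threshold."""
    window: deque = deque(maxlen=window_size)  # oldest reading drops out automatically
    for event in events:
        window.append(event.value)
        if len(window) == window_size:
            average = sum(window) / window_size
            if average > threshold:
                yield event.timestamp, average


stream = [Event(1, 28.0), Event(2, 29.0), Event(3, 31.0), Event(4, 33.0), Event(5, 35.0)]
for ts, avg in detect_high_average(stream):
    print(ts, avg)
# 4 31.0
# 5 33.0
```

Because the function is a generator, it works the same way on an unbounded source (for example, a socket or message-queue consumer) as on the finite list used here.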
