Flashcards for review of the Big Data Processing Concepts lecture.
Parallel Data Processing
Involves the simultaneous execution of multiple sub-tasks that collectively make up a larger task, reducing overall execution time; typically occurs within a single machine with multiple processors or cores.
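A minimal sketch of the idea, assuming Python's standard multiprocessing module (the squaring task and worker count are illustrative): a larger task is broken into per-element sub-tasks that execute simultaneously on one machine's cores.

```python
from multiprocessing import Pool

def square(n):
    # One sub-task of the larger "square every number" task.
    return n * n

if __name__ == "__main__":
    numbers = range(10)
    # The pool executes the sub-tasks simultaneously on multiple
    # cores of a single machine, reducing total execution time.
    with Pool(processes=4) as pool:
        results = pool.map(square, numbers)
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```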
Distributed Data Processing
Similar to parallel data processing but always achieved through physically separate machines networked together as a cluster, applying the divide-and-conquer principle.
Hadoop
An open-source framework for large-scale data storage and processing that is compatible with commodity hardware; it serves as the de facto industry platform for Big Data solutions and implements the MapReduce processing framework.
Batch Processing
Also known as offline processing; involves processing data in batches, usually imposing delays and resulting in high-latency responses. Typically involves large quantities of data with sequential reads and writes.
Transactional Processing
Also known as online processing; data is processed interactively without delay, resulting in low-latency responses. Typically involves small amounts of data with random reads and writes.
Cluster
Provides the mechanism to enable distributed data processing with linear scalability, ideal for Big Data processing as large datasets can be divided and processed in parallel.
MapReduce
A widely used implementation of a batch processing framework. It is highly scalable and reliable, and is based on the divide-and-conquer principle, which provides built-in fault tolerance and redundancy.
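The canonical word-count example, sketched here as a single-process simulation (the function names are hypothetical); a real framework such as Hadoop would run the map and reduce phases on separate cluster nodes and handle the shuffle, fault tolerance, and redundancy itself.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (key, value) pair for every word in a split of the input.
    return [(word, 1) for word in document.split()]

def shuffle_phase(pairs):
    # Shuffle: group values by key so each key is reduced in one place.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values into a final result.
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data big cluster", "data data processing"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(shuffle_phase(mapped)))
# {'big': 2, 'data': 3, 'cluster': 1, 'processing': 1}
```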
Task Parallelism
Parallelization of data processing by dividing a task into sub-tasks and running each sub-task on a separate processor, generally on a separate node in a cluster.
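A minimal sketch, assuming Python's concurrent.futures: two different sub-tasks of a larger text-analysis job (the count_words and count_chars helpers are hypothetical) run on separate processes at the same time.

```python
from concurrent.futures import ProcessPoolExecutor

def count_words(text):
    # Sub-task 1: count the words.
    return len(text.split())

def count_chars(text):
    # Sub-task 2: count the characters.
    return len(text)

if __name__ == "__main__":
    text = "task parallelism runs different sub-tasks at the same time"
    # Each sub-task runs on a separate processor; in a cluster,
    # each would generally run on a separate node.
    with ProcessPoolExecutor() as executor:
        words = executor.submit(count_words, text)
        chars = executor.submit(count_chars, text)
        print(words.result(), chars.result())
```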
Data Parallelism
Parallelization of data processing by dividing a dataset into multiple sub-datasets and processing each sub-dataset in parallel.
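A minimal sketch in Python (the chunk size and worker count are illustrative): the same operation is applied to each sub-dataset in parallel, and the partial results are combined afterwards.

```python
from multiprocessing import Pool

def partial_sum(sub_dataset):
    # Each worker processes its own sub-dataset independently.
    return sum(sub_dataset)

if __name__ == "__main__":
    dataset = list(range(1_000_000))
    # Divide the dataset into four sub-datasets.
    chunk = len(dataset) // 4
    sub_datasets = [dataset[i:i + chunk] for i in range(0, len(dataset), chunk)]
    with Pool(processes=4) as pool:
        partials = pool.map(partial_sum, sub_datasets)
    # Combine the partial results into the final answer.
    print(sum(partials))  # 499999500000
```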
Realtime Mode (Big Data)
Data is processed in-memory as it is captured, before being persisted to disk. Response times generally range from sub-second to under a minute. Also called event or stream processing.
Speed, Consistency, Volume (SCV) Principle
States that a distributed data processing system can be designed to support only two of the following three requirements: Speed, Consistency, and Volume.
Event Stream Processing (ESP)
During ESP, an incoming stream of events, generally from a single source and ordered by time, is continuously analyzed. The analysis can occur via simple queries or the application of algorithms that are mostly formula-based.
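A minimal sketch of the idea in plain Python (the sensor stream and the moving-average formula are illustrative assumptions): each event in a time-ordered stream from a single source is analyzed as it arrives, using a simple formula-based computation over a sliding window.

```python
from collections import deque

def rolling_average(events, window_size=3):
    # Continuously analyze an incoming, time-ordered event stream,
    # emitting a formula-based result (a moving average) per event.
    window = deque(maxlen=window_size)
    for timestamp, value in events:
        window.append(value)
        yield timestamp, sum(window) / len(window)

# A time-ordered stream of (timestamp, reading) events from one source.
stream = [(1, 10.0), (2, 12.0), (3, 11.0), (4, 15.0)]
for ts, avg in rolling_average(stream):
    print(f"t={ts} avg={avg:.2f}")
```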