map and reduce

13 Terms

1

spark cluster

runs one driver program and a set of executor programs

2

driver

maintains the spark application
responds to the user's program
schedules work on executors

3

executor

processes data assigned by the driver
reads and writes data to external sources
stores computation results
interacts with storage

4

RDD

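resilient: fault tolerant; a lost partition can be recomputed from the recorded transformations (lineage)
distributed: different rdd partitions are stored on different nodes (so they are processed in parallel)
immutable: an rdd cannot be changed once created; transformations make a new rdd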

5

DAG

directed acyclic graph built from the operations applied to rdds
independent parts can run in parallel
splits the computation into stages

6

DAG lazy evaluation

records and combines transformations so the execution plan can be optimised
only evaluated when an action is called
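
The lazy-evaluation idea on this card can be sketched in plain Python. This is a toy stand-in, not Spark code: `LazyPipeline`, `double`, and `calls` are hypothetical names invented for the illustration. Transformations are only recorded; the work happens when the action (`collect`) is called.

```python
class LazyPipeline:
    """Toy stand-in for an rdd lineage: records functions, runs them on collect()."""

    def __init__(self, data, steps=()):
        self.data = data
        self.steps = tuple(steps)

    def map(self, fn):
        # Transformation: only record the step and return a new pipeline.
        return LazyPipeline(self.data, self.steps + (fn,))

    def collect(self):
        # Action: run every recorded step over the data in a single pass.
        out = []
        for x in self.data:
            for fn in self.steps:
                x = fn(x)
            out.append(x)
        return out


calls = []  # tracks when double() actually runs


def double(x):
    calls.append(x)
    return x * 2


pipeline = LazyPipeline([1, 2, 3]).map(double).map(lambda x: x + 1)
calls_before_action = len(calls)  # still 0: nothing has been evaluated
result = pipeline.collect()       # the action triggers evaluation
```

Because both steps are known before anything runs, they are applied together in one pass over the data, which is a simplified picture of how recorded transformations can be combined.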

7

job scheduler

coordinates the execution of tasks across nodes

8

rdd operations

transformation:
applied to rdd to make new rdd
e.g. map makes a new rdd by applying function to elements

actions:
computes a result from an rdd and returns it to the driver or stores it
e.g. reduce combines all elements into one value
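
A rough plain-Python analogue of a transformation followed by an action, using the built-in `map` and `functools.reduce` on a local list rather than a real rdd. The sample numbers are made up for illustration.

```python
from functools import reduce

numbers = [1, 2, 3, 4]  # made-up sample data

# Transformation analogue: build a new collection, leave the original untouched.
squared = list(map(lambda x: x * x, numbers))

# Action analogue: collapse the collection into a single value.
total = reduce(lambda a, b: a + b, squared)

# In PySpark the same shape would be roughly:
#   sc.parallelize(numbers).map(lambda x: x * x).reduce(lambda a, b: a + b)
# where sc is an existing SparkContext (assumed, not created here).
```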

9

map

applies function to elements in rdd and makes a new rdd with results

10

flatmap

applies a function that returns a collection for each rdd element and flattens all the results into 1 new rdd
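
The difference between map and flatmap can be sketched on a local list in plain Python. The sample lines and the helper `split_words` are made up for illustration; a real rdd would use `rdd.map` and `rdd.flatMap` instead.

```python
from itertools import chain

lines = ["hello world", "map and reduce"]  # made-up sample data


def split_words(line):
    # Returns a collection (list of words) for one input element.
    return line.split(" ")


# map analogue: one output element per input element (here, one list per line).
mapped = [split_words(line) for line in lines]

# flatmap analogue: the per-element lists are flattened into one collection.
flat_mapped = list(chain.from_iterable(split_words(line) for line in lines))
```

Here `mapped` keeps two nested lists, while `flat_mapped` is a single list of five words.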

11

narrow transformation

each partition of the parent rdd is used by at most 1 partition of the child rdd (e.g. map, filter)

12

wide transformation

multiple child rdd partitions may rely on 1 parent rdd partition, so data is shuffled between nodes (e.g. groupByKey)
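
A small plain-Python sketch of the narrow versus wide distinction, modelling partitions as lists inside a list. The data, the helper `group_by_key`, and the toy partitioner (`ord(key) % n`) are assumptions made for the illustration and are not how Spark partitions keys.

```python
from collections import defaultdict

# Two parent partitions of (key, value) pairs; made-up sample data.
parent_partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
]

# Narrow: each child partition is computed from exactly one parent partition.
narrow_child = [[(k, v * 10) for k, v in part] for part in parent_partitions]


def group_by_key(partitions, num_child_partitions):
    # Wide: records from one parent partition can land in different child
    # partitions, so every child may depend on every parent (a shuffle).
    children = [defaultdict(list) for _ in range(num_child_partitions)]
    for part in partitions:
        for key, value in part:
            target = ord(key) % num_child_partitions  # toy partitioner
            children[target][key].append(value)
    return [dict(child) for child in children]


wide_child = group_by_key(parent_partitions, 2)
```

In this run the first parent partition feeds both child partitions (key "b" goes to one, key "a" to the other), which is the dependency pattern the wide card describes.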

13

dataframe

distributed data organised into named columns
rows are facts (records): ‘john doe’
columns are properties: name
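
The row and column idea can be pictured with a tiny local table in plain Python. Only ‘john doe’ and the `name` column come from the card; the `age` column, the second row, and the helper `select` are made up for illustration, and nothing here is distributed.

```python
columns = ["name", "age"]  # "age" is a made-up second property

# Each row is one fact (record); values other than 'john doe' are made up.
rows = [
    ("john doe", 30),
    ("jane roe", 25),
]


def select(column):
    # Pick out one property across all rows.
    idx = columns.index(column)
    return [row[idx] for row in rows]


names = select("name")
```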