Flashcards reviewing Google's core infrastructure technologies: MapReduce, the Google File System (GFS), and Bigtable.
Google's major software systems include MapReduce, GFS (Google File System), and __.
Bigtable
MapReduce is a methodology for exploiting __ in computing clouds.
parallelism
In the context of search engines, MapReduce is used for building Google's Search Index and __.
Article clustering for Google News
Examples of modern Internet applications that must manage immense amounts of data quickly include Dish Network click collection and __ data collection.
Tesla car usage
MapReduce solves the problems of parallelization, fault tolerance, I/O scheduling, and __ for the programmer.
Monitoring & Status updates
The Map/Reduce paradigm involves breaking records into segments, mapping to extract something of interest, grouping intermediate results, reducing to aggregate results, and generating __.
final output
In MapReduce, the __ manages the parallel execution and coordination of tasks automatically.
system
In a MapReduce computation, map tasks turn a chunk into a sequence of __.
key-value pairs
The Map function parses a document, extracts each word and uses each word as a __.
key
The Reduce function aggregates intermediate results by __.
key
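The Map and Reduce cards above can be illustrated with a word count. This is a minimal single-process sketch, not Google's implementation: the names `map_fn`, `reduce_fn`, and `run_word_count` are hypothetical, and the grouping step that a real MapReduce system performs across machines is simulated here with a dictionary.

```python
from collections import defaultdict


def map_fn(document):
    """Parse a document and emit a (word, 1) pair for each word (word = key)."""
    for word in document.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Aggregate all intermediate values that share one key."""
    return key, sum(values)


def run_word_count(documents):
    # Group intermediate pairs by key; a real system does this across machines.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    # Each key's group is reduced independently of the others.
    return dict(reduce_fn(key, values) for key, values in groups.items())


counts = run_word_count(["the quick fox", "the lazy dog the end"])
```

Here `counts["the"]` is 3 and every other word maps to 1.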
The master controller knows how many __ tasks there will be.
Reduce
The Reduce function is generally __ and commutative.
associative
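One way to see why associativity and commutativity matter: partial results can be computed on separate partitions, in any order, and then combined without changing the answer. The sketch below assumes addition as the reduce operation and uses made-up sample values.

```python
from functools import reduce
import operator
import random

values = [3, 1, 4, 1, 5, 9, 2, 6]  # made-up sample data

# Reduce everything in a single pass.
total = reduce(operator.add, values)

# Shuffle, split into two partitions, reduce each, then combine the partials.
shuffled = values[:]
random.Random(0).shuffle(shuffled)
left, right = shuffled[:3], shuffled[3:]
combined = operator.add(reduce(operator.add, left), reduce(operator.add, right))
```

Both `total` and `combined` equal 31, regardless of how the values are ordered or partitioned.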
The Google File System (GFS) is designed for efficient, reliable access to data using large clusters of __.
commodity hardware
GFS supports automatic sharding of large files, automatic recovery from failures, and is optimized for __ access to huge files.
sequential
In GFS, files are divided into fixed-size chunks of __ megabytes.
64
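With a fixed chunk size, mapping a byte offset to a chunk is simple arithmetic. The sketch below assumes "64 megabytes" means 64 × 2^20 bytes; `locate` and `chunk_count` are hypothetical helper names, not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # assumption: 64 MB = 64 * 2**20 bytes


def locate(byte_offset):
    """Return (chunk_index, offset_within_chunk) for a byte offset in a file."""
    return divmod(byte_offset, CHUNK_SIZE)


def chunk_count(file_size):
    """Number of fixed-size chunks needed to hold file_size bytes."""
    return -(-file_size // CHUNK_SIZE)  # ceiling division


MB = 1024 * 1024
position = locate(200 * MB)
needed = chunk_count(200 * MB)
```

For a 200 MB offset, `position` is chunk 3 at an 8 MB offset within that chunk, and a 200 MB file needs 4 chunks.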
In GFS, the __ server holds all metadata, like namespace, access control, and chunk locations.
Master
In GFS, data transfers happen directly between __ and chunkservers.
clients
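The split between metadata and data described in the last two cards can be sketched as follows. This is a toy in-memory model under stated assumptions: the classes `Master` and `ChunkServer` and the function `client_read` are hypothetical, and replication, leases, and networking are all left out.

```python
class Master:
    """Holds metadata only: which chunkserver stores each chunk."""

    def __init__(self):
        self.chunk_locations = {}  # (filename, chunk_index) -> ChunkServer

    def locate(self, filename, chunk_index):
        return self.chunk_locations[(filename, chunk_index)]


class ChunkServer:
    """Holds chunk contents and serves them straight to clients."""

    def __init__(self):
        self.chunks = {}  # (filename, chunk_index) -> bytes

    def read(self, filename, chunk_index):
        return self.chunks[(filename, chunk_index)]


def client_read(master, filename, chunk_index):
    server = master.locate(filename, chunk_index)  # metadata comes from the master
    return server.read(filename, chunk_index)      # data comes from the chunkserver


master, server = Master(), ChunkServer()
server.chunks[("log.txt", 0)] = b"first chunk"
master.chunk_locations[("log.txt", 0)] = server
data = client_read(master, "log.txt", 0)
```

The master never touches the chunk bytes; it only tells the client where to find them.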
GFS is optimized for files that are __ to rather than rewritten.
appended
Bigtable is a compressed, high-performance, proprietary data storage system built on top of __.
the Google File System
A table in Bigtable is a sparse, distributed, persistent, multidimensional __.
sorted map
In Bigtable, data is treated as __.
uninterpreted strings
In Bigtable, rows are ordered __.
lexicographically
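Lexicographic ordering compares keys character by character, which can surprise when keys contain numbers. The sketch below uses Python's built-in string ordering as a stand-in; the row keys are made up for illustration.

```python
row_keys = ["row2", "row10", "row1"]  # made-up row keys

# Python sorts str values lexicographically, so "row10" lands before "row2".
ordered = sorted(row_keys)

# Zero-padding the numeric part makes lexicographic and numeric order agree.
padded = sorted(f"row{n:04d}" for n in (2, 10, 1))
```

Here `ordered` is `["row1", "row10", "row2"]`, while `padded` is `["row0001", "row0002", "row0010"]`.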
In Bigtable, each cell can hold multiple __ versions of the data for that row and column.
timestamped
In Bigtable, columns have a two-level name structure consisting of a family and an __.
optional qualifier
In Bigtable, timestamps are used to store __ versions of data in a cell.
different
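The Bigtable cards above describe a map from (row, column, timestamp) to an uninterpreted value. A toy in-memory sketch of that shape follows; `TinyTable` and its methods are hypothetical names, not the Bigtable API, and the row and column names are made up.

```python
class TinyTable:
    """Toy stand-in for the (row, column, timestamp) -> value map."""

    def __init__(self):
        self._cells = {}  # (row, "family:qualifier") -> {timestamp: value}

    def put(self, row, family, qualifier, timestamp, value):
        column = f"{family}:{qualifier}"  # two-level column name
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, family, qualifier, timestamp=None):
        versions = self._cells[(row, f"{family}:{qualifier}")]
        if timestamp is None:
            timestamp = max(versions)  # default to the newest version
        return versions[timestamp]

    def rows(self):
        """Row keys in lexicographic order."""
        return sorted({row for row, _ in self._cells})


table = TinyTable()
table.put("com.example", "contents", "html", 1, b"<html>v1</html>")
table.put("com.example", "contents", "html", 2, b"<html>v2</html>")
table.put("com.acme", "anchor", "text", 1, b"Acme")

latest = table.get("com.example", "contents", "html")
older = table.get("com.example", "contents", "html", timestamp=1)
```

Values are stored as `bytes` to mirror the "uninterpreted strings" card: the table never inspects them.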