big data
describes the explosion of data generation, storage and usage in recent years
Seven V's of big data
1. value
2. volume
3. variety
4. velocity
5. veracity
6. variability
7. volatility
volume
the amount of data collected and measured in increasing orders of magnitude (gigabytes, terabytes, petabytes)
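As a worked illustration of these orders of magnitude, assuming decimal (SI) prefixes where each step up is a factor of 1,000:

```latex
1\,\text{PB} = 10^{3}\,\text{TB} = 10^{6}\,\text{GB} = 10^{15}\,\text{bytes}
```

If binary prefixes are used instead, one pebibyte is $2^{50}$ bytes, which is about $1.126 \times 10^{15}$ bytes.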
variety
refers to both the source and the form of data
velocity
the speed at which data are generated by and collected from source systems
- how quickly data can be processed so as to provide a feedback loop
variability
changes in the meaning of data over time or across contexts
veracity
the reliability or truthfulness of data
volatility
describes the lifespan of data
value
the usefulness of data in providing insights or supporting decisions
- the ultimate goal and driving force of big data
Drivers of big data
1. the world is becoming increasingly digital
2. the world is becoming more connected
3. electronics around the world are becoming more economical
4. the digital world has revolutionized communication and community
Apache Hadoop
an open source software framework that supports distributed computing for very large datasets
MapReduce
a software programming model for processing large datasets in parallel across a cluster of computers
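Map and reduce
the input dataset is split into independent chunks that are processed in parallel by the "map tasks"; their intermediate outputs are then sorted and shuffled to "reduce tasks", which combine them into the final result
- aka map-shuffle-reduce or map-sort-reduce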
map tasks
process the data chunks; each chunk is assigned to a computer node so that the chunks are handled in parallel
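The split, map, shuffle/sort, and reduce steps described above can be sketched with the classic word-count example. This is a single-process, plain-Python toy, not Hadoop's API: the function names (`split_into_chunks`, `map_task`, `shuffle`, `reduce_task`, `word_count`) and the `chunk_size` parameter are hypothetical, and the map tasks run one after another here instead of on separate nodes.

```python
from collections import defaultdict
from itertools import chain


def split_into_chunks(lines, chunk_size):
    """Split the input dataset into independent chunks."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_task(chunk):
    """Map: emit a (word, 1) key-value pair for every word in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_outputs):
    """Shuffle/sort: group all intermediate values by key, in key order."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped_outputs):
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_task(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines, chunk_size=2):
    chunks = split_into_chunks(lines, chunk_size)
    # In a real cluster each map task would run on its own node; here they run sequentially.
    mapped = [map_task(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    data = ["big data is big", "data drives value", "big value"]
    print(word_count(data))
    # prints {'big': 3, 'data': 2, 'drives': 1, 'is': 1, 'value': 2}
```

Because each map task only sees its own chunk, the chunks could be processed on different machines; only the shuffle step needs to bring matching keys together.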
SAP HANA
a leading-edge in-memory database technology that stores all relevant data in random access memory (RAM) rather than on a hard drive
In-memory databases
use several innovations in conjunction with RAM to achieve large speed improvements in database operations
- abbreviated IMDB
IMDB innovations
1. data are stored in memory
2. columnar data storage
3. indexing
4. data compression
5. parallel data processing
6. data partitioning
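Three of the innovations listed above (columnar storage, compression, and partitioning) can be illustrated with a small plain-Python sketch. It assumes dictionary encoding as the compression scheme and uses a hypothetical `rows` sales dataset; it only shows the ideas and is not how SAP HANA or any particular IMDB implements them.

```python
from collections import defaultdict

# Row-oriented layout: each record's fields are stored together.
rows = [
    {"region": "east", "product": "A", "sales": 120},
    {"region": "west", "product": "B", "sales": 80},
    {"region": "east", "product": "B", "sales": 200},
    {"region": "east", "product": "A", "sales": 50},
]


def to_columns(records):
    """Columnar storage: keep all values of one column together in one list."""
    columns = defaultdict(list)
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return dict(columns)


def dictionary_encode(values):
    """Compression: replace repeated values with small integer codes."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def dictionary_decode(dictionary, codes):
    """Recover the original values from the codes."""
    return [dictionary[c] for c in codes]


def partition(records, key):
    """Partitioning: split records into groups by the value of one column."""
    parts = defaultdict(list)
    for record in records:
        parts[record[key]].append(record)
    return dict(parts)


columns = to_columns(rows)
# An aggregate over one column only needs to scan that column's list.
total_sales = sum(columns["sales"])
region_dict, region_codes = dictionary_encode(columns["region"])
region_partitions = partition(rows, "region")
```

Summing `columns["sales"]` touches only the sales values, which is why column stores suit analytic queries; the encoded region column stores each repeated string only once in `region_dict`.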
Real-time analytics
involves processing big data almost instantaneously to provide feedback as quickly as possible
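One common way to give feedback almost instantaneously is to update a result incrementally as each new data point arrives, rather than recomputing over the whole dataset. The sketch below assumes a simple sliding-window average over a stream of numeric readings; the class name `SlidingWindowAverage` and its `window_size` parameter are hypothetical and not tied to any specific analytics product.

```python
from collections import deque


class SlidingWindowAverage:
    """Maintain the average of the most recent `window_size` readings."""

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window = deque()
        self.window_size = window_size
        self.total = 0.0

    def update(self, value):
        """Ingest one new reading and return the current average immediately."""
        self.window.append(value)
        self.total += value
        # Drop the oldest reading once the window is full, adjusting the running total.
        if len(self.window) > self.window_size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)


if __name__ == "__main__":
    monitor = SlidingWindowAverage(window_size=3)
    for reading in [10, 20, 30, 40]:
        print(reading, monitor.update(reading))
```

Each update does a constant amount of work regardless of how much data has already streamed past, which is what makes the answer available as soon as the event arrives.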
Complete syntax for the SELECT statement
SELECT columnlist
FROM tablelist
[WHERE conditionlist]
[GROUP BY columnlist]
[HAVING conditionlist]
[ORDER BY columnlist [ASC | DESC]];
- clauses in square brackets are optional
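A query that uses every clause of the SELECT syntax can be tried with Python's built-in `sqlite3` module and an in-memory database. The `sale` table, its columns, and the `top_regions` helper are hypothetical examples chosen for illustration.

```python
import sqlite3


def top_regions(min_total):
    """Return (region, total) pairs whose filtered sales total is at least min_total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sale (region TEXT, product TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sale VALUES (?, ?, ?)",
        [("east", "A", 120), ("west", "B", 80), ("east", "B", 200),
         ("west", "A", 30), ("north", "A", 500)],
    )
    query = """
        SELECT region, SUM(amount) AS total
        FROM sale
        WHERE amount > 50              -- filters individual rows
        GROUP BY region                -- one group per region
        HAVING SUM(amount) >= ?        -- filters whole groups
        ORDER BY total DESC
    """
    rows = conn.execute(query, (min_total,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(top_regions(100))
    # prints [('north', 500), ('east', 320)]
```

The difference between WHERE and HAVING shows up here: WHERE removes the 30-unit row before grouping, while HAVING removes the west group (total 80) after grouping.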