Chapter 7


71 Terms

1
Big data has been used to describe the ...
massive volumes of data analyzed by huge organizations such as Google or scientific research projects at NASA
2
Big data exceeds the reach of commonly used ....
hardware environments and/or capabilities of software tools to capture, manage and process it within a tolerable time span for its user population
3
Big data has become a popular ...
term to describe the exponential growth, availability, and use of information, both structured and unstructured
4
Big data is not new. What is new is ....
that the definition and the structure of Big Data constantly change
5
The V's that define Big Data (6)
Volume
Variety
Velocity
Veracity
Variability
Value proposition
6
The V's that define big data - Volume
most common trait of Big Data
7
The V's that define big data - Volume issues
how to determine relevance
how to create value from data that is deemed to be relevant
8
The V's that define big data - variety (2)
Comes in all...
x% of all ....
Comes in all types of formats
80-85% of all organizations' data is in some sort of unstructured or semistructured format
9
The V's that define big data - Velocity (2)
how x data is ...
most x ....
- how fast data is being produced and how fast the data must be processed to meet the need or demand
- most overlooked characteristic of Big Data
10
Data stream analytics
in-motion analytics; analyzing the data while it is flowing rather than after it is stored
11
The V's that define big data - Veracity (3)
Refers to ...
Tools and techniques are often used to handle ...
Refers to conformity to facts: accuracy, quality, truthfulness, or trustworthiness of the data
Tools and techniques are often used to handle Big Data's veracity by transforming the data into quality and trustworthy insights
12
The V's that define big data - variability (2)
Data flows....
X peak data loads....
Data flows can be highly inconsistent with periodic peaks
Daily, seasonal, and event-triggered peak data loads can be highly variable and thus challenging to manage
13
The V's that define big data - Value proposition (3)
Contains more ...
Organization can ...
X insights and x decisions...
Organizations can gain greater business value that they may not have otherwise obtained
Greater insights and better decisions, something that every organization needs
14
Most critical success factors for Big Data analytics (5)
  1. A clear business need (alignment with the vision and the strategy)

  2. Strong, committed sponsorship (executive champion)

  3. Alignment between the business and IT strategy

  4. A fact-based decision-making culture

  5. A strong data infrastructure
15
Most critical success factors for Big Data analytics (5) - A clear business need (alignment with the vision and the strategy)

Main driver for big data analytics ...
should be the needs of the business at any level: strategic, tactical, and operational
16
Most critical success factors for Big Data analytics (5) - Strong, committed sponsorship (executive champion)

Sponsorship needs to be ....
Sponsorship needs to be at the highest levels and organization-wide
17
Most critical success factors for Big Data analytics (5) - Alignment between the business and IT strategy (2)

Essential to make sure that...
Analytics should play the....
- Essential to make sure that the analytics work always supports the business strategy

- Analytics should play the enabling role in successfully executing the business strategy
18
Most critical success factors for Big Data analytics (5) - A fact-based decision-making culture

X drive decision making
The numbers, not intuition or gut feelings, drive decision making
19
Most critical success factors for Big Data analytics (5) - A fact-based decision-making culture

To create a fact-based decision-making culture, senior management needs to (5)
Recognize..
Be a ...
Stress that ...
Ask to see ....
Link....
Recognize that some people can't or won't adjust

Be a vocal supporter

Stress that outdated methods must be discontinued

Ask to see what analytics went into decisions

Link incentives and compensation to desired behaviors
20
Most critical success factors for Big Data analytics (5) - A strong data infrastructure

Success requires ...
Success requires marrying the old with the new for a holistic infrastructure that works synergistically
21
High-performance computing to keep up with the computational needs of Big Data (4)
In-memory analytics
In-database analytics
Grid computing
Appliances
22
High-performance computing to keep up with the computational needs of Big Data (4) - In-memory analytics
Solves complex problems in near real time with highly accurate insights by allowing analytics computations to be performed in memory rather than on disk
23
High-performance computing to keep up with the computational needs of Big Data (4) - In-database analytics
Speeds time to insights and enables better data governance by performing data integration and analytics functions inside the database, so you won't have to move or convert data repeatedly
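To make the in-database idea concrete, here is a minimal sketch using Python's built-in sqlite3 module; the table and column names are made up for illustration. The aggregation runs inside the database engine, so only the small summary crosses back to the application instead of the raw rows being moved or converted repeatedly.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 120.0), ("east", 80.0), ("west", 200.0)])

# The GROUP BY aggregation executes inside the database engine; only the
# two-row summary comes back to the application.
for region, total in conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"):
    print(region, total)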
24
High-performance computing to keep up with the computational needs of Big Data (4) - Grid computing
Promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources
25
High-performance computing to keep up with the computational needs of Big Data (4) - Appliances
Brings together hardware and software in a physical unit that is not only fast but also scalable on an as-needed basis
26
Challenges that are found by business executives to have a significant impact on successful implementation of Big Data (6)
Data volume
Data integration
Processing capabilities
Data governance
Skills availability
Solution cost
27
Challenges that are found by business executives to have a significant impact on successful implementation of Big Data (6) - Data volume
Ability to capture, store, and process a huge volume of data at an acceptable speed so latest info is available to decision makers
28
Challenges that are found by business executives to have a significant impact on successful implementation of Big Data (6) - Data integration
Ability to combine data that is not similar in structure or source and to do so quickly and at a reasonable cost
29
Challenges that are found by business executives to have a significant impact on successful implementation of Big Data (6) - Processing capabilities
Ability to process data quickly as it is captured
30
Challenges that are found by business executives to have a significant impact on successful implementation of Big Data (6) - Data governance
Ability to keep up with the security, privacy, ownership and quality issues of Big data
31
Challenges that are found by business executives to have a significant impact on successful implementation of Big Data (6) - Skills availability
Shortage of people with the skills to do the job
32
Challenges that are found by business executives to have a significant impact on successful implementation of Big Data (6) - Solution cost
Crucial to reduce the cost of the solutions used to find the value
33
Business problems addressed by big data analytics - Top business problems (2)
process efficiency
cost reduction
34
Problems that can be addressed using Big Data analytics (10)
- Process efficiency and cost reduction
- Brand management
- Revenue maximization
- Enhanced customer experience
- Churn identification, customer recruiting
- Improved customer service
- Identifying new products and market opportunities
- Risk management
- Regulatory compliance
- Enhanced security capabilities
35
Big data technologies (3)
MapReduce
Hadoop
NoSQL
36
MapReduce
technique popularized by Google that distributes the processing of very large multistructured data files across a large cluster of machines
37
MapReduce - High performance is ...
achieved by breaking the processing into small units of work that can be run in parallel across the hundreds, potentially thousands, of nodes in the cluster
38
MapReduce is a X model ...
MapReduce is a programming model, not a programming language; that is, it is designed to be used by programmers rather than business users.
39
Why use MapReduce? MapReduce aids organizations in ...
processing and analyzing large volumes of multistructured data
40
The procedural nature of MapReduce makes it ...
easily understood by skilled programmers
41
Advantage of MapReduce
Developers do not have to be concerned with implementing parallel computing; this is handled transparently by the system
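To make the programming model concrete, below is a minimal, framework-free word-count sketch in Python: the programmer writes only the map and reduce functions, while the shuffle and parallel execution that a real cluster handles transparently are simulated here in plain Python.

from collections import defaultdict

def map_phase(document):
    """Map: emit (key, value) pairs -- here (word, 1) for every word."""
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    """Reduce: combine all values that share a key -- here, sum the counts."""
    return key, sum(values)

def run_mapreduce(documents):
    # Shuffle step: group intermediate values by key.
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_phase(doc):
            grouped[key].append(value)
    # Reduce step: one call per distinct key (parallelizable in a real cluster).
    return dict(reduce_phase(k, v) for k, v in grouped.items())

if __name__ == "__main__":
    docs = ["big data is big", "data velocity and data variety"]
    print(run_mapreduce(docs))  # {'big': 2, 'data': 3, 'is': 1, ...}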
42
Hadoop
an open source framework for processing, storing, and analyzing massive amounts of distributed, unstructured data
43
Hadoop designed to ...
handle petabytes and exabytes of data distributed over multiple nodes in parallel
44
Hadoop clusters run on ...
inexpensive commodity hardware so projects can scale out without breaking the bank
45
How does Hadoop work? A client accesses unstructured and semistructured data ...
from sources including log files, social media feeds, and internal data stores
46
How does Hadoop work? It breaks the data up into ...
"parts" which are then loaded into a file system made up of multiple nodes running on commodity
New cards
47
Hadoop distributed file system
adept at storing large volumes of unstructured and semistructured data, as it does not require data to be organized into relational rows and columns
48
Hadoop technical components (5)
Hadoop distributed file system (HDFS)

Name Node

Secondary node

Job tracker

Slave nodes
49
Hadoop technical components (5) - Hadoop distributed file system (HDFS)
Default storage layer in any given Hadoop cluster
50
Hadoop technical components (5) - Name Node
provides the client with information on where in the cluster particular data is stored and whether any nodes fail
51
Hadoop technical components (5) - Secondary node
periodically replicates and stores data from the name node
52
Hadoop technical components (5) - job tracker
initiates and coordinates MapReduce jobs, or the processing of the data
53
Hadoop technical components (5) - Slave nodes
store data and take direction to process it from the job tracker
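A rough, in-process toy model of the roles above (not the real Hadoop API, and greatly simplified): a name node that tracks only which node holds which block, and slave/data nodes that hold the blocks themselves. It illustrates why the name node is metadata-only while the data stays distributed across the cluster.

BLOCK_SIZE = 16  # bytes; real HDFS blocks are typically 128 MB

class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}          # block_id -> bytes

class NameNode:
    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.block_map = {}       # filename -> [(block_id, datanode), ...]

    def write(self, filename, data):
        """Split a file into blocks and spread them across data nodes."""
        placements = []
        for i in range(0, len(data), BLOCK_SIZE):
            node = self.datanodes[(i // BLOCK_SIZE) % len(self.datanodes)]
            block_id = f"{filename}#{i // BLOCK_SIZE}"
            node.blocks[block_id] = data[i:i + BLOCK_SIZE]
            placements.append((block_id, node))
        self.block_map[filename] = placements

    def read(self, filename):
        """The client asks the name node where blocks live, then reads each
        block from the data node that stores it."""
        return b"".join(node.blocks[block_id]
                        for block_id, node in self.block_map[filename])

if __name__ == "__main__":
    nodes = [DataNode(f"slave-{i}") for i in range(3)]
    nn = NameNode(nodes)
    nn.write("clickstream.log", b"user=42 action=view page=/offers ...")
    print(nn.read("clickstream.log"))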
54
Hadoop Pros (3)
-Allows enterprises to...
-Enterprises no longer must ...
- x to get started with Hadoop
-Allows enterprises to process and analyze large volumes of unstructured and semistructured data

-Enterprises no longer must rely on sample data sets but can process and analyze all relevant data

- Inexpensive to get started with Hadoop
55

Hadoop cons (3)
- Are x and x
- X and x Hadoop clusters and performing ...
- A x of x developers

Immature and still developing

Implementing and managing Hadoop clusters and performing advanced analytics on large volumes of unstructured data require significant expertise, skill, and training

A dearth of Hadoop developers who can take advantage of complex Hadoop clusters
56
NoSQL
Processes large volumes of multistructured data
57
NoSQL databases are ...
aimed, for the most part, at serving up discrete data stored among large volumes of multistructured data to end users and automated Big Data applications
58
NoSQL capability is ...
sorely lacking from relational database technology, which simply can't maintain needed application performance levels at a Big Data scale
59
NoSQL downside
trades ACID (atomicity, consistency, isolation, durability) compliance for performance and scalability
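As a toy illustration of the key-value/document style described above (plain Python, not a real NoSQL engine): records are served up by key and do not have to share a fixed relational schema, which is what lets such stores handle discrete lookups over multistructured data.

import json

store = {}  # key -> JSON document (schema-less)

def put(key, document):
    store[key] = json.dumps(document)

def get(key):
    return json.loads(store[key])

# Documents with different shapes can live side by side.
put("user:42", {"name": "Ava", "follows": ["user:7", "user:19"]})
put("event:9001", {"type": "click", "page": "/offers", "ts": "2024-01-15T10:22:00Z"})

print(get("user:42")["follows"])   # fast, discrete lookup by key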
60
Use cases for Hadoop - Differentiators (2)
Hadoop is the repository and refinery for raw data.

Hadoop is a powerful, economical, and active archive.
61
Use cases for Hadoop - Differentiators (2) - Hadoop is the repository and refinery for raw data.
Capture all the data reliably and cost-effectively

Hadoop refines raw data
62
Use cases for data warehousing (3)
Data warehouse performance

Integrating data that provides business value

Interactive BI tools
63
Coexistence of Hadoop and the data warehouse (5)
  1. Use Hadoop for storing and archiving multistructured data

  2. Use Hadoop for filtering, transforming, and/or consolidating multistructured data (sketched below)

  3. Use Hadoop to analyze large volumes of multistructured data and publish the analytical results

  4. Use a relational DBMS that provides MapReduce capabilities as an investigative computing platform

  5. Use a front-end query tool to access and analyze data
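A minimal sketch of steps 2-3 above with the cluster left out: raw, semistructured log lines are filtered and consolidated, and the compact analytical result is published as a CSV that a relational data warehouse could load. File names and field layout are illustrative assumptions.

import csv, json
from collections import Counter

raw_lines = [
    '{"user": 42, "action": "view", "page": "/offers"}',
    '{"user": 42, "action": "buy",  "page": "/offers"}',
    'malformed line that a refinery step would drop',
    '{"user": 7,  "action": "view", "page": "/home"}',
]

purchases_per_page = Counter()
for line in raw_lines:
    try:
        event = json.loads(line)          # filter: keep only parseable events
    except json.JSONDecodeError:
        continue
    if event["action"] == "buy":          # transform/consolidate: count purchases
        purchases_per_page[event["page"]] += 1

with open("purchases_by_page.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["page", "purchases"])     # compact result to publish
    writer.writerows(purchases_per_page.items())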
64
How to succeed with Big Data (7)
  1. Simplify

  2. Coexist

  3. Visualize

  4. Empower

  5. Integrate

  6. Govern

  7. Evangelize
65
Stream Analytics
Term commonly used for the analytics process of extracting actionable information from continuously flowing/streaming data.
66
Perpetual analytics
Evaluates every incoming observation against all prior observations, where there is no window size. Recognizing how the new observation relates to all prior observations enables the discovery of real-time insights
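A small sketch contrasting the two ideas: a windowed stream statistic, where only the last N observations matter, versus a perpetual one that folds every new observation into state built from all prior observations (no window size).

from collections import deque

class WindowedMean:
    def __init__(self, size):
        self.window = deque(maxlen=size)
    def update(self, x):
        self.window.append(x)
        return sum(self.window) / len(self.window)

class PerpetualMean:
    def __init__(self):
        self.count, self.mean = 0, 0.0
    def update(self, x):
        # Incremental update: no window, no need to store the raw history.
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean

stream = [10, 12, 11, 50, 13, 12]   # e.g., transaction amounts arriving over time
w, p = WindowedMean(size=3), PerpetualMean()
for x in stream:
    print(f"x={x:>3}  window mean={w.update(x):6.2f}  perpetual mean={p.update(x):6.2f}")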
67
Critical event processing
A method of capturing, tracking, and analyzing streams of data to detect events (out-of-normal happenings) of certain types that are worthy of the effort
68
Critical event processing is an application of ...
stream analytics that combines data from multiple sources to infer events or patterns of interest either before they actually occur or as soon as they happen
69
Critical event processing goal
take rapid action to prevent these events from occurring or, in the case of a short window of opportunity, take full advantage of it within the allowed time
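A minimal sketch of event detection over a stream, under an assumed rule: flag a critical event when failed logins for the same account arrive from two or more distinct IPs within 60 seconds. The rule, field names, and sample data are illustrative assumptions, not from the source.

from collections import defaultdict, deque

WINDOW_SECONDS = 60
recent = defaultdict(deque)          # account -> deque of (timestamp, ip)

def observe(event):
    """Process one streaming record and return an alert if the pattern is seen."""
    ts, account, ip = event["ts"], event["account"], event["ip"]
    q = recent[account]
    q.append((ts, ip))
    while q and ts - q[0][0] > WINDOW_SECONDS:   # drop observations outside window
        q.popleft()
    ips = {addr for _, addr in q}
    if len(ips) >= 2:
        return f"ALERT t={ts}: account {account} failing logins from {sorted(ips)}"
    return None

stream = [
    {"ts": 0,  "account": "alice", "ip": "10.0.0.1"},
    {"ts": 20, "account": "alice", "ip": "198.51.100.7"},
    {"ts": 95, "account": "bob",   "ip": "10.0.0.9"},
]
for ev in stream:
    alert = observe(ev)
    if alert:
        print(alert)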
70
Data stream mining
the process of extracting novel patterns and knowledge structures from continuous, rapid data records.
71
Applications of stream analytics (7)
e-Commerce
Telecommunications
Law Enforcement and cybersecurity
Power industry
Financial services
Health sciences
Government