What are the four Vs of big data?
Data volume
Data velocity
Data variety
Data veracity
Who uses the term big data, and why?
Companies use this term to describe the massive amounts of data they now capture, store, and analyze
What is data volume?
amount of data created and stored by an organization
What is data velocity?
speed at which data is created and stored
What is data variety?
different forms data can take
What is data veracity?
quality or trustworthiness of data
What is an analytics mindset?
a way of thinking that centers on the correct use of data and analysis for decision making
According to EY, an analytics mindset includes the ability to do what?
Ask the right questions
Extract, transform, and load relevant data
Apply appropriate data analytic techniques
Interpret and share the results with stakeholders
Who said: “The significant problems we face today cannot be solved at the same level of thinking we were at when we created them”?
Albert Einstein
A good data analytic question helps establish…
SMART:
Specific
Measurable
Achievable
Relevant
Timely
What does specific mean in SMART?
needs to be direct and focused to produce a meaningful answer
In order to ask the right questions, what framework should be followed?
SMART
What does measurable mean in SMART?
must be amenable to data analysis, and thus the inputs to answering the question must be measurable with data
What does achievable mean in SMART?
should be able to be answered, and the answer should cause a decision maker to take an action
What does relevant mean in SMART?
should relate to the objectives of the organization or the situation under consideration
What does timely mean in SMART?
must have a defined time horizon for answering
What is the ETL process and what does it stand for?
Extract, transform, and load (ETL) relevant data; it is the most time-consuming part of the analytics mindset process
What are the three steps of the data extraction process?
Understand the data needs and the data available
Perform the data extraction
Verify the data extraction quality and document what you have done
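The three extraction steps can be sketched in Python. This is a minimal illustration, not a prescribed method: it assumes a small CSV export, and it assumes the expected row count and control total were obtained from the source system during step 1. All data, names, and checks here are hypothetical. Record counts and control totals are one common way to verify that an extraction is complete.

```python
import csv
import io

# Hypothetical source system export; in practice this could come from a database or file.
SOURCE_EXPORT = """invoice_id,customer,amount
1001,Acme,250.00
1002,Globex,125.50
1003,Initech,74.50
"""

# Step 1 (assumed): figures reported by the source system, used later as a check.
EXPECTED_ROWS = 3
EXPECTED_TOTAL = 450.00


def extract(text: str) -> list[dict]:
    """Step 2: perform the extraction by reading every record from the export."""
    return list(csv.DictReader(io.StringIO(text)))


def verify(rows: list[dict], expected_rows: int, expected_total: float) -> dict:
    """Step 3: verify extraction quality with a record count and a control total,
    returning a small log that documents what was done."""
    total = round(sum(float(r["amount"]) for r in rows), 2)
    return {
        "rows_extracted": len(rows),
        "control_total": total,
        "row_count_ok": len(rows) == expected_rows,
        "control_total_ok": abs(total - expected_total) < 0.005,
    }


rows = extract(SOURCE_EXPORT)
log = verify(rows, EXPECTED_ROWS, EXPECTED_TOTAL)
print(log)
```

If either check comes back `False`, the extraction should be investigated before moving on to transformation.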
What are the different types of data organization?
Structured data
Semi-structured data
Unstructured data
What is structured data?
highly organized (e.g. accounting data)
What is semi-structured data?
not sufficiently structured to be inserted into a database (e.g. CSV file)
What is unstructured data?
data with no uniform structure; most publicly available data is unstructured (e.g., images, tweets, text files)
What is it called when data is stored but cannot be meaningfully analyzed?
Dark data
What are three alternative structures for storing data?
Data warehouse
Data mart
Data lake
What is a data warehouse?
a large database containing detailed and summarized data for a number of years; it is used for analysis rather than for transaction processing
What is a data mart?
holds structured data for a subset of an organization
Ex: an international company may want to organize the data from each of its regions separately
What is a data lake?
a repository that stores structured, semi-structured, and unstructured data in a single location
The biggest of the three structures
What are the four steps to transforming data?
Understand the data and the desired outcome
Standardize, structure, and clean the data (cleaning the data is extremely important)
Validate data quality and verify that the data meets data requirements
Document the transformation process
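A minimal Python sketch of transformation steps 2 through 4 on a few hypothetical records (step 1, understanding the data and desired outcome, is a human judgment the code cannot capture). The standardization rules, the two accepted date formats, and the "amounts must be positive" requirement are all assumptions chosen for illustration.

```python
from datetime import datetime

# Hypothetical raw records, as they might arrive from extraction.
raw = [
    {"vendor": " acme corp ", "date": "03/15/2024", "amount": "1,200.00"},
    {"vendor": "ACME CORP", "date": "03/15/2024", "amount": "1,200.00"},  # duplicate once standardized
    {"vendor": "Globex", "date": "2024-03-16", "amount": ""},  # missing amount
    {"vendor": "Initech", "date": "2024-03-17", "amount": "85.25"},
]

doc = []  # Step 4: running record of every transformation applied


def standardize(rec):
    """Step 2 (standardize/structure): trim and title-case names, convert dates
    to ISO format, and convert amounts to numbers (None when missing)."""
    date = rec["date"]
    fmt = "%m/%d/%Y" if "/" in date else "%Y-%m-%d"  # assumed input formats
    return {
        "vendor": rec["vendor"].strip().title(),
        "date": datetime.strptime(date, fmt).date().isoformat(),
        "amount": float(rec["amount"].replace(",", "")) if rec["amount"] else None,
    }


records = [standardize(r) for r in raw]
doc.append(f"standardized {len(records)} records")

# Step 2 (clean): drop records with missing amounts, then exact duplicates.
complete = [r for r in records if r["amount"] is not None]
doc.append(f"removed {len(records) - len(complete)} record(s) with missing amount")
seen, cleaned = set(), []
for r in complete:
    key = (r["vendor"], r["date"], r["amount"])
    if key not in seen:
        seen.add(key)
        cleaned.append(r)
doc.append(f"removed {len(complete) - len(cleaned)} duplicate record(s)")

# Step 3 (validate): assumed requirement that amounts are positive and vendors non-empty.
valid = all(r["amount"] > 0 and r["vendor"] for r in cleaned)
doc.append(f"validation passed: {valid}")

print(cleaned)
print(doc)
```

Note that the duplicate only becomes visible after standardizing, which is one reason standardizing usually comes before removing duplicates.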
What are important considerations when loading data?
The transformed data must be stored in a format and structure acceptable to the receiving software
Programs used for analysis may treat some data formats differently than expected. It is important to understand how the new program will interpret data formats.
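To make the loading caution concrete, here is a small Python illustration using hypothetical values. It shows two ways the same stored text can be interpreted differently depending on the format the receiving program assumes: a code with leading zeros read as a number, and a date read as month/day versus day/month. Some spreadsheet programs may apply similar conversions automatically on import.

```python
from datetime import datetime

# Hypothetical values as they appear in the transformed file.
account_code = "00721"
date_text = "03/04/2024"

# If the receiving program treats the account code as a number, the leading zeros are lost.
as_number = int(account_code)

# The same date text means different days depending on the assumed format.
us_reading = datetime.strptime(date_text, "%m/%d/%Y").date()  # month/day/year
eu_reading = datetime.strptime(date_text, "%d/%m/%Y").date()  # day/month/year

print(as_number, us_reading, eu_reading)
```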
What is important to do once data is successfully loaded into the new program?
update or create a new data dictionary
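A data dictionary can be as simple as a table describing each field. The Python sketch below assumes a hypothetical invoice table: the field names, types, and descriptions are made up for illustration. It also shows one way to use the dictionary after loading, by flagging fields that are undocumented or whose values do not match the documented type.

```python
# Hypothetical data dictionary for a loaded invoice table.
DATA_DICTIONARY = {
    "invoice_id": {"type": str, "description": "Unique invoice number, stored as text to keep leading zeros"},
    "vendor": {"type": str, "description": "Vendor name, trimmed and title-cased during transformation"},
    "amount": {"type": float, "description": "Invoice amount (currency assumed to be US dollars)"},
}


def undocumented_or_mistyped(record: dict) -> list[str]:
    """Return the fields in a loaded record that are missing from the data
    dictionary or whose values do not match the documented type."""
    problems = []
    for field, value in record.items():
        entry = DATA_DICTIONARY.get(field)
        if entry is None:
            problems.append(f"{field}: not in data dictionary")
        elif not isinstance(value, entry["type"]):
            problems.append(f"{field}: expected {entry['type'].__name__}")
    return problems


print(undocumented_or_mistyped({"invoice_id": "00721", "vendor": "Acme Corp", "amount": 1200.0}))
print(undocumented_or_mistyped({"invoice_id": 721, "vendor": "Acme Corp", "amount": 1200.0, "region": "East"}))
```

A flagged field means either the data or the dictionary needs updating.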
What are the four categories of data analytics?
Descriptive analytics
Diagnostic analytics
Predictive analytics
Prescriptive analytics
What is descriptive analytics?
information that results from examining data to understand the past; answers the question “What happened?”
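Descriptive analytics often starts with simple summary statistics of past data. A minimal Python sketch, assuming a hypothetical year of monthly sales figures, uses the standard `statistics` module to answer "what happened?"

```python
import statistics

# Hypothetical monthly sales (in $ thousands) for the past year, January to December.
monthly_sales = [120, 135, 128, 140, 150, 145, 160, 158, 149, 170, 180, 175]

# Summarize what happened: total, typical month, and the best month (1 = January).
summary = {
    "total": sum(monthly_sales),
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "best_month": monthly_sales.index(max(monthly_sales)) + 1,
}
print(summary)
```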
What is diagnostic analytics?
builds on descriptive analytics and tries to answer the question “Why did this happen?”
What is predictive analytics?
information that results from analyses that focus on predicting the future; answers the question “What might happen in the future?”
What is prescriptive analytics?
information that results from analyses that provide a recommendation of what should happen; answers the question “What should be done?”
What is a common way people interpret results incorrectly?
confusing correlation with causation
What is a second common misinterpretation of results?
psychology research
What is correlation?
tells whether two things tend to happen together
Ex: if you wear a purple shirt and it rains, does that mean it will rain every time you wear a purple shirt? No.
What is causation?
tells that the occurrence of one thing causes the occurrence of a second thing
Ex: lighting something on fire in your house will cause it to fill with smoke
Why can two people disagree when looking at the same data?
They interpret it differently
Example: Hertz rental car company deciding between keeping an older fleet of cars with higher maintenance costs or buying a newer fleet of cars with lower maintenance costs
What do good principles of visualization design include?
Selecting the right type of visualization
Presenting the data in a simplified manner
Emphasizing important aspects of the data
Representing the data in an ethical manner
What is automation?
the application of machines to automatically perform a task once performed by humans
What is Robotic process automation (RPA)?
computer software that can be programmed to automatically perform tasks across applications just as human workers do
How can RPA be used?
to automate ETL tasks
Data analytics is NOT always the right tool to reach the best outcome. T/F?
True
Data can help us make better decisions, but we need to remember the importance of
Intuition
Expertise
Ethics
Other sources of knowledge that are not easy to quantify but can have a significant impact on performance
An accounting firm is trying to understand if its external audit fees are appropriate. It computes a regression using public data from all companies in its industry to understand the factors associated with higher audit fees. What type of analytics is this an example of?
Diagnostic analytics
A self-driving car company uses artificial intelligence to help clean its historical social media data so it can analyze trends
Descriptive (the AI detail is a distractor)
An airline downloads weather data for the past 10 years to help build a model that will estimate future fuel usage for flights.
Predictive
A shipyard company runs a computer simulation of how a tsunami would damage its shipyards, computing damages in terms of destruction and lost production time
Predictive
An online retail company tracks past customer purchases. Based on the amount customers previously spent, the program automatically computes purchase discounts for current customer purchases to build loyalty.
Prescriptive
An all-you-can-eat restaurant uses automated conveyer belts to bring cold food to the chefs for preparation. The conveyer belts bring the food to the chefs based on algorithms that monitor the number of people entering and leaving the restaurant
Prescriptive
A large manufacturer of farm equipment continuously analyzes data sent from engine sensors to understand how load, temperature, and other factors influence engine failure.
Diagnostic
A small tax services business provides its financial statements to a bank to get a loan so it can buy a new building to grow its business.
Descriptive