Database
A collection of data stored in a computer system
Data life cycle
Plan → capture → manage → analyze → archive → destroy
Plan
What data do we need? How will it be managed? Who’s responsible for it? What are the optimal outcomes?
Capture
Collecting data from a variety of sources and bringing it into the organization
Manage
Where to store data? What tools to keep it secure? Actions needed for proper maintenance?
Analyze
Data is used to solve problems, make decisions, support business goals
Archive
Storing data in a place where it’s available, but may not be used again
Destroy
Removing data that is no longer needed; important for protecting the company’s private information and private data about customers
Steps of data analysis
Ask → Prepare → Process → Analyze → Share → Act
Ask
Define problem and make sure we understand stakeholder expectations.
Defining the problem involves looking at the current state and identifying how it differs from the ideal state.
Who are the stakeholders? Maintain strong communication with stakeholders.
Stakeholder
People who help make decisions, influence actions and strategies, and have specific goals they want to meet.
Prepare
Collect and store data that will be used for analysis process.
Process
Find and eliminate errors/inaccuracies that can get in the way of results
Cleaning data, transforming it into more useful format, combining datasets, removing outliers
Fix typos, inconsistencies, or missing/inaccurate data
Verifying the data cleaning and sharing the results with stakeholders
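The cleaning steps listed for Process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `raw_rows` records, the `clean_rows` helper, and the `max_ratio` cutoff for outliers are all hypothetical, and a real project would choose its own rules for missing values and outliers.

```python
from statistics import median

# Hypothetical raw records: inconsistent casing, stray spaces,
# one missing value, and one value that looks like a data-entry error.
raw_rows = [
    {"city": " new york", "sales": "120"},
    {"city": "New York ", "sales": "135"},
    {"city": "NEW YORK", "sales": ""},
    {"city": "Boston", "sales": "110"},
    {"city": "boston", "sales": "9999"},
]

def clean_rows(rows, max_ratio=5.0):
    """Normalize text, drop rows with missing sales, and drop extreme values."""
    cleaned = []
    for row in rows:
        city = row["city"].strip().title()   # fix spacing and casing inconsistencies
        sales_text = row["sales"].strip()
        if not sales_text:                    # skip rows with missing data
            continue
        cleaned.append({"city": city, "sales": float(sales_text)})
    # Assumed outlier rule: drop anything more than max_ratio times the median.
    typical = median(r["sales"] for r in cleaned)
    return [r for r in cleaned if r["sales"] <= max_ratio * typical]

cleaned_rows = clean_rows(raw_rows)
for row in cleaned_rows:
    print(row)
```

Three rows survive here: the two valid New York rows and the valid Boston row. Which rows were dropped, and why, is the kind of detail worth verifying and sharing with stakeholders.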
Analyze
Using tools to transform/organize info to make useful conclusions, make predictions, and drive informed decision-making
Share
Interpreting results and sharing them with others to help stakeholders make effective data-driven decisions.
Data visualization is key
Act
The business takes the insights you have provided and uses them to solve the original business problem.
Formula
Set of instructions that performs a specific calculation using the data in a spreadsheet.
Function
Preset command that automatically performs a specific process or task using the data in a spreadsheet.
Query language
Programming language that allows you to retrieve and manipulate data from a database.
Query
Request for data/info from a database
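As a rough illustration of a query written in a query language, the sketch below uses SQL through Python's built-in `sqlite3` module. The `customers` table, its columns, and the sample rows are made up for this example.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, purchases INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ana", "Lisbon", 3), ("Ben", "Oslo", 7), ("Cy", "Lisbon", 5)],
)

# The query: a request for specific data from the database.
rows = conn.execute(
    "SELECT name, purchases FROM customers WHERE city = ? ORDER BY purchases DESC",
    ("Lisbon",),
).fetchall()
conn.close()

print(rows)  # [('Cy', 5), ('Ana', 3)]
```

The `SELECT ... WHERE ...` statement retrieves data; statements such as `INSERT` and `UPDATE` are how a query language manipulates it.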
Issue
Topic/subject to investigate
Business task
Question/problem data analysis answers for a business
Fairness
Ensuring that your analysis doesn’t create or reinforce bias
Structured thinking
Process of recognizing the current problem or situation, organizing available info, revealing gaps/opportunities, and identifying options
Making predictions problem type
Using data to make an informed decision about how things may be in the future
Categorizing things problem type
Assigning info to different groups or clusters based on common features
Spotting something unusual problem type
Identifying data that’s different from the norm
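One common way to spot something unusual is to flag values that sit far from the average. The sketch below is a minimal example, assuming a z-score rule with a threshold of 2 standard deviations; the `find_unusual` helper, the threshold, and the `daily_orders` numbers are hypothetical choices, not a prescribed method.

```python
from statistics import mean, stdev

def find_unusual(values, threshold=2.0):
    """Return values more than `threshold` sample standard deviations from the mean."""
    avg = mean(values)
    spread = stdev(values)
    if spread == 0:          # all values identical, so nothing stands out
        return []
    return [v for v in values if abs(v - avg) / spread > threshold]

# Hypothetical daily order counts with one day that breaks the pattern.
daily_orders = [52, 48, 50, 51, 49, 47, 53, 50, 120]
unusual = find_unusual(daily_orders)
print(unusual)  # [120]
```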
Identifying themes problem type
Grouping categorized info into broader concepts
Discovering connections problem type
Finding similar challenges faced by different entities and combining data and insights to address them
Finding patterns problem type
Using historical data to understand what happened in the past and is therefore likely to happen again
Closed-ended questions
Can be answered only with yes or no; rarely provide useful insights
SMART questions
Specific - simple, significant, focused on single topic or a few closely related ideas
Measurable - can be quantified and assessed
Action-oriented - encourage change
Relevant - matter, important, have significance to the problem you’re solving
Time-bound - specify the time to be studied
Data-inspired decision-making
Explores different data sources to find out what they have in common
Report
Static collection of data given to stakeholders periodically
Pros:
High-level historical data
Easy to design/use
Pre-cleaned and sorted data
Cons:
Continual maintenance
Less visually appealing
Static
Dashboard
Monitors live incoming data
Pros:
Dynamic, automatic, interactive
More stakeholder access
Low maintenance
More visually appealing
Cons:
Labor-intensive design
Can be confusing
Long time to fix bugs
Potentially uncleaned data
Pivot table
Data summarization tool used in data processing to summarize, sort, reorganize, group, count, total, or average data stored in a database
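To make the idea concrete, here is a small sketch of what a pivot table does when it groups and totals data. It uses only the Python standard library; the `sales` records and the `pivot_sum` helper are hypothetical, and spreadsheet pivot tables offer many more options (counts, averages, filters).

```python
from collections import defaultdict

# Hypothetical sales records.
sales = [
    {"region": "East", "product": "Tea", "amount": 10},
    {"region": "East", "product": "Coffee", "amount": 25},
    {"region": "West", "product": "Tea", "amount": 15},
    {"region": "East", "product": "Tea", "amount": 5},
    {"region": "West", "product": "Coffee", "amount": 30},
]

def pivot_sum(rows, row_key, col_key, value_key):
    """Group rows by row_key and col_key, totaling value_key in each cell."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    # Convert nested defaultdicts to plain dicts for easier reading.
    return {r: dict(cols) for r, cols in table.items()}

summary = pivot_sum(sales, "region", "product", "amount")
print(summary)
# {'East': {'Tea': 15.0, 'Coffee': 25.0}, 'West': {'Tea': 15.0, 'Coffee': 30.0}}
```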
Metric
Single, quantifiable type of data that can be used for measurement
Can help calculate customer retention rates
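As an example of turning metrics into a rate, the sketch below uses one widely used definition of customer retention: customers at the end of a period, minus new customers gained, divided by customers at the start. That formula, the `retention_rate` function name, and the sample numbers are assumptions for illustration; organizations may define retention differently.

```python
def retention_rate(customers_at_start, customers_at_end, new_customers):
    """Percentage of starting customers still present at the end of the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    retained = customers_at_end - new_customers
    return retained * 100 / customers_at_start

# Hypothetical quarter: 200 customers at start, 210 at end, 40 of them new.
rate = retention_rate(200, 210, 40)
print(rate)  # 85.0
```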
Metric goal
Measurable goal set by company and evaluated using metrics
Mathematical thinking
Looking at a problem and logically breaking it down step by step so you can see the relationships and patterns in the data, then using them to analyze the problem
Small data
Specific
Short time period
Day-to-day decisions
Example: how much water you drink in a day
Big data
Large and less specific
Long time period
Usually need to be broken down
Big decisions
Operator
Symbol that names the type of operation or calculation to be performed
Cell reference
A cell or range of cells in a worksheet that can be used in a formula
Written as a column letter followed by a row number, like A1
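A short sketch of how an A1-style cell reference breaks down into a column and a row. The `parse_cell_reference` helper is hypothetical and handles only simple references such as `A1` or `AA10`, not ranges or absolute references like `$A$1`.

```python
import re

def parse_cell_reference(ref):
    """Split a reference like 'C7' into (column_number, row_number), both 1-based."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not a simple cell reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters.upper():
        # Column letters work like base-26 digits: A=1 ... Z=26, AA=27.
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return column, int(digits)

print(parse_cell_reference("A1"))    # (1, 1)
print(parse_cell_reference("C7"))    # (3, 7)
print(parse_cell_reference("AA10"))  # (27, 10)
```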
Common errors
#ERROR! - Formula can’t be interpreted as input (parsing error)
#N/A - data in formula can’t be found
#NAME? - formula/function name isn’t understood
#NUM! - formula/function can’t be performed as specified
#VALUE! - general error that could indicate problem with formula or referenced cells
#REF! - formula is referencing a cell that is no longer valid or has been deleted
Problem domain
Specific area of analysis that encompasses every activity affecting or affected by the problem
Scope of work (SOW)
An agreed-upon outline of the work you’re going to perform on a project
Before communicating…
Who is my audience?
What do they already know?
What do they need to know?
How can I communicate that effectively to them?