Flashcards based on the C207 Master Story Guide, covering essential definitions and keywords for Data-Driven Decision Making.
Big Data
Data so large and complex that it is difficult to process using traditional database and spreadsheet tools, often including both structured and unstructured data.
Big Data Warehouse
A storage environment for Big Data often utilizing cloud storage, third-party storage, or multiple servers.
Data Mining
The process of discovering useful patterns in large datasets.
Organizations' Motivation for Data Collection
To transform data into useful information to make better business decisions and encourage specific buying behavior.
Structured Data
Data in a fixed, preformatted format that is easy to classify, count, or filter, such as multiple choice survey responses or check boxes.
Unstructured Data
Data without a fixed format that often requires interpretation or theme analysis, such as medical notes, emails, or free-text reviews.
Quantitative Data
Numerical data that can be counted or measured, such as price, revenue, or temperature.
Qualitative Data
Categorical data that describes qualities, labels, or categories, such as car color or vehicle type.
Discrete Data
Quantitative data that uses whole-number counts where decimals are not possible, such as the number of children in a household.
Continuous Data
Quantitative data that is measured and can include decimals or intervals, such as height, weight, or distance.
Nominal Level
Categorical data with no natural order or sequence, such as colors or yes/no responses.
Ordinal Level
Categorical data with a meaningful sequence or ranking, but unequal intervals between categories, such as economy, business, and first class.
Interval Level
Numerical data where zero is a placeholder on a scale and does not mean absence, such as temperature in Fahrenheit.
Ratio Level
Numerical data where zero means the absolute absence of what is being measured, such as money, price, or distance.
Reliability
The consistency and repeatability of a measurement instrument's results.
Validity
The extent to which data measures the intended concept or target.
Data Quality
The process of cleaning data to ensure it does not contain mistakes, missing values, or impossible entries before analysis.
Out-of-Range Error
A value in a dataset that is impossible or suspicious because it falls outside the expected range, such as a car listed at 188 MPG.
Omission Error
A missing value or blank field within a record in a dataset.
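A minimal sketch of screening records for the two error types above before analysis. The car records, the field name `mpg`, and the 5 to 80 MPG plausible range are illustrative assumptions, not values from the guide.

```python
# Hypothetical car records; field names and values are illustrative assumptions.
cars = [
    {"model": "A", "mpg": 31.0},
    {"model": "B", "mpg": 188.0},  # implausible value -> out-of-range error
    {"model": "C", "mpg": None},   # blank field -> omission error
]

# Assumed plausible MPG range for this illustration only.
MPG_MIN, MPG_MAX = 5.0, 80.0


def find_errors(records):
    """Return (omissions, out_of_range) as lists of model names."""
    omissions = [r["model"] for r in records if r["mpg"] is None]
    out_of_range = [
        r["model"]
        for r in records
        if r["mpg"] is not None and not (MPG_MIN <= r["mpg"] <= MPG_MAX)
    ]
    return omissions, out_of_range


omissions, out_of_range = find_errors(cars)
print("Omission errors:", omissions)         # ['C']
print("Out-of-range errors:", out_of_range)  # ['B']
```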
Systematic Error
A consistent, repeated bias that pushes results in the same wrong direction and must be fixed.
Random Error
Error caused by chance or temporary noise that may average out over time.
Observational Study
A study where researchers collect information without applying a treatment to the subjects.
Cohort Study
An observational study focused on a specific group that shares a characteristic, place, or time frame.
Experimental Study
A study where a treatment is applied to a unit to examine the effect on a response.
Blind Study
An experimental setup where participants do not know which treatment they are receiving.
Double-Blind Study
An experimental setup where neither the participant nor the researcher knows which treatment was assigned.
Triple-Blind Study
An experimental setup where the participant, researcher, and data analyst are all unaware of the treatment assignments.
Faulty Operationalization
A flaw in research design where a variable or concept is not clearly defined or is measured with an incorrect target.
Measurement Bias
Bias introduced by the method of sample selection or how data collection is performed.
Information Bias
Bias resulting from inaccurate or distorted information provided by respondents or records after data collection has started.
Response Bias
Bias caused by the presence of an interviewer or the pressure felt by a respondent to answer in a certain way.
Conscious Bias
Bias introduced through the use of leading or persuasive wording in a question.
Association vs. Causation
The principle that two variables moving together (association) does not prove that one directly causes the other (causation).
Alpha Level (Significance Level)
The cutoff used to decide statistical significance, which in this course is set at .05.
p-value rule for Significance
If p < .05, the result is statistically significant and the null hypothesis is rejected; if p > .05, the result is not significant and we fail to reject (retain) the null hypothesis.
Null Hypothesis
A statement asserting that there is no significant difference or no significant relationship between variables.
Chi-Square Analysis
A statistical test used to compare frequency counts or categories for nominal data.
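As a sketch, the Pearson chi-square statistic compares observed counts with the counts expected if the categories were unrelated (row total × column total ÷ grand total). The counts below are hypothetical; in practice software also reports the p-value, which is then compared with the .05 alpha level.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a contingency table of counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (obs - expected) ** 2 / expected
    return chi2


# Hypothetical counts: rows = customer group, columns = bought / did not buy
observed = [[30, 20],
            [20, 30]]
chi2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # degrees of freedom
print(chi2, df)  # 4.0 1
# For df = 1 the .05 critical value is about 3.84, so 4.0 would be significant.
```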
T-Test
A statistical test used to compare the means or averages of exactly two groups.
ANOVA (Analysis of Variance)
A statistical test used to compare the means or averages of three or more groups.
Independent Variable (X)
The predictor or input variable used in a regression model to predict an outcome.
Dependent Variable (Y)
The outcome or response variable being predicted in a regression model.
Linear Regression
A statistical tool that uses one independent variable to predict one numeric dependent variable.
Multiple Regression
A statistical tool that uses two or more independent variables to predict a single dependent variable.
Logistic Regression
A statistical tool used to predict a dependent variable that is binary or nominal, such as yes/no or pass/fail.
R-Squared
A measure of the goodness of fit for a regression model, indicating the proportion of the variation in Y that is explained by X; values range from 0 to 1, with values closer to 1 indicating a stronger fit.
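A minimal sketch of simple linear regression fitted by ordinary least squares, with R-squared computed as 1 − (residual sum of squares ÷ total sum of squares). The x and y values are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: X = advertising spend, Y = sales
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print(f"Y = {intercept:.2f} + {slope:.2f}X, R-squared = {r2:.2f}")  # Y = 2.20 + 0.60X, R-squared = 0.60
```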
Homoscedasticity
A condition where the scatter plot shows consistent variation or spread of data points around the trend line.
Heteroscedasticity
A condition where the scatter plot shows changing or unequal variation, often forming an ice-cream-cone shape.
Decision Tree Analysis
A tool used to choose between alternatives under risk or uncertainty by mapping each decision and its possible chance outcomes, then selecting the alternative with the highest expected value.
Expected Value
A weighted outcome value calculated by combining payoffs and their respective probabilities.
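A minimal decision-tree sketch: each alternative has chance branches given as (probability, payoff) pairs, its expected value is the probability-weighted sum of the payoffs, and the alternative with the highest expected value is chosen. The alternatives, probabilities, and payoffs are hypothetical.

```python
import math

# Hypothetical alternatives: each maps to chance branches of (probability, payoff).
alternatives = {
    "Expand plant": [(0.6, 500_000), (0.4, -200_000)],
    "Small upgrade": [(0.6, 200_000), (0.4, 50_000)],
    "Do nothing": [(1.0, 0)],
}


def expected_value(branches):
    """Probability-weighted sum of payoffs at one chance node."""
    # The probabilities on one chance node should cover all outcomes.
    if not math.isclose(sum(p for p, _ in branches), 1.0):
        raise ValueError("branch probabilities must sum to 1")
    return sum(p * payoff for p, payoff in branches)


evs = {name: expected_value(branches) for name, branches in alternatives.items()}
best = max(evs, key=evs.get)
for name, ev in evs.items():
    print(f"{name}: EV = {ev:,.0f}")
print("Choose:", best)  # Choose: Expand plant
```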
Linear Programming
A mathematical method used to find an optimal solution (maximize or minimize) while meeting specific constraints.
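For a problem with only two decision variables, an optimal solution (if one exists) occurs at a corner point of the feasible region, so a small sketch can solve it by intersecting each pair of constraint lines and checking every feasible corner. The product-mix numbers are hypothetical, and larger problems would normally be handed to an LP solver rather than enumerated like this.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical product mix: maximize profit = 40x + 30y
# Each constraint is (a, b, c), meaning a*x + b*y <= c.
constraints = [
    (1, 1, 40),   # labor hours: x + y <= 40
    (2, 1, 60),   # machine hours: 2x + y <= 60
    (-1, 0, 0),   # x >= 0
    (0, -1, 0),   # y >= 0
]
profit = (40, 30)


def objective(point):
    x, y = point
    return profit[0] * x + profit[1] * y


def corner_points(cons):
    """Intersect every pair of constraint lines and keep the feasible points."""
    points = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:  # parallel lines never intersect
            continue
        # Cramer's rule, with exact fractions to avoid rounding issues
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        if all(a * x + b * y <= c for a, b, c in cons):
            points.append((x, y))
    return points


best = max(corner_points(constraints), key=objective)
best_value = objective(best)
print(f"x = {best[0]}, y = {best[1]}, profit = {best_value}")  # x = 20, y = 20, profit = 1400
```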
Break-Even Analysis
A method used to determine the point where total revenue equals total cost and profit begins.
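Using the standard formula break-even units = fixed costs ÷ (price per unit − variable cost per unit), a short sketch with hypothetical figures:

```python
def break_even_units(fixed_costs, price, variable_cost):
    """Units at which total revenue equals total cost."""
    margin = price - variable_cost  # contribution margin per unit
    if margin <= 0:
        raise ValueError("price must exceed variable cost to break even")
    return fixed_costs / margin


# Hypothetical figures
units = break_even_units(fixed_costs=50_000, price=25, variable_cost=15)
print(units)  # 5000.0
revenue = units * 25
total_cost = 50_000 + units * 15
print(revenue == total_cost)  # True
```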
Cross-Over Analysis
A method used to compare the total costs of alternatives across different volumes and find the crossover point where their costs are equal, identifying the best option for a given level of usage.
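A sketch assuming each alternative costs a fixed amount plus a constant variable cost per unit; setting the two total-cost lines equal gives the crossover volume = (fixed B − fixed A) ÷ (variable A − variable B). The cost figures are hypothetical.

```python
def total_cost(fixed, variable, volume):
    return fixed + variable * volume


def crossover_volume(fixed_a, var_a, fixed_b, var_b):
    """Volume at which alternatives A and B cost the same."""
    if var_a == var_b:
        raise ValueError("parallel cost lines: no crossover point")
    return (fixed_b - fixed_a) / (var_a - var_b)


# Hypothetical options: A has high fixed and low variable cost; B is the reverse
A = (10_000, 5)
B = (4_000, 8)
v = crossover_volume(*A, *B)
print(v)  # 2000.0
for volume in (1_000, 3_000):
    cheaper = "A" if total_cost(*A, volume) < total_cost(*B, volume) else "B"
    print(volume, cheaper)  # 1000 B, then 3000 A
```

Below the crossover volume the low-fixed-cost option is cheaper; above it the low-variable-cost option wins.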
Cluster Analysis
A method used to group similar observations or customers together, often used for market segmentation.
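One common clustering method is k-means, which repeatedly assigns each observation to the nearest group center and then recomputes the centers. The guide does not name a specific algorithm, so this is only an illustrative sketch; the customer data, k = 2, and the starting centers are assumptions.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign points to the nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of each coordinate; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend in dollars)
customers = [(1, 20), (2, 25), (1, 22), (8, 90), (9, 95), (10, 85)]
centers, groups = kmeans(customers, centroids=[customers[0], customers[-1]])
for center, group in zip(centers, groups):
    print([round(c, 1) for c in center], group)
```

Because spend has a much larger range than visits here, real segmentation work usually standardizes each variable (for example, to z-scores) before clustering.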
Monte Carlo Simulation
A simulation technique that uses many random outcomes to model uncertainty and forecast possible results.
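A minimal Monte Carlo sketch: draw many random demand values, compute the profit for each, and summarize the results. The uniform demand range, price, costs, number of trials, and seed are all hypothetical assumptions.

```python
import random

PRICE, VARIABLE_COST, FIXED_COST = 25, 15, 10_000  # hypothetical figures
TRIALS = 10_000

rng = random.Random(42)  # fixed seed so the run is repeatable
profits = []
for _ in range(TRIALS):
    demand = rng.randint(800, 1200)  # assumed: any demand from 800 to 1200 equally likely
    profits.append((PRICE - VARIABLE_COST) * demand - FIXED_COST)

average_profit = sum(profits) / TRIALS
prob_loss = sum(1 for p in profits if p < 0) / TRIALS
print(f"Average profit: {average_profit:,.0f}")
print(f"Estimated chance of a loss: {prob_loss:.1%}")
```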
Probability
The chance that an event occurs, calculated as the number of favorable outcomes divided by the total number of possible outcomes.
Complement
The probability that an event does not occur, calculated as 1−P(A).
Intersection
The probability that two events both happen (AND), calculated with multiplication; for independent events, P(A and B) = P(A) × P(B).
Union
The probability that one event or the other occurs (OR), calculated by adding the individual probabilities and subtracting any overlap.
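The four probability rules above can be checked by counting outcomes in a standard 52-card deck with exact fractions. Drawing a heart and drawing a king are independent events, so their intersection also equals the product of the two probabilities.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) cards


def prob(event):
    """Favorable outcomes divided by total possible outcomes."""
    favorable = sum(1 for card in DECK if event(card))
    return Fraction(favorable, len(DECK))


def is_heart(card):
    return card[1] == "hearts"


def is_king(card):
    return card[0] == "K"


p_heart = prob(is_heart)                                       # 1/4
p_king = prob(is_king)                                         # 1/13
p_not_heart = 1 - p_heart                                      # complement: 3/4
p_heart_and_king = prob(lambda c: is_heart(c) and is_king(c))  # intersection: 1/52
p_heart_or_king = p_heart + p_king - p_heart_and_king          # union: 4/13

print(p_not_heart, p_heart_and_king, p_heart_or_king)  # 3/4 1/52 4/13
```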
Conditional Probability (Bayes' Theorem)
The probability of an event occurring given that another event is already known to have happened.
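A worked sketch of Bayes' Theorem, P(A|B) = P(B|A) × P(A) ÷ P(B), where P(B) comes from the total probability of a positive result. The screening-test rates (1% have the condition, 95% detection, 5% false positives) are hypothetical.

```python
from fractions import Fraction

# Hypothetical screening test figures
p_condition = Fraction(1, 100)               # P(A): 1% have the condition
p_pos_given_condition = Fraction(95, 100)    # P(B|A): test detects 95% of true cases
p_pos_given_no_condition = Fraction(5, 100)  # false-positive rate: 5%

# Total probability of a positive test, P(B)
p_pos = (p_pos_given_condition * p_condition
         + p_pos_given_no_condition * (1 - p_condition))

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_condition_given_pos = p_pos_given_condition * p_condition / p_pos
print(p_condition_given_pos)  # 19/118, about 0.16
```

Even after a positive result, the chance of actually having the condition is only about 16% here, because the condition is rare to begin with.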
Combination
A counting technique used to determine how many possible groups can be formed from a set of items where order does not matter.
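Combinations are counted with C(n, k) = n! ÷ (k! × (n − k)!). Python's standard library provides this as `math.comb` (Python 3.8 and later); the committee example is hypothetical.

```python
import math
from itertools import combinations

# How many 3-person committees can be formed from 10 employees? (order does not matter)
n, k = 10, 3
by_formula = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
by_library = math.comb(n, k)
by_listing = sum(1 for _ in combinations(range(n), k))  # brute-force count of every group
print(by_formula, by_library, by_listing)  # 120 120 120
```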
Mean
The arithmetic average calculated by adding all values and dividing by the count; it is sensitive to outliers.
Median
The middle value in a sorted dataset, representing the 50th percentile.
Mode
The value that appears most frequently in a dataset.
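A quick sketch with Python's `statistics` module on a small hypothetical dataset that contains one outlier, showing how the mean is pulled upward while the median and mode are not.

```python
import statistics

# Hypothetical daily order counts with one outlier (200)
orders = [20, 22, 22, 25, 30, 200]

print(statistics.mean(orders))    # pulled up by the outlier: about 53.2
print(statistics.median(orders))  # 23.5 (average of the two middle values, 22 and 25)
print(statistics.mode(orders))    # 22
```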
Standard Deviation
The measure of data spread or volatility, calculated as the square root of the variance.
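A step-by-step sketch of the calculation: find the mean, average the squared deviations to get the variance, then take the square root. Whether to divide by n (population) or n − 1 (sample) depends on the data, so both are shown; the dataset is hypothetical.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
mean = sum(data) / len(data)                          # 5.0
squared_deviations = [(x - mean) ** 2 for x in data]  # these sum to 32.0

population_variance = sum(squared_deviations) / len(data)    # 4.0
sample_variance = sum(squared_deviations) / (len(data) - 1)  # 32 / 7

population_sd = math.sqrt(population_variance)  # 2.0
sample_sd = math.sqrt(sample_variance)          # about 2.14

# The standard library gives the same results
assert math.isclose(population_sd, statistics.pstdev(data))
assert math.isclose(sample_sd, statistics.stdev(data))
print(population_sd, round(sample_sd, 2))  # 2.0 2.14
```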
Empirical Rule
A rule for normal distributions stating that approximately 68%, 95%, and 99.7% of data falls within 1, 2, and 3 standard deviations of the mean.
Z-Score
A standardized score that indicates how many standard deviations a specific value is above or below the mean.
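A sketch computing a z-score as (value − mean) ÷ standard deviation, plus a check of the Empirical Rule using the standard library's `statistics.NormalDist` (Python 3.8 and later). The exam mean and standard deviation are hypothetical.

```python
from statistics import NormalDist


def z_score(value, mean, sd):
    """How many standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd


# Hypothetical exam: mean 70, standard deviation 10
print(z_score(85, 70, 10))  # 1.5  -> 1.5 SDs above the mean
print(z_score(55, 70, 10))  # -1.5 -> 1.5 SDs below the mean

# Empirical Rule check on a standard normal curve
std_normal = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {share:.1%}")  # about 68.3%, 95.4%, 99.7%
```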
Interquartile Range (IQR)
The distance between the third quartile (Q3) and the first quartile (Q1), representing the middle 50% of data.
Box Plot
A visual summary showing the five-number summary: minimum, Q1, median, Q3, and maximum, along with potential outliers.
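A sketch of the five-number summary behind a box plot. Quartile conventions differ between textbooks and software; this one takes Q1 and Q3 as the medians of the lower and upper halves (leaving out the overall median when the count is odd). Flagging outliers beyond 1.5 × IQR from the quartiles is a common convention, not something stated in the guide, and the data are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


values = [12, 3, 18, 7, 40, 14, 5, 21, 8, 13]  # hypothetical
minimum, q1, med, q3, maximum = five_number_summary(values)
iqr = q3 - q1  # spread of the middle 50% of the data
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < low_fence or v > high_fence]
print(minimum, q1, med, q3, maximum)       # 3 7 12.5 18 40
print("IQR:", iqr, "outliers:", outliers)  # IQR: 11 outliers: [40]
```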
PDCA (Deming Cycle)
A four-stage systematic cycle for solving quality problems: Plan (investigate), Do (test/pilot), Check (measure), and Act (standardize).
SIPOC
A high-level process map identifying the Supplier, Input, Process, Output, and Customer.
Quality Assurance (QA)
Proactive activities focused on preventing defects before they occur, such as employee training.
Quality Control (QC)
Reactive activities focused on identifying and fixing defects after they occur, such as inspection and repair.
Run Chart
A line graph showing performance or data points over a specific period of time.
Control Chart
A run chart that includes upper and lower control limits to determine if process variation is within an acceptable range.
Common Cause Variation
Normal, routine noise or variation that occurs within expected control limits.
Special Cause Variation
Unusual or outlier variation that falls outside of expected control limits.
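A simplified control-chart sketch, assuming limits at the baseline mean ± 3 standard deviations. That is a common convention; the guide does not say how limits are set, and formal chart types estimate variation in more specialized ways. Points inside the limits are treated as common cause variation and points outside as special cause variation. All measurements are hypothetical.

```python
import statistics

# Hypothetical baseline measurements from a stable process (e.g., fill weight in ounces)
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
center = statistics.mean(baseline)
sigma = statistics.stdev(baseline)
ucl, lcl = center + 3 * sigma, center - 3 * sigma  # upper and lower control limits

new_points = [10.05, 9.95, 10.9, 10.1]
flags = {
    value: "special cause" if value > ucl or value < lcl else "common cause"
    for value in new_points
}
for value, kind in flags.items():
    print(f"{value}: {kind}")
print(f"LCL = {lcl:.2f}, center = {center:.2f}, UCL = {ucl:.2f}")
```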
Cause and Effect Diagram
Also known as a fishbone or Ishikawa diagram; used to brainstorm the 'why' behind a problem.
Flow Chart
A diagram showing the step-by-step sequence of a process to identify 'where' a failure might be occurring.
Check Sheet
A simple tool used to collect and tally frequency data.
Histogram
A chart showing the distribution of numerical data across specific ranges or bins.
Pareto Chart
A bar graph that ranks categories from highest to lowest frequency to help prioritize problem-solving efforts.
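A short sketch of the data preparation behind a Pareto chart: sort the categories from most to least frequent and add a cumulative percentage. The defect categories and counts are hypothetical.

```python
# Hypothetical tally of defects from a check sheet
defects = {"Dents": 25, "Other": 5, "Scratches": 45, "Wrong color": 10, "Misaligned": 15}

ranked = sorted(defects.items(), key=lambda item: item[1], reverse=True)
total = sum(defects.values())

cumulative = 0
rows = []
for category, count in ranked:
    cumulative += count
    rows.append((category, count, round(100 * cumulative / total, 1)))

for category, count, cum_pct in rows:
    print(f"{category:<12} {count:>3}  cumulative {cum_pct}%")
```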
Scatter Diagram
A visual tool used to show the relationship between two variables using dots on an X-Y chart.
Lean
A quality management program focused primarily on the reduction of waste and improvement of efficiency.
Six Sigma
A quality management program focused on reducing process variation and defects.
Just-in-Time (JIT)
An operational approach focused on reducing inventory levels by receiving materials only when needed.
Results-Based Management (RBM)
An ongoing monitoring framework focused on achieving intended results, outcomes, and long-term impact, often used in nonprofits.
Index Number
A comparison of a current value relative to a base period, often expressed as (current value ÷ base value) × 100.
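A sketch of the index-number calculation, (current ÷ base) × 100, with hypothetical prices; an index above 100 means the current value is higher than in the base period.

```python
def index_number(current, base):
    """Current value relative to the base period, scaled so the base period = 100."""
    return current / base * 100


# Hypothetical price of a product: $4.00 in the base year, $5.00 now
print(index_number(5.00, 4.00))  # 125.0 -> 25% higher than the base period
```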
Incidence
A metric representing the number of new cases of a disease or event within a specific time.
Prevalence
A metric representing the total number of existing cases in a population at a specific point in time.
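A sketch contrasting the two metrics on hypothetical case records, each with an onset day and a recovery day (None if still ongoing). Incidence counts cases that begin within a window; prevalence counts every case still active on one specific day. The record layout and day numbers are assumptions for illustration.

```python
# Hypothetical case records: (case_id, onset_day, recovery_day or None if ongoing)
cases = [
    ("A", 3, 8),
    ("B", 5, None),
    ("C", 12, 20),
    ("D", 25, None),
    ("E", 35, None),
    ("F", 15, None),
    ("G", 1, None),
]


def incidence(records, start, end):
    """Number of NEW cases whose onset falls within [start, end]."""
    return sum(1 for _, onset, _ in records if start <= onset <= end)


def prevalence(records, day):
    """Number of EXISTING cases active on a given day."""
    return sum(
        1
        for _, onset, recovery in records
        if onset <= day and (recovery is None or recovery > day)
    )


print("Incidence, days 10-30:", incidence(cases, 10, 30))  # 3 (C, D, F)
print("Prevalence on day 30:", prevalence(cases, 30))      # 4 (B, D, F, G)
```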
Observed Score
A score equal to the True Score plus any random error and systematic error (Observed Score = True Score + Random Error + Systematic Error).
Criterion-Referenced Score
A score compared against a fixed standard or 'cut score' rather than against other individuals.
Key Performance Indicator (KPI)
A single, specific metric used to measure progress toward an organizational goal.
SMART Goal
A goal that is Specific, Measurable, Achievable, Relevant, and Time-bound.
Dashboard
A visual display that allows managers to monitor multiple important metrics and KPIs at a glance in one place.
Balanced Scorecard
A strategic framework viewing performance from four perspectives: Financial, Customer, Internal Process, and Learning and Growth.
Net Promoter Score (NPS)
A metric used to measure customer loyalty based on their willingness to recommend a company to others.
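The guide does not give the formula, but NPS is conventionally calculated from 0 to 10 "How likely are you to recommend us?" ratings as the percentage of promoters (9 or 10) minus the percentage of detractors (0 to 6). A sketch with hypothetical survey responses:

```python
def net_promoter_score(ratings):
    """% promoters (9-10) minus % detractors (0-6); passives (7-8) count only in the total."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


responses = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]  # hypothetical
print(net_promoter_score(responses))  # 20.0 (50% promoters - 30% detractors)
```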