Statistics and Six Sigma
What is the difference between a population and a sample?
Population: The entire set of items or individuals of interest, from which samples could be drawn.
Sample: A subset of the population that is selected for analysis.
In practice, which is better to use, a population or a sample?
In practice, using a sample is almost always better because measuring an entire population is rarely feasible. A sample is more time-efficient, cost-effective, and practical for large groups.
What is the definition of nominal, ordinal, interval, and ratio data? What is the difference?
Nominal - Categorical, no order, no numerical value. EX: Marital status, eye color, country of origin.
Ordinal - Categorical, meaningful order, but unequal spacing between values. EX: Race results (1st, 2nd, 3rd), customer satisfaction (poor, good, excellent).
Interval - Quantitative, order, equal spacing between values, no true zero. EX: Temperature in Celsius or Fahrenheit, IQ scores.
Ratio - Quantitative, order, equal spacing, and a true zero point. EX: Height, weight, age, time, Kelvin temperature.
What do you call terms like mean, median, and mode? What do they show you, and how do you calculate them?
They are measures of central tendency. They show approximately where the middle of the data lies.
Mean - Represents the average value. Add up all the values in the data set and divide by the number of values
Median - Shows the middle point of the data, which can be more useful than the mean for data with extreme values (outliers) because it is far less affected by them. Sort the data and take the middle number, or the average of the two middle numbers.
Mode - Most common number in a dataset. A data set can have one mode, multiple modes, or no mode at all.
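As a quick illustration of the three calculations above, here is a minimal sketch using Python's standard `statistics` module. The data set is made up for the example, with 40 as a deliberate outlier.

```python
import statistics

# Hypothetical data set; 40 is a deliberate outlier.
data = [2, 3, 3, 5, 7, 10, 40]

mean = statistics.mean(data)        # sum of values / number of values
median = statistics.median(data)    # middle value of the sorted data
modes = statistics.multimode(data)  # list of the most common value(s)

print(mean, median, modes)
```

Here the outlier pulls the mean up to 10, while the median stays at 5 and the only mode is 3.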
The 50th percentile represents what measure of central tendency?
Median
In a normal distribution, _% of the data falls within 3 standard deviations of the mean?
99.7%
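The 99.7% figure can be checked numerically. This sketch assumes the data follow a normal distribution and uses `statistics.NormalDist` from the Python standard library; `share_within` is a helper name made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```

This prints roughly 68.3, 95.4, and 99.7 percent for 1, 2, and 3 standard deviations.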
What is the difference between mean and standard deviation?
The mean is a measure of the central tendency or average of a dataset, while the standard deviation is a measure of the dispersion or spread of the data around that average.
How do you calculate variance of a dataset?
Find the mean (μ).
Subtract the mean from each value → deviations.
Square each deviation.
Add the squared deviations.
Divide by N (total number of values) for a population. For a sample, divide by n - 1 instead.
Alternatively, it is simply the standard deviation squared.
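The steps above can be sketched in Python. The data set is hypothetical, and the sketch uses the population form (divide by N); `statistics.pvariance` and `statistics.pstdev` are used only as a cross-check.

```python
import math
import statistics

# Hypothetical data set, treated as a full population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = sum(data) / len(data)           # 1. find the mean
deviations = [x - mu for x in data]  # 2. subtract the mean from each value
squared = [d ** 2 for d in deviations]  # 3. square each deviation
total = sum(squared)                 # 4. add the squared deviations
variance = total / len(data)         # 5. divide by N

# Cross-check: variance equals the standard deviation squared.
assert math.isclose(variance, statistics.pstdev(data) ** 2)
print(mu, variance)
```

For this data the mean is 5 and the variance is 4, so the standard deviation is 2.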
Discrete vs Continuous Data
Discrete = countable, separate values
Number of students
Number of cars
Shoe sizes (even though they include decimals, they come in fixed steps like 7, 7.5, 8)
Continuous = measurable, any value in a range
Height
Weight
Temperature
Time
What does skewness mean? What is the difference between positive and negative skewness?
Skewness describes the side of the distribution where the few, extreme data points lie.
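Positive = right skew (the long tail of extreme values is on the right, toward higher values)
Negative = left skew (the long tail of extreme values is on the left, toward lower values)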
What does DFSS mean in DFSS-DMADV?
Design for Six Sigma
What does CTQ mean?
Critical-to-Quality. It refers to the key product or service features that customers value most and that are essential for a high-quality outcome. These features must be controlled to guarantee that you deliver what the customer wants.
In Six Sigma, what does DMAIC stand for and what is it used for?
DMAIC = Define, Measure, Analyze, Improve, Control. It is used to improve existing processes.
What does GEMBA mean?
Going to the source/place of action (Gemba is Japanese for "the actual place"). In a supply chain context, Gemba refers to the physical locations where work is done and value is added, such as:
The factory floor or manufacturing line
What are common sources for collecting the voice of the customer?
Existing company information
Employees, customers, suppliers
Trade associations
Competitors
What are some methods for collecting the voice of the customer?
Observations
Interviews
Focus Groups (8-12 people)
Surveys
What is the difference between VOC and VOP?
VOC = Voice of Customer
External
Not under company control
Leads to specification limits
Specification limits are set by the customer, even when the customer never states them explicitly
VOP = Voice of Process
Internal
Can usually be controlled
Leads to control limits
Control limits are determined statistically
Teacher shows up on time
What is the difference between a standard and benchmark?
Standard = Minimum requirement
Comes from the market or government
Benchmark = Aspirational/Voluntary goal
Ex: We want to achieve 1,000,000 sales
What does an affinity diagram do?
It groups related ideas or issues into categories under a common theme.
Ex: All parking complaints are grouped into one group (parking, parking fees, availability of parking)
What does QFD stand for and what is it used for?
Quality Function Deployment is a structured planning and decision-making tool that translates customer needs and wants ("the voice of the customer") into specific product or process requirements. The House of Quality, the main QFD matrix, is usually used in DFSS.
What does DPMO stand for and what does it measure?
Defects per million opportunities. DPMO measures how many defects occur per one million opportunities for a defect to happen in a process.
It tells you:
How often errors occur
How good (or poor) a process is performing
How close a process is to meeting Six Sigma quality levels
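DPMO = D / (N * O) * 1,000,000, where D = number of defects, N = number of units, and O = number of defect opportunities per unit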
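A minimal sketch of the DPMO calculation, reading D as the number of defects, N as the number of units, and O as the defect opportunities per unit. The function name and the example numbers are made up for illustration.

```python
def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities: D / (N * O) * 1,000,000."""
    total_opportunities = units * opportunities_per_unit
    return defects / total_opportunities * 1_000_000

# Hypothetical example: 15 defects found in 500 units,
# each unit having 10 opportunities for a defect.
print(dpmo(15, 500, 10))  # about 3000 defects per million opportunities
```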
What does Gage R&R stand for and what is it used for?
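Gage Repeatability and Reproducibility. It is used to assess how much variation comes from the measurement system itself.
Repeatability is when the same operator repeats the measurement and gets consistent results
Reproducibility is when someone else repeats the process and gets consistent results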
What is an operational definition?
The precise definition of the characteristic and how it is measured. It removes ambiguity and ensures all data collectors collect data in the same way.
What does a scatter chart measure?
The relationship between two variables, with one plotted against the other.
What are the 5 values which the box and whisker plot specifically measures?
Minimum, Q1, Median (or Q2), Q3, Maximum.
True or False: A scatter chart can be a combination of any 2 types of variables?
True, it can be both independent, both dependent, or one independent and one dependent.
The number _ represents no correlation between variables.
0
What does the correlation coefficient mean? It takes values from _ to _ ?
The correlation coefficient is a statistical measure that tells you how strongly two variables are related and in what direction. -1.0 to 1.0
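To make the -1.0 to 1.0 range concrete, here is a small sketch of the Pearson correlation coefficient, the most common version of this measure. It is written out by hand rather than taken from a library, and the function name and data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from the means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_deviation / (spread_x * spread_y)

hours = [1, 2, 3, 4, 5]
print(pearson_r(hours, [2, 4, 6, 8, 10]))  # close to +1: perfect positive
print(pearson_r(hours, [10, 8, 6, 4, 2]))  # close to -1: perfect negative
print(pearson_r(hours, [5, 3, 1, 3, 5]))   # 0: no linear correlation
```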
The box (the grey area) in a box and whisker plot contains what % of the data?
50%
If two sets of data have the same range, do they also have the same standard deviation?
No, not necessarily.
What is the IQR formula? What is it used for?
IQR = Q3 - Q1. The interquartile range (IQR) is used to measure the spread or variability of the middle 50% of a dataset.
What are the formulas for the upper and lower limits that bound the largest and smallest non-outlier values, respectively?
Upper limit (largest value must fall within it):
Q3 + 1.5(Q3 - Q1)
Lower limit (smallest value must fall within it):
Q1 - 1.5(Q3 - Q1)
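The IQR and the two limits can be sketched as follows. Quartile conventions differ between textbooks and tools, so this example assumes the "inclusive" method of `statistics.quantiles`; the data set is hypothetical, with 30 as a planted outlier.

```python
import statistics

# Hypothetical data set; 30 is a planted outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]

# Assumption: "inclusive" quartile method; other methods can give
# slightly different Q1 and Q3 for the same data.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Values outside the limits are flagged as potential outliers.
outliers = [x for x in data if x < lower_limit or x > upper_limit]
print(q1, q3, iqr, lower_limit, upper_limit, outliers)
```

With these assumptions Q1 = 3 and Q3 = 7, so the IQR is 4, the limits are -3 and 13, and only 30 is flagged.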
What does RPN stand for and how is it calculated?
Risk Priority Number. The formula:
RPN = Severity * Occurrence * Detection
Each factor is scored from 1 to 10, and the higher the number, the worse the rating
What is the range of possibilities for what the Risk Priority Number can be?
1 to 1000
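A short sketch of the RPN calculation, including the 1 to 10 check on each factor. The function name and the example scores are made up for illustration.

```python
def rpn(severity, occurrence, detection):
    """Risk Priority Number = Severity * Occurrence * Detection."""
    for name, score in (("severity", severity),
                        ("occurrence", occurrence),
                        ("detection", detection)):
        if not 1 <= score <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {score}")
    return severity * occurrence * detection

print(rpn(1, 1, 1))     # lowest possible RPN
print(rpn(10, 10, 10))  # highest possible RPN
print(rpn(7, 4, 5))     # hypothetical failure mode
```

The first two calls give 1 and 1000, the ends of the possible range; the third gives 140.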
You want to ____ severity and occurrence, and ____ detectability.
Reduce, Improve.
Who introduced statistical process control (SPC) ?
Walter Shewhart
What is the difference between common cause variation and special cause variation?
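Common Cause: Inherent to the process and always present. You cannot get rid of it without changing the process itself.
Ex: Small variations in your daily commute time (23 mins instead of 25 mins)
Special Cause: Comes from a specific, identifiable event. You can avoid or remove it.
Ex: Major car accident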
What is a box and whisker plot?
A box and whisker plot is a graphical representation of a dataset's distribution using its five-number summary: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. It visually displays the spread of the data, the central tendency, and potential outliers by showing the middle 50% of the data in the box and the lowest and highest values in the whiskers.
The median goes by 5 equivalent names. What are they?
Q2
5th decile
50th percentile
Middle value
Central value
What is type 1 and what is type 2 error?
Type 1:
Definition: Rejecting the null hypothesis (H₀) when it is actually true.
Meaning: You think there is an effect or difference, but really there isn’t.
Example: A company tests a new drug.
Null hypothesis (H₀): The drug has no effect. Type I error: You conclude the drug works, but in reality it doesn’t.
Type 2:
Definition: Failing to reject the null hypothesis when it is actually false.
Meaning: You miss a real effect — you think nothing is happening, but actually it is.
Example: Same drug test:
Null hypothesis (H₀): The drug has no effect. Type II error: You conclude the drug doesn’t work, but in reality it does.
True or False: Process Variation includes common cause variation and special cause variation?
True
In improving a process, we first focus on reducing ____
Variability
True or False: Six Sigma was first introduced by General Electric (GE)?
False. It was first introduced by Motorola.
What is the difference between an independent and dependent variable?
Independent = I control it
Dependent = Depends on the independent variable
Ex: Testing the effect of study hours on exam scores
Independent variable: Hours spent studying
Dependent variable: Exam scores (what you are measuring)
The primary purpose of Six Sigma is…
Reducing Defects
What is the minimum number of operators required?
None of the above; 2
In benchmarking, we select a similar ____ from any industry to follow
Process
True or False: A control chart plots metrics from process over time and provides a basis for comparing process performance.
True
What is the z score formula?
z = (x - mean) / standard deviation
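A minimal sketch of the z score calculation. The subtraction has to happen before the division, so the parentheses matter. The function name and the example values are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Hypothetical example: exam score of 85, class mean 70, standard deviation 10.
print(z_score(85, 70, 10))  # 1.5 standard deviations above the mean
print(z_score(55, 70, 10))  # -1.5, i.e. below the mean
```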