business analytics
use of data and information to gain insight and knowledge that aid business decision makers
mean
sum of all observed values divided by the number of entries
median
middle value of the sorted observations; use instead of the mean when extreme values exist
mode
most frequently observed value
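A minimal sketch of the three measures of central tendency, using Python's standard `statistics` module. The `sales` list is hypothetical data chosen so that one extreme value pulls the mean away from the median.

```python
from statistics import mean, median, mode

# Hypothetical daily sales figures; 400 is an extreme value.
sales = [20, 22, 22, 25, 26, 400]

avg = mean(sales)     # sum of all values / number of entries
mid = median(sales)   # middle of the sorted data (average of the two middle values here)
most = mode(sales)    # most frequently observed value

print(avg, mid, most)
```

The mean is dragged far above most of the observations by the single extreme value, while the median stays near the bulk of the data.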
percentile
at least p percent of the observations are less than or equal to the value of the pth percentile, and at least (100-p) percent are greater than or equal to this value
quartiles
1st, 2nd, 3rd quartile: 25th, 50th, and 75th percentile
range
difference between maximum and minimum values observed
interquartile range
spread of the middle 50% of the data: IQR = Q3 - Q1
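A short sketch of quartiles, range, and interquartile range. The `waits` list is hypothetical, and the `method="inclusive"` setting of `statistics.quantiles` is just one of several percentile conventions, so other tools may give slightly different cut points on the same data.

```python
from statistics import quantiles

# Hypothetical waiting times in minutes.
waits = [12, 15, 17, 20, 22, 25, 28, 30, 45]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(waits, n=4, method="inclusive")

data_range = max(waits) - min(waits)  # maximum minus minimum
iqr = q3 - q1                         # spread of the middle 50%

print(q1, q2, q3, data_range, iqr)
```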
reference lines
can show the average, median, quartiles, or a constant line; very helpful for distinguishing between categories by comparing them against a (numerical) target value
covariance
measure of how much two random variables change together, indicates positive or negative direction
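As an illustration, the sample covariance can be computed directly from its definition. The function name `sample_covariance` and the data are hypothetical; dividing by n - 1 is the sample (not population) convention.

```python
from statistics import mean

def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

ad_spend = [1, 2, 3, 4, 5]
revenue_up = [2, 4, 6, 8, 10]    # rises with ad_spend
revenue_down = [10, 8, 6, 4, 2]  # falls as ad_spend rises

print(sample_covariance(ad_spend, revenue_up))    # positive direction
print(sample_covariance(ad_spend, revenue_down))  # negative direction
```

The sign gives the direction of the relationship; the magnitude depends on the units of both variables, so it is hard to interpret on its own.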
skewness
asymmetry of a distribution, determined by where the extreme values lie relative to the bulk of the observations
positive skew
most values are small and the extreme values stretch the tail to the right (the curve slopes downward to the right)
negative skew
most values are large and the extreme values stretch the tail to the left (the curve slopes upward from the left)
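One way to check the direction of skew numerically is the moment coefficient of skewness, sketched below. The function name `skewness` and both data sets are hypothetical; this uses population moments, and other software may apply a small-sample correction.

```python
from statistics import mean, median

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]     # one extreme high value
left_tailed = [-x for x in right_tailed]     # mirror image: extreme low value

print(skewness(right_tailed) > 0)  # positive skew
print(skewness(left_tailed) < 0)   # negative skew
print(mean(right_tailed) > median(right_tailed))  # mean pulled toward the tail
```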
histogram
we use this to represent the shape of distribution
box and whisker plot
we use this to represent variability
numerical variables
can display all of the numerical measures described above: quartiles, standard deviation, covariance
categorical variables
we can use different segmentation and comparison across the category to examine relationships: days of week, quality ratings, segments of business
crosstabulation
displays the relationship between two variables in a table: frequency, proportions/marginals, comparisons using numerical variable, visualizations
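A small crosstabulation can be sketched with the standard library alone. The `records` list (day of week, quality rating) is hypothetical, and the variable names are illustrative.

```python
from collections import Counter

# Hypothetical survey records: (day of week, quality rating).
records = [
    ("Mon", "good"), ("Mon", "poor"), ("Mon", "good"),
    ("Tue", "good"), ("Tue", "good"),
    ("Wed", "poor"), ("Wed", "poor"), ("Wed", "good"),
]

counts = Counter(records)                             # cell frequencies
row_totals = Counter(day for day, _ in records)       # row marginals
col_totals = Counter(rating for _, rating in records) # column marginals

days = sorted(row_totals)
ratings = sorted(col_totals)

# Row proportions: share of each rating within a given day.
row_share = {(d, r): counts[(d, r)] / row_totals[d] for d in days for r in ratings}

print("day", ratings, "total")
for d in days:
    print(d, [counts[(d, r)] for r in ratings], row_totals[d])
```

Each printed row holds the frequencies for one day, followed by its marginal total; `row_share` holds the same table as proportions.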
stacked bar chart
unique way (similar to pie chart) to display crosstabulation across multiple categories
variables
service or product characteristics that can be measured, such as weight, length, volume, or time
attributes
service or product characteristics that can be quickly counted for acceptable performance
decrease variation
larger sample sizes tend to __________ ___________
common causes
are random, unidentifiable sources of variation that are unavoidable with the current process
assignable causes
any variation causing factors that we can identify and eliminate
statistical control
a process is in _________ ___________ when the location, spread, or shape of its distribution does not change over time
r chart
monitors process variability
x-bar chart
checks whether the process output is, on average, consistent with a target value, or whether current performance is consistent with past performance
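A sketch of how control limits for the range chart and the mean chart are commonly computed from sample ranges. The `samples` data are hypothetical, and the factors `A2`, `D3`, `D4` are the values usually tabulated for samples of size 4; treat them as an assumption and check them against your own table.

```python
from statistics import mean

# Hypothetical samples, each with n = 4 measurements.
samples = [
    [5.0, 5.2, 4.9, 5.1],
    [5.1, 5.0, 5.0, 4.8],
    [4.9, 5.3, 5.0, 5.2],
]

# Control-chart factors for n = 4 (assumed from a standard table).
A2, D3, D4 = 0.729, 0.0, 2.282

sample_means = [mean(s) for s in samples]
sample_ranges = [max(s) - min(s) for s in samples]

x_double_bar = mean(sample_means)  # center line of the mean chart
r_bar = mean(sample_ranges)        # center line of the range chart

# Range chart limits (process variability).
ucl_r, lcl_r = D4 * r_bar, D3 * r_bar
# Mean chart limits (process average).
ucl_x, lcl_x = x_double_bar + A2 * r_bar, x_double_bar - A2 * r_bar

print(lcl_r, r_bar, ucl_r)
print(lcl_x, x_double_bar, ucl_x)
```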
p-chart
a chart used for controlling the proportion of defective services or products generated by the process
c-chart
a chart used for controlling the number of defects when more than one defect can be present in a service or product
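The usual limits for these two attribute charts can be sketched as follows. The function names and inputs are hypothetical; `z=3.0` assumes three-sigma limits, and a negative lower limit is clamped to zero.

```python
from math import sqrt

def p_chart_limits(defectives, n, z=3.0):
    """(LCL, center, UCL) for the proportion defective, samples of size n."""
    p_bar = sum(defectives) / (n * len(defectives))
    sigma_p = sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - z * sigma_p), p_bar, p_bar + z * sigma_p

def c_chart_limits(defect_counts, z=3.0):
    """(LCL, center, UCL) for the number of defects per unit."""
    c_bar = sum(defect_counts) / len(defect_counts)
    sigma_c = sqrt(c_bar)  # standard deviation equals sqrt of the mean count
    return max(0.0, c_bar - z * sigma_c), c_bar, c_bar + z * sigma_c

# Hypothetical data: defectives found in four samples of 100 units each,
# and defects counted on four inspected units.
print(p_chart_limits([4, 6, 5, 5], n=100))
print(c_chart_limits([3, 5, 4, 4]))
```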
alternative hypothesis
finding the defendant guilty based on the evidence
null hypothesis
the defendant is presumed not guilty; we keep this position when we do not find enough evidence to find the defendant guilty
p-value
the smaller, the stronger evidence against the null hypothesis
z-test
comparing the means of two subgroups where the variance of both subgroups is known
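A minimal sketch of a two-sample z-test with a two-sided p-value, using `statistics.NormalDist` from the standard library. The function name `two_sample_z`, the data, and the "known" variances of 4.0 are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_sample_z(a, b, var_a, var_b):
    """z statistic and two-sided p-value when both population variances are known."""
    se = sqrt(var_a / len(a) + var_b / len(b))  # standard error of the difference
    z = (mean(a) - mean(b)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

group_a = [52, 55, 53, 54, 56]
group_b = [48, 50, 49, 47, 51]

z, p_value = two_sample_z(group_a, group_b, var_a=4.0, var_b=4.0)
print(z, p_value)  # a small p-value is strong evidence against the null hypothesis
```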
t-test
comparing the means of two subgroups where the variance of the subgroups is unknown and must be estimated from the samples
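When the variances are unknown, they are estimated from the samples. The sketch below computes the t statistic in Welch's form (which does not assume equal variances; a pooled-variance form also exists) together with its approximate degrees of freedom. The function name `welch_t` and the data are hypothetical, and turning the statistic into a p-value requires a t distribution, which is left out here.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # estimated variance of each mean
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

group_a = [52, 55, 53, 54, 56]
group_b = [48, 50, 49, 47, 51]

t_stat, df = welch_t(group_a, group_b)
print(t_stat, df)
```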
ANOVA
comparing the means of three or more subgroups
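The F statistic behind one-way ANOVA compares variation between group means with variation within groups. A sketch under assumed data follows; the function name `anova_f` is hypothetical, and a p-value would additionally require the F distribution.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)

    # Between-group sum of squares: group size times squared distance of each group mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared distance of each value from its group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = anova_f(groups)
print(f_stat)
```

A large F means the group means differ by more than the within-group noise would explain.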
normal distribution (try a natural-log transform if not), equal variances (Levene’s test)
assumptions of anova include
kruskal-wallis test
comparing three or more subgroups when the ANOVA assumptions are not met (nonparametric, based on ranks)
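Because this test works on ranks rather than raw values, its H statistic can be sketched in a few lines. The function names are hypothetical; tied values receive their average rank, and no tie correction is applied to H, which is a simplification.

```python
def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), without tie correction."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_i for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h_stat)
```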