statistics
the science of using an assortment of methods to systematically collect, organize, summarize, analyze, and interpret information
descriptive statistics
tabular, graphical, or numerical summaries of data for a particular group
statistical inference
using data collected from a sample in order to make estimates and test hypotheses about the characteristics of a larger population
population
the set of all elements of interest in a particular study
census
collecting data for the entire population
sample
subset of the population
sample survey
collecting data for a sample
data
information that we collect and analyze
data set
all of the data that is collected for a study
elements
the subjects of a study; entities on which data is collected
variable
attribute of the subjects/elements we are interested in studying
observation
set of all measurements collected for one subject/element
total data values
elements x variables
measurement scale
the nature of the values that are assigned to variables
nominal
categorical variables; do not indicate ranking (ex. gender, zip code)
ordinal
ranked data; distances between ranks are not equal or not known (ex. socioeconomic class, TRACE scale)
interval
always numeric; distances between adjacent values are equal; no absolute zero (0 doesn’t mean “absence of”) (ex. temperature in °C or °F, dress size)
ratio
lowest value is always zero; has an absolute zero (0 means “absence of”) (ex. age)
qualitative
nominal or ordinal; use words (or rank number) to describe subjects/elements
quantitative
interval or ratio; use numbers to describe subjects/elements
continuous
variables can take on values between whole numbers (fractions/decimals)
discrete
usually only take on whole number values (except shoe size)
experiment
a variable is specifically manipulated by the researcher
constant
a characteristic of elements/subjects that does not vary from one subject to the next
control variable
held constant in a research study by observing only one of its levels
observational research
levels of independent variable already exist (ex. gender, age); cannot make causal statements; looks for relationships between some set of variables
cross-sectional studies
provide a “snapshot” of different groups at one point in time
time series studies
longitudinal; use data that are collected on the same subjects/elements over several points in time; observe changes over time
effects
changes in data patterns
cyclical effect
any usual/consistent variation in daily, weekly, monthly, or annual data not related to change in season
seasonal effect
change in data that can be explained by/attributed to annual calendar-related events
irregular effect
any change in the data that is not related to a regular cycle or season; caused by unusual events
business analytics
the use of data, technology, and statistical analysis to answer questions
descriptive analytics
use of data to understand past and current business performance
predictive analytics
use of historical data to identify patterns or relationships and to make predictions about what will happen in the future
prescriptive analytics
identify the best alternatives to minimize or maximize some objective
parameters
Greek letters representing descriptive measures of a population
sample statistics
Roman letters representing descriptive measures of a sample
random sample
each member of population has an equal and independent chance of being included in the sample
sampling bias
when a sample is collected in a way that results in some members of the population being more or less likely to be included than others
raw data
data that has not been organized or summarized in any way
frequency distribution
the list of all frequencies for all categories
relative frequency distribution
the proportion of the observations that belong to a category; f/n
percent frequency distribution
the percent of the observations that belong to a category; f/n x 100
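A minimal sketch of the three frequency distributions above, assuming a small made-up list of categorical responses (the data and variable names are illustrative only):

```python
from collections import Counter

# Hypothetical raw data: one categorical observation per element
responses = ["yes", "no", "yes", "undecided", "yes", "no"]

n = len(responses)
frequency = Counter(responses)                             # f for each category
relative = {c: f / n for c, f in frequency.items()}        # f / n
percent = {c: 100 * f / n for c, f in frequency.items()}   # f / n x 100

for category in frequency:
    print(category, frequency[category],
          round(relative[category], 3), round(percent[category], 1))
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any hand-built table.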
class intervals
data divided into sets with equal widths
class midpoint
the value half way between the upper and lower limit of an interval
cumulative frequency distribution
total number of items that have values less than or equal to the upper limit of each class
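Class intervals, midpoints, and cumulative frequencies can be sketched together. The scores and the choice of five classes of width 10 below are assumptions made for illustration:

```python
# Hypothetical whole-number scores and equal-width class intervals
scores = [52, 58, 61, 64, 67, 70, 71, 75, 78, 83, 88, 94]
lower, width, n_classes = 50, 10, 5    # classes 50-59, 60-69, ..., 90-99

cumulative = 0
table = []
for i in range(n_classes):
    lo = lower + i * width
    hi = lo + width - 1                 # upper limit of the class (integer data)
    f = sum(lo <= s <= hi for s in scores)   # frequency of the class
    cumulative += f                     # items less than or equal to the upper limit
    midpoint = (lo + hi) / 2            # halfway between the lower and upper limit
    table.append((lo, hi, midpoint, f, cumulative))

for row in table:
    print(row)
```

The last cumulative frequency should equal the number of observations.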
data visualization
the process of displaying data meaningfully in order to improve decision-making
dashboard
visually summarizes key business information
line charts
display data over time
pie chart
used to display relative frequency or percentage distributions
bar chart
used to visually present qualitative data; separated bars
histograms
display frequency distributions of quantitative variables; no spaces between bars
frequency polygon
points used to depict frequency for each class interval
scatter plot
displays relationship between two quantitative variables
trendline
depicts general direction of the relationship between variables
positive relationship
as x increases, y increases
negative relationship
as x increases, y decreases
symmetrical distribution
similar on both sides
negatively skewed distribution
skewed left; most data fall at upper end
positively skewed distribution
skewed right; most data fall at lower end
skewness
the measure of the symmetry of a data distribution
kurtosis
the measure of how peaked or flat a data distribution is relative to a normal distribution
excess kurtosis
kurtosis - 3
mesokurtic
kurtosis = 3, excess = 0
leptokurtic
kurtosis > 3, excess > 0
platykurtic
kurtosis < 3, excess <0
mildly skewed rule of thumb
skewness between -.5 and +.5
moderately skewed rule of thumb
skewness between -.5 and -1 or between +.5 and +1
highly skewed rule of thumb
skewness less than -1 or greater than +1
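A rough sketch of skewness, kurtosis, and the rules of thumb above. It assumes the simple moment-based (population) formulas; statistical software often applies sample-size corrections, so its values can differ slightly. The function names are hypothetical.

```python
def shape_measures(data):
    """Return (skewness, kurtosis, excess kurtosis) from central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis, kurtosis - 3

def skew_label(skewness):
    """Apply the rule-of-thumb cutoffs at 0.5 and 1 (in absolute value)."""
    size = abs(skewness)
    if size <= 0.5:
        return "mildly skewed"
    if size <= 1:
        return "moderately skewed"
    return "highly skewed"

def kurtosis_label(excess, tol=1e-9):
    """Classify by the sign of excess kurtosis."""
    if abs(excess) <= tol:
        return "mesokurtic"
    return "leptokurtic" if excess > 0 else "platykurtic"

skew, kurt, excess = shape_measures([1, 1, 1, 2, 10])   # hypothetical data
print(skew_label(skew), kurtosis_label(excess))
```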
standard error calculation
plus or minus 3 times the standard error of skewness
mode
value with highest frequency of occurrence; unaffected by outliers
median
the middlemost value when arranged in ascending order; not affected by outliers
median index
(n+1)/2
mean
best measure for normal data; can be pulled in direction of outliers
positive skew central tendencies
mode < median < mean
negative skew central tendencies
mean < median < mode
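A small illustration of the three measures of central tendency, using Python's standard `statistics` module on made-up data with one large outlier. The median position uses the (n+1)/2 rule; when n is even the position ends in .5 and the median is the average of the two middle values.

```python
import statistics

# Hypothetical data with one large outlier (positively skewed)
data = [2, 3, 3, 3, 4, 5, 6, 9, 19]

ordered = sorted(data)
n = len(ordered)
median_position = (n + 1) / 2            # 1-based position in the ordered data

mode = statistics.mode(data)             # most frequent value
median = statistics.median(data)         # middlemost value
mean = statistics.mean(data)             # pulled toward the outlier

print(median_position, mode, median, mean)
```

Here the outlier pulls the mean above the median, giving the positive-skew ordering mode < median < mean.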
mean of the means
if groups are the same size use ___
weighted mean
used to calculate the mean of two or more groups when their sample sizes are not equal
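A short sketch contrasting the mean of the means with the weighted mean, assuming two hypothetical groups of unequal size:

```python
# Hypothetical group means and sample sizes
group_means = [80.0, 90.0]
group_sizes = [10, 30]

# Mean of the means: only appropriate when the groups are the same size
mean_of_means = sum(group_means) / len(group_means)

# Weighted mean: each group mean is weighted by its sample size
weighted_mean = (
    sum(m * n for m, n in zip(group_means, group_sizes)) / sum(group_sizes)
)

print(mean_of_means, weighted_mean)
```

With unequal group sizes the two results differ, and the weighted mean matches the mean of all 40 observations pooled together.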
percentiles
divide rank-ordered data into 100 equal parts
quartiles
divide rank-ordered data into 4 equal parts
measures of variability
describe the spread of the data around the center
range
largest value - smallest value; sensitive to outliers
Interquartile Range
Q3-Q1; middle half of data
box and whisker plot
shows the center, the spread, and outliers of a data distribution
five number summary
smallest value (within inner fences), Q1, Q2, Q3, largest value (within inner fences)
inner fences
Q1 - (1.5 x IQR) and Q3 + (1.5 x IQR)
outer fences
Q1 - (3 x IQR) and Q3 + (3 x IQR)
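The quartile, IQR, and fence definitions above can be sketched as follows. Quartile conventions differ between textbooks and software; the "inclusive" method of `statistics.quantiles` is an assumption here, and the data are made up.

```python
import statistics

# Hypothetical data with one unusually large value
data = sorted([3, 5, 7, 8, 9, 11, 13, 15, 40])

# Assumed quartile convention: the "inclusive" interpolation method
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                                    # spread of the middle half

inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)         # inner fences
outer = (q1 - 3 * iqr, q3 + 3 * iqr)             # outer fences

inside = [x for x in data if inner[0] <= x <= inner[1]]
outliers = [x for x in data if x < inner[0] or x > inner[1]]
extreme = [x for x in data if x < outer[0] or x > outer[1]]

# Five number summary uses the smallest and largest values within the inner fences
five_number = (min(inside), q1, q2, q3, max(inside))

print(five_number, inner, outer, outliers, extreme)
```

In a box and whisker plot of these data, the whiskers would stop at 3 and 15 and the value 40 would be drawn as a separate point.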
middle 50% of data positively skewed
if median in box is on the left side; the ___
outer 50% of data positively skewed
if longest whisker is to the right of the box; the ___
deviation score
the distance from any score in the data to the mean of the distribution
zero
deviation scores add up to ___
sum of squares
sum of squared deviation scores ; total variation in the data set
population variance
sum of squares divided by N; average variability
sample variance
sum of squares divided by n-1; average variability
standard deviation
square root of variance; average deviation around the mean of a distribution
degrees of freedom
the number of scores that are free to vary
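A worked sketch of deviation scores, sum of squares, variance, and standard deviation on a small made-up data set. The hand calculations are compared against Python's `statistics` module.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]           # hypothetical scores
n = len(data)
mean = sum(data) / n

deviations = [x - mean for x in data]      # deviation scores; they add up to zero
sum_of_squares = sum(d ** 2 for d in deviations)

population_variance = sum_of_squares / n   # divide by N
sample_variance = sum_of_squares / (n - 1) # divide by n - 1 (degrees of freedom)

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(sum(deviations), sum_of_squares, population_variance, sample_variance)
```

Dividing by n - 1 reflects the degrees of freedom: once the mean is fixed, only n - 1 of the deviation scores are free to vary, because the last one is forced by the requirement that they sum to zero.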
empirical rule
applies to normal data; about 68% within 1 standard deviation of the mean, 95% within 2 standard deviations, 99.7% within 3 standard deviations
chebyshev’s theorem
applies to any data set, including skewed data; at least 1 - 1/k² of the data lie within k standard deviations of the mean (k > 1)
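Chebyshev's theorem gives a guaranteed minimum: at least 1 - 1/k² of the data lie within k standard deviations of the mean. A minimal sketch, with a hypothetical function name, comparing that bound with the empirical rule percentages:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

# Empirical rule percentages (normal data only), for comparison
empirical_rule = {2: 0.95, 3: 0.997}

for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 3), empirical_rule[k])
```

The Chebyshev bound is lower than the empirical rule value because it has to hold for every distribution shape, not just the normal one.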
coefficient of variation
how large the standard deviation is relative to the mean
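The coefficient of variation is commonly computed as (standard deviation / mean) x 100. A short sketch, assuming sample data and two made-up data sets with the same spread but different means:

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(data) / statistics.mean(data) * 100

# Hypothetical data sets: same standard deviation, different means
small_values = [10, 12, 14]
large_values = [100, 102, 104]

cv_small = coefficient_of_variation(small_values)
cv_large = coefficient_of_variation(large_values)

print(round(cv_small, 2), round(cv_large, 2))
```

Both sets have a standard deviation of 2, but the variation is much larger relative to a mean of 12 than to a mean of 102.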