random variable
a quantity with values not known with certainty
Variation
difference in a variable measured over observations.
frequency distribution
describes the values of a variable and how often they appear in the data.
categorical variable
data consist of labels or names for which arithmetical manipulation is impossible.
quantitative variable
data consist of numerical values for which arithmetical manipulation is possible.
Ways to make a frequency distribution
Pivot table
COUNTIF
relative frequency of a bin
equals the proportion of items belonging to that bin
Relative Freq. of a bin = Freq. of a bin / n, where n is the total number of observations
percent frequency of a bin
the relative frequency times 100.
Percent Freq. of a bin = 100 ∙ Relative Freq. of a bin
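The frequency, relative frequency, and percent frequency definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the Excel workflow the cards describe: the function name `frequency_table` and the sample data are made up for the example, and each distinct label is treated as its own bin.

```python
from collections import Counter

def frequency_table(values):
    """Return {label: (frequency, relative frequency, percent frequency)}."""
    n = len(values)            # total number of observations
    counts = Counter(values)   # frequency of each bin (here, each distinct label)
    return {
        label: (freq, freq / n, 100 * freq / n)
        for label, freq in counts.items()
    }

# Hypothetical sample of a categorical variable
colors = ["red", "blue", "red", "green", "red", "blue", "red", "green"]
table = frequency_table(colors)
for label, (freq, rel, pct) in table.items():
    print(f"{label}: freq={freq}, relative={rel:.3f}, percent={pct:.1f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick sanity check on any frequency distribution.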
A probability distribution
characterizes the variability of a random variable.
Benford’s Law
states that in many data sets, the proportion of observations in which the first digit is from 1 to 9, respectively, follows a fixed, decreasing distribution: a first digit of 1 appears about 30% of the time, while 9 appears less than 5% of the time.
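The expected Benford proportions can be computed from the standard formula P(d) = log10(1 + 1/d) for first digit d. The short sketch below tabulates them; the function name `benford_proportion` is chosen only for this example.

```python
import math

def benford_proportion(d):
    """Expected proportion of observations whose first digit is d (1..9)."""
    return math.log10(1 + 1 / d)

benford = {d: benford_proportion(d) for d in range(1, 10)}
for d, p in benford.items():
    print(f"first digit {d}: {p:.3f}")
```

Comparing these expected proportions with the relative frequencies of first digits in a real data set is one common way to use the law.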
A histogram
column chart with no spaces between the columns.
Kernel density chart
is a continuous alternative to a histogram.
It employs a smoothing technique known as kernel density estimation.
Not available in Excel.
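Since Excel does not offer this chart, here is a rough sketch of the idea behind kernel density estimation in plain Python. It assumes a Gaussian kernel and a hand-picked bandwidth, both of which are choices made for this illustration; the data and the name `gaussian_kde` are hypothetical.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density of `data` at any point x."""
    n = len(data)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Each observation contributes a small bell curve centered on itself.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density

scores = [52, 61, 64, 70, 71, 73, 75, 78, 80, 88]  # hypothetical data
density = gaussian_kde(scores, bandwidth=5.0)

grid = list(range(20, 121))          # points at which to evaluate the curve
curve = [density(x) for x in grid]   # heights of the smoothed curve
```

Plotting `curve` against `grid` as a line gives the smooth counterpart of a histogram. A larger bandwidth produces a smoother curve, and a smaller one follows the data more closely.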
Skewness
represents the lack of symmetry in a quantitative distribution.
skewed left histogram
the left tail extends farther than the right one (example: exam scores)
symmetric histogram
the two tails mirror each other
skewed right histogram
the right tail extends farther than the left one (example: housing prices)
a highly skewed histogram
one of the tails extends much farther than the other one (example: data on wealth and salaries are usually highly skewed right)
frequency polygon
is a visualization tool useful for comparing distributions
Like a histogram, a frequency polygon plots
count of observations in a set of bins
Unlike a histogram, a frequency polygon uses
lines instead of columns to connect the counts of different bins.
In Excel, use the ___ option to create a frequency polygon.
Line Chart
trellis display
a vertical or horizontal arrangement of individual charts of the same type, size, scale, and formatting that differ only by the data they display.
A trellis display can be useful
when comparing three or more distributions that would otherwise appear cluttered if plotted using several frequency polygons on the same chart.
Using the standard deviation to describe variability (for approximately bell-shaped distributions):
≈ 68% of data values lie within one standard deviation of the mean
≈ 95% of data values lie within two standard deviations of the mean
≈ 99.7% of data values lie within three standard deviations of the mean
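These three percentages can be checked against the normal distribution using Python's standard library. The sketch assumes the data follow a normal (bell-shaped) distribution; the helper name `share_within` is made up for the example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

For strongly skewed data these shares can differ noticeably, so the rule is best treated as a guide for roughly symmetric, bell-shaped distributions.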
Violin chart
advanced visualization that combines the statistical descriptors of a box and whisker chart with a rotated and mirrored kernel density chart.
Statistical inference
is the process of collecting sample data to make estimates of or draw conclusions about one or more characteristics of a population.
confidence interval
an interval estimate of a parameter, such as the mean or the proportion of a population of interest.
margin of error
represents the uncertainty on the parameter estimate at a given confidence level, such as 95% or 99%.
confidence interval on a mean:
sample mean ± margin of error
confidence interval on a proportion:
sample proportion ± margin of error
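As a worked illustration of "sample proportion ± margin of error", the sketch below uses the common normal-approximation margin z * sqrt(p(1 - p) / n). The cards do not say which formula is intended, so treat this choice, the function name `proportion_ci`, and the numbers as assumptions.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval on a proportion."""
    p_hat = successes / n                               # sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    margin = z * sqrt(p_hat * (1 - p_hat) / n)          # margin of error
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 120 of 400 respondents answer "yes"
low, high = proportion_ci(successes=120, n=400)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

The approximation works best with reasonably large samples and proportions that are not too close to 0 or 1.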
The margin of error for a confidence interval on a mean depends on three factors:
1) The confidence level
2) The variability of sample values (the standard deviation)
3) The sample size
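The three factors can be seen together in one commonly used form of the margin of error, z * s / sqrt(n). This is a sketch under stated assumptions: it uses a normal critical value z for simplicity (for small samples a t critical value is usually preferred, but the standard library does not provide one), and the function name and sample data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def margin_of_error(sample, confidence=0.95):
    """z-based margin of error for a confidence interval on a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1) confidence level
    s = stdev(sample)                                   # 2) variability of sample values
    n = len(sample)                                     # 3) sample size
    return z * s / sqrt(n)

sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]  # hypothetical data
m95 = margin_of_error(sample, 0.95)
m99 = margin_of_error(sample, 0.99)
print(f"95% CI: {mean(sample) - m95:.2f} to {mean(sample) + m95:.2f}")
print(f"99% CI: {mean(sample) - m99:.2f} to {mean(sample) + m99:.2f}")
```

A higher confidence level or more variable data widens the interval, while a larger sample narrows it.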
Time series data
is a sequence of observations on a variable measured at successive points in time.
A time series chart
is a line chart with the time unit displayed on the horizontal axis and the values of the variable on the vertical axis.
percent frequency distribution
probability distribution
in a histogram, column height represents the
frequency of the corresponding bin
in a histogram, the absence of space between columns
reflects the continuous nature of the variable of interest