The distribution of a categorical variable
Given in the form of a table with
each possible category
Frequency (or number) of individuals who fall into each category
Relative frequency (or percentage) of individuals who fall into each category
The relative frequency for a particular category
The proportion of observations in the data set that fall into that category, often expressed as a percentage.
Formula for relative frequency
Frequency/number of observations
Percentage of relative frequency
Relative frequency x 100%
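As a minimal sketch of the formulas above, the following builds a frequency table for a categorical variable. The function name `frequency_table` and the sample data are hypothetical, chosen only for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Map each category to (frequency, relative frequency, percentage)."""
    counts = Counter(values)   # frequency of each category
    n = len(values)            # total number of observations
    return {
        category: (count, count / n, 100 * count / n)
        for category, count in counts.items()
    }

# Hypothetical sample: 10 answers to a "favorite color" question.
colors = ["red", "blue", "red", "green", "blue",
          "red", "red", "green", "blue", "red"]
table = frequency_table(colors)

for category, (freq, rel, pct) in table.items():
    print(f"{category}: frequency={freq}, relative={rel:.2f}, percent={pct:.0f}%")
```

The relative frequencies always sum to 1 and the percentages to 100%, which is a quick check on a hand-built table.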
A frequency distribution table can be displayed as
A bar chart or a pie chart.
Bar chart
A graph of the distribution of a categorical variable showing the counts for each category next to each other.
Effectively shows the frequency or percentage in each category.
Pie chart
Shows the relationship between the parts and the whole using slices.
Use only when you want to emphasize each category’s relation to the whole.
Slice size formula
Category relative frequency x 360 degrees
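The slice size formula can be sketched as follows. The function name `slice_angles` and the counts are hypothetical.

```python
def slice_angles(frequencies):
    """Return the pie-chart angle (in degrees) for each category."""
    total = sum(frequencies.values())
    # Slice angle = relative frequency x 360 degrees.
    return {cat: (freq / total) * 360 for cat, freq in frequencies.items()}

# Hypothetical category counts.
counts = {"red": 5, "blue": 3, "green": 2}
angles = slice_angles(counts)

for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```

Because the relative frequencies sum to 1, the slice angles sum to 360 degrees.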
Contingency table
Shows how individuals are distributed along each variable, contingent on the value of the other variable.
Used to explore the relationship between two categorical variables.
Marginal distribution
The frequency distribution of a single variable in a contingency table, ignoring the other variable.
The distribution of either the row variable or the column variable, read from the row or column totals.
Conditional distribution
Shows the distribution of one variable for just the observations that satisfy a condition on another variable.
Dependent variables
If the conditional distribution of one variable is not the same for each category of another.
There is an association between these variables.
Independent variables
If the conditional distribution of one variable in a contingency table is the same for each category of another.
There is no association between these variables.
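The marginal and conditional distributions, and the comparison used to judge dependence, can be sketched on a small contingency table. All names and counts below are hypothetical. In real samples the conditional distributions are rarely exactly equal, so the exact comparison in `has_association` is only an illustration of the definition.

```python
# Hypothetical contingency table: rows = group, columns = answer.
table = {
    "A": {"yes": 30, "no": 20},
    "B": {"yes": 15, "no": 35},
}

def grand_total(table):
    return sum(sum(row.values()) for row in table.values())

def row_marginal(table):
    """Marginal distribution of the row variable (row totals / grand total)."""
    total = grand_total(table)
    return {r: sum(row.values()) / total for r, row in table.items()}

def column_marginal(table):
    """Marginal distribution of the column variable (column totals / grand total)."""
    total = grand_total(table)
    sums = {}
    for row in table.values():
        for c, count in row.items():
            sums[c] = sums.get(c, 0) + count
    return {c: s / total for c, s in sums.items()}

def conditional_on_row(table, r):
    """Distribution of the column variable among observations in row r only."""
    row_total = sum(table[r].values())
    return {c: count / row_total for c, count in table[r].items()}

def has_association(table, tol=1e-9):
    """True if the conditional distributions differ between rows."""
    rows = list(table)
    first = conditional_on_row(table, rows[0])
    return any(
        abs(conditional_on_row(table, r)[c] - first[c]) > tol
        for r in rows[1:]
        for c in first
    )

print(row_marginal(table))
print(column_marginal(table))
print(conditional_on_row(table, "A"), conditional_on_row(table, "B"))
print(has_association(table))
```

Here 60% of group A answered yes but only 30% of group B did, so the conditional distributions differ and the variables are dependent.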
Mean
The familiar arithmetic average of a set of numerical observations: the sum of the observations divided by n. The balance point of the distribution.
Is not resistant to outliers and skewness.
May not represent the “center” of the data very well when the distribution is skewed or contains outliers.
In a symmetrical distribution (with a single peak)
Mean = median = mode
In a positively (right) skewed distribution
Typically, mean > median > mode
In a negatively (left) skewed distribution
Typically, mean < median < mode
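A small illustration of these orderings, and of the mean's lack of resistance to outliers, using Python's standard `statistics` module. The data values are hypothetical and chosen to be right skewed.

```python
import statistics

# Hypothetical right-skewed data: most values are small, a few are large.
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
print(f"mean={mean}, median={median}, mode={mode}")

# Replace the largest value with an extreme one: the mean moves a lot,
# while the median stays where it was.
with_outlier = data[:-1] + [150]
print(f"mean={statistics.mean(with_outlier)}, "
      f"median={statistics.median(with_outlier)}")
```

For this sample the mean exceeds the median, which exceeds the mode, matching the right-skewed pattern.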
Range
Max - min
Upper fence
Every measurement above this fence is an outlier.
Upper fence formula
Q3 + 1.5 x IQR
Lower fence
Every measurement beneath this fence is an outlier.
Lower fence formula
Q1 - 1.5 x IQR
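The two fence formulas can be sketched as below. Q1 and Q3 are passed in directly because textbooks and software differ on how quartiles are computed. The function names, quartile values, and measurements are hypothetical, chosen for easy arithmetic.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(values, q1, q3):
    """Return the measurements that fall outside the fences."""
    lower, upper = fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

# Hypothetical quartiles and measurements.
q1, q3 = 10, 20
lower, upper = fences(q1, q3)   # IQR = 10, so the fences are -5 and 35
data = [2, 9, 12, 15, 18, 21, 40]

print(lower, upper)
print(outliers(data, q1, q3))
```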
Symmetric distribution
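In a boxplot, the median line is in the center of the box and the whiskers are of roughly equal length.
Skewed right
In a boxplot, the median line is left of center in the box and the right whisker is long.
Skewed left
In a boxplot, the median line is right of center in the box and the left whisker is long.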
Numerical summaries
Meaningful numbers that preserve the relevant features of the data set so that you can draw useful conclusions.
y
The variable for which we have sample data (variable of interest).
n
Sample size = number of observations of the variable y
y1
The first sample observation of the variable y
y2
The second sample observation of the variable y
yn
The nth sample observation of the variable y.
Median (M)
The value that divides the ordered sample into two sets of the same size; one half of the data lies below M, and the other half lies above M.
Mode
The value that occurs with the highest frequency in a data set. The value where the distribution is the tallest.
Deviations
The difference between each observation yi and the sample mean ȳ (y bar): yi - ȳ.
Variance
Denoted by s², the sum of squared deviations from the mean divided by n - 1.
Standard deviation (s)
The square root of the variance.
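The definitions of deviations, variance, and standard deviation can be traced step by step in a short sketch. The function names and data are hypothetical.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    deviations = [y - mean for y in values]   # each observation minus the mean
    return sum(d ** 2 for d in deviations) / (n - 1)

def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Hypothetical sample with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
# Deviations: -3, -1, -1, -1, 0, 0, 2, 4; their squares sum to 32.
print(f"variance = {sample_variance(data):.3f}")   # 32 / 7
print(f"std dev  = {sample_std(data):.3f}")
```

The deviations always sum to zero, which is why they are squared before being averaged.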
Interquartile range
The range of the middle half of the data. Like the median, it is resistant to outliers. It is based on quartiles, which divide the ordered data into four equal parts.
pth percentile
The value such that p% of the measurements fall below it and (100 - p)% fall above it.
Lower quartile (Q1)
The 25th percentile, which separates the bottom 25% of the measurements from the top 75%.
Upper quartile (Q3)
The 75th percentile that separates the top 25% of the measurements from the bottom 75%.
Interquartile Range (IQR) formula
Q3 - Q1
Five number summary
For a set of measurements, consists of the minimum, the first quartile, the median, the third quartile, and the maximum. These numbers give a good summary of a distribution of quantitative observations.
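A minimal sketch of a five number summary using Python's standard `statistics` module. The choice of `method="inclusive"` is an assumption: several quartile conventions exist, and a textbook or calculator may give slightly different Q1 and Q3 for the same data. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, statistics.median(values), q3, max(values))

# Hypothetical measurements.
data = [2, 9, 12, 15, 18, 21, 40]
summary = five_number_summary(data)
minimum, q1, median, q3, maximum = summary

print(summary)
print("IQR =", q3 - q1)
```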
Boxplot
The visual representation of a data set built from the five number summary. A powerful graphical tool for summarizing data. It shows the center, the spread, and the symmetry or skewness at the same time. These diagrams are particularly useful when comparing groups.
Time plots
A time plot of a variable plots each observation against the time at which it was measured. Time is always placed on the horizontal axis.