variability (what is it?)
Quantitative measure of the degree to which scores in a distribution are spread out or clustered together
range (what is it? Xmax? Xmin?)
range = the difference between the largest and smallest score
Xmax: largest score
Xmin: smallest score
exclusive range (what is it? how is it measured? most common range variation? does "exclusive" need to be specified when discussing "range"? Be able to calculate it for a given distribution)
Most common version; it is the one assumed when "range" is used without qualification, so "exclusive" does not need to be specified
(Xmax - Xmin)
computes difference between highest and lowest values in distribution
inclusive range (what is it? how is it measured? when is it useful? does "inclusive" need to be specified when discussing "range"? Be able to calculate it for a given distribution)
Computes number of values included between highest and lowest values
Xmax - Xmin + 1
Useful for discrete values
If the data suggest we should emphasize the number of values, we often use the inclusive range; because it is not the default, we must specify that the inclusive range is being used
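A minimal sketch of both versions in R, assuming a made-up vector of discrete scores named VecRaw (the data are illustrative only):

```r
# Hypothetical discrete scores (illustrative data only)
VecRaw <- c(2, 5, 3, 8, 6)

# Exclusive range: Xmax - Xmin
ExclusiveRange <- max(VecRaw) - min(VecRaw)       # 8 - 2 = 6

# Inclusive range: Xmax - Xmin + 1
# (counts the whole-number values 2, 3, 4, 5, 6, 7, 8)
InclusiveRange <- max(VecRaw) - min(VecRaw) + 1   # 7

print(c(exclusive = ExclusiveRange, inclusive = InclusiveRange))
```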
advantages / limitations of the range?
Advantage: simple to calculate and easy to understand
Limitations: determined by 2 extreme values and ignores every other score in distribution --> often fails to give accurate measure of variability
range generally considered crude and unreliable measure of variability.
quartiles (what are they? what measures of variability are they used to compute? how are they computed?)
Divides distributions into 4 equal parts
Used to compute interquartile range and semi-interquartile range
How to compute: First find median (Q2), first quartile (Q1) is median of lower half of distribution, and third quartile (Q3) is median of upper half
Lower half = everything at/below median (Q2)
Upper half = everything at/above median (Q2)
quartiles (what percentage of a distribution is below Q1? Below Q2? Below Q3? Q2 is the same as what measure of central tendency?)
Q1 has 25% of the distribution below it
Q2 has 50% of the distribution below it (same as the median)
Q3 has 75% of the distribution below it
Computing quartiles ex 1 (find Q1, Q2(median) and Q3):
11, 3, 4, 7, 5, 9, 13, 10
First order them: 3, 4, 5, 7, 9, 10, 11, 13
Q2 (median): 8
(bc even number of scores and 8 falls in b/w 7 and 9!)
lower half: 3, 4, 5, 7 Upper half: 9, 10, 11, 13
Q1: 4.5
(median of lower half)
Q3: 10.5
(median of upper half)
calculating quartiles ex 2 (find Q1, Q2(median) and Q3):
15, 4, 7, 8, 21, 1, 5, 18, 11, 13, 10
First order them: 1, 4, 5, 7, 8, 10, 11, 13, 15, 18, 21
Q2: 10 (median)
Lower half: 1, 4, 5, 7, 8, 10 Upper half: 10, 11, 13, 15, 18, 21
Q1: 6 (median of lower half)
Q3: 14 (median of upper half)
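The "median of each half" procedure used in both worked examples can be sketched in R. QuartilesByHalves is a hypothetical helper written for this card, not a built-in function, and it assumes there are no repeated scores tied with the median:

```r
# Hypothetical helper: quartiles by the "median of each half" method.
# When the median is an actual score, it is included in both halves.
QuartilesByHalves <- function(x) {
  x <- sort(x)
  q2 <- median(x)
  lower <- x[x <= q2]   # everything at/below the median
  upper <- x[x >= q2]   # everything at/above the median
  c(Q1 = median(lower), Q2 = q2, Q3 = median(upper))
}

Ex1 <- QuartilesByHalves(c(11, 3, 4, 7, 5, 9, 13, 10))
Ex2 <- QuartilesByHalves(c(15, 4, 7, 8, 21, 1, 5, 18, 11, 13, 10))

print(Ex1)   # Q1 = 4.5, Q2 = 8,  Q3 = 10.5
print(Ex2)   # Q1 = 6,   Q2 = 10, Q3 = 14
```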
interquartile range (what is it? how is it calculated? Be able to calculate it from a given distribution)
Distance between first and third quartile
Q3 - Q1
semi-interquartile range (what is it? how is it calculated? what does it provide a measure of? Be able to calculate it from a given distribution)
One half of the interquartile range
(Q3 - Q1) / 2
Provides descriptive measure of "typical" distance of scores from median (Q2)
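A short sketch of both calculations in R, using the scores from the first quartile example and base R's fivenum() to get Q1 and Q3. The variable names are illustrative:

```r
VecRaw <- c(11, 3, 4, 7, 5, 9, 13, 10)

# fivenum() returns: minimum, Q1, median, Q3, maximum
FiveNum <- fivenum(VecRaw)

InterquartileRange <- FiveNum[4] - FiveNum[2]       # 10.5 - 4.5 = 6
SemiInterquartileRange <- InterquartileRange / 2    # 6 / 2 = 3

print(c(IQR = InterquartileRange, SIQR = SemiInterquartileRange))
```

Note that R's built-in IQR() function computes quartiles with quantile()'s default method, so it can give a slightly different answer than the hinge-based calculation above.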
semi-interquartile range (advantages/disadvantages?)
Advantages: focuses on the middle 50% of the distribution, so it's less likely to be influenced by extreme scores
- makes it better/stable measure of variability than range
Disadvantages: doesn't take into account actual distances b/w individual scores so it doesn't give complete pic of how scattered/clustered scores are
box plots (also known as? what statistical elements are represented on the plot? what are hinges? H-spread?)
also known as box-and-whisker plot
shows median, quartiles and range on plot
Hinges: the "box" of the plot determined by Q1 and Q3 (the left and right ends of the box!)
H-spread: interquartile range, distance between two "hinges" of the box plot
box plots (inner fence? adjacent values? nonadjacent values? Be able to calculate all of these values; Be able to identify all of these on a box plot)
inner fence: point that falls 1.5 times the H-spread (interquartile range) above or below the appropriate hinge
adjacent values: values in data that are no farther from median than the inner fences (inside inner fences)
anything outside inner fences is a nonadjacent value
inner fence ex: H-spread of 2, with hinges at 4 and 6
1.5 × H-spread --> (1.5 × 2 = 3)
lower hinge is at 4, so lower fence will be: 4 - 3 = 1
upper hinge is at 6, so upper fence will be: 6 + 3 = 9
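The same fence arithmetic as a small R sketch (variable names are illustrative):

```r
LowerHinge <- 4
UpperHinge <- 6

HSpread <- UpperHinge - LowerHinge   # 6 - 4 = 2
Step <- 1.5 * HSpread                # 1.5 * 2 = 3

LowerFence <- LowerHinge - Step      # 4 - 3 = 1
UpperFence <- UpperHinge + Step      # 6 + 3 = 9

print(c(lower = LowerFence, upper = UpperFence))
```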
box plot lines ("whiskers") (what values are they drawn through?)
Lines (whiskers) are drawn from hinges out through adjacent values
box plot outliers (what are they? non-adjacent values? how are they plotted? Be able to identify them in a distribution or on a box plot)
any value more extreme than the end of the whiskers (more extreme than adjacent values)
non-adjacent values are outliers
plotted as just dots on the outside of inner fences
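A sketch of finding outliers and whisker ends in R, assuming a made-up data vector with one extreme score. The fences are computed by hand and then compared with base R's boxplot.stats(), which applies the same 1.5 × H-spread rule by default:

```r
# Illustrative data with one extreme score
VecRaw <- c(3, 4, 5, 7, 9, 10, 11, 13, 30)

Hinges <- fivenum(VecRaw)[c(2, 4)]        # lower and upper hinge: 5 and 11
HSpread <- diff(Hinges)                   # 11 - 5 = 6
LowerFence <- Hinges[1] - 1.5 * HSpread   # 5 - 9 = -4
UpperFence <- Hinges[2] + 1.5 * HSpread   # 11 + 9 = 20

# Nonadjacent values (outliers) fall outside the inner fences
Outliers <- VecRaw[VecRaw < LowerFence | VecRaw > UpperFence]

# Whiskers end at the most extreme scores still inside the fences
Inside <- VecRaw[VecRaw >= LowerFence & VecRaw <= UpperFence]
WhiskerEnds <- range(Inside)

print(Outliers)                      # 30
print(WhiskerEnds)                   # 3 and 13
print(boxplot.stats(VecRaw)$out)     # outliers according to boxplot.stats()
```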
two methods of calculating the range in R?
Method 1: range() function
Method 2: min() and max() functions
range() function (what does it do? What does it return?)
returns a vector with both the lowest and highest value in your data
(to get actual range, we subtract first element from the second element): range(VecRaw)[2] - range(VecRaw)[1]
max() and min() functions (how can they be used to compute the range?)
min() returns the lowest value and max() returns the highest value; subtract the lowest from the highest:
max(VecRaw) - min(VecRaw)
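Both methods side by side, as a runnable sketch with made-up data in VecRaw:

```r
VecRaw <- c(11, 3, 4, 7, 5, 9, 13, 10)

# Method 1: range() returns c(lowest, highest)
RangeVec <- range(VecRaw)                  # 3 and 13
Method1 <- RangeVec[2] - RangeVec[1]       # 13 - 3 = 10

# Method 2: max() minus min()
Method2 <- max(VecRaw) - min(VecRaw)       # 10

print(c(method1 = Method1, method2 = Method2))
```

Storing the result of range() once avoids calling it twice; diff(range(VecRaw)) is another way to write the same subtraction.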
fivenum() function (what does it do? what values does it return and in what order? how can we access individual values from returned values?)
Computes the quartiles, returns 5 values:
(1) the minimum
(2) first quartile (lower hinge)
(3) the median
(4) third quartile (upper hinge)
(5) the maximum
we can access by using brackets, ex, Q3 - Q1:
fivenum(VecRaw)[4] - fivenum(VecRaw)[2]
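A small sketch of fivenum() on the scores from the second quartile example. Naming the elements is an optional convenience added here, not something fivenum() does itself:

```r
VecRaw <- c(15, 4, 7, 8, 21, 1, 5, 18, 11, 13, 10)

# Store the result once, then index into it
FiveNum <- fivenum(VecRaw)
names(FiveNum) <- c("min", "Q1", "median", "Q3", "max")
print(FiveNum)   # 1, 6, 10, 14, 21

# Q3 - Q1, by position or by the names assigned above
HSpread <- FiveNum[4] - FiveNum[2]                    # 14 - 6 = 8
HSpreadByName <- FiveNum[["Q3"]] - FiveNum[["Q1"]]    # same value
```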
quantile() function (what does it do? what does it do that the fivenum() function does not? what does the "type" parameter change? Know how to use the "probs" parameter to calculate a series of percentiles)
allows you to select any percentiles you want, rather than just returning quartiles
can specify exactly which method you want to use by using "type" (type 1-9)
use probs parameter to compute specific percentiles, ex 30th and 70th percentile:
quantile(VecRaw, probs = c(.3, .7))
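A sketch of the probs and type parameters in use, with made-up data in VecRaw. The comparison of type = 7 (R's default) with type = 2 is only meant to show that the method changes the answer:

```r
VecRaw <- c(11, 3, 4, 7, 5, 9, 13, 10)

# 30th and 70th percentiles (default method, type = 7)
Pct <- quantile(VecRaw, probs = c(.3, .7))
print(Pct)

# A series of percentiles: every 10th from the 10th to the 90th
Deciles <- quantile(VecRaw, probs = seq(0.1, 0.9, by = 0.1))
print(Deciles)

# Same percentile, two different methods
Q1Type7 <- quantile(VecRaw, probs = .25)             # 4.75
Q1Type2 <- quantile(VecRaw, probs = .25, type = 2)   # 4.5
```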
boxplot() function (default orientation? "horizontal" parameter? "outline" parameter?)
Default orientation of a boxplot is vertical
horizontal = TRUE orients boxplot horizontally
outline = FALSE suppresses display of outliers
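A sketch of the three calls, assuming a made-up vector with one outlier so the effect of outline is visible:

```r
# Illustrative data; 30 is plotted as an outlier
VecRaw <- c(3, 4, 5, 7, 9, 10, 11, 13, 30)

# Default: vertical box plot, outlier drawn as a point
boxplot(VecRaw)

# Horizontal orientation
boxplot(VecRaw, horizontal = TRUE)

# Horizontal, with the outlier hidden from the plot
Stats <- boxplot(VecRaw, horizontal = TRUE, outline = FALSE)
```

boxplot() also returns its computed statistics invisibly, which is why the last call can be assigned to Stats.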
boxplot() (What elements of plot can be adjusted and with what parameters? "medlwd" parameter? "medcol" parameter? "border" parameter? "col" parameter "lwd" parameter?)
medlwd: set line width of the median line
medcol: set color of the median
border: change color of the box border and whiskers
col: change color that fills the box
lwd: set the thickness of lines
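A sketch combining these parameters in one call. The data and the specific colors and widths are arbitrary choices for illustration:

```r
VecRaw <- c(3, 4, 5, 7, 9, 10, 11, 13, 30)

Result <- boxplot(
  VecRaw,
  medlwd = 4,             # thicker median line
  medcol = "red",         # red median line
  border = "darkblue",    # color of box border and whiskers
  col    = "lightblue",   # fill color of the box
  lwd    = 2              # thickness of lines
)
```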
boxplot() (Line types? "lty" parameter? What parameters adjust appearance of outliers? "pch" parameter? "cex" parameter? "outbg" parameter? "outcol" parameter?)
Line types: 0) Blank (no line), 1) solid line, 2) dashed line, 3) dotted line, 4) dot-dash line, 5) longdash line, 6) Twodash line
lty: sets all lines to same type
(can change individual lines using, boxlty, medlty, whisklty)
pch: adjusts plotting character for outliers (ex: pch = 15 makes them solid squares)
cex: changes size of plotting character
outbg: sets fill color of outlier
outcol: sets border color of outlier
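A sketch of the line-type and outlier parameters together, with arbitrary illustrative values. It uses pch = 21 (a fillable circle) because a fill color is only visible on the plotting characters that can be filled (21 through 25):

```r
# Illustrative data; 30 is plotted as an outlier
VecRaw <- c(3, 4, 5, 7, 9, 10, 11, 13, 30)

Result <- boxplot(
  VecRaw,
  lty      = 1,          # all lines solid...
  whisklty = 2,          # ...except the whiskers, which are dashed
  pch      = 21,         # outliers drawn as fillable circles
  cex      = 1.5,        # larger plotting character
  outbg    = "yellow",   # fill color of outliers
  outcol   = "red"       # border color of outliers
)
```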
multiple box plots on same graph (how do we get multiple box plots on the same graph? what data structure is sent to boxplot() function to create multiple plots on the same graph?)
Create a list containing both data sets, then pass it to the boxplot() function:
DataList <- list(DataSet1 = VecRaw, DataSet2 = VecRaw2)
boxplot(DataList, horizontal = TRUE)
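A runnable version of the same idea, with made-up data for the two vectors. The names given in list() become the group labels on the plot:

```r
# Illustrative data sets
VecRaw  <- c(11, 3, 4, 7, 5, 9, 13, 10)
VecRaw2 <- c(15, 4, 7, 8, 21, 1, 5, 18, 11, 13, 10)

# One list element per box plot
DataList <- list(DataSet1 = VecRaw, DataSet2 = VecRaw2)

Result <- boxplot(DataList, horizontal = TRUE)
print(Result$names)   # group labels taken from the list names
```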
vertical vs. horizontal box plot orientation (guidelines for when to choose vertical orientation? when to choose horizontal orientation?)
Vertical: when data groups differ by TIME
Horizontal: when putting a lot of plots on the same graph, or when the names of your groups (data sets) are fairly long; this lets you spell out each whole group name on the left side of the chart
box plot label orientation ("las" parameter and its values?)
To change label orientation, use las:
0) labels parallel to axis, 1) labels always horizontal, 2) Labels perpendicular to axis, 3) Labels always vertical
Ex: if we want horizontal labels, we write las = 1
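A sketch of las = 1 on a horizontal plot with long group names. The data, the group names, and the widened left margin set through par(mar = ...) are assumptions added for illustration:

```r
# Illustrative data with long group names
DataList <- list(
  "Control group"   = c(11, 3, 4, 7, 5, 9, 13, 10),
  "Treatment group" = c(15, 4, 7, 8, 21, 1, 5, 18, 11, 13, 10)
)

# Widen the left margin so the horizontal labels fit (assumed values)
OldPar <- par(mar = c(5, 9, 4, 2))

# las = 1 keeps the labels horizontal
Result <- boxplot(DataList, horizontal = TRUE, las = 1)

# Restore the previous margin settings
par(OldPar)
```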