noise
EXPECTED variability
signal
variability due to certain CHARACTERISTICS
cases
objects or subjects described by a data set; WHERE your info came from
ex: 5 students being surveyed on their age/major
variable
a characteristic of a person/thing that can be assigned a number or category
-ex: age, major
value
the actual OUTCOMES of that variable (ex: 19 years old, Mathematics BA)
quantitative variable
a variable that records the AMOUNT of something (numerical) (speed limit, age, etc.)
categorical variable
records which of several groups or categories an individual belongs to (ex: hair color, zip codes, phone numbers)
numbers can be categorical variables when
the numbers are labels rather than measurements, so they have no meaning on a numerical scale and arithmetic on them makes no sense (ex: phone numbers, zip codes, etc.)
in a bar chart, the vertical bars are kept
SEPARATE
bar charts can display frequency or
relative frequency
relative frequencies should add up to
1.0
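A minimal Python sketch of relative frequency: each category's count divided by the total number of observations. The function name `relative_frequencies` and the hair-color data are hypothetical, for illustration only.

```python
from collections import Counter
import math

def relative_frequencies(values):
    """Map each category to (count / total number of observations)."""
    counts = Counter(values)
    n = len(values)
    return {category: count / n for category, count in counts.items()}

# Hypothetical categorical data (hair color of 8 people)
hair = ["brown", "black", "brown", "blonde", "red", "brown", "black", "blonde"]
rel = relative_frequencies(hair)
print(rel)  # {'brown': 0.375, 'black': 0.25, 'blonde': 0.25, 'red': 0.125}

# Floating-point sums can be off by a tiny amount, so compare with a tolerance
print(math.isclose(sum(rel.values()), 1.0))  # True
```

Using a tolerance rather than `== 1.0` matters for counts like 1/3, where the rounded fractions may not sum to exactly 1.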
you cannot leave out certain categories in a pie chart because
it would not be a complete circle (100%)
stemplots can have ______ lines per stem
one OR two
purpose of having two lines per stem
to "zoom in" when many leaves pile up on each stem: splitting each stem into two lines (leaves 0-4 on the first, 5-9 on the second) spreads the data out so the distribution/spread is easier to see
stem
ALL but the final digit
leaf
the final digit
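A minimal Python sketch of a stemplot, assuming non-negative whole numbers so that the stem (all but the final digit) is `x // 10` and the leaf (the final digit) is `x % 10`. The `split` option shows the two-lines-per-stem variant. The function name and the age data are hypothetical.

```python
from collections import defaultdict

def stemplot(data, split=False):
    """Return the lines of a stem-and-leaf plot.

    Assumes non-negative whole numbers: stem = x // 10, leaf = x % 10.
    With split=True each stem gets two lines: leaves 0-4, then leaves 5-9.
    """
    stems = defaultdict(list)
    for x in sorted(data):
        stems[x // 10].append(x % 10)
    lines = []
    # Include empty stems between min and max so gaps in the data stay visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = stems.get(stem, [])
        if split:
            groups = [[lf for lf in leaves if lf < 5], [lf for lf in leaves if lf >= 5]]
        else:
            groups = [leaves]
        for group in groups:
            lines.append(f"{stem} | " + "".join(str(lf) for lf in group))
    return lines

# Hypothetical ages of 10 students
ages = [19, 21, 22, 18, 20, 25, 31, 19, 23, 34]
print("\n".join(stemplot(ages)))              # one line per stem
print("\n".join(stemplot(ages, split=True)))  # two lines per stem
```

Because every individual value survives as a leaf, the plot still "displays all outcomes", which is exactly why it gets unwieldy for large data sets.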
histogram
a bar graph depicting a frequency distribution (frequency or relative frequency)
advantage of stem & leaf plot
displays all outcomes
disadvantage of stem & leaf plot
will not be very informative if the data set is too large (histogram will be more organized)
single value grouping
each vertical bar of the histogram represents a SINGLE possible value
single value grouping is rarely used and is best for
a small data set or small range of values
limit grouping
Use when the data are expressed as whole numbers and there are too many distinct values to employ single-value grouping
limit grouping only works on _______ variables
DISCRETE
-ex: number of used textbooks bought = [0-3], [4-7], [8-11], [12-15]
interval width for limit grouping
(upper limit - lower limit + 1)
-ex: the interval [4, 7] actually has a width of 4 (4, 5, 6, 7)
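A minimal Python sketch of limit grouping for whole-number (discrete) data, reproducing the textbook classes [0-3], [4-7], [8-11], [12-15] and the width rule (upper limit - lower limit + 1). The function names and the textbook counts are hypothetical.

```python
def limit_classes(lowest, width, count):
    """Build `count` limit-grouping classes (lower, upper) for whole-number data."""
    return [(lowest + i * width, lowest + i * width + width - 1) for i in range(count)]

def interval_width(lower, upper):
    # Number of whole numbers from lower to upper, inclusive
    return upper - lower + 1

def tally(data, classes):
    """Frequency of each class; both limits are inclusive."""
    return [sum(lower <= x <= upper for x in data) for lower, upper in classes]

classes = limit_classes(0, 4, 4)
print(classes)                # [(0, 3), (4, 7), (8, 11), (12, 15)]
print(interval_width(4, 7))   # 4

# Hypothetical counts of used textbooks bought by 8 students
books = [0, 2, 5, 7, 7, 9, 12, 3]
print(tally(books, classes))  # [3, 3, 1, 1]
```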
cutpoint grouping is used on __________ variables
CONTINUOUS
ranges in cutpoint grouping are expressed as
"[number] to under [number]"
ex: "54 to under 56"
[54, 56)
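A minimal Python sketch of cutpoint grouping for continuous data: a value belongs to the class with lower <= x < upper, so 56 starts a new class instead of falling in "54 to under 56". The function name, the first cutpoint 54, and the width 2 are hypothetical choices for illustration.

```python
import math

def cutpoint_class(x, first_cutpoint, width):
    """Return the class (lower, upper) with lower <= x < upper ("lower to under upper")."""
    # How many whole classes x lies past the first cutpoint
    k = math.floor((x - first_cutpoint) / width)
    lower = first_cutpoint + k * width
    return (lower, lower + width)

for weight in [54, 55.9, 56]:
    lower, upper = cutpoint_class(weight, 54, 2)
    print(f"{weight} falls in {lower} to under {upper}")
# 54 falls in 54 to under 56
# 55.9 falls in 54 to under 56
# 56 falls in 56 to under 58
```

With non-integer widths such as 0.1, floating-point rounding can misplace a value that sits exactly on a cutpoint, so this sketch is safest with whole-number cutpoints and widths.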
discrete variables
values that can be COUNTED; the possible outcomes can be listed, with gaps between them (usually whole numbers)
-ex: # of books, # of people, etc.
continuous variables
can assume an infinite number of values between any two specific values; goes off into DECIMALS
-ex: weight, time, etc.
Patterns of Data:
1) shape (modality & symmetry/skewness)
2) center (central tendency)
3) spread (SD, range, interquartile range (IQR), etc.)
4) outliers
unimodal distribution
A distribution with one peak
bimodal distribution
a distribution with two modes
multimodal distribution
two or more peaks in a distribution curve
symmetric distribution
a distribution whose left and right sides are (approximately) mirror images of each other about the center; the mean is the best measure of center
left skew
mean < median
-clusters on the right
-long tail is on the left
right skew
mean > median
-clusters on the left
-long tail is on the right
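A minimal Python sketch of the mean/median comparison for skewed data, using the standard `statistics` module. The function `skew_direction` and both data sets are hypothetical; the comparison is a rule of thumb, not a formal test, and it can mislead for some distributions (e.g., multimodal or heavily discrete data).

```python
import statistics

def skew_direction(data):
    """Rule-of-thumb reading of skew from comparing the mean and the median."""
    mean, median = statistics.mean(data), statistics.median(data)
    if mean > median:
        return "right"   # the long right tail pulls the mean up
    if mean < median:
        return "left"    # the long left tail pulls the mean down
    return "roughly symmetric"

right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]  # long tail to the right
left_skewed = [21 - x for x in right_skewed]   # mirror image: long tail to the left

print(statistics.mean(right_skewed), statistics.median(right_skewed))  # 4.7 3.0
print(statistics.mean(left_skewed), statistics.median(left_skewed))    # 16.3 18.0
print(skew_direction(right_skewed), skew_direction(left_skewed))        # right left
```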
in left and right skewed distributions, the _________ is the best measure of center
MEDIAN
measures of center
1) mean
2) median
3) mode
measure of center for categorical data
MODE
how many times must a value occur in a data set to be a mode?
at least TWICE (a mode is the most frequently occurring value)
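A minimal Python sketch of the three measures of center. `statistics.mean` and `statistics.median` come from the standard library; the `modes` helper is hypothetical and applies the "at least twice" rule above. The standard library's `statistics.multimode` is not used because it returns every value when no value repeats.

```python
import statistics
from collections import Counter

def modes(values):
    """Most frequent value(s); a value must occur at least twice to count as a mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top < 2:
        return []  # every value occurs once, so there is no mode
    return [value for value, count in counts.items() if count == top]

data = [2, 3, 3, 5, 7, 7, 9]
print(statistics.mean(data))    # about 5.14
print(statistics.median(data))  # 5
print(modes(data))              # [3, 7]  (two modes)
print(modes(["red", "blue", "red", "green"]))  # ['red']  (works for categorical data)
print(modes([1, 2, 3]))         # []  (no value repeats)
```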
resistant measure
extreme values have little to no influence on its outcome
-not sensitive; does not respond strongly to outliers or changes in a few observations
range =
largest observation - smallest observation
-measures SPREAD of data
-NOT a resistant measure
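A minimal Python sketch of resistance: replacing one observation with an extreme value (here a hypothetical data-entry error, 800 instead of 80) barely affects the median but drags the mean and the range a long way.

```python
import statistics

scores = [70, 72, 75, 78, 80]
typo = [70, 72, 75, 78, 800]  # one extreme value

for data in (scores, typo):
    mean = statistics.mean(data)
    median = statistics.median(data)
    data_range = max(data) - min(data)
    print(mean, median, data_range)
# 75 75 10    -> without the outlier
# 219 75 730  -> mean and range jump; the median (resistant) does not move
```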
deviation
difference between an observation and the mean
the SUM of all deviations from the mean
is always equal to zero
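A minimal Python sketch showing that deviations from the mean cancel out. It uses exact fractions (`fractions.Fraction`) because floating-point arithmetic can leave a tiny leftover such as 1e-16 instead of an exact 0. The `deviations` helper and the data are hypothetical.

```python
from fractions import Fraction

def deviations(data):
    """Deviation of each observation from the mean, computed with exact fractions."""
    mean = Fraction(sum(data), len(data))
    return [x - mean for x in data]

data = [2, 4, 4, 5, 7]           # mean = 22/5 = 4.4
devs = deviations(data)
print([float(d) for d in devs])  # [-2.4, -0.4, -0.4, 0.6, 2.6]
print(sum(devs))                 # 0
```

The negative deviations (below the mean) always balance the positive ones, which is why the standard deviation squares them before averaging.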
sample standard deviation
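roughly the TYPICAL distance of observations from the mean; square each deviation, add them up, divide by (n - 1), and take the square root: s = sqrt( Σ(x - x̄)² / (n - 1) )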
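A minimal Python sketch of the sample standard deviation formula, checked against the standard library's `statistics.stdev` (which also divides by n - 1). `statistics.pstdev` divides by n instead; the comparison shows it gives a smaller number for the same data. The `sample_sd` helper and the data are hypothetical.

```python
import math
import statistics

def sample_sd(data):
    """s = sqrt(sum of squared deviations / (n - 1))."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean = 5, squared deviations sum to 32
print(sample_sd(data))           # sqrt(32 / 7), about 2.14
print(statistics.stdev(data))    # same n - 1 formula
print(statistics.pstdev(data))   # dividing by n instead: 2.0
```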
sample standard deviation (s) tends to __________ population standard deviation
underestimate
quartiles
divide an ordered data set into 4 parts, each containing about one quarter of the observations
(Q1, Q2, Q3)
interquartile range
Q3 - Q1
-RESISTANT
finding quartiles:
1) rearrange the data in ascending order
2) Q2 = median
3) Q1 = the median of the LOWER HALF of observations
4) Q3 = the median of UPPER HALF of observations
if you are finding the quartiles on a set with an odd number of observations,
you can either include the median in both halves or leave it out, but the two methods give different results. For the exam, include the median in both the upper and lower halves
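A minimal Python sketch of the median-of-halves steps above, with a switch for the odd-n choice (include the median in both halves, or leave it out). The function name is hypothetical; spreadsheets and statistics libraries often use other quartile rules, so their answers can differ slightly from this hand method.

```python
import statistics

def quartiles(data, include_median=True):
    """Return (Q1, Q2, Q3) by the median-of-halves method.

    For an odd number of observations, include_median=True puts the median in
    both halves (the exam convention); False leaves it out of both.
    """
    xs = sorted(data)  # step 1: ascending order
    n = len(xs)
    half = n // 2
    if n % 2 == 0:
        lower, upper = xs[:half], xs[half:]
    elif include_median:
        lower, upper = xs[:half + 1], xs[half:]
    else:
        lower, upper = xs[:half], xs[half + 1:]
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)

odd = [13, 1, 9, 5, 11, 3, 7]
print(quartiles(odd))                        # (4.0, 7, 10.0)
print(quartiles(odd, include_median=False))  # (3, 7, 11)
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))   # (2.5, 4.5, 6.5)
```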
5 Number Summary
1) minimum value
2) Q1
3) Q2/median
4) Q3
5) maximum value
Outliers
lower fence = Q1 - 1.5(IQR)
upper fence = Q3 + 1.5(IQR)
-observations below the lower fence or above the upper fence are potential outliers
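A minimal Python sketch of the 5-number summary and the 1.5(IQR) fences, using the exam convention (median included in both halves when n is odd). The function names and the data set are hypothetical.

```python
import statistics

def five_number_summary(data):
    """(min, Q1, median, Q3, max); for odd n the median goes in both halves."""
    xs = sorted(data)
    half = (len(xs) + 1) // 2  # size of each half (includes the median when n is odd)
    lower, upper = xs[:half], xs[-half:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

def fences_and_outliers(data):
    """Fences 1.5 * IQR beyond Q1 and Q3, plus the observations outside them."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

data = [2, 4, 5, 6, 7, 8, 9, 30]
print(five_number_summary(data))  # (2, 4.5, 6.5, 8.5, 30)
print(fences_and_outliers(data))  # (-1.5, 14.5, [30])
```

These five numbers and the fences are exactly what a boxplot draws, with any point beyond a fence plotted on its own.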
boxplot
displays the distribution of a data set using the 5 number summary
advantage of boxplot
clearly shows outliers and skew
z score
the number of standard deviations a particular value lies above (+) or below (-) the mean: z = (x - mean) / standard deviation
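A minimal Python sketch of the z-score formula. The function name and the exam mean and standard deviation are hypothetical; the sign tells you which side of the mean the value is on.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exam: mean 70, standard deviation 10
print(z_score(85, 70, 10))  # 1.5  -> 1.5 SDs above the mean
print(z_score(55, 70, 10))  # -1.5 -> 1.5 SDs below the mean
print(z_score(70, 70, 10))  # 0.0  -> exactly at the mean
```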