Unit 1 - One Variable Data


48 Terms

1
New cards
variable
holds information about the same characteristic for many subjects
2
New cards
categorical variable
where the data collected places individuals into categories or groups; this data is best represented by a table
3
New cards
quantitative variable
where the data collected is numerical and it makes sense to use it for numerical operations
4
New cards
frequency table
lists the categories for a categorical variable and displays the counts for each category
5
New cards
relative frequency table
lists the categories for a categorical variable and displays the percentages for each category
6
New cards
Relative Count
count/total
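
As a rough illustration of the frequency table, relative frequency table, and relative count cards, here is a minimal sketch. The `colors` data is made up for the example and is not from the card set.

```python
from collections import Counter

# Hypothetical categorical data: favorite colors of 10 students.
colors = ["red", "blue", "blue", "green", "red",
          "blue", "red", "blue", "green", "blue"]

counts = Counter(colors)       # frequency table: category -> count
total = sum(counts.values())   # total number of individuals

# Relative count = count / total for each category.
relative = {category: count / total for category, count in counts.items()}

for category in counts:
    print(f"{category:<6} count={counts[category]}  relative={relative[category]:.0%}")
```

The relative frequencies should always add up to 1 (100%), which is a quick way to check a table.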
7
New cards
distribution
- tells us what values the variable takes and how often it takes these values.

- describes how a quantitative variable behaves. A description generally includes shape, center, spread, & unusual features. (visualize)
8
New cards
bar graph
- x axis has the categorical variable

- y axis displays counts/percentages

- each category has its own bar; bars do not touch

- order is not important on the x axis
9
New cards
Simpson's Paradox
When data are split into groups, the averages (or rates) within each group can appear to contradict the overall average for the combined data
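
A small sketch of how the paradox can arise. The counts below are illustrative numbers chosen to produce the effect, and the names `data`, `rate`, and `overall_rate` are assumptions for this example only.

```python
# Illustrative counts: (successes, attempts) for two options, split by group.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, attempts):
    """Success rate within one group."""
    return successes / attempts

def overall_rate(groups):
    """Success rate after combining all groups for one option."""
    total_successes = sum(s for s, _ in groups.values())
    total_attempts = sum(n for _, n in groups.values())
    return total_successes / total_attempts

for group in ("small", "large"):
    a = rate(*data["A"][group])
    b = rate(*data["B"][group])
    print(f"{group}: A={a:.1%}  B={b:.1%}")  # A is higher in each group

print(f"overall: A={overall_rate(data['A']):.1%}  B={overall_rate(data['B']):.1%}")
```

Option A has the higher rate inside each group, yet option B has the higher rate overall, because the two options have very different numbers of attempts in each group.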
10
New cards
histogram
a display for quantitative data that uses adjacent bars to represent counts or percentages of values falling in each interval
11
New cards
stem & leaf plot
a display for quantitative data that uses place values to represent the distributions
12
New cards
dotplot
a display for either kind of data that uses a dot to represent each individual in the data set
13
New cards
measures of center
mean for distributions that are symmetric, median for all other distribution shapes
14
New cards
measures of spread
standard deviation for distributions that are symmetric, IQR for all other distribution shapes
15
New cards
uniform distribution
a distribution whose shape is evenly distributed throughout the values it takes
16
New cards
symmetric distribution
a distribution whose left and right sides are roughly mirror images of each other (often unimodal)
17
New cards
left skewed distribution
a distribution that has a concentration of data on the upper end and the tail on the left
18
New cards
right skewed distribution
a distribution with a concentration of data on the lower end and the tail on the right
19
New cards
outliers
values that fall outside the overall pattern of the data

- MUST find out why it occurs
20
New cards
mean
the average of the data values: the sum of the values divided by the number of values
21
New cards
median
the value in the center of an ordered data set
22
New cards
range
the maximum data value minus the minimum data value
23
New cards
first quartile
the value where 25% of the data falls below it in an ordered list
24
New cards
third quartile
the value where 75% of the data falls below it in an ordered list
25
New cards
Interquartile Range (IQR)
the third quartile minus the first quartile
26
New cards
percentile
the value in the data below which a certain percentage of the data falls
27
New cards
5 number summary
includes the minimum, first quartile, median, third quartile, & the maximum
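
The range, quartile, IQR, and five-number summary cards can be sketched together. This assumes one common classroom convention for quartiles (the median of the lower half and of the upper half, leaving out the overall median when the count is odd); software may use other rules. The function name and the `scores` data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'median of halves' rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + (n % 2):]  # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

scores = [2, 4, 4, 5, 7, 8, 9, 10, 12]  # hypothetical data
minimum, q1, med, q3, maximum = five_number_summary(scores)

data_range = maximum - minimum  # range = max - min
iqr = q3 - q1                   # IQR = Q3 - Q1

print(minimum, q1, med, q3, maximum)
print("range:", data_range, "IQR:", iqr)
```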
28
New cards
modified boxplot
a display for quantitative data that graphs the five-number summary on an axis and shows outliers if they exist
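
The card does not say how a modified boxplot decides which values are outliers. A widely taught convention, assumed here and not taken from the card, flags values more than 1.5 × IQR below Q1 or above Q3. The function names and data are hypothetical.

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper fences under the assumed 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3):
    """Values that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

data = [2, 4, 4, 5, 7, 8, 9, 10, 30]  # hypothetical data
q1, q3 = 4, 9.5                        # quartiles found by the median-of-halves rule

print(outlier_fences(q1, q3))
print(find_outliers(data, q1, q3))
```

In a modified boxplot the flagged values are drawn as separate points, and the whiskers stop at the most extreme values that are not outliers.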
29
New cards
variance
the standard deviation squared, it is a measure of spread
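
A quick check of the relationship between variance and standard deviation, using Python's `statistics` module. The sketch assumes the sample versions (dividing by n - 1); the data is made up.

```python
import math
from statistics import stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

sd = stdev(data)      # sample standard deviation (divides by n - 1)
var = variance(data)  # sample variance

print(f"standard deviation = {sd:.3f}")
print(f"variance           = {var:.3f}")
print("sd squared equals variance:", math.isclose(sd ** 2, var))
```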
30
New cards
advantage of stemplot
retains the actual data values from the data set
31
New cards
advantage of histogram
easy to see shape of distribution & good for large data sets
32
New cards
resistant
describes statistics that are not strongly affected by extreme values. The median is more resistant than the mean. The standard deviation is strongly affected by extreme values
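
To see resistance in action, the sketch below adds one extreme value to a small made-up data set and compares how far each statistic moves.

```python
from statistics import mean, median, stdev

data = [10, 12, 13, 14, 15]   # hypothetical data
with_outlier = data + [100]   # same data plus one extreme value

for label, values in (("original", data), ("with outlier", with_outlier)):
    print(f"{label:<13} mean={mean(values):.2f}  "
          f"median={median(values):.2f}  sd={stdev(values):.2f}")

# How far each measure of center moved after adding the extreme value.
mean_shift = mean(with_outlier) - mean(data)
median_shift = median(with_outlier) - median(data)
```

The median barely changes, while the mean and standard deviation both jump.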
33
New cards
Discrete variables
are numerical values where counting makes sense, decimals would not be a good way to record the data
34
New cards
Continuous variables
are numerical values where decimals are appropriate; it usually involves some form of measuring
35
New cards
One-way Table
a table that summarizes one variable
36
New cards
Bins
to make a histogram you have to put the data into bins, which are equal-width intervals that cover the range of the data.
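
A minimal sketch of sorting data into bins. The starting value, bin width, number of bins, and `heights` data are all assumptions for illustration; each bin includes its lower bound and excludes its upper bound.

```python
def histogram_counts(values, start, bin_width, num_bins):
    """Count how many values fall in each bin [start + i*w, start + (i+1)*w)."""
    counts = [0] * num_bins
    for v in values:
        index = int((v - start) // bin_width)
        if 0 <= index < num_bins:  # ignore values outside the bins
            counts[index] += 1
    return counts

heights = [61, 63, 64, 66, 67, 67, 68, 70, 71, 74]  # hypothetical data
counts = histogram_counts(heights, start=60, bin_width=5, num_bins=3)

for i, count in enumerate(counts):
    low = 60 + 5 * i
    print(f"{low}-{low + 5}: {'*' * count}")
```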
37
New cards
Bin Width
the size of each bin's interval (upper bound minus lower bound)
38
New cards
Stemplots
alternate way of illustrating data using a semi-graph; data is not lost (unlike a histogram). For two-digit values, the stem is the first digit and the leaf is the second. Needs a KEY
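
A rough sketch of building a stemplot in text form, assuming non-negative whole numbers where the leaf is the ones digit and the stem is everything before it. The `stemplot` function and the `scores` data are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows; stems with no leaves are still listed."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 72 -> stem 7, leaf 2
        stems[stem].append(leaf)
    rows = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

scores = [72, 85, 91, 78, 85, 88, 93, 67]  # hypothetical data
for row in stemplot(scores):
    print(row)
print("Key: 7 | 2 means 72")
```

Every original value can be read back from the rows, which is the advantage over a histogram.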
39
New cards
Back to back stemplots
created when you can separate the data into two categories that share the same stems. Needs a KEY
40
New cards
Split Stem Leaf
when you have too many data values in a single stem, it can be helpful to split the stem; the same way we could create more bins on a histogram if our bin width resulted in a skyscraper
41
New cards
Dot plot
plotting data values with dots above corresponding values on a number line
42
New cards
Cumulative relative frequency graphs (ogives)
display percentiles
43
New cards
Percentile
will tell you what percent of data falls below a value
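
The percentile and ogive cards can be sketched as follows, using the "percent of data below a value" definition from the cards. The function names, the `data` values, and the interval bounds are made up for illustration.

```python
def percentile_of(value, data):
    """Percent of the data that falls strictly below the given value."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

def cumulative_relative_frequencies(data, upper_bounds):
    """Fraction of the data below each upper bound (the heights of an ogive)."""
    n = len(data)
    return [sum(1 for x in data if x < bound) / n for bound in upper_bounds]

data = [55, 62, 68, 71, 74, 77, 80, 85, 90, 96]  # hypothetical data
bounds = [60, 70, 80, 90, 100]                   # upper ends of the intervals

print("percentile of 77:", percentile_of(77, data))
for bound, fraction in zip(bounds, cumulative_relative_frequencies(data, bounds)):
    print(f"below {bound}: {fraction:.0%}")
```

Plotting each bound against its cumulative fraction and connecting the points gives the ogive, which always ends at 100%.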
44
New cards
Factors of a Misleading Graph
- putting a squiggle (break) on the y axis; the axis must start at a reasonable number

- no labels on axes

- different widths/volume on bars
45
New cards
How to create an ogive
\- the first interval is 40-
46
New cards
If someone is at the 30th percentile what does that mean
30% of the [individuals] have a [variable] less than [specific value]
47
New cards
Describing Distributions
S: Shape

O: Outliers - typically visual on AP test

C: Center - mean/median; add context (ex: symmetric)

V: Variability - spread, standard deviation
48
New cards
What is SOCV used to describe?
- NOT USED to describe categorical variables

- because categories can be placed in any order on the x-axis, shape would not make sense

- same thing for center and spread; for example, you cannot find the average zip code