individuals
are the objects described by a set of data. Individuals may be people, but they may also be animals or things.
variable
is any characteristic of an individual. A variable can take different values for different individuals.
categorical
variable places an individual into one of several groups or categories.
quantitative
variable takes numerical values for which arithmetic operations such as adding and averaging make sense.
distribution
of a variable tells us what values the variable takes and how often it takes these values.
The Bar Graph
Make sure you label your axes and title your graph
Scale your axes appropriately
Each bar should correspond to the appropriate count
Leave room between bars
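A bar graph starts from the count in each category. The Python sketch below tallies a categorical variable and prints a rough text version of the graph, one `#` per individual, so each bar's length matches its count. The lunch data and the function names `category_counts` and `text_bar_graph` are hypothetical. A real bar graph would still need a title, labeled axes, and space between the bars.

```python
from collections import Counter


def category_counts(values):
    """Count how many individuals fall into each category (first-seen order)."""
    return dict(Counter(values))


def text_bar_graph(counts, title):
    """Build a horizontal text bar graph: one '#' per individual in a category."""
    width = max(len(str(category)) for category in counts)  # align the bars
    lines = [title]
    for category, count in counts.items():
        lines.append(f"{str(category):<{width}} | {'#' * count} ({count})")
    return "\n".join(lines)


# Hypothetical data: favorite lunch of 10 students
lunches = ["pizza", "salad", "pizza", "tacos", "pizza",
           "salad", "tacos", "pizza", "soup", "salad"]
counts = category_counts(lunches)
print(text_bar_graph(counts, "Favorite Lunch of 10 Students"))
```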
Marginal distribution
What does a marginal distribution not tell us?
A marginal distribution tells us nothing about the relationship between two variables.
conditional distribution
The distribution of one variable among only those individuals who have a specific value of another variable. Comparing conditional distributions shows whether the distribution of one variable changes across the values of the other variable.
Association
There is an association between two variables if knowing the value of one variable helps us predict the value of the other. If knowing the value of one variable does not help us predict the value of the other, then there is no association between the variables.
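Marginal and conditional distributions both come from a two-way table; the difference is which individuals you divide by. In this Python sketch (the table counts and the function names are hypothetical), the marginal distribution of preferred sport uses all 150 students, while each conditional distribution uses only the students in one grade.

```python
# Hypothetical two-way table: rows = grade, columns = preferred sport (counts)
table = {
    "9th":  {"soccer": 30, "basketball": 20, "tennis": 10},
    "12th": {"soccer": 15, "basketball": 55, "tennis": 20},
}


def marginal_distribution(table):
    """Percent of ALL individuals in each column category (row variable ignored)."""
    totals = {}
    for row in table.values():
        for category, count in row.items():
            totals[category] = totals.get(category, 0) + count
    grand_total = sum(totals.values())
    return {category: 100 * c / grand_total for category, c in totals.items()}


def conditional_distribution(table, row_value):
    """Percent in each column category among individuals with one row value only."""
    row = table[row_value]
    row_total = sum(row.values())
    return {category: 100 * c / row_total for category, c in row.items()}


print("Marginal (all students):", marginal_distribution(table))
for grade in table:
    print(f"Conditional (given {grade} grade):", conditional_distribution(table, grade))
```

In this made-up table the two conditional distributions differ (50% of 9th graders prefer soccer versus about 17% of 12th graders), so knowing a student's grade would help predict sport preference, which is what an association means. The marginal distribution alone could not reveal that.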
The Dotplot
Things to remember
You only need a properly labeled horizontal axis
You need to be neat
Title the graph
Each dot represents a count of 1
Works well with a small data set
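A dotplot stacks one dot per observation above its value on a labeled horizontal axis. The Python sketch below builds a text version for small sets of whole-number data; the function name `text_dotplot` and the quiz scores are hypothetical. Every whole number between the minimum and maximum gets an axis position, so values with no observations stay visible as empty spots.

```python
from collections import Counter


def text_dotplot(values):
    """Return a text dotplot: one '*' per observation, stacked above its value.

    Assumes whole-number data; every integer from min to max gets an axis slot.
    """
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    w = max(len(str(v)) for v in axis)  # column width so dots sit over labels
    rows = []
    for level in range(max(counts.values()), 0, -1):  # top row first
        cells = [("*" if counts[v] >= level else "").rjust(w) for v in axis]
        rows.append(" ".join(cells).rstrip())
    rows.append(" ".join(str(v).rjust(w) for v in axis))  # horizontal axis labels
    return "\n".join(rows)


# Hypothetical quiz scores (out of 10) for 9 students
scores = [6, 7, 7, 8, 8, 8, 9, 10, 10]
print("Quiz Scores")
print(text_dotplot(scores))
```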
When describing a distribution, what 5 things do you need to address?
Center (median, mean)
Shape (roughly symmetric, symmetric, normal, two clusters, skewed left, skewed right)
Variability (IQR, Standard deviation, range)
Outliers
CONTEXT (the individuals and variables in the study)
The Stemplot
Separate each piece of data into a stem (all but the rightmost digit) and a leaf (the final digit). For example, if there are 24 students in the class, the 2 is the stem and the 4 is the leaf. If the temperature of a pizza oven was 539 degrees, the stem would be 53 and the leaf would be 9.
Write the stems vertically in increasing order from top to bottom.
Be very neat and make sure you leave the same amount of space between leaves.
Title your graph
Include a key identifying what the stem and leaves represent.
Works well with a small data set
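The stem/leaf split is just integer division by 10 (539 becomes stem 53, leaf 9). This Python sketch applies it to non-negative whole numbers, lists stems in increasing order from top to bottom, keeps empty stems, and adds a key. The function name and the class-size data are hypothetical, and decimals or negative values would need a different split.

```python
def stemplot(values):
    """Text stemplot for non-negative integers.

    Stem = all but the rightmost digit (x // 10); leaf = final digit (x % 10).
    Stems run in increasing order top to bottom; empty stems are kept.
    """
    data = sorted(values)
    stems = {}
    for x in data:
        stem, leaf = divmod(x, 10)
        stems.setdefault(stem, []).append(leaf)
    w = len(str(max(stems)))  # right-align stems so the bars line up
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>{w}} | {leaves}".rstrip())
    key_stem, key_leaf = divmod(data[0], 10)  # key built from the smallest value
    lines.append(f"Key: {key_stem} | {key_leaf} = {data[0]}")
    return "\n".join(lines)


# Hypothetical class sizes
sizes = [24, 31, 18, 27, 24, 35, 42, 29]
print("Class Sizes")
print(stemplot(sizes))
```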
The Histogram
Things to remember
It is the most common graph of a quantitative variable.
The x-axis is continuous, so there should be no gaps between the bars (unless a class has zero observations)
The graphing calculator can do a histogram for you
Title your graph and label the axes.
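Before drawing a histogram, the data are grouped into equal-width classes and counted. This Python sketch does the counting; the function name, class width, and scores are hypothetical. Each class includes its lower endpoint but not its upper one, adjacent classes share an endpoint (which is why the bars touch), and an empty class keeps a count of 0 rather than being dropped.

```python
def histogram_counts(values, start, width):
    """Count observations in equal-width classes [start, start+width), ...

    Assumes every value is >= start. A class with no observations keeps count 0.
    """
    n_classes = int((max(values) - start) // width) + 1
    counts = [0] * n_classes
    for x in values:
        counts[int((x - start) // width)] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]


# Hypothetical test scores
scores = [62, 65, 71, 74, 78, 95]
for low, high, count in histogram_counts(scores, start=60, width=10):
    print(f"{low} to <{high}: {'#' * count} ({count})")
```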
Properties of standard deviation
Standard deviation measures spread (dispersion) around the mean, so use it only when the mean is the measure of center.
Standard deviation is never negative.
Standard deviation is sensitive to outliers. A single outlier can raise the standard deviation and, in turn, distort the picture of spread.
For data with approximately the same mean, the greater the spread, the greater the standard deviation.
If all values of a data set are the same, the standard deviation is zero (because each value is equal to the mean).
The standard deviation has the same units of measure as the original data.
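Several of these properties can be checked numerically. The Python sketch below computes the sample standard deviation from its definition and compares it with `statistics.stdev` from the standard library; the data sets and the function name `sample_sd` are hypothetical.

```python
import statistics


def sample_sd(data):
    """Sample SD: sqrt(sum of squared deviations from the mean / (n - 1))."""
    n = len(data)
    mean = sum(data) / n
    return (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5


same = [5, 5, 5, 5]                        # every value equals the mean
lengths_cm = [4, 6, 8, 10, 12]             # hypothetical measurements in cm
with_outlier = [4, 6, 8, 10, 50]           # one value far from the rest
lengths_mm = [10 * x for x in lengths_cm]  # same data in millimeters

print(sample_sd(same))                     # 0.0: no spread at all
print(sample_sd(lengths_cm), statistics.stdev(lengths_cm))  # definition vs library
print(sample_sd(with_outlier))             # the outlier inflates the SD
print(sample_sd(lengths_mm))               # 10x larger: SD carries the data's units
```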
How do you calculate the IQR?
Q3-Q1
What is the five-number summary for a boxplot?
Min
Q1
Median
Q3
Max
Outliers
Low outlier: any value below Q1 − 1.5 × IQR
High outlier: any value above Q3 + 1.5 × IQR
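The five-number summary and the 1.5 × IQR outlier fences can be computed directly. This Python sketch finds Q1 and Q3 as the medians of the lower and upper halves of the sorted data, leaving out the overall median when n is odd. That is one common convention; some software uses a slightly different quartile rule, so results can differ a little. The function names and data are hypothetical.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); needs at least 2 values.

    Q1 and Q3 are the medians of the lower and upper halves of the sorted data;
    when n is odd, the middle value is left out of both halves.
    """
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


def iqr_outliers(data):
    """Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in sorted(data) if x < low_fence or x > high_fence]


# Hypothetical data with one unusually large value
data = [2, 4, 5, 6, 7, 8, 9, 30]
print(five_number_summary(data))  # (2, 4.5, 6.5, 8.5, 30)
print(iqr_outliers(data))         # [30]
```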
Boxplot Tidbits
The boxplot is also known as the box and whisker graph.
The first quartile score is the score that separates the bottom 25% of the data from the top 75% of the data.
The median is the score that separates the bottom 50% of the data from the top 50%.
The median is also known as Q2
The third quartile score is the score that separates the bottom 75% of the data from the top 25% of the data.
The boxplot can be used to see if data is symmetric or if it is skewed.
The boxplot does not have a vertical axis, just a horizontal one.
mode
The mode is the value that appears most frequently in a dataset. It helps identify the most common or popular data point.
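Python's standard `statistics` module includes functions for the mode. This short sketch (data hypothetical) uses `statistics.mode` for a single most common value and `statistics.multimode` (Python 3.8+) when there may be ties. Both also work for categorical data.

```python
import statistics

scores = [3, 7, 7, 2, 9, 7, 2]
colors = ["red", "blue", "red", "green"]
tied = [1, 1, 2, 2, 3]

print(statistics.mode(scores))     # 7 appears three times
print(statistics.mode(colors))     # red: the mode also works for categories
print(statistics.multimode(tied))  # [1, 2]: two values tie for most frequent
```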
Why is standard deviation a mean-based statistic?
Because its formula measures how far each value deviates from the mean.
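Written out, the formulas show where the mean enters: each observation's deviation from the mean is squared, the squares are averaged (dividing by n − 1 for a sample), and the square root is taken. The population version uses μ and divides by the population size N.

```latex
s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
```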
What is the symbol for sample mean?
x̄
What is the symbol for sample standard deviation?
Sx
What is the symbol for population mean?
μ
What is the symbol for population standard deviation?
σ
Can you determine if a data set is normal by looking at a boxplot?
No. Boxplots show symmetry, not normality.
Can you determine if a data set is normal by looking at a histogram?
Yes, approximately. A histogram shows the shape of the distribution, so you can judge whether it is roughly bell-shaped.