Chapter 1: Exploring Data

New cards

individuals

are the objects described by a set of data. Individuals may be people, but they may also be animals or things.

New cards

variable

is any characteristic of an individual. A variable can take different values for different individuals.

New cards

categorical

variable places an individual into one of several groups or categories.

New cards

quantitative

variable takes numerical values for which arithmetic operations such as adding and averaging make sense.

New cards

distribution

of a variable tells us what values the variable takes and how often it takes these values.
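
A minimal sketch of this idea, using made-up data: the distribution of a categorical variable is just each value paired with how often it occurs. The list `colors` and the helper `distribution` are hypothetical names used only for illustration.

```python
from collections import Counter

# Hypothetical data: the favorite color reported by each of 10 individuals.
colors = ["red", "blue", "blue", "green", "red",
          "blue", "red", "blue", "green", "blue"]

def distribution(values):
    """Return {value: (count, proportion)} for a list of observations."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, count / n) for value, count in counts.items()}

dist = distribution(colors)
for value, (count, prop) in sorted(dist.items()):
    print(f"{value:>6}: count={count}, proportion={prop:.2f}")
```

The counts are what a bar graph would plot; the proportions always add up to 1.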

New cards

The Bar Graph

Make sure you label your axes and title your graph

Scale your axes appropriately

Each bar should correspond to the appropriate count

Leave room between bars

New cards

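Marginal distribution

The distribution of one of the categorical variables in a two-way table among all individuals described by the table.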

New cards

What does a marginal distribution not tell us?

A marginal distribution tells us nothing about the relationship between two variables.

New cards

conditional distribution

The distribution of one variable among only those individuals who have a specific value of another variable. Comparing the conditional distributions for each value of the other variable shows whether the two variables are related.

New cards

Association

There is an association between two variables if knowing the value of one variable helps us predict the value of the other. If knowing the value of one variable does not help us predict the value of the other, then there is no association between the variables.
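
A rough sketch of how marginal distributions, conditional distributions, and association fit together, assuming a small made-up two-way table. The table, the category names, and the helper functions `marginal` and `conditional` are all hypothetical.

```python
# Hypothetical two-way table of counts: (class, answer) -> number of students.
table = {
    ("junior", "yes"): 30, ("junior", "no"): 20,
    ("senior", "yes"): 10, ("senior", "no"): 40,
}

def marginal(table, axis):
    """Distribution of one variable alone (axis 0 = class, axis 1 = answer)."""
    total = sum(table.values())
    sums = {}
    for key, count in table.items():
        sums[key[axis]] = sums.get(key[axis], 0) + count
    return {k: v / total for k, v in sums.items()}

def conditional(table, given_axis, given_value):
    """Distribution of the other variable among individuals with given_value."""
    other = 1 - given_axis
    rows = {k[other]: c for k, c in table.items() if k[given_axis] == given_value}
    total = sum(rows.values())
    return {k: v / total for k, v in rows.items()}

print("marginal (answer):", marginal(table, 1))
print("answer | junior:  ", conditional(table, 0, "junior"))
print("answer | senior:  ", conditional(table, 0, "senior"))
```

In this made-up table the marginal distribution of the answer says nothing about class. The two conditional distributions differ (60% of juniors said yes, but only 20% of seniors), so knowing the class helps predict the answer, which is what an association means.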

New cards

The Dotplot

Things to remember

You only need a properly labeled horizontal axis

You need to be neat

Title the graph

Each dot represents a count of 1

Works well with a small data set

New cards

When describing a distribution, what 5 things do you need to address?

Center (median, mean)

Shape (roughly symmetric, symmetric, normal, two clusters, skewed left, skewed right)

Variability (IQR, Standard deviation, range)

Outliers

CONTEXT (variables included in the experiment)

New cards

The Stemplot

Separate each piece of data into a stem (all but the rightmost digit) and a leaf (the final digit). For example, if there are 24 students in the class, the 2 is the stem and the 4 is the leaf. If the temperature of a pizza oven was 539 degrees, the stem would be 53 and the leaf would be 9.

Write the stems vertically in increasing order from top to bottom.

Be very neat and leave the same amount of space between leaves.

Title your graph

Include a key identifying what the stem and leaves represent.

Works well with a small data set
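
The steps above can be sketched in code. This is only an illustration for non-negative whole numbers; the function name `stemplot` and the class-size data are made up, and listing stems that have no leaves is a common convention rather than something stated on this card.

```python
from collections import defaultdict

def stemplot(data):
    """Return the lines of a stemplot for non-negative whole numbers."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)  # stem: all but the last digit; leaf: last digit
        leaves[stem].append(leaf)
    lines = []
    # List every stem from smallest to largest, even a stem with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>3} | {row}".rstrip())
    return lines

# Hypothetical data: number of students in each of 9 classes.
class_sizes = [24, 31, 18, 24, 27, 35, 19, 22, 40]
print("Class sizes")
for line in stemplot(class_sizes):
    print(line)
print("Key: 2 | 4 means 24 students")
```

Sorting first keeps the leaves in increasing order on each stem, and printing one character per leaf keeps the spacing even, so the rows work like the bars of a histogram.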

New cards

The Histogram

Things to remember

It is the most common graph of a quantitative variable.

The x-axis is continuous, so there should be no gaps between the bars (unless a class has zero observations)

The graphing calculator can do a histogram for you

Title your graph and label the axes.
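
A small sketch of what a histogram does behind the scenes: it counts how many observations fall into each equal-width class. The test scores, the class boundaries, and the helper `class_counts` are all made up for illustration.

```python
def class_counts(data, start, width, n_classes):
    """Count observations in classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < n_classes:  # ignore values outside the classes
            counts[i] += 1
    return counts

# Hypothetical data: 10 test scores.
scores = [52, 67, 71, 74, 78, 81, 83, 85, 88, 99]
counts = class_counts(scores, start=50, width=10, n_classes=5)

print("Test scores")
for i, c in enumerate(counts):
    lo = 50 + 10 * i
    print(f"{lo:>3} to <{lo + 10:<3} | {'*' * c}")
```

Each class starts exactly where the previous one ends, which is why the bars of a histogram touch.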

New cards

Properties of standard deviation

Standard deviation is only used to measure spread or dispersion around the mean of a data set.

Standard deviation is never negative.

Standard deviation is sensitive to outliers. A single outlier can raise the standard deviation and in turn, distort the picture of spread.

For data with approximately the same mean, the greater the spread, the greater the standard deviation.

If all values of a data set are the same, the standard deviation is zero (because each value is equal to the mean).

The units of the standard deviation are the same as the units of the original data.
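
A few of these properties can be checked with a short sketch. It assumes the sample standard deviation (dividing by n - 1), which the card does not specify; the function name `sample_sd` and the four small data sets are made up.

```python
import math

def sample_sd(data):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

same = [5, 5, 5, 5]              # every value equals the mean
tight = [4, 5, 5, 6]             # mean 5, small spread
wide = [1, 5, 5, 9]              # mean 5, larger spread
with_outlier = [4, 5, 5, 6, 30]  # tight data plus one outlier

for name, data in [("same", same), ("tight", tight),
                   ("wide", wide), ("with_outlier", with_outlier)]:
    print(f"{name:>12}: s = {sample_sd(data):.3f}")
```

The identical values give exactly zero, the wider data set with the same mean gives a larger value, and adding a single outlier to the tight data set raises the standard deviation sharply.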

New cards

How do you calculate the IQR?

Q3-Q1

New cards

What is the five-number summary for a boxplot?

Min
Q1
Median
Q3
Max

New cards

Outliers

Low outlier: any value less than Q1 - (1.5 x IQR)

High outlier: any value greater than Q3 + (1.5 x IQR)
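
The five-number summary, the IQR, and the outlier rule can be sketched together. One assumption to note: quartiles are computed here as the medians of the lower and upper halves of the sorted data, leaving out the overall median when the count is odd. Calculators and software sometimes use other quartile conventions, so results may differ slightly. The data and function names are made up.

```python
def median(sorted_vals):
    """Median of a list that is already sorted."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); needs at least 2 values."""
    vals = sorted(data)
    n = len(vals)
    lower = vals[: n // 2]        # values below the median's position
    upper = vals[(n + 1) // 2:]   # values above the median's position
    return vals[0], median(lower), median(vals), median(upper), vals[-1]

def outliers(data):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data set of 9 values.
data = [2, 4, 5, 6, 7, 8, 9, 10, 25]
print(five_number_summary(data))
print(outliers(data))
```

For this data Q1 = 4.5 and Q3 = 9.5, so IQR = 5 and the cutoffs are -3 and 17. Only 25 falls outside them.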

New cards

Boxplot Tidbits

The boxplot is also known as the box and whisker graph.

The first quartile score is the score that separates the bottom 25% of the data from the top 75% of the data.

The median is the score that separates the bottom 50% of the data from the top 50%.

The median is also known as Q2

The third quartile score is the score that separates the bottom 75% of the data from the top 25% of the data.

The boxplot can be used to see if data is symmetric or if it is skewed.

The boxplot does not have a vertical axis, just a horizontal one.

New cards

mode

The mode is the value that appears most frequently in a dataset. It helps identify the most common or popular data point.

New cards

Why is standard deviation a mean-based statistic?

Because the mean appears in the formula used to calculate it.
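
For reference, the usual formula for the sample standard deviation is written out below (assuming the sample version with n - 1 in the denominator, which this card does not specify). The sample mean x̄ sits inside every squared deviation, which is why the statistic is mean-based.

```latex
s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```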

New cards

What is the symbol for sample mean?

x̄

New cards

What is the symbol for sample standard deviation?

s

New cards

What is the symbol for population mean?

μ

New cards

What is the symbol for population standard deviation?

σ

New cards

Can you determine if a data set is normal by looking at a boxplot?

No. Boxplots show symmetry, not normality.

New cards

Can you determine if a dataset is normal by looking at a histogram?

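Yes, approximately, because it shows the shape of the distribution, so you can see whether the data are roughly bell-shaped (single-peaked and symmetric).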