Unit Two: Exploring Two-Variable Data (Essential Knowledge)

35 Terms

1

Are apparent patterns and associations in data always meaningful?

No, apparent patterns and associations in data may be random or not meaningful.

2

What types of graphs can be used to represent one categorical variable broken down by categories of another categorical variable?

Side-by-side bar graphs, segmented bar graphs, and mosaic plots.

3

How can graphical representations of two categorical variables be used?

They can be used to compare distributions and determine if variables are associated.

4

What is a two-way table, and what does it summarize?

A two-way table, or contingency table, summarizes two categorical variables, with cell entries showing frequency counts or relative frequencies.

5

How is joint relative frequency calculated in a two-way table?

It is calculated by dividing a cell frequency by the total for the entire table.
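As a minimal sketch, assuming a small made-up two-way table (class year by whether a student prefers online homework; the labels and counts are hypothetical), the joint relative frequencies can be computed like this:

```python
# Hypothetical two-way table (made-up counts): class year (rows) by
# whether a student prefers online homework (columns).
table = {
    "Junior": {"Yes": 30, "No": 20},
    "Senior": {"Yes": 10, "No": 40},
}

# Total for the entire table: the sum of every cell.
grand_total = sum(sum(row.values()) for row in table.values())

# Joint relative frequency: each cell count divided by the table total.
joint = {
    year: {answer: count / grand_total for answer, count in row.items()}
    for year, row in table.items()
}

print(grand_total)             # 100
print(joint["Junior"]["Yes"])  # 0.3
```

The joint relative frequencies over all cells add up to 1.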

6

What are marginal relative frequencies in a two-way table?

They are the row and column totals divided by the total for the entire table.
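A short sketch using the same kind of hypothetical table (made-up counts) shows marginal relative frequencies as row and column totals divided by the table total:

```python
# Hypothetical two-way table (made-up counts).
table = {
    "Junior": {"Yes": 30, "No": 20},
    "Senior": {"Yes": 10, "No": 40},
}
grand_total = sum(sum(row.values()) for row in table.values())

# Row totals (one per class year) and column totals (one per answer).
row_totals = {year: sum(row.values()) for year, row in table.items()}
answers = ["Yes", "No"]
col_totals = {a: sum(row[a] for row in table.values()) for a in answers}

# Marginal relative frequencies: each total divided by the table total.
row_marginal = {year: t / grand_total for year, t in row_totals.items()}
col_marginal = {a: t / grand_total for a, t in col_totals.items()}

print(row_marginal)  # {'Junior': 0.5, 'Senior': 0.5}
print(col_marginal)  # {'Yes': 0.4, 'No': 0.6}
```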

7

What is a conditional relative frequency?

It is a relative frequency computed within one part of the table, such as a cell frequency divided by the total for its row (or for its column).
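As an illustration only, assuming the same made-up table, conditioning on the row means dividing each cell by its row total:

```python
# Hypothetical two-way table (made-up counts).
table = {
    "Junior": {"Yes": 30, "No": 20},
    "Senior": {"Yes": 10, "No": 40},
}

# Conditional relative frequencies given the row (class year):
# each cell count divided by the total for that row.
conditional = {
    year: {answer: count / sum(row.values()) for answer, count in row.items()}
    for year, row in table.items()
}

print(conditional["Junior"])  # {'Yes': 0.6, 'No': 0.4}
print(conditional["Senior"])  # {'Yes': 0.2, 'No': 0.8}
```

Because the two conditional distributions differ, these made-up data would suggest an association between class year and preference.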

8

What can summary statistics for two categorical variables be used for?

They can be used to compare distributions and determine if variables are associated.

9

What is a bivariate quantitative data set?

It consists of observations of two different quantitative variables made on individuals in a sample or population.

10

What does a scatterplot show?

It shows two numeric values for each observation, with one on the x-axis and one on the y-axis.

11

What is an explanatory variable?

It is a variable used to explain or predict the values of a response variable.

12

How do you describe a scatterplot?

By considering its form, direction, strength, and any unusual features.

13

What are the possible directions of association in a scatterplot?

The association can be positive (as one variable increases, the other increases) or negative (as one variable increases, the other decreases).

14

How can the form of association in a scatterplot be described?

It can be linear or non-linear.

15

How is the strength of association in a scatterplot described?

By how closely the points follow a pattern (such as a line); the association can be described as strong, moderate, or weak.

16

What are some unusual features in a scatterplot?

Clusters of points, or points with relatively large discrepancies between the actual and predicted values of the response variable.

17

What does the correlation coefficient (r) measure?

It measures the direction and strength of the linear association between two quantitative variables.

18

How is the correlation coefficient (r) usually determined?

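It is usually computed with technology, from the formula r = (1/(n − 1)) Σ ((xᵢ − x̄)/s_x)((yᵢ − ȳ)/s_y), the sum of the products of the z-scores divided by n − 1.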
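A minimal sketch of that z-score formula, written by hand with sample standard deviations; the hours/scores data are made up for illustration:

```python
import math


def correlation(xs, ys):
    """Pearson correlation r = (1/(n-1)) * sum of z_x * z_y (sample SDs)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Multiply paired z-scores, add them up, and divide by n - 1.
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Made-up data: hours studied and test scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
print(round(correlation(hours, scores), 3))  # 0.981
```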

19

Does a correlation close to ±1 always indicate a linear relationship?

No, a correlation close to ±1 does not always mean that a linear model is appropriate.

20

What are key properties of the correlation coefficient (r)?

It is unit-free, always between -1 and 1, and an r of 0 indicates no linear association.

21

Does correlation imply causation?

No, a relationship between two variables does not mean that changes in one cause changes in the other.

22

What is a simple linear regression model?

It’s an equation that uses an explanatory variable (x) to predict a response variable (y).

23

How is the predicted response value (y-hat) calculated in a linear regression model?

ŷ = a + bx, where ŷ (y-hat) is the predicted response, a is the y-intercept, b is the slope, and x is the value of the explanatory variable.
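A minimal sketch of plugging a value into a fitted line; the coefficients a = 40 and b = 7.5 are hypothetical:

```python
def predict(a: float, b: float, x: float) -> float:
    """Predicted response y-hat = a + b*x for intercept a and slope b."""
    return a + b * x


# Hypothetical fitted line: y-hat = 40 + 7.5x (made-up coefficients).
print(predict(40, 7.5, 4))  # 70.0
```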

24

What is extrapolation, and why is it risky?

Extrapolation is predicting a response value for an x-value outside the range of the data used to build the model; the farther outside that range, the less reliable the prediction.

25

What is a residual in regression?

It’s the difference between the actual value and the predicted value: residual = y − ŷ.
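A short sketch computing residuals for made-up data and a hypothetical fitted line ŷ = 46 + 6x (which happens to be the least-squares line for these numbers):

```python
# Made-up data: hours studied and test scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

# Hypothetical fitted line y-hat = 46 + 6x.
a, b = 46, 6

# Residual = actual y minus predicted y-hat.
residuals = [y - (a + b * x) for x, y in zip(hours, scores)]
print(residuals)  # [0, 2, -3, 0, 1]
```

A positive residual means the point lies above the line; a negative residual means it lies below.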

26

What does apparent randomness in a residual plot indicate?

It suggests that the linear model is appropriate for the data.

27

What does the least-squares regression model do?

It minimizes the sum of the squared residuals, and the resulting line always passes through the point (x̄, ȳ).
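As a sketch on made-up data, the least-squares line can be fit from deviations about the means and then compared with nearby lines; the helper name `sum_sq_residuals` is hypothetical:

```python
# Made-up data: hours studied and test scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# Least-squares slope and intercept from deviations about the means.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
     / sum((x - x_bar) ** 2 for x in hours))
a = y_bar - b * x_bar


def sum_sq_residuals(a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(hours, scores))


print(a, b)                                                    # 46.0 6.0
print(sum_sq_residuals(a, b))                                  # 14.0
print(sum_sq_residuals(a, b + 0.5), sum_sq_residuals(a - 1, b))  # 27.75 19.0
print(a + b * x_bar == y_bar)                                  # True
```

Both nearby lines give a larger sum of squared residuals, and the fitted line passes through (x̄, ȳ) = (3, 64).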

28

How is the slope (b) of a regression line calculated?

b = r(s_y / s_x), where r is the correlation, s_y is the standard deviation of the response variable, and s_x is the standard deviation of the explanatory variable.

29

Does the y-intercept always have a logical interpretation in context?

No, the y-intercept might not have a logical meaning in context (for example, when x = 0 is impossible or far outside the range of the data).

30

What are the coefficients in the least-squares regression model?

They are the estimated slope and y-intercept.

31

What does the slope represent in a regression line?

It represents the change in the predicted y-value for each one-unit increase in x.

32

What does the y-intercept represent in a regression line?

It represents the predicted value of the response variable when the explanatory variable equals 0.

33

What is the coefficient of determination (r²) in linear regression?

It’s the square of the correlation (r) and gives the proportion of the variation in the response variable that is explained by the linear relationship with the explanatory variable.

34

Why might transformations of variables be used in regression?

Transformations can make data more linear, improving the model’s fit.

35

What does increased randomness in residual plots after data transformation suggest?

It suggests that a linear model fits the transformed data better than it fit the original data.
