Unit Two: Exploring Two-Variable Data - Essential Knowledge

35 Terms

1

Are apparent patterns and associations in data always meaningful?

No, apparent patterns and associations in data may be random or not meaningful.

2

What types of graphs can be used to represent one categorical variable broken down by categories of another categorical variable?

Side-by-side bar graphs, segmented bar graphs, and mosaic plots.

3

How can graphical representations of two categorical variables be used?

They can be used to compare distributions and determine if variables are associated.

4

What is a two-way table, and what does it summarize?

A two-way table, or contingency table, summarizes two categorical variables, with cell entries showing frequency counts or relative frequencies.

5

How is joint relative frequency calculated in a two-way table?

It is calculated by dividing a cell frequency by the total for the entire table.

6

What are marginal relative frequencies in a two-way table?

They are the row and column totals divided by the total for the entire table.

7

What is a conditional relative frequency?

It is a relative frequency for a specific part of the table, such as cell frequencies in a row divided by the total for that row.
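
The three kinds of relative frequency from cards 5 to 7 can be sketched in Python. This is only an illustration: the table, its category names, and all counts are hypothetical and do not come from the card set.

```python
# Hypothetical two-way table: rows are grade level, columns are preferred subject.
table = {
    "Junior": {"Math": 20, "Art": 30},
    "Senior": {"Math": 35, "Art": 15},
}

# Total for the entire table.
grand_total = sum(sum(row.values()) for row in table.values())

# Joint relative frequency: one cell divided by the table total.
joint = table["Junior"]["Math"] / grand_total

# Marginal relative frequencies: a row or column total divided by the table total.
row_total = sum(table["Junior"].values())
col_total = sum(row["Math"] for row in table.values())
marginal_row = row_total / grand_total
marginal_col = col_total / grand_total

# Conditional relative frequency: a cell divided by the total for its row.
conditional = table["Junior"]["Math"] / row_total

print(joint, marginal_row, marginal_col, conditional)
```

The joint and conditional values share the same numerator (the Junior/Math cell) and differ only in the denominator, which is the distinction the cards draw.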

8

What can summary statistics for two categorical variables be used for?

They can be used to compare distributions and determine if variables are associated.

9

What is a bivariate quantitative data set?

It consists of observations of two different quantitative variables made on individuals in a sample or population.

10

What does a scatterplot show?

It shows two numeric values for each observation, with one on the x-axis and one on the y-axis.

11

What is an explanatory variable?

It is a variable used to explain or predict the values of a response variable.

12

How do you describe a scatterplot?

By considering its form, direction, strength, and any unusual features.

13

What are the possible directions of association in a scatterplot?

The association can be positive (as one variable increases, the other tends to increase) or negative (as one variable increases, the other tends to decrease).

14

How can the form of association in a scatterplot be described?

It can be linear or non-linear.

15

How is the strength of association in a scatterplot described?

By how closely the points follow a specific form, such as a line. The association can be strong, moderate, or weak.

16

What are some unusual features in a scatterplot?

Clusters of points or points with large discrepancies between actual and predicted values.

17

What does the correlation coefficient (r) measure?

It measures the direction and strength of the linear association between two quantitative variables.
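
As a rough sketch of what technology computes, the sample correlation can be written as the sum of the products of z-scores divided by n - 1. The data values and the function name `correlation` are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev

# Hypothetical data: hours studied (x) and quiz score (y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def correlation(xs, ys):
    """Sample correlation: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = (
        ((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(xs, ys)
    )
    return sum(products) / (n - 1)

r = correlation(x, y)
print(round(r, 3))
```

A positive result matches an upward trend in the points, and a value nearer to 1 than to 0 indicates a fairly strong linear association.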

18

How is the correlation coefficient (r) usually determined?

The most common way is by using technology.

19

Does a correlation close to ±1 always indicate a linear relationship?

No, a correlation close to ±1 does not always mean that a linear model is appropriate.
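
A small sketch can make this concrete. The data below are hypothetical and curved by construction (y = x^2), yet the correlation still comes out close to 1.

```python
from statistics import mean

# Hypothetical data that is curved by construction: y = x^2.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Correlation computed from sums of squares and cross-products.
r = s_xy / (s_xx * s_yy) ** 0.5
print(round(r, 3))
```

The high value of r reflects a strong increasing trend, not a straight-line form, so a scatterplot or residual plot is still needed before choosing a linear model.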

20

What are key properties of the correlation coefficient (r)?

It is unit-free, always between -1 and 1 inclusive, and an r of 0 indicates no linear association.

21

Does correlation imply causation?

No, a relationship between two variables does not mean that changes in one cause changes in the other.

22

What is a simple linear regression model?

It’s an equation that uses an explanatory variable (x) to predict a response variable (y).

23

How is the predicted response value (y-hat) calculated in a linear regression model?

y-hat = a + bx, where a is the y-intercept, b is the slope, and x is the explanatory variable value.
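
As a minimal sketch, the prediction is one multiplication and one addition. The intercept, slope, and x value below are hypothetical, and `predict` is an illustrative helper name.

```python
def predict(a, b, x):
    """Predicted response for a line with intercept a and slope b."""
    return a + b * x

# Hypothetical model: predicted score = 50 + 4.5 * (hours studied).
a, b = 50.0, 4.5

y_hat = predict(a, b, 6)
print(y_hat)  # 77.0
```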

24

What is extrapolation, and why is it risky?

Extrapolation is predicting a response value beyond the range of the data, which makes the prediction less reliable.

25

What is a residual in regression?

It’s the difference between the actual value and the predicted value: residual = y - y-hat.
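
A short sketch of the calculation, using a hypothetical model and a hypothetical observation:

```python
# Hypothetical model: predicted score = 50 + 4.5 * (hours studied).
a, b = 50.0, 4.5

# Hypothetical observation: 6 hours studied, actual score of 80.
x_obs, y_obs = 6, 80.0

y_hat = a + b * x_obs       # predicted value
residual = y_obs - y_hat    # actual minus predicted
print(residual)  # 3.0
```

A positive residual means the actual value is above the prediction (the model underpredicted), and a negative residual means the model overpredicted.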

26

What does apparent randomness in a residual plot indicate?

It suggests that the linear model is appropriate for the data.
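
For contrast, the sketch below fits a least-squares line to hypothetical data that is curved by construction (y = x^2) and prints the sign of each residual in order of x. The data and variable names are illustrative assumptions.

```python
from statistics import mean

# Hypothetical curved data: y = x^2.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]

# Least-squares slope and intercept.
x_bar, y_bar = mean(x), mean(y)
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# Residual = actual - predicted, listed in order of increasing x.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(signs)
```

The residuals are positive at both ends and negative in the middle. A systematic pattern like this, rather than random scatter, is a sign that a linear model is not appropriate.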

27

What does the least-squares regression model do?

It minimizes the sum of the squared residuals, and the resulting line passes through the point (x̄, ȳ).

28

How is the slope (b) of a regression line calculated?

b = r(s_y / s_x), where r is the correlation, s_y is the standard deviation of the response variable, and s_x is the standard deviation of the explanatory variable.
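
The slope formula can be checked with a short sketch on hypothetical data. The intercept step, a = ȳ - b·x̄, is not stated on the cards. It follows from the fact that the least-squares line passes through (x̄, ȳ).

```python
from statistics import mean, stdev

# Hypothetical data: hours studied (x) and quiz score (y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations

# Correlation from deviations about the means.
r = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / ((n - 1) * s_x * s_y))

b = r * s_y / s_x        # slope: b = r(s_y / s_x)
a = y_bar - b * x_bar    # intercept, so the line passes through (x_bar, y_bar)

print(round(a, 3), round(b, 3))
```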

29

Does the y-intercept always have a logical interpretation in context?

No, the y-intercept might not always have a logical meaning in context.

30

What are the coefficients in the least-squares regression model?

They are the estimated slope and y-intercept.

31

What does the slope represent in a regression line?

It represents the amount the predicted y-value changes for every one-unit increase in x.

32

What does the y-intercept represent in a regression line?

It represents the predicted value of the response variable when the explanatory variable equals 0.

33

What is the coefficient of determination (r²) in linear regression?

It’s the square of the correlation (r) and indicates the proportion of variation in the response variable explained by the explanatory variable.
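
The two descriptions of r² can be compared numerically. The sketch below uses hypothetical data and computes r² once by squaring r and once as 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total variation of y about its mean. The names `sse` and `sst` are illustrative.

```python
from statistics import mean

# Hypothetical data: hours studied (x) and quiz score (y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Square of the correlation.
r = s_xy / (s_xx * s_yy) ** 0.5
r_squared = r ** 2

# Proportion of variation in y explained by the least-squares line.
b = s_xy / s_xx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
sst = s_yy
explained = 1 - sse / sst

print(round(r_squared, 3), round(explained, 3))
```

For a least-squares line the two values agree, which is why r² can be read as the proportion of variation explained.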

34

Why might transformations of variables be used in regression?

Transformations can make data more linear, improving the model’s fit.

35

What does increased randomness in residual plots after data transformation suggest?

It suggests that a linear model fits the transformed data better than it fit the original data.
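
A sketch of the idea on hypothetical exponential data: a line is fitted to (x, y) and to (x, ln y), and the residuals are compared. The helper names `least_squares` and `residuals` are illustrative, and the natural log is just one possible transformation.

```python
import math
from statistics import mean

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
         / sum((xi - x_bar) ** 2 for xi in xs))
    return y_bar - b * x_bar, b

def residuals(xs, ys):
    """Actual minus predicted for each point, in the order given."""
    a, b = least_squares(xs, ys)
    return [yi - (a + b * xi) for xi, yi in zip(xs, ys)]

# Hypothetical exponential growth: y = 2 * 3^x.
x = [0, 1, 2, 3, 4, 5]
y = [2 * 3 ** xi for xi in x]
log_y = [math.log(yi) for yi in y]  # transformed response

raw_resid = residuals(x, y)
log_resid = residuals(x, log_y)

print([round(e, 1) for e in raw_resid])
print([round(e, 6) for e in log_resid])
```

The raw residuals are positive at the ends and negative in the middle, a clear curved pattern. After the log transformation the pattern disappears. Because this example is exactly exponential, the transformed residuals are essentially zero. Real data would instead leave small, random-looking residuals.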