Lecture 9 - Statistical Inference

7 Terms

1

Null Model (for linear regression)

Has only an intercept and a slope of zero: just a horizontal line telling us the average value of the response (or the median, or whatever summary metric we choose).

There’d be no explanatory variable

With R code:

  • lm(response_variable ~ 1, data = dataset)
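
For example, a minimal sketch using R’s built-in mtcars data (a hypothetical choice of dataset), showing that the null model’s intercept is simply the mean of the response:

  • null_model <- lm(mpg ~ 1, data = mtcars)

  • coef(null_model) # the single (Intercept) coefficient

  • mean(mtcars$mpg) # matches the intercept of the null model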

2

One Explanatory Variable Linear Regression Model

With a quantitative explanatory variable, it’d give the usual linear regression we’d think of, with one intercept and one slope (y = mx + b).

However, we can also do it with categorical variables. To do so, we’d use the concept of dummy variables in order to quantify the levels of a categorical variable. It would have the formula:

  • y = b + m1x1 + m2x2 + …

So, for example, if there are three categories (a, b, and c), the equation would be:

  • y = b + mbxb + mcxc

  • b would represent the average value of category a

  • mb would represent the difference between the average value of category b and that of category a

  • mc would represent the difference between the average value of category c and that of category a

  • Notice that category b and category c are being compared to category a

  • In R, the reference level being compared against is, by default, the one that comes first in the alphabet

With R code:

  • lm(response_variable ~ explanatory_variable, data = dataset)
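
As a sketch, assuming the palmerpenguins dataset (consistent with the penguin example in the Confidence Intervals card), with the categorical variable species:

  • library(palmerpenguins)

  • species_model <- lm(bill_depth_mm ~ species, data = penguins)

  • coef(species_model) # (Intercept) is the Adelie average (first alphabetically); speciesChinstrap and speciesGentoo are the differences from Adelie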

3

Dummy Variables

A useful concept when applying linear regression with categorical variables. Essentially, depending on the level we are investigating, we assign either 0 or 1 to that level’s dummy variable

So with the example above, if we wanted to do a point prediction for category b, then the formula would look like:

  • y = b + mb(1) + mc(0)

We substitute 1 for xb and 0 for xc
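
A minimal sketch of how R builds this 0/1 coding internally, using a hypothetical three-level factor:

  • grp <- factor(c("a", "b", "c"))

  • model.matrix(~ grp) # columns (Intercept), grpb, grpc: the row for "a" has both dummies at 0, "b" has grpb = 1, "c" has grpc = 1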

4

Two or More Explanatory Variable Linear Regression Model: Additive

We could also do this with multiple explanatory variables. An additive model only considers the main effects of the explanatory variables. Each level uses the same slope, but there are multiple lines, one for each level of the categorical variable, and they are parallel (see the sketch after the R code below).

We’d use the R code:

  • lm(response_variable ~ explanatory_variable1 + explanatory_variable2, data = dataset)
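
A sketch, again assuming the palmerpenguins data, with one quantitative and one categorical explanatory variable:

  • additive_model <- lm(bill_depth_mm ~ bill_length_mm + species, data = penguins)

  • coef(additive_model) # one shared slope for bill_length_mm plus species-specific intercepts, i.e. parallel lines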

5

Two or More Explanatory Variable Linear Regression Model: Interactive

Similar to the previous one, but an interactive model shows us the main effects as well as any interaction terms that may be present. Essentially, the line for each level has its own slope, so the lines won’t be parallel to each other.

We’d use the R code:

  • lm(response_variable ~ explanatory_variable1 * explanatory_variable2, data = dataset)

    or

  • lm(response_variable ~ explanatory_variable1 + explanatory_variable2 + explanatory_variable1:explanatory_variable2, data = dataset)
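
A sketch with the same assumed penguin data, where * expands to both main effects plus their interaction:

  • interactive_model <- lm(bill_depth_mm ~ bill_length_mm * species, data = penguins)

  • coef(interactive_model) # species-specific intercepts and species-specific slopes, so the fitted lines are no longer parallel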

6

Comparing Models

We can compare them with anova() (two models):

  • Which gives us the p-value for how different the two models are from one another. Of course, we’d interpret this against the null hypothesis that the simpler model fits the data just as well as the more complex one

Or we can use AIC() (three+ models)

  • The model with the smallest AIC value is the best model

After this, we’d also check that the chosen best model meets the assumptions of linear regression with gglm()
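
A sketch of the full comparison workflow, assuming the penguin models fitted in the earlier cards and that the gglm package is installed:

  • anova(additive_model, interactive_model) # p-value for whether the interaction terms improve the fit

  • AIC(species_model, additive_model, interactive_model) # the smallest AIC indicates the best model

  • library(gglm)

  • gglm(interactive_model) # diagnostic plots to check the linear regression assumptions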

7

Confidence Intervals

A range of values, calculated from sample data, around a point estimate, that is believed to contain, with a certain level of confidence, the true value of the population parameter

  • It’s a way to quantify uncertainty about the value of the parameter (the wider the interval, the more uncertain the estimate)

When interpreting one, we have to include:

  • Confidence value

  • Upper & lower limit

  • Biological context

An example would be:

  • We are 95% confident that the average bill depth of a Chinstrap penguin is between 1.5 and 2.4 mm shallower than that of an Adelie penguin, given the other variables in the model.

We’d use the R code:

  • confint(model_name)
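
A sketch, assuming the additive penguin model from the earlier cards:

  • confint(additive_model) # 95% intervals by default

  • confint(additive_model, level = 0.99) # other confidence levels via the level argument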