Data analysis


1
New cards

Observed variable, Response variable, Dependent variable, Explained variable, Outcome variable

The variable that we aim to model or predict, typically denoted by y. It depends on the input variables.

2
New cards

Fit parameters, Regression coefficients

The unknown parameters of the model (usually denoted by Greek letters like theta) that are estimated from the data during model fitting.

3
New cards

Explanatory variable, Regressor, Independent variable, Covariate, Predictor

They are the inputs of the model, typically denoted by x.

Variables that are used to explain or predict the observed variable.

4
New cards

Noise, Error term, Disturbance term

It represents random fluctuations or measurement errors, usually denoted epsilon.

The part of the variation in the observed variable that is not explained by the model.

5
New cards

Design matrix, Predictor variable matrix

A matrix X that contains all the predictor values for the observations, structured so that each row corresponds to one observation and each column to one predictor.

6
New cards

linear ordinary least-squares (OLS)

A statistical method used to estimate the parameters of a linear regression model by minimizing the residual sum of squares (RSS) between observed responses and model predictions. All observations are treated equally; there is no weighting.

7
New cards

Residual sum of squares (RSS), Objective function, Chi² function, Loss function, Cost function

A function that quantifies the discrepancy between the observed data and the model predictions.

8
New cards

normal equations

They provide a direct (analytical) way to compute the best-fit parameters for a linear model.
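
As a minimal NumPy sketch (not part of the deck; data and variable names are illustrative), the normal equations (XᵀX)θ = Xᵀy can be solved directly for a toy linear model:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)  # true model: y = 2 + 0.5x + noise

X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept column
theta = np.linalg.solve(X.T @ X, X.T @ y)      # normal equations: (X'X) theta = X'y
print(theta)                                   # close to [2.0, 0.5]
```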

9
New cards

R-style formulae

A concise and expressive syntax used to specify statistical models, originally from the R programming language.

E.g. y ~ x₁ + x₂

so y = θ₀ + θ₁x₁ + θ₂x₂ + ε
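
A hedged sketch using the statsmodels formula API, which supports this syntax in Python (the column names x1, x2, y and the data are illustrative):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"x1": [1, 2, 3, 4, 5],
                   "x2": [2, 1, 4, 3, 5],
                   "y":  [3.1, 3.9, 7.2, 7.8, 10.1]})

# "y ~ x1 + x2" fits y = theta0 + theta1*x1 + theta2*x2 + eps (intercept is implicit)
fit = smf.ols("y ~ x1 + x2", data=df).fit()
print(fit.params)
```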

10
New cards

Homoscedasticity

Refers to the behavior of the variance of the error terms in a regression model.

The error terms have constant variance across all levels of the explanatory variables.

This is a core assumption of Ordinary Least-Squares (OLS) regression.

11
New cards

Heteroscedasticity

Refers to the behavior of the variance of the error terms (or residuals) in a regression model.

The error variance varies with the level of one or more explanatory variables.

Plot residuals vs. fitted values; a fan or funnel shape suggests heteroscedasticity.
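
A small illustrative sketch of that diagnostic plot, assuming NumPy and matplotlib (the data are synthetic, with noise that deliberately grows with x):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 100)
y = 1.0 + 2.0 * x + rng.normal(0, 0.3 * x)     # noise scale grows with x

X = np.column_stack([np.ones_like(x), x])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ theta

plt.scatter(X @ theta, residuals)              # fan/funnel shape -> heteroscedasticity
plt.axhline(0, color="k", lw=1)
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()
```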

12
New cards

expected value

The average value obtained when repeating the experiment many times.

13
New cards

Residuals

Residuals are the differences between the observed values and the predicted values from the model.

Small residuals → good fit; large residuals → poor fit or model misspecification.

14
New cards

Predicted responses

These are the model’s best estimates of the observed variable under the assumed model.

Predicted responses are the fitted values.

15
New cards

t-value

The number of standard errors the fit parameter is away from zero.

A large absolute value of t suggests that the corresponding variable is statistically significant.

(A small t-value (close to 0) suggests that the coefficient may be zero, and thus the variable might not contribute meaningfully to the model.)

16
New cards

confidence interval

A confidence interval for a parameter (e.g. θ) is a range of values, derived from the data, that is likely to contain the true value of the parameter with a specified probability, assuming the model and its assumptions are correct.

For example, a 95% confidence interval for θ means:

“If we repeated the experiment many times, and each time constructed a 95% confidence interval, then about 95% of those intervals would contain the true value of θ.”

The more data you gather, the more confident you are in the estimated fit parameters

17
New cards

confidence level

The confidence level is the probability that the confidence interval procedure will produce an interval that contains the true parameter value. It is typically expressed as a percentage:

  • 95% confidence level → 5% of intervals may not contain the true value.

This level defines the width of the confidence interval: higher confidence → wider interval.

Note: A 95% confidence level corresponds to a split in two equal parts: 2.5% on either side of the interval.

18
New cards

joint confidence ellipsoid

It shows where the true values of several fit parameters are likely to lie together, taking into account their uncertainties and how they are correlated. It's like a multi-parameter version of a confidence interval.

19
New cards

Coefficient of determination (R²)

Is a statistical measure used to assess the goodness of fit of a regression model. It quantifies how much of the variance in the observed data is explained by the model.

R²=1: Perfect fit

An R² of 0.85 means that 85% of the variation in the observed variable is explained by the model.
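
A minimal sketch of the computation, using R² = 1 − RSS/TSS (the function name is illustrative):

```python
import numpy as np

def r_squared(y, fitted):
    rss = np.sum((y - fitted) ** 2)        # residual sum of squares
    tss = np.sum((y - np.mean(y)) ** 2)    # total sum of squares
    return 1.0 - rss / tss                 # fraction of variance explained
```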

20
New cards

Adjusted coefficient of determination

It tells how well the model explains the data, but also corrects for the number of predictors used: it only increases if adding a new variable actually improves the model. Even so, don't use it for model selection.

21
New cards

mean response, fitted value of the response, expected value of the response

The value you'd expect if you repeated the measurement many times under the same conditions.

22
New cards

t-multiplier

It is a value from the Student’s t-distribution used to construct confidence intervals.

The 1 − α/2 quantile of the Student’s t-distribution with N − K degrees of freedom, where

N = number of observations

K = number of fit parameters
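
A one-line check with SciPy (the values of α, N, and K are illustrative):

```python
from scipy import stats

alpha, N, K = 0.05, 50, 2
t_mult = stats.t.ppf(1 - alpha / 2, df=N - K)   # 0.975 quantile, 48 degrees of freedom
print(t_mult)                                    # ~2.01
# 95% CI for a coefficient: theta_hat +/- t_mult * standard_error
```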

23
New cards

multicollinearity

When one predictor can be written as a linear combination of the others.

24
New cards

unidentifiability

It happens when the determinant of XᵀX is zero (the matrix is singular), so it’s impossible to uniquely determine the values of some fit parameters.

25
New cards

ill-conditioning

How sensitive the solution of a system of equations is to small changes in the input data; thus, the situation where the design matrix X of a regression model leads to numerically unstable estimates.

26
New cards

condition number

A numerical indicator that measures how ill-conditioned a matrix is. That is, how sensitive the solution of a system of equations is to small changes in the input data.
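
An illustrative NumPy sketch (the matrices are made up): two design matrices with an intercept column, one well-behaved and one with nearly collinear columns:

```python
import numpy as np

X_good = np.array([[1.0, 0.1], [1.0, 5.0], [1.0, 9.9]])
X_bad = np.array([[1.0, 1000.0], [1.0, 1000.1], [1.0, 1000.2]])  # nearly collinear columns

print(np.linalg.cond(X_good))   # modest value: well-conditioned
print(np.linalg.cond(X_bad))    # huge value: ill-conditioned
```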

27
New cards

recentering

Means subtracting the mean from the responses and the design matrix. Applied when the values of one regressor are orders of magnitude larger than those of another regressor.

28
New cards

unit-scaling or standardizing

means adjusting variables so they all have similar ranges, by subtracting the mean and dividing by the standard deviation for the responses and design matrix.
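
A minimal sketch of that transformation (the helper name is illustrative):

```python
import numpy as np

def standardize(v):
    """Shift to zero mean and scale to unit standard deviation."""
    return (v - v.mean()) / v.std()
```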

29
New cards

High-leverage points

Outliers in the x-direction, that can strongly influence the regression fit. They may or may not be outliers in y.

30
New cards

CHAPTER 2

31
New cards

consistent estimator

Is an estimator whose estimates get closer to the true parameter value as the sample size increases.

32
New cards

Weighted least-squares (WLS)

version of linear regression where each data point is given a weight, so points with more reliable measurements have more influence on the fit.

It's especially useful when the data show heteroscedasticity (non-constant noise levels).
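
A sketch assuming statsmodels, on a synthetic heteroscedastic dataset where the noise level is known; each point is weighted by 1/variance:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = np.linspace(1, 10, 40)
sigma = 0.2 * x                                    # noise level grows with x
y = 1.0 + 3.0 * x + rng.normal(0, sigma)

X = sm.add_constant(x)                             # design matrix with intercept
fit = sm.WLS(y, X, weights=1.0 / sigma**2).fit()   # weight each point by 1/variance
print(fit.params)                                  # close to [1.0, 3.0]
```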

33
New cards

Feasible Weighted Least-Squares

is a two-step version of WLS used when the error variances are unknown.

First, you estimate the error pattern, then use those estimates to assign weights and perform a weighted regression.

34
New cards

CHAPTER 3

35
New cards

nonlinear least-squares

is a fitting method used when the model depends nonlinearly on its parameters.

But unlike linear least-squares, it requires iterative algorithms to find the best-fit parameters.

36
New cards

expectation surface

The surface formed by the mean predicted values (or expected values) of the response variable across different combinations of predictor values.

It shows what the model expects on average, without noise, over the entire input space.

37
New cards

a 100(1-α)% confidence region

Is the area (or volume) in parameter space where the true values of the parameters are expected to lie with (1−α)×100% confidence, based on the data and model.

For example, a 95% confidence region means we expect the true parameters to be inside that region 95% of the time in repeated experiments.

38
New cards

CHAPTER 5

39
New cards

Least trimmed squares (least trimmed sum of squares)

is a robust regression method that fits a model by minimizing the sum of the smallest squared residuals, ignoring the largest ones

(alters cost function so it can better deal with outliers)

40
New cards

least trimmed sum of absolute deviations

is a robust regression method that fits a model by minimizing the sum of the smallest absolute residuals, rather than squared ones.

41
New cards

Least quantile regression

Is a robust regression method that minimizes a quantile (typically the median) of the squared residuals, instead of minimizing the mean.

(alters cost function so it can better deal with outliers)

42
New cards

Huber’s method

Rather than computing the square of the residual, we could use another function of the residual which would be less sensitive to outliers

(alters cost function so it can better deal with outliers)
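
One way to experiment with a robust residual function is scipy.optimize.least_squares, which offers a built-in "huber" loss; the sketch below (with synthetic data and injected outliers) is illustrative, not the course's exact method:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(7)
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 50)
y[::10] += 15.0                               # inject a few large outliers

def residuals(theta):
    return y - (theta[0] + theta[1] * x)

fit = least_squares(residuals, x0=[0.0, 0.0], loss="huber", f_scale=1.0)
print(fit.x)                                  # close to [1, 2] despite the outliers
```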

43
New cards

CHAPTER 6

44
New cards

AIC score

Is a model selection metric used to compare statistical models. It balances model fit with model complexity, penalizing models with more parameters. It is used for comparing models on the same dataset. Lower AIC = better model (relative to others).
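
For a least-squares fit with Gaussian errors, AIC reduces (up to an additive constant) to N·ln(RSS/N) + 2k; a minimal sketch with illustrative numbers:

```python
import numpy as np

def aic_gaussian(rss, n, k):
    """AIC of a least-squares fit with Gaussian errors (up to an additive constant)."""
    return n * np.log(rss / n) + 2 * k

print(aic_gaussian(rss=12.3, n=100, k=2))
print(aic_gaussian(rss=11.9, n=100, k=5))   # slightly better fit, but more parameters
```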

45
New cards

Likelihood

the probability of observing the data given specific parameter values

46
New cards

“We assume that observations are” independent and identically distributed (i.i.d.).

Means that all data points are drawn from the same probability distribution and are statistically independent of each other.

47
New cards

method of maximum-likelihood

estimates model parameters by finding the values that maximize the likelihood

48
New cards

Akaike delta-score

difference in AIC values between a given model and the best (lowest-AIC) model.

49
New cards

Akaike weight

represent the relative likelihood of each model being the best among a set, given the data.
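
The usual formula is wᵢ = exp(−Δᵢ/2) / Σⱼ exp(−Δⱼ/2), where Δᵢ is the Akaike delta-score; a minimal NumPy sketch with made-up AIC values:

```python
import numpy as np

aic = np.array([100.3, 101.1, 104.7])   # AIC scores of candidate models (illustrative)
delta = aic - aic.min()                 # Akaike delta-scores
w = np.exp(-0.5 * delta)
w /= w.sum()                            # Akaike weights sum to 1
print(w)                                # the lowest-AIC model gets the largest weight
```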

50
New cards

Cross-validation

is a technique to assess how well a model generalizes to new, unseen data. It works by splitting the data into parts: the model is trained on some parts and tested on the remaining part, repeating this process multiple times to get a reliable estimate of prediction performance.

51
New cards

Test error (also called generalization error)

is the error a model makes on new, unseen data. It reflects how well the model generalizes beyond the training data

52
New cards

K-fold cross-validation

is a method to estimate a model’s test error by splitting the data into K equal parts (folds). The model is trained on K−1 folds and tested on the remaining fold, repeating this K times so every fold is used once for testing. The average test error across all folds gives a reliable performance estimate.
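
A minimal NumPy sketch of the procedure for an OLS model (the function and names are illustrative):

```python
import numpy as np

def kfold_mse(X, y, K=5, seed=0):
    """Estimate the test error of an OLS fit by K-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, K)
    errors = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        theta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errors.append(np.mean((y[test] - X[test] @ theta) ** 2))
    return np.mean(errors)                # average test error over the K folds
```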

53
New cards

Leave-one-out cross-validation

is a special case of K-fold cross-validation where K equals the number of data points. Each time, the model is trained on all data except one point, which is used for testing. This is repeated for every point, giving a nearly unbiased estimate of test error, but at high computational cost.

54
New cards

CHAPTER 7

55
New cards

Ridge regression / Tikhonov regularization

type of linear regression that adds a penalty term to the loss function to shrink the regression coefficients. This helps prevent overfitting and reduces the impact of multicollinearity by discouraging large coefficient values.
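
A minimal sketch of the closed-form ridge estimate θ = (XᵀX + λI)⁻¹Xᵀy (for simplicity this penalizes every coefficient, including any intercept column, which practical implementations usually avoid):

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: theta = (X'X + lam*I)^(-1) X'y."""
    K = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(K), X.T @ y)
```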

56
New cards

Regularization term or penalty term (λθᵀθ)

is an extra part added to a model’s loss function to penalize large or complex parameter values. It helps prevent overfitting by encouraging simpler models.

57
New cards

λ penalty parameter / regularization parameter

controls how strongly the regularization term affects the model. A larger λ puts more penalty on large coefficients, leading to a simpler model, while a smaller λ keeps the fit closer to ordinary least squares.

58
New cards

Mean Squared Error

calculates the average of the squared differences between predicted and actual values.

59
New cards

Ridge trace

A ridge trace is a plot that shows how the regression coefficients change as the ridge penalty λ increases.

60
New cards

LASSO (Least Absolute Shrinkage and Selection Operator)

a regression method that adds a penalty on the absolute values of the coefficients. It not only shrinks coefficients like ridge regression but can also set some to zero, effectively performing variable selection.

61
New cards

CHAPTER 8

62
New cards

non-parametric resampling

method that generates new datasets by randomly sampling from the observed data, without assuming any underlying distribution.

63
New cards

non-parametric bootstrapping

resampling method where you repeatedly draw random samples with replacement from the original dataset to create many “new” datasets. (without assuming any specific data distribution.)

64
New cards

pairwise resampling OR random-x sampling

a type of bootstrap where you resample entire (x, y) pairs from the dataset.

This method preserves the relationship between inputs and outputs and is commonly used when both are considered random.

65
New cards

residual sampling or fixed x-sampling

is a non-parametric resampling method for regression where the predictor values are held fixed, and synthetic response values are generated by adding resampled residuals to the model's fitted values.

66
New cards

parametric bootstrap sampling

method where new datasets are generated by simulating from a specified probability model using the fitted parameters from the original data.

67
New cards

Percentile bootstrap interval

Build a confidence interval by taking the lower and upper percentiles from the sorted bootstrap estimates (e.g. 2.5% and 97.5% for a 95% interval).
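
A minimal NumPy sketch combining pairwise (random-x) resampling with a percentile interval for a regression slope (the data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 60)
y = 1.5 + 0.8 * x + rng.normal(0, 1, 60)
X = np.column_stack([np.ones_like(x), x])

slopes = []
for _ in range(2000):
    i = rng.integers(0, len(y), len(y))   # resample (x, y) pairs with replacement
    theta, *_ = np.linalg.lstsq(X[i], y[i], rcond=None)
    slopes.append(theta[1])

lo, hi = np.percentile(slopes, [2.5, 97.5])   # 95% percentile bootstrap interval
print(lo, hi)
```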

68
New cards

A pivot

a quantity whose distribution does not depend on any unknowns.

69
New cards

bootstrap-t interval

A confidence interval made by standardizing bootstrap estimates using their standard errors, then using the percentiles of these t-like values to build the interval. It adjusts for both bias and variability.

70
New cards

Balanced bootstrap resampling

guarantees that each observation is selected equally often.

71
New cards

CHAPTER 9

72
New cards

Law of Bayes

p(θ | y) = p(y | θ) · p(θ) / p(y), i.e. posterior = likelihood × prior / evidence.

74
New cards

Posterior probability distribution

the probability of the parameters given the observed data

75
New cards

prior

our initial beliefs about the parameters before measuring the data

76
New cards

evidence

the total probability of the data, acting as a normalization constant

77
New cards

CHAPTER 10

78
New cards

conjugate prior and conjugate distributions

A prior is a conjugate prior for a given likelihood function, if the resulting posterior belongs to the same distribution family as the prior (Same distribution but with different parameters).

The prior and the posterior are called conjugate distributions.

79
New cards

hyperparameters

Settings that control the behavior of a model or method but are not learned from the data — they must be set before or tuned during training.

80
New cards
81
New cards

precision

It measures how tightly data or parameter estimates are concentrated — higher precision means lower uncertainty.

82
New cards

informative priors

If you have concrete information about a fit parameter, other than what the data is telling you, then the prior is the place to include this information. Such priors are called informative priors.

83
New cards

uninformative priors

Quite often, you will not have any particularly useful prior information. In that case you would like to use a prior that is sufficiently vague so that you’re making only minimal assumptions

84
New cards

location parameter

A location parameter shifts the entire probability distribution left or right without changing its shape.

85
New cards

Scale parameter

Changing the scale parameter of a distribution stretches or squeezes the distribution without shifting its location.

86
New cards

improper prior

Does not integrate to 1

87
New cards

proper prior

Does integrate to 1

88
New cards

Hyperprior

hyperprior = “a prior on a prior”

they have a probability distribution determined by another prior

89
New cards

CHAPTER 11

90
New cards

curse of dimensionality

If we have e.g. 5 model parameters (few...), and if we choose to evaluate the posterior on a rectangular grid of 100 points (not a lot!) in each of the 5 directions, we would need 100⁵ = 10,000,000,000 grid points. Infeasible... This is called the curse of dimensionality.

91
New cards

Normalization

Probability adds up to 1

92
New cards

Marginalisation

Marginalisation means summing or integrating out unwanted variables from a joint probability distribution to focus on the ones you care about

93
New cards

Expectation

the mean value of a parameter, given its posterior distribution

94
New cards

posterior predictive distribution

The distribution of possible future observations, based on the posterior distribution of the model’s parameters. It shows what new data might look like, given what we’ve learned from the current data.

95
New cards

Monte Carlo method

A computational method to sample an arbitrary probability distribution

96
New cards

cumulative distribution function

The CDF gives the probability that a random variable is less than or equal to a value x

97
New cards

Quantile function or percentile-point function

The inverse of the cumulative distribution function.
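
A minimal sketch of inverse-transform sampling, using the exponential distribution because its quantile function has a closed form:

```python
import numpy as np

rng = np.random.default_rng(4)
u = rng.uniform(0, 1, 10000)      # uniform samples on [0, 1)

# Exponential(rate) has CDF F(x) = 1 - exp(-rate*x), so the quantile
# (inverse CDF) function is F^(-1)(u) = -ln(1 - u) / rate.
rate = 2.0
samples = -np.log(1 - u) / rate

print(samples.mean())             # ~ 1/rate = 0.5
```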

98
New cards

Acceptance-rejection method

A way to generate random samples from a complex distribution by sampling from a simpler one and accepting or rejecting each sample based on a probability rule.
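
A minimal sketch targeting the Beta(2,2) density 6x(1−x) with a uniform proposal; the bound M = 1.5 is the density's maximum (the example target is my choice, not the deck's):

```python
import numpy as np

rng = np.random.default_rng(5)

def target_pdf(x):
    return 6 * x * (1 - x)        # Beta(2,2) density on [0, 1]; maximum value 1.5

M = 1.5                           # bound so that target_pdf(x) <= M * 1 (uniform pdf)
samples = []
while len(samples) < 10000:
    x = rng.uniform(0, 1)         # propose from the simple distribution
    u = rng.uniform(0, 1)
    if u < target_pdf(x) / M:     # accept with probability f(x) / (M * g(x))
        samples.append(x)

print(np.mean(samples))           # ~ 0.5, the Beta(2,2) mean
```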

99
New cards

importance sampling

A technique to estimate expectations by sampling from an easier distribution and reweighting the samples to reflect the target distribution
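
A minimal sketch estimating E[x²] under a standard normal target by sampling from a wider normal proposal and reweighting (all names and the choice of target are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = rng.normal(0, 2, 100000)                 # sample from the easier proposal N(0, 2)
w = norm_pdf(x, 0, 1) / norm_pdf(x, 0, 2)    # importance weights: target pdf / proposal pdf
print(np.mean(w * x**2))                     # ~ 1.0 = E[x^2] under the target N(0, 1)
```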

100
New cards

Markov Chain Monte Carlo

a method that explores the sample space using a Markov chain. Such a chain walks from one point xₙ to the next xₙ₊₁ in such a way that the chain spends more time in the more important regions (where the density f(·) is high).

A Markov chain has no memory: the next point depends only on the current one.
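
A minimal random-walk Metropolis sketch (one common MCMC algorithm; the deck does not specify which sampler) targeting a standard normal:

```python
import numpy as np

rng = np.random.default_rng(6)

def log_density(x):
    return -0.5 * x**2            # log of an unnormalized standard normal

x, chain = 0.0, []
for _ in range(50000):
    prop = x + rng.normal(0, 1)   # symmetric random-walk proposal
    # accept with probability min(1, f(prop)/f(x)); only density *ratios*
    # are needed, so the normalization (evidence) never has to be computed
    if np.log(rng.uniform()) < log_density(prop) - log_density(x):
        x = prop
    chain.append(x)               # next state depends only on the current one

print(np.mean(chain), np.std(chain))   # ~ 0 and ~ 1
```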