Comp Stats Final


Last updated 11:02 PM on 4/28/26

120 Terms

1
New cards

Tokenization

Mapping of words to numbers
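A minimal sketch of this mapping (an illustrative example, not from the course): each unique word gets the next unused integer index.

```python
# Tokenization sketch: map each unique word to an integer id.
def tokenize(text):
    vocab = {}
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)  # assign the next unused index
        ids.append(vocab[word])
    return ids, vocab

ids, vocab = tokenize("the cat sat on the mat")
# "the" appears twice and maps to the same id both times
```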

2
New cards

Span

The set of all linear combinations of a collection of vectors (e.g., the columns of a matrix)

3
New cards

Orthogonal

x1^T * x2 = 0 (the dot product of the two vectors is zero)
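A quick check of the condition in plain Python (illustrative vectors chosen so the dot product is zero):

```python
# Two vectors are orthogonal when their dot product x1^T x2 is 0.
def dot(x1, x2):
    return sum(a * b for a, b in zip(x1, x2))

x1 = [1, 0, 2]
x2 = [2, 3, -1]
dot(x1, x2)  # 1*2 + 0*3 + 2*(-1) = 0
```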

4
New cards

Full rank

The columns span the entire space, which always makes a perfect prediction possible

5
New cards

Perfect prediction

To predict a vector of n elements, you need to define enough variables to produce a matrix of rank n
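A sketch of the two cards above with NumPy (an illustrative full-rank matrix, not course data): a rank-3 matrix spans R^3, so any 3-element target can be reproduced exactly.

```python
import numpy as np

# A 3x3 full-rank matrix spans all of R^3, so any target vector y
# can be reproduced exactly by some coefficient vector beta.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])
y = np.array([2.0, 3.0, 1.0])

beta = np.linalg.solve(X, y)     # exact solution exists because rank(X) = 3
# X @ beta reproduces y: a perfect prediction
```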

6
New cards

Set

Any collection, group, or conglomerate with no repeated elements

7
New cards

Elements

Members of a set

8
New cards

Disjoint set

sets with no elements in common

9
New cards

Sample space

Set comprising all possible outcomes associated with an experiment

10
New cards

Events

Subset of the sample space

11
New cards

Sigma algebra

For a discrete sample space, the collection of all subsets (the power set) is a sigma algebra

12
New cards

Probability function

Maps events in the sigma algebra of a sample space to the reals (specifically to [0, 1])

13
New cards

Axioms of probability

  1. For any event A, Pr(A) >= 0

  2. For the set of all possible events, Pr(all) = 1

  3. For a countable collection of pairwise disjoint events in the σ-algebra, the probability of their union equals the sum of their individual probabilities

14
New cards

Sigma algebra -> probability function

Needs to satisfy the axioms of probability

15
New cards

Random variable

A variable whose values have probabilities associated with them (formally, a function from the sample space to the reals)

16
New cards

Probability mass function (pmf)

Gives the probability of each possible value of a discrete random variable

17
New cards

Probability density function (pdf)

Does not represent the probability of a specific value; rather, integrating it over an interval gives the probability that X falls in that interval. A probability-calculating function

18
New cards

Probability of a value on the pdf

0

19
New cards

Marginal distribution

Probability distribution of a single variable without regard for other variables

20
New cards

Functionals

Maps a function to a scalar

21
New cards

Expectation

Center

22
New cards

Covariance

A measure of linear association between two variables (if they change together). Positive values indicate a positive relationship; negative values indicate a negative relationship
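The sample covariance can be sketched in plain Python (illustrative data; the n-1 divisor is the usual sample version):

```python
# Sample covariance: average product of deviations from the means.
def covariance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]  # moves with x, so the covariance is positive
```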

23
New cards

Parameter

Constant (theta) which indexes a probability model belonging to a family of models

24
New cards

Parameterized probability model

Probability model for the random variable we are interested in

25
New cards

Bernoulli

Family of probability models indexed by theta (the probability of success), where theta can be between 0 and 1

26
New cards

Maximum likelihood estimation

Function that takes in a sample and outputs the parameter value that maximizes the likelihood, which is our estimate of the true model parameter
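For a Bernoulli model, the maximum likelihood estimate of theta is the sample mean (the fraction of 1s) — a minimal sketch with made-up data:

```python
# MLE for an iid Bernoulli sample: the fraction of successes.
def bernoulli_mle(sample):
    return sum(sample) / len(sample)

sample = [1, 0, 1, 1, 0, 1, 1, 0]
theta_hat = bernoulli_mle(sample)  # 5 successes out of 8 trials
```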

27
New cards

Likelihood

function with the form of a probability function, considered to be a function of the parameter for a fixed sample

28
New cards

Form of likelihood

sampling distribution (probability distribution) of the iid sample

  1. Parameter values are the inputs; the observed sample is held fixed

  2. The likelihood does not operate as a probability function

  3. For continuous cases, the likelihood of a point is its density value, not a probability

29
New cards

iid

Independent and identically distributed random variables.

30
New cards

Statistic

A function on a sample

31
New cards

P-value

The probability of obtaining a value of a statistic at least as extreme as the one observed, conditional on the null hypothesis being true

32
New cards

Hypothesis

Assumption about a parameter

33
New cards

Two tailed

Unsure about which direction the extreme is in

34
New cards

One tailed

H0 can only be wrong in one direction of the extreme

35
New cards

Alpha

Threshold value: H0 is rejected when the p-value falls below it and cannot be rejected when the p-value falls above it

36
New cards

H0 is true, cannot reject H0

1-alpha, correct

37
New cards

H0 is true, reject H0

Alpha, type I error

38
New cards

H0 is false, cannot reject H0

Beta, type II error

39
New cards

H0 is false, reject H0

1-beta, power

40
New cards

Decrease alpha

Higher beta, type II error

41
New cards

Increasing alpha results in

High type I error, low type II error

42
New cards

Power increases with

Increased sample size, increased alpha

43
New cards

Power definition

The probability of rejecting H0 when it is false; it grows with how far the true parameter value is from H0

44
New cards

Alternative hypothesis

Set of values where we suspect our true parameter value will fall if our H0 is incorrect

45
New cards

Generalized linear models

The probability of Y|X is in the exponential family of distributions, and a monotonic link function (with an inverse) relates the expected value of Y given X to the linear predictor Xbeta
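A sketch of one link function (logistic regression as an assumed example of a GLM): the logit link maps E[Y|X] = p to the linear predictor, and its inverse, the sigmoid, maps back — monotonicity makes inversion possible.

```python
import math

# Logit link and its inverse (the sigmoid), as used in logistic regression.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

z = 0.7              # linear predictor x^T beta
p = sigmoid(z)       # expected value of Y given X
# logit(p) recovers z because the link is monotonic (invertible)
```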

46
New cards

Link function

Monotonic function that acts on the expected value of Y given X

47
New cards

Monotonic

Can define an inverse function

48
New cards

Support vector machines

Find a linear function that provides maximum separation of the Y values, i.e., the maximum margin between the two groups

49
New cards

Gradient descent algorithm

Iteratively moves the value in the direction opposite the gradient of the function, weighted by a step size at each iteration
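A minimal sketch (illustrative target function, not from the course): minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3), by stepping opposite the gradient.

```python
# Gradient descent: repeatedly step opposite the gradient, scaled by
# a learning rate, for a fixed number of iterations.
def gradient_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2; the minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```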

50
New cards

Gradient

Direction of steepest ascent of the function; its negative is the direction of steepest descent

51
New cards

Epochs

Number of complete passes the algorithm makes through the training data

52
New cards

Conjugate prior

a probability distribution that, when combined with a specific likelihood function using Bayes' theorem, results in a posterior distribution belonging to the same parametric family as the prior

53
New cards

Informative prior

strong predefined knowledge about a parameter before observing any data.

54
New cards

Improper Prior

A prior that does not integrate to 1, e.g., a uniform "distribution" over an infinite interval

55
New cards

Posterior mean

Expected value of a parameter derived from its posterior distribution

56
New cards

marginal distribution of posterior

the probability distribution of a subset of parameters in a Bayesian model, calculated by integrating the full joint posterior distribution over all other "nuisance" parameters

57
New cards

Stochastic process

Collection of random vectors with defined conditional relationships, often indexed by an ordered set t

58
New cards

Loss function

Puts a number on each possible outcome to quantify how bad the outcome would be given that it is not the true value

59
New cards

Risk function

Expected value of the loss function (integrate over the probability of each of the possible values the decision function could map to)

60
New cards

Maximum a posteriori (MAP) estimation

Selects the value of the parameter that maximizes the posterior; NOT fully Bayesian, since it returns a point estimate rather than using the whole posterior

61
New cards

Bayesian estimator

Selects the posterior mean

62
New cards

Bayes classifier

Predicts class labels by calculating the posterior probability of each class given the features and choosing the class with the highest posterior probability

63
New cards

Bayesian network

Assumes that among a set of variables, some of the variables are made conditionally independent by conditioning on other variables while others cannot be made independent no matter what variables they are conditioned on

64
New cards

Unidentified

Multiple models fit the data the same

65
New cards

penalized regression

form of regularization (aimed at preventing overfitting)

66
New cards

Unlearnable system

Generally, when y and x are independent

67
New cards

No free lunch theorem

There is no single optimal mapping (learning algorithm) that will cover every case

68
New cards

Bias-variance trade off

balancing error from overly simplistic assumptions (bias) and high sensitivity to training data (variance)

69
New cards

Acceptable error

Given this threshold, a system may not admit a predictive model that will be useful in practice

70
New cards

Bias

calculated as the squared difference between the expected prediction and true value

71
New cards

Variance

calculated as the expected squared deviation of a model's prediction from its own average

72
New cards

Error

variation in the data that is expected to be conditionally independent of the model

73
New cards

Kernel function

Symmetric function that takes in a pair of observations vectors and outputs a number
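One common example (an assumed illustration, not necessarily the course's choice) is the RBF (Gaussian) kernel, which is symmetric and maps a pair of observation vectors to a number in (0, 1]:

```python
import math

# RBF kernel: similarity based on squared Euclidean distance.
def rbf_kernel(x, y, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

a, b = [1.0, 2.0], [2.0, 0.0]
# symmetric: rbf_kernel(a, b) == rbf_kernel(b, a)
# a vector compared with itself gives the maximum value, 1.0
```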

74
New cards

Ensemble method

Run multiple times with shallow trees and select many trees that perform well and combine them for the final decision task

75
New cards

Bagging

Bootstrap the sample, use the algorithm to fit a new tree to each bootstrap sample, and combine the trees into the learner

76
New cards

Bootstrapping

Produce a new sample by sampling with replacement
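A sketch with the standard library (illustrative data): the resample has the same size as the original, and elements may repeat because sampling is with replacement.

```python
import random

# Bootstrapping: draw a new sample of the same size, with replacement.
def bootstrap(sample, rng=random):
    return [rng.choice(sample) for _ in sample]

random.seed(0)
data = [2.1, 3.5, 4.0, 5.2, 6.8]
resample = bootstrap(data)
# every resampled element comes from the original data
```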

77
New cards

Boosting

Fit an original tree, calculate residual error, and fit a new tree to the error and repeat

78
New cards

Neural network

Non-linear method, defines large numbers of non-linear vectors with some selection ability given data while combining information across many observed variables

79
New cards

In NN observations are

rows

80
New cards

In NN variables are

columns

81
New cards

First step of NN

Linear transformation of variables

82
New cards

Second step of NN

Application of activation function

83
New cards

Third step of NN

Linear transformation of outputs
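The three steps above can be sketched for a single hidden layer with NumPy (the weights here are random placeholders, not learned values):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])   # one observation with 3 variables

W1 = rng.normal(size=(3, 4))     # step 1: linear transformation of variables
h = np.maximum(0.0, x @ W1)      # step 2: activation function (ReLU here)
W2 = rng.normal(size=(4, 2))
out = h @ W2                     # step 3: linear transformation of outputs
```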

84
New cards

Universal approximation theorem

A two-layer neural network has the capacity to approximate any continuous (or discrete) function from an input set to arbitrary accuracy, as long as the activation functions are non-linear

85
New cards

Objective function

Function whose minimum or maximum indicates how well the model fits the data

86
New cards

Back propagation algorithm

Computes the derivatives (gradients) needed for gradient descent by applying the chain rule backward through the network

87
New cards

Acceptable error

Depends on the task, advantage of having a lot of acceptable answers

88
New cards

Forward propagation

Calculates the predicted y, from which the loss is computed using the actual y

89
New cards

Tensors

Mathematical object that contains multilinear relationships between sets, used for transformations

90
New cards

Arithmetic logic unit

Mathematical/logical units

91
New cards

Control unit

Directs operations

92
New cards

Cache

Where data is stored temporarily

93
New cards

Why NNs use GPUs

Parallel processing: the NN's array structure lets its computations be distributed across many cores

94
New cards

SoftMax

Generates probabilities from untransformed data, emphasizes high values and normalizes so that the sum of the vector is 1
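A minimal sketch in plain Python (subtracting the max before exponentiating is a common numerical-stability trick, an assumption not stated on the card):

```python
import math

# Softmax: exponentiate the logits, then normalize so the outputs sum to 1.
def softmax(logits):
    m = max(logits)                          # for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
# probabilities sum to 1; the largest logit gets the largest share
```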

95
New cards

Logits

Raw, unnormalized, real-valued scores produced by the final layer of a neural network

96
New cards

Cross entropy

Loss calculation that measures the difference between two probability distributions, typically the true labels and the predicted probabilities
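A sketch for a single example with a one-hot true label (illustrative numbers): the loss is the negative log of the probability the model assigned to the true class, so more confident correct predictions give lower loss.

```python
import math

# Cross-entropy for one example: -sum over classes of true * log(predicted).
def cross_entropy(true_probs, pred_probs):
    return -sum(t * math.log(p)
                for t, p in zip(true_probs, pred_probs) if t > 0)

loss = cross_entropy([0, 1, 0], [0.1, 0.8, 0.1])   # -log(0.8)
```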

97
New cards

ADAM

Algorithm that adapts the step size: take bigger steps when the slope is larger and smaller steps when closer to the optimum

98
New cards

Feedforward

NN in which each layer receives input only from the previous layers, so information flows in one direction

99
New cards

Assumptions for NN construction

  1. Irreducible error is low and the acceptable error is high

  2. Training data is representative of the entire possible dataset that could be observed

  3. There is a lower dimensional set that can be learned/inferred from measured variables

  4. There is a prior understanding of how measured variables are related to features of value for modeling outcomes

100
New cards

Deep learning

More than one layer in NN