Tokenization
Mapping of words to numbers
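A minimal sketch of the idea in Python, assuming a toy whitespace tokenizer and a hypothetical two-sentence corpus:

```python
# Toy word-level tokenizer: build a vocabulary, then map words to integer ids.
corpus = ["the cat sat", "the dog sat"]
vocab = {word: i for i, word in enumerate(sorted({w for s in corpus for w in s.split()}))}

def tokenize(sentence):
    return [vocab[w] for w in sentence.split()]

print(vocab)                    # {'cat': 0, 'dog': 1, 'sat': 2, 'the': 3}
print(tokenize("the cat sat"))  # [3, 0, 2]
```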
Span
The set of all linear combinations of a matrix's column vectors (e.g., a plane or larger subspace)
Orthogonal
Two vectors x1 and x2 are orthogonal when x1^T x2 = 0 (their dot product is zero)
Full rank
A matrix whose columns span the entire space; a full-rank design matrix always makes a perfect (in-sample) prediction possible
Perfect prediction
To predict a vector of n elements exactly, you need to define enough variables to produce a matrix of rank n
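The four cards above can be illustrated with a small NumPy sketch (the matrix, target, and vectors are hypothetical):

```python
import numpy as np

# A 3x3 full-rank matrix spans R^3, so any target y can be predicted exactly.
X = np.array([[1., 0., 0.],
              [1., 1., 0.],
              [1., 1., 1.]])
y = np.array([2., 3., 5.])

print(np.linalg.matrix_rank(X))   # 3: full rank
beta = np.linalg.solve(X, y)      # exact coefficients
print(np.allclose(X @ beta, y))   # True: perfect prediction

# Orthogonality check: x1^T x2 = 0
x1, x2 = np.array([1., 0.]), np.array([0., 1.])
print(float(x1 @ x2))             # 0.0
```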
Set
Any collection, group, or conglomerate with no repeated elements
Elements
Members of a set
Disjoint set
sets with no elements in common
Sample space
Set comprising all possible outcomes associated with an experiment
Events
Subset of the sample space
Sigma algebra
A collection of subsets of the sample space on which probabilities are defined; for a discrete sample space, the collection of all subsets (the power set) is a valid sigma algebra
Probability function
Maps each event in the sigma algebra of a sample space to a real number in [0, 1]
Axioms of probability
For any event A, Pr(A) >= 0; for the whole sample space, Pr(sample space) = 1; and for a countable collection of pairwise disjoint events in the σ-algebra, the probability of their union equals the sum of their individual probabilities.
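Stated compactly in standard notation (same content as the card above):

```latex
\Pr(A) \ge 0 \ \text{ for every event } A \in \mathcal{F}, \qquad
\Pr(\Omega) = 1, \qquad
\Pr\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \Pr(A_i)
\ \text{ for pairwise disjoint } A_i \in \mathcal{F}.
```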
Sigma algebra -> probability function
Needs to satisfy the axioms of probability
Random variable
A function assigning a real number to each outcome in the sample space, so probabilities carry over to its values
Probability mass function (pmf)
Gives the probability that a discrete random variable equals each specific value
Probability density function (pdf)
Does not represent the probability of a specific value; rather, integrating it over an interval gives the probability that X falls in that interval
Probability of a value on the pdf
Zero; under a pdf only intervals carry positive probability
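A short sketch using scipy.stats to make this concrete; the standard normal here is just an illustrative choice:

```python
from scipy.stats import norm

# For a continuous variable, pdf(x) is a density, not a probability.
X = norm(loc=0, scale=1)
print(X.pdf(0.0))                 # ~0.3989: density at a point, not Pr(X = 0)
# Pr(X = 0) is 0; only intervals carry probability, computed via the cdf:
print(X.cdf(1.0) - X.cdf(-1.0))   # ~0.6827: Pr(-1 < X < 1)
```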
Marginal distribution
Probability distribution of a single variable without regard for other variables
Functionals
Maps a function to a scalar
Expectation
The probability-weighted average of a random variable; the center of its distribution
Covariance
A measure of linear association between two variables (if they change together). Positive values indicate a positive relationship; negative values indicate a negative relationship
Parameter
Constant (theta) which indexes a probability model belonging to a family of models
Parameterized probability model
Probability model for the random variable we are interested in
Bernoulli
family of probability models indexed by theta, where theta can be between 0 and 1
Maximum likelihood estimation
function that takes in a sample and outputs the parameter value that maximizes the likelihood; this is our estimate of the true model parameter
Likelihood
function with the form of a probability function, consider to be a function of the parameter for a fixed sample
Form of likelihood
sampling distribution (probability distribution) of the iid sample
The inputs are parameter values; the observed sample is held fixed, playing the role the parameters play in a probability function
The likelihood does not operate as a probability function over the parameter; it need not sum or integrate to 1
For continuous cases, the likelihood is the density evaluated at the observed points
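A minimal sketch of maximum likelihood for a Bernoulli model, using a hypothetical sample and a grid search in place of the closed-form solution:

```python
import numpy as np

# Log-likelihood of an iid Bernoulli(theta) sample; the MLE maximizes it.
sample = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # hypothetical observed data

def log_likelihood(theta, x):
    return np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

thetas = np.linspace(0.01, 0.99, 99)
mle = thetas[np.argmax([log_likelihood(t, sample) for t in thetas])]
print(mle, sample.mean())   # grid MLE ~ 0.75, matching the closed form x-bar
```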
iid
Independent and identically distributed random variables.
Statistic
A function on a sample
P-value
The probability of obtaining a value of the statistic at least as extreme as the one observed, conditional on the null hypothesis being true
Hypothesis
Assumption about a parameter
Two tailed
Used when we are unsure about which direction the extreme is in
One tailed
Used when the hypothesis can only be wrong in one direction, so only one tail counts as extreme
Alpha
Threshold value: for p-values below it, H0 is rejected; above it, H0 cannot be rejected
H0 is true, cannot reject H0
1-alpha, correct
H0 is true, reject H0
Alpha, type I error
H0 is false, cannot reject H0
Beta, type II error
H0 is false, reject H0
1-beta, power
Decrease alpha
Higher beta, type II error
Increasing alpha results in
Higher type I error, lower type II error
Power increases with
Increased sample size, increased alpha
Power definition
The probability of correctly rejecting H0 when it is false; it grows with how far the true parameter value is from the H0 value
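A simulation sketch of these power facts, using a one-sample t-test on hypothetical normal data (the true mean, sample sizes, and alpha values are illustrative):

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)

def power(n, alpha, true_mean=0.5, sims=2000):
    # Fraction of simulated samples where we correctly reject H0: mean = 0.
    rejections = 0
    for _ in range(sims):
        x = rng.normal(true_mean, 1.0, size=n)
        if ttest_1samp(x, 0.0).pvalue < alpha:
            rejections += 1
    return rejections / sims

print(power(n=20, alpha=0.05))   # lower power
print(power(n=80, alpha=0.05))   # higher with larger sample size
print(power(n=20, alpha=0.20))   # higher with larger alpha
```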
Alternative hypothesis
Set of values where we suspect our true parameter value will fall if our H0 is incorrect
Generalized linear models
The distribution of Y|X is in the exponential family; a monotonic link function (with an inverse) connects the expected value of Y given X to the linear predictor X*beta; and Y depends on X only through X*beta
Link function
Function that acts on the expected value of Y given X; must be monotonic
Monotonic
Strictly increasing or decreasing, so an inverse function can be defined
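A minimal NumPy sketch of the logit link used in logistic regression (one GLM in the exponential family); the design matrix and coefficients are hypothetical:

```python
import numpy as np

# Logistic regression as a GLM: the logit link maps E[Y|X] = mu in (0, 1)
# to the linear predictor X @ beta on the whole real line.
def logit(mu):            # link function (monotonic, hence invertible)
    return np.log(mu / (1 - mu))

def inv_logit(eta):       # its inverse maps X @ beta back to a mean
    return 1 / (1 + np.exp(-eta))

X = np.array([[1., -2.], [1., 0.], [1., 2.]])   # hypothetical design matrix
beta = np.array([0.5, 1.0])                      # hypothetical coefficients
mu = inv_logit(X @ beta)
print(mu)                                # conditional means, each in (0, 1)
print(np.allclose(logit(mu), X @ beta))  # True: the link inverts back
```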
Support vector machines
Find a linear function that provides maximum separation between the two groups of Y, i.e., the maximum margin
Gradient descent algorithm
Moves the parameter value opposite the gradient of the function, weighted by a step size (learning rate) at each iteration
Gradient
Direction of steepest ascent of the function; stepping against it gives steepest descent
Epochs
Number of complete passes the algorithm makes through the training data
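A minimal gradient descent sketch on a one-dimensional quadratic; the learning rate and iteration count are illustrative choices:

```python
# Gradient descent on f(w) = (w - 3)^2: step opposite the gradient,
# scaled by a learning rate, for a fixed number of iterations.
def grad(w):
    return 2 * (w - 3)

w, lr = 0.0, 0.1
for step in range(100):
    w -= lr * grad(w)
print(w)   # ~3.0, the minimizer
```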
Conjugate prior
a probability distribution that, when combined with a specific likelihood function using Bayes' theorem, results in a posterior distribution belonging to the same parametric family as the prior
Informative prior
strong predefined knowledge about a parameter before observing any data.
Improper Prior
a prior that does not integrate to 1, e.g., a "uniform" probability over an infinite interval, which cannot be properly defined
Posterior mean
Expected value of a parameter derived from its posterior distribution
marginal distribution of posterior
the probability distribution of a subset of parameters in a Bayesian model, calculated by integrating the full joint posterior distribution over all other "nuisance" parameters
Stochastic process
Collection of random vectors with defined conditional relationships, often indexed by an ordered set t
Loss function
Puts a number on each possible outcome to quantify how bad the outcome would be given that it is not the true value
Risk function
Expected value of the loss function (integrate over the probability of each of the possible values the decision function could map to)
Maximum a posteriori (MAP) estimation
Selects the value of the parameter that maximizes the posterior; a point estimate that is not fully Bayesian, since it ignores the rest of the posterior distribution
Bayesian estimator
Selects the posterior mean (the Bayes estimator under squared-error loss)
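A short sketch tying these cards together with the Beta-Bernoulli conjugate pair; the prior pseudo-counts and data are hypothetical:

```python
# Beta-Bernoulli conjugacy: a Beta(a, b) prior and a Bernoulli likelihood
# yield a Beta posterior, so the update is just adding counts.
a, b = 2, 2                      # hypothetical prior pseudo-counts
successes, failures = 7, 3       # hypothetical observed data

a_post, b_post = a + successes, b + failures          # posterior is Beta(9, 5)
posterior_mean = a_post / (a_post + b_post)           # Bayesian estimator
map_estimate = (a_post - 1) / (a_post + b_post - 2)   # posterior mode (MAP)
print(posterior_mean, map_estimate)                   # ~0.643, ~0.667
```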
Bayes classifier
predicts class labels by calculating the posterior probability of each class given the features and choosing the class with the highest posterior probability
Bayesian network
Assumes that among a set of variables, some of the variables are made conditionally independent by conditioning on other variables while others cannot be made independent no matter what variables they are conditioned on
Unidentified
Multiple models (or parameter values) fit the data equally well, so the true one cannot be distinguished
penalized regression
form of regularization that adds a penalty term to the regression loss, aimed at preventing overfitting
Unlearnable system
a system is generally unlearnable if Y and X are independent
No free lunch theorem
There is no single mapping (learning algorithm) that is optimal for every case
Bias-variance trade off
balancing error from overly simplistic assumptions (bias) and high sensitivity to training data (variance)
Acceptable error
Given the acceptable error for a task, a system may not admit a predictive model that is useful in practice
Bias
calculated as the squared difference between the expected prediction and true value
Variance
calculated as the expected squared deviation of a model's prediction from its own average
Error
variation in the data that is expected to be conditionally independent of the model
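These three cards combine in the standard decomposition of expected squared prediction error, where f is the true function, f-hat the fitted model, and sigma^2 the irreducible error:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```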
Kernel function
Symmetric function that takes in a pair of observation vectors and outputs a number
Ensemble method
Run the base algorithm multiple times with shallow trees, select many trees that perform well, and combine them for the final decision task
Bagging
Bootstrap the sample, use the algorithm to fit a new tree on each bootstrap sample, and combine the trees to create the learner
Bootstrapping
Produce a new sample by sampling with replacement
Boosting
Fit an original tree, calculate residual error, and fit a new tree to the error and repeat
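A minimal bootstrap sketch in NumPy (the data values are hypothetical); bagging would fit one learner per such resample and average their outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # hypothetical sample

# Bootstrapping: draw new samples of the same size, with replacement.
boot_means = [rng.choice(data, size=data.size, replace=True).mean()
              for _ in range(1000)]
print(np.mean(boot_means), np.std(boot_means))   # estimate and its spread
```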
Neural network
Non-linear method that defines large numbers of non-linear transformations of the inputs, with some ability to select among them given data, while combining information across many observed variables
In NN observations are
rows
In NN variables are
columns
First step of NN
Linear transformation of variables
Second step of NN
Application of activation function
Third step of NN
Linear transformation of outputs
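The three steps correspond line for line to a minimal NumPy forward pass (the weights and layer sizes are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))      # 4 observations (rows) x 3 variables (columns)
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)

h = X @ W1 + b1                  # step 1: linear transformation of variables
h = np.maximum(h, 0.0)           # step 2: activation function (ReLU here)
y_hat = h @ W2 + b2              # step 3: linear transformation of outputs
print(y_hat.shape)               # (4, 1): one prediction per observation
```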
Universal approximation theorem
A two-layer neural network has the capacity to approximate any continuous (or discrete) function from an input set to arbitrary accuracy, as long as the activation functions are non-linear
Objective function
Function whose minimum (or maximum) indicates how well the model fits the data
Back propagation algorithm
Computes the gradient of the loss with respect to the weights via the chain rule, supplying the derivatives that gradient descent needs
Acceptable error
Depends on the task; there is an advantage when many answers are acceptable
Forward propagation
Calculates the predicted y from the inputs so the loss can be computed against the actual y
Tensors
Mathematical object that encodes multilinear relationships between sets of vectors; used for transformations (in practice, a multidimensional array)
Arithmetic logic unit
Component that performs arithmetic and logical operations
Control unit
Directs operations
Cache
Where data is stored temporarily
Why NNs use GPUs
Parallel processing: the NN's array structure lets its computations be distributed across many processing cores
SoftMax
Generates probabilities from untransformed data, emphasizes high values and normalizes so that the sum of the vector is 1
Logits
Raw, unnormalized, real-valued scores produced by the final layer of a neural network
Cross entropy
loss that measures the difference between two probability distributions, typically the true labels and the predicted probabilities
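A minimal sketch of softmax and cross entropy together, with a numerically stable softmax; the logits and one-hot label are hypothetical:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; exponentiate and normalize to 1.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw network outputs
probs = softmax(logits)
print(probs, probs.sum())            # emphasizes high values; sums to 1

true = np.array([1.0, 0.0, 0.0])     # one-hot true label
cross_entropy = -np.sum(true * np.log(probs))
print(cross_entropy)                 # lower when probs match the label
```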
ADAM
Algorithm that adapts the step size: takes bigger steps when the slope is larger and smaller steps when closer to the minimum
Feedforward
NN in which each layer takes its input from the previous layer, so information flows only forward
Assumptions for NN construction
Irreducible error is low and the acceptable error is high
Training data is representative of the entire possible dataset that could be observed
There is a lower dimensional set that can be learned/inferred from measured variables
There is a prior understanding of how measured variables are related to features of value for modeling outcomes
Deep learning
More than one hidden layer in a NN