AI key terms

Last updated 2:24 PM on 4/29/26

232 Terms

what is the definition of AI?

the area of computer science that studies rational agents

what problems can AI solve?

object detection, travelling salesman problem (TSP), search problems

what are machine learning problems?

problems that require a model to be built automatically from data

what is supervised learning?

learning with a teacher

what is a teacher for supervised learning?

expected output, label, classes etc.

what does supervised learning solve?

classification and regression problems

what is a classification problem?

predict categorical class labels

what are regression problems?

a prediction of a real value

what is unsupervised learning?

learning without a teacher to find hidden structures/insights in data

what is reinforcement learning?

learning with (delayed) feedback/reward, learning through a series of actions

what is the task of formulated supervised learning?

given some input x, predict an appropriate output y

what is the goal of formulated supervised learning?

to find a function f such that f(x) = y

what is training data?

examples of input-output pairs used in supervised learning

what is training/modelling in supervised learning?

the process of using the training data to find a good f (a function that gives appropriate outputs for given inputs)

what is a prediction in supervised learning?

given an input x, predicting its output, y

what are other terms for input?

attribute
feature
independent variable

what are other terms for output?

target, response, dependent variable

what are other terms for function?

hypothesis, predictor

what is overfitting?

when the training data is fitted ‘too well’
the model learns every irrelevant detail (noise) in a training data set

what is the danger of overfitting?

the model will not work well on new data

what is linear regression?

an ML algorithm for regression problems

what is gradient descent?

a general strategy to minimise cost functions
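
a minimal sketch of the idea, assuming a one-parameter cost J(w) = (w - 3)^2 chosen purely for illustration. the names `gradient_descent`, `learning_rate` and `steps` are hypothetical and not taken from the cards. each step moves the parameter a small amount against the gradient, so the cost shrinks until w settles near the minimiser.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


# Illustrative cost J(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# Its minimum is at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```

a learning rate that is too large can make the updates overshoot and diverge, so in practice it is tuned.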

what does regression mean?

learning a function that captures the trend between input and output, where the output is a continuous value, to predict target values for new inputs

what is the general rule for univariate linear regression?

find the best line that captures the trend in data
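
a small sketch of one way to find such a line, assuming "best" means least squares (smallest sum of squared errors), which the card does not state explicitly. `fit_line` is a hypothetical helper and the data points are made up.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
```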

what is a loss function?

a criterion that measures how well an ML model’s predictions align with actual outcomes

how does a loss function work?

quantifies the error/difference between predicted and true values
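
as an illustration, a sketch of mean squared error (MSE), one common loss for regression. the function name `mse` and the numbers are made up for the example.

```python
def mse(predictions, targets):
    """Mean of the squared differences between predicted and true values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Errors are 0, -1 and 2, so the squared errors are 0, 1 and 4.
loss = mse([2.0, 4.0, 6.0], [2.0, 5.0, 4.0])
```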

what is the loss function also known as?

the g-function

what is the vector of partial derivatives called?

the gradient vector

how can linear regression be defined?

a linear and parametric model for regression problems

what is logistic regression?

a linear and parametric model for classification problems

what are K-nearest neighbours?

a non-parametric model that can be used for both classification and regression problems

what is a parametric model?

model that summarises data with a finite set of parameters by making assumptions on data distributions

what is an example of a parametric model?

neural networks

what is a non-parametric model?

model that cannot be characterised by a bounded set of parameters and makes no assumptions on data distribution

what are some examples of non-parametric models?

instance-based learning, which generates hypotheses directly from the training examples (e.g. kNN), and decision trees

what are the three steps for logistic regression?

model formulation
cost function
learning algorithm by gradient descent

why use a cost function for logistic regression?

tells you which parameters are better/worse

why use a learning algorithm by gradient descent for logistic regression?

helps you find the best parameters to minimise the cost function

what is a sigmoid function and what is it used for?

a function that produces an S-shaped curve by mapping any real-valued number into a value between 0 and 1
used in logistic regression and neural networks to model probabilities and make binary classifications
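
a short sketch of the sigmoid, sigma(z) = 1 / (1 + e^(-z)). the two-branch form below is an implementation choice (not from the cards) that avoids overflow for very negative inputs.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1 along an S-shaped curve."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite as e^z / (1 + e^z) so exp() never overflows.
    e = math.exp(z)
    return e / (1.0 + e)


half = sigmoid(0.0)  # the midpoint of the curve
```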

what is a composite function?

a function applied to the output of another function. in logistic regression, a linear function (inner) is embedded in the sigmoid function (outer)

what is the decision boundary in a sigmoid function?

the set of all possible inputs where a sigmoid function outputs exactly 0.5

what is a classification boundary in a sigmoid function?

the set of inputs for which the predicted probability of each class is exactly one half (0.5)

why does the sigmoid function require a new cost function?

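with the sigmoid inside the model, the MSE (mean squared error) cost becomes non-convex in the parameters, because the sigmoid is non-linear and bounded to outputs in (0,1). gradient descent does not work well on non-convex functions, since it can get stuck in local minima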

how is a cost function calculated on a sigmoid function?

using cross-entropy loss, measuring the difference between the predicted probabilities and the actual class outcomes
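
a sketch of binary cross-entropy, averaged over examples, assuming labels are 0 or 1 and predictions are probabilities of class 1. the `eps` clipping is an added safeguard against log(0), not part of the definition.

```python
import math


def cross_entropy(probs, labels, eps=1e-12):
    """Average of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Confident correct predictions give a small loss, confident wrong ones a large loss.
good = cross_entropy([0.9, 0.1], [1, 0])
bad = cross_entropy([0.1, 0.9], [1, 0])
```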

what is an advantage of using a new cost function on a sigmoid function?

the loss is convex, so it can be minimised easily

what are the basic steps of logistic regression?

given training data, fit the model, by minimising the cross-entropy cost function
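
a minimal sketch of that fitting step for a single input attribute, assuming batch gradient descent is used as the minimiser. `train_logistic`, the learning rate, the epoch count and the toy data are all illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.5, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent on cross-entropy."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average cross-entropy: mean of (prediction - label) * input.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data: small inputs belong to class 0, large inputs to class 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
```

the learned decision boundary is the input where w*x + b = 0, which for this toy data falls between 2 and 3.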

what are some extensions of logistic regression?

non-linear logistic regression, multiclass logistic regression

what is non-linear logistic regression?

instead of a linear function inside the expression in the sigmoid, we can use a polynomial function of the input attributes

what is multiclass logistic regression?

uses a multi-class generalisation of the sigmoid (the softmax function), which outputs one probability per class
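
the card does not name the function. the usual choice is softmax, sketched here under that assumption. subtracting the maximum score is a standard numerical-stability trick and does not change the result.

```python
import math


def softmax(scores):
    """Turn a list of real-valued class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])  # the largest score gets the largest probability
```

with two classes and scores [z, 0], the first softmax output equals sigmoid(z), which is why softmax is seen as the multi-class version of the sigmoid.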

what is the ‘no free lunch’ (NFL) theorem?

states that no optimisation or learning algorithm is universally the best-performing algorithm for all problems

what is the implication of the NFL theorem?

if learner A1 is better than learner A2 for a task, f, then there is another task g for which learner A2 is better than learner A1

what does kNN stand for?

k-Nearest Neighbours

what type of algorithm is kNN?

a non-parametric, instance-based model

what does it mean for an algorithm to be non-parametric?

no assumptions are made about the functional form of the model, complexity grows with the data set, allowing them to model complex, nonlinear relationships

what does it mean for an algorithm to be instance-based?

the prediction is based on a comparison of a new point with data points in the training set, rather than a model

why might kNN be called a ‘lazy’ algorithm?

there is no explicit training step, and it defers all the computation until prediction

what can kNN be used for?

both regression and classification problems

how does kNN work generally?

(instead of approximating a model function f(x) globally) kNN approximates the label of a new point based on its nearest neighbours in the training data, where k = the number of nearest neighbours used. k can be increased to include more data or decreased to include less data.

when is hamming distance used?

for discrete/categorical values (e.g. {rainy, sunny})
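
a sketch of Hamming distance between two examples with categorical attributes: it counts the positions where the values differ. the attribute values are made up.

```python
def hamming_distance(a, b):
    """Number of attribute positions at which two examples differ."""
    if len(a) != len(b):
        raise ValueError("examples must have the same number of attributes")
    return sum(1 for u, v in zip(a, b) if u != v)


# The first and third attributes differ, the second matches.
d = hamming_distance(["rainy", "hot", "windy"], ["sunny", "hot", "calm"])
```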

what are the inputs for a kNN algorithm?

neighbour size k > 0, distance metric D, training set, a new unlabelled data point

what steps are in the kNN algorithm?

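for each example in the training set, calculate its distance (using the metric D) from the new unlabelled data (x^j), select the k training examples closest to the unlabelled data (x^j), then return the plurality vote of their labels (classification) or the average of their y values (regression)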
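
a minimal sketch of kNN classification, assuming Euclidean distance as the metric D and a plurality vote over the k closest examples. `knn_classify` and the toy points are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def knn_classify(train_x, train_y, query, k, distance=euclidean):
    """Label the query by plurality vote among its k nearest training examples."""
    # Sort training examples by their distance to the query point.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], query))
    k_labels = [label for _, label in ranked[:k]]
    return Counter(k_labels).most_common(1)[0][0]


# Two well-separated made-up clusters labelled "a" and "b".
train_x = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
train_y = ["a", "a", "a", "b", "b", "b"]
label = knn_classify(train_x, train_y, (2.0, 2.0), k=3)
```

all the work happens at prediction time, which matches the "lazy" description of kNN. for regression, the vote would be replaced by the average of the k neighbours' y values.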

what are the dangers of overfitting?

the model has poor generalisation performance on new data, because it has fitted noise in the training set

what are the dangers of underfitting?

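the model has poor generalisation performance on new data, because it is too simple to capture the trend (it also fits the training data poorly)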

what impacts the complexity of the final model when using kNN?

the value for k, as k decides how many samples we use to label the new example

when using kNN, what happens when k is small?

there is a small neighbourhood, high complexity, and a risk of overfitting

when using kNN, what happens when k is large?

there is a large neighbourhood, low complexity, and a risk of underfitting

what value for k do practitioners often use in kNN?

often k between 3 and 15, or k < √N (where N is the number of training examples)

when is learning bias embedded in kNN?

when attributes have different ranges, the attribute with the larger range dominates the distance and is treated as more important by the kNN algorithm, which can hurt performance if you do not want to treat attributes differently

what is normalisation in kNN?

linearly scaling the range of each attribute (e.g. to [0, 1]), by using the formula x' = (x - min) / (max - min)
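
the formula itself is missing from the card. a common choice is min-max scaling to [0, 1], sketched here as an assumption: x' = (x - min) / (max - min), applied to one attribute at a time. `min_max_normalise` is a hypothetical name.

```python
def min_max_normalise(values):
    """Linearly rescale one attribute so its smallest value is 0 and largest is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_normalise([10.0, 20.0, 40.0])
```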

what is standardisation in kNN?

linearly scaling each dimension to have mean 0 and variance 1 (by computing the mean and variance of each attribute, then subtracting the mean and dividing by the standard deviation)
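
a sketch of standardising a single attribute, assuming the population variance (dividing by n) is used. the name `standardise` and the sample values are illustrative.

```python
import math


def standardise(values):
    """Rescale one attribute to have mean 0 and variance 1."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    if std == 0:
        # A constant attribute cannot be scaled to unit variance; map it to 0.
        return [0.0] * n
    return [(v - mean) / std for v in values]


z = standardise([2.0, 4.0, 6.0, 8.0])
```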

what is the kNN algorithm with normalisation or standardisation?

normalise or standardise the unlabelled data (x^j), using the same scaling as for the training set.
for each example in the training set, normalise or standardise the example and calculate its distance from the unlabelled data.
select the k training examples closest to the unlabelled data.
return the plurality vote of labels from the k examples (classification) or the average/median of the y values of the k examples (regression)

what are the advantages of kNN?

non-parametric, instance-based, lazy algorithm
easy to implement and interpret
it can approximate complex functions, so it can achieve very good performance

what are the disadvantages of kNN?

need to specify the distance metrics and pre-define k value
it has to store all training data (large memory space) and calculate distance of each training example to the new example
it can be sensitive to noise, especially when k is small
performance is degraded greatly as data dimension increases (curse of dimensionality)

what is the curse of dimensionality in kNN?

as the number of dimensions grows, the volume of the space grows, so the neighbours become further apart and the prediction becomes less accurate. distances are less meaningful in high dimensions

what are hyperparameters?

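higher-level free parameters that are set before training rather than learned from the data (e.g. k in kNN, or the depth and width of a neural network)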

what is the depth in a neural network?

the number of hidden layers

what is the width in a neural network?

the number of hidden neurons in a hidden layer

what is the activation function in a neural network?

choice of non-linearity in non-input nodes

what is a regularisation parameter in a neural network?

a way to trade off simplicity vs fit to the data

how is a predictor obtained?

by training the free parameters of the considered model, using the available annotated data

why evaluate predictors?

evaluation estimates the predictor's future performance before it is deployed in the real world

how are predictors evaluated?

the available annotated data is split randomly into a training set, used to estimate the free parameters, and a test set, used to evaluate the performance of a trained predictor before deploying it

what methods can be used to evaluate models for model choice?

holdout validation, cross-validation, leave-one-out validation

what is the method for holdout validation?

randomly choose 30% of data to form a validation set
remaining data forms the training set
train your model on the training set
estimate the test performance on the validation set
choose the model with the lowest validation error
re-train with chosen model on joined training and validation to obtain predictor
estimate future performance of the obtained predictor on test set
deploy the predictor
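
the first two steps above (the random split) can be sketched as follows. `holdout_split`, the fixed seed and the stand-in data are assumptions for illustration, and training and model choice are left out.

```python
import random


def holdout_split(examples, validation_fraction=0.3, seed=0):
    """Randomly split examples into a training set and a validation set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


data = list(range(10))  # stand-in for 10 annotated examples
train, val = holdout_split(data)
```

the test set is assumed to have been set aside before this split, so it plays no part in choosing the model.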

how do you estimate the test performance on the validation set when using holdout validation?

if regression, compute the cost function (MSE) on the examples of the validation set instead of the training set. if classification, compute the 0-1 error metric (not cross-entropy cost!)

what is the method for k-fold cross-validation?

split the training set randomly into k (equal-sized) disjoint sets
use k-1 of those together for training
use the remaining one for validation
rotate which set is held out and repeat k times, so that each set is used for validation exactly once
average the performance on k validation sets
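
the steps above can be sketched as an index-splitting helper plus an averaging loop. `k_fold_indices`, `cross_validate` and the `evaluate` callback (which would train on one index list and return the validation error on the other) are hypothetical names.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (training, validation) index lists, one pair per fold."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # k disjoint folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


def cross_validate(n, k, evaluate):
    """Average the score returned by evaluate(training, validation) over k folds."""
    scores = [evaluate(train, val) for train, val in k_fold_indices(n, k)]
    return sum(scores) / len(scores)


splits = list(k_fold_indices(10, 5))
```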

what is the last step in k-fold cross-validation?

repeat for the other models
choose the model with the smallest average k-fold cross-validation error
retrain with chosen model on joined training and validation to obtain the predictor
estimate future performance of the obtained predictor on test set
deploy the predictor in real world

what is the method for leave-one-out validation?

leave out a single example for validation, and train on all the rest of the annotated data
for a total N examples, we repeat this N times, each time leaving out a single example
take the average of the validation errors as measured on the left-out points
same as the N-fold cross-validation where N is the number of labelled points
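
a small sketch of leave-one-out validation for a regression model, assuming squared error as the validation error. `leave_one_out_error` and the deliberately trivial `mean_predictor` are hypothetical, chosen so the arithmetic is easy to follow.

```python
def leave_one_out_error(xs, ys, fit_predict):
    """Average squared error when each example is held out once."""
    errors = []
    for i in range(len(xs)):
        # Train on everything except example i, then predict example i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        prediction = fit_predict(train_x, train_y, xs[i])
        errors.append((prediction - ys[i]) ** 2)
    return sum(errors) / len(errors)


def mean_predictor(train_x, train_y, query):
    """Trivial model: always predict the mean training output."""
    return sum(train_y) / len(train_y)


# Held-out errors are (5-2)^2 = 9, (4-4)^2 = 0 and (3-6)^2 = 9.
loo = leave_one_out_error([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], mean_predictor)
```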

what are the advantages of holdout validation?

it is the computationally cheapest

what are the advantages of 3-fold validation?

slightly more reliable than holdout

what are the advantages of 10-fold validation?

only wastes 10%, fairly reliable

what are the advantages of leave-one-out validation?

doesn’t waste data

what are the disadvantages of holdout validation?

most unreliable if sample size is not large enough

what are the disadvantages of 3-fold validation?

wastes 1/3 of annotated data, computationally 3-times as expensive as holdout

what are the disadvantages of 10-fold validation?

wastes 10% of annotated data, computationally 10-times as expensive as holdout

what are the disadvantages of leave-one-out validation?

computationally most expensive

in supervised learning, what is a labelled observation?

where each observation is a tuple (x,y) of feature vector x and output label y, which are related according to an unknown function f(x) = y

what happens during training a supervised learning model?

the labelled observations are used to learn the relationship between x (input) and y (output)

what is the goal of supervised learning?

ensure that the learned model h(x) accurately predicts the output label of a previously unseen, test feature input

what are labels in supervised learning?

the ‘teacher’ during training, the ‘validator’ of results during testing