ai key terms


Last updated 2:24 PM on 4/29/26

232 Terms

1
New cards

what is the definition of AI?

the area of computer science that studies rational agents

2
New cards

what problems can AI solve?

object detection, travelling salesman problem (TSP), search problems

3
New cards

what are machine learning problems?

problems that require a model to be built automatically from data

4
New cards

what is supervised learning?

learning with a teacher

5
New cards

what is a teacher for supervised learning?

expected output, label, classes etc.

6
New cards

what does supervised learning solve?

classification and regression problems

7
New cards

what is a classification problem?

predict categorical class labels

8
New cards

what are regression problems?

a prediction of a real value

9
New cards

what is unsupervised learning?

learning without a teacher to find hidden structures/insights in data

10
New cards

what is reinforcement learning?

learning with (delayed) feedback/reward; learning through a series of actions

11
New cards

what is the task of formulated supervised learning?

given some input x, predict an appropriate output y

12
New cards

what is the goal of formulated supervised learning?

a function f such that f(x) = y

13
New cards

what is training data?

examples of input-output pairs used in supervised learning

14
New cards

what is training/modelling in supervised learning?

where supervised learning helps to find a good f (function that gives appropriate outputs with given inputs)

15
New cards

what is a prediction in supervised learning?

given an input x, predicting its output, y

16
New cards

what are other terms for input?

  • attribute

  • feature

  • independent variable

17
New cards

what are other terms for output?

target, response, dependent variable

18
New cards

what are other terms for function?

hypothesis, predictor

19
New cards

what is overfitting?

  • when the training data is fitted ‘too well’

  • the model learns every irrelevant detail (noise) in a training data set

20
New cards

what is the danger of overfitting?

the model will not work well on new, unseen data

21
New cards

what is linear regression?

a ML algorithm for regression problems

22
New cards

what is gradient descent?

a general strategy to minimise cost functions
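As a minimal sketch of the idea (the cost function and values here are illustrative, not from the cards), gradient descent repeatedly steps against the gradient to reduce a cost:

```python
# Gradient descent sketch: minimise f(w) = (w - 3)^2,
# whose gradient is f'(w) = 2 * (w - 3). Illustrative example only.

def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step in the direction opposite the gradient."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
# w_min converges towards 3, the minimiser of (w - 3)^2
```

The learning rate `lr` controls the step size: too large and the updates overshoot, too small and convergence is slow.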

23
New cards

what does regression mean?

learning a function that captures the trend between input and output, where the output is a continuous value, to predict target values for new inputs

24
New cards

what is the general rule for univariate linear regression?

find the best line that captures the trend in data

25
New cards

what is a loss function?

a criterion that measures how well an ML model's predictions align with actual outcomes

26
New cards

how does loss function work?

quantifies the error/difference between predicted and true values

27
New cards

what is the loss function also known as?

the g-function

28
New cards

what is the vector of partial derivatives called?

the gradient vector

29
New cards

how can linear regression be defined?

a linear and parametric model for regression problems

30
New cards

what is logistic regression?

  • a linear and parametric model for classification problems

31
New cards

what are K-nearest neighbours?

a non-parametric model that can be used for both classification and regression problems

32
New cards

what is a parametric model?

  • model that summarises data with a finite set of parameters by making assumptions on data distributions

33
New cards

what is an example of a parametric model?

neural networks

34
New cards

what is a non-parametric model?

model that cannot be characterised by a bounded set of parameters and makes no assumptions on data distribution

35
New cards

what are some examples of non-parametric models?

instance-based learners that generate hypotheses from training examples, e.g. kNN and decision trees

36
New cards

what are the three steps for logistic regression?

  1. model formulation

  2. cost function

  3. learning algorithm by gradient descent

37
New cards

why use a cost function for logistic regression?

tells you which parameters are better/worse

38
New cards

why use a learning algorithm by gradient descent for logistic regression?

helps you find the best parameters to minimise the cost function

39
New cards

what is a sigmoid function and what is it used for?

  • a function that produces an S-shaped curve by mapping any real-valued number into a value between 0 and 1

  • used in logistic regression and neural networks to model probabilities and make binary classifications
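The S-shaped mapping described above can be written in a few lines (a sketch, using the standard formula 1 / (1 + e^(-z))):

```python
import math

# Sigmoid sketch: maps any real-valued number into (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# sigmoid(0) is exactly 0.5; large positive z approaches 1,
# large negative z approaches 0.
```

The output can therefore be read as the probability of the positive class in binary classification.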

40
New cards

what is a composite function?

a linear function (inner) embedded in the sigmoid function (outer)

41
New cards

what is the decision boundary in a sigmoid function?

the set of all possible inputs where a sigmoid function outputs exactly 0.5

42
New cards

what is a classification boundary in a sigmoid function?

the set of samples for which the model outputs a probability of exactly 0.5 (an equal chance of either class)

43
New cards

why does the sigmoid function require a new cost function?

with the sigmoid bounding outputs to (0,1), the MSE (mean squared error) cost becomes non-convex, and gradient descent does not work well on non-convex functions

44
New cards

how is a cost function calculated on a sigmoid function?

using cross-entropy loss, measuring the difference between the predicted probabilities and the actual class outcomes
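For a single binary example, the cross-entropy loss mentioned above can be sketched as follows (illustrative; `y` is the true label, `p` the predicted probability of class 1):

```python
import math

# Cross-entropy loss sketch for one binary example.
# y: true label (0 or 1); p: predicted probability of class 1.
def cross_entropy(y, p):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A confident correct prediction gives a small loss;
# a confident wrong prediction gives a large loss.
```

Averaging this loss over the training set gives the convex cost that logistic regression minimises.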

45
New cards

what is an advantage of using a new cost function on a sigmoid function?

the loss is convex, so it can be minimised easily

46
New cards

what are the basic steps of logistic regression?

given training data, fit the model, by minimising the cross-entropy cost function

47
New cards

what are some extensions of logistic regression?

non-linear logistic regression, multiclass logistic regression

48
New cards

what is non-linear logistic regression?

instead of a linear function inside the expression in the sigmoid, we can use a polynomial function of the input attributes

49
New cards

what is multiclass logistic regression?

uses a multivalued generalisation of the sigmoid (the softmax function)

50
New cards

what is the ‘no free lunch’ (NFL) theorem?

states that no optimisation or learning algorithm is universally the best-performing algorithm for all problems

51
New cards

what is the implication of the NFL theorem?

if learner A1 is better than learner A2 for a task, f, then there is another task g for which learner A2 is better than learner A1

52
New cards

what does kNN stand for?

k-Nearest Neighbours

53
New cards

what type of algorithm is kNN?

a non-parametric, instance-based model

54
New cards

what does it mean for an algorithm to be non-parametric?

no assumptions are made about the functional form of the model, complexity grows with the data set, allowing them to model complex, nonlinear relationships

55
New cards

what does it mean for an algorithm to be instance-based?

the prediction is based on a comparison of a new point with data points in the training set, rather than a model

56
New cards

why might kNN be called a ‘lazy’ algorithm?

there is no explicit training step, and it defers all the computation until prediction

57
New cards

what can kNN be used for?

both regression and classification problems

58
New cards

how does kNN work generally?

(instead of approximating a model function f(x) globally) kNN approximates the label of a new point based on its k nearest neighbours in the training data, where k is the number of neighbours considered; k can be increased to include more data points or decreased to include fewer

59
New cards

when is hamming distance used?

for discrete/categorical values (e.g. {rainy, sunny})
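For such categorical attributes, the Hamming distance simply counts the positions where two examples disagree (a sketch; the attribute values are illustrative):

```python
# Hamming distance sketch: the number of positions at which
# two equal-length categorical sequences differ.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# hamming(["rainy", "hot"], ["sunny", "hot"]) -> 1 (weather differs, temperature matches)
```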

60
New cards

what are the inputs for a kNN algorithm?

neighbour size k > 0, distance metric D, a training set, and a new unlabelled data point

61
New cards

what steps are in the kNN algorithm?

for each example in the training set, calculate its distance (using D) from the new unlabelled data point x^j; select the k training examples closest to x^j; return the plurality vote of their labels (classification) or the average/median of their y values (regression)
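The steps above can be sketched for classification (names and the choice of Euclidean distance are illustrative):

```python
import math
from collections import Counter

# kNN classification sketch: distance to every training example,
# keep the k closest, return the plurality vote of their labels.
def knn_classify(k, train, x_new):
    """train: list of (features, label) pairs; x_new: feature tuple."""
    def dist(a, b):  # Euclidean distance as the metric D
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    neighbours = sorted(train, key=lambda ex: dist(ex[0], x_new))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]
```

For regression, the final vote would be replaced by the average or median of the neighbours' y values.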

62
New cards

what are the dangers of overfitting?

has worse generalisation performance on data

63
New cards

what are the dangers of underfitting?

has poor generalisation performance on data

64
New cards

what impacts the complexity of the final model when using kNN?

the value for k, as k decides how many samples we use to label the new example

65
New cards

when using kNN, what happens when k is small?

there is a small neighbourhood, high complexity, and a risk of overfitting

66
New cards

when using kNN, what happens when k is large?

there is a large neighbourhood, low complexity, and a risk of underfitting

67
New cards

what value for k do practicians often use in kNN?

often choose k between 3 and 15, or k ≤ √N (where N is the number of training examples)

68
New cards

when is learning bias embedded in kNN?

  • when attributes have different ranges, the attribute with the larger range is treated as more important by the kNN algorithm

  • potentially impacting performance if you do not want to treat attributes differently

69
New cards

what is normalisation in kNN?

linearly scaling the range of each attribute (typically into [0, 1]) using the min-max formula

70
New cards

what is standardisation in kNN?

linearly scaling each dimension to have 0 mean and variance 1 (by computing mean and variance)
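The two scaling schemes from the cards above, applied to one attribute, can be sketched as (function names are illustrative):

```python
# Min-max normalisation and standardisation sketches for one attribute.

def normalise(xs):
    """Linearly scale values into [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardise(xs):
    """Linearly scale values to zero mean and unit variance."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / var ** 0.5 for x in xs]
```

Either scheme stops an attribute with a large raw range from dominating the distance metric.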

71
New cards

what is the kNN algorithm with normalisation and standardisation?

  • normalise and standardise the unlabelled data (x^j).

  • for each example in the training set, normalise and standardise the example and calculate its distance from the unlabelled data.

  • select the k training examples closest to the unlabelled data.

  • return the plurality vote of labels from the k examples (classification) or the average/median of the y values of the k examples (regression)

72
New cards

what are the advantages of kNN?

  • non-parametric, instance-based, lazy algorithm

  • easy to implement and interpret

  • it can approximate complex functions, so it can achieve very good performance

73
New cards

what are the disadvantages of kNN?

  • need to specify the distance metrics and pre-define k value

  • it has to store all training data (large memory space) and calculate distance of each training example to the new example

  • it can be sensitive to noise, especially when k is small

  • performance is degraded greatly as data dimension increases (curse of dimensionality)

74
New cards

what is the curse of dimensionality in kNN?

as volume grows larger, the neighbours become further apart, the prediction becomes less accurate. distances are less meaningful in high dimensions

75
New cards

what are hyperparameters?

higher-level free parameters that are set before training, rather than learned from the data

76
New cards

what is the depth in a neural network?

the number of hidden layers

77
New cards

what is the width in a neural network?

the number of hidden neurons in a hidden layer

78
New cards

what is the activation function in a neural network?

choice of non-linearity in non-input nodes

79
New cards

what is a regularisation parameter in a neural network?

a way to trade off simplicity vs fit to the data

80
New cards

how is a predictor obtained?

by training the free parameters of the considered model, using the available annotated data

81
New cards

why evaluate predictors?

to estimate its future performance before deploying it in the real world

82
New cards

how are predictors evaluated?

the available annotated data is split randomly into a training set, used to estimate the free parameters, and a test set, used to evaluate the performance of a trained predictor before deploying it

83
New cards

what methods can be used to evaluate models for model choice?

holdout validation, cross-validation, leave-one-out validation

84
New cards

what is the method for holdout validation?

  • randomly choose 30% of data to form a validation set

  • remaining data forms the training set

  • train your model on the training set

  • estimate the test performance on the validation set

  • choose the model with the lowest validation error

  • re-train with chosen model on joined training and validation to obtain predictor

  • estimate future performance of the obtained predictor on test set

  • deploy the predictor
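The first two steps of holdout validation (the 70/30 random split described above) can be sketched as (names and the seed are illustrative):

```python
import random

# Holdout split sketch: randomly set aside 30% of the data as a
# validation set; the remaining 70% forms the training set.
def holdout_split(data, val_fraction=0.3, seed=0):
    rng = random.Random(seed)
    shuffled = data[:]          # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (training set, validation set)
```

Training, validation-error measurement, and the final re-training on the joined sets then proceed as in the bullet list above.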

85
New cards

how do you estimate the test performance on the validation set when using holdout validation?

if regression, compute the cost function (MSE) on the examples of the validation set instead of the training set. if classification, compute the 0-1 error metric (not cross-entropy cost!)

86
New cards

what is the method for k-fold cross-validation?

  • split the training set randomly into k (equal-sized) disjoint sets

  • use k-1 of those together for training

  • use the remaining one for validation

  • permute the k sets and repeat k times

  • average the performance on k validation sets
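The k disjoint splits described above can be generated with a short sketch (the interleaved-slice fold assignment is one illustrative choice):

```python
# k-fold split sketch: partition the data into k disjoint folds;
# each fold serves once as the validation set while the other
# k-1 folds together form the training set.
def kfold_splits(data, k):
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, val
```

Averaging the validation error over the k rounds gives the cross-validation error used for model choice.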

87
New cards

what is the last step in k-fold cross-validation?

repeating for other models

  • choose the model with the smallest average k-fold cross-validation error

  • retrain with chosen model on joined training and validation to obtain the predictor

  • estimate future performance of the obtained predictor on test set

  • deploy the predictor in real world

88
New cards

what is the method for leave-one-out validation?

  • leave out a single example for validation, and train on all the rest of the annotated data

  • for a total N examples, we repeat this N times, each time leaving out a single example

  • take the average of the validation errors as measured on the left-out points

  • same as the N-fold cross-validation where N is the number of labelled points

89
New cards

what are the advantages of holdout validation?

it is the computationally cheapest

90
New cards

what are the advantages of 3-fold validation?

slightly more reliable than holdout

91
New cards

what are the advantages of 10-fold validation?

only wastes 10%, fairly reliable

92
New cards

what are the advantages of leave-one-out validation?

doesn’t waste data

93
New cards

what are the disadvantages of holdout validation?

most unreliable if sample size is not large enough

94
New cards

what are the disadvantages of 3-fold validation?

wastes 1/3 of annotated data, computationally 3-times as expensive as holdout

95
New cards

what are the disadvantages of 10-fold validation?

wastes 10% of annotated data, computationally 10-times as expensive as holdout

96
New cards

what are the disadvantages of leave-one-out validation?

computationally most expensive

97
New cards

in supervised learning, what is a labelled observation?

where each observation is a tuple (x,y) of feature vector x and output label y, which are related according to an unknown function f(x) = y

98
New cards

what happens during training a supervised learning model?

the labelled observations are used to learn the relationship between x (input) and y (output)

99
New cards

what is the goal of supervised learning?

ensure that the learned model h(x) accurately predicts the output label of a previously unseen, test feature input

100
New cards

what are labels in supervised learning?

the ‘teacher’ during training, the ‘validator’ of results during testing