· Machine learning – field of computer science that uses statistical techniques to give computer systems the ability to ‘learn’, i.e. progressively improve performance on a specific task with the help of data, without being explicitly programmed
· Training set – data used to train the algorithm, via minimization of a cost function, to respond as expected
· Test set – data used to check whether the model performs as expected with data that it hasn’t seen before
· Explanatory/predictor variable/feature – the data you know; explains/predicts response variable; x
· Response variable – what you’re trying to estimate/classify; label for data; depends on explanatory variable; y
· Classification – the process of categorizing new data by assigning discrete labels to them; requires labeled data (split into training and test sets); used for supervised learning
· Regression – uses multiple explanatory variables to predict the response variable, or to estimate the odds of the response variable (Y = a1X1 + a2X2 + a3X3 + …); many different variations; labeled data
· Clustering – grouping of homogeneous data without trying to predict any response variable; no response variable; used for unsupervised learning
· Bias-variance trade-off – bias and variance are inversely related; when you try to reduce bias, you increase variance, and vice versa; cannot eliminate both at the same time
· Principle of parsimony – given two models that perform the same, you should choose the simpler one; the simpler model has a lower risk of overfitting than a complicated model
· Supervised learning – utilizes labeled data where both input and output are provided to train a model to make predictions
· Unsupervised learning – utilizes unlabeled data to allow the model to discover patterns and structures within the data on its own, without any predefined output labels
· Deep learning/Neural Network – subset of machine learning that consists of a number of neurons, i.e. nodes of a network holding a certain value, arranged in multiple layers to process complex data
· Learning rate – step size that controls how fast the neural network moves toward the lowest point of the cost function; can be adjusted
· Epoch – a round of training; measures how much training has been applied to the model; too much training will lead to overfitting
· Curse of dimensionality – phenomenon where the performance of algorithms significantly deteriorates as the number of features in a dataset increases; the more dimensions you add, the more data you need to effectively train the model and avoid overfitting
· Cost function – also called loss function; measures the error between the predicted value and the actual value; the algorithm always tries to reach the lowest point by changing the weights/parameters to move in the direction where the function decreases the fastest (steepest descent)
· Gradient descent – calculus-based process in which the slope (gradient) of the cost function is computed in all possible directions; allows the algorithm to find and follow the direction where the cost decreases fastest, i.e. the negative of the gradient; a sketch follows below
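§ sketch (not from the notes): minimal gradient descent on a one-dimensional quadratic cost; the cost function, starting point, learning rate, and iteration count are all illustrative choices.

    # Minimal gradient descent on the cost J(w) = (w - 3)**2, minimized at w = 3.
    def cost(w):
        return (w - 3) ** 2

    def gradient(w):       # dJ/dw = 2(w - 3)
        return 2 * (w - 3)

    w = 0.0                # arbitrary starting weight
    learning_rate = 0.1    # too large diverges, too small converges slowly

    for epoch in range(50):
        w -= learning_rate * gradient(w)   # step opposite the gradient

    print(round(w, 4), round(cost(w), 6))  # approaches 3.0, the cost minimum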
· Entropy – measure of randomness/disorder in data
· Distance function – Euclidean (square root of summed squared differences), city block/Manhattan (sum of absolute differences), Chebyshev/chessboard (maximum absolute difference); must satisfy the conditions below (a sketch follows them)
§ conditions:
· d(u, u) = 0
· d(u, v) = d(v, u)
· d(u, v) + d(v, w) >= d(u, w)
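§ sketch (not from the notes): the three distance functions in plain Python; the sample points are illustrative.

    import math

    # Distances between points u and v given as equal-length coordinate tuples.
    def euclidean(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    def manhattan(u, v):   # city block
        return sum(abs(a - b) for a, b in zip(u, v))

    def chebyshev(u, v):   # chessboard
        return max(abs(a - b) for a, b in zip(u, v))

    u, v = (0, 0), (3, 4)
    print(euclidean(u, v))   # 5.0
    print(manhattan(u, v))   # 7
    print(chebyshev(u, v))   # 4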
· Decision tree – great way to perform classification; average entropy decreases as you go down the tree; very prone to overfitting; an entropy sketch follows below
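§ sketch (not from the notes): the Shannon entropy a decision tree tries to reduce with each split; the label lists are illustrative.

    import math
    from collections import Counter

    # Shannon entropy of a list of class labels: 0 for a pure node,
    # 1 bit for a 50/50 two-class split.
    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    print(entropy(["yes", "no", "yes", "no"]))    # 1.0 (maximum disorder)
    print(entropy(["yes", "yes", "yes", "yes"]))  # -0.0 (pure node)
    print(entropy(["yes", "yes", "yes", "no"]))   # ~0.811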
· Linear/logistic regression – helps determine how much of the response variable can be explained by the explanatory variable; linear – Y = mX + b; logistic – ln(p/(1-p)) = mX + b + e; a least-squares sketch follows below
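§ sketch (not from the notes): least-squares fit of Y = mX + b with NumPy; the data points are made up for illustration.

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # np.polyfit with degree 1 returns the slope m and intercept b.
    m, b = np.polyfit(X, Y, 1)
    print(m, b)   # roughly m = 1.96, b = 0.14

    # R-squared: fraction of the response variable explained by X.
    residuals = Y - (m * X + b)
    print(1 - residuals.var() / Y.var())   # close to 1.0 for this data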
· K-nearest neighbors – classification of new points based on the labels of their closest neighbors; closeness is defined by the distance function; a sketch follows below
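§ sketch (not from the notes): k-nearest-neighbor classification by majority vote; the toy points and k = 3 are illustrative choices.

    import math
    from collections import Counter

    points = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
              ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]

    def knn_classify(x, points, k=3):
        # Sort labeled points by Euclidean distance to x, keep the k nearest.
        nearest = sorted(points, key=lambda p: math.dist(x, p[0]))[:k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]   # majority label wins

    print(knn_classify((2, 2), points))   # A
    print(knn_classify((7, 7), points))   # B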
· K-means clustering – organizes data based on distance, forming clusters of homogeneous data; parameterized by how many clusters (k) you want to end up with; a sketch follows below
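§ sketch (not from the notes): the plain k-means loop (assign each point to its nearest centroid, then move each centroid to its cluster’s mean); the one-dimensional data and k = 2 are illustrative.

    import random

    data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
    k = 2
    centroids = random.sample(data, k)

    for _ in range(10):   # fixed iteration count for simplicity
        clusters = {i: [] for i in range(k)}
        for x in data:
            nearest = min(range(k), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # An empty cluster keeps its old centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in clusters.items()]

    print(sorted(centroids))   # roughly [1.5, 8.5]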
· Decision boundary – line/curve that separates different classes within the feature space; marks where a model predicts one class versus another
· Linear discriminant analysis – classification technique that finds the linear combination of features that best separates two or more classes; also used for dimensionality reduction
· Activation function – collapses a neuron’s signal into a specific range of values (e.g. a sigmoid squashes it to between 0 and 1); focuses the range of values the network produces and clarifies what the answer should be at every stage
· Forward activation – feeding data into the neural network to go through each layer; the data gets multiplied by each layer’s weights; nothing in the model changes
· Back propagation – process of going backwards through the network and changing the weights of each layer based on gradient descent; no data is fed through the network; a combined forward/backward sketch follows below
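§ sketch (not from the notes): a one-layer network trained with a forward pass plus back propagation in NumPy; the toy task (output the first input bit), layer size, learning rate, and epoch count are all illustrative.

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [0], [1], [1]], dtype=float)   # target: first input bit

    rng = np.random.default_rng(0)
    W = rng.normal(size=(2, 1))   # weights
    b = np.zeros(1)               # bias
    lr = 0.5                      # learning rate

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    for epoch in range(2000):
        # Forward pass: data flows through the layer; nothing is updated.
        a = sigmoid(X @ W + b)

        # Back propagation: gradient of mean squared error w.r.t. W and b,
        # applying the chain rule through the sigmoid.
        grad_z = (a - y) * a * (1 - a)
        W -= lr * (X.T @ grad_z) / len(X)   # gradient descent step
        b -= lr * grad_z.mean(axis=0)

    print(sigmoid(X @ W + b).round(2))   # close to [[0], [0], [1], [1]]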
· Large language model – class of machine learning algorithms designed to communicate in one or multiple human or computer languages; consists of neural networks that feed into each other
· Transformer – takes an input (human text) and produces computer-generated text as output; consists of an encoder network that creates numeric internal representations of the input and a decoder network that generates the output based on those representations
· Token – smallest unit of data that a model processes; created by breaking down larger pieces of information; building block for the model to understand and analyze text data
· Word (semantic) embedding – representing each token as a large list of numbers (a vector); you want similar words to have similar values; helps determine the meaning of a word
· Attention – used to understand the context around words; relates each word to every other word in the text; a sketch follows below
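§ sketch (not from the notes): scaled dot-product attention, the core operation behind transformer attention; the random queries/keys/values and the 4-token, 8-dimensional sizes are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(4, 8))   # queries, one row per token
    K = rng.normal(size=(4, 8))   # keys
    V = rng.normal(size=(4, 8))   # values

    scores = Q @ K.T / np.sqrt(8)   # how much each token relates to every other
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over each row

    output = weights @ V          # context-aware vector for each token
    print(weights.sum(axis=1))    # each row of weights sums to 1
    print(output.shape)           # (4, 8)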
· Artificial general intelligence – the performance of any intellectual task at a level that meets or exceeds that of humans
· Encryption – process of transforming info in a way that only authorized people can decode
· Symmetric key cryptography – same key that is used to encrypt a plaintext into a ciphertext can be used to decrypt the message once it is received
· Asymmetric key cryptography – two keys (private and public); each party’s private key is used to decrypt messages sent to them, and their public key is freely distributed and used by whoever wants to communicate securely with them
· Plaintext – readable data
· Ciphertext – encrypted data that is unreadable without a decryption key
· Binary number system – base two; only 0 and 1 digits
· Hexadecimal number system – base sixteen; digits 0-9 and letters A-F
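§ sketch (not from the notes): base conversions in Python; the value 42 is an arbitrary example.

    n = 0b101010             # binary literal for 42
    print(n)                 # 42
    print(bin(42))           # 0b101010
    print(hex(42))           # 0x2a
    print(int("2A", 16))     # 42, parsed from a hexadecimal string
    print(int("101010", 2))  # 42, parsed from a binary string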
· Stream cipher/XOR encryptor – type of encryption that converts plaintext into ciphertext by combining it with a pseudorandom keystream; a sketch follows below
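§ sketch (not from the notes): a toy XOR stream cipher; a seeded random.Random stands in for the keystream generator and is not cryptographically secure; the key value and message are illustrative.

    import random

    def xor_stream(data: bytes, key: int) -> bytes:
        # XOR each byte with the next pseudorandom keystream byte.
        # Running the same function again with the same key decrypts,
        # because b ^ k ^ k == b.
        keystream = random.Random(key)
        return bytes(b ^ keystream.randrange(256) for b in data)

    ciphertext = xor_stream(b"attack at dawn", key=1234)
    print(ciphertext)                         # unreadable bytes
    print(xor_stream(ciphertext, key=1234))   # b'attack at dawn'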
· Digital signature – electronic signature that encrypts documents with digital codes that are particularly difficult to duplicate; gives the recipient very strong reason to believe in the authenticity, integrity, and non-repudiation of origin
· Cryptographic hash function – takes text of any size and gives you a digest/hash (output of a specific size that doesn’t make any sense); must satisfy the following three conditions
· Pre-image resistance – cryptographic hash function must go only one way (text -> hash but not hash -> text)
· Collision resistance – extremely unlikely for two different texts to give the same hash
· Avalanche effect – changing the text, even in a very minor way, will dramatically change the hash
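§ sketch (not from the notes): SHA-256 digests showing fixed-size output and the avalanche effect; the input strings are arbitrary.

    import hashlib

    h1 = hashlib.sha256(b"hello world").hexdigest()
    h2 = hashlib.sha256(b"hello worle").hexdigest()   # one character changed

    print(h1)         # 64 hex characters regardless of input size
    print(h2)         # completely different digest after a tiny edit
    print(h1 == h2)   # False: no collision for these two inputs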
· Nonce – special number included in a block so that the hash of the block has a certain format (ex: 40 leading zeros)
· Proof-of-work – ensures that blocks can’t be changed on a whim without incurring a huge computational cost
· Miner – tries to find the nonce; has no better way to do this than trial and error over and over again; a mining sketch follows below
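§ sketch (not from the notes): brute-force nonce search with SHA-256; the block contents and the 4-leading-zero difficulty are illustrative (real chains require far more).

    import hashlib

    block_data = "prev_hash|transactions"   # stand-in for real block contents
    difficulty = 4                          # required number of leading hex zeros

    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            break                           # found a qualifying hash
        nonce += 1                          # trial and error, over and over

    print(nonce, digest)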
· Coinbase transaction – guaranteed compensation to the miner for creating a block
· Coin change -
· Smart contract – digital agreement that automatically executes when certain conditions are met; stored on a blockchain
· Oracle – provides objective outside information that allows a smart contract to take a course of action (ex: an airport providing flight info/delays, a weather channel providing weather info)
· Proof of stake – mechanism used to verify blockchain transactions; restricts the competition for building blocks from miners to pre-approved stakers who have provided collateral
· Script and Solidity – Script: scripting language for Bitcoin; very limited (cannot create loops) so that it is not susceptible to bugs; Solidity: much more complex; can build any smart contract you want; used on Ethereum
· Turing-complete language – language expressive enough to write a program for any computable problem; Solidity is an example; Script is not; Python is another example