RMI 4400 Midterm

·      Machine learning – field of computer science that uses statistical techniques to give computer systems the ability to ‘learn’, i.e. progressively improve performance on a specific task with the help of data, without being explicitly programmed

·      Training set – data used to train the algorithm, via minimization of a cost function, to respond as expected

·      Test set – data used to check whether the model performs as expected with data that it hasn’t seen before

·      Explanatory/predictor variable/feature – the data you know; explains/predicts response variable; x

·      Response variable – what you’re trying to estimate/classify; label for data; depends on explanatory variable; y

·      Classification – the process of categorizing new data by assigning discrete labels to them; requires labeled data, split into training and test sets; used for supervised learning

·      Regression – uses multiple explanatory variables to predict the response variable, or estimate the odds of the response variable (Y = a1X1 + a2X2 + a3X3 + …); many different variations; uses labeled data

·      Clustering – grouping of homogeneous data without trying to predict any response variable; no response variable; used for unsupervised learning

·      Bias-variance trade-off – bias and variance are inversely related; when you try to reduce bias, you increase variance, and vice versa; cannot eliminate both at the same time

·      Principle of parsimony – you should choose the simplest possible model, given two models that perform the same; the simpler model will have less of a risk of being overfit than a complicated model

·      Supervised learning – utilizes labeled data where both input and output are provided to train a model to make predictions

·      Unsupervised learning – utilizes unlabeled data to allow the model to discover patterns and structures within the data on its own, without any predefined output labels

·      Deep learning/Neural Network – subset of machine learning that consists of a number of neurons, i.e. nodes of a network holding a certain value, arranged in multiple layers to process complex data

·      Learning rate – how fast you want the neural network to move toward the lowest point of the cost function; the size of each weight update; can be adjusted

·      Epoch – one complete pass of the training data through the model; measures how much training has been applied to the model; too many epochs will lead to overfitting

·      Curse of dimensionality – phenomenon where the performance of algorithms significantly deteriorates as the number of features in a dataset increases; the more dimensions you add, the more data you need to effectively train the model and avoid overfitting

·      Cost function – also called loss function; measures the error between the predicted value and the actual value; the algorithm always tries to reach the lowest point by changing weights/parameters to move in the direction where the function decreases the fastest (steepest downward slope)

·      Gradient descent – calculus-based process where the slope of the cost function is computed with respect to every weight (the gradient); allows the algorithm to find and follow the direction in which the function decreases fastest
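
A minimal sketch of the idea on a one-variable cost function; the function f(w) = (w - 3)^2, the starting weight, and the learning rate are invented for illustration:

```python
# Gradient descent sketch: minimize the toy cost f(w) = (w - 3)^2.
# The cost, start point, and learning rate are illustrative assumptions.

def cost(w):
    return (w - 3) ** 2

def gradient(w):            # derivative (slope) of the cost at w
    return 2 * (w - 3)

w = 0.0                     # initial weight
learning_rate = 0.1         # how big a step to take each update
for step in range(50):
    w -= learning_rate * gradient(w)   # step against the slope

print(w, cost(w))           # w approaches 3, cost approaches 0
```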

·      Entropy – measure of randomness/disorder in data
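
Written out, for outcome probabilities p_i the Shannon entropy is H = -Σ p_i · log2(p_i); a quick sketch with made-up probabilities:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: maximum disorder for two outcomes
print(entropy([0.9, 0.1]))   # ~0.47 bits: mostly predictable, low disorder
```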

·      Distance function – Euclidean (square root of summed squared differences), city block/Manhattan (sum of absolute differences), Chebyshev/chessboard (maximum absolute difference); see the sketch after the conditions below

§  conditions:

·      d(u, u) = 0

·      d(u, v) = d(v, u)

·      d(u, v) + d(v, w) >= d(u, w)
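
A sketch of the three distance functions from the notes, for points given as equal-length tuples (the example points are invented):

```python
import math

def euclidean(u, v):      # sqrt of summed squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):      # sum of absolute differences (city block)
    return sum(abs(a - b) for a, b in zip(u, v))

def chebyshev(u, v):      # max absolute difference (chessboard)
    return max(abs(a - b) for a, b in zip(u, v))

u, v = (0, 0), (3, 4)
print(euclidean(u, v))    # 5.0
print(manhattan(u, v))    # 7
print(chebyshev(u, v))    # 4
```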

·      decision tree – great way to perform classification; average entropy reduces as you go down the tree; very prone to overfitting
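
A hedged sketch of fitting a small tree with scikit-learn, using entropy as the split criterion; the four-point dataset is invented for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[0, 0], [0, 1], [1, 0], [1, 1]]   # two binary features
y = [0, 0, 0, 1]                       # label is 1 only when both features are 1

tree = DecisionTreeClassifier(criterion="entropy", max_depth=2).fit(X, y)
print(export_text(tree))               # entropy falls at each split down the tree
print(tree.predict([[1, 1]]))          # [1]
```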

·      linear/logistic regression – helps determine how much of the response variable can be explained by the explanatory variable; linear – Y = mX + b; logistic – ln(p/(1-p)) = mX + b + e
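
Both models in scikit-learn, as a sketch; the tiny datasets below are invented purely for illustration:

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1], [2], [3], [4], [5]]        # explanatory variable
y_cont = [2.1, 3.9, 6.2, 8.1, 9.8]   # continuous response -> linear
y_bin = [0, 0, 0, 1, 1]              # binary response -> logistic

lin = LinearRegression().fit(X, y_cont)
print(lin.coef_, lin.intercept_)     # slope m and intercept b in Y = mX + b

log = LogisticRegression().fit(X, y_bin)
print(log.predict_proba([[3.5]]))    # estimated probability of each class
```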

·      k-nearest neighbors – classification of new points based on the labels of their k closest neighbors; closeness is defined by a distance function
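
A minimal sketch; the training points and the choice of k = 3 are illustrative assumptions:

```python
import math
from collections import Counter

def knn_classify(point, training_data, k=3):
    """training_data is a list of (features, label) pairs."""
    neighbors = sorted(training_data, key=lambda d: math.dist(point, d[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

data = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_classify((2, 1), data))    # "A": majority vote of the 3 closest points
```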

·      k-means clustering – organizes data based on distance, forming clusters of homogeneous data; defined by k, how many clusters you want to end up with
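
A bare-bones sketch; the points, k = 2, and the fixed iteration count are illustrative assumptions (real implementations pick smarter starting centers and stop on convergence):

```python
import math

def kmeans(points, k, iterations=10):
    centers = points[:k]                        # naive initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                        # assign each point to its nearest center
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        for j, cluster in enumerate(clusters):  # move each center to its cluster's mean
            if cluster:
                centers[j] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centers

pts = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
print(kmeans(pts, k=2))   # one center near each homogeneous group
```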

·      decision boundary – line/curve that separates different classes within the feature space; marks where a model predicts one class versus another

·      linear discriminant analysis – classification technique that finds the linear combination of features that best separates two or more classes, giving a linear decision boundary

·      activation function – collapses a neuron’s signal into a specific range of values; focuses the range of values the network produces and clarifies what the answer should be at every stage
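
Two common examples, as a sketch; the input values are arbitrary:

```python
import math

def sigmoid(x):
    """Collapses any signal into the range (0, 1)."""
    return 1 / (1 + math.exp(-x))

def relu(x):
    """Passes positive signals through unchanged, zeroes out negative ones."""
    return max(0.0, x)

print(sigmoid(-4), sigmoid(0), sigmoid(4))   # ~0.018, 0.5, ~0.982
print(relu(-2.0), relu(3.0))                 # 0.0, 3.0
```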

·      forward activation – feeding data into the neural network to go through each layer; the data gets multiplied by each layer’s weights; nothing in the model changes

·      back propagation – process of going backwards through the network and changing the weights of each layer based on gradient descent; no data is fed through the network
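
A one-neuron sketch showing both passes together; the weight, input, target, and learning rate are invented for illustration:

```python
import math

w, b = 0.5, 0.0        # the weights: the only things back propagation changes
x, target = 2.0, 1.0   # one training example
lr = 0.5               # learning rate

for step in range(20):
    # forward activation: data flows through; nothing in the model changes
    y = 1 / (1 + math.exp(-(w * x + b)))   # sigmoid activation
    cost = (y - target) ** 2               # squared-error cost

    # back propagation: chain rule gives the gradient, weights move against it
    dcost_dy = 2 * (y - target)
    dy_dz = y * (1 - y)                    # derivative of the sigmoid
    w -= lr * dcost_dy * dy_dz * x
    b -= lr * dcost_dy * dy_dz

print(w, b, cost)      # cost shrinks as the weights adjust
```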

·      large language model – class of machine learning algorithms designed to communicate in one or multiple human or computer languages; consist of neural networks that feed into each other

·      transformer – takes an input (human text) and produces computer-generated text as output; consists of an encoder network that creates numeric internal representations of the input and a decoder network that generates the output based on those representations

·      token/coin – smallest unit of data that a model processes; created by breaking down larger pieces of information; building block for model to understand and analyze text data

·      word (semantic) embedding – representing each token as a large list of numbers (a vector); you want similar words to have similar values; helps determine the meaning of a word
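
A sketch of comparing embeddings with cosine similarity; the 3-number vectors below are toy stand-ins for real embeddings, which run to hundreds of numbers:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

king  = [0.9, 0.70, 0.10]   # invented values
queen = [0.8, 0.75, 0.15]
apple = [0.1, 0.20, 0.90]

print(cosine_similarity(king, queen))   # ~0.99: similar words, similar values
print(cosine_similarity(king, apple))   # ~0.30: unrelated meanings
```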

·      attention – used to understand the context around words; relates each word to every other word in the text
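
A toy sketch of the scaled dot-product idea over three tokens; the vectors are invented, and real transformers learn separate query/key/value projections that this omits:

```python
import numpy as np

X = np.array([[1.0, 0.0],    # token 1 embedding (toy values)
              [0.0, 1.0],    # token 2
              [1.0, 1.0]])   # token 3

scores = X @ X.T / np.sqrt(X.shape[1])          # relate every token to every other token
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # softmax each row
context = weights @ X    # each token becomes a weighted mix of all tokens

print(weights)           # row i: how much token i attends to each token
```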

·      artificial general intelligence – the performance of any intellectual task at the level that meets or exceeds that of humans

·      encryption – process of transforming info in a way that only authorized people can decode

·      symmetric key cryptography – same key that is used to encrypt a plaintext into a ciphertext can be used to decrypt the message once it is received

·      asymmetric key cryptography – two keys (private and public); the private key is used to decrypt messages sent to its owner, and the public key is freely distributed and used by whoever wants to communicate securely with them
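
A toy RSA sketch of the two-key idea, using the classic small primes 61 and 53; real keys are enormously larger, and this is insecure by design:

```python
p, q = 61, 53                        # tiny primes, purely illustrative
n = p * q                            # modulus, part of both keys
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (modular inverse)

message = 42
ciphertext = pow(message, e, n)      # anyone can encrypt with the public key (e, n)
decrypted = pow(ciphertext, d, n)    # only the private key (d, n) decrypts

print(ciphertext, decrypted)         # decrypted == 42
```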

·      plaintext – readable data

·      ciphertext – encrypted data that is unreadable without a decryption key

·      binary number system – base two; only 0 and 1 digits

·      hexadecimal number system – base sixteen; digits 0-9 and letters A-F
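
Both systems checked with Python’s built-in conversions:

```python
n = 181
print(bin(n))    # '0b10110101' -> base two, only 0s and 1s
print(hex(n))    # '0xb5'       -> base sixteen, digits 0-9 and letters a-f
print(int("10110101", 2), int("B5", 16))   # both convert back to 181
```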

·      stream cipher/XOR encryptor – type of encryption that converts plaintext into ciphertext by combining it with a pseudorandom keystream
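
A minimal sketch; a real stream cipher derives the keystream from a secret key with a pseudorandom generator, while here the keystream bytes are hard-coded for illustration:

```python
plaintext = b"HELLO"
keystream = bytes([0x3A, 0x91, 0x07, 0xC4, 0x5E])   # same length as the plaintext

ciphertext = bytes(p ^ k for p, k in zip(plaintext, keystream))
recovered  = bytes(c ^ k for c, k in zip(ciphertext, keystream))

print(ciphertext.hex())   # unreadable without the keystream
print(recovered)          # b'HELLO': XOR with the same keystream decrypts
```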

·      digital signature – electronic signature that encrypts documents with digital codes that are particularly difficult to duplicate; gives the recipient very strong reason to believe the authenticity, integrity, and non-repudiation of origin

·      cryptographic hash function – takes text of any size and gives you a digest/hash (an output of a fixed size that doesn’t make any sense on its own); must satisfy the following three conditions

·      pre-image resistance – cryptographic hash function must go only one way (text -> hash but not hash -> text)

·      collision resistance – extremely unlikely for two different texts to give the same hash

·      avalanche effect – changing the text, even in a very minor way, will dramatically change the hash
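
The fixed-size digest and the avalanche effect, demonstrated with Python’s standard hashlib (the input strings are arbitrary):

```python
import hashlib

print(hashlib.sha256(b"hello world").hexdigest())
print(hashlib.sha256(b"hello worle").hexdigest())   # one letter changed:
                                                    # the hash changes dramatically
```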

·      nonce – special number included into a block so that the hash of the block has a certain format (ex: 40 leading zeros)

·      proof-of-work – ensures that blocks can’t be changed on a whim without incurring a huge computational cost

·      miner – tries to find the nonce; has no better way to do this than trial and error, over and over again
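
A trial-and-error mining sketch: find a nonce so the block’s SHA-256 hash starts with a set number of leading zeros; the block text and the difficulty of 4 hex zeros are illustrative (real difficulty is far higher):

```python
import hashlib

block = "alice pays bob 1 coin"   # invented block contents
difficulty = 4                    # required leading zeros in the hex hash

nonce = 0
while True:
    digest = hashlib.sha256(f"{block}{nonce}".encode()).hexdigest()
    if digest.startswith("0" * difficulty):
        break                     # found a nonce giving the required format
    nonce += 1                    # no better way than trying the next number

print(nonce, digest)              # the proof-of-work
```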

·      coinbase transaction – guaranteed compensation to miner for creating a block

·      coin change – when a transaction’s inputs are worth more than the amount being sent plus fees, the leftover is returned to the sender as change

·      smart contract – digital agreement that automatically executes when certain conditions are met; stored on a blockchain

·      oracle – provides objective outside information that allows a smart contract to take a course of action (ex: an airport providing flight info and delays, a weather channel providing weather info)

·      proof of stake – mechanism used to verify blockchain transactions; restricts the competition for building blocks from miners to pre-approved stakers who have provided collateral

·      script and solidity – Script (scripting language for Bitcoin; very limited (cannot create loops) so that it is not susceptible to bugs); Solidity (much more complex; can build any smart contract you want; used on Ethereum)

·      turing-complete language – language that is expressive enough to write a program that can solve any computable problem; Solidity is an example; Script is not; Python is another example
