What are genetic algorithms?
Adaptive heuristic search algorithms inspired by biological models. They are part of evolutionary computation
What two works do genetic algorithms rely on?
Mendel's principles of genetics, which define how offspring receive genetic material from parent generations, and Darwin's theory of evolution by natural selection, which describes the ability of individuals to survive, adapt, and reproduce
What problems are genetic algorithms recommended for?
Complex problems for which there is no known efficient algorithm
What are some common areas of application for genetic algorithms?
Optimization, search, planning, classification. Also used to tune learning parameters for other algorithms such as artificial neural networks
How does GA generate successor hypotheses?
By repeatedly recombining and mutating parts of the best currently known hypotheses, so that variants of these hypotheses are the most likely to be considered next
In GA what do genes represent?
A representation of some parameter (variable) according to an alphabet. Genes are combined to form chromosomes
In GA, what are individuals?
Each individual corresponds to a chromosome, used to represent a candidate solution (hypothesis)
In GA, what is a population?
Set of individuals who will compete for survival and reproduction (a collection of hypotheses)
In GA, what is a generation?
Population of a given period (iteration)
In GA, what is a fitness function?
Function used to evaluate an individual’s ability to survive and reproduce. Provide a measure of quality of each candidate solution
How is the choice of individuals who will form the initial population done?
Randomly or heuristically by using relevant information for the search process
What must individuals satisfy in GA?
Problem constraints (feasible)
What do larger populations allow for in GA?
Greater diversity of solutions but consumes more time
How should the initial population be generated when the population is small in GA?
It should not be generated randomly but by a well-defined approach that gives wider coverage of the search space
In GA, what is the population size?
Population of individuals whose size remains constant over time
How are hypotheses represented in GA?
Hypotheses made of if-then rules are represented by bit strings (binary)
How to represent multiple attributes in GA?
By concatenating the strings
How can an entire rule be described in GA?
By concatenating the bit strings describing the rule preconditions, together with the bit string describing the rule postcondition
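The concatenation described above can be sketched as follows. This is only an illustration: the attributes (`Outlook`, `Wind`), the one-bit-per-value convention, and the helper names `encode_attribute` and `encode_rule` are assumptions, not part of the cards.

```python
# Hypothetical attributes: each possible value gets one bit,
# and a 1 means "this value is allowed by the rule".
ATTRIBUTES = {
    "Outlook": ["Sunny", "Overcast", "Rain"],
    "Wind": ["Strong", "Weak"],
}

def encode_attribute(name, allowed):
    """Return one bit per possible value of the attribute."""
    return "".join("1" if v in allowed else "0" for v in ATTRIBUTES[name])

def encode_rule(preconditions, postcondition_bit):
    """Concatenate the precondition substrings, then append the postcondition bit.

    An attribute missing from `preconditions` is unconstrained (all bits set).
    """
    bits = "".join(
        encode_attribute(name, preconditions.get(name, ATTRIBUTES[name]))
        for name in ATTRIBUTES
    )
    return bits + postcondition_bit

# IF Outlook is Overcast or Rain AND Wind is Strong THEN class = 1
rule = encode_rule({"Outlook": {"Overcast", "Rain"}, "Wind": {"Strong"}}, "1")
print(rule)  # 011101
```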
What does the choice of the fitness function for ranking hypotheses depend on?
The problem to be solved
When minimizing or maximizing mathematical functions, what fitness function should be used?
The function to be maximized/minimized
For a classification problem, what fitness function should be used?
Could be defined as the accuracy of the hypothesis over the training data
What happens to unfeasible individuals produced during the genetic operations?
Penalized or discarded
What is it called if all individuals in the previous population can be replaced by new solutions?
The process is called generational
What is it called if only one group of individuals can be replaced (usually the worst)?
Steady state, which prevents quality individuals from being lost
What are the two main strategies for selection?
Roulette wheel and tournament
In GA what is roulette wheel?
The probability that a hypothesis will be selected is given by the ratio of its fitness to the total fitness of all members of the current population (the probabilities correspond proportionally to the sectors of a roulette wheel)
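A minimal sketch of roulette wheel selection, assuming non-negative fitness values with a positive total. The function name `roulette_wheel_select` and the example population are illustrative only.

```python
import random

def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one individual with probability fitness / total fitness."""
    total = sum(fitnesses)
    spin = rng.uniform(0, total)      # where the wheel stops
    cumulative = 0.0
    for individual, fitness in zip(population, fitnesses):
        cumulative += fitness         # each individual owns a sector this wide
        if spin <= cumulative:
            return individual
    return population[-1]             # guard against floating-point round-off

population = ["0011", "0101", "1111"]
fitnesses = [1.0, 2.0, 7.0]           # "1111" owns 70% of the wheel
rng = random.Random(0)
picks = [roulette_wheel_select(population, fitnesses, rng) for _ in range(1000)]
print(picks.count("1111") / len(picks))  # expected to be roughly 0.7
```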
In GA what is tournament?
Two hypotheses are first chosen at random from the current population. With some predefined probability p, the more fit of these two is selected, and with probability (1-p) the less fit hypothesis is selected.
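The tournament described above could look like this in code. It is a sketch under assumptions: the name `tournament_select`, the default `p=0.8`, and the `count_ones` fitness are chosen only for illustration.

```python
import random

def tournament_select(population, fitness, p=0.8, rng=random):
    """Binary tournament: keep the fitter of two random individuals with probability p."""
    a, b = rng.sample(population, 2)              # two distinct individuals
    fitter, weaker = (a, b) if fitness(a) >= fitness(b) else (b, a)
    return fitter if rng.random() < p else weaker

def count_ones(bits):
    """Toy fitness used only for this example."""
    return bits.count("1")

population = ["000", "011", "111"]
rng = random.Random(1)
winner = tournament_select(population, count_ones, p=0.8, rng=rng)
print(winner)
```

Because a less fit individual still wins with probability 1-p, weaker hypotheses are not crowded out as quickly as under purely fitness-proportional selection.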
Between the two main strategies for selection, which yields a more diverse population?
Tournament selection
In GA, what is mutation?
Produces small random changes to the bit string by choosing a single bit at random then changing its value
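A minimal sketch of single-bit mutation on a bit string; the function name `mutate` is an assumption made for the example.

```python
import random

def mutate(bits, rng=random):
    """Flip one randomly chosen bit of a bit string."""
    i = rng.randrange(len(bits))                  # position of the bit to change
    flipped = "1" if bits[i] == "0" else "0"
    return bits[:i] + flipped + bits[i + 1:]

print(mutate("010101", random.Random(3)))
```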
Why is mutation done during GA?
Prevents the algorithm from stagnating at a local maximum, allowing it to explore other regions that may be more promising
When is mutation done?
Often performed after crossover, and applied with a low probability (e.g. 0.1%)
What termination criteria can be used?
A fixed number of generations; an individual reaches the expected result; an individual comes within a given distance of the expected result; convergence (the solution does not improve significantly over several generations)
Why is GA less likely to fall into the same kind of local minima that can plague gradient descent methods, such as neural network training with backpropagation?
The gradient descent search in backpropagation moves smoothly from one hypothesis to a new hypothesis that is very similar. GA search can move much more abruptly, replacing a parent hypothesis by an offspring that may be radically different from the parent
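To show where two of these criteria sit in the loop, here is a small generational GA sketch. Everything specific in it is an assumption for illustration: the toy "count the 1 bits" fitness, the parameter values, the use of single-point crossover as the recombination step, and a tournament that always keeps the fitter individual.

```python
import random

def one_max(bits):
    """Toy fitness: number of 1 bits (assumed problem, for illustration only)."""
    return bits.count("1")

def evolve(n_bits=12, pop_size=20, max_generations=50, p_mutation=0.05, seed=0):
    rng = random.Random(seed)
    population = ["".join(rng.choice("01") for _ in range(n_bits))
                  for _ in range(pop_size)]
    best = max(population, key=one_max)
    for generation in range(max_generations):         # criterion: fixed number of generations
        if one_max(best) == n_bits:                   # criterion: expected result reached
            return best, generation
        offspring = []
        while len(offspring) < pop_size:              # population size stays constant
            # binary tournament that always keeps the fitter individual (p = 1)
            mother = max(rng.sample(population, 2), key=one_max)
            father = max(rng.sample(population, 2), key=one_max)
            cut = rng.randrange(1, n_bits)            # single-point crossover
            child = mother[:cut] + father[cut:]
            if rng.random() < p_mutation:             # occasional single-bit mutation
                i = rng.randrange(n_bits)
                child = child[:i] + ("1" if child[i] == "0" else "0") + child[i + 1:]
            offspring.append(child)
        population = offspring                        # generational replacement
        best = max(population + [best], key=one_max)  # remember the best individual seen
    return best, max_generations

best, generations_run = evolve()
print(best, one_max(best), generations_run)
```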
What are the characteristics of GA?
Multiple candidate solutions produced in parallel, analysis of only parts of the search space, handle highly complex problems with a large space to search, not necessary to have specialized knowledge about the domain to find a function, the fitness function guides the search
What is a Convolutional Neural Network (CNN)?
A type of deep learning algorithm mainly used to process data with a grid-like topology, such as images, for tasks like image recognition and classification
What are the characteristics of CNN?
Automatic feature extraction, translation invariance, pre-trained architectures, and versatility
In CNN, what is Automatic feature Extraction?
CNNs can autonomously learn and extract hierarchical features from data, eliminating the need for manual feature engineering
In CNN, what is translation invariance?
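CNNs can recognize a pattern regardless of where it appears in the input, making them robust to spatial shifts. Robustness to changes in orientation or scale is not built in and usually has to be learned, for example through data augmentation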
In CNNs, what are pre-trained architectures?
Models like InceptionV3 and ResNet50 achieve state-of-the-art results and can be fine-tuned for new tasks with relatively small datasets
In CNNs, what is versatility?
Though commonly used for image classification, also excel in natural language processing, time series analysis, and speech recognition
CNNs consist of what?
Multiple layers such as the input, convolutional, pooling, flattening, fully connected, and output layers
What is the convolutional layer?
Applies filters to the input image to extract features
What is the pooling layer?
Downsamples the feature maps to reduce computation
What is the flattening layer?
Reshapes the data into a 1D vector
What is the fully connected layer?
Processes the learned features
What is the output layer?
Generates the final prediction
What is the input layer?
The layer through which we give input to the model
In the input layer, what are image channels?
Represents the image in a numerical format
For the input layer, what happens to the image channel if it's in color?
A 3D array is used, where each pixel of the image is represented by its values in three different channels (RGB). The three values are blended to form a single color
In the convolutional layers, how are prominent features extracted?
Using filters and kernels
What are filters in CNN?
Small grids (matrices) that slide across (convolve) the input image and compute a spatial correlation between the filter values and local pixel patterns. The result is a new array called feature map
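A minimal pure-Python sketch of a filter sliding over an image to produce a feature map, assuming stride 1 and no padding. The function name `convolve2d`, the tiny image, and the `vertical_edge` kernel are illustrative choices.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # dot product between the kernel and the image patch under it
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(k_rows) for j in range(k_cols)))
        feature_map.append(row)
    return feature_map

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
print(convolve2d(image, vertical_edge))  # [[0, 2, 0], [0, 2, 0]]
```

The feature map responds only in the middle column, where the dark-to-bright edge sits.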
In CNN, what are filters?
Work as a mini magnifying glass that looks for specific patterns in the photo, like lines, curves or shapes
Why is the convolutional layer powerful for CNNs in image classification?
It gives CNNs the ability to detect local patterns, allowing them to learn visual features much more effectively than traditional fully connected networks
What activation function is commonly used for CNNs?
ReLU
When using a kernel, what is done to the receptive field and kernel field?
Dot product
In CNN, what is padding?
Because the kernel can only scan an area that lies within the input, the data points at the border can't be learned properly. To solve this, we can use padding, which adds extra data points with a value of 0 outside the border of the input to increase its size.
In CNN, what is stride?
Controls how many pixels the filter moves across the input each time it slides. With a stride of n, it moves n pixels at a time
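A short sketch of how padding and stride change the size of the feature map. The output-size formula is the standard one for convolutions and is not stated on the cards; the helper names `zero_pad` and `conv_output_size` are assumptions.

```python
def zero_pad(image, padding):
    """Surround a 2D image with `padding` rows and columns of zeros."""
    width = len(image[0]) + 2 * padding
    top = [[0] * width for _ in range(padding)]
    middle = [[0] * padding + list(row) + [0] * padding for row in image]
    bottom = [[0] * width for _ in range(padding)]
    return top + middle + bottom

def conv_output_size(n, kernel, padding=0, stride=1):
    """Output length along one dimension: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

print(zero_pad([[1, 2], [3, 4]], 1))
print(conv_output_size(5, 3))                       # 3: the map shrinks without padding
print(conv_output_size(5, 3, padding=1))            # 5: padding keeps the original size
print(conv_output_size(7, 3, stride=2))             # 3: a larger stride shrinks the map faster
```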
What is the goal of the pooling layer?
Pull the most significant features from the feature maps, making the model less sensitive to small shifts or distortions in the input. Also relevant for mitigating overfitting, as the network will have fewer parameters and activations in the later layers
How is pooling done in CNN?
Applying some aggregation operations which reduce the dimension of the feature map and the memory used while training. There are no weights to learn
What are the common types of pooling layers?
Max, min, sum, and average
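A sketch of pooling as an aggregation over windows of the feature map, assuming non-overlapping square windows. The function name `pool2d` and the sample feature map are illustrative; note that no weights are involved.

```python
def pool2d(feature_map, size=2, op=max):
    """Apply an aggregation `op` over non-overlapping size x size windows."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(op(window))
        pooled.append(row)
    return pooled

def average(window):
    return sum(window) / len(window)

fm = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 8, 6],
]
print(pool2d(fm, op=max))      # [[6, 2], [2, 8]]
print(pool2d(fm, op=min))      # [[1, 0], [0, 5]]
print(pool2d(fm, op=sum))      # [[14, 4], [4, 26]]
print(pool2d(fm, op=average))  # [[3.5, 1.0], [1.0, 6.5]]
```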
Why is the flattening layer used?
Makes the output of the CNN layers compatible with an ANN, which expects a vector as input. It reshapes the 2D or 3D array into a 1D vector
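Flattening can be sketched with nested lists standing in for feature maps; the recursive helper `flatten` is an assumption made for the example.

```python
def flatten(array):
    """Recursively unroll nested lists (2D or 3D) into a single 1D list."""
    if not isinstance(array, list):
        return [array]
    return [value for item in array for value in flatten(item)]

print(flatten([[1, 2], [3, 4]]))                      # [1, 2, 3, 4]
print(flatten([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```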
What is the fully connected dense layer?
These layers receive the one-dimensional vector generated by the flattening layer. The main purpose of the fully connected layer is to learn high-level, abstract patterns by considering the entire input.
In CNN, what happens in a dense layer?
Every neuron is connected to every neuron in the previous layer, and the weights are trainable. ReLU is commonly used as the activation function
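A minimal sketch of one dense layer with ReLU, assuming plain Python lists for the weights. The names `relu` and `dense` and the example numbers are illustrative only.

```python
def relu(z):
    """Rectified linear unit: pass positive values, clamp negatives to zero."""
    return max(0.0, z)

def dense(inputs, weights, biases, activation=relu):
    """Each output neuron sees every input: activation(w . x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

inputs = [1.0, 2.0]
weights = [[1.0, 1.0],     # weights of neuron 0, one per input
           [-1.0, -1.0]]   # weights of neuron 1
biases = [0.0, 0.0]
print(dense(inputs, weights, biases))  # [3.0, 0.0]
```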
In CNN, what is output (dense) layer used for?
Responsible for prediction, often a dense layer with activation functions
In CNN, in the output layer what activation function is used for classification?
Softmax for multiclass classification or sigmoid for binary classification
In CNN, in the output layer what activation function is used for regression?
Linear activation function
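The three output activations mentioned above can be sketched as follows; the function names and the example scores are assumptions for illustration.

```python
import math

def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1 (multiclass)."""
    shift = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Squash one score into (0, 1) (binary classification)."""
    return 1.0 / (1.0 + math.exp(-z))

def linear(z):
    """Identity activation (regression): the score is the prediction."""
    return z

probs = softmax([2.0, 1.0, 0.1])
print(probs.index(max(probs)))   # 0: the class with the largest score wins
print(sigmoid(0.0))              # 0.5
print(linear(3.2))               # 3.2
```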
In CNN, what should match in the output dense layer?
The number of output neurons should match the number of classes or be 1 (for binary classification or regression)
What are the characteristics of ANN Multilayer neural networks?
Multilayer neural networks with at least one hidden layer are universal approximators, meaning they can approximate any target function. However, because they are so expressive, choosing the right network topology is crucial to prevent overfitting.
What are the characteristics of ANN Multilayer redundant features?
Can handle redundant features, as their weights are automatically learned during training so redundant features have smaller weights
What are the characteristics of ANN Multilayer noise?
Neural networks are sensitive to noise in training data. A common way to address this is by using a validation set to estimate generalization error
What are the characteristics of ANN Multilayer gradient descent?
Gradient descent, used for optimizing ANN weights, often converges to a local optimum
What are the characteristics of ANN Multilayer training?
Training ANNs is computationally expensive, especially with many hidden nodes, but once trained, they classify new examples efficiently
Analogy for artificial neural networks
Built out of a densely interconnected set of simple units (neurons) where each unit takes a number of inputs (including the outputs of other units) and produces a single output (which may become the input to many units)
ANNs are what?
Core of deep learning, versatile, powerful, and scalable
What tasks are ANN good for?
Classifying billions of images, summarizing large amounts of text, powering speech recognition services, recommending the best videos to watch to hundreds of millions of users each, learning to beat champions in games like Go
In ANN, what are instances represented by?
Many attribute-value pairs
In ANN, what can the target function output?
Discrete-valued, real-valued, or a vector of several real-valued or discrete-valued attributes
In ANN, will the training examples contain errors?
Maybe
In ANN, are long training times acceptable?
Yes
In ANN, the learned target function requires what?
Fast evaluation
In ANN, what is not a priority?
Human understanding of the learned target function is not a priority (no transparency is required)
When ANNs were first proposed, what did they present?
A simplified computational model of how biological neurons might work together in animal brains to perform complex computations using propositional logic.
In ANN, what is an artificial neuron?
Has 1 or more binary inputs and 1 binary output
For an artificial neuron, when does it activate?
Activates its output when more than a certain number of inputs are active
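A sketch of such a binary neuron and how it can compute simple logic. The strict "more than" threshold follows the wording of the card; the names `neuron`, `logical_and`, and `logical_or` are assumptions.

```python
def neuron(inputs, threshold):
    """Binary neuron: output 1 when more than `threshold` inputs are active."""
    return 1 if sum(inputs) > threshold else 0

def logical_and(a, b):
    return neuron([a, b], threshold=1)   # needs both inputs active

def logical_or(a, b):
    return neuron([a, b], threshold=0)   # needs at least one input active

for a in (0, 1):
    for b in (0, 1):
        print(a, b, logical_and(a, b), logical_or(a, b))
```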
In ANN, is it possible to have two or more layers?
Yes, considering that the first layer is not a processing (active) layer but only an input (passthrough) layer
In ANN, what is it called if there are two layers?
Perceptron
In ANN, what is it called if there is at least 1 hidden layer private to the network?
Multi-layer perceptron
In ANN, what does deep learning mean?
At least 2 hidden layers, but usually many more
Difference between partially connected network and fully connected network?
In a fully connected network, every neuron in a layer is connected to every neuron in the previous layer. In a partially connected network, some of these connections are absent
What is a feedforward network?
A network in which signals flow strictly from input to output, with no cycles
What is a recurrent feedback network?
A network with feedback loops, where neuron outputs are fed back into earlier layers
When designing the architecture of a neural network, what does one need to define?
The number of layers, the number of neurons in each layer, and how the neurons are connected. These decisions should be made before training and depend on the problem solved
When designing a neural network, what should you do for non-linearly separable problems?
The networks should have at least 3 layers: input, hidden, output layers, forming a Multi-Layer Perceptron
In ANN, in the input layer, what is common for each continuous attribute and for each value of a categorical attribute?
Use 1 neuron for each
What is a perceptron?
One of the simplest ANN architectures. It is based on the threshold logic unit (TLU), with inputs x and output y. The body of the perceptron is divided into two parts. The first part sums the products of the inputs x and their corresponding weights w. The second part applies the activation function f, which controls the value sent through y.

In the perceptron, what is each input connection associated with?
A weight. wi represents a real-valued constant that determines the contribution of input xi to the perceptron output
Other than the inputs, what else does a perceptron include?
A bias
What are the common activation functions used in perceptron?
The Heaviside step function and the sign (sgn) function
For the perceptron, what happens when you use the sgn activation function?
Perceptron outputs a 1 for instances lying on one side of the hyperplane and outputs -1 for instances lying on the other side
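A minimal sketch of the perceptron's two parts (weighted sum, then activation). The weights, bias, and function names are chosen for illustration, and the treatment of a sum of exactly zero is an assumption.

```python
def sgn(z):
    """Sign activation: 1 on one side of the hyperplane, -1 on the other (0 maps to -1 here)."""
    return 1 if z > 0 else -1

def heaviside(z):
    """Heaviside step activation: 1 for z >= 0, otherwise 0."""
    return 1 if z >= 0 else 0

def perceptron_output(x, w, bias, activation=sgn):
    z = sum(wi * xi for wi, xi in zip(w, x)) + bias   # part 1: weighted sum plus bias
    return activation(z)                              # part 2: activation function

w, bias = [1.0, 1.0], -1.5        # hyperplane x1 + x2 = 1.5
print(perceptron_output([1, 1], w, bias))             # 1
print(perceptron_output([0, 1], w, bias))             # -1
print(perceptron_output([1, 1], w, bias, heaviside))  # 1
print(perceptron_output([0, 1], w, bias, heaviside))  # 0
```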
During perceptron training, what happens if it misclassifies?
The perceptron weights are modified to reinforce the connections that help reduce the error
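One common way to express this update is the perceptron training rule, w_i <- w_i + eta * (target - output) * x_i. The sketch below assumes that rule, a sgn activation with targets of 1 and -1, and the logical AND function as a toy, linearly separable dataset; the learning rate `eta` and the function names are illustrative.

```python
def sgn(z):
    return 1 if z > 0 else -1

def predict(x, w, bias):
    return sgn(sum(wi * xi for wi, xi in zip(w, x)) + bias)

def train_perceptron(examples, eta=0.5, max_epochs=100):
    """Adjust weights only on misclassified examples until an epoch has no errors."""
    w, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in examples:
            output = predict(x, w, bias)
            if output != target:                      # misclassified: update the weights
                errors += 1
                step = eta * (target - output)
                w = [wi + step * xi for wi, xi in zip(w, x)]
                bias += step
        if errors == 0:                               # every example classified correctly
            break
    return w, bias

AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, bias = train_perceptron(AND)
print([predict(x, w, bias) for x, _ in AND])  # [-1, -1, -1, 1]
```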