Artificial Intelligence (AI) – any technique enabling computers to mimic human behaviour.
Machine Learning (ML) – algorithms that let computers “learn” patterns without being explicitly programmed.
Deep Learning (DL) – subset of ML that extracts hierarchical patterns directly from raw data using neural networks.
Replaces hand-engineered features (time-consuming, non-scalable) with learned representations.
Three drivers of the current DL boom:
Big Data (abundant labelled & unlabelled data sets).
Hardware (GPUs/TPUs for massive parallelism).
Software (open-source libraries, e.g. TensorFlow & PyTorch).
TensorFlow – Google-backed; graph-based execution (eager by default since TF 2.x) with the tf.keras high-level API.
PyTorch – Dynamic computation graphs, pythonic, research-friendly.
Early work (Rosenblatt) viewed the brain as:
Mosaic sensory points → Projection areas → Association units → Response units.
Perceptron – mathematical abstraction of a neuron.
Takes inputs x_1,\dots,x_m with weights w_1,\dots,w_m and bias w_0.
Computes a weighted sum z = w_0 + \sum_{j=1}^{m} w_j x_j.
Applies non-linear activation \hat y = g(z).
Compact vector form: \hat y = g(w_0 + \mathbf{x}^T \mathbf{w}) where \mathbf{x} \in \mathbb{R}^m, \mathbf{w}\in\mathbb{R}^m.
Common scalar example: \hat y = g(1 + 3x_1 - 2x_2) defines a decision line in 2-D.
Multi-output version stacks neurons: z_i = w_{0,i} + \sum_{j=1}^{m} x_j w_{j,i}, then y_i = g(z_i) for each output i.
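A minimal NumPy sketch of this forward pass (the input, weight, and bias values below are made-up illustrative numbers, not values from the notes):

    import numpy as np

    def perceptron(x, w, w0, g=lambda z: 1.0 / (1.0 + np.exp(-z))):
        # Weighted sum plus bias, then a non-linear activation g (sigmoid by default)
        z = w0 + x @ w                     # z = w_0 + x^T w
        return g(z)

    x = np.array([1.0, -2.0])              # inputs x_1, x_2
    w = np.array([3.0, -2.0])              # weights w_1, w_2 from the scalar example above
    w0 = 1.0                               # bias w_0
    print(perceptron(x, w, w0))            # sigmoid(1 + 3*1 - 2*(-2)) = sigmoid(8) ≈ 0.9997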
Sigmoid g(z)=\frac{1}{1+e^{-z}}, derivative g'(z)=g(z)(1-g(z)).
Hyperbolic Tangent g(z)=\tanh(z), derivative g'(z)=1-\tanh^2(z).
ReLU g(z)=\max(0,z), derivative g'(z)=0\; (z\le 0),\;1\; (z>0).
All introduce non-linearity, allowing networks to approximate complex functions; without them networks collapse into linear models.
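A quick NumPy check of that last point, assuming arbitrary random weights: two stacked linear layers with no activation in between collapse into a single linear map.

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(4, 3))           # first "layer" weights
    W2 = rng.normal(size=(3, 2))           # second "layer" weights
    x = rng.normal(size=4)

    stacked = (x @ W1) @ W2                # two linear layers, no non-linearity between them
    single = x @ (W1 @ W2)                 # one equivalent linear layer
    print(np.allclose(stacked, single))    # True: the stack is still a linear model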
Architecture example: 2 inputs + bias → 2 hidden neurons → 1 output neuron.
Forward pass:
Hidden: z^{(1)}_k = w^{(1)}_{0k}+\sum_j w^{(1)}_{jk}x_j, \; a^{(1)}_k=g(z^{(1)}_k).
Output: z^{(2)} = w^{(2)}_{0}+\sum_k w^{(2)}_{k}a^{(1)}_k, \; \hat y=g_{out}(z^{(2)}).
Common output activations: Sigmoid for binary, Softmax for multi-class.
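A hand-rolled NumPy forward pass for the 2-input → 2-hidden → 1-output architecture above; every weight and bias value is an illustrative placeholder.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.0])                  # two inputs
    W1 = np.array([[0.1, 0.4],
                   [-0.3, 0.2]])               # hidden weights w^{(1)}_{jk}, shape (inputs, hidden)
    b1 = np.array([0.0, 0.1])                  # hidden biases w^{(1)}_{0k}
    W2 = np.array([0.7, -0.5])                 # output weights w^{(2)}_k
    b2 = 0.2                                   # output bias w^{(2)}_0

    a1 = sigmoid(x @ W1 + b1)                  # hidden activations a^{(1)}_k
    y_hat = sigmoid(a1 @ W2 + b2)              # sigmoid output for a binary decision
    print(y_hat)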
Slope for linear y = mx + b is constant m.
Derivative of nonlinear f(x)=x^2 via limit: f'(x)=\lim_{\Delta x \to 0}\frac{(x+\Delta x)^2 - x^2}{\Delta x}=2x.
Partial derivatives for multivariate f(x,y)=x^3+y^2: \frac{\partial f}{\partial x}=3x^2, \frac{\partial f}{\partial y}=2y.
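A finite-difference sanity check of those partial derivatives (the evaluation point and step size h are arbitrary choices):

    def f(x, y):
        return x**3 + y**2

    x, y, h = 2.0, 3.0, 1e-6
    df_dx = (f(x + h, y) - f(x, y)) / h        # ≈ 3x^2 = 12
    df_dy = (f(x, y + h) - f(x, y)) / h        # ≈ 2y = 6
    print(df_dx, df_dy)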
Mean Squared Error (MSE): \text{MSE}=\frac{1}{n}\sum_{i=1}^{n} (y_i-\hat y_i)^2.
Mean Absolute Error (MAE): \text{MAE}=\frac{1}{n}\sum_{i=1}^{n} |y_i-\hat y_i|.
Binary Cross-Entropy (Log-Loss): L = -\frac{1}{n}\sum_{i=1}^{n}\big[y_i\log(\hat y_i)+(1-y_i)\log(1-\hat y_i)\big].
TensorFlow/Keras compile examples:
loss='mean_squared_error', 'mean_absolute_error', 'binary_crossentropy', 'sparse_categorical_crossentropy'.
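A sketch of how those loss strings are used, assuming a small stand-in tf.keras model (the optimiser choice and layer sizes are assumptions, not from the notes):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(2,)),
        tf.keras.layers.Dense(2, activation='sigmoid'),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])

    # Pick the loss to match the task:
    model.compile(optimizer='adam', loss='binary_crossentropy')               # binary classification
    # model.compile(optimizer='adam', loss='mean_squared_error')              # regression (MSE)
    # model.compile(optimizer='adam', loss='mean_absolute_error')             # regression (MAE)
    # model.compile(optimizer='adam', loss='sparse_categorical_crossentropy') # multi-class, integer labels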
Generic rule: w \leftarrow w - \eta \frac{\partial L}{\partial w}, b \leftarrow b - \eta \frac{\partial L}{\partial b} where \eta = learning rate.
For a logistic-style model \hat y=\sigma(z) with z=\mathbf{w}^T\mathbf{x}+b:
\frac{\partial L}{\partial w_j}=\frac{1}{n}\sum_{i=1}^{n} x_{ij}(\hat y_i-y_i).
\frac{\partial L}{\partial b}=\frac{1}{n}\sum_{i=1}^{n}(\hat y_i-y_i).
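A minimal NumPy gradient-descent loop wiring those two gradients into the update rule; the toy data, learning rate, and iteration count are assumptions for illustration.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                 # n = 100 samples, m = 2 features
    y = (X[:, 0] + X[:, 1] > 0).astype(float)     # toy binary labels

    w, b, eta = np.zeros(2), 0.0, 0.1
    for _ in range(1000):
        y_hat = sigmoid(X @ w + b)
        grad_w = X.T @ (y_hat - y) / len(y)       # (1/n) Σ x_ij (ŷ_i - y_i)
        grad_b = np.mean(y_hat - y)               # (1/n) Σ (ŷ_i - y_i)
        w -= eta * grad_w                         # w ← w - η ∂L/∂w
        b -= eta * grad_b                         # b ← b - η ∂L/∂b
    print(w, b)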
Digital image = 3-D tensor H \times W \times C (e.g. 1080 \times 1080 \times 3 RGB).
Pixel intensities range [0,255] (often normalised to [0,1]).
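For example, normalising an 8-bit RGB image tensor to [0, 1] (the random array stands in for real image data):

    import numpy as np

    image = np.random.randint(0, 256, size=(1080, 1080, 3), dtype=np.uint8)  # H x W x C
    normalised = image.astype(np.float32) / 255.0                            # intensities now in [0, 1]
    print(image.shape, float(normalised.min()), float(normalised.max()))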
Classification – assign label; model may output class probabilities.
Regression – predict continuous value (e.g. steering angle).
Object Detection – locate & label bounding boxes.
Semantic Segmentation – per-pixel classification.
Dense layers on a 1920\times1080\times3 input would mean \sim 6.2 million input values; fully connecting them to even a modest layer requires billions of weights → impractical.
Dense networks treat distant pixels the same as neighbours and are sensitive to object translation.
Convolution layer: learn filters (kernels) shared spatially.
Example: 4\times4 filter (16 weights) slides with stride to generate feature map.
Operation: element-wise multiply + sum.
Parameter sharing provides sparsity & translation equivariance.
Non-linearity (ReLU) follows each convolution.
Pooling layer (e.g. max pool 2\times2, stride 2):
Downsamples, reduces computation, introduces spatial invariance, lowers overfitting.
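A tiny NumPy illustration of 2×2 max pooling with stride 2 on a made-up 4×4 feature map:

    import numpy as np

    feature_map = np.array([[1, 3, 2, 0],
                            [4, 2, 1, 5],
                            [0, 1, 7, 2],
                            [3, 2, 4, 6]])

    # Split into non-overlapping 2x2 windows and keep the strongest response in each
    pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
    print(pooled)        # [[4 5]
                         #  [3 7]]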
Feature Hierarchy
Early layers → edges & corners.
Middle layers → motifs (eyes, wheels).
Deep layers → object parts / high-level semantics.
Complete pipeline: CONV → ReLU → POOL repeated → flatten → fully connected → Softmax.
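A hedged tf.keras sketch of that pipeline; the 28×28 greyscale input shape, filter counts, kernel sizes, and 10-class output are illustrative assumptions, not values from the notes.

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),                 # assumed greyscale input
        tf.keras.layers.Conv2D(16, 3, activation='relu'),  # CONV + ReLU
        tf.keras.layers.MaxPooling2D(2),                   # POOL (2x2, stride 2)
        tf.keras.layers.Conv2D(32, 3, activation='relu'),  # CONV + ReLU
        tf.keras.layers.MaxPooling2D(2),                   # POOL
        tf.keras.layers.Flatten(),                         # flatten
        tf.keras.layers.Dense(64, activation='relu'),      # fully connected
        tf.keras.layers.Dense(10, activation='softmax'),   # Softmax over assumed 10 classes
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.summary()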
Vertical line detector \begin{bmatrix}-1 & 1 & -1\\ -1 & 1 & -1\\ -1 & 1 & -1\end{bmatrix}.
Diagonal detector \begin{bmatrix}-1 & -1 & 1\\ -1 & 1 & -1\\ 1 & -1 & -1\end{bmatrix}.
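Sliding the vertical-line detector over a toy 5×5 image (element-wise multiply + sum at each position); the image itself is made up.

    import numpy as np

    kernel = np.array([[-1, 1, -1],
                       [-1, 1, -1],
                       [-1, 1, -1]])              # vertical line detector from above

    image = np.zeros((5, 5))
    image[:, 2] = 1                               # a vertical line down the middle column

    out_h = image.shape[0] - kernel.shape[0] + 1  # valid (no padding) output height
    out_w = image.shape[1] - kernel.shape[1] + 1
    feature_map = np.array([[np.sum(image[i:i+3, j:j+3] * kernel) for j in range(out_w)]
                            for i in range(out_h)])
    print(feature_map)  # strongest (positive) responses where the line aligns with the kernel centre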
Loopy pattern filter demonstrated for digit 9 recognition.
Combining detected sub-parts (eyes, nose, ears) → head → body → Koala classifier.
ReLU zeros out negative filter responses, producing sparse feature maps.
Max-pool keeps the strongest response even when the pattern shifts position (shift invariance).
Fully Convolutional Networks (FCN) – all-conv; downsample then upsample using deconvolution (Conv2DTranspose) to output pixel-wise predictions (segmentation).
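A small tf.keras sketch of that upsampling step: Conv2DTranspose with stride 2 doubles the spatial resolution of a feature map (the tensor shape and filter count are illustrative).

    import tensorflow as tf

    feature_map = tf.random.normal((1, 8, 8, 64))          # batch x H x W x channels from the downsampling path
    upsample = tf.keras.layers.Conv2DTranspose(filters=32, kernel_size=3, strides=2, padding='same')
    print(upsample(feature_map).shape)                      # (1, 16, 16, 32): spatial size doubled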
R-CNN (Regions with CNN features)
1) Generate ~2k region proposals (Selective Search).
2) Warp each region and feed it into a CNN; 3) classify each region.
Slow & brittle (hand-crafted proposals).
Faster R-CNN
End-to-end network; backbone conv extracts feature map once.
Region Proposal Network (RPN) predicts bounding boxes + objectness.
ROI Pooling aligns proposals → shared classifier head.
Learned proposals, orders-of-magnitude speed improvement.
Convolution: sparse connectivity, weight sharing ⇒ fewer parameters, reduced overfitting.
ReLU: non-linearity, simple derivative, accelerates convergence.
Pooling: dimensionality reduction, computational savings, tolerance to small distortions.
Scalability: DL leverages data/hardware; but large models consume energy.
Interpretability: learned features outperform manual yet can be opaque.
Fairness: biases in big data can propagate through learned representations.
TensorFlow activation calls: tf.math.sigmoid(z), tf.math.tanh(z), tf.nn.relu(z).
PyTorch equivalents: torch.sigmoid(z), torch.tanh(z), torch.nn.ReLU().
Keras model compilation pairs an optimiser with a loss function appropriate to the task (see the compile examples above).
Neural networks map inputs → outputs via layers of linear transforms + non-linearities.
Training minimises loss via gradient descent; derivatives underpin updates.
CNNs specialise in vision: local receptive fields, shared filters, pooling.
Modern detection/segmentation models integrate learnable proposal or upsampling stages.
Toolchains (TensorFlow, PyTorch) abstract low-level math, letting practitioners focus on architecture & data.