Machine Learning Algorithms – Comprehensive Bullet-Point Notes
Introduction to Artificial Intelligence, Machine Learning & Deep Learning
- Artificial Intelligence (AI)
- Any computational technique that enables computers to mimic human behavior or cognition.
- Machine Learning (ML)
- Sub-field of AI that provides systems with the ability to learn from data without being explicitly programmed.
- Learns mapping \mathbf x \;\rightarrow\; y by minimizing a cost function over examples.
- Deep Learning (DL)
- Subset of ML that extracts hierarchical patterns directly from raw data with neural networks.
- Replaces labor-intensive hand-engineered features with learned representations.
- Why the boom is happening now
- Big Data: unprecedented volumes & variety.
- Hardware: GPUs/TPUs enable massive parallelism.
- Software: high-level, open-source libraries (e.g., TensorFlow, PyTorch) drastically lower the entry barrier.
- TensorFlow & PyTorch are explicitly highlighted on the slide.
- Provide auto-differentiation, optimized kernels, and deployment tooling.
- Code snippets referenced (see the sketch below):
- tf.math.sigmoid(z), torch.sigmoid(z)
- tf.nn.relu(z), torch.nn.ReLU()
- model.compile(optimizer='adam', loss='…', metrics=['accuracy'])
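A minimal runnable sketch of these calls (assuming TensorFlow is installed; the tiny model shape and sample tensor are illustrative assumptions, not from the slides):

```python
import tensorflow as tf

# Element-wise activations referenced above
z = tf.constant([-2.0, 0.0, 3.0])
print(tf.math.sigmoid(z).numpy())   # squashes each value into (0, 1)
print(tf.nn.relu(z).numpy())        # zeroes out negative values

# Tiny Keras model compiled as in the slide snippet
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(4, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```

The PyTorch equivalents (torch.sigmoid(z), torch.nn.ReLU()) apply the same element-wise functions.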
Biological Inspiration & The Perceptron
- FIG. 1: activity map of biological brain (topographic & random connections, feedback loops).
- Perceptron (FIG. 2)
- Simplified computational model of a biological neuron.
- Components:
- Inputs x_1,\dots,x_m
- Weights w_1,\dots,w_m & bias w_0
- Linear combination z = w_0 + \sum_{i=1}^{m} x_i w_i
- Non-linear activation \hat y = g(z)
- Foundation for multilayer perceptron (MLP) / feed-forward neural networks.
Forward Propagation in a Perceptron
- Without bias: \hat y = g\big( \sum_i x_i w_i \big)
- With bias (more general): \hat y = g\big( w_0 + \mathbf x^\top \mathbf w \big)
- Activation options (sigmoid example)
- g(z)=\frac1{1+e^{-z}} (smoothly squashes to 0–1)
- Multi-output extension
- For each output neuron j (see the NumPy sketch below):
z_j = w_{0j}+\sum_{i=1}^{m} x_i w_{ij}, \quad y_j = g(z_j)
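A minimal NumPy sketch of forward propagation for a single perceptron and for the multi-output extension (all weights and inputs below are made-up illustrative values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Single perceptron: y_hat = g(w0 + x . w)
x  = np.array([0.5, -1.0, 2.0])    # m = 3 inputs (illustrative)
w  = np.array([0.1,  0.4, -0.3])   # weights w_1..w_m
w0 = 0.2                           # bias
y_hat = sigmoid(w0 + x @ w)

# Multi-output layer: z_j = w_{0j} + sum_i x_i w_{ij}, y_j = g(z_j)
W = np.array([[0.1, -0.2],
              [0.4,  0.3],
              [-0.3, 0.5]])        # shape (m, n_outputs)
b = np.array([0.2, -0.1])          # one bias per output neuron
y_vec = sigmoid(b + x @ W)
print(y_hat, y_vec)
```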
Common Activation Functions (all non-linear)
- Sigmoid: g(z)=\frac1{1+e^{-z}}, \quad g'(z)=g(z)(1-g(z))
- Hyperbolic Tangent: g(z)=\tanh(z), \quad g'(z)=1-g(z)^2
- ReLU: g(z)=\max(0,z), \quad g'(z)=\begin{cases}1 & z>0\\0 & z\le0\end{cases}
- Importance
- Introduce non-linearity → network can approximate arbitrary functions (universal approximation theorem).
- Help mitigate vanishing gradients (ReLU better than sigmoid/tanh in deep nets).
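A short NumPy sketch evaluating the three activations and their derivatives at a few arbitrary sample points:

```python
import numpy as np

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

sig   = 1.0 / (1.0 + np.exp(-z))
dsig  = sig * (1.0 - sig)          # g'(z) = g(z)(1 - g(z))

tanh  = np.tanh(z)
dtanh = 1.0 - tanh ** 2            # g'(z) = 1 - g(z)^2

relu  = np.maximum(0.0, z)
drelu = (z > 0).astype(float)      # 1 for z > 0, else 0

print(sig, dsig, tanh, dtanh, relu, drelu, sep="\n")
```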
Worked Perceptron Example (2-D Classification)
- Parameters
- w_0=1, \; w_1 = 3, \; w_2 = -2
- Decision function: \hat y = g(1 + 3x_1 - 2x_2)
- Decision boundary (z=0): 1 + 3x_1 - 2x_2 = 0 (a straight line in \mathbb R^2).
- Sample input \mathbf x=[-1,\;2]^\top
- z = 1 + 3(-1) - 2(2) = -6
- \hat y = g(-6) \approx 0.002 → classified as negative (<0.5).
- Region interpretation:
- z>0 \;\Rightarrow\; \hat y>0.5 (positive class)
- z<0 \;\Rightarrow\; \hat y<0.5 (negative class)
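The arithmetic of this worked example can be checked directly (a quick sketch using exactly the weights above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w0, w1, w2 = 1.0, 3.0, -2.0
x1, x2 = -1.0, 2.0

z = w0 + w1 * x1 + w2 * x2   # 1 + 3(-1) - 2(2) = -6
y_hat = sigmoid(z)            # ≈ 0.002 -> negative class (< 0.5)
print(z, y_hat)
```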
Single-Layer (One-Hidden-Layer) Neural Network (MLP)
- Layers
- Input: \mathbf x \in \mathbb R^m
- Hidden: \mathbf z^{(1)} = W^{(1)} \mathbf x + \mathbf b^{(1)}, \; \mathbf a^{(1)} = g(\mathbf z^{(1)})
- Output: \mathbf z^{(2)} = W^{(2)} \mathbf a^{(1)} + \mathbf b^{(2)}, \; \hat{\mathbf y} = g_{out}(\mathbf z^{(2)})
- Example indexing: z_2 = w_{0,2}+\sum_{j=1}^{m} x_j w_{j,2}^{(1)}
- Forward pass called “forward propagation.”
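A minimal NumPy forward pass through one hidden layer (layer sizes, random weights, and the use of sigmoid for both layers are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
x = np.array([0.5, -1.0, 2.0])        # input, m = 3

# Hidden layer: z1 = W1 x + b1, a1 = g(z1)
W1 = np.random.randn(4, 3) * 0.1      # 4 hidden units
b1 = np.zeros(4)
a1 = sigmoid(W1 @ x + b1)

# Output layer: z2 = W2 a1 + b2, y_hat = g_out(z2)
W2 = np.random.randn(1, 4) * 0.1
b2 = np.zeros(1)
y_hat = sigmoid(W2 @ a1 + b2)
print(y_hat)
```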
Calculus Refresher – Derivatives & Gradients
- Scalar function slope vs derivative
- For f(x)=x^2:
- f(x+\Delta x)= (x+\Delta x)^2 = x^2+2x\Delta x + (\Delta x)^2
- Slope =\dfrac{f(x+\Delta x)-f(x)}{\Delta x}=2x+\Delta x
- As \Delta x \to 0 → derivative f'(x)=2x.
- Partial derivatives (multivariate)
- f(x,y)=x^3+y^2
- \tfrac{\partial f}{\partial x}=3x^2, \; \tfrac{\partial f}{\partial y}=2y.
- Gradient \nabla f=(\partial f/\partial x, \partial f/\partial y,\dots) guides optimization.
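A quick finite-difference check of these partial derivatives (the evaluation point and step size are arbitrary):

```python
def f(x, y):
    return x ** 3 + y ** 2

x0, y0, h = 2.0, 3.0, 1e-6

# Finite-difference approximations of the partial derivatives
df_dx = (f(x0 + h, y0) - f(x0, y0)) / h   # ≈ 3 * x0**2 = 12
df_dy = (f(x0, y0 + h) - f(x0, y0)) / h   # ≈ 2 * y0    = 6
print(df_dx, df_dy)
```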
Loss / Cost Functions
- Mean Squared Error (MSE) for a single output
- Individual loss: L=\tfrac12 (y-\hat y)^2
- Dataset: \text{MSE}=\tfrac1n\sum_{i=1}^n (y_i-\hat y_i)^2.
- Mean Absolute Error (MAE)
- \text{MAE}=\tfrac1n\sum_{i=1}^n |y_i-\hat y_i|.
- Binary Cross-Entropy / Log-Loss
- \text{BCE}= -\tfrac1n\sum_{i=1}^n \big[y_i\log(\hat y_i)+(1-y_i)\log(1-\hat y_i)\big].
- Categorical Cross-Entropy, Sparse Categorical Cross-Entropy for multi-class tasks.
- TensorFlow/Keras compile examples: loss='mean_absolute_error' | 'mean_squared_error' | 'binary_crossentropy'.
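A small NumPy sketch computing MSE, MAE, and binary cross-entropy on made-up labels and predictions (the clipping constant eps is an assumption to avoid log(0)):

```python
import numpy as np

y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.9, 0.2, 0.6, 0.4])   # illustrative model outputs

mse = np.mean((y_true - y_pred) ** 2)
mae = np.mean(np.abs(y_true - y_pred))

eps = 1e-12                                # avoid log(0)
p   = np.clip(y_pred, eps, 1 - eps)
bce = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

print(mse, mae, bce)
```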
Logistic Regression Example – Predicting Insurance Purchase
- Features: Age (x_1), Affordability (x_2, a 0/1 indicator of whether the person can afford insurance).
- Model: y = w_1 x_1 + w_2 x_2 + b; probability z = \sigma(y) = \tfrac{1}{1+e^{-y}}.
- Example chosen weights: w_1=1, w_2=1, b=0 → for (x_1=22, x_2=1), y=23 ⇒ z = \sigma(23) \approx 0.99 (very likely to buy).
- Squared error for this instance: the true label for this sample is 0, so (0-0.99)^2=0.9801, a huge error that motivates learning (see the sketch below).
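The numbers in this example can be reproduced directly (a sketch using the stated weights; note that sigmoid(23) is essentially 1, which the slide rounds to 0.99):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w1, w2, b = 1.0, 1.0, 0.0
x1, x2 = 22.0, 1.0                 # age, affordability
y_true = 0.0                       # this person did not buy insurance

y_lin  = w1 * x1 + w2 * x2 + b     # 23
y_hat  = sigmoid(y_lin)            # ≈ 1.0 (slide rounds to 0.99)
sq_err = (y_true - y_hat) ** 2     # ≈ 1.0; with the rounded 0.99 the slide gets 0.9801
print(y_lin, y_hat, sq_err)
```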
Gradient Descent Optimization
- Goal: minimize cost J(w_1,w_2,b).
- Parameter updates:
- w_1 \leftarrow w_1 - \eta \; \tfrac{\partial J}{\partial w_1}
- w_2 \leftarrow w_2 - \eta \; \tfrac{\partial J}{\partial w_2}
- b \leftarrow b - \eta \; \tfrac{\partial J}{\partial b}
- Visual intuition: descending the cost surface toward a global (or local) minimum (illustrated on the final slide; see the sketch below).
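A minimal gradient-descent sketch for the logistic-regression setting above, using the squared-error cost from earlier (the learning rate, epoch count, and tiny dataset are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny illustrative dataset: (age, affordability) -> bought insurance
X = np.array([[22.0, 1.0], [25.0, 0.0], [47.0, 1.0], [52.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w = np.zeros(2)        # w1, w2
b = 0.0
eta = 0.01             # learning rate (assumed)

for epoch in range(5000):
    y_hat = sigmoid(X @ w + b)
    # Gradients of the cost J = mean((y_hat - y)^2) via the chain rule
    grad_z = 2 * (y_hat - y) * y_hat * (1 - y_hat)
    dJ_dw = X.T @ grad_z / len(y)
    dJ_db = grad_z.mean()
    # Parameter updates: w <- w - eta * dJ/dw, b <- b - eta * dJ/db
    w -= eta * dJ_dw
    b -= eta * dJ_db

print(w, b, sigmoid(X @ w + b))
```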
Ethical, Practical, & Real-World Notes
- Eliminating hand-engineered features increases scalability but reduces interpretability.
- Availability of big data & powerful hardware democratizes AI, but raises privacy & energy-consumption concerns.
- Loss-function choice impacts robustness (e.g., MAE is less sensitive to outliers than MSE).
- Activation-function choice affects training stability (ReLU helps mitigate vanishing gradients but is susceptible to “dying ReLUs”).