Notes on Optimization and Machine Learning Concepts
Optimization
- Definition: Mathematical optimization (or mathematical programming) is the process of selecting a best element from a set of alternatives, according to a specific criterion.
Shopping Problem Example
- Scenario: Shop with a limited budget while managing calorie and protein intake.
- Goals:
- Minimize total cost.
- Keep calorie intake below a given cap and protein intake above a given minimum (see the formulation sketch after this list).
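As a sketch, the scenario can be written as a small linear program. The item quantities x_i, costs c_i, calories a_i, proteins p_i, budget B, calorie cap C, and protein minimum P are all illustrative names, not from the notes:

```latex
\begin{aligned}
\min_{x \ge 0} \quad & \textstyle\sum_i c_i x_i && \text{(total cost)} \\
\text{s.t.} \quad    & \textstyle\sum_i c_i x_i \le B && \text{(budget limit)} \\
                     & \textstyle\sum_i a_i x_i \le C && \text{(calorie cap)} \\
                     & \textstyle\sum_i p_i x_i \ge P && \text{(protein minimum)}
\end{aligned}
```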
Key Components of Optimization
- Objective: The goal of the optimization problem, which can either be maximized or minimized.
- Constraints: Limits or requirements that must be adhered to in the decision-making process.
- Decision Variables: Variables that you can control in order to optimize the objective function.
Mathematical Formulation of Optimization
- Types of Mathematical Problems:
- Linear Programming (LP), solved in the code sketch after this list:
- Objective function: min (or max) f(x)
- Subject to constraints such as:
- x1 + x2 ≤ 7
- 2x1 - x2 ≥ 4
- Variables: x1, x2 ≥ 0
- Other Types:
- Non-Linear Programming (NLP)
- Mixed Integer Linear Programming (MILP)
- Mixed Integer Non-Linear Programming (MINLP)
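A quick sketch solving the sample LP above with SciPy. The notes leave f(x) unspecified, so an illustrative objective f(x) = x1 + 2*x2 is assumed here; the ≥ constraint is negated because linprog expects ≤ form:

```python
from scipy.optimize import linprog

# Illustrative objective: minimize x1 + 2*x2 (f was unspecified in the notes).
c = [1, 2]

# Constraints in A_ub @ x <= b_ub form:
#    x1 + x2 <= 7
#   -2x1 + x2 <= -4   (i.e., 2x1 - x2 >= 4, negated)
A_ub = [[1, 1],
        [-2, 1]]
b_ub = [7, -4]

# Variable bounds: x1, x2 >= 0.
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)  # optimal point (~[2, 0]) and objective value (~2)
```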
Optimization Methods
Exact Methods:
- Linear Simplex Method
- Interior-point methods
Combinatorial Optimization:
- Branch and Bound
- Dynamic Programming (sketched below on the 0/1 knapsack problem)
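As an illustration of dynamic programming on a combinatorial problem, a standard 0/1 knapsack solver; the values, weights, and capacity are made up:

```python
def knapsack(values, weights, capacity):
    """Max total value of items fitting within capacity (0/1 knapsack)."""
    # best[w] = best achievable value with total weight at most w
    best = [0] * (capacity + 1)
    for v, wt in zip(values, weights):
        # Iterate weights downward so each item is used at most once.
        for w in range(capacity, wt - 1, -1):
            best[w] = max(best[w], best[w - wt] + v)
    return best[capacity]

print(knapsack(values=[60, 100, 120], weights=[10, 20, 30], capacity=50))  # 220
```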
Approximate Methods:
- Heuristic Methods
- Meta-heuristic Methods
- Simulated Annealing (sketched in code after this list)
- Genetic Algorithms
- Swarm Intelligence
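A minimal simulated-annealing sketch for a one-dimensional minimization. The neighbor distribution, cooling schedule, and objective are illustrative choices, not prescribed by the notes:

```python
import math
import random

def simulated_annealing(f, x0, temp=1.0, cooling=0.995, steps=10_000):
    """Minimize f starting from x0 with a geometric cooling schedule."""
    x, fx = x0, f(x0)
    best, fbest = x, fx
    for _ in range(steps):
        candidate = x + random.gauss(0.0, 0.5)   # random neighbor
        fc = f(candidate)
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp) to escape local minima.
        if fc < fx or random.random() < math.exp(-(fc - fx) / temp):
            x, fx = candidate, fc
            if fx < fbest:
                best, fbest = x, fx
        temp *= cooling                          # cool down gradually
    return best, fbest

# Example: a non-convex objective with multiple local minima.
print(simulated_annealing(lambda x: x**4 - 3 * x**2 + x, x0=5.0))
```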
Convex Functions
- Definition: A function is convex if the line segment between any two points on the graph of the function lies above or on the graph itself. This property is fundamental in optimization as it ensures that local minima are also global minima.
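In symbols, f is convex exactly when the chord inequality holds:

```latex
f(\lambda x + (1 - \lambda) y) \;\le\; \lambda f(x) + (1 - \lambda) f(y)
\quad \text{for all } x, y \text{ in the domain and all } \lambda \in [0, 1].
```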
Important Terminologies in Optimization
- Feasible Solution: A solution that satisfies all constraints.
- Optimal Solution: A feasible solution that maximizes or minimizes the objective function.
- Relaxed Solution: A solution to a relaxed version of the problem, obtained by dropping or loosening constraints (e.g., allowing integer variables to take fractional values); it is easier to compute and bounds the true optimum.
Linear Models
- Definition: A linear model represents the relationship among data features as a linear combination of feature weights; for classification, it assumes the classes are linearly separable, i.e., a hyperplane can split them.
- Mathematical representation of the decision boundary:
- 0 = w1 f1 + w2 f2 + b (2 features)
- 0 = w1 f1 + w2 f2 + w3 f3 + b (3 features)
- 0 = w1 f1 + ... + wn fn + b, i.e., w · f + b = 0 (n features)
Perceptron Learning Algorithm
- Steps (a runnable sketch follows this list):
- Iterate until convergence (no misclassifications in a full pass):
- For each training example (feature vector f with label y in {-1, +1}):
- If the prediction sign(w · f + b) does not match y, update each weight:
- wj = wj + fj * y
- Update bias: b = b + y
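A minimal runnable sketch of this loop, assuming labels in {-1, +1} and a sign activation; the toy dataset is made up for illustration:

```python
def perceptron(examples, epochs=100):
    """Train weights w and bias b with the perceptron update rule."""
    n_features = len(examples[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:  # label is +1 or -1
            activation = sum(wj * fj for wj, fj in zip(w, features)) + b
            prediction = 1 if activation > 0 else -1
            if prediction != label:       # update only on a mistake
                w = [wj + fj * label for wj, fj in zip(w, features)]
                b += label
                mistakes += 1
        if mistakes == 0:                 # converged: a full clean pass
            break
    return w, b

# Toy linearly separable data: label is +1 when f1 + f2 > 1.
data = [([0.0, 0.0], -1), ([1.0, 1.0], 1), ([0.9, 0.8], 1), ([0.1, 0.3], -1)]
print(perceptron(data))
```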
Gradient Descent Method
Description: An optimization algorithm used for minimizing a function by iteratively moving in the direction of the steepest descent as defined by the negative of the gradient.
Process:
- Start with initial parameters.
- For each parameter, take a small step in the direction of the negative gradient, decreasing the loss.
- Continue until no significant decrease in loss is observed.
Learning Rate: Determines the size of the steps taken towards the minimum.
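A minimal sketch on a one-dimensional quadratic loss; the learning rate, stopping tolerance, and example gradient are illustrative choices:

```python
def gradient_descent(grad, x0, learning_rate=0.1, tol=1e-8, max_iters=10_000):
    """Minimize a function given its gradient, by stepping against it."""
    x = x0
    for _ in range(max_iters):
        step = learning_rate * grad(x)
        x -= step                # move in the steepest-descent direction
        if abs(step) < tol:      # stop when updates become negligible
            break
    return x

# Example: loss(x) = (x - 3)^2 has gradient 2*(x - 3) and minimum at x = 3.
print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))  # ~3.0
```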
Challenges in Optimization
- Learning Rate Selection: Selecting a proper learning rate is crucial to avoid divergence or slow convergence.
- Local Minima: Ensuring the optimization algorithm escapes local minima to reach the global minimum.
Regularization Techniques
- Purpose: To prevent overfitting by adding a regularizing term to the loss function that penalizes overly complex models.
- Common Methods (compared in the sketch below):
- L1 Regularization: Adds the sum of absolute weight values to the loss, encouraging sparse weight vectors (many weights driven to exactly zero).
- L2 Regularization: Adds the sum of squared weights, penalizing large weights far more heavily than small ones and shrinking them toward zero.
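A sketch of how the penalty term enters the loss, using a squared-error data term and an illustrative regularization strength lam:

```python
def regularized_loss(w, data, lam=0.1, penalty="l2"):
    """Squared-error loss plus an L1 or L2 penalty on the weights."""
    # Data term: sum of squared errors of the linear model w . f
    loss = sum((sum(wj * fj for wj, fj in zip(w, f)) - y) ** 2
               for f, y in data)
    if penalty == "l1":
        loss += lam * sum(abs(wj) for wj in w)   # encourages sparsity
    else:
        loss += lam * sum(wj ** 2 for wj in w)   # shrinks large weights
    return loss

data = [([1.0, 2.0], 5.0), ([2.0, 0.5], 3.0)]
print(regularized_loss([1.0, 2.0], data, penalty="l1"))  # 0.3 (penalty only)
print(regularized_loss([1.0, 2.0], data, penalty="l2"))  # 0.5 (penalty only)
```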
Conclusion
- Model-Based Machine Learning Process:
- Pick a model.
- Choose criteria to optimize (objective function).
- Develop a learning algorithm to minimize loss, potentially including regularization to ensure generalization.