Lecture 2, 3, & 4 - Zero, First, & Second Order Optimization Techniques


17 Terms

1

What is Optimization?

The process of finding the best solution (i.e., the optimal solution) from a set of possible solutions. Typically means minimizing or maximizing a function that represents an objective, such as a cost function (or loss function). The goal is to adjust the model parameters (e.g., weights and biases in a neural network) in a way that minimizes the cost or maximizes some objective (e.g., accuracy or likelihood).

2

What is Zero-Order Optimization?

An optimization method that does not rely on any derivatives (gradients) of the objective function. Instead, it evaluates the objective function directly at various points in the search space to find the minimum or maximum.
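As an illustration (not from the lecture), a minimal random-search sketch in Python — the objective function, bounds, and iteration count are made-up examples:

```python
import random

def random_search(f, bounds, n_iters=1000, seed=0):
    """Zero-order: evaluate f directly at random points; no gradients used."""
    rng = random.Random(seed)
    best_x, best_val = None, float("inf")
    for _ in range(n_iters):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        val = f(x)
        if val < best_val:  # only function comparisons, never derivatives
            best_x, best_val = x, val
    return best_x, best_val

# Example: minimize f(x, y) = x^2 + y^2 over the box [-5, 5] x [-5, 5]
best_x, best_val = random_search(lambda p: p[0]**2 + p[1]**2,
                                 [(-5, 5), (-5, 5)])
```

Note that the sketch only compares function values — that is what makes it zero-order.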

3

What is First-Order Optimization?

An optimization method that uses first derivatives (gradients) of the objective function to guide the optimization process. The gradient tells you the direction of steepest ascent or descent in the function’s value, and the optimization algorithm uses this information to move toward the optimal solution.
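A minimal sketch of the idea (plain gradient descent); the function, gradient, step size, and step count below are made-up examples, not from the lecture:

```python
def gradient_descent(grad, x0, lr=0.1, n_steps=100):
    x = x0
    for _ in range(n_steps):
        g = grad(x)       # first derivative: direction of steepest ascent
        x = x - lr * g    # step in the opposite (descent) direction
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3); the optimum is x = 3.
x_opt = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```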

4

What is Second-Order Optimization?

An optimization method that uses second derivatives to better understand the curvature of the objective function. This helps refine the search direction more efficiently, especially in cases where a first-order method might be slow or could oscillate near the minimum.
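A minimal one-dimensional sketch of the idea (Newton's method); the example function, its derivatives, and the starting point are made-up examples, not from the lecture:

```python
def newton(d1, d2, x0, n_steps=20):
    """Second-order: each step is scaled by the curvature d2 (f'')."""
    x = x0
    for _ in range(n_steps):
        x = x - d1(x) / d2(x)  # curvature-adjusted step, not a fixed rate
    return x

# Minimize f(x) = x^4 - 3x^2 + 2 near x0 = 1:
# f'(x) = 4x^3 - 6x and f''(x) = 12x^2 - 6; the nearby minimum is x = sqrt(3/2).
x_opt = newton(lambda x: 4*x**3 - 6*x, lambda x: 12*x**2 - 6, x0=1.0)
```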

5

What do the first and second derivatives tell you?

The first derivative gives you the rate of change, or how fast the function is changing at any given point, while the second derivative tells you the curvature of the function.
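A quick numeric illustration using finite differences (the function f(x) = x² and step size are made-up examples):

```python
# For f(x) = x^2: f'(x) = 2x (slope) and f''(x) = 2 (constant curvature).
f = lambda x: x**2
x, h = 2.0, 1e-4
first = (f(x + h) - f(x - h)) / (2 * h)         # rate of change: f'(2) = 4
second = (f(x + h) - 2*f(x) + f(x - h)) / h**2  # curvature: f''(2) = 2
```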

6

Explain the Curse of Dimensionality

Refers to the exponential growth in computational complexity and the difficulty of solving optimization problems as the number of dimensions (or features) in the problem increases. Essentially, optimization problems become increasingly difficult as the number of dimensions grows.
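A back-of-the-envelope sketch of why grid-style search blows up with dimension (the 10 points per dimension is a made-up figure):

```python
# Sampling just 10 points per dimension needs 10**d evaluations in d dimensions.
points_per_dim = 10
evals = {d: points_per_dim ** d for d in (1, 2, 5, 10)}
# evals[10] is already ten billion function evaluations
```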

7

What is Local Optimization?

Refers to finding a solution that is the best in a local region of the solution space, but not necessarily the best overall. In other words, the solution may be optimal in its neighborhood but might not be the best when compared to all other possible solutions across the entire problem domain

8

What is Global Optimization?

Seeks to find the best possible solution (global optimum) across the entire search space, not just within a local region. The goal is to find the absolute minimum (or maximum) of the objective function, regardless of the starting point or the landscape of the function

9

When would someone choose Local Optimization?

Choose Local Optimization when:

  • The Problem is Simple and Well-Behaved

    • If you know the objective function is smooth, continuous and convex (single global optimum)

  • Good Initial Guess

    • A good initial guess for the solution (prior knowledge used to make an educated guess) allows for quick convergence

  • Speed

    • Local Optimization is typically faster than global optimization and is computationally cheaper

  • Low-Dimensions

    • If problem is low-dimensional, local optimization can efficiently solve the problem since the search space is small (lower chance of getting trapped in local optima)

10

When would someone choose Global Optimization?

Choose Global Optimization when:

  • Problem has Multiple Local Optima

    • If the objective function is non-convex and has multiple local optima (or you're uncertain whether the problem has local optima), global optimization methods are designed to search more broadly and avoid getting stuck in a local optimum.

  • No Prior Knowledge of Solution Location

    • When you have little or no information about where the optimal solution might lie (i.e., you don't know if your initial guess is close to the best solution), global optimization methods that use multiple starting points or randomness (like genetic algorithms or simulated annealing) are more likely to find the global optimum

  • Complex or High-Dimensional Problems

    • For high-dimensional optimization problems (many decision variables), local optimization is at risk of being trapped in a local optimum. In these cases, global optimization methods are more effective because they explore larger regions of the search space.

    • In high-dimensional spaces, the problem of getting stuck in local optima becomes more pronounced due to the curse of dimensionality, so using global methods is often essential.

  • Computational Resources are Available

    • If you have enough computational resources (time, processing power), global optimization is worth considering because it explores a broader search space. Although these methods are generally more computationally expensive, they can offer significantly better results when local optimization is likely to fail

11

What are some ways to overcome the Local Optimization problem? (When local optimization gets stuck in a local optimum)

  1. Multiple Starting Points

    • Running the optimization algorithm multiple times from different starting points, with each run potentially converging to a different local optimum, increases the chances of finding the global optimum

(Other options exist but are not discussed in class.)
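A minimal sketch of the multi-start idea: run a simple local descent from several starting points and keep the best result. The objective function, step rule, and starting points are made-up examples, not from the lecture:

```python
def local_descent(grad, x0, lr=0.01, n_steps=500):
    """A plain local method: follows the gradient downhill from x0."""
    x = x0
    for _ in range(n_steps):
        x = x - lr * grad(x)
    return x

def multi_start(f, grad, starts):
    """Run local descent from each start; keep the lowest result found."""
    best = None
    for x0 in starts:
        x = local_descent(grad, x0)
        if best is None or f(x) < f(best):
            best = x
    return best

# f(x) = x^4 - 3x^2 + x has two local minima; the global one is near x ≈ -1.30.
f = lambda x: x**4 - 3*x**2 + x
grad = lambda x: 4*x**3 - 6*x + 1
x_best = multi_start(f, grad, starts=[-2.0, -1.0, 0.5, 1.0, 2.0])
```

Starts on the right of the hump converge to the shallower minimum near x ≈ 1.13; keeping the best over all runs recovers the deeper one.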

12

What makes a function Convex?

A function is convex if it has a single global minimum and no other local minima or maxima. Convex problems are easier to solve than non-convex ones because there is only one global minimum (or maximum). Because of this, optimization techniques like gradient descent won’t have the issue of getting stuck in a local minimum or maximum.
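A small illustration of that property (the convex function f(x) = (x − 1)² and the starting points are made-up examples): gradient descent lands on the same global minimum no matter where it starts.

```python
grad = lambda x: 2 * (x - 1)  # gradient of the convex f(x) = (x - 1)^2

def descend(x, lr=0.1, n_steps=200):
    for _ in range(n_steps):
        x = x - lr * grad(x)
    return x

# Every start converges to the single global minimum at x = 1.
results = [descend(x0) for x0 in (-10.0, 0.0, 7.5)]
```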

13

What makes a function Non-Convex?

A function is non-convex when the cost/loss function has multiple local minima, local maxima, or saddle points. This makes optimization harder, since algorithms like gradient descent can get stuck in a local optimum or saddle point and never reach the global minimum or maximum.

14

Coordinate Search:

  • Explain what it is,

  • How it works (steps)

  • Pros and cons

  • When to use it

Coordinate Search is a zero-order optimization technique used to minimize or maximize a cost/loss function. It works by iteratively optimizing the function, adjusting one variable (or coordinate) at a time while keeping the others fixed. This process is done for all variables (coordinates) in turn.

Steps:

  1. Start with an initial guess of all variables

  2. Choose one coordinate x_1 (say the first variable) and perform a search over possible values of x_1, while keeping all other coordinates (x_2,..x_n) fixed.

  3. Update x_1​ to the value that minimizes the function with respect to x_1.

  4. Move to the next coordinate x_2, and repeat the same process of optimizing for x_2, while keeping x_1, x_3, …, x_n fixed.

  5. Continue this process for each coordinate, iterating over all coordinates multiple times.
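The steps above can be sketched in Python; this derivative-free variant, which probes each coordinate with a ± step and shrinks the step when no move helps, is one common way to realize them (the objective and step sizes are made-up examples):

```python
def coordinate_search(f, x, step=1.0, tol=1e-6, max_iter=10_000):
    x = list(x)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):          # one coordinate at a time
            for delta in (step, -step):  # probe both directions, no gradients
                trial = list(x)
                trial[i] += delta
                if f(trial) < f(x):
                    x = trial
                    improved = True
                    break
        if not improved:                 # no coordinate move helped:
            step /= 2                    # refine the step size
            if step < tol:
                break
    return x

# Example: minimize f(x, y) = (x - 2)^2 + (y + 1)^2 from the origin.
x_min = coordinate_search(lambda p: (p[0] - 2)**2 + (p[1] + 1)**2, [0.0, 0.0])
```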

Pros:

  • Simple to implement

  • Low computational cost

  • Works well in well-behaved search spaces

Cons:

  • Inefficient in High-Dimensional spaces

  • Slow Convergence (since it only updates one parameter (x) at a time)

  • Could get stuck in local minima

When should Coordinate Search be used?

  • Low-Dimensional Problems

  • Simple, Well-Defined Problems

  • When other methods are too complex

  • Fine-Tuning After a Broader Search

15

What is a Greedy Algorithm?

An optimization algorithm that makes decisions by choosing the option that seems best at the moment, without considering the overall problem or future consequences. It doesn’t backtrack or reconsider previous choices once they are made. It is best suited for problems where making a locally optimal choice at each step leads to a globally optimal solution. Some examples of greedy optimization algorithms are coordinate search, coordinate descent, and gradient descent.

16

Coordinate Descent:

  • Explain what it is,

  • How it works (steps)

  • Pros and cons

  • When to use it

Coordinate Descent is a first-order optimization algorithm primarily used to solve multivariable optimization problems by iteratively optimizing one variable (or coordinate) at a time, while keeping the other variables fixed.

Steps:

  1. Initialize the variables (coordinates).

  2. Select one coordinate (variable) to optimize.

  3. Update the selected coordinate by minimizing the objective function (taking the partial derivative/ gradient) with respect to that variable, keeping the others fixed.

  4. Repeat the process for each coordinate in a cyclic or random order.

  5. Convergence occurs when the objective function does not improve significantly with further updates.
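The steps above can be sketched in Python; here each coordinate update is a small gradient step on that coordinate's partial derivative (the quadratic objective, learning rate, and sweep count are made-up examples):

```python
def coordinate_descent(partials, x, lr=0.2, n_sweeps=100):
    x = list(x)
    for _ in range(n_sweeps):
        for i in range(len(x)):          # cycle through the coordinates
            x[i] -= lr * partials[i](x)  # first-order step on coordinate i only
    return x

# f(x, y) = (x - 1)^2 + 2*(y + 2)^2, with partials 2(x - 1) and 4(y + 2);
# the minimum is at (1, -2).
partials = [lambda p: 2 * (p[0] - 1),
            lambda p: 4 * (p[1] + 2)]
x_min = coordinate_descent(partials, [0.0, 0.0])
```

In practice the inner update is often an exact minimization over that coordinate rather than a single gradient step; the fixed-step version above is just the simplest sketch.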

Pros:

  • Simple and easy to implement.

  • Efficient when the problem is separable in terms of variables (i.e., each variable can be optimized independently).

  • Can handle high-dimensional problems

Cons:

  • May converge slowly, especially if variables are highly interdependent.

  • It is not guaranteed to find the global optimum in non-convex problems.

  • It can be sensitive to the initial choice of starting values.

When should Coordinate Descent be used?

  • When the cost/loss function is separable with respect to its variables, or when each variable can be optimized independently

  • Ideal for higher-dimensional optimization problems

  • Useful in problems where the objective function is smooth and convex, though can still work in non-convex settings.

17

What is the difference between Coordinate Search and Coordinate Descent?

Coordinate Descent is a more refined version of Coordinate Search that uses gradient information about the cost/loss function at each iteration. Mathematically, the only difference is that Coordinate Descent computes the partial derivative (gradient) with respect to the chosen coordinate at each iteration, while Coordinate Search does not use derivatives. Thus Coordinate Descent is a first-order optimization technique, while Coordinate Search is a zero-order optimization technique.