
Optimisation I

  • Parameters: Adjustable elements within an optimization problem, represented as \theta within a designated parameter space \Theta. These parameters can take forms including scalars, vectors, or arrays, and are often situated in a vector space like \mathbb{R}^n, where each dimension corresponds to a different variable influencing the system's behavior.

  • Objective Function: A crucial component of optimization, the objective function maps these parameters to a scalar value that reflects the quality of a given configuration. It is often expressed as L(\theta), with the goal of either maximizing or minimizing this value based on specific criteria relevant to the problem at hand.

  • Constraints: Constraints define the limitations imposed on the parameters, shaping the feasible region in which solutions must lie. These can include bounds on individual parameters as well as equality or inequality conditions that must hold, either exactly or within defined thresholds.

Types of Optimization
  • Discrete Optimization: Involves situations where parameters can only take values from a discrete set, such as integers or a finite list of options. This type of optimization is often encountered in scheduling and allocation problems where options are limited.

  • Continuous Optimization: Involves parameters that exist in a continuous space (e.g., \mathbb{R}^n). Continuous optimization problems are generally easier to solve due to properties such as smoothness and continuity, allowing techniques from calculus to be employed for finding optimal solutions.

Key Properties of Objective Functions
  • Convexity: A fundamental property of certain functions: for a convex function, any local minimum is also a global minimum, so a search that finds a local minimum has in fact found the best attainable value, which greatly simplifies search strategies.
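
    Formally, L is convex if for all \theta_1, \theta_2 and \alpha \in [0, 1],
    L(\alpha \theta_1 + (1 - \alpha) \theta_2) \le \alpha L(\theta_1) + (1 - \alpha) L(\theta_2),
    i.e. the function never lies above the chord joining any two of its points.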

  • Continuity: Refers to how small adjustments in the parameter values lead to slight changes in the objective function's output. Continuous functions facilitate smoother optimization processes, whereas discontinuities can introduce complexities that may hinder convergence.

Constrained Optimization
  • Hard Constraints: These are strict limitations placed on the parameters, consistently formatted as either equality constraints c(\theta) = 0 or inequality constraints c(\theta) \le 0. Violating hard constraints disqualifies a solution from consideration.

  • Example of Hard Constraints in Python

    from scipy.optimize import minimize
    
    def objective_function(x):
        return x[0] ** 2 + x[1] ** 2
    
    def constraint(x):
        return x[0] + x[1] - 1  # Hard constraint: x0 + x1 = 1
    
    # Define the constraints and bounds
    constraints = {'type': 'eq', 'fun': constraint}
    bounds = [(0, None), (0, None)]  # x0, x1 >= 0
    
    # Initial guess
    initial_guess = [0.5, 0.5]
    
    # Optimization
    result = minimize(objective_function, initial_guess, bounds=bounds, constraints=constraints)
    print(result)
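
  • Note: when constraints are supplied, scipy.optimize.minimize automatically falls back to a constraint-capable method (SLSQP by default).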
    
  • Soft Constraints: Introduce a flexible approach to optimization, where penalties are added to the objective function to discourage violations of certain conditions. This is expressed mathematically as L'(\theta) = L(\theta) + \lambda C(\theta), where \lambda is a penalty factor, and C(\theta) represents a cost function penalizing constraint violations.
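
  • A minimal sketch of a soft-constraint penalty, reusing the objective and constraint from the example above (the penalty weight lam is a hypothetical choice):

    def penalized_objective(x, lam=10.0):
        base = x[0] ** 2 + x[1] ** 2           # original objective L(x)
        violation = (x[0] + x[1] - 1) ** 2     # C(x): squared violation of x0 + x1 = 1
        return base + lam * violation          # L'(x) = L(x) + lambda * C(x)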

Iterative Optimization
  • Involves a systematic approach of successive adjustments to parameters aimed at minimizing the objective function.

  • Example for Iterative Optimization

    import numpy as np
    
    def compute_gradient(theta):
        # example stand-in gradient: for f(theta) = ||theta||^2, the gradient is 2 * theta
        return 2 * theta
    
    def iterative_optimization(learning_rate, threshold, initial_theta):
        current_theta = initial_theta
        while True:
            gradient = compute_gradient(current_theta)
            new_theta = current_theta - learning_rate * gradient
            # stop once successive parameter updates become negligible
            if np.linalg.norm(new_theta - current_theta) < threshold:
                break
            current_theta = new_theta
        return current_theta
    
  • Iterative methods are valuable as they can gradually approach an optimal solution through staged improvements, typically employing algorithms like gradient descent.

  • Python Code for Gradient Descent

    import numpy as np
    
    def gradient_descent(learning_rate, num_iterations, initial_theta):
        theta = initial_theta
        for i in range(num_iterations):
            gradient = compute_gradient(theta)  # e.g. the compute_gradient defined above
            theta = theta - learning_rate * gradient  # out-of-place update leaves initial_theta untouched
        return theta
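
  • A quick sanity check with hypothetical values, using the example gradient defined earlier; the iterates should shrink toward the minimiser at the origin:

    theta_opt = gradient_descent(0.1, 100, np.array([5.0, -3.0]))
    print(theta_opt)  # approximately [0, 0]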
    
Heuristic Optimization
  • Principles: Draws upon concepts such as locality (exploiting similar values), memory (recording past parameter configurations), temperature (adaptable movement rates), and population (tracking multiple potential solutions).

  • Random Search: A fundamental stochastic optimization method that samples random parameter configurations and evaluates the objective function for each. While inherently inefficient, it is useful in scenarios where minimal information about the objective function is available.

  • Python Code for Random Search

    import numpy as np
    
    def random_search(num_samples):
        best_value = float('inf')
        best_params = None
        for _ in range(num_samples):
            params = np.random.rand(2)  # sample parameters uniformly from [0, 1)^2
            value = objective_function(params)  # objective_function as defined in the earlier example
            if value < best_value:
                best_value = value
                best_params = params
        return best_params, best_value
    
Metaheuristics
  • Locality: Utilizes the principle that nearby parameter configurations typically yield similar objective values, so the search can exploit proximity in the parameter space.

  • Temperature: Modulates how aggressively parameter adjustments are made, particularly useful for escaping local minima, an intrinsic problem in many optimization landscapes (see the sketch after this list).

  • Population: Keeps a diverse set of parameter configurations throughout the optimization process, enabling selection and combination of the most promising candidates to improve overall outcomes.

  • Memory: The mechanism by which previous optimal steps are retained, providing valuable insight that can guide current and future iterations toward better performance.
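
  • The sketch below combines the locality and temperature principles in a simple simulated-annealing loop; the cooling schedule and step size are illustrative assumptions, and objective_function is the one defined in the earlier example.

    import numpy as np
    
    def simulated_annealing(objective, initial_params, num_steps=1000,
                            initial_temp=1.0, cooling=0.995, step_size=0.1):
        params = np.array(initial_params, dtype=float)
        value = objective(params)
        temp = initial_temp
        for _ in range(num_steps):
            # locality: propose a configuration near the current one
            candidate = params + step_size * np.random.randn(len(params))
            candidate_value = objective(candidate)
            # temperature: sometimes accept a worse move to escape local minima
            if (candidate_value < value
                    or np.random.rand() < np.exp((value - candidate_value) / temp)):
                params, value = candidate, candidate_value
            temp *= cooling  # cool down, making the search increasingly greedy
        return params, value
    
    best_params, best_value = simulated_annealing(objective_function, [0.5, 0.5])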

Direct Convex Optimization: Least Squares
  • Specifically addresses objective functions of the form \arg\min_{x} \|Ax - y\|^2, emphasizing the need to minimize the squared differences between predicted and actual values. This method often employs the pseudo-inverse, computed through techniques such as Singular Value Decomposition (SVD), for efficient solution derivation.

  • Least Squares Optimization Example

    import numpy as np
    
    def least_squares(A, y):
        # np.linalg.pinv computes the Moore-Penrose pseudo-inverse via SVD,
        # which is more numerically stable than forming inv(A.T @ A) @ A.T
        return np.linalg.pinv(A) @ y
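
  • A quick check on hypothetical data: fitting y = 2x + 1 through noisy samples should recover coefficients close to [2, 1].

    x = np.linspace(0, 1, 50)
    A = np.column_stack([x, np.ones_like(x)])   # design matrix with intercept column
    y = 2 * x + 1 + 0.01 * np.random.randn(50)  # noisy observations
    print(least_squares(A, y))                  # approximately [2.0, 1.0]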
    
Convergence
  • A well-designed optimization algorithm is characterized by rapid convergence, indicated by a significant and swift decline in the objective function value over successive iterations. Effective measures of convergence speed often dictate the practical utility of an optimization approach.

Measuring Convergence
  • Objective Function Value: Track the value of the objective function at each iteration. A decreasing trend (for minimization problems) indicates convergence.

  • Parameter Change: Monitor how much the parameters change between iterations. Small changes suggest the algorithm is settling near a minimum.

  • Gradient Norm: Calculate the norm (magnitude) of the gradient of the objective function. A gradient close to zero implies a stationary point.
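
  • A minimal sketch that logs all three measures during gradient descent, using an assumed quadratic objective f(\theta) = \|\theta\|^2 as a stand-in:

    import numpy as np
    
    def objective(theta):
        return float(theta @ theta)  # f(theta) = ||theta||^2
    
    def gradient(theta):
        return 2 * theta             # gradient of the example objective
    
    theta = np.array([5.0, -3.0])
    learning_rate = 0.1
    for i in range(20):
        new_theta = theta - learning_rate * gradient(theta)
        print(f"iter {i}: f = {objective(new_theta):.6f}, "
              f"step = {np.linalg.norm(new_theta - theta):.6f}, "
              f"grad norm = {np.linalg.norm(gradient(new_theta)):.6f}")
        theta = new_theta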

Convergence Criteria
  • Threshold on Objective Function Change: Stop when the change in the objective function value between iterations falls below a predefined threshold. This ensures that only marginal improvements are being made.

  • Threshold on Parameter Change: Terminate the iterations when the change in parameter values becomes sufficiently small, indicating that the algorithm has stabilized.

  • Maximum Iterations: Set a limit on the number of iterations to prevent the algorithm from running indefinitely, especially when convergence is slow or not guaranteed.
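
  • One way to combine these criteria into a single stopping test (the default tolerances are illustrative assumptions):

    import numpy as np
    
    def converged(f_old, f_new, theta_old, theta_new, iteration,
                  f_tol=1e-8, theta_tol=1e-8, max_iters=10_000):
        if abs(f_new - f_old) < f_tol:                         # objective change threshold
            return True
        if np.linalg.norm(theta_new - theta_old) < theta_tol:  # parameter change threshold
            return True
        return iteration >= max_iters                          # iteration cap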

Factors Affecting Convergence
  • Learning Rate: A smaller learning rate might lead to slow convergence but can help avoid overshooting the minimum. A larger learning rate can speed up convergence but risks instability.

  • Initialization: The starting point for the optimization can significantly affect convergence. A good initialization can lead to faster and better convergence.

  • Objective Function Properties: Convex functions generally converge more reliably than non-convex ones. The smoothness and continuity of the objective function also play a crucial role.

Techniques to Improve Convergence