Optimisation I
Parameters: Adjustable elements within an optimization problem, represented as \theta within a designated parameter space \&Theta. These parameters can take forms including scalars, vectors, or arrays, and are often situated in a vector space like \&mathbb{R}^n, where each dimension corresponds to a different variable influencing the system's behavior.
Objective Function: A crucial component of optimization, the objective function maps these parameters to a scalar value that reflects the quality of a given configuration. It is often expressed as L(\theta), with the goal of either maximizing or minimizing this value based on specific criteria relevant to the problem at hand.
Constraints: Constraints define the limitations imposed on the parameters, shaping the feasible region in which solutions must lie. These can consist of bounds on individual parameters, requiring that certain conditions are either strictly met or satisfied within defined thresholds.
Types of Optimization
Discrete Optimization: Involves situations where parameters can only take non-continuous values, such as integers or finite sets. This type of optimization is often encountered in scheduling and allocation problems where options are limited.
Continuous Optimization: Combines parameters that exist in a continuous space (e.g., \&mathbb{R}^n). Continuous optimization problems are generally simpler to solve due to properties such as smoothness and continuity, allowing techniques like calculus to be employed for finding optimal solutions.
Key Properties of Objective Functions
Convexity: A fundamental property of certain functions, where a convex function possesses a single global minimum. In practical terms, if any point is identified as a local minimum, it must also be the global minimum, thus simplifying search strategies.
Continuity: Refers to how small adjustments in the parameter values lead to slight changes in the objective function's output. Continuous functions facilitate smoother optimization processes, whereas discontinuities can introduce complexities that may hinder convergence.
Constrained Optimization
Hard Constraints: These are strict limitations placed on the parameters, consistently formatted as either equality constraints c(\theta) = 0 or inequality constraints c(\theta) \le 0. Violating hard constraints disqualifies a solution from consideration.
Example of Hard Constraints in Python
from scipy.optimize import minimize def objective_function(x): return x[0] ** 2 + x[1] ** 2 def constraint(x): return x[0] + x[1] - 1 # Hard constraint: x0 + x1 = 1 # Define the constraints and bounds constraints = {'type': 'eq', 'fun': constraint} bounds = [(0, None), (0, None)] # x0, x1 >= 0 # Initial guess initial_guess = [0.5, 0.5] # Optimization result = minimize(objective_function, initial_guess, bounds=bounds, constraints=constraints) print(result)Soft Constraints: Introduce a flexible approach to optimization, where penalties are added to the objective function to discourage violations of certain conditions. This is expressed mathematically as L'(\theta) = L(\theta) + \lambda C(\theta), where \lambda is a penalty factor, and C(\theta) represents a cost function penalizing constraint violations.
Iterative Optimization
Involves a systematic approach of successive adjustments to parameters aimed at minimizing the objective function.
Example for Iterative Optimization
import numpy as np def iterative_optimization(learning_rate, threshold, initial_theta): current_theta = initial_theta while True: gradient = compute_gradient(current_theta) new_theta = current_theta - learning_rate * gradient if np.linalg.norm(new_theta - current_theta) < threshold: break current_theta = new_theta return current_thetaIterative methods are valuable as they can gradually approach an optimal solution through staged improvements, typically employing algorithms like gradient descent.
Python Code for Gradient Descent
import numpy as np def gradient_descent(learning_rate, num_iterations, initial_theta): theta = initial_theta for i in range(num_iterations): gradient = compute_gradient(theta) # Assume this function calculates the gradient of the objective function theta -= learning_rate * gradient return theta
Heuristic Optimization
Principles: Draws upon concepts such as locality (exploiting similar values), memory (recording past parameter configurations), temperature (adaptable movement rates), and population (tracking multiple potential solutions).
Random Search: A fundamental stochastic optimization method that involves selecting random sets of parameters and evaluating the resultant objective function. While inherently inefficient, it is advantageous in scenarios where minimum information about the objective function is available.
Python Code for Random Search
import numpy as np def random_search(num_samples): best_value = float('inf') best_params = None for _ in range(num_samples): params = np.random.rand(2) # Randomly selecting parameters value = objective_function(params) # Evaluate the function if value < best_value: best_value = value best_params = params return best_params, best_value
Metaheuristics
Locality: Utilizes the principle that similar configurations typically yield similar parameter values, thus optimizing searches based on proximity in the parameter space.
Temperature: Modulates how aggressively parameter adjustments are made, particularly useful for escaping local minima, an intrinsic problem in many optimization landscapes.
Population: Keeps a diverse set of parameter configurations throughout the optimization process, enabling selection and combination of the most promising candidates to improve overall outcomes.
Memory: The mechanism by which previous optimal steps are retained, providing valuable insight that can guide current and future iterations toward better performance.
Direct Convex Optimization: Least Squares
Specifically addresses objective functions in the form of \&arg\min_{x} |Ax - y|^2, emphasizing the need to minimize the squared differences between predicted and actual values. This method often employs the pseudo-inverse through techniques such as Singular Value Decomposition (SVD) for efficient solution derivation.
Least Squares Optimization Example
from numpy.linalg import inv def least_squares(A, y): # Calculate the pseudo-inverse of A using SVD A_pseudo_inverse = inv(A.T @ A) @ A.T return A_pseudo_inverse @ y
Convergence
A well-designed optimization algorithm is characterized by rapid convergence, indicated by a significant and swift decline in the objective function value over successive iterations. Effective measures of convergence speed often dictate the practical utility of an optimization approach.
Measuring Convergence:
Objective Function Value: Track the value of the objective function at each iteration. A decreasing trend (for minimization problems) indicates convergence.
Parameter Change: Monitor how much the parameters change between iterations. Small changes suggest the algorithm is settling near a minimum.
Gradient Norm: Calculate the norm (magnitude) of the gradient of the objective function. A gradient close to zero implies a stationary point.
Convergence Criteria:
Threshold on Objective Function Change: Stop when the change in the objective function value between iterations falls below a predefined threshold. This ensures that only marginal improvements are being made.
Threshold on Parameter Change: Terminate the iterations when the change in parameter values becomes sufficiently small, indicating that the algorithm has stabilized.
Maximum Iterations: Set a limit on the number of iterations to prevent the algorithm from running indefinitely, especially when convergence is slow or not guaranteed.
Factors Affecting Convergence:
Learning Rate: A smaller learning rate might lead to slow convergence but can help avoid overshooting the minimum. A larger learning rate can speed up convergence but risks instability.
Initialization: The starting point for the optimization can significantly affect convergence. A good initialization can lead to faster and better convergence.
Objective Function Properties: Convex functions generally converge more reliably than non-convex ones. The smoothness and continuity of the objective function also play a crucial role.
Techniques to Improve