ECON Maths pt 2

89 Terms

1

Define limit of a function due to Heine

Given a function f: A ⊂ R^N → R, the number a ∈ R is the limit of f(x) as x → x_0 ∈ A if

∀ {x_n} ⊂ A: lim_{n→∞} x_n = x_0 ⟹ f(x_n) → a

2

Define limit of a function due to Cauchy

Given the function f:A⊂RN→R, a∈R is a limit of f(x) as x→x0∈A if

∀ϵ>0, ∃δ>0 such that 0<|x−x0|<δ⟹|f(x)−a|<ϵ

  • note the similarity to the definition of the limit of a sequence

  • note that in the δ-ball around x_0 the point x_0 itself is excluded

Fig. 45 lim_{x→x_0} f(x) = L under the Cauchy definition
Fact: Cauchy and Heine definitions of the function limit are equivalent

3

quotient / inverse function rules

Quotient rule: (f/g)'(x) = [f'(x)g(x) − f(x)g'(x)] / g(x)², provided g(x) ≠ 0

Inverse function rule: if f is invertible and differentiable at x with f'(x) ≠ 0, then (f⁻¹)'(f(x)) = 1 / f'(x)
4

differentiability relationship with continuity

Differentiability ⟹ Continuity

Continuity ⇏ Differentiability (e.g. f(x) = |x| is continuous but not differentiable at x = 0)

5

Little-o

Little-o notation is used to describe functions that approach zero faster than a given function

f(x) = o(g(x)) as x → a ⟺ lim_{x→a} f(x)/g(x) = 0

6

Define total derivative (Jacobian)

If a vector-valued function f: R^N → R^K is differentiable at x, then all partial derivatives at x exist and the (_) is given by the (K×N) matrix

Df(x) = [∂f_i(x)/∂x_j], with row index i = 1,…,K and column index j = 1,…,N

Note:

  • note that partial derivatives of a vector-valued function f are indexed with two subscripts: i for f and j for x

  • index i∈{1,…,K} indexes functions and rows of the Jacobian matrix

  • index j∈{1,…,N} indexes variables and columns of the Jacobian matrix

  • Jacobian matrix is (K×N), i.e. number of functions in the tuple by the number of variables

  • the gradient is a special case of Jacobian with K=1, as expected!

Sufficient conditions for differentiability of a function f: R^N → R^K at x are that all partial derivatives exist and are continuous in x
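As an illustration (not part of the original card), a hand-derived Jacobian can be compared against a finite-difference approximation; the sketch below assumes NumPy and uses a made-up map f: R^3 → R^2 to show the (K×N) shape convention.

```python
import numpy as np

def f(x):
    # made-up map f: R^3 -> R^2, used only to illustrate the (K x N) convention
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0] ** 2])

def jacobian_fd(func, x, h=1e-6):
    """Finite-difference Jacobian: rows index outputs (K), columns index inputs (N)."""
    fx = func(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (func(x + e) - fx) / h
    return J

x = np.array([1.0, 2.0, 0.5])
J_analytic = np.array([[x[1], x[0], 0.0],
                       [2 * x[0], 0.0, np.cos(x[2])]])   # (K x N) = (2 x 3)
print(np.allclose(jacobian_fd(f, x), J_analytic, atol=1e-4))   # True
```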

7

Rules of differentiation in multivariate calculus: sum and scaling

Let A denote an open set in RN and let f,g: A→RK be differentiable functions at x∈A.

  1. f+g is differentiable at x and D(f+g)(x)=Df(x)+Dg(x)

  2. cf is differentiable at x and D(cf)(x)=cDf(x) for any scalar c

8

Define outer product

Outer product of two vectors x ∈ R^N and y ∈ R^K is an N×K matrix given by

x y' = (x_1, …, x_N)^T (y_1, …, y_K) =
[ x_1 y_1  ⋯  x_1 y_K
  ⋮         ⋱  ⋮
  x_N y_1  ⋯  x_N y_K ]

Note: the terminology around products can be confusing: in the current context "scalar product" means multiplication by a scalar, not the product that produces a scalar (the dot product of vectors seen before)

9

Dot product rule

Fact: Dot product rule

Let A denote an open set in RN and let f,g:A→RK be both differentiable functions at x∈A. Assume that both functions output column vectors.

Then the dot product function f⋅g:A→R is differentiable at x and

∇(f⋅g)(x) = f(x)^T Dg(x) + g(x)^T Df(x),

where ^T denotes transposition of a vector.

Observe that:

  • both Dg(x) and Df(x) are (K×N) matrices

  • both f(x)^T and g(x)^T are (1×K) row vectors

  • therefore, both f(x)^T Dg(x) and g(x)^T Df(x) are (1×N) row vectors
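A minimal numeric sanity check of the dot product rule, assuming NumPy and two illustrative maps f, g: R^2 → R^2 (not from the original notes):

```python
import numpy as np

# made-up maps f, g: R^2 -> R^2 and their analytic Jacobians
f = lambda x: np.array([x[0] ** 2, x[0] * x[1]])
g = lambda x: np.array([np.sin(x[0]), x[1] ** 3])
Df = lambda x: np.array([[2 * x[0], 0.0], [x[1], x[0]]])
Dg = lambda x: np.array([[np.cos(x[0]), 0.0], [0.0, 3 * x[1] ** 2]])

x = np.array([0.7, 1.3])
lhs = f(x) @ Dg(x) + g(x) @ Df(x)       # f(x)^T Dg(x) + g(x)^T Df(x), a (1 x N) row vector

h = lambda z: f(z) @ g(z)               # the dot product function f . g
eps = 1e-6
fd_grad = np.array([(h(x + eps * np.eye(2)[j]) - h(x)) / eps for j in range(2)])
print(np.allclose(lhs, fd_grad, atol=1e-4))   # True
```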

10

Chain rule

Fact: Chain rule

Let

  • A denote an open set in RN

  • B denote an open set in RK

  • f: A→B be differentiable at x∈A

  • g: B → R^L be differentiable at f(x) ∈ B

Then the composition g∘f is differentiable at x and its Jacobian is given by

D(g∘f)(x)=Dg(f(x))Df(x)

11

Define Hessian

Let A denote an open set in RN, and let f:A→R. Assume that f is twice differentiable at x∈A.

The total derivative of the gradient of function f at point x, ∇f(x), is called the (_) matrix of f, denoted Hf or ∇²f. It is the N×N matrix with (i,j)-th element ∂²f(x)/(∂x_i ∂x_j)

12

State Young’s Theorem

Fact (Young’s theorem)

For every x ∈ A ⊂ R^N, where A is an open set and f: A → R is twice continuously differentiable, the Hessian matrix ∇²f(x) is symmetric
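A small numeric illustration of Young's theorem, assuming NumPy and the illustrative C² function f(x_1,x_2) = x_1² x_2 + e^{x_1 x_2}: the two mixed partials are obtained by differentiating the analytic first derivatives in the two possible orders and coincide.

```python
import numpy as np

# f(x1, x2) = x1^2 * x2 + exp(x1 * x2) is C^2; its analytic first partials are:
fx1 = lambda x: 2 * x[0] * x[1] + x[1] * np.exp(x[0] * x[1])   # df/dx1
fx2 = lambda x: x[0] ** 2 + x[0] * np.exp(x[0] * x[1])         # df/dx2

x, h = np.array([0.4, -0.3]), 1e-6
e1, e2 = np.array([h, 0.0]), np.array([0.0, h])
d2f_dx2_dx1 = (fx1(x + e2) - fx1(x)) / h   # differentiate df/dx1 w.r.t. x2
d2f_dx1_dx2 = (fx2(x + e1) - fx2(x)) / h   # differentiate df/dx2 w.r.t. x1
print(np.isclose(d2f_dx2_dx1, d2f_dx1_dx2, atol=1e-4))   # True: the order does not matter
```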

13

State the Clairaut Theorem

Fact (Clairaut theorem)

If all the k-th order partial derivatives of f exist and are continuous in a neighbourhood of x, then the order of differentiation does not matter, and the mixed partial derivatives are equal

14

Define and state properties of dot product

The dot product of two vectors x,y∈RN is

x⋅y = x^T y = ∑_{n=1}^N x_n y_n

Fact: Properties of dot product

For any α, β ∈ R and any x, y, z ∈ R^N, the following statements are true:

  1. x^T y = y^T x

  2. (αx)^T (βy) = αβ (x^T y)

  3. x^T (y + z) = x^T y + x^T z

15

Define the (Euclidean) norm

The (_) (_) of x∈RN is defined as

‖x‖ = √(x^T x) = (∑_{n=1}^N x_n^2)^{1/2}

  • ‖x‖ represents the "length" of x

  • ‖x−y‖ represents distance between x and y

16

linear algebra - inequalities

For any α ∈ R and any x, y ∈ R^N, the following statements are true:

  1. ‖x‖ ≥ 0 and ‖x‖ = 0 if and only if x = 0

  2. ‖αx‖ = |α| ‖x‖

  3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality)

  4. |x^T y| ≤ ‖x‖ ‖y‖ (Cauchy–Schwarz inequality)
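A minimal sketch (assuming NumPy) checking the triangle and Cauchy–Schwarz inequalities on random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)

norm = lambda v: np.sqrt(v @ v)             # Euclidean norm, (x^T x)^(1/2)
print(norm(x + y) <= norm(x) + norm(y))     # triangle inequality: True
print(abs(x @ y) <= norm(x) * norm(y))      # Cauchy-Schwarz inequality: True
```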

17

Linear combinations

Definition

A linear combination of vectors x1,…,xK in RN is a vector

y = ∑_{k=1}^K α_k x_k = α_1 x_1 + ⋯ + α_K x_K

where α1,…,αK are scalars

18

Inner products of linear combinations satisfy

(∑_{k=1}^K α_k x_k) ⋅ (∑_{j=1}^J β_j y_j) = ∑_{k=1}^K ∑_{j=1}^J α_k β_j (x_k ⋅ y_j)

i.e. the dot product is linear in each argument
19

Define span

Set of all possible linear combinations of elements of X is called the (_) of X, denoted by span(X)

20

Define linear subspace (subspace)

A nonempty set S ⊂ R^N is called a linear subspace of R^N if

x, y ∈ S and α, β ∈ R ⟹ αx + βy∈ S

In other words, S⊂RN is “closed” under vector addition and scalar multiplication

21

Canonical basis vectors and linear independence

The set of canonical basis vectors {e1,…,eN} is linearly independent in RN

22

linearly independent

Fact

Take X = {x_1,…,x_K} ⊂ R^N. For K > 1 all of the following statements are equivalent

  1. X is linearly independent

  2. No xi ∈ X can be written as linear combination of the others

  3. X_0 ⊊ X ⟹ span(X_0) ⊊ span(X)

23

Facts of linear independence (4)

Fact If X is linearly independent, then X does not contain 0

Fact If X is linearly independent, then every subset of X is linearly independent

Fact If X={x1,…,xK} ⊂ RN is linearly independent and z is an N-vector not in span(X), then X∪{z} is linearly independent

Fact If X = {x_1,…,x_K} ⊂ R^N is linearly independent and y ∈ R^N, then there is at most one set of scalars α_1,…,α_K such that y = ∑_{k=1}^K α_k x_k

24

Exchange Lemma

Let S be a linear subspace of R^N that is spanned by K vectors.
Then any linearly independent subset of S has at most K vectors

25

Define the basis

Let S be a linear subspace of RN

A set of vectors B={b1,…,bK} ⊂ S is called a basis of S if

  1. B is linearly independent

  2. span(B)=S

e.g. Canonical basis vectors form a basis of RN

26

Define linear function

A function T:RK→RN is called (_) if

T(αx + βy) = αT(x) + βT(y)   ∀ x, y ∈ R^K, ∀ α, β ∈ R

27

Define the null space (kernel)

The (_) or (_) of a linear mapping T: R^K → R^N is

kernel(T) = {x ∈ R^K : Tx = 0}

28

If T is a linear function from RN to RN then all of the following are equivalent:

If T is a linear function from RN to RN then all of the following are equivalent:

  1. T is a bijection

  2. T is onto/surjective

  3. T is one-to-one/injective

  4. kernel(T)={0}

  5. The set of vectors V={Te1,…,TeN} is linearly independent

Definition: If any one of the above equivalent conditions is true, then T is called nonsingular

Fact: If T: R^N → R^N is nonsingular then so is T^{-1}

29

For a linear mapping T from R^K to R^N, the following statements are true:

  1. If K<N then T is not onto

  2. If K>N then T is not one-to-one

30

Define Rank

The dimension of span(A), the span of the columns of A, is known as the (_) of A

rank(A)=dim(span(A))

Because span(A) is the span of K vectors, we have

rank(A) = dim(span(A))≤K

Definition

A is said to have full column rank if

rank(A)= number of columns of A
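For illustration, the rank and the full-column-rank condition can be checked numerically with NumPy (the matrices below are made-up examples):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],    # second row = 2 x first row, so columns are dependent
              [0.0, 1.0, 1.0]])
print(np.linalg.matrix_rank(A))                    # 2: A does not have full column rank

B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(np.linalg.matrix_rank(B) == B.shape[1])      # True: B has full column rank
```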

31

Define nonsingular for square matrix A

We say that A is nonsingular if T: x ↦ Ax is nonsingular

Equivalent:

  • x↦Ax is a bijection from RN to RN

We now list equivalent conditions for nonsingularity

Fact Let A be an N×N matrix
All of the following conditions are equivalent

  1. A is nonsingular

  2. The columns of A are linearly independent

  3. rank(A) = N

  4. span(A) = RN

  5. If Ax=Ay, then x=y

  6. If Ax=0, then x=0

  7. For each b ∈ RN, the equation Ax=b has a solution

  8. For each b ∈ RN, the equation Ax=b has a unique solution

32

Define invertible

Given square matrix A, suppose ∃ square matrix B such that

AB=BA=I

Then

  • B is called the inverse of A, and written A^{-1}

  • A is called invertible

Fact A square matrix A is non-singular if and only if it is invertible

33

State the inverse of a 2×2 matrix

For A =
[ a  b
  c  d ]
with det(A) = ad − bc ≠ 0, the inverse is

A^{-1} = (1/(ad − bc)) ·
[  d  −b
  −c   a ]
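A quick numeric confirmation of the 2×2 inverse formula, assuming NumPy and an arbitrary example matrix:

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]        # ad - bc = 10
A_inv = np.array([[A[1, 1], -A[0, 1]],
                  [-A[1, 0], A[0, 0]]]) / det      # swap diagonal, negate off-diagonal
print(np.allclose(A @ A_inv, np.eye(2)))           # True
print(np.allclose(A_inv, np.linalg.inv(A)))        # True
```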
34

Define
a) eigenvalue
b) eigenvector
c) eigenpair

Let A be a square matrix

Think of A as representing a mapping x↦Ax, (this is a linear function)

But sometimes x will only be scaled:

Ax=λx for some scalar λ

Definition

If Ax=λx holds and x is nonzero, then

  1. x is called an (_b_) of A and λ is called an (_a_)

  2. (x,λ) is called an (_c_)

35

Eigenvalues and determinants

For any square matrix A

λ is an eigenvalue of A ⟺ det(A−λI) = 0
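A short NumPy illustration of this equivalence for an example symmetric matrix: the eigenvalues make A − λI singular and coincide with the roots of the characteristic polynomial.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals = np.linalg.eigvals(A)                     # eigenvalues 3 and 1 (order may vary)
# each eigenvalue makes A - lambda*I singular:
print([np.isclose(np.linalg.det(A - lam * np.eye(2)), 0.0) for lam in eigvals])  # [True, True]
# the characteristic polynomial is det(A - lambda*I) = lambda^2 - 4*lambda + 3, same roots:
print(np.roots([1.0, -4.0, 3.0]))                  # [3. 1.]
```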

36

Define
a) characteristic polynomial
b) characteristic equation

The polynomial det(A−λI) is called a (_a_) of A.

The roots of the (_b_) det(A−λI)=0 determine all eigenvalues of A.

37

Define similar (matrix)

Square matrix A is said to be (_) to square matrix B if there exists an invertible matrix P such that A = P B P^{-1}.

If the linear map is the same, we must have

Ax = P B P^{-1} x

38

Fact

Given N×N matrix A with distinct eigenvalues λ1,…,λN we have

  • If A=diag(d1,…,dN), then λn=dn for all n

  • det(A) = ∏_{n=1}^N λ_n

  • If A is symmetric, then λ_n ∈ R for all n (not complex!)

Fact: A is nonsingular ⟺ all eigenvalues are nonzero

Fact: If A is nonsingular, then the eigenvalues of A^{-1} are 1/λ_1, …, 1/λ_N

39

Conic sections

An equation describing every conic section is a quadratic equation in two variables.

Fig. 68 Conic sections

40

Define Hyperbola

A (_) is the collection of points (x, y) ∈ R^2 for which the absolute difference of the distances to two fixed points (termed foci) is constant.

41

Define Quadratic form

A (_) form is a function Q: R^N → R

Q(x) = ∑_{i=1}^N ∑_{j=1}^N a_{ij} x_i x_j

A (_) form Q: R^N → R can be equivalently represented as a matrix product

Q(x) = x^T A x

where x ∈ R^N is a column vector, and the (N×N) symmetric matrix A consists of elements {a_{ij}}, i, j = 1,…,N

42

Changing bases

Fact: By a change of basis, any quadratic form Q(x) with a symmetric real matrix A can be transformed into a sum of squares with positive or negative coefficients

Q(x) = x^T A x = x^T P^T D P x = (Px)^T D (Px) = ∑_{i=1}^N λ_i (Px)_i^2

where the λ_i are the eigenvalues of A, D = diag(λ_1,…,λ_N), and the rows of P are the corresponding (orthonormal) eigenvectors of A.

Fig. 69 Change of bases to convert the quadratic form to a canonical form
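A minimal sketch (assuming NumPy and an example symmetric matrix) showing that the quadratic form evaluated directly agrees with the eigenvalue-weighted sum of squares:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])             # symmetric matrix of the quadratic form
lam, Q = np.linalg.eigh(A)             # eigenvalues and orthonormal eigenvectors (columns of Q)
P = Q.T                                # rows of P are eigenvectors, so A = P^T diag(lam) P

x = np.array([0.5, -1.2])
lhs = x @ A @ x                        # x^T A x
rhs = np.sum(lam * (P @ x) ** 2)       # sum_i lambda_i (Px)_i^2
print(np.isclose(lhs, rhs))            # True
```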

43

Define
a) positive definite
b) negative definite

Quadratic form Q(x) is called (_a_) if

Q(x)=xTAx>0 for all x≠0

Quadratic form Q(x) is called (_b_) if

Q(x)=xTAx<0 for all x≠0

  • the examples above show these cases

  • note that because every quadratic form can be represented by a linear combination of squares, the characteristic properties are the signs of the coefficients in the canonical form!

44

Law of inertia

Fact (law of inertia)

The number of coefficients of a given sign in the canonical form of a quadratic form does not depend on the choice of the basis

45

Define Definite

Positive and negative definite quadratic forms are called (_)

Note that for some matrices it may be that

  • for some x: xTAx < 0

  • for some x: xTAx > 0

In this case A is called indefinite

Fig. 72 Indefinite quadratic function

Q(x) = x_1^2/2 + 8 x_1 x_2 + x_2^2/2

46

Define positive semi-definite (aka. non-negative definite)

Quadratic form Q(x) is called non-negative definite (positive semi-definite) if

Q(x) = x^T A x ⩾ 0 for all x

Quadratic form Q(x) is called non-positive definite (negative semi-definite) if

Q(x) = x^T A x ⩽ 0 for all x

47

Facts definite(ness)

Fact: If the quadratic form x^T A x is positive definite, then det(A) > 0

Fact (definiteness)

Let A be an (N×N) symmetric square matrix. The corresponding quadratic form Q(x)=xTAx is positive definite if and only if all leading principal minors of A are positive

x^T A x > 0 for all x ≠ 0 ⟺ M_k > 0 for all k = 1,…,N

Quadratic form Q(x)=xTAx is negative definite if and only if the leading principal minors of A alternate in sign according to the following rule

x^T A x < 0 for all x ≠ 0 ⟺ (−1)^k M_k > 0 for all k = 1,…,N
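A small NumPy sketch of this criterion for an example matrix: the leading principal minors are all positive, matching the eigenvalue test for positive definiteness.

```python
import numpy as np

def leading_minors(A):
    """Leading principal minors M_1, ..., M_N of a square matrix A."""
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])          # an example positive definite matrix
M = leading_minors(A)
print(M)                                  # approx [2.0, 3.0, 4.0]: all positive
print(all(m > 0 for m in M))              # True, so x^T A x > 0 for all x != 0
print(np.all(np.linalg.eigvalsh(A) > 0))  # True: the eigenvalue test agrees
```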

48

Fact (semi-definiteness)

Fact (semi-definiteness)

Let A be an (N×N) symmetric square matrix. The corresponding quadratic form Q(x)=xTAx is positive semi-definite if and only if all principal minors of A are non-negative.

x^T A x ⩾ 0 for all x ⟺ P_k ⩾ 0 for all principal minors P_k of order k = 1,…,N

Quadratic form Q(x) = x^T A x is negative semi-definite if and only if all principal minors of order k of A are zero or have the same sign as (−1)^k

x^T A x ⩽ 0 for all x ⟺ (−1)^k P_k ⩾ 0 for all principal minors P_k of order k = 1,…,N

49

Components of the optimization problem

  1. Objective function: function to be maximized or minimized, also known as maximand
    In the example above profit function π(p,q)=pq−C(q) to be maximized

  2. Decision/choice variables: the variables that the agent can control in order to optimize the objective function, also known as controls
    In the example above, price p and quantity q are the variables that the firm can choose to maximize its profit

  3. Equality constraints: restrictions on the choice variables in the form of equalities
    In the example above q=α+14αp2−p

  4. Inequality constraints (weak and strict): restrictions on the choice variables in the form of inequalities
    In the example above q≥0 and p>0

  5. Parameters: the variables that are not controlled by the agent, but affect the objective function and/or constraints
    In the example above α, β and δ are parameters of the problem

  6. Value function: the “optimized” value of the objective function as a function of parameters
    In the example above Π(α,β,δ) is the value function

50

Define the general form of the optimisation problem

The (_) is

V(θ) = max_x f(x,θ) subject to
g_i(x,θ) = 0, i ∈ {1,…,I}
h_j(x,θ) ≤ 0, j ∈ {1,…,J}

where:

  • f(x,θ): RN×RK→R is an objective function

  • x∈RN are decision/choice variables

  • θ∈RK are parameters

  • g_i(x,θ) = 0, i ∈ {1,…,I}, where g_i: R^N × R^K → R, are equality constraints

  • h_j(x,θ) ≤ 0, j ∈ {1,…,J}, where h_j: R^N × R^K → R, are inequality constraints

  • V(θ): R^K → R is a value function

51

Define the set of admissible choices (admissible set)

The set of (_) contains all the choices that satisfy the constraints of the optimization problem.

52

Reminder of definitions

A point x∗∈[a,b] is called a

  • maximizer of f on [a,b] if f(x∗)≥f(x) for all x∈[a,b]

  • minimizer of f on [a,b] if f(x∗)≤f(x) for all x∈[a,b]

Point x is called interior to [a,b] if a<x<b

A stationary point of f on [a,b] is an interior point x with f′(x)=0

Fig. 73 Both x* and x** are stationary points

53

Define stationary point

Definition

Given a function f: R^N → R, a point x ∈ R^N is called a stationary point of f if ∇f(x) = 0

Definition

Given a function f: R^N → R, a point x* ∈ R^N is called a local maximiser (minimiser) of f if ∃ ε > 0 such that f(x) ≤ f(x*) (respectively f(x) ≥ f(x*)) for all x ∈ B_ε(x*)

If the inequality is strict for all x ∈ B_ε(x*) with x ≠ x*, then x* is called a strict local maximiser (minimiser) of f

A global maximiser/minimiser must also be a local one, but the opposite is not necessarily true.

54

State the necessary condition for optima

Let f: R^N → R be a differentiable function and let x* ∈ R^N be a local maximiser/minimiser of f.

Then x* is a stationary point of f, that is ∇f(x*)=0

55

saddle point

  • a saddle point is a point where the FOC hold, yet the point is not a local maximizer/minimizer!

  • Similar to x=0 in f(x)=x3: derivative is zero, yet the point is not an optimizer

  • How to distinguish saddle points from optima? Key insight: the function has different second order derivatives in different directions!

  • need SOC to distinguish

56

SOC

  • do not give definitive answer in all cases, unfortunately

Fact (necessary SOC)

Let f: R^N → R be a twice continuously differentiable function and let x* ∈ R^N be a local maximiser/minimiser of f. Then:

  • f has a local maximum at x*⟹Hf(x*) is negative semi-definite

  • f has a local minimum at x*⟹Hf(x*) is positive semi-definite

  • recall the definition of semi-definiteness

Fact (sufficient SOC)

Let f: R^N → R be a twice continuously differentiable function. Then:

  • if for some x*∈ RN ∇f(x*)=0 (FOC satisfied) and Hf(x*) is negative definite, then x* is a strict local maximum of f

  • if for some x*∈RN ∇f(x*)=0 (FOC satisfied) and Hf(x*) is positive definite, then x* is a strict local minimum of f

  • observe that SOC are only necessary in the “weak” form, but are sufficient in the “strong” form

  • this leaves room for ambiguity when we can not arrive at a conclusion — particular stationary point may be a local maximum or minimum

  • but when the Hessian at a stationary point is indefinite (so neither form of semi-definiteness holds), we can rule out a local optimum for sure: such a stationary point is a saddle point

57

SOC provide:

Fact Given a twice continuously differentiable function f: R2 → R and a stationary point x*: ∇f(x*)=0, the second order conditions provide:

  1. if det(Hf(x*))>0 and trace(Hf(x*))>0 ⟹

    • λ1>0 and λ2>0,

    • Hf(x*) is positive definite,

    • f has a strict local minimum at x*

  2. if det(Hf(x*))>0 and trace(Hf(x*))<0 ⟹

    • λ1<0 and λ2<0,

    • Hf(x*) is negative definite,

    • f has a strict local maximum at x*

  3. if det(Hf(x*))=0 and trace(Hf(x*))>0 ⟹

    • λ_1 = 0 and λ_2 > 0,

    • Hf(x*) is positive semi-definite (nonnegative definite),

    • f may have a local minimum at x*

    • inconclusive!

  4. if det(Hf(x*))=0 and trace(Hf(x*))<0 ⟹

    • λ_1 = 0 and λ_2 < 0,

    • Hf(x*) is negative semi-definite (nonpositive definite),

    • f may have a local maximum at x*

    • inconclusive!

  5. if det(Hf(x*))<0

    • λ1 and λ2 have different signs,

    • Hf(x*) is indefinite,

    • x* is a saddle point of f
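As a sketch (assuming NumPy), the det/trace test above can be wrapped in a small helper; the example Hessians correspond to f(x,y) = x² + y² and f(x,y) = x² − y² at the origin.

```python
import numpy as np

def classify_2d(H):
    """Classify a stationary point of f: R^2 -> R from its Hessian via det and trace."""
    d, t = np.linalg.det(H), np.trace(H)
    if d > 0:
        return "strict local minimum" if t > 0 else "strict local maximum"
    if d < 0:
        return "saddle point"
    return "inconclusive (semi-definite case)"

# Hessians of f(x, y) = x^2 + y^2 and f(x, y) = x^2 - y^2 at the origin:
print(classify_2d(np.array([[2.0, 0.0], [0.0, 2.0]])))    # strict local minimum
print(classify_2d(np.array([[2.0, 0.0], [0.0, -2.0]])))   # saddle point
```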

58

Define convex (set)

Definition

A set C⊂RK is called (_) if

x, y ∈ C and 0 ≤ λ ≤ 1 ⟹ λx + (1−λ)y ∈ C

Convexity ⟺ the line segment between any two points in C lies entirely in C

Fig. 74 Convex set

Fig. 75 Non-convex set

59

Fact about intersection of convex sets

Fact If A and B are convex, then so is A∩B

60

Define a convex function

f is called (_) if

f(λx+(1−λ)y) ≤ λf(x)+(1−λ)f(y)

for all x,y∈A and all λ∈[0,1]

f is called strictly convex if

f(λx+(1−λ)y)<λf(x)+(1−λ)f(y)

for all x,y∈A with x≠y and all λ∈(0,1)

Fig. 76 A strictly convex function on a subset of R

61

Fact about convex function

Fact If A is K×K and positive definite, then

Q(x) = x^T A x,  x ∈ R^K

is strictly convex on RK

62

Define a concave function

f is called concave if

f(λx+(1−λ)y) ≥ λf(x)+(1−λ)f(y)

for all x,y∈A and all λ∈[0,1]

f is called strictly concave if

f(λx+(1−λ)y) > λf(x)+(1−λ)f(y)

for all x,y∈A with x≠y and all λ∈(0,1)

Fact

f:A→R is concave if and only if its hypograph (aka subgraph)

Hf:={(x,y)∈A×R:f(x)≥y}

is a convex subset of RK×R

Fig. 79 Convex hypograph (subgraph) of a function

63

Preservation of Shape

Fact If f and g are convex (resp., concave) and α≥0, then

  • αf is convex (resp., concave)

  • f+g is convex (resp., concave)

If f and g are strictly convex (resp., strictly concave) and α>0, then

  • αf is strictly convex (resp., strictly concave)

  • f+g is strictly convex (resp., strictly concave)

64

Hessian based shape criterion

If f: A→R is a C2 function where A⊂RK is open and convex, then

  1. Hf(x) positive semi-definite (nonnegative definite) for all x∈A ⟺f convex

  2. Hf(x) negative semi-definite (nonpositive definite) for all x∈A ⟺f concave

In addition,

  1. Hf(x) positive definite for all x∈A ⟹f strictly convex

  2. Hf(x) negative definite for all x∈A ⟹f strictly concave
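A minimal numeric illustration of the Hessian criterion, assuming NumPy and the made-up function f(x) = e^{x_1} + x_1² + x_2², whose Hessian is positive definite at every point (hence f is strictly convex):

```python
import numpy as np

# f(x) = exp(x1) + x1^2 + x2^2 has Hessian [[exp(x1) + 2, 0], [0, 2]], which is
# positive definite at every x, so by the criterion above f is strictly convex on R^2.
hess = lambda x: np.array([[np.exp(x[0]) + 2.0, 0.0],
                           [0.0, 2.0]])

rng = np.random.default_rng(1)
points = rng.normal(size=(100, 2))
print(all(np.all(np.linalg.eigvalsh(hess(x)) > 0) for x in points))   # True
```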

65

Advantages of convexity/concavity (two)

  1. Sufficiency of first order conditions for optimality

  2. Uniqueness of optimizers with strict convexity/concavity

Let A ⊂ R^K be a convex set and let f: A → R be a concave/convex differentiable function.

Then x* ∈ A is a maximizer/minimizer of f on A if and only if x* is a stationary point of f on A.

66

Uniqueness

Let A ⊂ R^K be a convex set and let f: A → R

  1. If f is strictly convex, then f has at most one minimizer on A

  2. If f is strictly concave, then f has at most one maximizer on A

Interpretation, strictly concave case:

  • we don’t know in general if f has a maximizer

  • but if it does, then it has exactly one

  • in other words, we have uniqueness

67

Sufficient and necessary conditions (sets)

Let A ⊂ R^K be a convex set, let f: A → R, and let x* ∈ A be a stationary point of f. Then

  1. f strictly concave ⟹ x* is the unique maximizer of f on A

  2. f strictly convex ⟹ x* is the unique minimizer of f on A

Fig. 80 Unique maximizer for a strictly concave function

68

Roadmap for unconstrained optimization

  1. Assuming that the domain of the objective function is the whole space, check that it is continuous and for derivative based methods twice continuously differentiable

  2. Derive the gradient and Hessian of the objective function. If it can be shown that the function is globally concave or convex, checking SOC (below) is not necessary, and global optima can be established

  3. Find all stationary points by solving FOC

  4. Check SOC to determine whether found stationary points are (local) maxima or minima and filter out saddle points

  5. Assess whether any of the local optima are global by comparing function values at these points, and inspecting the shape of the function to determine if its value anywhere exceeds the best local optimum (limits at the positive and negative infinities, possible vertical asymptotes, etc.)

69

Define the general form of the optimization problem

The (_) is

V(θ)=max(x) f(x,θ) subject to
gi(x,θ) = 0, i∈{1,…,I}
hj(x,θ) ≤ 0, j∈{1,…,J}

where:

  • f(x,θ): R^N × R^K → R is an objective function

  • x ∈ R^N are decision/choice variables

  • θ ∈ R^K are parameters

  • g_i(x,θ) = 0, i ∈ {1,…,I}, where g_i: R^N × R^K → R, are equality constraints

  • h_j(x,θ) ≤ 0, j ∈ {1,…,J}, where h_j: R^N × R^K → R, are inequality constraints

  • V(θ): R^K → R is a value function

70

State the Lagrange theorem

Let f: R^N → R and g: R^N → R^K be continuously differentiable functions.

Let D = {x : g_i(x) = 0, i = 1,…,K} ⊂ R^N

Suppose that

  • x*∈D is a local extremum of f on D, and

  • the gradients of the constraint functions gi are linearly independent at x* (equivalently, the rank of the Jacobian matrix Dg(x*) is equal to K)

Then there exists a vector λ* ∈ RK such that

Df(x*) − λ*⋅Dg(x*) = Df(x*) − ∑_{i=1}^K λ_i* Dg_i(x*) = 0

71

Note on Lagrange

  • λ* is called the vector of Lagrange multipliers

  • The assumption that the gradients of the constraint functions gi are linearly independent at x* is called the constraint qualification assumption

  • Lagrange theorem provides necessary first order conditions for both maximum and minimum problems

72

Define the Lagrangian function

Definition

The function L: R^{N+K} → R defined as

L(x,λ) = f(x) − λ⋅g(x) = f(x) − ∑_{i=1}^K λ_i g_i(x)

is called a Lagrangian (function)

73

Hessian of the Lagrangian

Using the rules of the differentiation from multivariate calculus we can express the gradient and the Hessian of the Lagrangian w.r.t. x as

∇_x L(λ,x) = Df(x) − λ⋅Dg(x)

H_x L(λ,x) = Hf(x) − λ ⊙ Hg(x)

  • where ⊙ denotes multiplication of the vector λ and the Hessian (K,N,N) tensor Hg(x) along the first dimension

74

Fact (necessary SOC)

Let f: R^N → R and g: R^N → R^K be twice continuously differentiable functions (C²) and let D = {x : g_i(x) = 0, i = 1,…,K} ⊂ R^N

Suppose x* ∈ D is the local constrained optimizer of f on D and that the constraint qualification assumption holds at x* and some λ* ∈ R^K

Then:

  • If f has a local maximum on D at x*, then H_x L(λ*,x*) defined above is negative semi-definite on the linear constraint set {v : Dg(x*)v = 0}, that is

∀ v ∈ R^N: Dg(x*)v = 0 ⟹ v^T H_x L(λ*,x*) v ≤ 0

  • If f has a local minimum on D at x*, then H_x L(λ*,x*) defined above is positive semi-definite on the linear constraint set {v : Dg(x*)v = 0}, that is

∀ v ∈ R^N: Dg(x*)v = 0 ⟹ v^T H_x L(λ*,x*) v ≥ 0

75

Fact (sufficient SOC)

Let f: R^N → R and g: R^N → R^K be twice continuously differentiable functions (C²) and let D = {x : g_i(x) = 0, i = 1,…,K} ⊂ R^N

Suppose there exists x* ∈ D such that rank(Dg(x⋆))=K and Df(x*)−λ*⋅Dg(x*) = 0 for some λ* ∈ RK (the constraint qualification assumption holds at x* which satisfies the FOC)

Then:

  • If H_x L(λ*,x*) defined above is negative definite on the linear constraint set {v : Dg(x*)v = 0}, that is

∀ v ≠ 0 with Dg(x*)v = 0: v^T H_x L(λ*,x*) v < 0,

then x* is the local constrained maximum of f on D

  • If H_x L(λ*,x*) defined above is positive definite on the linear constraint set {v : Dg(x*)v = 0}, that is

∀ v ≠ 0 with Dg(x*)v = 0: v^T H_x L(λ*,x*) v > 0

then x* is the local constrained minimum of f on D

76

Define bordered Hessian

Definition

The Hessian of the Lagrangian with respect to (λ,x) is called a (_), and is given by

HL(λ,x) = [ [0]_{K×K}          [−Dg(x)]_{K×N}
            [−Dg(x)^T]_{N×K}   [Hf(x) − λ⊙Hg(x)]_{N×N} ]

77

Define
a) positive definite subject to a linear constraint
b) negative definite subject to a linear constraint

Definition

Let x ∈ R^N, let A be an (N×N) symmetric matrix, and let B be a (K×N) matrix.

Quadratic form Q(x) is called positive definite subject to a linear constraint Bx=0 if

Q(x)=xTAx>0 for all x≠0 and Bx=0

Quadratic form Q(x) is called negative definite subject to a linear constraint Bx=0 if

Q(x)=xTAx<0 for all x≠0 and Bx=0

Positive and negative semi-definiteness subject to a linear constraint is obtained by replacing strict inequalities with non-strict ones.

recall that Bx=0 is equivalent to saying that x is in the null space (kernel) of B

78

Definiteness of quadratic form subject to linear constraint table

In order to establish definiteness of the quadratic form x^T A x, x ∈ R^N, subject to a linear constraint Bx = 0, we need to check the signs of the N−K leading principal minors of the ((N−K)×(N−K)) matrix E which represents the equivalent lower-dimensional quadratic form without the constraint. The signs of these leading principal minors correspond to (−1)^K times the signs of the determinants of the corresponding bordered matrix H, where a number of last rows and columns are dropped, according to the following table:

Order j of leading principal minor in E | Include first rows/columns of H | Drop last rows/columns of H
1                                       | 2K+1                            | N−K−1
2                                       | 2K+2                            | N−K−2
…                                       | …                               | …
N−K−1                                   | N+K−1                           | 1
N−K                                     | N+K                             | 0

Clearly, this table has N−K rows, and it is N−K determinants that have to be computed.

The definiteness of the quadratic form subject to the linear constraint is then given by the pattern of signs of these determinants according to Sylvester's criterion.

Note that computing determinants of H after dropping a number of last rows and columns is effectively computing the last leading principal minors of H.

If any of the leading principal minors turns out to be zero, we can move on and check the patterns for semi-definiteness.

79

Semi-definiteness of quadratic form subject to linear constraint

In order to establish semi-definiteness of the quadratic form x^T A x, x ∈ R^N, subject to a linear constraint Bx = 0, we need to check the signs of the principal minors of the ((N−K)×(N−K)) matrix E which represents the equivalent lower-dimensional quadratic form without the constraint. The signs of these principal minors correspond to (−1)^K times the signs of the determinants of the corresponding bordered matrix H, where a number of rows and columns are dropped from among the last N−K rows and columns (in all combinations), according to the following table:

Order j of principal minor in E | Number of principal minors | Drop among last N−K rows/columns
1                               | N−K                        | N−K−1
2                               | C(N−K, 2)                  | N−K−2
…                               | …                          | …
N−K−1                           | N−K                        | 1
N−K                             | 1                          | 0

The table again has N−K rows, but each row corresponds to a different number of principal minors of a given order: for order j there are (N−K)-choose-j principal minors, given by the corresponding number of combinations C(N−K, j).

The total number of principal minors to be checked is 2^{N−K} − 1, given by the sum of the binomial coefficients.

The definiteness of the quadratic form subject to the linear constraint is then given by the pattern of signs of these determinants according to Sylvester's criterion.

In what follows we will refer to computing determinants of H after dropping a number of rows and columns from among the last N−K rows and columns as computing the last principal minors of H. The number of last principal minors will correspond to the number of lines in the table above, and not the number of determinants to be computed.

80

Fact (definiteness of Hessian on linear constraint set)

Fact

For f: R^N → R and g: R^N → R^K twice continuously differentiable functions (C²):

H_x L(λ,x) is positive definite on the linear constraint set {v : Dg(x*)v = 0} ⟺ the last N−K leading principal minors of HL(λ,x) are non-zero and have the same sign as (−1)^K.

H_x L(λ,x) is negative definite on the linear constraint set {v : Dg(x*)v = 0} ⟺ the last N−K leading principal minors of HL(λ,x) are non-zero and alternate in sign, with the sign of the determinant of the full bordered Hessian matrix equal to (−1)^N.

H_x L(λ,x) is positive semi-definite on the linear constraint set {v : Dg(x*)v = 0} ⟺ the last N−K principal minors of HL(λ,x) are zero or have the same sign as (−1)^K.

H_x L(λ,x) is negative semi-definite on the linear constraint set {v : Dg(x*)v = 0} ⟺ the last N−K principal minors of HL(λ,x) are zero or alternate in sign, with the sign of the determinant of the full bordered Hessian matrix equal to (−1)^N.

  • see above for the exact definition of the last principal minors and how to count them

81

Fact (necessary SOC)

Let f: R^N → R and g: R^N → R^K be twice continuously differentiable functions (C²), and let D = {x : g_i(x) = 0, i = 1,…,K} ⊂ R^N

Suppose x* ∈ D is the local constrained optimizer of f on D and that the constraint qualification assumption holds at x* and some λ* ∈ RK

Then:

  • If f has a local maximum on D at x*, then the last N−K principal minors of the bordered Hessian HL(λ,x) are zero or alternate in sign, with the sign of the determinant of the full bordered Hessian matrix equal to (−1)^N

  • If f has a local minimum on D at x*, then the last N−K principal minors of the bordered Hessian HL(λ,x) are zero or have the same sign as (−1)^K

  • see above for the exact definition of the last principal minors and how to count them

82

Define sufficient SOC for function

Let f: R^N → R and g: R^N → R^K be twice continuously differentiable functions (C²) and let D = {x : g_i(x) = 0, i = 1,…,K} ⊂ R^N

Suppose there exists x* ∈ D such that rank(Dg(x*))=K and Df(x*)−λ*⋅Dg(x*)=0 for some λ* ∈ RK (i.e. the constraint qualification assumption holds at (x*,λ*) which satisfies the FOC)

Then:

  • If the last N−K leading principal minors of the bordered Hessian HL(λ,x) are non-zero and alternate in sign, with the sign of the determinant of the full bordered Hessian matrix equal to (−1)^N, then x* is a local constrained maximum of f on D

  • If the last N−K leading principal minors of the bordered Hessian HL(λ,x) are non-zero and have the same sign as (−1)^K, then x* is a local constrained minimum of f on D
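A worked sketch of the bordered-Hessian check, assuming NumPy and an illustrative problem that is not from the original notes: maximize f(x_1,x_2) = x_1 x_2 subject to x_1 + x_2 = 2, whose FOC give x* = (1,1) and λ* = 1.

```python
import numpy as np

# Example (not from the notes): maximize f(x1, x2) = x1*x2 subject to g(x) = x1 + x2 - 2 = 0.
# The FOC of L = f - lambda*g give x* = (1, 1) and lambda* = 1.
Dg = np.array([[1.0, 1.0]])                  # (K x N) = (1 x 2)
HxL = np.array([[0.0, 1.0],                  # Hf(x*) - lambda*Hg(x*) = Hf(x*), since Hg = 0
                [1.0, 0.0]])

H_bordered = np.block([[np.zeros((1, 1)), -Dg],
                       [-Dg.T, HxL]])        # bordered Hessian, (K+N) x (K+N) = 3 x 3

# N = 2, K = 1: only the last N - K = 1 leading principal minor matters, i.e. the full determinant.
# For a local constrained maximum its sign must equal (-1)^N = +1.
print(np.linalg.det(H_bordered))             # 2.0 > 0, so x* = (1, 1) is a local constrained maximum
```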

83

Lagrangian method: algorithm

  1. Write down the Lagrangian function L(x,λ)

  2. Find all stationary points of L(x,λ) with respect to x and λ, i.e. solve the system of first order equations

  3. Derive the bordered Hessian HL(x,λ) and compute its value at the stationary points

  4. Using second order conditions, check if the stationary points are local optima

  5. To find the global optima, compare the function values at all identified local optima

  • note that using convexity/concavity of the objective function is not as straightforward as in the unconstrained case; it requires additional assumptions on the convexity of the constraint set

84

Lagrange method may not work when:

  • Constraint qualification assumption is not satisfied at the stationary point

  • FOC system of equations does not have solutions (thus no local optima)

  • Saddle points – second order conditions can help

  • Local optimizer is not a global one – no way to check without studying analytic properties of each problem

Conclusion: great tool, but blind application may lead to wrong results!

85

Karush-Kuhn-Tucker conditions (maximization)

Let f:RN→R and g:RN→RK be continuously differentiable functions.

Let D={x: gi(x)≤0, i=1,…,K} ⊂ RN

Suppose that x⋆∈D is a local maximum of f on D and that the gradients of the constraint functions gi corresponding to the binding constraints are linearly independent at x⋆ (equivalently, the rank of the matrix composed of the gradients of the binding constraints is equal to the number of binding constraints).

Then there exists a vector λ⋆∈RK such that

Df(x*) − λ*⋅Dg(x*) = Df(x*) − ∑_{i=1}^K λ_i* Dg_i(x*) = 0

and

λ_i* ≥ 0 with λ_i* g_i(x*) = 0, i = 1,…,K
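A minimal numeric verification of the KKT conditions at a candidate solution, assuming NumPy and an illustrative problem that is not from the original notes:

```python
import numpy as np

# Example (not from the notes): maximize f(x) = -(x1 - 1)^2 - (x2 - 2)^2
# subject to g(x) = x1 + x2 - 2 <= 0.  Candidate solution: x* = (0.5, 1.5), lambda* = 1.
x_star, lam_star = np.array([0.5, 1.5]), 1.0

Df = np.array([-2 * (x_star[0] - 1), -2 * (x_star[1] - 2)])   # gradient of f at x*
Dg = np.array([1.0, 1.0])                                     # gradient of g
g_val = x_star.sum() - 2.0                                    # constraint value at x*

print(np.allclose(Df - lam_star * Dg, 0.0))   # stationarity Df - lambda*Dg = 0: True
print(lam_star >= 0)                          # nonnegative multiplier: True
print(np.isclose(lam_star * g_val, 0.0))      # complementary slackness: True
print(g_val <= 0)                             # feasibility: True
```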

86

Karush-Kuhn-Tucker conditions (minimization)

In the settings of KKT conditions (maximization), suppose that x⋆∈D is a local minimum of f on D, and as before the matrix composed of the gradients of the binding constraints has full rank.

Then there exists a vector λ*∈RK such that (note opposite sign)

Df(x*) + λ*⋅Dg(x*) = Df(x*) + ∑_{i=1}^K λ_i* Dg_i(x*) = 0

and

λ_i* ≥ 0 with λ_i* g_i(x*) = 0, i = 1,…,K

  • Very similar to the Lagrange theorem, but now we have inequalities!

87

The general form of the optimisation problem

V(θ)=max(x)f(x,θ) subject to
gi(x,θ) = 0, i∈{1,…,I}
hj(x,θ) ≤ 0, j∈{1,…,J}

where:

  • f(x,θ): R^N × R^K → R is an objective function

  • x ∈ R^N are decision/choice variables

  • θ ∈ R^K are parameters

  • g_i(x,θ) = 0, i ∈ {1,…,I}, where
    g_i: R^N × R^K → R, are equality constraints

  • h_j(x,θ) ≤ 0, j ∈ {1,…,J}, where
    h_j: R^N × R^K → R, are inequality constraints

  • V(θ): R^K → R is a value function

88

Envelope theorem for unconstrained problems

Let f(x,θ): R^N × R^K → R be a differentiable function, and let x*(θ) be the maximizer of f(x,θ) for every θ. Suppose that x*(θ) is itself a differentiable function. Then the value function of the problem V(θ) = f(x*(θ),θ) is differentiable w.r.t. θ and

∂V/∂θ_j = (∂f/∂θ_j)(x*(θ),θ), ∀ j
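A small finite-difference illustration of the envelope theorem, assuming NumPy and the made-up objective f(x,θ) = −(x−θ)² + θ², for which x*(θ) = θ and V(θ) = θ²:

```python
import numpy as np

# f(x, theta) = -(x - theta)^2 + theta^2, so x*(theta) = theta and V(theta) = theta^2
f = lambda x, th: -(x - th) ** 2 + th ** 2
x_star = lambda th: th
V = lambda th: f(x_star(th), th)

th, h = 1.3, 1e-6
dV = (V(th + h) - V(th)) / h                                   # dV/dtheta
df_dtheta = (f(x_star(th), th + h) - f(x_star(th), th)) / h    # df/dtheta with x held at x*(theta)
print(np.isclose(dV, df_dtheta, atol=1e-4))                    # True: the two derivatives coincide
```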

<p>Let <span><em>f(x,θ): </em><strong><em>R</em></strong><em><sup>N</sup>×</em><strong><em>R</em></strong><em><sup>K</sup>→ </em><strong><em>R</em></strong></span> be a differentiable function, and <span><em>x*(θ)</em></span> be the maximizer of <span><em>f(x,θ)</em></span> for every <span>θ</span>. Suppose that <span><em>x*(θ)</em></span> is differentiable function itself. Then the value function of the problem <span><em>V(θ)=f(x*(θ),θ)</em></span> is differentiable w.r.t. <span><em>θ</em></span> and</p><p>∂V/∂θ<sub>j</sub>= (∂f/∂θ<sub>j</sub>)<sub> </sub>(x*(θ),θ),∀j</p>
New cards
89

Envelope theorem for constrained problems

Consider an equality constrained optimization problem

V(θ) = max_x f(x,θ) subject to g_i(x,θ) = 0, i ∈ {1,…,I}

where:

  • f(x,θ): RN×RK→R is an objective function

  • x ∈ RN are decision/choice variables

  • θ ∈ RK are parameters

  • g_i(x,θ) = 0, i ∈ {1,…,I}, where g_i: R^N × R^K → R, are equality constraints

  • V(θ): R^K → R is a value function

Assume that the maximizer correspondence D*(θ) = argmax f(x,θ) is single-valued and can be represented by the function x*(θ): R^K → R^N, with the corresponding Lagrange multipliers λ*(θ): R^K → R^I.

Assume that both x*(θ) and λ*(θ) are differentiable, and that the constraint qualification assumption holds. Then

∂V/∂θj=(∂L/∂θj)(x*(θ),λ*(θ),θ), ∀j

where L(x,λ,θ) is the Lagrangian of the problem.
