ECON Maths pt 2


1
New cards

Define limit of a function due to Heine

Given the function f: A ⊂ R^N → R, a ∈ R is a limit of f(x) as x → x_0 ∈ A if

∀ {x_n} ⊂ A: lim_{n→∞} x_n = x_0 ⟹ f(x_n) → a

2
New cards

Define limit of a function due to Cauchy

Given the function f: A ⊂ R^N → R, a ∈ R is a limit of f(x) as x → x_0 ∈ A if

∀ ϵ>0, ∃ δ>0 such that 0 < |x−x_0| < δ ⟹ |f(x)−a| < ϵ

  • note the similarity to the definition of the sequence

  • note that in the ϵ-ball around x_0 the point x_0 itself is excluded

(Fig. 45: lim_{x→x_0} f(x) = L under the Cauchy definition)
Fact: Cauchy and Heine definitions of the function limit are equivalent

3
New cards

quotient / inverse function rules

(see image)
4
New cards

differentiability relationship with continuity

Differentiability ⟹ Continuity

Differentiability ⟸̸ Continuity

5
New cards

Little-o

Little-o notation is used to describe functions that approach zero faster than a given function

f(x) = o(g(x)) as x → a ⟺ lim_{x→a} f(x)/g(x) = 0

6
New cards

Define total derivative (Jacobian)

If a vector-valued function f: R^N → R^K is differentiable at x, then all partial derivatives at x exist and the (_) is given by

(matrix - see image)

Note:

  • note that partial derivatives of a vector-valued function f are indexed with two subscripts: i for f and j for x

  • index i∈{1,…,K} indexes functions and rows of the Jacobian matrix

  • index j∈{1,…,N} indexes variables and columns of the Jacobian matrix

  • Jacobian matrix is (K×N), i.e. number of functions in the tuple by the number of variables

  • the gradient is a special case of Jacobian with K=1, as expected!

Sufficient conditions for differentiability of a function f: R^N → R^K at x are that all partial derivatives exist and are continuous in x

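As a quick illustration, here is a minimal Python sketch (assuming NumPy; the example f is made up) that approximates the (K×N) Jacobian by finite differences:

```python
import numpy as np

def jacobian_fd(f, x, h=1e-6):
    """Finite-difference approximation of the (K x N) Jacobian of f at x."""
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f(x), dtype=float)
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        J[:, j] = (np.asarray(f(x + step)) - f0) / h
    return J

# Example: f: R^2 -> R^3, so the Jacobian is 3 x 2
f = lambda x: np.array([x[0] * x[1], x[0] ** 2, np.sin(x[1])])
# analytic Jacobian: [[x[1], x[0]], [2*x[0], 0], [0, cos(x[1])]]
print(jacobian_fd(f, [1.0, 2.0]))
```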
7
New cards

Rules of differentiation multivariate calculus - Sum and scaling

Let A denote an open set in RN and let f,g: A→RK be differentiable functions at x∈A.

  1. f+g is differentiable at x and D(f+g)(x)=Df(x)+Dg(x)

  2. cf is differentiable at x and D(cf)(x)=cDf(x) for any scalar c

8
New cards

Define outer product

Outer product of two vectors x ∈ R^N and y ∈ R^K is an N×K matrix given by

xy′ = (x_1, …, x_N)′ (y_1, …, y_K), i.e. the (n,k) entry is x_n y_k:

x_1y_1 ⋯ x_1y_K
  ⋮    ⋱    ⋮
x_Ny_1 ⋯ x_Ny_K

Note: another potential source of confusion with products: in the current context "scalar product" means multiplication by a scalar, and not the product that yields a scalar (the dot product of vectors)
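A quick NumPy illustration (the example vectors are arbitrary); np.outer builds exactly this N×K matrix of products x_n y_k:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])      # x in R^3
y = np.array([4.0, 5.0])           # y in R^2
print(np.outer(x, y))              # 3 x 2 matrix with (n, k) entry x[n] * y[k]
```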

9
New cards

Dot product rule

Fact: Dot product rule

Let A denote an open set in RN and let f,g:A→RK be both differentiable functions at x∈A. Assume that both functions output column vectors.

Then the dot product function f⋅g:A→R is differentiable at x and

∇(f⋅g)(x) = f(x)^T Dg(x) + g(x)^T Df(x),

where T denotes transposition of a vector.

Observe that:

  • both Dg(x) and Df(x) are (K×N) matrices

  • both f(x)^T and g(x)^T are (1×K) row-vectors

  • therefore, both f(x)^T Dg(x) and g(x)^T Df(x) are (1×N) row-vectors

10
New cards

Chain rule

Fact: Chain rule

Let

  • A denote an open set in RN

  • B denote an open set in RK

  • f: A→B be differentiable at x∈A

  • g: B→R^L be differentiable at f(x)∈B

Then the composition g∘f is differentiable at x and its Jacobian is given by

D(g∘f)(x)=Dg(f(x))Df(x)
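A small Python sketch (assuming NumPy; the functions f and g are made up) that checks D(g∘f)(x) = Dg(f(x))Df(x) against a finite-difference derivative of the composition:

```python
import numpy as np

# f: R^2 -> R^2 and g: R^2 -> R, with their analytic Jacobians
f  = lambda x: np.array([x[0] * x[1], x[0] + x[1] ** 2])
Df = lambda x: np.array([[x[1], x[0]], [1.0, 2.0 * x[1]]])      # 2 x 2
g  = lambda y: np.array([np.exp(y[0]) + y[1] ** 2])
Dg = lambda y: np.array([[np.exp(y[0]), 2.0 * y[1]]])           # 1 x 2

x = np.array([0.5, -1.0])
chain = Dg(f(x)) @ Df(x)                 # chain rule: 1 x 2 Jacobian of g∘f

# Finite-difference check of the composition g(f(x))
h = 1e-6
comp = lambda v: g(f(v))
fd = np.column_stack([(comp(x + h * e) - comp(x)) / h for e in np.eye(2)])
print(chain)
print(fd)                                # approximately equal to `chain`
```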

11
New cards

Define Hessian

Let A denote an open set in RN, and let f:A→R. Assume that f is twice differentiable at x∈A.

The total derivative of the gradient ∇f(x) of the function f at point x is called the (_) matrix of f, denoted Hf or ∇²f, and is given by an N×N matrix

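A minimal Python sketch (assuming NumPy; the example f is made up) that builds the Hessian exactly as the card defines it, as the total derivative of the gradient:

```python
import numpy as np

def grad_fd(f, x, h=1e-4):
    """Forward-difference gradient of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    return np.array([(f(x + h * e) - f(x)) / h for e in np.eye(x.size)])

def hessian_fd(f, x, h=1e-4):
    """Hessian as the Jacobian (total derivative) of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    g0 = grad_fd(f, x, h)
    return np.column_stack([(grad_fd(f, x + h * e, h) - g0) / h for e in np.eye(x.size)])

f = lambda x: x[0] ** 2 * x[1] + np.exp(x[1])
# analytic Hessian at (1, 0.5): [[2*x[1], 2*x[0]], [2*x[0], exp(x[1])]] = [[1, 2], [2, 1.65]]
print(hessian_fd(f, [1.0, 0.5]))
```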
12
New cards

State Young’s Theorem

Fact (Young’s theorem)

For every x ∈ A ⊂ R^N, where A is an open set and f: A→R is twice continuously differentiable, the Hessian matrix ∇²f(x) is symmetric

13
New cards

State the Clairaut Theorem

Fact (Clairaut theorem)

If all the k-th order partial derivatives of f exist and are continuous in a neighbourhood of x, then the order of differentiation does not matter, and the mixed partial derivatives are equal

14
New cards

Define and state properties of dot product

The dot product of two vectors x,y∈RN is

x⋅y = x^T y = ∑_{n=1}^N x_n y_n

Fact: Properties of dot product

For any α, β ∈ R and any x, y, z ∈ R^N, the following statements are true:

  1. x^T y = y^T x

  2. (αx)^T (βy) = αβ (x^T y)

  3. x^T (y+z) = x^T y + x^T z

15
New cards

Define the (Euclidean) norm

The (_) (_) of x∈RN is defined as

‖x‖ = √(x^T x) = (∑_{n=1}^N x_n²)^{1/2}

  • ‖x‖ represents the “length” of x

  • ‖x−y‖ represents the distance between x and y

16
New cards

linear algebra - inequalities

For any α∈R and any x,y∈RN, the following statements are true:

  1. ‖x‖≥0 and ‖x‖=0 if and only if x=0

  2. ‖αx‖=|α|‖x‖

  3. ‖x+y‖≤‖x‖+‖y‖ (triangle inequality)

  4. |x^T y|≤‖x‖‖y‖ (Cauchy-Schwarz inequality)
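A quick NumPy check of the triangle and Cauchy-Schwarz inequalities on arbitrary random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)

norm = lambda v: np.sqrt(v @ v)                  # Euclidean norm via the dot product
print(norm(x + y) <= norm(x) + norm(y))          # triangle inequality: True
print(abs(x @ y) <= norm(x) * norm(y))           # Cauchy-Schwarz inequality: True
```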

17
New cards

Linear combinations

Definition

A linear combination of vectors x1,…,xK in RN is a vector

y = ∑_{k=1}^K α_k x_k = α_1 x_1 + ⋯ + α_K x_K

where α1,…,αK are scalars

18
New cards

Inner products of linear combinations satisfy

(see image)
19
New cards

Define span

Set of all possible linear combinations of elements of X is called the (_) of X, denoted by span(X)

20
New cards

Define linear subspace (subspace)

A nonempty S ⊂ R^N is called a linear subspace of R^N if

x, y ∈ S and α, β ∈ R ⟹ αx + βy∈ S

In other words, S⊂RN is “closed” under vector addition and scalar multiplication

21
New cards

canonical

The set of canonical basis vectors {e1,…,eN} is linearly independent in RN

22
New cards

linearly independent

Fact

Take X = {x_1,…,x_K} ⊂ R^N. For K > 1 all of the following statements are equivalent

  1. X is linearly independent

  2. No x_i ∈ X can be written as a linear combination of the others

  3. X_0 ⊊ X ⟹ span(X_0) ⊊ span(X)

23
New cards

Facts of linear independence (4)

Fact If X is linearly independent, then X does not contain 0

Fact If X is linearly independent, then every subset of X is linearly independent

Fact If X={x1,…,xK} ⊂ RN is linearly independent and z is an N-vector not in span(X), then X∪{z} is linearly independent

Fact If X={x_1,…,x_K} ⊂ R^N is linearly independent and y ∈ R^N, then there is at most one set of scalars α_1,…,α_K such that y = ∑_{k=1}^K α_k x_k

24
New cards

Exchange Lemma

Let S be a linear subspace of RN, and be spanned by K vectors.
Then any linearly independent subset of S has at most K vectors

25
New cards

Define the basis

Let S be a linear subspace of RN

A set of vectors B={b1,…,bK} ⊂ S is called a basis of S if

  1. B is linearly independent

  2. span(B)=S

e.g. Canonical basis vectors form a basis of RN

26
New cards

Define linear function

A function T:RK→RN is called (_) if

T(αx+βy) = αTx + βTy, ∀ x, y ∈ R^K, ∀ α, β ∈ R

27
New cards

Define the null space (kernel)

The (_) or (_) of a linear mapping T: R^K → R^N is

kernel(T) = {x ∈ R^K : Tx = 0}

28
New cards

If T is a linear function from RN to RN then all of the following are equivalent:

If T is a linear function from RN to RN then all of the following are equivalent:

  1. T is a bijection

  2. T is onto/surjective

  3. T is one-to-one/injective

  4. kernel(T)={0}

  5. The set of vectors V={Te1,…,TeN} is linearly independent

Definition: If any one of the above equivalent conditions is true, then T is called nonsingular

Fact: If T: R^N → R^N is nonsingular then so is T⁻¹

29
New cards

For a linear mapping T from R^K to R^N, the following statements are true:

  1. If K<N then T is not onto

  2. If K>N then T is not one-to-one

30
New cards

Define Rank

The dimension of span(A) is known as the (_) of A

rank(A)=dim(span(A))

Because span(A) is the span of K vectors, we have

rank(A) = dim(span(A))≤K

Definition

A is said to have full column rank if

rank(A)= number of columns of A

31
New cards

Define nonsingular for square matrix A

We say that A is nonsingular if T: x→Ax is nonsingular

Equivalent:

  • x↦Ax is a bijection from RN to RN

We now list equivalent conditions for nonsingularity

Fact Let A be an N×N matrix
All of the following conditions are equivalent

  1. A is nonsingular

  2. The columns of A are linearly independent

  3. rank(A) = N

  4. span(A) = RN

  5. If Ax=Ay, then x=y

  6. If Ax=0, then x=0

  7. For each b ∈ RN, the equation Ax=b has a solution

  8. For each b ∈ RN, the equation Ax=b has a unique solution
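A small NumPy illustration of a few of these equivalent conditions on a made-up 2×2 matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

print(np.linalg.matrix_rank(A))     # 2 = N, so the columns are linearly independent
print(np.linalg.det(A))             # 5.0, nonzero: A is nonsingular
b = np.array([1.0, 4.0])
x = np.linalg.solve(A, b)           # the unique solution of Ax = b
print(x, np.allclose(A @ x, b))
```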

32
New cards

Define invertible

Given square matrix A, suppose ∃ square matrix B such that

AB=BA=I

Then

  • B is called the inverse of A, and written A−1

  • A is called invertible

Fact A square matrix A is non-singular if and only if it is invertible

33
New cards

State the inverse of a 2×2 matrix

(see image)
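The answer card is an image; as a reminder, the standard formula it refers to is: for A = [[a, b], [c, d]] with det(A) = ad − bc ≠ 0, A⁻¹ = (1/(ad−bc))·[[d, −b], [−c, a]]. A quick NumPy check on arbitrary numbers:

```python
import numpy as np

a, b, c, d = 2.0, 1.0, 5.0, 3.0
A = np.array([[a, b], [c, d]])
det = a * d - b * c                                  # must be nonzero
A_inv = np.array([[d, -b], [-c, a]]) / det           # swap diagonal, negate off-diagonal, divide by det
print(np.allclose(A_inv, np.linalg.inv(A)))          # True
```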
34
New cards

Define
a) eigenvalue
b) eigenvector
c) eigenpair

Let A be a square matrix

Think of A as representing a mapping x↦Ax, (this is a linear function)

But sometimes x will only be scaled:

Ax=λx for some scalar λ

Definition

If Ax=λx holds and x is nonzero, then

  1. x is called an (_b_) of A and λ is called an (_a_)

  2. (x,λ) is called an (_c_)

35
New cards

Eigenvalues and determinants

For any square matrix A

λ is an eigenvalue of A ⟺ det(A−λI) = 0

36
New cards

Define
a) characteristic polynomial
b) characteristic equation

The polynomial det(A−λI) is called a (_a_) of A.

The roots of the (_b_) det(A−λI)=0 determine all eigenvalues of A.
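A small NumPy sketch (example matrix made up) comparing the roots of the characteristic polynomial det(A − λI) with eigenvalues computed directly:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# For a 2x2 matrix, det(A - λI) = λ² - tr(A)·λ + det(A)
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
print(np.roots(coeffs))          # roots of the characteristic equation: 5 and 2
print(np.linalg.eigvals(A))      # the same eigenvalues computed directly
```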

37
New cards

Define similar (matrix)

Square matrix A is said to be (_) to square matrix B if there exists an invertible matrix P such that A = PBP⁻¹.

If the linear map is the same, we must have

Ax = PBP⁻¹x


38
New cards

Fact

Given an N×N matrix A with distinct eigenvalues λ_1,…,λ_N we have

  • If A = diag(d_1,…,d_N), then λ_n = d_n for all n

  • det(A) = ∏_{n=1}^N λ_n

  • If A is symmetric, then λ_n ∈ R for all n (not complex!)

Fact: A is nonsingular ⟺ all eigenvalues are nonzero

Fact: If A is nonsingular, then the eigenvalues of A⁻¹ are 1/λ_1,…,1/λ_N

39
New cards

Conic sections

An equation describing every conic section is a quadratic equation in two variables.

(Fig. 68: Conic sections)

40
New cards

Define Hyperbola

(_) is a collection of points (x,y) ∈ R² for which the absolute difference of the distances to two fixed points (termed foci) is constant.


41
New cards

Define Quadratic form

(_) form is a function Q: R^N → R

Q(x) = ∑_{i=1}^N ∑_{j=1}^N a_ij x_i x_j

(_) form Q: R^N → R can be equivalently represented as a matrix product

Q(x) = x^T A x

where x ∈ R^N is a column vector, and the (N×N) symmetric matrix A consists of elements {a_ij}, i,j = 1,…,N

42
New cards

Changing bases

Fact By change of bases any quadratic form Q(x) with a symmetric real matrix A can be transformed to a sum of squares with positive or negative coefficients

Q(x) = x^T A x = x^T P^T D P x = (Px)^T D (Px) = ∑_{i=1}^N λ_i (Px)_i²

where λi are the eigenvalues and P is the matrix of eigenvectors of A.

(Fig. 69: Change of bases to convert the quadratic form to a canonical form)
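A minimal NumPy sketch of this change of basis on a made-up symmetric 2×2 matrix; here V holds the orthonormal eigenvectors as columns, so P = Vᵀ in the card's notation:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                 # symmetric matrix of the quadratic form
lam, V = np.linalg.eigh(A)                 # eigenvalues and orthonormal eigenvectors (columns)

x = np.array([0.7, -1.3])
z = V.T @ x                                # coordinates of x in the eigenvector basis
print(x @ A @ x)                           # the quadratic form x'Ax
print(np.sum(lam * z ** 2))                # the same value as a sum of squares Σ λ_i z_i²
```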

43
New cards

Define
a) positive definite
b) negative definite

Quadratic form Q(x) is called (_a_) if

Q(x)=xTAx>0 for all x≠0

Quadratic form Q(x) is called (_b_) if

Q(x)=xTAx<0 for all x≠0

  • the examples above show these cases

  • note that because every quadratic form can be represented by a linear combination of squares, the characteristic properties are the signs of the coefficients in the canonical form!

44
New cards

Law of inertia

Fact (law of inertia)

The number of coefficients of a given sign in the canonical form of a quadratic form does not depend on the choice of the basis

45
New cards

Define Definite

Positive and negative definite quadratic forms are called (_)

Note that for some matrices it may be that

  • for some x: xTAx < 0

  • for some x: xTAx > 0

In this case A is called indefinite

(Fig. 72: Indefinite quadratic function Q(x) = x_1²/2 + 8x_1x_2 + x_2²/2)

46
New cards

Define positive semi-definite (aka. non-negative definite)

Quadratic form Q(x) is called non-negative definite (positive semi-definite) if

Q(x)=xTAx⩾0 for all x≠0

Quadratic form Q(x) is called non-positive definite (negative semi-definite) if

Q(x)=xTAx⩽0 for all x≠0

47
New cards

Facts definite(ness)

Fact: If the quadratic form x^T A x is positive definite, then det(A) > 0

Fact (definiteness)

Let A be an (N×N) symmetric square matrix. The corresponding quadratic form Q(x) = x^T Ax is positive definite if and only if all leading principal minors of A are positive

x^T Ax > 0 for all x≠0 ⟺ M_k > 0 for all k=1,…,N

Quadratic form Q(x) = x^T Ax is negative definite if and only if the leading principal minors of A alternate in sign according to the following rule

x^T Ax < 0 for all x≠0 ⟺ (−1)^k M_k > 0 for all k=1,…,N
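A small Python sketch (assuming NumPy; the matrix is made up) of this leading-principal-minor test, i.e. Sylvester's criterion:

```python
import numpy as np

def leading_principal_minors(A):
    """Determinants of the upper-left k x k submatrices of A, k = 1..N."""
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])

M = leading_principal_minors(A)
print(M)                                            # [2.0, 3.0, 4.0]
print(all(m > 0 for m in M))                        # all positive: positive definite
print(all((-1) ** k * m > 0
          for k, m in enumerate(M, start=1)))       # alternating-sign test for negative definite: False
```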

48
New cards

Fact (semi-definiteness)

Fact (semi-definiteness)

Let A be an (N×N) symmetric square matrix. The corresponding quadratic form Q(x) = x^T Ax is positive semi-definite if and only if all principal minors of A are non-negative.

x^T Ax ⩾ 0 for all x≠0 ⟺ ∀ P_k: P_k ⩾ 0 for all k=1,…,N

Quadratic form Q(x) = x^T Ax is negative semi-definite if and only if all principal minors of order k of A are zero or have the same sign as (−1)^k

x^T Ax ⩽ 0 for all x≠0 ⟺ ∀ P_k: (−1)^k P_k ⩾ 0 for all k=1,…,N

49
New cards

Components of the optimization problem

  1. Objective function: function to be maximized or minimized, also known as maximand
    In the example above profit function π(p,q)=pq−C(q) to be maximized

  2. Decision/choice variables: the variables that the agent can control in order to optimize the objective function, also known as controls
    In the example above price p and quantity q variables that the firm can choose to maximize its profit

  3. Equality constraints: restrictions on the choice variables in the form of equalities
    In the example above q=α+14αp2−p

  4. Inequality constraints (weak and strict): restrictions on the choice variables in the form of inequalities
    In the example above q≥0 and p>0

  5. Parameters: the variables that are not controlled by the agent, but affect the objective function and/or constraints
    In the example above α, β and δ are parameters of the problem

  6. Value function: the “optimized” value of the objective function as a function of parameters
    In the example above Π(α,β,δ) is the value function

50
New cards

Define the general form of the optimisation problem

The (_) is

V(θ) = max_x f(x,θ) subject to
g_i(x,θ) = 0, i ∈ {1,…,I}
h_j(x,θ) ≤ 0, j ∈ {1,…,J}

where:

  • f(x,θ): R^N×R^K → R is an objective function

  • x ∈ R^N are decision/choice variables

  • θ ∈ R^K are parameters

  • g_i(x,θ) = 0, i ∈ {1,…,I} where g_i: R^N×R^K → R, are equality constraints

  • h_j(x,θ) ≤ 0, j ∈ {1,…,J} where h_j: R^N×R^K → R, are inequality constraints

  • V(θ): R^K → R is a value function

51
New cards

Define the set of admissible choices (admissible set)

The set of (_) contains all the choices that satisfy the constraints of the optimization problem.

52
New cards

Reminder of definitions

A point x∗∈[a,b] is called a

  • maximizer of f on [a,b] if f(x∗)≥f(x) for all x∈[a,b]

  • minimizer of f on [a,b] if f(x∗)≤f(x) for all x∈[a,b]

Point x is called interior to [a,b] if a<x<b

A stationary point of f on [a,b] is an interior point x with f′(x)=0

(Fig. 73: Both x* and x** are stationary)

53
New cards

Define stationary point

Definition

Given a function f: R^N → R, a point x ∈ R^N is called a stationary point of f if ∇f(x) = 0

Definition

Given a function f: R^N → R, a point x* ∈ R^N is called a local maximiser/minimiser of f if ∃ ϵ>0 such that f(x) ≤ f(x*) (respectively f(x) ≥ f(x*)) for all x ∈ B_ϵ(x*)

If the inequality is strict, then x* is called a strict local maximiser/minimiser of f

A maximiser/minimiser (global) must also be a local one, but the opposite is not necessarily true.

54
New cards

State the necessary condition for optima

Let f(x): R^N → R be a differentiable function and let x* ∈ R^N be a local maximiser/minimiser of f.

Then x* is a stationary point of f, that is ∇f(x*)=0

55
New cards

saddle point

  • saddle point where the FOC hold, yet the point is not a local maximizer/minimizer!

  • Similar to x=0 in f(x)=x3: derivative is zero, yet the point is not an optimizer

  • How to distinguish saddle points from optima? Key insight: the function has different second order derivatives in different directions!

  • need SOC to distinguish

56
New cards

SOC

  • do not give definitive answer in all cases, unfortunately

Fact (necessary SOC)

Let f(x): R^N → R be a twice continuously differentiable function and let x* ∈ R^N be a local maximiser/minimiser of f. Then:

  • f has a local maximum at x*⟹Hf(x*) is negative semi-definite

  • f has a local minimum at x*⟹Hf(x*) is positive semi-definite

  • recall the definition of semi-definiteness

Fact (sufficient SOC)

Let f(x): R^N → R be a twice continuously differentiable function. Then:

  • if for some x*∈ RN ∇f(x*)=0 (FOC satisfied) and Hf(x*) is negative definite, then x* is a strict local maximum of f

  • if for some x*∈RN ∇f(x*)=0 (FOC satisfied) and Hf(x*) is positive definite, then x* is a strict local minimum of f

  • observe that SOC are only necessary in the “weak” form, but are sufficient in the “strong” form

  • this leaves room for ambiguity when we cannot arrive at a conclusion: a particular stationary point may still be a local maximum or minimum

  • but we can rule out saddle points for sure: at a saddle point neither semi-definiteness nor definiteness can be established, i.e. the Hessian is indefinite

57
New cards

SOC provide:

Fact Given a twice continuously differentiable function f: R2 → R and a stationary point x*: ∇f(x*)=0, the second order conditions provide:

  1. if det(Hf(x*))>0 and trace(Hf(x*))>0 ⟹

    • λ1>0 and λ2>0,

    • Hf(x*) is positive definite,

    • f has a strict local minimum at x*

  2. if det(Hf(x*))>0 and trace(Hf(x*))<0 ⟹

    • λ1<0 and λ2<0,

    • Hf(x*) is negative definite,

    • f has a strict local maximum at x*

  3. if det(Hf(x*))=0 and trace(Hf(x*))>0 ⟹

    • λ1=0 and λ2>0,

    • Hf(x*) is positive semi-definite (nonnegative definite),

    • f may have a local minimum x*

    • inconclusive!

  4. if det(Hf(x*))=0 and trace(Hf(x*))<0 ⟹

    • λ1=0 and λ2<0,

    • Hf(x*) is negative semi-definite (nonpositive definite),

    • f may have a local maximum x*

    • inconclusive!

  5. if det(Hf(x*))<0

    • λ1 and λ2 have different signs,

    • Hf(x*) is indefinite,

    • x* is a saddle point of f
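A minimal Python sketch (assuming NumPy; the example Hessians are made up) of this det/trace classification for the two-dimensional case:

```python
import numpy as np

def classify_2d(H, tol=1e-10):
    """Classify a stationary point of f: R^2 -> R from its Hessian H at that point."""
    d, t = np.linalg.det(H), np.trace(H)
    if d > tol:
        return "strict local minimum" if t > 0 else "strict local maximum"
    if d < -tol:
        return "saddle point"
    return "inconclusive (semi-definite Hessian)"

# Example: f(x, y) = x² + y² has Hessian 2I at its stationary point (0, 0)
print(classify_2d(np.array([[2.0, 0.0], [0.0, 2.0]])))   # strict local minimum
print(classify_2d(np.array([[1.0, 3.0], [3.0, 1.0]])))   # det = -8 < 0: saddle point
```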

58
New cards

Define convex (set)

Definition

A set C⊂RK is called (_) if

x, y ∈ C and 0 ≤ λ ≤ 1 ⟹ λx + (1−λ)y ∈ C

Convexity ⟺ the line segment between any two points in C lies in C

(Fig. 74: Convex set)

A non-convex set:

(Fig. 75: Non-convex set)

59
New cards

Fact about intersection of convex sets

Fact If A and B are convex, then so is A∩B

60
New cards

Define a convex function

f is called (_) if

f(λx+(1−λ)y) ≤ λf(x)+(1−λ)f(y)

for all x,y∈A and all λ∈[0,1]

f is called strictly convex if

f(λx+(1−λ)y)<λf(x)+(1−λ)f(y)

for all x,y∈A with x≠y and all λ∈(0,1)

(Fig. 76: A strictly convex function on a subset of R)

61
New cards

Fact about convex function

Fact If A is K×K and positive definite, then

Q(x) = x′Ax  (x ∈ R^K)

is strictly convex on RK

62
New cards

Define a concave function

f is called concave if

f(λx+(1−λ)y) ≥ λf(x)+(1−λ)f(y)

for all x,y∈A and all λ∈[0,1]

f is called strictly concave if

f(λx+(1−λ)y) > λf(x)+(1−λ)f(y)

for all x,y∈A with x≠y and all λ∈(0,1)

Fact

f:A→R is concave if and only if its hypograph (aka subgraph)

Hf:={(x,y)∈A×R:f(x)≥y}

is a convex subset of RK×R

(Fig. 79: Convex hypograph/subgraph of a function)

63
New cards

Preservation of Shape

Fact If f and g are convex (resp., concave) and α≥0, then

  • αf is convex (resp., concave)

  • f+g is convex (resp., concave)

If f and g are strictly convex (resp., strictly concave) and α>0, then

  • αf is strictly convex (resp., strictly concave)

  • f+g is strictly convex (resp., strictly concave)

64
New cards

Hessian based shape criterion

If f: A→R is a C2 function where A⊂RK is open and convex, then

  1. Hf(x) positive semi-definite (nonnegative definite) for all x∈A ⟺f convex

  2. Hf(x) negative semi-definite (nonpositive definite) for all x∈A ⟺f concave

In addition,

  1. Hf(x) positive definite for all x∈A ⟹f strictly convex

  2. Hf(x) negative definite for all x∈A ⟹f strictly concave

65
New cards

Advantages of convexity/concavity (two)

  1. Sufficiency of first order conditions for optimality

  2. Uniqueness of optimizers with strict convexity/concavity

Let A ⊂ R^K be a convex set and let f: A→R be a concave/convex differentiable function.

Then x* ∈ A is a maximizer/minimizer of f on A if and only if x* is a stationary point of f on A.

66
New cards

Uniqueness

Let A ⊂ R^K be a convex set and let f: A→R

  1. If f is strictly convex, then f has at most one minimizer on A

  2. If f is strictly concave, then f has at most one maximizer on A

Interpretation, strictly concave case:

  • we don’t know in general if f has a maximizer

  • but if it does, then it has exactly one

  • in other words, we have uniqueness

67
New cards

Sufficient and necessary conditions (sets)

Let A ⊂ R^K be a convex set, let f: A→R, and let x* ∈ A be stationary; then

  1. f strictly concave ⟹ x* is the unique maximizer of f on A

  2. f strictly convex ⟹ x* is the unique minimizer of f on A

(Fig. 80: Unique maximizer for a strictly concave function)

68
New cards

Roadmap for unconstrained optimization

  1. Assuming that the domain of the objective function is the whole space, check that it is continuous and for derivative based methods twice continuously differentiable

  2. Derive the gradient and Hessian of the objective function. If it can be shown that the function is globally concave or convex, checking SOC (below) is not necessary, and global optima can be established

  3. Find all stationary points by solving FOC

  4. Check SOC to determine whether found stationary points are (local) maxima or minima and filter out saddle points

  5. Assess whether any of the local optima are global by comparing function values at these points, and inspecting the shape of the function to determine if its value anywhere exceeds the best local optimum (limits at the positive and negative infinities, possible vertical asymptotes, etc.)
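A minimal Python sketch of this roadmap on a made-up strictly convex objective (assuming NumPy and SciPy are available):

```python
import numpy as np
from scipy.optimize import minimize

# Step 1-2: f(x, y) = x² + 2y² - xy - 4x is C² with constant Hessian, so check convexity once
f    = lambda v: v[0] ** 2 + 2 * v[1] ** 2 - v[0] * v[1] - 4 * v[0]
grad = lambda v: np.array([2 * v[0] - v[1] - 4, 4 * v[1] - v[0]])
H    = np.array([[2.0, -1.0], [-1.0, 4.0]])

print(np.all(np.linalg.eigvalsh(H) > 0))   # Hessian positive definite: f is strictly convex

# Step 3: solve the FOC numerically (step 4-5 are unnecessary here by global convexity)
res = minimize(f, x0=np.zeros(2), jac=grad)
print(res.x)                               # unique stationary point ≈ (16/7, 4/7), the global minimum
```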

69
New cards

Define the general form of the optimization problem

The (_) is

V(θ) = max_x f(x,θ) subject to
gi(x,θ) = 0, i∈{1,…,I}
hj(x,θ) ≤ 0, j∈{1,…,J}

where:

  • f(x,θ): R^N×R^K → R is an objective function

  • x ∈ R^N are decision/choice variables

  • θ ∈ R^K are parameters

  • g_i(x,θ) = 0, i ∈ {1,…,I} where g_i: R^N×R^K → R, are equality constraints

  • h_j(x,θ) ≤ 0, j ∈ {1,…,J} where h_j: R^N×R^K → R, are inequality constraints

  • V(θ): R^K → R is a value function

70
New cards

State the Lagrange theorem

Let f: R^N → R and g: R^N → R^K be continuously differentiable functions.

Let D = {x: g_i(x) = 0, i = 1,…,K} ⊂ R^N

Suppose that

  • x*∈D is a local extremum of f on D, and

  • the gradients of the constraint functions gi are linearly independent at x* (equivalently, the rank of the Jacobian matrix Dg(x*) is equal to K)

Then there exists a vector λ* ∈ RK such that

Df(x*) − λ*⋅Dg(x*) = Df(x*) − ∑_{i=1}^K λ_i* Dg_i(x*) = 0

71
New cards

Note on Lagrange

  • λ* is called the vector of Lagrange multipliers

  • The assumption that the gradients of the constraint functions gi are linearly independent at x* is called the constraint qualification assumption

  • Lagrange theorem provides necessary first order conditions for both maximum and minimum problems

72
New cards

Define the Lagrangian function

Definition

The function L: R^{N+K} → R defined as

L(x,λ) = f(x) − λ⋅g(x) = f(x) − ∑_{i=1}^K λ_i g_i(x)

is called a Lagrangian (function)

73
New cards

Hessian of the Lagrangian

Using the rules of the differentiation from multivariate calculus we can express the gradient and the Hessian of the Lagrangian w.r.t. x as

∇_x L(λ,x) = Df(x) − λ⋅Dg(x)
HxL(λ,x) = Hf(x) − λ⊙Hg(x)

  • where ⊙ denotes multiplication of the vector λ and the Hessian (K,N,N) tensor Hg(x) along the first dimension

74
New cards

Fact (necessary SOC)

Let f: R^N → R and g: R^N → R^K be twice continuously differentiable functions (C²) and let D = {x: g_i(x)=0, i=1,…,K} ⊂ R^N

Suppose x* ∈ D is the local constrained optimizer of f on D and that the constraint qualification assumption holds at x* and some λ* ∈ RK

Then:

  • If f has a local maximum on D at x*, then HxL(λ,x) defined above is negative semi-definite on a linear constraint set {v: Dg(x*)v=0}, that is

∀v ∈ R^N and Dg(x*)v=0 ⟹ v′HxL(λ,x)v ≤ 0

  • If f has a local minimum on D at x*, then HxL(λ,x) defined above is positive semi-definite on a linear constraint set {v: Dg(x*)v=0}, that is

∀v ∈ RN and Dg(x*)v=0 ⟹ v′HxL(λ,x)v ≥ 0

75
New cards

Fact (sufficient SOC)

Let f: R^N → R and g: R^N → R^K be twice continuously differentiable functions (C²) and let D = {x: g_i(x)=0, i=1,…,K} ⊂ R^N

Suppose there exists x* ∈ D such that rank(Dg(x⋆))=K and Df(x*)−λ*⋅Dg(x*) = 0 for some λ* ∈ RK (the constraint qualification assumption holds at x* which satisfies the FOC)

Then:

  • If HxL(λ,x) defined above is negative definite on a linear constraint set {v: Dg(x*)v=0}, that is

∀v ≠ 0 and Dg(x*)v = 0 ⟹ v′HxL(λ,x)v < 0,

then x* is the local constrained maximum of f on D

  • If HxL(λ,x) defined above is positive definite on a linear constraint set {v: Dg(x*)v=0}, that is

∀v ≠ 0 and Dg(x*)v=0⟹v′HxL(λ,x)v > 0

then x* is the local constrained minimum of f on D

76
New cards

Define bordered Hessian

Definition

The Hessian of the Lagrangian with respect to (λ,x) is called a (_), and is given by

HL(λ,x) = ( [0]_{K×K}           [−Dg(x)]_{K×N}
            [−Dg(x)]′_{N×K}     [Hf(x)−λ⊙Hg(x)]_{N×N} )

77
New cards

Define
a) positive definite subject to a linear constraint
b) negative definite subject to a linear constraint

Definition

Let x ∈ R^N, let A be an (N×N) symmetric matrix, and let B be a (K×N) matrix.

Quadratic form Q(x) is called positive definite subject to a linear constraint Bx=0 if

Q(x)=xTAx>0 for all x≠0 and Bx=0

Quadratic form Q(x) is called negative definite subject to a linear constraint Bx=0 if

Q(x)=xTAx<0 for all x≠0 and Bx=0

Positive and negative semi-definiteness subject to a linear constraint is obtained by replacing strict inequalities with non-strict ones.

recall that Bx=0 is equivalent to saying that x is in the null space (kernel) of B

78
New cards

Definiteness of quadratic form subject to linear constraint table

In order to establish definiteness of the quadratic form x^T Ax, x ∈ R^N, subject to a linear constraint Bx=0, we need to check the signs of the N−K leading principal minors of the ((N−K)×(N−K)) matrix E which represents the equivalent lower-dimensional quadratic form without the constraint. The signs of these leading principal minors correspond to (−1)^K times the signs of the determinants of the corresponding bordered matrix H, where a number of last rows and columns are dropped, according to the following table:

Order j of leading principal minor in E | Include first rows/columns of H | Drop last rows/columns of H
1 | 2K+1 | N−K−1
2 | 2K+2 | N−K−2
⋮ | ⋮ | ⋮
N−K−1 | N+K−1 | 1
N−K | N+K | 0

Clearly, this table has N−K rows, and it is N−K determinants that have to be computed.

The definiteness of the quadratic form subject to the linear constraint is then given by the pattern of signs of these determinants according to Sylvester's criterion.

Note that computing determinants of H after dropping a number of last rows and columns is effectively computing the last leading principal minors of H.

If any of the leading principal minors turns out to be zero, we can move on and check the patterns for semi-definiteness.

79
New cards

Semi-definiteness of quadratic form subject to linear constraint

In order to establish semi-definiteness of the quadratic form x^T Ax, x ∈ R^N, subject to a linear constraint Bx=0, we need to check the signs of the principal minors of the ((N−K)×(N−K)) matrix E which represents the equivalent lower-dimensional quadratic form without the constraint. The signs of these principal minors correspond to (−1)^K times the signs of the determinants of the corresponding bordered matrix H, where a number of rows and columns are dropped from among the last N−K rows and columns (in all combinations), according to the following table:

Order j of principal minor in E | Number of principal minors | Drop among last N−K rows/columns
1 | N−K | N−K−1
2 | C(N−K,2) | N−K−2
⋮ | ⋮ | ⋮
N−K−1 | N−K | 1
N−K | 1 | 0

The table again has N−K rows, but each row corresponds to a different number of principal minors of a given order, namely for order j there are (N−K)-choose-j principal minors, given by the corresponding number of combinations C(N−K, j).

The total number of principal minors to be checked is 2^(N−K) − 1, given by the sum of these binomial coefficients.

The definiteness of the quadratic form subject to the linear constraint is then given by the pattern of signs of these determinants according to Sylvester's criterion.

In what follows we will refer to computing determinants of H after dropping a number of rows and columns among the last N−K rows and columns as computing the last principal minors of H. The number of last principal minors will correspond to the number of lines in the table above, and not to the number of determinants to be computed.

80
New cards

Fact (definiteness of Hessian on linear constraint set)

Fact

For f: R^N → R and g: R^N → R^K twice continuously differentiable functions (C²):

HxL(λ,x) is positive definite on a linear constraint set {v: Dg(x*)v = 0} ⟺ the last N−K leading principal minors of HL(λ,x) are non-zero and have the same sign as (−1)^K.

HxL(λ,x) is negative definite on a linear constraint set {v: Dg(x*)v = 0} ⟺ the last N−K leading principal minors of HL(λ,x) are non-zero and alternate in sign, with the sign of the determinant of the full Hessian matrix equal to (−1)^N.

HxL(λ,x) is positive semi-definite on a linear constraint set {v: Dg(x*)v = 0} ⟺ the last N−K principal minors of HL(λ,x) are zero or have the same sign as (−1)^K.

HxL(λ,x) is negative semi-definite on a linear constraint set {v: Dg(x*)v = 0} ⟺ the last N−K principal minors of HL(λ,x) are zero or alternate in sign, with the sign of the determinant of the full Hessian matrix equal to (−1)^N.

  • see above the exact definition of the last principal minors and how to count them

81
New cards

Fact (necessary SOC)

Let f: R^N → R and g: R^N → R^K be twice continuously differentiable functions (C²), and let D = {x: g_i(x)=0, i=1,…,K} ⊂ R^N

Suppose x* ∈ D is the local constrained optimizer of f on D and that the constraint qualification assumption holds at x* and some λ* ∈ RK

Then:

  • If f has a local maximum on D at x*, then the last N−K principal minors of bordered Hessian HL(λ,x) are zero or alternate in sign, with the sign of the determinant of the full Hessian matrix equal to (−1)N

  • If f has a local minimum on D at x*, then the last N−K principal minors of bordered Hessian HL(λ,x) are zero or have the same sign as (−1)K

  • see above the exact definition of the last principal minors and how to count them

82
New cards

Define sufficient SOC for function

Let f: R^N → R and g: R^N → R^K be twice continuously differentiable functions (C²) and let D = {x: g_i(x)=0, i=1,…,K} ⊂ R^N

Suppose there exists x* ∈ D such that rank(Dg(x*))=K and Df(x*)−λ*⋅Dg(x*)=0 for some λ* ∈ RK (i.e. the constraint qualification assumption holds at (x*,λ*) which satisfies the FOC)

Then:

  • If the last N−K leading principal minors of the bordered Hessian HL(λ,x) are non-zero and alternate in sign, with the sign of the determinant of the full Hessian matrix equal to (−1)N, then x* is the local constrained maximum of f on D

  • If the last N−K leading principal minors of the bordered Hessian HL(λ,x) are non-zero and have the same sign as (−1)K, then x* is the local constrained minimum of f on D

83
New cards

Lagrangian method: algorithm

  1. Write down the Lagrangian function L(x,λ)

  2. Find all stationary points of L(x,λ) with respect to x and λ, i.e. solve the system of first order equations

  3. Derive the bordered Hessian HL(x,λ) and compute its value at the stationary points

  4. Using second order conditions, check if the stationary points are local optima

  5. To find the global optima, compare the function values at all identified local optima

  • note that using convexity/concavity of the objective function is not as straightforward as in the unconstrained case, requires additional assumptions on the convexity of the constrained set
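A minimal numerical sketch of the same kind of equality-constrained problem using SciPy's generic constrained optimizer rather than a hand-derived Lagrangian (the objective and constraint are made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Example: maximize f(x, y) = x*y subject to x + y = 10 (solution x = y = 5)
f = lambda v: v[0] * v[1]
constraint = {"type": "eq", "fun": lambda v: v[0] + v[1] - 10.0}

# scipy minimizes, so pass -f to maximize
res = minimize(lambda v: -f(v), x0=np.array([1.0, 1.0]), constraints=[constraint])
print(res.x)                     # ≈ [5, 5]

# Lagrangian check by hand: L = xy - λ(x + y - 10); FOC give y = λ, x = λ, x + y = 10
```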

84
New cards

Lagrange method may not work when:

  • Constraint qualification assumption is not satisfied at the stationary point

  • FOC system of equations does not have solutions (thus no local optima)

  • Saddle points – second order conditions can help

  • Local optimizer is not a global one – no way to check without studying analytic properties of each problem

Conclusion: great tool, but blind application may lead to wrong results!

85
New cards

Karush-Kuhn-Tucker conditions (maximization)

Let f:RN→R and g:RN→RK be continuously differentiable functions.

Let D={x: gi(x)≤0, i=1,…,K} ⊂ RN

Suppose that x⋆∈D is a local maximum of f on D and that the gradients of the constraint functions gi corresponding to the binding constraints are linearly independent at x⋆ (equivalently, the rank of the matrix composed of the gradients of the binding constraints is equal to the number of binding constraints).

Then there exists a vector λ⋆∈RK such that

Df(x*) − λ*⋅Dg(x*) = Df(x*) − ∑_{i=1}^K λ_i* Dg_i(x*) = 0

and

λ_i* ≥ 0 with λ_i* g_i(x*) = 0, i = 1,…,K
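A small numerical sketch (assuming NumPy and SciPy; the objective and constraint are made up) that solves an inequality-constrained problem and checks the KKT sign and complementary-slackness conditions at the solution:

```python
import numpy as np
from scipy.optimize import minimize

# Example: maximize f(x, y) = -(x - 2)² - (y - 1)² subject to g(x, y) = x + y - 2 <= 0
f = lambda v: -(v[0] - 2) ** 2 - (v[1] - 1) ** 2
g = lambda v: v[0] + v[1] - 2.0

# scipy's 'ineq' constraints require fun(v) >= 0, so pass -g to encode g <= 0
res = minimize(lambda v: -f(v), x0=np.zeros(2),
               constraints=[{"type": "ineq", "fun": lambda v: -g(v)}])
x = res.x
print(x)                    # constrained optimum ≈ (1.5, 0.5); the constraint binds

# KKT multiplier from the FOC Df(x) = λ Dg(x): here Df = (-2(x-2), -2(y-1)) and Dg = (1, 1)
lam = -2 * (x[0] - 2)
print(lam >= 0, np.isclose(lam * g(x), 0.0, atol=1e-6))   # λ ≥ 0 and complementary slackness
```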

86
New cards

Karush-Kuhn-Tucker conditions (minimization)

In the settings of KKT conditions (maximization), suppose that x⋆∈D is a local minimum of f on D, and as before the matrix composed of the gradients of the binding constraints has full rank.

Then there exists a vector λ*∈RK such that (note opposite sign)

Df(x*) + λ*⋅Dg(x*) = Df(x*) + ∑_{i=1}^K λ_i* Dg_i(x*) = 0

and

λ_i* ≥ 0 with λ_i* g_i(x*) = 0, i = 1,…,K

  • Very similar to the Lagrange theorem, but now we have inequalities!

87
New cards

The general form of the optimisation problem

V(θ) = max_x f(x,θ) subject to
gi(x,θ) = 0, i∈{1,…,I}
hj(x,θ) ≤ 0, j∈{1,…,J}

where:

  • f(x,θ): R^N×R^K → R is an objective function

  • x ∈ RN are decision/choice variables

  • θ ∈ RK are parameters

  • gi(x,θ)=0, i ∈ {1,…,I} where
    g_i: R^N×R^K → R, are equality constraints

  • hj(x,θ)≤0, j ∈ {1,…,J} where
    hj:RN×RK → R, are inequality constraints

  • V(θ): R^K → R is a value function

88
New cards

Envelope theorem for unconstrained problems

Let f(x,θ): R^N×R^K → R be a differentiable function, and let x*(θ) be the maximizer of f(x,θ) for every θ. Suppose that x*(θ) is a differentiable function itself. Then the value function of the problem V(θ) = f(x*(θ),θ) is differentiable w.r.t. θ and

∂V/∂θ_j = (∂f/∂θ_j)(x*(θ), θ), ∀j

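A minimal numerical sketch (plain Python/NumPy; the example f is made up) checking that dV/dθ equals ∂f/∂θ evaluated at the maximizer:

```python
# Example: f(x, θ) = -x² + 2θx, so the FOC gives x*(θ) = θ and V(θ) = θ²
# Envelope theorem: dV/dθ = ∂f/∂θ at x*(θ), i.e. 2·x*(θ) = 2θ
theta, h = 1.5, 1e-6
x_star = lambda t: t
V = lambda t: -x_star(t) ** 2 + 2 * t * x_star(t)

dV_fd = (V(theta + h) - V(theta)) / h          # numerical derivative of the value function
df_dtheta = 2 * x_star(theta)                  # partial of f w.r.t. θ, holding x fixed at x*(θ)
print(dV_fd, df_dtheta)                        # both ≈ 3.0
```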
89
New cards

Envelope theorem for constrained problems

Consider an equality constrained optimization problem

V(θ) = max_x f(x,θ) subject to g_i(x,θ) = 0, i ∈ {1,…,I}

where:

  • f(x,θ): RN×RK→R is an objective function

  • x ∈ RN are decision/choice variables

  • θ ∈ RK are parameters

  • g_i(x,θ) = 0, i ∈ {1,…,I} where g_i: R^N×R^K → R, are equality constraints

  • V(θ): R^K → R is a value function

Assume that the maximizer correspondence D*(θ) = argmax f(x,θ) is single-valued and can be represented by the function x*(θ): R^K → R^N, with the corresponding Lagrange multipliers λ*(θ): R^K → R^I.

Assume that both x*(θ) and λ*(θ) are differentiable, and that the constraint qualification assumption holds. Then

∂V/∂θ_j = (∂L/∂θ_j)(x*(θ), λ*(θ), θ), ∀j

where L(x,λ,θ) is the Lagrangian of the problem.