Exercise 1 - Taylor Series

Compute the Taylor series expansion up to the second-order term for the following functions around the given point:

  1. \(f(x) = 5x^3\) around \(x_0=1\).
  2. \(f(x,y) = x^2 \cdot y^3 + x^2\) around \(x_0=3\), \(y_0=2\).
  3. \(f(\fx) = x_1^3 \cdot x_2 \cdot \log(x_2)\) around \(\fx_0=(2,1)^\top\).
  4. \(f(\fx) = \sin(x_1) + \cos(x_2)\) around \(\fx_0=(-\pi,\pi)^\top\).

Solution

There are different ways to write the second-order Taylor series expansion of a multivariate function \(f\) at a point \(\fa\). We will use the following form:

\[T(\fx) = f(\fa) + \nabla f(\fa)^\top (\fx-\fa) + \frac{1}{2}(\fx-\fa)^\top \fH (\fx-\fa)\]

where \(\nabla f(\fa)\) is the gradient of \(f\) at \(\fa\) and \(\fH\) is the Hessian of \(f\) at \(\fa\). As a reminder, the Hessian is the matrix of second-order partial derivatives. So, for each exercise all we need to do is evaluate \(f(\fa)\) and compute the gradient and Hessian of \(f\).
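
This form translates directly into code. The following is a minimal numerical sketch (assuming numpy is available; the gradient and Hessian are passed as callables returning arrays, and the function name is just an illustrative choice):

import numpy as np

def taylor_second_order(f, grad, hess, a):
    # second-order Taylor approximation of f around a:
    # T(x) = f(a) + grad(a)^T (x - a) + 1/2 (x - a)^T H(a) (x - a)
    a = np.asarray(a, dtype=float)
    fa, ga, Ha = f(a), np.asarray(grad(a)), np.asarray(hess(a))

    def T(x):
        d = np.asarray(x, dtype=float) - a
        return fa + ga @ d + 0.5 * d @ Ha @ d

    return T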

  1. \(T(x) = 5 + 15 (x-1) + 15 (x-1)^2\)

  2. First, we simplify \(f(x,y)=x^2 (y^3+1)\) and compute \(f(3,2)=81\). Then we compute the gradient and Hessian of \(f\), resulting in \[ \nabla f(x,y) = \bpmat 2x (y^3+1) \\ 3x^2y^2 \epmat,\quad \fH = \bpmat 2(y^3+1) & 6 xy^2 \\ 6xy^2 & 6x^2y \epmat. \] A symbolic sanity check of these derivatives is sketched after this list.

  3. This one is similar to (2), but we have to be careful with the logarithm. First, we have \(f(\fx_0)=0\) because \(\log(1)=0\). Then, we compute the gradient and Hessian of \(f\), resulting in \[ \nabla f(\fx) = \bpmat 3x_1^2 x_2 \log(x_2) \\ x_1^3 (1+\log(x_2)) \epmat,\quad \fH = \bpmat 6x_1 x_2 \log(x_2) & 3x_1^2 (1+\log(x_2)) \\ 3x_1^2 (1+\log(x_2)) & x_1^3 x_2^{-1} \epmat. \]

  4. Evaluating the trigonometric functions here is simpler than it seems because they are evaluated at \(\pi\) and \(-\pi\). We get \(f(\fx_0) = \sin(-\pi) + \cos(\pi) = 0 - 1 = -1\) and \[ \nabla f(\fx) = \bpmat \cos(x_1) \\ -\sin(x_2) \epmat, \quad \fH = \bpmat -\sin(x_1) & 0 \\ 0 & -\cos(x_2) \epmat. \]
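
The derivatives above are easy to verify symbolically. A minimal sketch, assuming sympy is available, for exercise (2):

import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y**3 + x**2

grad = sp.Matrix([sp.diff(f, v) for v in (x, y)])   # gradient: (2*x*y**3 + 2*x, 3*x**2*y**2)
hess = sp.hessian(f, (x, y))                        # [[2*y**3 + 2, 6*x*y**2], [6*x*y**2, 6*x**2*y]]

print(grad.subs({x: 3, y: 2}))                      # gradient at (3, 2): (54, 108)
print(hess.subs({x: 3, y: 2}))                      # Hessian at (3, 2): [[18, 72], [72, 108]]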

Exercise 2 - Eigenvalues, Eigenvectors

You are given the following sets of eigenvalues and eigenvectors. Compute the corresponding matrix.

  1. \(\la_1 = 2\), \(\la_2 = 3\), \(\fv_1 = (1,0)^\top\), \(\fv_2 = (0,1)^\top\).
  2. \(\la_1 = 2\), \(\la_2 = 3\), \(\fv_1 = (1,1)^\top\), \(\fv_2 = (1,-1)^\top\).

Solution

First, remember that eigenvectors of a symmetric matrix that belong to distinct eigenvalues are orthogonal. For normalized eigenvectors \(\fte_i\) we therefore have \[ \fte_i^\top \fte_j = \begin{cases} 1 & i=j \\ 0 & i\neq j \end{cases}. \]

Second, for a symmetric matrix \(\fA\), the spectral decomposition is given by \(\fA = \fQ \fLa \fQ^\top\), where \(\fQ\) is the matrix whose columns are the orthonormal eigenvectors and \(\fLa\) is the diagonal matrix of the corresponding eigenvalues.

  1. Here, the eigenvectors are already normalized and orthogonal, so we can simply write \(\fQ = (\fv_1, \fv_2)\) and \(\fLa = \diag(\la_1, \la_2)\). Then, we have \[ \fA = \bpmat 2 & 0 \\ 0 & 3 \epmat. \]

  2. Here, we have to normalize the eigenvectors first. Each has length \(\sqrt{2}\), so we divide each of them by \(\sqrt{2}\), i.e. we set \(\fte_i:=\fv_i/\sqrt{2}\). With this, we can construct an orthogonal matrix of eigenvectors \(\fQ = (\fte_1, \fte_2)\). The resulting matrix is \[ \fA = \fQ \fLa \fQ^\top = \bpmat 2.5 & -0.5 \\ -0.5 & 2.5 \epmat. \] A numerical check of this reconstruction is sketched below.
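
A minimal sketch of that check, assuming numpy is available:

import numpy as np

Q = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)   # normalized eigenvectors as columns
Lam = np.diag([2.0, 3.0])                  # eigenvalues on the diagonal

A = Q @ Lam @ Q.T                          # spectral reconstruction
print(A)                                   # [[ 2.5 -0.5]
                                           #  [-0.5  2.5]]

# sanity check: numpy recovers the eigenvalues of A
print(np.linalg.eigvalsh(A))               # [2. 3.]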

Exercise 3 - SGD with Momentum

Implement stochastic gradient descent with momentum and apply it to optimize some elementary functions in 1d and 2d.

Solution

An example implementation could look like this. First, we define the objective function and its gradient:

def objective(x):
    return x**2

def obj_grad(x):
    return 2*x

Then, we implement the momentum optimizer itself:

def sgd_with_momentum(obj, grad, x_init, learning_rate, momentum, max_iter):
    x = x_init
    update = 0
    for i in range(max_iter):
        # keep an exponentially decaying sum of past gradient steps
        update = momentum * update - learning_rate * grad(x)
        x = x + update
        # report progress
        print('>%d f(%s) = %.5f' % (i, x, obj(x)))
    return x

It can now be invoked directly via:

sgd_with_momentum(
    objective, obj_grad, x_init=3.0, learning_rate=0.1, momentum=0.5, 
    max_iter=20
    )
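
The exercise also asks for a 2d example. The same routine works unchanged as long as the gradient returns a vector; a minimal sketch, assuming numpy is available (the function \(f(\fx) = x_1^2 + 10 x_2^2\) and the names objective_2d and obj_grad_2d are illustrative choices):

import numpy as np

def objective_2d(x):
    return x[0]**2 + 10 * x[1]**2

def obj_grad_2d(x):
    return np.array([2 * x[0], 20 * x[1]])

sgd_with_momentum(
    objective_2d, obj_grad_2d, x_init=np.array([3.0, -2.0]),
    learning_rate=0.02, momentum=0.5, max_iter=30
    )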