Exercise 1 - Variance

Show that for two independent random variables \(X,Y\) and arbitrary \(a,b\in\R\), the following equality holds: \[\Var(aX+bY) = a^2\cdot\Var(X) + b^2\cdot\Var(Y).\]

Solution

First, we use the definition of variance to rewrite the left-hand side as \[ \Var(aX+bY) = \E\li[ (aX+bY)^2 \ri] - \E[aX+bY]^2. \] Next, we expand the squares for each of the terms on the right-hand side; note that the step \(\E[XY] = \E[X]\,\E[Y]\) uses the independence of \(X\) and \(Y\): \[\begin{align*} \E\li[ (aX+bY)^2 \ri] &= \E\li[ a^2X^2 + 2abXY + b^2Y^2 \ri] \\ &= a^2\E\li[ X^2 \ri] + 2ab\E[ XY ] + b^2\E\li[ Y^2 \ri] \\ &= a^2\E\li[ X^2 \ri] + 2ab\E[ X]\,\E[Y] + b^2\E\li[ Y^2 \ri] ,\\ \E[aX+bY]^2 &= \li(a\E[X] + b\E[Y]\ri)^2 \\ &= a^2\E[X]^2 + 2ab\E[X]\,\E[Y] + b^2\E[Y]^2. \end{align*}\] Subtracting the two terms, we get \[\begin{align*} & \E\li[ (aX+bY)^2 \ri] - \E[aX+bY]^2 \\ & \quad = a^2\E\li[ X^2 \ri] + 2ab\E[ X]\,\E[Y] + b^2\E\li[ Y^2 \ri] - a^2\E[X]^2 - 2ab\E[X]\,\E[Y] - b^2\E[Y]^2 \\ & \quad = a^2\E\li[ X^2 \ri] - a^2\E[X]^2 + b^2\E\li[ Y^2 \ri] - b^2\E[Y]^2 \\ & \quad = a^2\li(\E\li[ X^2 \ri] - \E[X]^2\ri) + b^2\li(\E\li[ Y^2 \ri] - \E[Y]^2\ri) \\ & \quad = a^2\cdot\Var(X) + b^2\cdot\Var(Y). \end{align*}\]
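The identity can also be sanity-checked with a quick Monte Carlo simulation. The sketch below uses NumPy; the particular distributions, the constants \(a,b\), and the sample size are arbitrary illustrative choices, not part of the exercise.

```python
# Monte Carlo check of Var(aX + bY) = a^2 Var(X) + b^2 Var(Y)
# for independent X and Y. Distributions and constants are
# arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
a, b = 2.0, -3.0

X = rng.normal(loc=1.0, scale=2.0, size=n)  # Var(X) = 4
Y = rng.exponential(scale=0.5, size=n)      # Var(Y) = 0.25

lhs = np.var(a * X + b * Y)
rhs = a**2 * np.var(X) + b**2 * np.var(Y)
print(lhs, rhs)  # both ~18.25, equal up to Monte Carlo error
```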

Exercise 2 - Variance / Bias Decomposition

Let \(D=\lbrace (x_i,y_i) \mid i=1,\ldots,n\rbrace\) be a dataset drawn from the true underlying data distribution \(P\), i.e. \(D\sim P^n\), and let \(h_D(\cdot)\) be a regressor trained on \(D\). Show the variance-bias decomposition \[ \underbrace{\E_{D,x,y} \li[ (h_D(x) - y)^2 \ri]}_{\text{Expected test error}} = \underbrace{\E_{D,x} \li[ (h_D(x) - \hh(x))^2 \ri]}_{\text{Variance}} + \underbrace{\E_{x,y} \li[ (\hy(x) - y)^2 \ri]}_{\text{Noise}} + \underbrace{\E_{x} \li[ (\hh(x) - \hy(x))^2 \ri]}_{\text{Bias}^2} \] where \(\hh(x) = \E_{D \sim P^n}[h_D(x)]\) is the expected regressor over possible training sets, given the learning algorithm \(\mathcal{A}\), and \(\hy(x) = \E_{y \mid x}[y]\) is the expected label given \(x\). As mentioned in the lecture, labels might not be deterministic given \(x\). To carry out the proof, proceed in the following steps:

  1. Show that the following identity holds \[\begin{align} \E_{D,x,y}\li[\li(h_{D}(x) - y\ri)^{2}\ri] = \E_{D, x}\li[\li(h_{D}(x) - \hh(x)\ri)^{2}\ri] + \E_{x, y} \li[\li(\hh(x) - y\ri)^{2}\ri]. \end{align}\]
  2. Next, show \[\begin{align} \E_{x, y} \li[ \li(\hh(x) - y \ri)^{2}\ri] = \E_{x, y} \li[\li(\hy(x) - y\ri)^{2}\ri] + \E_{x} \li[\li(\hh(x) - \hy(x)\ri)^{2}\ri] \end{align}\] which completes the proof by substituting (2) into (1).

Solution

  1. First, we reformulate (1) as \[\begin{align*} \E_{D,x,y}\li[\li(h_{D}(x) - y\ri)^{2}\ri] &= \E_{D,x,y}\li[\li(\li(h_{D}(x) - \hh(x)\ri) + \li(\hh(x) - y\ri)\ri)^{2}\ri] \\ &= \E_{D,x}\li[\li(h_{D}(x) - \hh(x)\ri)^{2}\ri] + 2\,\E_{D,x,y} \li[\li(h_{D}(x) - \hh(x)\ri)\li(\hh(x) - y\ri)\ri] + \E_{x, y} \li[\li(\hh(x) - y\ri)^{2}\ri]. \end{align*}\] Next, we note that the second term above is zero because \[\begin{align*} \E_{D,x,y} \li[\li(h_{D}(x) - \hh(x)\ri) \li(\hh(x) - y\ri)\ri] &= \E_{x, y} \li[\E_{D} \li[ h_{D}(x) - \hh(x)\ri] \li(\hh(x) - y\ri) \ri] \\ &= \E_{x, y} \li[ \li( \E_{D} \li[ h_{D}(x) \ri] - \hh(x) \ri) \li(\hh(x) - y \ri)\ri] \\ &= \E_{x, y} \li[ \li(\hh(x) - \hh(x) \ri) \li(\hh(x) - y \ri)\ri] \\ &= \E_{x, y} \li[ 0 \ri] \\ &= 0\ , \end{align*}\] where we could pull \(\E_{D}\) inside the first line because \(\hh(x) - y\) does not depend on \(D\).

  2. The proof here is similar. We start by reformulating the left-hand side of (2) as \[\begin{align*} \E_{x, y} \li[ \li(\hh(x) - y \ri)^{2}\ri] &= \E_{x, y} \li[ \li(\li(\hh(x) - \hy(x)\ri) + \li(\hy(x) - y\ri)\ri)^{2}\ri] \\ &= \E_{x, y} \li[\li(\hy(x) - y\ri)^{2}\ri] + \E_{x} \li[\li(\hh(x) - \hy(x)\ri)^{2}\ri] + 2\,\E_{x, y} \li[ \li(\hh(x) - \hy(x)\ri)\li(\hy(x) - y\ri)\ri]. \end{align*}\] Here, the third term is zero, which follows from a derivation analogous to the one in step 1: \[\begin{align*} \E_{x, y} \li[\li(\hh(x) - \hy(x)\ri)\li(\hy(x) - y\ri)\ri] &= \E_{x}\li[\E_{y \mid x} \li[\hy(x) - y \ri] \li(\hh(x) - \hy(x) \ri) \ri] \\ &= \E_{x} \li[ \li( \hy(x) - \E_{y \mid x} \li[ y \ri]\ri) \li(\hh(x) - \hy(x)\ri)\ri] \\ &= \E_{x} \li[ \li( \hy(x) - \hy(x) \ri) \li(\hh(x) - \hy(x)\ri)\ri] \\ &= \E_{x} \li[ 0 \ri] \\ &= 0\ . \end{align*}\] The full decomposition can also be checked numerically; see the simulation sketch below.
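As a sanity check, the decomposition can be verified empirically: draw many datasets \(D\sim P^n\), train a regressor on each, and estimate all four terms on a held-out grid. The sketch below is a minimal illustration; the data model (\(y=\sin(x)\) plus Gaussian noise) and the learner (a degree-3 polynomial least-squares fit) are arbitrary assumptions, not part of the exercise.

```python
# Empirical check of: expected test error = variance + noise + bias^2.
# A minimal sketch; the data model (y = sin(x) + Gaussian noise) and the
# learner (degree-3 polynomial least squares) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, n_datasets, sigma = 30, 2000, 0.3
x_test = np.linspace(0.0, 2.0 * np.pi, 100)
y_bar = np.sin(x_test)                       # \hat{y}(x) = E[y | x]

# Train one regressor per dataset D ~ P^n and record its test predictions.
preds = np.empty((n_datasets, x_test.size))
for d in range(n_datasets):
    x = rng.uniform(0.0, 2.0 * np.pi, size=n)
    y = np.sin(x) + rng.normal(0.0, sigma, size=n)
    coeffs = np.polyfit(x, y, deg=3)         # h_D: degree-3 polynomial fit
    preds[d] = np.polyval(coeffs, x_test)

h_bar = preds.mean(axis=0)                   # \hat{h}(x) = E_D[h_D(x)]
variance = ((preds - h_bar) ** 2).mean()
bias2 = ((h_bar - y_bar) ** 2).mean()
noise = sigma ** 2                           # E[(\hat{y}(x) - y)^2]

# Expected test error, estimated with a fresh noisy label for each (D, x).
y_fresh = y_bar + rng.normal(0.0, sigma, size=preds.shape)
test_error = ((preds - y_fresh) ** 2).mean()
print(test_error, variance + noise + bias2)  # the two values nearly match
```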

Exercise 3 - Ensembling

Download the file ex06-ensembling.ipynb from Quercus. It contains basic PyTorch code that trains a classifier on MNIST. Modify that code such that it trains an ensemble of 5-10 neural networks and computes their average prediction once trained.
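Since the notebook itself is not reproduced here, the following is a minimal self-contained sketch of the requested modification, not the notebook's actual code: it trains \(K\) independently initialized networks on MNIST and averages their softmax outputs at test time. The architecture, optimizer, and hyperparameters are illustrative assumptions.

```python
# Sketch: train K independently initialized networks on MNIST and
# average their predicted class probabilities at test time.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
tfm = transforms.ToTensor()
train_ds = datasets.MNIST("data", train=True, download=True, transform=tfm)
test_ds = datasets.MNIST("data", train=False, download=True, transform=tfm)
train_dl = DataLoader(train_ds, batch_size=128, shuffle=True)
test_dl = DataLoader(test_ds, batch_size=256)

def make_model():
    # A small MLP stands in for whatever architecture the notebook uses.
    return nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                         nn.Linear(256, 10)).to(device)

K = 5  # ensemble size; anything in the requested 5-10 range works
ensemble = []
for k in range(K):
    model = make_model()  # fresh random initialization per member
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(2):
        for xb, yb in train_dl:
            xb, yb = xb.to(device), yb.to(device)
            loss = F.cross_entropy(model(xb), yb)
            opt.zero_grad()
            loss.backward()
            opt.step()
    ensemble.append(model.eval())

# Average the members' predicted class probabilities on the test set.
correct = 0
with torch.no_grad():
    for xb, yb in test_dl:
        xb, yb = xb.to(device), yb.to(device)
        probs = torch.stack([m(xb).softmax(dim=1) for m in ensemble]).mean(dim=0)
        correct += (probs.argmax(dim=1) == yb).sum().item()
print(f"ensemble test accuracy: {correct / len(test_ds):.4f}")
```

Averaging probabilities rather than hard votes is the more common choice here: the members differ only through their random initializations and shuffled batches, and averaging their outputs reduces the variance term from Exercise 2.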