Commit 54de8da

Minor edit
1 parent 3cf9b48 commit 54de8da

1 file changed

bayesian_neural_networks.ipynb

Lines changed: 2 additions & 2 deletions
@@ -22,7 +22,7 @@
 "\n",
 "## Variational inference\n",
 "\n",
-"Unfortunately, an analytical solution for the posterior $p(\\mathbf{w} \\lvert \\mathcal{D})$ in neural networks is untractable. We therefore have to approximate the true posterior with a variational distribution $q(\\mathbf{w} \\lvert \\boldsymbol{\\theta})$ of known functional form whose parameters we want to estimate. This can be done by minimizing the [Kullback-Leibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) between $q(\\mathbf{w} \\lvert \\boldsymbol{\\theta})$ and the true posterior $p(\\mathbf{w} \\lvert \\mathcal{D})$ w.r.t. to $\\boldsymbol{\\theta}$. As shown in [Appendix](#Appendix), the corresponding optimization objective or cost function is\n",
+"Unfortunately, an analytical solution for the posterior $p(\\mathbf{w} \\lvert \\mathcal{D})$ in neural networks is untractable. We therefore have to approximate the true posterior with a variational distribution $q(\\mathbf{w} \\lvert \\boldsymbol{\\theta})$ of known functional form whose parameters we want to estimate. This can be done by minimizing the [Kullback-Leibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) between $q(\\mathbf{w} \\lvert \\boldsymbol{\\theta})$ and the true posterior $p(\\mathbf{w} \\lvert \\mathcal{D})$. As shown in [Appendix](#Appendix), the corresponding optimization objective or cost function is\n",
 "\n",
 "$$\n",
 "\\mathcal{F}(\\mathcal{D},\\boldsymbol{\\theta}) = \n",
@@ -388,7 +388,7 @@
 "\\mathcal{F}(\\mathcal{D},\\boldsymbol{\\theta}) + \\log p(\\mathcal{D})\n",
 "$$\n",
 "\n",
-"In order to minimize $\\mathrm{KL}(q(\\mathbf{w} \\lvert \\boldsymbol{\\theta}) \\mid\\mid p(\\mathbf{w} \\lvert \\mathcal{D}))$ w.r.t. $\\boldsymbol{\\theta}$ we only need to minimize $\\mathcal{F}(\\mathcal{D},\\boldsymbol{\\theta})$ as $p(\\mathcal{D})$ doesn't depend on $\\boldsymbol{\\theta}$. The negative variational free energy is also known as *evidence lower bound* $\\mathcal{L}(\\mathcal{D},\\boldsymbol{\\theta})$ (ELBO). \n",
+"In order to minimize $\\mathrm{KL}(q(\\mathbf{w} \\lvert \\boldsymbol{\\theta}) \\mid\\mid p(\\mathbf{w} \\lvert \\mathcal{D}))$, we only need to minimize $\\mathcal{F}(\\mathcal{D},\\boldsymbol{\\theta})$ w.r.t. $\\boldsymbol{\\theta}$ as $p(\\mathcal{D})$ doesn't depend on $\\boldsymbol{\\theta}$. The negative variational free energy is also known as *evidence lower bound* $\\mathcal{L}(\\mathcal{D},\\boldsymbol{\\theta})$ (ELBO). \n",
 "\n",
 "$$\n",
 "\\mathrm{KL}(q(\\mathbf{w} \\lvert \\boldsymbol{\\theta}) \\mid\\mid p(\\mathbf{w} \\lvert \\mathcal{D})) =\n",
