Commit 54de8da

Minor edit
1 parent 3cf9b48 commit 54de8da

1 file changed

bayesian_neural_networks.ipynb

Lines changed: 2 additions & 2 deletions
@@ -22,7 +22,7 @@
 "\n",
 "## Variational inference\n",
 "\n",
-"Unfortunately, an analytical solution for the posterior $p(\\mathbf{w} \\lvert \\mathcal{D})$ in neural networks is untractable. We therefore have to approximate the true posterior with a variational distribution $q(\\mathbf{w} \\lvert \\boldsymbol{\\theta})$ of known functional form whose parameters we want to estimate. This can be done by minimizing the [Kullback-Leibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) between $q(\\mathbf{w} \\lvert \\boldsymbol{\\theta})$ and the true posterior $p(\\mathbf{w} \\lvert \\mathcal{D})$ w.r.t. to $\\boldsymbol{\\theta}$. As shown in [Appendix](#Appendix), the corresponding optimization objective or cost function is\n",
+"Unfortunately, an analytical solution for the posterior $p(\\mathbf{w} \\lvert \\mathcal{D})$ in neural networks is untractable. We therefore have to approximate the true posterior with a variational distribution $q(\\mathbf{w} \\lvert \\boldsymbol{\\theta})$ of known functional form whose parameters we want to estimate. This can be done by minimizing the [Kullback-Leibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) between $q(\\mathbf{w} \\lvert \\boldsymbol{\\theta})$ and the true posterior $p(\\mathbf{w} \\lvert \\mathcal{D})$. As shown in [Appendix](#Appendix), the corresponding optimization objective or cost function is\n",
 "\n",
 "$$\n",
 "\\mathcal{F}(\\mathcal{D},\\boldsymbol{\\theta}) = \n",
@@ -388,7 +388,7 @@
 "\\mathcal{F}(\\mathcal{D},\\boldsymbol{\\theta}) + \\log p(\\mathcal{D})\n",
 "$$\n",
 "\n",
-"In order to minimize $\\mathrm{KL}(q(\\mathbf{w} \\lvert \\boldsymbol{\\theta}) \\mid\\mid p(\\mathbf{w} \\lvert \\mathcal{D}))$ w.r.t. $\\boldsymbol{\\theta}$ we only need to minimize $\\mathcal{F}(\\mathcal{D},\\boldsymbol{\\theta})$ as $p(\\mathcal{D})$ doesn't depend on $\\boldsymbol{\\theta}$. The negative variational free energy is also known as *evidence lower bound* $\\mathcal{L}(\\mathcal{D},\\boldsymbol{\\theta})$ (ELBO). \n",
+"In order to minimize $\\mathrm{KL}(q(\\mathbf{w} \\lvert \\boldsymbol{\\theta}) \\mid\\mid p(\\mathbf{w} \\lvert \\mathcal{D}))$, we only need to minimize $\\mathcal{F}(\\mathcal{D},\\boldsymbol{\\theta})$ w.r.t. $\\boldsymbol{\\theta}$ as $p(\\mathcal{D})$ doesn't depend on $\\boldsymbol{\\theta}$. The negative variational free energy is also known as *evidence lower bound* $\\mathcal{L}(\\mathcal{D},\\boldsymbol{\\theta})$ (ELBO). \n",
 "\n",
 "$$\n",
 "\\mathrm{KL}(q(\\mathbf{w} \\lvert \\boldsymbol{\\theta}) \\mid\\mid p(\\mathbf{w} \\lvert \\mathcal{D})) =\n",
