examples/gaussian_processes/GP-smoothing.myst.md (50 additions, 32 deletions)
@@ -21,35 +21,45 @@ If we assume the functional dependency between $x$ and $y$ is **linear** then, b
 However, the **functional form** of $y=f(x)$ is **not always known in advance**, and it might be hard to choose which one to fit, given the data. For example, you wouldn't necessarily know which function to use, given the following observed data. Assume you haven't seen the formula that generated it:
 As humans, we see that there is a non-linear dependency with some noise, and we would like to capture that dependency. If we perform a linear regression, we see that the "smoothed" data is less than satisfactory:
@@ -90,15 +100,9 @@ When we estimate the maximum likelihood values of the hidden process $z_i$ at ea

 +++

-### Let's describe the above GP-smoothing model in PyMC3
+### Let's describe the above GP-smoothing model in PyMC

-```{code-cell} ipython3
-import pymc3 as pm
-
-from pymc3.distributions.timeseries import GaussianRandomWalk
-from scipy import optimize
-from theano import shared
-```
++++

 Let's create a model with a shared parameter for specifying different levels of smoothing. We use very wide priors for the "mu" and "tau" parameters of the hidden Brownian motion, which you can adjust according to your application.

@@ -110,7 +114,9 @@ with model:
     smoothing_param = shared(0.9)
     mu = pm.Normal("mu", sigma=LARGE_NUMBER)
     tau = pm.Exponential("tau", 1.0 / LARGE_NUMBER)
-    z = GaussianRandomWalk("z", mu=mu, tau=tau / (1.0 - smoothing_param), shape=y.shape)
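The hunk above only shows the removed `z` line, not the lines that replace it. As a hedged sketch only, one way the complete model block could be written in PyMC v5 is shown below; the synthetic data, the `LARGE_NUMBER` value, the `pytensor.shared` import, the explicit `init_dist`, and the reparametrization of the random walk through `sigma` are assumptions, not lines taken from the PR.

```python
# Hedged sketch, not part of the diff: a PyMC v5 version of the smoothing model.
# The data generation, LARGE_NUMBER, and the pytensor.shared import are assumptions.
import numpy as np
import pymc as pm
from pytensor import shared

LARGE_NUMBER = 1e5  # assumed constant for the deliberately wide priors

# Placeholder noisy, non-linear data standing in for the notebook's example series.
rng = np.random.default_rng(0)
x = np.linspace(0, 50, 100)
y = np.exp(1.0 + np.sqrt(x) - np.exp(x / 15.0)) + rng.normal(scale=1.0, size=x.shape)

model = pm.Model()
with model:
    smoothing_param = shared(0.9)
    mu = pm.Normal("mu", sigma=LARGE_NUMBER)
    tau = pm.Exponential("tau", 1.0 / LARGE_NUMBER)
    # PyMC v5's GaussianRandomWalk is parametrized by sigma rather than tau, so the
    # old precision tau / (1 - smoothing_param) is converted to a standard deviation.
    z = pm.GaussianRandomWalk(
        "z",
        mu=mu,
        sigma=pm.math.sqrt((1.0 - smoothing_param) / tau),
        init_dist=pm.Normal.dist(0, 100),  # explicit diffuse initial distribution (an assumption)
        shape=y.shape,
    )
    obs = pm.Normal("obs", mu=z, tau=tau / smoothing_param, observed=y)
```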
@@ -134,9 +140,10 @@ Let's try to allocate 50% variance to the noise, and see if the result matches o
 smoothing = 0.5
 z_val = infer_z(smoothing)

-plot(x, y)
-plot(x, z_val)
-title(f"Smoothing={smoothing}");
+fig, ax = plt.subplots()
+ax.plot(x, y)
+ax.plot(x, z_val)
+ax.set(title=f"Smoothing={smoothing}");
 ```

 It appears that the variance is split evenly between the noise and the hidden process, as expected.
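The `infer_z` helper used in these cells is defined earlier in the notebook and does not appear in this diff. As a rough sketch under that assumption, it presumably sets the shared smoothing value and returns a point estimate of the hidden process, along these lines (the use of `pm.find_MAP` and the optimizer choice are guesses):

```python
# Hedged sketch of the infer_z helper referenced above; it is not shown in the
# diff, so the body below is an assumption based on how it is called.
import pymc as pm


def infer_z(smoothing):
    # Reuses the model, shared smoothing_param, and z from the model block above.
    smoothing_param.set_value(smoothing)
    with model:
        res = pm.find_MAP(vars=[z], method="L-BFGS-B")
    return res["z"]
```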
@@ -147,17 +154,18 @@ Let's try gradually increasing the smoothness parameter to see if we can obtain
 smoothing = 0.9
 z_val = infer_z(smoothing)

-plot(x, y)
-plot(x, z_val)
-title(f"Smoothing={smoothing}");
+fig, ax = plt.subplots()
+ax.plot(x, y)
+ax.plot(x, z_val)
+ax.set(title=f"Smoothing={smoothing}");
 ```

 ### Smoothing "to the limits"

 By increasing the smoothing parameter, we can gradually make the inferred values of the hidden Brownian motion approach the average value of the data. This is because as we increase the smoothing parameter, we allow less and less of the variance to be allocated to the Brownian motion, so eventually it approaches a process that almost doesn't change over the domain of $x$:

 ```{code-cell} ipython3
-fig, axes = subplots(2, 2)
+fig, axes = plt.subplots(nrows=2, ncols=2)

 for ax, smoothing in zip(axes.ravel(), [0.95, 0.99, 0.999, 0.9999]):
     z_val = infer_z(smoothing)
@@ -167,9 +175,19 @@ for ax, smoothing in zip(axes.ravel(), [0.95, 0.99, 0.999, 0.9999]):
     ax.set_title(f"Smoothing={smoothing:05.4f}")
 ```

+## References
+
+:::{bibliography}
+:filter: docname in docnames
+:::
+
++++
+
+## Authors
+* Authored by [Andrey Kuzmenko](http://github.com/akuz)
+* Updated to v5 by [Juan Orduz](https://juanitorduz.github.io/) in Nov 2023 ([pymc-examples#603](https://github.com/pymc-devs/pymc-examples/pull/603))
+
 ```{code-cell} ipython3
 %load_ext watermark
 %watermark -n -u -v -iv -w
 ```
-
-This example originally contributed by: Andrey Kuzmenko, http://github.com/akuz
examples/howto/LKJ.ipynb (2 additions, 2 deletions)
@@ -161,7 +161,7 @@
 "\n",
 "The LKJ distribution provides a prior on the correlation matrix, $\\mathbf{C} = \\textrm{Corr}(x_i, x_j)$, which, combined with priors on the standard deviations of each component, [induces](http://www3.stat.sinica.edu.tw/statistica/oldpdf/A10n416.pdf) a prior on the covariance matrix, $\\Sigma$. Since inverting $\\Sigma$ is numerically unstable and inefficient, it is computationally advantageous to use the [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition) of $\\Sigma$, $\\Sigma = \\mathbf{L} \\mathbf{L}^{\\top}$, where $\\mathbf{L}$ is a lower-triangular matrix. This decomposition allows computation of the term $(\\mathbf{x} - \\mu)^{\\top} \\Sigma^{-1} (\\mathbf{x} - \\mu)$ using back-substitution, which is more numerically stable and efficient than direct matrix inversion.\n",
 "\n",
-"PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the [LKJCholeskyCov](https://docs.pymc.io/en/latest/api/distributions/generated/pymc.LKJCholeskyCov.html) distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparameter `eta`, which controls the amount of correlation between components of $\\mathbf{x}$. The LKJ distribution has the density $f(\\mathbf{C}\\ |\\ \\eta) \\propto |\\mathbf{C}|^{\\eta - 1}$, so $\\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\\eta \\to \\infty$.\n",
+"PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the {class}`pymc.LKJCholeskyCov` distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparameter `eta`, which controls the amount of correlation between components of $\\mathbf{x}$. The LKJ distribution has the density $f(\\mathbf{C}\\ |\\ \\eta) \\propto |\\mathbf{C}|^{\\eta - 1}$, so $\\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\\eta \\to \\infty$.\n",
 "\n",
 "In this example, we model the standard deviations with $\\textrm{Exponential}(1.0)$ priors, and the correlation matrix as $\\mathbf{C} \\sim \\textrm{LKJ}(\\eta = 2)$."
 ]
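For orientation, a minimal sketch of the kind of model this notebook text describes: an $\textrm{Exponential}(1.0)$ prior on the standard deviations and an LKJ prior with `eta = 2` on the correlation matrix, via `pymc.LKJCholeskyCov`. The data array, its dimension, and the mean prior are placeholders, not values from the notebook.

```python
# Hedged sketch, not taken from the notebook: an LKJ Cholesky prior on the
# covariance of a multivariate normal, with the priors described in the text.
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)
x_obs = rng.normal(size=(200, 3))  # placeholder observations with 3 components

with pm.Model() as lkj_model:
    # chol is the Cholesky factor of the covariance; corr and stds are the
    # implied correlation matrix and standard deviations.
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=3, eta=2.0, sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )
    mu = pm.Normal("mu", mu=0.0, sigma=1.5, shape=3)
    obs = pm.MvNormal("obs", mu=mu, chol=chol, observed=x_obs)
```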
@@ -308,7 +308,7 @@
 "id": "QOCi1RKvr2Ph"
 },
 "source": [
-"We sample from this model using NUTS and give the trace to [ArviZ](https://arviz-devs.github.io/arviz/) for summarization:"
+"We sample from this model using NUTS and give the trace to {ref}`arviz` for summarization:"
 The LKJ distribution provides a prior on the correlation matrix, $\mathbf{C} = \textrm{Corr}(x_i, x_j)$, which, combined with priors on the standard deviations of each component, [induces](http://www3.stat.sinica.edu.tw/statistica/oldpdf/A10n416.pdf) a prior on the covariance matrix, $\Sigma$. Since inverting $\Sigma$ is numerically unstable and inefficient, it is computationally advantageous to use the [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition) of $\Sigma$, $\Sigma = \mathbf{L} \mathbf{L}^{\top}$, where $\mathbf{L}$ is a lower-triangular matrix. This decomposition allows computation of the term $(\mathbf{x} - \mu)^{\top} \Sigma^{-1} (\mathbf{x} - \mu)$ using back-substitution, which is more numerically stable and efficient than direct matrix inversion.

-PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the [LKJCholeskyCov](https://docs.pymc.io/en/latest/api/distributions/generated/pymc.LKJCholeskyCov.html) distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparameter `eta`, which controls the amount of correlation between components of $\mathbf{x}$. The LKJ distribution has the density $f(\mathbf{C}\ |\ \eta) \propto |\mathbf{C}|^{\eta - 1}$, so $\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\eta \to \infty$.
+PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the {class}`pymc.LKJCholeskyCov` distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparameter `eta`, which controls the amount of correlation between components of $\mathbf{x}$. The LKJ distribution has the density $f(\mathbf{C}\ |\ \eta) \propto |\mathbf{C}|^{\eta - 1}$, so $\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\eta \to \infty$.

 In this example, we model the standard deviations with $\textrm{Exponential}(1.0)$ priors, and the correlation matrix as $\mathbf{C} \sim \textrm{LKJ}(\eta = 2)$.

@@ -175,7 +175,7 @@ with model:

 +++ {"id": "QOCi1RKvr2Ph"}

-We sample from this model using NUTS and give the trace to [ArviZ](https://arviz-devs.github.io/arviz/) for summarization:
+We sample from this model using NUTS and give the trace to {ref}`arviz` for summarization:
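As a small illustration of the back-substitution point made in the paragraph above (a standalone NumPy/SciPy sketch, not code from either notebook): given the Cholesky factor $\mathbf{L}$, the quadratic form $(\mathbf{x} - \mu)^{\top} \Sigma^{-1} (\mathbf{x} - \mu)$ reduces to the squared norm of $\mathbf{L}^{-1}(\mathbf{x} - \mu)$, which a triangular solve computes without ever inverting $\Sigma$.

```python
# Hedged illustration of computing (x - mu)^T Sigma^{-1} (x - mu) via the
# Cholesky factor and back-substitution, compared against direct inversion.
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 3.0 * np.eye(3)  # an arbitrary positive-definite covariance
mu = np.zeros(3)
x = rng.normal(size=3)

L = np.linalg.cholesky(Sigma)                # Sigma = L L^T with L lower-triangular
w = solve_triangular(L, x - mu, lower=True)  # solve L w = x - mu by back-substitution
quad_chol = w @ w                            # equals (x - mu)^T Sigma^{-1} (x - mu)

quad_inv = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)  # direct inversion, for comparison
assert np.isclose(quad_chol, quad_inv)
```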