examples/gaussian_processes/GP-smoothing.myst.md (50 additions, 32 deletions)
@@ -21,35 +21,45 @@ If we assume the functional dependency between $x$ and $y$ is **linear** then, b
 However, the **functional form** of $y=f(x)$ is **not always known in advance**, and it might be hard to choose which one to fit, given the data. For example, you wouldn't necessarily know which function to use, given the following observed data. Assume you haven't seen the formula that generated it:
 As humans, we see that there is a non-linear dependency with some noise, and we would like to capture that dependency. If we perform a linear regression, we see that the "smoothed" data is less than satisfactory:
@@ -90,15 +100,9 @@ When we estimate the maximum likelihood values of the hidden process $z_i$ at ea

 +++

-### Let's describe the above GP-smoothing model in PyMC3
+### Let's describe the above GP-smoothing model in PyMC

-```{code-cell} ipython3
-import pymc3 as pm
-
-from pymc3.distributions.timeseries import GaussianRandomWalk
-from scipy import optimize
-from theano import shared
-```
++++

 Let's create a model with a shared parameter for specifying different levels of smoothing. We use very wide priors for the "mu" and "tau" parameters of the hidden Brownian motion, which you can adjust according to your application.

@@ -110,7 +114,9 @@ with model:
     smoothing_param = shared(0.9)
     mu = pm.Normal("mu", sigma=LARGE_NUMBER)
     tau = pm.Exponential("tau", 1.0 / LARGE_NUMBER)
-    z = GaussianRandomWalk("z", mu=mu, tau=tau / (1.0 - smoothing_param), shape=y.shape)
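The hunk above only shows the removed `z` line, not the lines that replace it. As a hedged sketch only, one way the complete model block could be written in PyMC v5 is shown below; the synthetic data, the `LARGE_NUMBER` value, the `pytensor.shared` import, the explicit `init_dist`, and the reparametrization of the random walk through `sigma` are assumptions, not lines taken from the PR.

```python
# Hedged sketch, not part of the diff: a PyMC v5 version of the smoothing model.
# The data generation, LARGE_NUMBER, and the pytensor.shared import are assumptions.
import numpy as np
import pymc as pm
from pytensor import shared

LARGE_NUMBER = 1e5  # assumed constant for the deliberately wide priors

# Placeholder noisy, non-linear data standing in for the notebook's example series.
rng = np.random.default_rng(0)
x = np.linspace(0, 50, 100)
y = np.exp(1.0 + np.sqrt(x) - np.exp(x / 15.0)) + rng.normal(scale=1.0, size=x.shape)

model = pm.Model()
with model:
    smoothing_param = shared(0.9)
    mu = pm.Normal("mu", sigma=LARGE_NUMBER)
    tau = pm.Exponential("tau", 1.0 / LARGE_NUMBER)
    # PyMC v5's GaussianRandomWalk is parametrized by sigma rather than tau, so the
    # old precision tau / (1 - smoothing_param) is converted to a standard deviation.
    z = pm.GaussianRandomWalk(
        "z",
        mu=mu,
        sigma=pm.math.sqrt((1.0 - smoothing_param) / tau),
        init_dist=pm.Normal.dist(0, 100),  # explicit diffuse initial distribution (an assumption)
        shape=y.shape,
    )
    obs = pm.Normal("obs", mu=z, tau=tau / smoothing_param, observed=y)
```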
@@ -134,9 +140,10 @@ Let's try to allocate 50% variance to the noise, and see if the result matches o
 smoothing = 0.5
 z_val = infer_z(smoothing)

-plot(x, y)
-plot(x, z_val)
-title(f"Smoothing={smoothing}");
+fig, ax = plt.subplots()
+ax.plot(x, y)
+ax.plot(x, z_val)
+ax.set(title=f"Smoothing={smoothing}");
 ```

 It appears that the variance is split evenly between the noise and the hidden process, as expected.
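The `infer_z` helper used in these cells is defined earlier in the notebook and does not appear in this diff. As a rough sketch under that assumption, it presumably sets the shared smoothing value and returns a point estimate of the hidden process, along these lines (the use of `pm.find_MAP` and the optimizer choice are guesses):

```python
# Hedged sketch of the infer_z helper referenced above; it is not shown in the
# diff, so the body below is an assumption based on how it is called.
import pymc as pm


def infer_z(smoothing):
    # Reuses the model, shared smoothing_param, and z from the model block above.
    smoothing_param.set_value(smoothing)
    with model:
        res = pm.find_MAP(vars=[z], method="L-BFGS-B")
    return res["z"]
```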
@@ -147,17 +154,18 @@ Let's try gradually increasing the smoothness parameter to see if we can obtain
 smoothing = 0.9
 z_val = infer_z(smoothing)

-plot(x, y)
-plot(x, z_val)
-title(f"Smoothing={smoothing}");
+fig, ax = plt.subplots()
+ax.plot(x, y)
+ax.plot(x, z_val)
+ax.set(title=f"Smoothing={smoothing}");
 ```

 ### Smoothing "to the limits"

 By increasing the smoothing parameter, we can gradually make the inferred values of the hidden Brownian motion approach the average value of the data. This is because as we increase the smoothing parameter, we allow less and less of the variance to be allocated to the Brownian motion, so eventually it approaches a process that almost doesn't change over the domain of $x$:

 ```{code-cell} ipython3
-fig, axes = subplots(2, 2)
+fig, axes = plt.subplots(nrows=2, ncols=2)

 for ax, smoothing in zip(axes.ravel(), [0.95, 0.99, 0.999, 0.9999]):
     z_val = infer_z(smoothing)
@@ -167,9 +175,19 @@ for ax, smoothing in zip(axes.ravel(), [0.95, 0.99, 0.999, 0.9999]):
     ax.set_title(f"Smoothing={smoothing:05.4f}")
 ```

+## References
+
+:::{bibliography}
+:filter: docname in docnames
+:::
+
++++
+
+## Authors
+* Authored by [Andrey Kuzmenko](http://github.com/akuz)
+* Updated to v5 by [Juan Orduz](https://juanitorduz.github.io/) in Nov 2023 ([pymc-examples#603](https://github.com/pymc-devs/pymc-examples/pull/603))
+
 ```{code-cell} ipython3
 %load_ext watermark
 %watermark -n -u -v -iv -w
 ```
-
-This example originally contributed by: Andrey Kuzmenko, http://github.com/akuz
examples/howto/LKJ.ipynb (2 additions, 2 deletions)
@@ -161,7 +161,7 @@
 "\n",
 "The LKJ distribution provides a prior on the correlation matrix, $\\mathbf{C} = \\textrm{Corr}(x_i, x_j)$, which, combined with priors on the standard deviations of each component, [induces](http://www3.stat.sinica.edu.tw/statistica/oldpdf/A10n416.pdf) a prior on the covariance matrix, $\\Sigma$. Since inverting $\\Sigma$ is numerically unstable and inefficient, it is computationally advantageous to use the [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition) of $\\Sigma$, $\\Sigma = \\mathbf{L} \\mathbf{L}^{\\top}$, where $\\mathbf{L}$ is a lower-triangular matrix. This decomposition allows computation of the term $(\\mathbf{x} - \\mu)^{\\top} \\Sigma^{-1} (\\mathbf{x} - \\mu)$ using back-substitution, which is more numerically stable and efficient than direct matrix inversion.\n",
 "\n",
-"PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the [LKJCholeskyCov](https://docs.pymc.io/en/latest/api/distributions/generated/pymc.LKJCholeskyCov.html) distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparameter `eta`, which controls the amount of correlation between components of $\\mathbf{x}$. The LKJ distribution has the density $f(\\mathbf{C}\\ |\\ \\eta) \\propto |\\mathbf{C}|^{\\eta - 1}$, so $\\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\\eta \\to \\infty$.\n",
+"PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the {class}`pymc.LKJCholeskyCov` distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparameter `eta`, which controls the amount of correlation between components of $\\mathbf{x}$. The LKJ distribution has the density $f(\\mathbf{C}\\ |\\ \\eta) \\propto |\\mathbf{C}|^{\\eta - 1}$, so $\\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\\eta \\to \\infty$.\n",
 "\n",
 "In this example, we model the standard deviations with $\\textrm{Exponential}(1.0)$ priors, and the correlation matrix as $\\mathbf{C} \\sim \\textrm{LKJ}(\\eta = 2)$."
 ]
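For orientation, a minimal sketch of the kind of model this notebook text describes: an $\textrm{Exponential}(1.0)$ prior on the standard deviations and an LKJ prior with `eta = 2` on the correlation matrix, via `pymc.LKJCholeskyCov`. The data array, its dimension, and the mean prior are placeholders, not values from the notebook.

```python
# Hedged sketch, not taken from the notebook: an LKJ Cholesky prior on the
# covariance of a multivariate normal, with the priors described in the text.
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)
x_obs = rng.normal(size=(200, 3))  # placeholder observations with 3 components

with pm.Model() as lkj_model:
    # chol is the Cholesky factor of the covariance; corr and stds are the
    # implied correlation matrix and standard deviations.
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=3, eta=2.0, sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )
    mu = pm.Normal("mu", mu=0.0, sigma=1.5, shape=3)
    obs = pm.MvNormal("obs", mu=mu, chol=chol, observed=x_obs)
```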
@@ -308,7 +308,7 @@
 "id": "QOCi1RKvr2Ph"
 },
 "source": [
-"We sample from this model using NUTS and give the trace to [ArviZ](https://arviz-devs.github.io/arviz/) for summarization:"
+"We sample from this model using NUTS and give the trace to {ref}`arviz` for summarization:"
 The LKJ distribution provides a prior on the correlation matrix, $\mathbf{C} = \textrm{Corr}(x_i, x_j)$, which, combined with priors on the standard deviations of each component, [induces](http://www3.stat.sinica.edu.tw/statistica/oldpdf/A10n416.pdf) a prior on the covariance matrix, $\Sigma$. Since inverting $\Sigma$ is numerically unstable and inefficient, it is computationally advantageous to use the [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition) of $\Sigma$, $\Sigma = \mathbf{L} \mathbf{L}^{\top}$, where $\mathbf{L}$ is a lower-triangular matrix. This decomposition allows computation of the term $(\mathbf{x} - \mu)^{\top} \Sigma^{-1} (\mathbf{x} - \mu)$ using back-substitution, which is more numerically stable and efficient than direct matrix inversion.

-PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the [LKJCholeskyCov](https://docs.pymc.io/en/latest/api/distributions/generated/pymc.LKJCholeskyCov.html) distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparameter `eta`, which controls the amount of correlation between components of $\mathbf{x}$. The LKJ distribution has the density $f(\mathbf{C}\ |\ \eta) \propto |\mathbf{C}|^{\eta - 1}$, so $\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\eta \to \infty$.
+PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the {class}`pymc.LKJCholeskyCov` distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparameter `eta`, which controls the amount of correlation between components of $\mathbf{x}$. The LKJ distribution has the density $f(\mathbf{C}\ |\ \eta) \propto |\mathbf{C}|^{\eta - 1}$, so $\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\eta \to \infty$.

 In this example, we model the standard deviations with $\textrm{Exponential}(1.0)$ priors, and the correlation matrix as $\mathbf{C} \sim \textrm{LKJ}(\eta = 2)$.

@@ -175,7 +175,7 @@ with model:

 +++ {"id": "QOCi1RKvr2Ph"}

-We sample from this model using NUTS and give the trace to [ArviZ](https://arviz-devs.github.io/arviz/) for summarization:
+We sample from this model using NUTS and give the trace to {ref}`arviz` for summarization:
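As a small illustration of the back-substitution point made in the paragraph above (a standalone NumPy/SciPy sketch, not code from either notebook): given the Cholesky factor $\mathbf{L}$, the quadratic form $(\mathbf{x} - \mu)^{\top} \Sigma^{-1} (\mathbf{x} - \mu)$ reduces to the squared norm of $\mathbf{L}^{-1}(\mathbf{x} - \mu)$, which a triangular solve computes without ever inverting $\Sigma$.

```python
# Hedged illustration of computing (x - mu)^T Sigma^{-1} (x - mu) via the
# Cholesky factor and back-substitution, compared against direct inversion.
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 3.0 * np.eye(3)  # an arbitrary positive-definite covariance
mu = np.zeros(3)
x = rng.normal(size=3)

L = np.linalg.cholesky(Sigma)                # Sigma = L L^T with L lower-triangular
w = solve_triangular(L, x - mu, lower=True)  # solve L w = x - mu by back-substitution
quad_chol = w @ w                            # equals (x - mu)^T Sigma^{-1} (x - mu)

quad_inv = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)  # direct inversion, for comparison
assert np.isclose(quad_chol, quad_inv)
```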