Commit 2d1664f

add overview/getting started page to docs (#30)
* add overview/getting started page to docs
* add preview link to PR description gh action
* update changelog
1 parent 5eccfd6 commit 2d1664f

7 files changed

+255
-13
lines changed

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+name: Read the Docs Pull Request Preview
+on:
+  pull_request_target:
+    types:
+      - opened
+
+permissions:
+  pull-requests: write
+
+jobs:
+  documentation-links:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: readthedocs/readthedocs-preview@main
+        with:
+          project-slug: "xarray-einstats"

README.md

Lines changed: 1 addition & 1 deletion
@@ -330,7 +330,7 @@ and select the DOI corresponding to the version you used
 
 or in bibtex format:
 
-```bibtex
+```none
 @software{xarray_einstats2022,
 author = {Abril-Pla, Oriol},
 title = {{xarray-einstats}},

docs/source/changelog.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@
 
 ### Documentation
 * Add a section on running tests locally in contributing docs {pull}`28`
+* Add a getting started page to the docs {pull}`30`
 
 ## v.0.3.0 (2022 Jun 19)
 ### New features

docs/source/conf.py

Lines changed: 7 additions & 5 deletions
@@ -53,20 +53,21 @@
 ]
 
 # The reST default role (used for this markup: `text`) to use for all documents.
-default_role = "code"
+default_role = "autolink"
 
 # If true, '()' will be appended to :func: etc. cross-reference text.
 add_function_parentheses = False
 
 extlinks = {
-    "issue": ("https://github.com/arviz-devs/xarray-einstats/issues/%s", "GH#"),
-    "pull": ("https://github.com/arviz-devs/xarray-einstats/pull/%s", "PR#"),
+    "issue": ("https://github.com/arviz-devs/xarray-einstats/issues/%s", "GH#%s"),
+    "pull": ("https://github.com/arviz-devs/xarray-einstats/pull/%s", "PR#%s"),
 }
 
 # -- Options for extensions
 
-jupyter_execute_notebooks = "auto"
-execution_excludepatterns = ["*.ipynb"]
+nb_execution_mode = "auto"
+nb_execution_excludepatterns = ["*.ipynb"]
+nb_kernel_rgx_aliases = {".*": "python3"}
 myst_enable_extensions = ["colon_fence", "deflist", "dollarmath", "amsmath"]
 
 autosummary_generate = True
@@ -105,6 +106,7 @@
 
 intersphinx_mapping = {
     "arviz": ("https://python.arviz.org/en/latest/", None),
+    "arviz_org": ("https://www.arviz.org/en/latest/", None),
     "dask": ("https://docs.dask.org/en/latest/", None),
     "numba": ("https://numba.pydata.org/numba-doc/dev", None),
     "numpy": ("https://numpy.org/doc/stable/", None),
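
Note on the `extlinks` change above: the second element of each tuple now contains a `%s` placeholder, so it acts as a caption template rather than a bare prefix. The sketch below is a simplified model of that substitution, not Sphinx's actual implementation; `expand_extlink` is a hypothetical helper introduced only for illustration.

```python
# Simplified model of how an extlinks entry could be expanded.
# Illustration only: `expand_extlink` is hypothetical, not part of Sphinx.
extlinks = {
    "issue": ("https://github.com/arviz-devs/xarray-einstats/issues/%s", "GH#%s"),
    "pull": ("https://github.com/arviz-devs/xarray-einstats/pull/%s", "PR#%s"),
}


def expand_extlink(role, target):
    """Fill the target into both the URL template and the caption template."""
    url_template, caption_template = extlinks[role]
    return url_template % target, caption_template % target


url, caption = expand_extlink("pull", "30")
print(url)      # https://github.com/arviz-devs/xarray-einstats/pull/30
print(caption)  # PR#30
```

Under this model, a changelog entry such as {pull}`30` would render as a link captioned `PR#30`.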

docs/source/getting_started.md

Lines changed: 219 additions & 0 deletions
@@ -0,0 +1,219 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+kernelspec:
+  display_name: Python 3 (ipykernel)
+  language: python
+  name: python3
+---
+
+(getting_started)=
+# Getting started
+
+## Welcome to `xarray-einstats`!
+`xarray-einstats` is an open source Python library that is part of the
+{doc}`ArviZ project <arviz_org:index>`.
+It acts as a bridge between the [xarray](https://xarray.dev/)
+library for labelled arrays and libraries for raw arrays
+such as [NumPy](https://numpy.org/) or [SciPy](https://scipy.org/).
+
+Xarray has "Compatibility with the broader ecosystem" as
+one of its main {doc}`goals <xarray:getting-started-guide/why-xarray>`.
+This is what allows `xarray-einstats` to perform this
+_bridge_ role with minimal code and duplication.
+
+## Overview
+`xarray-einstats` provides wrappers for:
+
+* Most of the functions in {mod}`numpy.linalg`
+* A subset of {mod}`scipy.stats`
+* `rearrange` and `reduce` from [einops](http://einops.rocks/)
+
+These wrappers have the same names and functionality as the original functions.
+The difference in behaviour is that the wrappers will not make assumptions
+about the meaning of a dimension based on its position,
+nor do they have arguments like `axis` or `axes`.
+Instead, they have a `dims` argument that takes _dimension names_ instead of
+integers indicating the positions of the dimensions on which to act.
+
+It also provides a handful of re-implemented functions:
+
+* {func}`xarray_einstats.numba.histogram`
+* {class}`xarray_einstats.stats.multivariate_normal`
+
+These are partially reimplemented because the original functions
+don't yet support multidimensional and/or batched computations.
+They also share the name with a function in NumPy or SciPy,
+but they only implement a subset of the features.
+Moreover, the goal is for those to eventually be wrappers too.
+
+
+## Using `xarray-einstats`
+### DataArray inputs
+Functions in `xarray-einstats` are designed to work on {class}`~xarray.DataArray` objects.
+
+Let's load some example data:
+
+```{code-cell} ipython3
+from xarray_einstats import linalg, stats, tutorial
+
+da = tutorial.generate_matrices_dataarray(4)
+da
+```
+
+and show an example:
+
+```{code-cell} ipython3
+stats.skew(da, dims=["batch", "dim2"])
+```
+
+`xarray-einstats` uses `dims` as argument throughout the codebase
+as an alternative to both `axis` and `axes` interchangeably,
+and also as an alternative to the `(..., M, M)` convention used by NumPy.
+
+The use of `dims` follows {func}`~xarray.dot`, instead of the singular
+`dim` argument used for example in {meth}`~xarray.DataArray.mean`.
+Both a single dimension and multiple dimensions are valid inputs,
+and using `dims` emphasizes the fact that operations
+and reductions can be performed over multiple dimensions at the same time.
+Moreover, in linear algebra functions, `dims` is often restricted
+to a 2 element list as it indicates which dimensions define the matrices,
+interpreting all the others as batch dimensions.
+
+That means that the two calls below are equivalent, even if the dimension
+order of the inputs is not the same, _because their dimension names are the same_.
+Thus,
+
+```{code-cell} ipython3
+linalg.det(da, dims=["dim", "dim2"])
+```
+
+returns the same as:
+
+```{code-cell} ipython3
+linalg.det(da.transpose("dim2", "experiment", "dim", "batch"), dims=["dim", "dim2"])
+```
+
+:::{important}
+In `xarray_einstats` only the dimension names matter, not their order.
+:::
+
+### Dataset and GroupBy inputs
+While the `DataArray` is the base xarray object, there are also
+other xarray objects that are key while using the library.
+These other objects such as {class}`~xarray.Dataset` are implemented as
+a collection of `DataArray` objects, and all include a `.map`
+method in order to apply the same function to all their child `DataArray`s.
+
+```{code-cell} ipython3
+ds = tutorial.generate_mcmc_like_dataset(9438)
+ds
+```
+
+We can use {meth}`~xarray.Dataset.map` to apply the same function to
+all 4 child `DataArray`s in `ds`, but this will not always be possible.
+When using `.map`, the function provided is applied to all child `DataArray`s
+with the same `**kwargs`.
+
+If we try doing:
+
+```{code-cell} ipython3
+:tags: [raises-exception, hide-output]
+
+ds.map(stats.circmean, dims=("chain", "draw"))
+```
+
+we get an exception. The `chain` and `draw` dimensions are not present in all
+child `DataArray`s. Instead, we could apply it only to the variables
+that have both `chain` and `draw` dimensions.
+
+
+```{code-cell} ipython3
+ds_samples = ds[["mu", "sigma", "score"]]
+ds_samples.map(stats.circmean, dims=("chain", "draw"))
+```
+
+:::{attention}
+In general, you should prefer using the `.map` method over using non-`DataArray` objects as
+input to `xarray_einstats` functions directly.
+`.map` will ensure no unexpected broadcasting between the multiple child `DataArray`s takes place.
+See the examples below.
+
+However, if you are using functions that reduce dimensions on non-`DataArray` inputs
+whose child `DataArray`s all have all the dimensions to reduce, you will
+not trigger any such broadcasting,
+_and we have included that behaviour in our test suite to ensure it stays this way_.
+:::
+
+It is also possible to do
+
+
+```{code-cell} ipython3
+stats.circmean(ds_samples, dims=("chain", "draw"))
+```
+
+Here, all child `DataArray`s have both `chain` and `draw` dimensions,
+so as expected, the result is the same.
+There are some cases, however, in which _not_ using `.map` triggers
+some broadcasting operations which will generally not be the desired
+output.
+
+If we use the `.map` method, the function is applied to each
+child `DataArray` independently from the others:
+
+
+```{code-cell} ipython3
+ds.map(stats.rankdata)
+```
+
+whereas without using the `.map` method, extra broadcasting can happen:
+
+
+```{code-cell} ipython3
+stats.rankdata(ds)
+```
+
+---
+
+The behaviour on {class}`~xarray.core.groupby.DataArrayGroupBy` for example is very similar
+to the examples we have shown for `Dataset`s:
+
+
+```{code-cell} ipython3
+da = ds["mu"].assign_coords(team=["a", "b", "b", "a", "c", "b"])
+da
+```
+
+when we apply a "group by" operation over the `team` dimension, we generate a
+`DataArrayGroupBy` with 3 groups.
+
+```{code-cell} ipython3
+gb = da.groupby("team")
+gb
+```
+
+on which we can use `.map` to apply a function from `xarray-einstats` over
+all groups independently:
+
+```{code-cell} ipython3
+gb.map(stats.median_abs_deviation, dims=["draw", "team"])
+```
+
+which as expected has performed the operation group-wise, yielding a different
+result than either
+
+```{code-cell} ipython3
+stats.median_abs_deviation(da, dims=["draw", "team"])
+```
+
+or
+
+```{code-cell} ipython3
+stats.median_abs_deviation(da, dims="draw")
+```
+
+:::{seealso}
+Check out the {ref}`xarray:groupby` page on xarray's documentation.
+:::
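
Note on the "only the dimension names matter" statement in this page: a wrapper that accepts names has to resolve them to positions for each input, which is why transposing the input does not change the result. The sketch below illustrates that idea in plain Python. `dims_to_axes` is a hypothetical helper, not a function from `xarray-einstats`, and the dimension order given for the untransposed array is an assumption.

```python
def dims_to_axes(array_dims, dims):
    """Translate dimension names into positional axes (hypothetical helper)."""
    if isinstance(dims, str):
        dims = [dims]
    return tuple(array_dims.index(name) for name in dims)


# Assumed dimension order of the example array, and the transposed order
# used in the page's second `linalg.det` call.
original = ("batch", "experiment", "dim", "dim2")
transposed = ("dim2", "experiment", "dim", "batch")

print(dims_to_axes(original, ["dim", "dim2"]))    # (2, 3)
print(dims_to_axes(transposed, ["dim", "dim2"]))  # (2, 0)
```

The same `dims` request maps to different positional axes for each layout, so the caller never needs to track positions.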

docs/source/index.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
 :hidden:
 
 installation
+getting_started
 tutorials/index
 api/index
 background/index

src/xarray_einstats/stats.py

Lines changed: 10 additions & 7 deletions
@@ -400,8 +400,9 @@ def _apply_nonreduce_func(func, da, dims, kwargs, func_kwargs=None):
     if dims is None:
         dims = get_default_dims(da.dims)
     if not isinstance(dims, str):
-        da = da.stack(__aux_dim__=dims)
-        core_dims = ["__aux_dim__"]
+        aux_dim = f"__aux_dim__:{','.join(dims)}"
+        da = da.stack({aux_dim: dims})
+        core_dims = [aux_dim]
         unstack = True
     else:
         core_dims = [dims]
@@ -414,7 +415,7 @@ def _apply_nonreduce_func(func, da, dims, kwargs, func_kwargs=None):
         **kwargs,
     )
     if unstack:
-        return out_da.unstack("__aux_dim__")
+        return out_da.unstack(aux_dim)
     return out_da
 
 
@@ -427,8 +428,9 @@ def _apply_reduce_func(func, da, dims, kwargs, func_kwargs=None):
     if dims is None:
         dims = get_default_dims(da.dims)
     if not isinstance(dims, str):
-        da = da.stack(__aux_dim__=dims)
-        core_dims = ["__aux_dim__"]
+        aux_dim = f"__aux_dim__:{','.join(dims)}"
+        da = da.stack({aux_dim: dims})
+        core_dims = [aux_dim]
     else:
         core_dims = [dims]
     out_da = xr.apply_ufunc(
@@ -573,8 +575,9 @@ def median_abs_deviation(da, dims=None, *, center=None, scale=1, nan_policy=None
     if dims is None:
         dims = get_default_dims(da.dims)
     if not isinstance(dims, str):
-        da = da.stack(__aux_dim__=dims)
-        core_dims = ["__aux_dim__"]
+        aux_dim = f"__aux_dim__:{','.join(dims)}"
+        da = da.stack({aux_dim: dims})
+        core_dims = [aux_dim]
     else:
         core_dims = [dims]
 
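
Note on the `stats.py` change above: the auxiliary dimension name now encodes the dimensions being stacked, and the `stack` call switches from keyword form to a mapping. One plausible reading (an assumption, since the commit message does not explain it) is that the new name contains `:` and `,`, so it can no longer be written as a literal keyword argument. The sketch below only inspects the generated name and needs no xarray install.

```python
dims = ["chain", "draw"]

# Same f-string as in the diff: the stacked dims are embedded in the name.
aux_dim = f"__aux_dim__:{','.join(dims)}"
print(aux_dim)  # __aux_dim__:chain,draw

# The old fixed name is a valid Python identifier, the new one is not,
# so `da.stack(<name>=dims)` cannot be written literally for it.
print("__aux_dim__".isidentifier())  # True
print(aux_dim.isidentifier())        # False

# The mapping form used in the diff: {new_dim_name: dims_to_stack}.
stack_mapping = {aux_dim: dims}
print(stack_mapping)
```

Different `dims` inputs now produce different auxiliary names, for example `__aux_dim__:draw,team` for `dims=["draw", "team"]`.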
