@@ -17,6 +17,8 @@ To install, run
 (.venv) $ pip install xarray-einstats
 ```
 
+See the docs for more [extensive install instructions](https://einstats.python.arviz.org/en/latest/installation.html).
+
 ## Overview
 As stated in their website:
 
@@ -39,278 +41,16 @@ In some other cases however, using xarray can result in overly verbose code
 that often also becomes less clear. `xarray_einstats` provides wrappers
 around some numpy and scipy functions (mostly `numpy.linalg` and `scipy.stats`)
 and around [einops](https://einops.rocks/) with an api and features adapted to xarray.
+Continue at the [getting started page](https://einstats.python.arviz.org/en/latest/getting_started.html).
 
-% ⚠️ Attention: A nicer rendering of the content below is available at [our documentation](https://xarray-einstats.readthedocs.io/en/latest/)
-
-### Data for examples
-The examples in this overview page use the `DataArray`s from the `Dataset` below
-(stored as `ds` variable) to illustrate `xarray_einstats` features:
-
-```none
-<xarray.Dataset>
-Dimensions:   (dim_plot: 50, chain: 4, draw: 500, team: 6)
-Coordinates:
-  * chain     (chain) int64 0 1 2 3
-  * draw      (draw) int64 0 1 2 3 4 5 6 7 8 ... 492 493 494 495 496 497 498 499
-  * team      (team) object 'Wales' 'France' 'Ireland' ... 'Italy' 'England'
-Dimensions without coordinates: dim_plot
-Data variables:
-    x_plot    (dim_plot) float64 0.0 0.2041 0.4082 0.6122 ... 9.592 9.796 10.0
-    atts      (chain, draw, team) float64 0.1063 -0.01913 ... -0.2911 0.2029
-    sd_att    (draw) float64 0.272 0.2685 0.2593 0.2612 ... 0.4112 0.2117 0.3401
-```
-
-### Stats
-{mod}`xarray_einstats.stats` provides two wrapper classes {class}`xarray_einstats.stats.XrContinuousRV`
-and {class}`xarray_einstats.stats.XrDiscreteRV` that can be used to wrap any distribution
-in {mod}`scipy.stats` so they accept {class}`~xarray.DataArray` as inputs,
-and some wrappers for other functions in the `scipy.stats` module
-so you can use `dims` (supporting both string and iterable of strings)
-instead of `axis` and keep the labels from the input DataArrays.
-
-The distribution wrappers perform broadcasting and alignment of
-all the inputs automatically.
-You can evaluate the logpdf using inputs that wouldn't align if using numpy
-in a couple lines:
-
-```python
-norm_dist = xarray_einstats.stats.XrContinuousRV(scipy.stats.norm)
-# shapes: (50,) (4, 500, 6) (500,)
-norm_dist.logpdf(ds["x_plot"], ds["atts"], ds["sd_att"])
-```
-
-which returns:
-
-```none
-<xarray.DataArray (dim_plot: 50, chain: 4, draw: 500, team: 6)>
-array([[[[ 3.06470249e-01,  3.80373065e-01,  2.56575936e-01,
-          ...
-          -4.41658154e+02, -4.57599982e+02, -4.14709280e+02]]]])
-Coordinates:
-  * chain     (chain) int64 0 1 2 3
-  * draw      (draw) int64 0 1 2 3 4 5 6 7 8 ... 492 493 494 495 496 497 498 499
-  * team      (team) object 'Wales' 'France' 'Ireland' ... 'Italy' 'England'
-Dimensions without coordinates: dim_plot
-```
-
-More examples available at {ref}`stats_tutorial`.
-
-### Linear Algebra
-
-There is no one size fits all solution, but knowing the function
-we are wrapping we can easily make the code more concise and clear.
-Without `xarray_einstats`, to invert a batch of matrices stored in a 4d
-array you have to do:
-
-```python
-inv = xarray.apply_ufunc(  # output is a 4d labeled array
-    numpy.linalg.inv,
-    batch_of_matrices,  # input is a 4d labeled array
-    input_core_dims=[["matrix_dim", "matrix_dim_bis"]],
-    output_core_dims=[["matrix_dim", "matrix_dim_bis"]]
-)
-```
-
-to calculate it's norm instead, it becomes:
-
-```python
-norm = xarray.apply_ufunc(  # output is a 2d labeled array
-    numpy.linalg.norm,
-    batch_of_matrices,  # input is a 4d labeled array
-    input_core_dims=[["matrix_dim", "matrix_dim_bis"]],
-)
-```
-
-With {mod}`xarray_einstats.linalg`, those operations become:
-
-```python
-inv = xarray_einstats.inv(batch_of_matrices, dim=("matrix_dim", "matrix_dim_bis"))
-norm = xarray_einstats.norm(batch_of_matrices, dim=("matrix_dim", "matrix_dim_bis"))
-```
-
-Moreover, if you use some internal conventions to label the dimensions
-that correspond to matrices, so that they can always be identified
-if given the list of all dimensions in the input, you can configure
-`xarray_einstats` to follow that convention.
-Take a look at {func}`~xarray_einstats.linalg.get_default_dims`
-
-And if you still need more reasons for `xarray_einstats`, to complement
-the `einops` wrappers, it also provides {func}`xarray_einstats.einsum`!
-
-More examples available, also using `einsum` at {ref}`linalg_tutorial`.
-
-### einops
-**repeat wrapper still missing**
-
-[einops](https://einops.rocks/) uses a convenient notation inspired in
-Einstein notation to specify operations on multidimensional arrays.
-It uses spaces as a delimiter between dimensions, parenthesis to
-indicate splitting or stacking of dimensions and `->` to separate
-between input and output dim specification.
-
-{mod}`xarray_einstats.einops` uses an adapted notation to take advantage of xarray,
-where dimensions are already labeled,
-and adapts to dimension names with spaces or parenthesis in them.
-It then translates the expression and calls einops via {func}`xarray.apply_ufunc`
-so you need to have einops installed for the functions in this
-module to work.
-
-`xarray_einstats` uses two separate arguments, one for the input pattern (optional) and
-another for the output pattern. Each is a list of dimensions (strings)
-or dimension (lists or dictionaries).
-
-:::{tip}
-If you are willing to impose some extra constraints to your dimension names,
-you can also use the `raw_` einops wrappers, with a syntax more concise and
-much closer to the einops library.
-:::
-
-**Combine the chain and draw dimensions**
-
-::::{tab-set}
-:::{tab-item} rearrange
-:sync: full
-
-We can combine the chain and draw dimensions and name the resulting dimension `sample`
-using a list with a single dictionary.
-
-```python
-rearrange(ds.atts, [{"sample": ("chain", "draw")}])
-```
-:::
-:::{tab-item} raw_rearrange
-:sync: raw
-
-As you would do in einops, we indicate we want to combine the chain and draw dimensions
-by putting the two inside a parenthesis. With `xarray_einstats` in addition,
-you can add an `=new_name` to label this combined dimension, otherwise it gets
-a default name.
-
-Moreover, as dimensions are already labeled in the input, we can skip the
-left side of the expression. If no `->` symbol is present in the pattern,
-`xarray_einstats` generates the left side automatically.
-
-```python
-raw_rearrange(ds.atts, "(chain draw)=sample")
-```
-:::
-::::
-
-The `team` dimension is not present in the pattern and is not modified.
-As here dimensions are named already in the input object, we don't need
-ellipsis nor adding dimensions in both input and output to indicate they
-are left as is. You can see how the team dimension has not been modified
-in the output below:
-
-```none
-<xarray.DataArray 'atts' (team: 6, sample: 2000)>
-array([[ 0.10632395,  0.1538294 ,  0.17806237, ...,  0.16744257,
-         0.14927569,  0.21803568],
-       ...,
-       [ 0.30447644,  0.22650416,  0.25523419, ...,  0.28405435,
-         0.29232681,  0.20286656]])
-Coordinates:
-  * team      (team) object 'Wales' 'France' 'Ireland' ... 'Italy' 'England'
-Dimensions without coordinates: sample
-```
-
-Note that following xarray convention, new dimensions and dimensions on which we operated
-are moved to the end. This only matters when you access the underlying array with `.values`
-or `.data` and you can always transpose using {meth}`xarray.Dataset.transpose`, but
-it can matter. You can change the pattern to enforce the output dimension order:
-
-::::{tab-set}
-:::{tab-item} rearrange
-:sync: full
-```python
-rearrange(ds.atts, [{"sample": ("chain", "draw")}, "team"])
-```
-:::
-:::{tab-item} raw_rearrange
-:sync: raw
-```python
-raw_rearrange(ds.atts, "(chain draw)=sample team")
-```
-:::
-::::
-
-Out:
-
-```none
-<xarray.DataArray 'atts' (sample: 2000, team: 6)>
-array([[ 0.10632395, -0.01912607,  0.13671159, -0.06754783, -0.46083807,
-         0.30447644],
-       ...,
-       [ 0.21803568, -0.11394285,  0.09447937, -0.11032643, -0.29111234,
-         0.20286656]])
-Coordinates:
-  * team      (team) object 'Wales' 'France' 'Ireland' ... 'Italy' 'England'
-Dimensions without coordinates: sample
-```
-
-**Decompose and combine two dimensions in a different order**
-
-Now to a more complicated pattern. We will split the chain and team dimension,
-then combine those split dimensions between them.
-
-::::{tab-set}
-:::{tab-item} rearrange
-:sync: full
-
-Use a list of dictionaries to choose which dimensions to decompose,
-note that lists with dimensions to decompose are not valid, you
-_need_ to indicate which dimension is the one to be decomposed.
-
-```python
-rearrange(
-    ds.atts,
-    in_dims=[{"chain": ("chain1", "chain2")}, {"team": ("team1", "team2")}],
-    # combine split chain and team dims between them
-    # here we don't use a dict so the new dimensions get a default name
-    out_dims=[("chain1", "team1"), ("team2", "chain2")],
-    # set the lengths of split dimensions as kwargs
-    chain1=2, chain2=2, team1=2, team2=3
-)
-```
-:::
-:::{tab-item} raw_rearrange
-:sync: raw
-
-We use `()=` on the left side because we _need_ to indicate which dimensions
-to decompose, but we can skip it if we want on the right side and `xarray_einstats`
-uses a default name for them.
-```python
-raw_rearrange(
-    ds.atts,
-    "(chain1 chain2)=chain (team1 team2)=team -> (chain1 team1) (team2 chain2)",
-    # set the lengths of split dimensions as kwargs
-    chain1=2, chain2=2, team1=2, team2=3
-)
-```
-:::
-::::
-
-Out:
-
-```none
-<xarray.DataArray 'atts' (draw: 500, chain1,team1: 4, team2,chain2: 6)>
-array([[[ 1.06323952e-01,  2.47005252e-01, -1.91260714e-02,
-         -2.55769582e-02,  1.36711590e-01,  1.23165119e-01],
-        ...
-        [-2.76616968e-02, -1.10326428e-01, -3.99582340e-01,
-         -2.91112341e-01,  1.90714405e-01,  2.02866563e-01]]])
-Coordinates:
-  * draw      (draw) int64 0 1 2 3 4 5 6 7 8 ... 492 493 494 495 496 497 498 499
-Dimensions without coordinates: chain1,team1, team2,chain2
-```
-
-More einops examples with both `rearrange` and `reduce` at {ref}`einops_tutorial`
+## Contributing
+xarray-einstats is in active development and all types of contributions are welcome!
+See the [contributing guide](https://einstats.python.arviz.org/en/latest/contributing/overview.html) for details on how to contribute.
 
-### Other features
-`xarray_einstats` also includes some functions that are not direct wrappers of other
-libraries. {func}`~xarray_einstats.numba.histogram` for example combines numba,
-numpy and xarray to provide a vectorized version of `numpy.histogram` that works
-on DataArrays.
+## Relevant links
+* Documentation: https://einstats.python.arviz.org/en/latest/
+* Contributing guide: https://einstats.python.arviz.org/en/latest/contributing/overview.html
+* ArviZ project website: https://www.arviz.org
 
 ## Similar projects
 Here we list some similar projects we know of. Note that all of
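The removed Stats section says the distribution wrappers broadcast and align inputs that would not align in plain numpy. The numpy-only sketch below shows the manual alignment those wrappers save, using the shapes quoted in that example: (50,), (4, 500, 6) and (500,). The arrays are random stand-ins for `ds["x_plot"]`, `ds["atts"]` and `ds["sd_att"]`. `norm_logpdf` is a hypothetical helper that writes out the normal log density; `scipy.stats.norm.logpdf(x, loc, scale)` would give the same values. This is an illustration only, not how `xarray_einstats` is implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins with the shapes quoted in the removed example (not the real data).
x_plot = np.linspace(0, 10, 50)                  # dims: (dim_plot,)
atts = rng.normal(0.0, 0.3, size=(4, 500, 6))    # dims: (chain, draw, team)
sd_att = rng.uniform(0.2, 0.4, size=500)         # dims: (draw,)

# numpy broadcasting matches trailing axes by position, so these shapes clash.
try:
    np.broadcast_shapes(x_plot.shape, atts.shape, sd_att.shape)
    positional_ok = True
except ValueError:
    positional_ok = False

# Manual alignment to the (dim_plot, chain, draw, team) layout:
# every missing dimension becomes a length-1 axis in the right position.
x = x_plot[:, None, None, None]        # (50, 1, 1, 1)
loc = atts[None, :, :, :]              # (1, 4, 500, 6)
scale = sd_att[None, None, :, None]    # (1, 1, 500, 1)


def norm_logpdf(x, loc, scale):
    """Hypothetical helper: log density of a normal distribution."""
    z = (x - loc) / scale
    return -0.5 * z**2 - np.log(scale) - 0.5 * np.log(2 * np.pi)


logp = norm_logpdf(x, loc, scale)
print(positional_ok, logp.shape)  # False (50, 4, 500, 6)
```

Here the position of every length-1 axis is chosen by hand. The wrapper described in the removed text instead matches dimensions by name, so this bookkeeping disappears.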
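The `apply_ufunc` calls in the removed Linear Algebra section depend on how the wrapped numpy functions treat leading axes. `xarray.apply_ufunc` moves the core dimensions to the end and, unless `vectorize=True` is set, passes the whole array to the function in one call. `numpy.linalg.inv` already loops over leading axes. `numpy.linalg.norm` with its default `axis=None` instead reduces the entire array to a single number. As far as we can tell, the norm call as written would therefore need `kwargs={"axis": (-2, -1)}` to return the 2d result its comment describes. The numpy-only sketch below illustrates this on a hypothetical random batch of shape (3, 5, 4, 4).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical batch of 3 x 5 matrices of size 4x4, with the two matrix axes last,
# which is where apply_ufunc places the core dimensions before calling the function.
# Adding 4 * identity makes it very unlikely that any random matrix is near singular.
batch = rng.normal(size=(3, 5, 4, 4)) + 4 * np.eye(4)

# inv: numpy.linalg.inv already loops over all leading axes, output stays 4d.
inv = np.linalg.inv(batch)

# norm: with the default axis=None (and ord=None) the whole array is flattened
# and a single scalar is returned, not one value per matrix.
norm_all = np.linalg.norm(batch)

# Passing the matrix axes explicitly gives one Frobenius norm per matrix (2d).
norm_each = np.linalg.norm(batch, axis=(-2, -1))

print(inv.shape, np.ndim(norm_all), norm_each.shape)  # (3, 5, 4, 4) 0 (3, 5)
```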
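The removed einops section merges and splits dimensions with patterns such as `(chain draw)=sample` and `(chain1 chain2)=chain (team1 team2)=team -> (chain1 team1) (team2 chain2)`. The sketch below reproduces the underlying transpose and reshape steps in plain numpy on a hypothetical `(chain, draw, team)` array of shape (4, 500, 6). It assumes the einops convention that, within a group such as `(chain1 chain2)`, the first name varies slowest. It only tracks axis positions; the `xarray_einstats` wrappers also handle the dimension names and call einops through `xarray.apply_ufunc`, as the removed text explains.

```python
import numpy as np

# Hypothetical stand-in for ds.atts: dims (chain, draw, team), shape (4, 500, 6).
# Integer-valued entries make it easy to trace where each element ends up.
n_chain, n_draw, n_team = 4, 500, 6
atts = np.arange(n_chain * n_draw * n_team, dtype=float).reshape(n_chain, n_draw, n_team)

# "(chain draw)=sample", with team first as in the first rearrange output:
# move team to the front, then merge the now-adjacent chain and draw axes.
sample_first = atts.transpose(2, 0, 1).reshape(n_team, n_chain * n_draw)  # (team, sample)

# "(chain1 chain2)=chain (team1 team2)=team -> (chain1 team1) (team2 chain2)"
chain1, chain2, team1, team2 = 2, 2, 2, 3

# 1. Split chain into (chain1, chain2) and team into (team1, team2);
#    in C order the first factor of each pair varies slowest.
split = atts.reshape(chain1, chain2, n_draw, team1, team2)  # axes: c1, c2, draw, t1, t2

# 2. Reorder so each pair to be merged is adjacent: draw, c1, t1, t2, c2.
reordered = split.transpose(2, 0, 3, 4, 1)

# 3. Merge the adjacent pairs into single axes.
out = reordered.reshape(n_draw, chain1 * team1, team2 * chain2)

print(sample_first.shape, out.shape)  # (6, 2000) (500, 4, 6)
```

With this convention, the first row of `out` is ordered (team 0, chain 0), (team 0, chain 1), (team 1, chain 0), and so on. That ordering is consistent with the first values in the removed `(draw: 500, chain1,team1: 4, team2,chain2: 6)` output block.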