
Documentation and typing of array types, dtypes, and dimensionality/shape #41

@mdhaber

TLDR: Especially as libraries begin to support alternative backends via the Array API, it might be useful to have a standard format for documenting the type, dtype, and dimensionality/shape of array inputs and corresponding outputs (and a way of adding typing information to the code). I thought the summit might be a good place to discuss the topic and potentially prepare a SPEC.


In SciPy, at least, the term "array-like" (without qualification) is commonly used to document the parameter type of functions. I'll give an example from stats.chatterjeexi, which is about on par with other scipy.stats functions in terms of detail.

[Screenshot: the scipy.stats.chatterjeexi docstring, where the inputs are documented simply as "array-like".]

"array-like", used throughout SciPy, is far too broad. I don't think it is defined anywhere in our documentation. The most "official" definition of "array_like" I can find is from the NumPy glossary:

[Screenshot: the NumPy glossary entry defining "array_like".]

Clearly "Any argument accepted by numpy.array is array_like." is not a useful working definition, as almost any Python object is accepted by np.array and coerced to an object array. For example, the module numpy is array_like, according to this definition.

import numpy as np
np.array(np)
# array(<module 'numpy' from '/Users/matthaberland/miniforge3/envs/scipy-dev/lib/python3.13/site-packages/numpy/__init__.py'>,
#      dtype=object)

As for the object type, what we really mean these days (for many functions in stats, for example) is that the type should be one of the following (a rough typing sketch follows the list):

  • A Python list, which will be converted to a NumPy array (but this may change in SciPy 2.0)
  • An Array API compatible array, subject to limitations that appear in a table in the documentation
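
To make this concrete, a rough typing sketch of that narrower meaning might look like the following. This is purely illustrative, not an existing SciPy or NumPy definition; SupportsArrayNamespace is a hypothetical protocol standing in for "object that exposes the Array API entry point __array_namespace__".

from typing import Protocol, TypeAlias, runtime_checkable

@runtime_checkable
class SupportsArrayNamespace(Protocol):
    # Hypothetical protocol: an object exposing the Array API entry point.
    def __array_namespace__(self, *, api_version: str | None = None) -> object: ...

# Hypothetical alias for what "array-like" often means in practice:
# a Python list, or an Array API compatible array.
ArrayLike: TypeAlias = list | SupportsArrayNamespace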

For example, a table given in the ttest_ind documentation looks like:

[Screenshot: the Array API support table from the ttest_ind documentation.]

This table does not mention two other important pieces of information, which are often omitted elsewhere as well: dtype and shape. For dtype, sometimes "real floating" or similar is specified, but many scipy.stats functions assume the reader understands that the input must be real. Many elementwise and reducing functions are pretty flexible about allowed input shape, but even in that case, it would be useful to have a standard way of specifying the relationship between input and output shapes.
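
As a concrete illustration of the kind of information that often goes unstated (shown here with NumPy, though the same questions arise for any backend):

import numpy as np

x = np.arange(6, dtype=np.int32).reshape(2, 3)
res = np.mean(x, axis=0)
print(res.dtype)  # float64: integer input is promoted to a floating-point result
print(res.shape)  # (3,): the reduced axis is removed from the output shape

Neither of these facts is obvious from a parameter documented only as "array-like".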

We have similar issues with the documentation of return values. Instead of "array-like", we often see float or "scalar or ndarray", which doesn't capture the full story.

I think we need:

  • a term that means "array type from a backend that complies with the Python array API standard". (The array API standard docs just use "array", which might be fine, but I can see the argument that it would be confused with the informal "array-like" usage.)
  • a standard format for documenting allowed dtypes.
  • a standard format for documenting the most common shape requirements.
  • a standard format for documenting the most common relationships between input and output type, dtype, and shape.

The last part is probably the most complex. For instance, in SciPy, it is common to have the following (illustrated with a small NumPy sketch after the list):

  • Elementwise functions, which typically accept Array API compatible arrays of any numerical dtype and shape. Usually:
    • Output type = input type, although sometimes the output may technically be a different type from the same backend that is more or less compatible with other arrays from that backend. (For example, scipy.special functions may accept a 0-d NumPy array and produce a NumPy scalar. Whether this is OK is debatable - see Is Array-In -> Scalar-Out OK? #38 - but there may be other backends which use multiple types to implement the overall array protocol.)
    • The dtype of the output matches the result_type of the input(s) with exceptions (e.g. integers may be promoted to floating point)
    • The shape of the output matches the shape of the input.
  • Reducing functions, which follow similar rules as above, but reduce along the axis (or axes) specified by an axis argument; the output shape matches that of the input but eliminates these dimensions
  • Generalized ufuncs like those in scipy.linalg, which follow similar rules as above: they preserve the "batch shape" between input and output but may have complicated input -> output "core dimension" relationships, which could be specified in terms of signatures like (m,n),(n,p)->(m,p) (but often aren't!)
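
Here is that small NumPy sketch of the three patterns; the same relationships are intended to hold for other Array API compatible backends:

import numpy as np

x = np.ones((3, 4))

# Elementwise: output shape equals input shape; dtype follows the result_type of the inputs.
assert np.exp(x).shape == (3, 4)

# Reducing: the dimension(s) named by `axis` are removed from the output shape.
assert np.sum(x, axis=1).shape == (3,)

# Generalized ufunc with signature (m,n),(n,p)->(m,p): batch dimensions broadcast,
# core dimensions follow the signature.
a = np.ones((5, 3, 4))  # batch shape (5,), core shape (3, 4)
b = np.ones((4, 2))     # core shape (4, 2)
assert (a @ b).shape == (5, 3, 2)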

I'd suggest that we can provide some common language for cases like these, which libraries can adapt to their needs. We would also suggest a way to link to more information within a library's documentation, since it is very common for input/output rules to be complicated but at least consistent within a certain set of functions. For instance, essentially all scipy.linalg functions now have a standard note that links to a tutorial about batch operations.

[Screenshot: the standard note included in scipy.linalg function documentation, linking to the tutorial on batched linear algebra operations.]

I think this is a lot more compact/readable (and TBH, more useful) than spelling out all the rules in the documentation of every function. This might be a decent pattern to follow for documenting the relationship between input and output shapes and dtypes (e.g. give information for a representative case in the documentation, and link to a full set of common rules).
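
For example, a parameter and return-value entry following such a pattern might read roughly as below. This is only a sketch of a possible format, not an agreed-upon convention, and the function is hypothetical:

def example_reduction(x, axis=0):
    """Hypothetical function illustrating a possible documentation format.

    Parameters
    ----------
    x : array, real floating dtype
        Array API compatible array (or Python list, which is converted to a
        NumPy array). See the table of supported backends in the library's
        documentation for backend-specific limitations.
    axis : int
        Axis along which to reduce.

    Returns
    -------
    res : array
        Same array type as `x`; dtype is the real floating result type of
        `x`; shape is that of `x` with the dimension(s) given by `axis`
        removed. See the library's documentation for the full set of
        dtype/shape rules shared by reducing functions.
    """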
