Commit 94bb5ba

mtsokol and kgryte authored
docs: add migration guides and tutorial
PR-URL: #999
Co-authored-by: Athan Reines <kgryte@gmail.com>
Reviewed-by: Athan Reines <kgryte@gmail.com>
Reviewed-by: Evgeni Burovski
1 parent 017e9cf commit 94bb5ba

3 files changed

Lines changed: 401 additions & 0 deletions

File tree

spec/draft/index.rst

Lines changed: 7 additions & 0 deletions
@@ -30,6 +30,13 @@ Contents
30 30
verification_test_suite
31 31
benchmark_suite
32 32

33+
.. toctree::
34+
:caption: Guides and Tutorials
35+
:maxdepth: 1
36+
37+
migration_guide
38+
tutorial_basic
39+
33 40
.. toctree::
34 41
:caption: Other
35 42
:maxdepth: 1

spec/draft/migration_guide.md

Lines changed: 236 additions & 0 deletions
@@ -0,0 +1,236 @@
1+
(migration-guide)=
2+
3+
# Migration Guide
4+
5+
This page is meant to help migrate your codebase to an Array API compliant
6+
implementation. The guide is divided into two parts and, depending on your
7+
exact use-case, you should look thoroughly into at least one of them.
8+
9+
The first part is dedicated to {ref}`array-producers`. If your library
10+
mimics, for example, NumPy's or Dask's functionality, then you can find in
11+
the first part additional instructions and guidance on how to ensure
12+
downstream users can easily pick your solution as an array provider for
13+
their system/algorithm.
14+
15+
The second part delves into details for Array API compatibility for
16+
{ref}`array-consumers`. This pertains to any software that performs
17+
multidimensional array manipulation in Python, such as may be found in
18+
scikit-learn, SciPy, or statsmodels. If your software relies on a certain
19+
array producing library, such as NumPy or JAX, then you can use the second
20+
part to learn how to make it library agnostic and interchange array
21+
namespaces with significantly less friction.
22+
23+
## Ecosystem
24+
25+
Apart from the documented standard, the Array API ecosystem also provides
26+
a set of tools and packages to help you with the migration process:
27+
28+
29+
(array-api-compat)=
30+
31+
### Array API Compat
32+
33+
GitHub: [array-api-compat](https://github.com/data-apis/array-api-compat)
34+
35+
User group: Array Consumers
36+
37+
Although NumPy, Dask, CuPy, and PyTorch support the Array API Standard, there
38+
are still some corner cases where their behavior diverges from the standard.
39+
`array-api-compat` provides a compatibility layer to cover these cases.
40+
This is also accompanied by a few utility functions for easier introspection
41+
into array objects. As an array consumer, you can still rely on the original
42+
API while having access to the standard compatible one.
43+
44+
45+
(array-api-strict)=
46+
47+
### Array API Strict
48+
49+
GitHub: [array-api-strict](https://github.com/data-apis/array-api-strict)
50+
51+
User group: Array Consumers, Array Producers (for testing)
52+
53+
`array-api-strict` is a library that provides a strict and minimal
54+
implementation of the Array API Standard. For array producers, it is designed
55+
to be used as a reference implementation for testing and development purposes.
56+
You can compare your API calls with `array-api-strict` counterparts and
57+
ensure that your library is fully compliant with the standard and can
58+
serve as a reliable reference for other developers in the ecosystem.
59+
For consumers, you can use `array-api-strict` during development as an
60+
array provider to ensure your code uses APIs compliant with the standard.
61+
62+
63+
(array-api-tests)=
64+
65+
### Array API Tests
66+
67+
GitHub: [array-api-tests](https://github.com/data-apis/array-api-tests)
68+
69+
User group: Array Producers
70+
71+
`array-api-tests` is a collection of tests that can be used to verify the
72+
compliance of your library with the Array API Standard. It includes tests
73+
for array producers, covering a wide range of functionalities and use cases.
74+
By running these tests, you can ensure that your library adheres to the
75+
standard and can be used with compatible array consumer libraries.
76+
77+
78+
(array-api-extra)=
79+
80+
### Array API Extra
81+
82+
GitHub: [array-api-extra](https://github.com/data-apis/array-api-extra)
83+
84+
User group: Array Consumers
85+
86+
`array-api-extra` is a collection of additional utilities and tools that are
87+
missing from the Array API Standard but can be useful for compliant array
88+
consumers. It includes additional array manipulation and statistical functions.
89+
It is already used by SciPy and scikit-learn.
90+
91+
The sections below mention when and how to use them.
92+
93+
94+
(array-producers)=
95+
96+
## Array Producers
97+
98+
For array producers, the central task during the development/migration process
99+
is ensuring that the user-facing API adheres to the Array API Standard.
100+
101+
The complete API of the standard is documented in the
102+
[API specification](https://data-apis.org/array-api/latest/API_specification/index.html).
103+
104+
There, each function, constant, and object is described with details
105+
on parameters, return values, and special cases.
106+
107+
### Testing against Array API
108+
109+
There are two main ways to test your API for compliance: either using
110+
the `array-api-tests` suite or testing your API manually against the
111+
`array-api-strict` reference implementation.
112+
113+
#### Array API Test suite (Recommended)
114+
115+
{ref}`array-api-tests` is a test suite which verifies that your API
116+
adheres to the standard. For each function or method, it confirms
117+
it's importable, verifies the signature, generates multiple test
118+
cases with the [hypothesis](https://hypothesis.readthedocs.io/en/latest/)
119+
package, and runs assertions on the outputs.
120+
121+
The setup details are described in the GitHub repository, so here we
122+
cover only the minimal workflow:
123+
124+
1. Install your package (e.g., in editable mode).
125+
2. Clone `array-api-tests`, and set the `ARRAY_API_TESTS_MODULE` environment
126+
variable to your package import name.
127+
3. Inside the `array-api-tests` directory, run `pytest`. There are
128+
several useful options provided by the test suite. A few worth mentioning:
129+
- `--max-examples=1000` - maximum number of test cases to generate when using
130+
hypothesis. This allows you to balance between execution time of the test
131+
suite and thoroughness of the testing. It's advised to use as many examples
132+
as your time budget allows. Each test case is a random combination of
133+
possible inputs: the more cases, the higher chance of finding an
134+
unsupported edge case.
135+
- With the `--xfails-file` option, you can describe which tests are expected
136+
to fail. It's impossible to get the whole API perfectly implemented on a
137+
first try, so tracking what still fails gives you more control over the
138+
state of your API.
139+
- `-o xfail_strict=<bool>` is often used with the previous option. If a test
140+
expected to fail actually passes (`XPASS`), then you can decide whether
141+
to ignore that fact or raise it as an error.
142+
- `--skips-file` for skipping tests. At times, some failing tests might stall
143+
the execution of the test suite. In that case, the most convenient
144+
option is to skip these for the time being.
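The workflow above can be scripted. The following is a minimal sketch, assuming the suite has already been cloned into a local `array-api-tests` directory and `pytest` is on the path. The helper names, the package name `my_array_lib`, and the file name `xfails.txt` are hypothetical; the flags are the ones listed above.

```python
import os
import subprocess


def build_env(module_name, base_env=None):
    """Return a copy of the environment with ARRAY_API_TESTS_MODULE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["ARRAY_API_TESTS_MODULE"] = module_name
    return env


def build_pytest_command(max_examples=None, xfails_file=None,
                         skips_file=None, xfail_strict=None):
    """Assemble the pytest invocation from the options described above."""
    cmd = ["pytest"]
    if max_examples is not None:
        cmd.append(f"--max-examples={max_examples}")
    if xfails_file is not None:
        cmd += ["--xfails-file", xfails_file]
    if skips_file is not None:
        cmd += ["--skips-file", skips_file]
    if xfail_strict is not None:
        cmd += ["-o", f"xfail_strict={xfail_strict}"]
    return cmd


def run_suite(module_name, tests_dir="array-api-tests", **options):
    """Run pytest inside the cloned repository and return its exit code."""
    result = subprocess.run(
        build_pytest_command(**options),
        cwd=tests_dir,  # hypothetical location of the clone
        env=build_env(module_name),
    )
    return result.returncode
```

A call such as `run_suite("my_array_lib", max_examples=1000, xfails_file="xfails.txt", xfail_strict=True)` would then run the suite against the hypothetical package; keeping the command builder separate from the runner makes the chosen options easy to inspect or reuse in CI.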
145+
146+
We strongly advise you to embed this setup in your CI as well. This will allow
147+
you to continuously monitor Array API coverage, and make sure new changes don't break existing
148+
APIs. As a reference, see [NumPy's Array API Tests CI setup](https://github.com/numpy/numpy/blob/581d10f43b539a189a2d37856e5130464de9e5f6/.github/workflows/linux.yml#L296).
149+
150+
151+
#### Array API Strict
152+
153+
A simpler, and more manual, way of testing Array API coverage is to
154+
run your API calls along with the {ref}`array-api-strict` Python implementation.
155+
156+
This way, you can ensure that the outputs coming from your API match the minimal
157+
reference implementation. Bear in mind, however, that you need to write
158+
the test cases yourself, so you also need to take into account any applicable edge
159+
cases.
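
A hand-written comparison of this kind might look like the sketch below. It assumes the caller passes in both namespaces, with `array-api-strict` as the reference; the helper names are hypothetical. To stay library-agnostic, it reads values back through `float()` on zero-dimensional arrays, and it only handles functions that map a one-dimensional real-valued array to another one.

```python
import math


def to_floats(arr):
    """Read a 1-D real-valued array back into a list of Python floats."""
    return [float(arr[i]) for i in range(arr.shape[0])]


def matches_reference(func_name, data, xp, xp_ref, rel_tol=1e-6):
    """Call the same function in `xp` and in the reference namespace
    `xp_ref` on the same input, then compare shapes and values."""
    out = getattr(xp, func_name)(xp.asarray(data))
    ref = getattr(xp_ref, func_name)(xp_ref.asarray(data))
    if tuple(out.shape) != tuple(ref.shape):
        return False
    return all(
        # NaN never compares equal, so treat matching NaNs as agreement.
        math.isclose(a, b, rel_tol=rel_tol) or (math.isnan(a) and math.isnan(b))
        for a, b in zip(to_floats(out), to_floats(ref))
    )
```

For example, after `import array_api_strict` and importing a hypothetical `my_array_lib`, the call `matches_reference("sqrt", [0.0, 1.0, 4.0], my_array_lib, array_api_strict)` would report whether the two `sqrt` implementations agree on that input. Choosing the inputs, including the edge cases, remains up to you.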
160+
161+
162+
(array-consumers)=
163+
164+
## Array Consumers
165+
166+
For array consumers, the main premise is to keep in mind that your **array
167+
manipulation operations should not lock you into a particular array-producing
168+
library**. For instance, if you use NumPy for arrays, then your code could
169+
contain:
170+
171+
```python
172+
import numpy as np
173+
174+
# ...
175+
b = np.full(shape, val, dtype=dtype) @ a
176+
c = np.mean(a, axis=0)
177+
return np.dot(c, b)
178+
```
179+
180+
The first step should be as simple as assigning the `np` namespace to a dedicated
181+
namespace variable. The convention used in the ecosystem is to name it `xp`. Then,
182+
it is vital to ensure that each method and function call is something that the Array API
183+
supports. For example, `dot` is present in NumPy's API, but the standard
184+
doesn't support it. For the sake of simplicity, let's assume both `c` and `b`
185+
are `ndim=2`; therefore, we select `tensordot` instead, as both NumPy and the
186+
standard define it:
187+
188+
```python
189+
import numpy as np
190+
191+
xp = np
192+
193+
# ...
194+
b = xp.full(shape, val, dtype=dtype) @ a
195+
c = xp.mean(a, axis=0)
196+
return xp.tensordot(c, b, axes=1)
197+
```
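
Because the rewritten snippet reaches the backend only through `xp`, the namespace can be chosen at run time. A minimal sketch, assuming a hypothetical environment variable named `ARRAY_BACKEND` that holds an importable module name (the function name `load_namespace` is also hypothetical):

```python
import importlib
import os


def load_namespace(env_var="ARRAY_BACKEND", default="numpy"):
    """Import and return the array namespace named by an environment variable.

    Falls back to `default` when the variable is not set.
    """
    module_name = os.environ.get(env_var, default)
    return importlib.import_module(module_name)
```

With this in place, `xp = load_namespace()` replaces the hard-coded `xp = np`, and the backend is switched by setting the variable before the program starts. The named module still has to provide the standard-compatible functions the code relies on.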
198+
199+
At this point, replacing one backend with another one should only require providing a different
200+
namespace, such as `xp = torch` (e.g., via an environment variable). This can be useful
201+
if you're writing a script or your own custom software. Other alternatives are:
202+
203+
- If you are building a library where the backend is determined by input arrays,
204+
and your function accepts array arguments, then a recommended way is to ask
205+
your input arrays for a namespace to use: `xp = arr.__array_namespace__()`.
206+
If the given library doesn't have it, then [`array_api_compat.array_namespace()`](https://data-apis.org/array-api-compat/helper-functions.html#array_api_compat.array_namespace)
207+
should be used instead:
208+
```python
209+
def func(array1, scalar1, scalar2):
210+
    xp = array1.__array_namespace__()  # or array_namespace(array1)
210+
    return xp.arange(scalar1, scalar2) @ array1
212+
```
213+
- For a function that accepts scalars and returns arrays, take the namespace `xp` as
214+
a parameter in the signature. Enforcing objects to have the same type as the
215+
provided backend can then be achieved with `arg1 = xp.asarray(arg1)` for each input:
216+
```python
217+
def func(s1, s2, xp):
218+
    return xp.arange(s1, s2)
219+
```
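
The two patterns above can be captured in a few small helpers. The sketch below is an illustration under stated assumptions, not part of any library: `resolve_namespace`, `scaled_range`, and `make_range` are hypothetical names. It prefers the `__array_namespace__` method and only falls back to `array_api_compat.array_namespace()` when an input lacks it. It also passes `dtype=array1.dtype` to `arange`, an addition made here so that both operands of `@` share a dtype.

```python
def resolve_namespace(*arrays):
    """Return the namespace shared by `arrays`.

    Uses `__array_namespace__` when every input provides it; otherwise
    defers to `array_api_compat.array_namespace`, imported lazily so the
    package is only needed for inputs that require it.
    """
    if arrays and all(hasattr(a, "__array_namespace__") for a in arrays):
        namespaces = [a.__array_namespace__() for a in arrays]
        first = namespaces[0]
        if any(ns is not first for ns in namespaces[1:]):
            raise TypeError("inputs come from different array namespaces")
        return first
    from array_api_compat import array_namespace
    return array_namespace(*arrays)


def scaled_range(array1, start, stop):
    """Array-accepting function: the backend is taken from the input."""
    xp = resolve_namespace(array1)
    return xp.arange(start, stop, dtype=array1.dtype) @ array1


def make_range(start, stop, xp):
    """Scalar-accepting function: the caller passes the namespace in."""
    return xp.arange(start, stop)
```

Keeping namespace resolution in one place means the fallback policy can change later without touching every function that uses it.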
220+
221+
If you're relying on NumPy, CuPy, PyTorch, Dask, or JAX, then
222+
{ref}`array-api-compat` can come in handy for the transition. The compat layer
223+
allows you to still rely on your preferred array producing library, while
224+
making sure you're already using a standard-compatible API. Additionally, it
225+
offers a set of useful utility functions, such as:
226+
227+
- [array_namespace()](https://data-apis.org/array-api-compat/helper-functions.html#array_api_compat.array_namespace)
228+
for fetching the namespace based on input arrays.
229+
- [is_array_api_obj()](https://data-apis.org/array-api-compat/helper-functions.html#array_api_compat.is_array_api_obj)
230+
for inspecting whether a given object is Array API compatible.
231+
- [device()](https://data-apis.org/array-api-compat/helper-functions.html#array_api_compat.device)
232+
for retrieving the device on which an array resides.
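
As a rough illustration of these helpers, the sketch below wraps them in a hypothetical `describe()` function. It assumes `array-api-compat` is installed; the import happens inside the function so that the surrounding module can be loaded without it.

```python
def describe(obj):
    """Summarize an object using the array-api-compat helper functions."""
    from array_api_compat import array_namespace, device, is_array_api_obj

    if not is_array_api_obj(obj):
        # Plain Python objects such as lists are not array API objects.
        return {"is_array": False}
    xp = array_namespace(obj)
    return {
        "is_array": True,
        "namespace": xp.__name__,
        "device": device(obj),
    }
```

A consumer library could use a check like this at its public entry points to reject unsupported inputs early, before any array computation is attempted.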
233+
234+
For now, the migration from a specific library (e.g., NumPy) to a standard
235+
compatible setup requires manual intervention for each failing API call,
236+
but, in the future, we're hoping to provide tools for automating the migration process.
