Commit 477013e

committed
Strip pandas material
TODO: remove from stats files
1 parent 65b578c commit 477013e

15 files changed: 170 additions and 636 deletions

config.yaml

Lines changed: 1 addition & 1 deletion
@@ -65,7 +65,7 @@ episodes:
 - 03-interacting-with-tests.Rmd
 - 04-unit-tests-best-practices.Rmd
 - 05-testing-exceptions.Rmd
-- 06-testing-data-structures.Rmd
+- 06-floating-point-data.Rmd
 - 07-fixtures.Rmd
 - 08-parametrization.Rmd
 - 09-testing-output-files.Rmd

episodes/06-floating-point-data.Rmd

Lines changed: 29 additions & 124 deletions
@@ -7,14 +7,14 @@ exercises: 5
 :::::::::::::::::::::::::::::::::::::: questions

 - What are the best practices when working with floating point data?
-- How do you compare objects in libraries like `pandas` and `numpy`?
+- How do you compare objects in libraries like `numpy`?

 ::::::::::::::::::::::::::::::::::::::::::::::::

 ::::::::::::::::::::::::::::::::::::: objectives

 - Learn how to test floating point data with tolerances.
-- Learn how to compare objects in libraries like `pandas` and `numpy`.
+- Learn how to compare objects in libraries like `numpy`.

 ::::::::::::::::::::::::::::::::::::::::::::::::

@@ -40,8 +40,9 @@ but it's possible that this test could erroneously fail in future for reasons
 outside our control. This lesson will teach best practices for handling this
 type of data.

-Libraries like `numpy` and `pandas` are commonly used to interact with large quantities
-of floating point numbers, and they provide special functions to assist with testing.
+Libraries like NumPy, SciPy, and Pandas are commonly used to interact
+with large quantities of floating point numbers. NumPy provides special
+functions to assist with testing.

 ### Relative and Absolute Tolerances

@@ -129,6 +130,8 @@ be very different!

 :::::::::::::::::::::::::::::::::

+:::::::::::::::::::::::::::::::::::::::::::::::
+
 The built-in function `math.isclose` can be used to simplify these checks:

 ```python
@@ -164,16 +167,18 @@ def test_estimate_pi():

 :::::::::::::::::::::::::::::::::

+:::::::::::::::::::::::::::::::::::::::::::::::
+
 ### NumPy

 NumPy is a common library used in research. Instead of the usual `assert a ==
 b`, NumPy has its own testing functions that are more suitable for comparing
 NumPy arrays. These functions are the ones you are most likely to use:

-- `numpy.testing.assert_array_equal` is used to compare two NumPy arrays or array-like objects (such as list, tuples, etc).
-- `numpy.testing.assert_allclose` is used to compare two NumPy arrays or array-like objects with a tolerance for floating point numbers.
-
-These may also be used on individual floating point numbers if you choose.
+- `numpy.testing.assert_array_equal` is used to compare two NumPy arrays for
+equality -- best used for integer data.
+- `numpy.testing.assert_allclose` is used to compare two NumPy arrays with a
+tolerance for floating point numbers.

 Here are some examples of how to use these functions:

@@ -206,59 +211,23 @@ def test_numpy_arrays_with_tolerance():
     np.testing.assert_allclose(array1, array2, atol=1e-3)
 ```

-::::::::::::::::::::::::::::::::::::: callout
-
-### Data structures with numpy arrays
-
-When you have data structures that contain numpy arrays, such as lists or dictionaries, you cannot use `==` to compare them.
-Instead, you can use `numpy.testing.assert_equal` to compare the data structures.
-
-```python
-def test_dictionaries_with_numpy_arrays():
-    """Test that dictionaries with numpy arrays are equal"""
-    # Create two dictionaries with numpy arrays
-    dict1 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
-    dict2 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
-    # Check that the dictionaries are equal
-    np.testing.assert_equal(dict1, dict2)
-```
-
-::::::::::::::::::::::::::::::::::::::::::::::::
+The NumPy testing functions can be used on anything NumPy considers to be 'array-like'.
+This includes lists, tuples, and even individual floating point numbers if you choose.
+They can also be used for other objects in the scientific Python ecosystem, such
+as Pandas Series/DataFrames.

+:::::::::::::::::::::::: callout

-### pandas
+The Pandas library also provides its own testing functions:

-Pandas is another common library used in research for storing and manipulating datasets.
-Pandas has its own testing functions that are more suitable for comparing Pandas objects.
-These two functions are the ones you are most likely to use:
-- `pandas.testing.assert_frame_equal` is used to compare two Pandas DataFrames.
-- `pandas.testing.assert_series_equal` is used to compare two Pandas Series.
+- `pandas.testing.assert_frame_equal`
+- `pandas.testing.assert_series_equal`

+These functions can also take `rtol` and `atol` arguments, so can fulfill the
+role of both `numpy.testing.assert_array_equal` and
+`numpy.testing.assert_allclose`.

-Here are some examples of how to use these functions:
-
-```python
-
-def test_pandas_dataframes():
-    """Test that pandas DataFrames are equal"""
-    # Create two pandas DataFrames
-    df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
-    df2 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
-    # Check that the DataFrames are equal
-    pd.testing.assert_frame_equal(df1, df2)
-
-def test_pandas_series():
-    """Test that pandas Series are equal"""
-    # Create two pandas Series
-    s1 = pd.Series([1, 2, 3])
-    s2 = pd.Series([1, 2, 3])
-    # Check that the Series are equal
-    pd.testing.assert_series_equal(s1, s2)
-```
-
-There is no equivalent to `np.assert_allclose` in Pandas. If you need to compare DataFrames
-or Series containing floating point data, it is recommended to use the `np.testing` functions directly
-on the Pandas objects.
+::::::::::::::::::::::::::::::::


 ::::::::::::::::::::::::::::::::::::: challenge
@@ -299,79 +268,15 @@ def test_calculate_cumulative_sum():

 :::::::::::::::::::::::::::::::::

-### Checking if Pandas DataFrames are equal
-
-In `statistics/stats.py` add this function to calculate the average score of each player in a Pandas DataFrame:
-
-```python
-import pandas as pd
-
-def calculate_player_average_scores(df: pd.DataFrame) -> pd.DataFrame:
-    """Calculate the average score of each player in a pandas DataFrame.
-
-    Example input:
-    | | player | score_1 | score_2 |
-    |---|---------|---------|---------|
-    | 0 | Alice | 1 | 2 |
-    | 1 | Bob | 3 | 4 |
-
-    Example output:
-    | | player | score_1 | score_2 | average_score |
-    |---|---------|---------|---------|---------------|
-    | 0 | Alice | 1 | 2 | 1.5 |
-    | 1 | Bob | 3 | 4 | 3.5 |
-    """
-
-    df["average_score"] = df[["score_1", "score_2"]].mean(axis=1)
-
-    return df
-```
-
-Then write a test for this function by comparing Pandas DataFrames.
-
-Hint: You can create a dataframe like this:
-
-```python
-df = pd.DataFrame({
-    "player": ["Alice", "Bob"],
-    "score_1": [1, 3],
-    "score_2": [2, 4]
-})
-```
-
-:::::::::::::::::::::::: solution
-
-```python
-import pandas as pd
-from stats import calculate_player_average_scores
-
-def test_calculate_player_average_scores():
-    """Test calculate_player_average_scores function"""
-    df = pd.DataFrame({
-        "player": ["Alice", "Bob"],
-        "score_1": [1, 3],
-        "score_2": [2, 4]
-    })
-    expected_result = pd.DataFrame({
-        "player": ["Alice", "Bob"],
-        "score_1": [1, 3],
-        "score_2": [2, 4],
-        "average_score": [1.5, 3.5]
-    })
-    pd.testing.assert_frame_equal(calculate_player_average_scores(df), expected_result)
-```
-
-:::::::::::::::::::::::::::::::::
-
-
 ::::::::::::::::::::::::::::::::::::::::::::::::


 ::::::::::::::::::::::::::::::::::::: keypoints

-- When comparing floating point data, you should use relative/absolute tolerances instead of testing for equality.
-- Numpy arrays cannot be compared using the `==` operator. Instead, use `numpy.testing.assert_array_equal` and `numpy.testing.assert_allclose`.
-- Pandas DataFrames and Series should be compared using `pandas.testing.assert_frame_equal` and `pandas.testing.assert_series_equal`.
+- When comparing floating point data, you should use relative/absolute
+tolerances instead of testing for equality.
+- Numpy arrays cannot be compared using the `==` operator. Instead, use
+`numpy.testing.assert_array_equal` and `numpy.testing.assert_allclose`.

 ::::::::::::::::::::::::::::::::::::::::::::::::
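
Note on the first key point kept by this hunk: a minimal sketch of why tolerances are preferred over `==`, using only the standard library's `math.isclose`, which the episode text already mentions. The helper name `is_close_enough` and the specific tolerance values are hypothetical choices for illustration and are not part of this commit.

```python
import math


def is_close_enough(actual, expected, rel_tol=1e-9, abs_tol=0.0):
    """Hypothetical helper: a thin wrapper around math.isclose."""
    return math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol)


# 0.1 + 0.2 is not exactly 0.3 in binary floating point ...
exact_match = (0.1 + 0.2) == 0.3
# ... but it is well within the default relative tolerance of 1e-9.
tolerant_match = is_close_enough(0.1 + 0.2, 0.3)

# A relative tolerance alone cannot succeed when the expected value is zero,
# because rel_tol * max(|a|, |b|) shrinks along with the values themselves.
near_zero_rel_only = is_close_enough(1e-12, 0.0)
# An absolute tolerance covers the near-zero case.
near_zero_with_abs = is_close_enough(1e-12, 0.0, abs_tol=1e-9)
```

Here `exact_match` and `near_zero_rel_only` come out false while the other two come out true, which is the usual argument for setting both a relative and an absolute tolerance when expected values may be zero.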

learners/files/06-floating-point-data/data_structures.py

Lines changed: 0 additions & 2 deletions
This file was deleted.

learners/files/06-floating-point-data/test_data_structures.py

Lines changed: 0 additions & 123 deletions
This file was deleted.
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+import numpy as np
+
+def test_numpy_arrays():
+    """Test that numpy arrays are equal"""
+    # Create two numpy arrays
+    array1 = np.array([1, 2, 3])
+    array2 = np.array([1, 2, 3])
+    # Check that the arrays are equal
+    np.testing.assert_array_equal(array1, array2)
+
+
+def test_2d_numpy_arrays():
+    """Test that 2d numpy arrays are equal"""
+    # Create two 2d numpy arrays
+    array1 = np.array([[1, 2], [3, 4]])
+    array2 = np.array([[1, 2], [3, 4]])
+    # Check that the nested arrays are equal
+    np.testing.assert_array_equal(array1, array2)
+
+
+def test_numpy_arrays_with_tolerance():
+    """Test that numpy arrays are equal with tolerance"""
+    # Create two numpy arrays
+    array1 = np.array([1.0, 2.0, 3.0])
+    array2 = np.array([1.00009, 2.0005, 3.0001])
+    # Check that the arrays are equal with tolerance
+    np.testing.assert_allclose(array1, array2, atol=1e-3)
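
Note on the tolerance test added above: a small sketch of why `atol=1e-3` matters there, assuming NumPy is installed. It reuses the array values from the new file, but the variable names `expected`, `measured`, `default_tolerance_passes`, and `max_abs_error` are illustrative and do not appear in the commit.

```python
import numpy as np

expected = np.array([1.0, 2.0, 3.0])
measured = np.array([1.00009, 2.0005, 3.0001])

# assert_allclose(actual, desired) passes when, element-wise,
# |actual - desired| <= atol + rtol * |desired|.
# With atol=1e-3 every difference below is small enough, so this does not raise.
np.testing.assert_allclose(measured, expected, atol=1e-3)

# With the defaults (rtol=1e-7, atol=0) the same arrays are rejected.
try:
    np.testing.assert_allclose(measured, expected)
    default_tolerance_passes = True
except AssertionError:
    default_tolerance_passes = False

# The largest absolute difference, for choosing a sensible atol.
max_abs_error = float(np.max(np.abs(measured - expected)))
```

The largest difference is about 5e-4, so the test would start failing if `atol` were tightened below that value without also loosening `rtol`.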

learners/files/07-fixtures/data_structures.py

Lines changed: 0 additions & 2 deletions
This file was deleted.
