@@ -7,14 +7,14 @@ exercises: 5
77:::::::::::::::::::::::::::::::::::::: questions
88
99- What are the best practices when working with floating point data?
10- - How do you compare objects in libraries like `pandas` and `numpy`?
10+ - How do you compare objects in libraries like `numpy`?
1111
1212::::::::::::::::::::::::::::::::::::::::::::::::
1313
1414::::::::::::::::::::::::::::::::::::: objectives
1515
1616- Learn how to test floating point data with tolerances.
17- - Learn how to compare objects in libraries like `pandas` and `numpy`.
17+ - Learn how to compare objects in libraries like `numpy`.
1818
1919::::::::::::::::::::::::::::::::::::::::::::::::
2020
@@ -40,8 +40,9 @@ but it's possible that this test could erroneously fail in future for reasons
4040outside our control. This lesson will teach best practices for handling this
4141type of data.
4242
43- Libraries like `numpy` and `pandas` are commonly used to interact with large quantities
44- of floating point numbers, and they provide special functions to assist with testing.
43+ Libraries like NumPy, SciPy, and Pandas are commonly used to interact
44+ with large quantities of floating point numbers. NumPy provides special
45+ functions to assist with testing.
4546
4647### Relative and Absolute Tolerances
4748
@@ -129,6 +130,8 @@ be very different!
129130
130131:::::::::::::::::::::::::::::::::
131132
133+ :::::::::::::::::::::::::::::::::::::::::::::::
134+
132135The built-in function `math.isclose` can be used to simplify these checks:
133136
134137``` python
@@ -164,16 +167,18 @@ def test_estimate_pi():
164167
165168:::::::::::::::::::::::::::::::::
166169
170+ :::::::::::::::::::::::::::::::::::::::::::::::
171+
167172### NumPy
168173
169174NumPy is a common library used in research. Instead of the usual `assert a ==
170175b`, NumPy has its own testing functions that are more suitable for comparing
171176NumPy arrays. These functions are the ones you are most likely to use:
172177
173- - `numpy.testing.assert_array_equal` is used to compare two NumPy arrays or array-like objects (such as list, tuples, etc).
174- - `numpy.testing.assert_allclose` is used to compare two NumPy arrays or array-like objects with a tolerance for floating point numbers.
175-
176- These may also be used on individual floating point numbers if you choose.
178+ - `numpy.testing.assert_array_equal` is used to compare two NumPy arrays for
179+ equality -- best used for integer data.
180+ - `numpy.testing.assert_allclose` is used to compare two NumPy arrays with a
181+ tolerance for floating point numbers.
177182
178183Here are some examples of how to use these functions:
179184
@@ -206,59 +211,23 @@ def test_numpy_arrays_with_tolerance():
206211    np.testing.assert_allclose(array1, array2, atol=1e-3)
207212```
208213
209- ::::::::::::::::::::::::::::::::::::: callout
210-
211- ### Data structures with numpy arrays
212-
213- When you have data structures that contain numpy arrays, such as lists or dictionaries, you cannot use ` == ` to compare them.
214- Instead, you can use ` numpy.testing.assert_equal ` to compare the data structures.
215-
216- ``` python
217- def test_dictionaries_with_numpy_arrays ():
218- """ Test that dictionaries with numpy arrays are equal"""
219- # Create two dictionaries with numpy arrays
220- dict1 = {" a" : np.array([1 , 2 , 3 ]), " b" : np.array([4 , 5 , 6 ])}
221- dict2 = {" a" : np.array([1 , 2 , 3 ]), " b" : np.array([4 , 5 , 6 ])}
222- # Check that the dictionaries are equal
223- np.testing.assert_equal(dict1, dict2)
224- ```
225-
226- ::::::::::::::::::::::::::::::::::::::::::::::::
214+ The NumPy testing functions can be used on anything NumPy considers to be 'array-like'.
215+ This includes lists, tuples, and even individual floating point numbers if you choose.
216+ They can also be used for other objects in the scientific Python ecosystem, such
217+ as Pandas Series/DataFrames.
227218
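
As a minimal sketch of the 'array-like' behaviour described above, the test below passes lists, tuples, and plain floats straight to the NumPy testing functions. The test name and the values are illustrative assumptions, not part of the lesson's `statistics` code.

```python
import numpy as np


def test_array_like_inputs():
    """Illustrative only: NumPy testing functions accept array-like inputs."""
    # A list and a tuple with the same integer contents compare as equal
    np.testing.assert_array_equal([1, 2, 3], (1, 2, 3))
    # A list compared against a NumPy array, with a relative tolerance
    np.testing.assert_allclose([0.1 + 0.2, 1.0], np.array([0.3, 1.0]), rtol=1e-9)
    # Individual floating point numbers work as well
    np.testing.assert_allclose(0.1 + 0.2, 0.3, rtol=1e-9)
```

Run under `pytest`, this test should pass even though `0.1 + 0.2 == 0.3` is `False` in Python, because the comparison uses a tolerance.
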
219+ :::::::::::::::::::::::: callout
228220
229- ### pandas
221+ The Pandas library also provides its own testing functions:
230222
231- Pandas is another common library used in research for storing and manipulating datasets.
232- Pandas has its own testing functions that are more suitable for comparing Pandas objects.
233- These two functions are the ones you are most likely to use:
234- - ` pandas.testing.assert_frame_equal ` is used to compare two Pandas DataFrames.
235- - ` pandas.testing.assert_series_equal ` is used to compare two Pandas Series.
223+ - `pandas.testing.assert_frame_equal`
224+ - `pandas.testing.assert_series_equal`
236225
226+ These functions can also take `rtol` and `atol` arguments, so they can fulfill the
227+ role of both `numpy.testing.assert_array_equal` and
228+ `numpy.testing.assert_allclose`.
237229
238- Here are some examples of how to use these functions:
239-
240- ``` python
241-
242- def test_pandas_dataframes ():
243- """ Test that pandas DataFrames are equal"""
244- # Create two pandas DataFrames
245- df1 = pd.DataFrame({" A" : [1 , 2 , 3 ], " B" : [4 , 5 , 6 ]})
246- df2 = pd.DataFrame({" A" : [1 , 2 , 3 ], " B" : [4 , 5 , 6 ]})
247- # Check that the DataFrames are equal
248- pd.testing.assert_frame_equal(df1, df2)
249-
250- def test_pandas_series ():
251- """ Test that pandas Series are equal"""
252- # Create two pandas Series
253- s1 = pd.Series([1 , 2 , 3 ])
254- s2 = pd.Series([1 , 2 , 3 ])
255- # Check that the Series are equal
256- pd.testing.assert_series_equal(s1, s2)
257- ```
258-
259- There is no equivalent to ` np.assert_allclose ` in Pandas. If you need to compare DataFrames
260- or Series containing floating point data, it is recommended to use the ` np.testing ` functions directly
261- on the Pandas objects.
230+ ::::::::::::::::::::::::::::::::
262231
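
The sketch below shows one way the `atol` argument of `pandas.testing.assert_frame_equal` might be used with floating point columns. It assumes a reasonably recent pandas release in which `rtol` and `atol` are available; the DataFrame contents and the test name are made up for illustration.

```python
import pandas as pd


def test_dataframes_with_tolerance():
    """Illustrative only: compare DataFrames with an absolute tolerance."""
    measured = pd.DataFrame({
        "player": ["Alice", "Bob"],
        "average_score": [1.5001, 3.4999],
    })
    expected = pd.DataFrame({
        "player": ["Alice", "Bob"],
        "average_score": [1.5, 3.5],
    })
    # Float columns are allowed to differ by up to atol (plus a small rtol term)
    pd.testing.assert_frame_equal(measured, expected, atol=1e-3)
```

With the default tolerances the same comparison would be expected to fail, since the values differ by about `1e-4`, so choosing `atol` or `rtol` deliberately is part of writing the test.
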
263232
264233::::::::::::::::::::::::::::::::::::: challenge
@@ -299,79 +268,15 @@ def test_calculate_cumulative_sum():
299268
300269:::::::::::::::::::::::::::::::::
301270
302- ### Checking if Pandas DataFrames are equal
303-
304- In ` statistics/stats.py ` add this function to calculate the average score of each player in a Pandas DataFrame:
305-
306- ``` python
307- import pandas as pd
308-
309- def calculate_player_average_scores (df : pd.DataFrame) -> pd.DataFrame:
310- """ Calculate the average score of each player in a pandas DataFrame.
311-
312- Example input:
313- | | player | score_1 | score_2 |
314- |---|---------|---------|---------|
315- | 0 | Alice | 1 | 2 |
316- | 1 | Bob | 3 | 4 |
317-
318- Example output:
319- | | player | score_1 | score_2 | average_score |
320- |---|---------|---------|---------|---------------|
321- | 0 | Alice | 1 | 2 | 1.5 |
322- | 1 | Bob | 3 | 4 | 3.5 |
323- """
324-
325- df[" average_score" ] = df[[" score_1" , " score_2" ]].mean(axis = 1 )
326-
327- return df
328- ```
329-
330- Then write a test for this function by comparing Pandas DataFrames.
331-
332- Hint: You can create a dataframe like this:
333-
334- ``` python
335- df = pd.DataFrame({
336- " player" : [" Alice" , " Bob" ],
337- " score_1" : [1 , 3 ],
338- " score_2" : [2 , 4 ]
339- })
340- ```
341-
342- :::::::::::::::::::::::: solution
343-
344- ``` python
345- import pandas as pd
346- from stats import calculate_player_average_scores
347-
348- def test_calculate_player_average_scores ():
349- """ Test calculate_player_average_scores function"""
350- df = pd.DataFrame({
351- " player" : [" Alice" , " Bob" ],
352- " score_1" : [1 , 3 ],
353- " score_2" : [2 , 4 ]
354- })
355- expected_result = pd.DataFrame({
356- " player" : [" Alice" , " Bob" ],
357- " score_1" : [1 , 3 ],
358- " score_2" : [2 , 4 ],
359- " average_score" : [1.5 , 3.5 ]
360- })
361- pd.testing.assert_frame_equal(calculate_player_average_scores(df), expected_result)
362- ```
363-
364- :::::::::::::::::::::::::::::::::
365-
366-
367271::::::::::::::::::::::::::::::::::::::::::::::::
368272
369273
370274::::::::::::::::::::::::::::::::::::: keypoints
371275
372- - When comparing floating point data, you should use relative/absolute tolerances instead of testing for equality.
373- - Numpy arrays cannot be compared using the ` == ` operator. Instead, use ` numpy.testing.assert_array_equal ` and ` numpy.testing.assert_allclose ` .
374- - Pandas DataFrames and Series should be compared using ` pandas.testing.assert_frame_equal ` and ` pandas.testing.assert_series_equal ` .
276+ - When comparing floating point data, you should use relative/absolute
277+ tolerances instead of testing for equality.
278+ - NumPy arrays should not be compared in an `assert` using the `==` operator, because it compares element by element. Instead, use
279+ `numpy.testing.assert_array_equal` and `numpy.testing.assert_allclose`.
375280
376281::::::::::::::::::::::::::::::::::::::::::::::::
377282