The following example seems reasonable to me:
import pandas

df = pandas.DataFrame({'col1': ['1', '2', '3'],
                       'col2': ['9', '9', '9']})
df.apply(int)
It looks like it should convert the data in the DataFrame to integers, by calling the int() function for every element.
This would be true for Series.apply, but the function passed to DataFrame.apply receives a whole Series at a time, not individual (scalar) values. The method that applies a function to one value at a time is DataFrame.applymap.
This is how pandas is designed, and while it is probably a bit confusing, it is reasonable. So the df.apply(int) call above actually fails. The error is:
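To make the difference concrete, here is a minimal sketch of the three ways of applying a function. The variable names (`col1_as_int`, `df_as_int`, `elementwise`, `received_types`) are only for illustration, and the `hasattr` check is an assumption about your pandas version: as far as I know, `DataFrame.applymap` was renamed to `DataFrame.map` in pandas 2.1.

```python
import pandas

df = pandas.DataFrame({'col1': ['1', '2', '3'],
                       'col2': ['9', '9', '9']})

# Series.apply calls the function once per element, so int() works here.
col1_as_int = df['col1'].apply(int)

# DataFrame.apply calls the function once per column, passing a whole
# Series, so the function has to know how to handle a Series.
df_as_int = df.apply(lambda column: column.astype(int))

# Element-wise application on a DataFrame. Newer pandas versions expose
# this as DataFrame.map; older ones only have DataFrame.applymap.
elementwise = df.map if hasattr(df, 'map') else df.applymap
df_elementwise = elementwise(int)

# Record the type of the argument that DataFrame.apply hands over.
received_types = []
df.apply(lambda column: received_types.append(type(column).__name__))
```

After running this, `received_types` should contain only `'Series'`, which is why passing `int` directly to `DataFrame.apply` cannot work.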
TypeError: ("cannot convert the series to <class 'int'>", 'occurred at index col1')
Feel free to disagree, but personally I think the error message doesn't do a great job of telling the user what's wrong or giving hints on how to fix it. I think something like the following would be more useful:
TypeError: The function `int` passed to `DataFrame.apply` should expect a `Series` as the argument. To apply a function that receives a single item at a time use `DataFrame.applymap`.
While this may look straightforward, it isn't easy, and surely not as easy as replacing the error message. The current message is raised by the Series itself when something tries to convert it to an integer, as in int(pandas.Series()), so it has nothing to do with apply.
I think it's doable to have an appropriate error message, but I'm not sure about the implications.
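A quick way to check where the error originates is to call int() on a Series directly, with no apply involved. This is only a sketch; the names `direct_message` and `apply_message` are mine, and the exact wording of both messages may vary between pandas versions.

```python
import pandas

df = pandas.DataFrame({'col1': ['1', '2', '3'],
                       'col2': ['9', '9', '9']})

# Converting a multi-element Series with int() fails on its own.
direct_message = None
try:
    int(df['col1'])
except TypeError as error:
    direct_message = str(error)

# DataFrame.apply ends up doing the same conversion on each column,
# so it fails with a TypeError as well.
apply_message = None
try:
    df.apply(int)
except TypeError as error:
    apply_message = str(error)

print(direct_message)
print(apply_message)
```

Both calls fail with a TypeError, and the message from the direct call says nothing about apply, because the Series has no way of knowing who tried to convert it.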
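As one possible starting point for discussion, the idea could be sketched outside of pandas with a wrapper. `apply_with_hint` is a hypothetical helper, not an existing pandas function, and the message is the one proposed above.

```python
import pandas


def apply_with_hint(frame, func):
    """Hypothetical helper (not part of pandas): call frame.apply(func)
    and replace any TypeError with a message that points to applymap."""
    try:
        return frame.apply(func)
    except TypeError as error:
        name = getattr(func, '__name__', repr(func))
        raise TypeError(
            f"The function `{name}` passed to `DataFrame.apply` should "
            "expect a `Series` as the argument. To apply a function that "
            "receives a single item at a time use `DataFrame.applymap`."
        ) from error


df = pandas.DataFrame({'col1': ['1', '2', '3'],
                       'col2': ['9', '9', '9']})

# A function that accepts a Series still works unchanged.
converted = apply_with_hint(df, lambda column: column.astype(int))
```

This also shows one of the implications: the wrapper rewrites every TypeError raised inside the function, including ones unrelated to scalar versus Series arguments, so a real fix would probably need a narrower check.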
Feel free to discuss your proposals on how to fix it here, or to try your approach and open a PR, and have the discussion there.