Description
In the context of machine learning, many optimization algorithms rightfully preclude the presence of NaN values.
The documentation of a function sometimes mentions, and sometimes omits, whether the function can return NaN and how it processes NaN as input.
Alas, this is not systematically described, and people will just try functions left and right when they are doing exploratory feature engineering.
The first focus would be to make sure the library offers some batteries included for those who don't want to find out "too late" in the pipeline (pipelines take long to set up, adjust, run, troubleshoot, etc.).
Without going too far toward making everything perfect and maximally sophisticated in all places, here is a plan that could bring some safety and long-term maintainability:
- Offering an FSharp.Stats.NumericallySafe module (people open it after FSharp.Stats and it shadows the unchecked variants); there could also be a module with assertions that would defensively throw
- the module would call the existing APIs but wrap the values in a type that enforces inspection via pattern matching or helper functions, borrowing idioms from F# core around `option` or `result`
- the existing API should have CLR attributes on the functions / methods signalling "emits NaN", "accepts NaN"
- there would be property-based tests, possibly guided by code coverage, that would validate behavior against the presence of those attributes
- there would be a page in the documentation that lists all the functions, with filters on "emits NaN" and the other attributes
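
To make the first three points concrete, here is a minimal sketch in plain F#. Every name in it is hypothetical: `EmitsNaNAttribute`, the `Raw` module, and this `NumericallySafe` module are illustrations, not existing FSharp.Stats APIs, and `Raw.meanGeometric` is a stand-in written for this example rather than the library's implementation.

```fsharp
open System

/// Hypothetical marker attribute signalling that a function may return NaN.
[<AttributeUsage(AttributeTargets.Method, AllowMultiple = false)>]
type EmitsNaNAttribute() =
    inherit Attribute()

module Raw =
    /// Stand-in for an existing unchecked function (assumption: geometric mean
    /// computed as exp of the mean of logs). Returns NaN for negative inputs
    /// and throws on an empty sequence.
    [<EmitsNaN>]
    let meanGeometric (xs: seq<float>) : float =
        xs |> Seq.averageBy (fun x -> log x) |> exp

module NumericallySafe =
    /// Calls the unchecked variant and forces the caller to inspect the outcome.
    let meanGeometric (xs: seq<float>) : Result<float, string> =
        let v = Raw.meanGeometric xs
        if Double.IsNaN v then Error "meanGeometric produced NaN" else Ok v

// The caller cannot use the value without handling the NaN case.
match NumericallySafe.meanGeometric [ 1.0; -4.0 ] with
| Ok v -> printfn "geometric mean = %f" v
| Error msg -> printfn "rejected: %s" msg
```

The sketch uses qualified names for clarity; in the scheme proposed above, opening the safe module after the unchecked one would let the safe `meanGeometric` shadow the raw one. `Result` is used here, but `option` would fit the same pattern.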
One can dream :)
In the meantime:
- I wanted to point out that `meanGeometric` can emit `NaN`, but the documentation says nothing about this, and it is not exposed under `FSharp.Stats.NumericallySafe`.
- In the documentation pages, we'd want to display warning sections after describing the formula, logic, and sample code, with styling that will catch the attention.
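
For illustration, this is one way a geometric mean can end up emitting `NaN`. The function below is a hypothetical stand-in, assuming the common log-based formulation; it is not taken from the FSharp.Stats source, so the library's actual behavior should be checked separately.

```fsharp
open System

// Hypothetical stand-in: geometric mean as exp(mean(log x)).
let meanGeometricSketch (xs: seq<float>) : float =
    xs |> Seq.averageBy (fun x -> log x) |> exp

// Positive inputs give a finite result (approximately 2.0 here).
let fine = meanGeometricSketch [ 1.0; 4.0 ]

// A single negative input makes log return NaN, which propagates to the result.
let poisoned = meanGeometricSketch [ 1.0; -4.0 ]

printfn "finite result is NaN: %b" (Double.IsNaN fine)
printfn "poisoned result is NaN: %b" (Double.IsNaN poisoned)
```

A warning section in the documentation could state exactly this kind of precondition (all inputs strictly positive) next to the formula.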
related: #280