
[Feature Request] NaN safety, we probably need something more than doc strings. #311

@smoothdeveloper


In the context of machine learning, many optimization algorithms rightfully preclude the presence of NaN values.

A function's documentation may or may not mention whether it can return NaN, or how it processes NaN as input.

Alas, this is not described systematically, and people doing exploratory feature engineering will just try functions left and right.

The first focus would be to make sure the library offers some batteries included for those who don't want to find out "too late" in the pipeline (pipelines take long to set up, adjust, run, troubleshoot, etc.).

Without trying to make everything perfect, or maximally sophisticated, in all places, here is a plan that could bring some safety and long-term maintainability:

  • Offering an FSharp.Stats.NumericallySafe module (people open it after FSharp.Stats and it shadows the unchecked variants); there could also be a module with assertions that defensively throw
  • the module would call the existing APIs but wrap the values in a type that enforces inspection via pattern matching or helper functions, borrowing idioms from F# core around option or result
  • the existing API should have CLR attributes on the functions / methods signalling "emits NaN", "accepts NaN"
  • there would be property-based tests, possibly guided by code coverage, that would validate against the presence of those attributes
  • there would be a page in the documentation that lists all the functions, with filters for "emits NaN" and the other attributes
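
To make the first three bullets concrete, here is a rough sketch of what I have in mind. Everything in it is hypothetical: `EmitsNaNAttribute`, `NaNChecked`, `RawStats` and `SafeStats` are names made up for illustration, and `RawStats.meanGeometric` is only a stand-in (exp of the mean of logs), not the actual FSharp.Stats implementation.

```fsharp
open System

/// Hypothetical marker attribute: the annotated function may return NaN.
[<AttributeUsage(AttributeTargets.Method)>]
type EmitsNaNAttribute() =
    inherit Attribute()

/// Hypothetical wrapper type: callers have to pattern match before
/// they can get at the float.
type NaNChecked =
    | Number of float
    | NotANumber

/// Stand-in for the existing, unchecked API (assumption, not library code).
module RawStats =
    [<EmitsNaN>]
    let meanGeometric (xs: seq<float>) : float =
        // log of a negative number is NaN, and NaN propagates through the mean
        xs |> Seq.averageBy (fun x -> log x) |> exp

/// Stand-in for the proposed FSharp.Stats.NumericallySafe module.
module SafeStats =
    let ofFloat (x: float) : NaNChecked =
        if Double.IsNaN x then NotANumber else Number x

    /// Same name as the raw variant, so opening this module later shadows it.
    let meanGeometric (xs: seq<float>) : NaNChecked =
        RawStats.meanGeometric xs |> ofFloat

open RawStats
open SafeStats // opened last: meanGeometric now resolves to the checked variant

let describe (xs: seq<float>) : string =
    match meanGeometric xs with
    | Number v -> sprintf "geometric mean = %g" v
    | NotANumber -> "geometric mean is NaN, check the inputs"
```

With this layout, existing code that only opens the raw module keeps working unchanged, while code that also opens the safe module is forced by the compiler to handle the `NotANumber` case. The attribute carries no behavior by itself; it is only there so that tests or a documentation generator could discover it through reflection.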

One can dream :)

In the meantime:

  • I wanted to point out that meanGeometric can emit NaN but the documentation says nothing about this, and it is not exposed under FSharp.Stats.NumericallySafe.
  • In the documentation pages, we'd want to display warning sections after the formula, logic, and sample code, styled so that they catch the reader's attention.
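
For illustration, this is the kind of silent NaN I mean, plus the asserting variant mentioned in the plan above. The `geometricMean` below is a stand-in written for this issue (exp of the mean of logs), so treat it as an assumption about the shape of the computation rather than the library's code; `geometricMeanOrFail` is likewise a hypothetical name.

```fsharp
open System

// Stand-in geometric mean (assumption: exp of the mean of logs).
let geometricMean (xs: seq<float>) : float =
    xs |> Seq.averageBy (fun x -> log x) |> exp

let fine = geometricMean [ 2.0; 8.0 ]    // all inputs positive
let silent = geometricMean [ -2.0; 8.0 ] // log (-2.0) is NaN, so the result is NaN

// NaN never compares equal to itself, so check with Double.IsNaN
// instead of (=).
printfn "fine = %g, silent is NaN: %b" fine (Double.IsNaN silent)

/// Hypothetical asserting variant: fail at the call site instead of
/// letting NaN travel further down the pipeline.
let geometricMeanOrFail (xs: seq<float>) : float =
    let result = geometricMean xs
    if Double.IsNaN result then
        invalidArg "xs" "geometric mean is NaN (negative or NaN input?)"
    result
```

The asserting variant trades a silent bad value for an `ArgumentException` raised where the bad input enters, which is usually much cheaper to troubleshoot than a NaN surfacing at the end of a long pipeline.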

related: #280
