Description
We found PR #399, which adds support for the :disable-na-as-missing? option, but the option is not used by the FixedTypeParser, so it has no effect when we opt out of type promotion.
We use tech.ml.dataset to read CSV files and run validation on the values, so we would like to keep everything as strings and defer promotion to a later point. For that reason we have been using the :parser-fn :string option.
After finding that the option exists for the promotional parser, we were hoping to use :disable-na-as-missing? to keep NA values untouched as well.
Below is a sample showing the difference between the :parser-type and :parser-fn option keys when both are set to :string.
user=> (spit "/tmp/foo.csv" "a,b\n9,1\n8,2\n7,3\n6,NA\n5,5")
nil
user=> (ds/->dataset "/tmp/foo.csv" {:parser-fn :string})
/tmp/foo.csv [5 2]:
| a | b |
|---|---|
| 9 | 1 |
| 8 | 2 |
| 7 | 3 |
| 6 | |
| 5 | 5 |
user=> (map meta (vals *1))
({:categorical? true, :name "a", :datatype :string, :n-elems 5} {:categorical? true, :name "b", :datatype :string, :n-elems 5})
user=> (ds/missing *2)
{3}
user=> (ds/->dataset "/tmp/foo.csv" {:parser-type :string :disable-na-as-missing? true})
/tmp/foo.csv [5 2]:
| a | b |
|--:|----|
| 9 | 1 |
| 8 | 2 |
| 7 | 3 |
| 6 | NA |
| 5 | 5 |
user=> (map meta (vals *1))
({:name "a", :datatype :int16, :n-elems 5} {:categorical? true, :name "b", :datatype :string, :n-elems 5})
user=> (ds/missing *2)
{}
user=> (ds/->dataset "/tmp/foo.csv" {:parser-fn :string :disable-na-as-missing? true})
/tmp/foo.csv [5 2]:
| a | b |
|---|---|
| 9 | 1 |
| 8 | 2 |
| 7 | 3 |
| 6 | |
| 5 | 5 |
user=> (map meta (vals *1))
({:categorical? true, :name "a", :datatype :string, :n-elems 5} {:categorical? true, :name "b", :datatype :string, :n-elems 5})
user=> (ds/missing *2)
{3}
Looking at the PR linked above, the :disable-na-as-missing? option is checked by the Promotional and Object parsers but not by the Fixed parser, so we are wondering whether there are any objections to doing the same for the Fixed type.
We are not sure whether it should also be guarded so that it takes effect only when the :parser-fn value is :string, or whether there are other caveats to consider.
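To make the request concrete, here is a rough sketch of the check we have in mind. It is plain Clojure and not the library's actual code: the helper name `missing-string?` is hypothetical, and it assumes that an empty cell and a case-insensitive `NA` are the string values currently treated as missing.

```clojure
;; Hypothetical helper, NOT part of tech.ml.dataset.
;; Decides whether a raw CSV cell should be recorded as missing.
;; Assumption: empty cells and a case-insensitive "NA" are the values
;; that count as missing by default.
(defn missing-string?
  [options cell]
  (boolean
   (or (nil? cell)
       (= "" cell)
       ;; Only treat "NA" as missing when the option is not set.
       (and (not (:disable-na-as-missing? options))
            (string? cell)
            (.equalsIgnoreCase "na" ^String cell)))))
```

With a check along these lines in the Fixed parser, `(missing-string? {:disable-na-as-missing? true} "NA")` would return false, so the NA cell in column b would be kept as the string "NA", while empty cells would still be marked missing.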