Update fieldset ingestion to use convert modules #40
VeckoTheGecko wants to merge 11 commits into `main`
Conversation
I'm going to get started laying the groundwork on this - keen to discuss if either of you have ideas so this can be further refined :)

Joe mentions that we have a convert function for this.
The FESOM benchmark was working the last time I ran it. From the error shown at the top, did something in Parcels change in the
OK - I've gone ahead and updated the ingestion code here so that it's in line with Parcels-code/Parcels#2549. We are closer to having a working benchmark suite, but unfortunately we're not there yet. Hence I propose that we go ahead and merge this anyway, as it gets us closer to the end goal.
MOI error
Currently ingestion works, but we get an error during the execution itself (note this PR now closes #33, as ingestion works and we now have a different error).
FESOM error
Here we get an error on the selection of the interpolator - this is a bug upstream in Parcels (this dataset has dims `('time', 'nz1', 'elem', 'nod2', 'nz')`, but `_select_uxinterpolator` doesn't expect these dimension names and isn't able to determine the right interpolators. AFAICT this problem was always here for this dataset. Let me know what you think @fluidnumerics-joe).

```
pixi run setup-data
pixi run asv run --bench 'fesom2.*'
```

Future work
Better testing
I'm finding it quite difficult to debug all of this since it works with heavy datasets, and the iteration loop using `asv run` is very frustrating (e.g., run the benchmark, find an error (that I can't easily inspect using pdb, since that has poor integration with `asv`), recreate the error using a normal Python script, realise the bug is in Parcels, etc.).

At the moment we have the following (which can be thought of as a pyramid - from the top to the most foundational):
There is, however, the possibility of a layer in between:
(`open_cdl`? pydata/xarray#6269, and `ds.to_dict(data=False)` is close to what we need, but excludes coordinates.)

This intermediate layer is hinted at in our "Participating in the issue tracker: 'Parcels doesn't work with my data'" doc page section, but I think it can be formalised and also extended to the coordinates (as those are also quite important). Regarding implementation, we can create a separate repo to host these small files (similar to https://github.com/Parcels-code/parcels-data).
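To make the idea concrete, here is a rough, hypothetical sketch of such a structure file: record only a dataset's dims, coordinate values, and variable dims/dtypes as a tiny JSON blob that could live in a separate repo. All names here (`make_skeleton`, the JSON layout, the FESOM-like dims) are illustrative assumptions, not an existing convert-module API.

```python
import json

# Hypothetical sketch of the proposed "intermediate layer": capture only a
# dataset's structure (no heavy data) so tests can reconstruct a lightweight
# stand-in. The layout and names are illustrative, not an existing API.
def make_skeleton(dims, coords, data_vars):
    """Bundle structural metadata into a JSON-serialisable dict."""
    return {"dims": dims, "coords": coords, "data_vars": data_vars}

def dump_and_reload(skeleton):
    """Round-trip through JSON, standing in for writing the file to a repo."""
    return json.loads(json.dumps(skeleton))

# FESOM-like example structure (dims taken from the dataset discussed above;
# sizes and the 'u' variable are made up for illustration).
skeleton = make_skeleton(
    dims={"time": 2, "nz1": 3, "elem": 5, "nod2": 4, "nz": 4},
    coords={"time": ["2000-01-01", "2000-01-02"]},
    data_vars={"u": {"dims": ["time", "nz1", "elem"], "dtype": "float32"}},
)
assert dump_and_reload(skeleton) == skeleton
```

From a skeleton like this, a test could build a zero-filled `xr.Dataset` with the same dims/coords and run it through the convert module without ever downloading the real data.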
This intermediate layer has the following benefits:
Keen to hear your thoughts @erikvansebille.
Better cataloguing
From https://discourse.pangeo.io/t/data-pipelining-and-cataloging-best-practices-using-intake-xarray-to-transform-and-combine-data-metadata/5550/6 , I think we can streamline how we ingest data (by using Intake 2 in combination with the `convert` module, or in combination with `uxarray`). Honestly, this is a low priority - I'm happy with what we have at the moment.

The important thing from my POV is the "Better testing" above, as that will flag any errors with our convert module.