
Update tutorial dataset storage and loading #2571

@VeckoTheGecko

Description


What version of Parcels are you running?

main

Is your feature request related to a problem?

Currently I'm looking at building out tooling for #2570 so that we can easily add these items to the test suite. I'll end up needing tooling very similar to what already exists in

https://github.com/Parcels-code/Parcels/blob/1c6369438d8623d67722bc34441dde2dd9180041/src/parcels/_tutorial.py

except that _tutorial.py assumes the underlying data are NetCDF files laid out in some implicit folder structure. This isn't ideal:

  • download_example_dataset(...) returns the folder into which the data is downloaded. Users then have to open the data themselves, which is cumbersome: they need to know how the files are structured and call open_dataset/open_mfdataset etc. with the right paths and globs
  • users mainly just want to open an xarray dataset straight after downloading it (I don't see a use case for users needing access to the files themselves)
  • adding separate tooling similar to _tutorial.py in tests/utils/_datasets.py seems like unnecessary duplication
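To make the first bullet concrete, here is a stdlib-only simulation of the folder-based workflow. `fake_download_example_dataset` is a hypothetical stand-in for `download_example_dataset(...)`: it only writes empty placeholder files, and the file names are illustrative rather than the real dataset layout. `files_to_open` shows that the caller has to supply a dataset-specific glob before anything can be handed to `xr.open_mfdataset`.

```python
import tempfile
from pathlib import Path


def fake_download_example_dataset(name: str, root: Path) -> Path:
    """Hypothetical stand-in for download_example_dataset(...).

    Like the current API, it returns a folder rather than a dataset.
    It writes empty placeholder files; the names are illustrative only.
    """
    folder = root / name
    folder.mkdir(parents=True, exist_ok=True)
    for variable in ("U", "V", "P"):
        (folder / f"{name}{variable}.nc").write_bytes(b"")
    (folder / "README.txt").write_text("not a data file")
    return folder


def files_to_open(folder: Path, pattern: str) -> list[str]:
    """Return the sorted names of files in `folder` matching `pattern`.

    The caller must already know the right glob for each dataset; in real
    use, the matching paths would then be passed to xr.open_mfdataset.
    """
    return sorted(p.name for p in folder.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    folder = fake_download_example_dataset("moving_eddies", Path(tmp))
    data_files = files_to_open(folder, "moving_eddies*.nc")  # dataset-specific glob
    everything = files_to_open(folder, "*")  # naive glob also matches README.txt

print(data_files)
print(everything)
```

The naive `*` glob also picks up `README.txt`, which is exactly the kind of layout detail users currently need to know.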

Describe the solution you'd like

To simplify things, I wonder whether we should migrate all our example datasets to zipped Zarr stores, like in #2570.

Changes:

  • make a new branch v4 in the parcels-examples repo
    • unfortunately we can't just use main, since that would break things for people still using v3 code
  • remove download_example_dataset(...) and replace it with open_example_dataset(...), which returns a dataset object
    • optionally (though I think it would be a good idea), we could delineate between testing and tutorial datasets with open_tutorial_dataset and open_testing_dataset (and, similarly, list_tutorial_datasets and list_testing_datasets). This is solely to delineate stability: as devs, we can confidently add or remove testing datasets. Tutorial datasets can also be used in testing, but shouldn't be removed or changed in breaking ways, as that would negatively affect users following the tutorials. Testing datasets shouldn't be used in tutorials.
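One way the proposed split could look is sketched below. Everything here is hypothetical: the registry contents, the `.zarr.zip` file names, and the `fetch`/`opener` parameters are illustrative assumptions, not existing Parcels code. The default opener assumes that `zarr.storage.ZipStore` combined with `xarray.open_zarr` can read a zipped Zarr store; that should be checked against the zarr/xarray versions Parcels pins.

```python
from pathlib import Path
from typing import Any, Callable

# Hypothetical registries mapping dataset names to zipped Zarr store files.
# Names and filenames are placeholders, not the real parcels-examples layout.
TUTORIAL_DATASETS: dict[str, str] = {
    "moving_eddies": "moving_eddies.zarr.zip",
}
TESTING_DATASETS: dict[str, str] = {
    "edge_case_grid": "edge_case_grid.zarr.zip",
}

# fetch: downloads a file (e.g. from a v4 branch) and returns its local path.
Fetcher = Callable[[str], Path]
# opener: turns a local file into a dataset object.
Opener = Callable[[Path], Any]


def open_zipped_zarr(path: Path) -> Any:
    """Assumed default opener; xarray and zarr are imported lazily so the
    registry logic below works without them installed."""
    import xarray as xr
    import zarr

    return xr.open_zarr(zarr.storage.ZipStore(str(path), mode="r"))


def list_tutorial_datasets() -> list[str]:
    return sorted(TUTORIAL_DATASETS)


def list_testing_datasets() -> list[str]:
    return sorted(TESTING_DATASETS)


def _open(registry: dict[str, str], kind: str, name: str,
          fetch: Fetcher, opener: Opener) -> Any:
    # Reject unknown names with a message listing what is available.
    if name not in registry:
        available = ", ".join(sorted(registry))
        raise ValueError(f"Unknown {kind} dataset {name!r}. Available: {available}")
    return opener(fetch(registry[name]))


def open_tutorial_dataset(name: str, fetch: Fetcher,
                          opener: Opener = open_zipped_zarr) -> Any:
    """Stable datasets: safe for tutorials, so avoid breaking changes."""
    return _open(TUTORIAL_DATASETS, "tutorial", name, fetch, opener)


def open_testing_dataset(name: str, fetch: Fetcher,
                         opener: Opener = open_zipped_zarr) -> Any:
    """Unstable datasets: devs may add or remove these freely; not for tutorials."""
    return _open(TESTING_DATASETS, "testing", name, fetch, opener)
```

Injecting `fetch` and `opener` keeps the registry logic testable without network access or zarr installed; in Parcels itself the fetcher would presumably be fixed internally. Keeping two registries makes the stability split explicit: entries in `TUTORIAL_DATASETS` are a promise to tutorial users, while `TESTING_DATASETS` can change freely.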

cc @erikvansebille @fluidnumerics-joe, keen to hear your thoughts, as this will change how we approach datasets in tutorials and testing

Describe alternatives you've considered

Keeping two separate files (the existing _tutorial.py plus new tooling in tests/utils/_datasets.py). This would be manageable in the short term, but I think the approach in this issue is best for the long-term maintenance of Parcels.

The main disadvantage of this approach:

  • if users are working from NetCDF files, this limits our ability to highlight that to open multiple NetCDF files in xarray you have to use xr.open_mfdataset("something-*.nc")
    • I think this is a completely acceptable disadvantage. There are a million ways to open xarray datasets depending on storage, and I don't think it's necessarily up to us to teach the xarray API through our code examples (we can add a note block somewhere if we really want to mention it). The main thing, I think, is that we show users how to get from model example datasets to S/UGRID-compliant data, and how to pass that to the rest of Parcels.

Additional context

No response

Metadata


  • Assignees: no one assigned
  • Type: no type
  • Projects: Status: Backlog
  • Milestone: no milestone
  • Relationships: none yet
  • Development: no branches or pull requests
