Avoid issue with deepcopying/pickling IntakeESGFDatasets #2990
bouweandela wants to merge 6 commits into main
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##             main    #2990   +/- ##
=======================================
  Coverage        ?   95.69%
=======================================
  Files           ?      267
  Lines           ?    15733
  Branches        ?        0
=======================================
  Hits            ?    15055
  Misses          ?      678
  Partials        ?        0
Force-pushed 60f584e to 11574e6, then 7f8dc19 to 2216c23.
        for arg, value in kwargs.items()
    }
    for fn, kwargs in settings.items()
}
Why don't you dump to a JSON payload that can be serialized and passed around?
We use requests_cache.CachedSession objects with intake-esgf to make repeat searches fast. Those aren't trivially serializable (see requests-cache/requests-cache#707 for some background), regardless of how you serialize them. We don't need to serialize Dataset.files because we can just search for them again, so this seems the simplest solution.
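The approach described here can be sketched as a class that drops its unpicklable session from the pickled state and recreates it lazily (i.e. "searches again") after copying. All names below are hypothetical, not intake-esgf's actual API; a plain `object()` stands in for the `requests_cache.CachedSession`:

```python
import copy
import pickle


class SearchResult:
    """Sketch of a dataset wrapper whose session cannot be pickled.
    Names are hypothetical, not intake-esgf's actual API."""

    def __init__(self, query):
        self.query = query
        # Stand-in for a requests_cache.CachedSession, which is not
        # trivially serializable.
        self._session = object()

    def __getstate__(self):
        # Drop the unpicklable session; keep only what is cheap to rebuild.
        state = self.__dict__.copy()
        state["_session"] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)

    @property
    def session(self):
        # Recreate the session (analogous to searching again) on first
        # use after unpickling or deepcopying.
        if self._session is None:
            self._session = object()
        return self._session


clone = copy.deepcopy(SearchResult({"variable": "tas"}))
restored = pickle.loads(pickle.dumps(SearchResult({"variable": "tas"})))
```

Because `copy.deepcopy` also goes through `__getstate__`/`__setstate__`, both the deep copy and the pickle round trip work without ever touching the session object.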
Very interesting - thanks for the pointer, bud! Why do I have the feeling that a Dataset can be serialized, since it's a dictionary of simple objects (not too complicated for JSON), and that the Session can be serialized too, since the connection configuration is simple - OK, not other stuff like backends and more complicated things. The question is: would that help vs searching for the file again, or am I trying to fit a square peg in a round hole? I am not a fan of pickles, though, security-wise 🥒
I really don't think this is worth optimizing. It should be no problem to search for the same file more than once occasionally; with the current code this only happens when the target of the regrid preprocessor function is a dataset.
yeah that's why I decided to dump (pun intended) my idea above - just realized it's a corner case anyway 😀
valeriupredoi left a comment:
Leave the guff about JSON aside for now; adopting the quick solution 🍺 BTW, are you familiar with CBOR (since I am on about JSON)? Just found out about it - lots smoother than JSON.
I haven't used CBOR before, but it looks nice.
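Since CBOR came up: its compactness is easy to demonstrate with a minimal hand-rolled encoder covering just the cases needed here (small non-negative ints, short text strings, small maps, per RFC 8949 major types 0, 3, and 5). The encoder and the example payload are purely illustrative and unrelated to this PR:

```python
import json


def cbor_encode_small(obj):
    """Minimal CBOR encoder for small non-negative ints, short text
    strings, and small maps -- just enough to compare sizes with JSON."""
    if isinstance(obj, int) and 0 <= obj < 24:
        return bytes([0x00 | obj])  # major type 0: unsigned int
    if isinstance(obj, str) and len(obj.encode("utf-8")) < 24:
        data = obj.encode("utf-8")
        return bytes([0x60 | len(data)]) + data  # major type 3: text string
    if isinstance(obj, dict) and len(obj) < 24:
        out = bytearray([0xA0 | len(obj)])  # major type 5: map
        for key, value in obj.items():
            out += cbor_encode_small(key)
            out += cbor_encode_small(value)
        return bytes(out)
    raise ValueError(f"unsupported value: {obj!r}")


payload = {"mip": "Amon", "var": "tas"}
cbor_size = len(cbor_encode_small(payload))   # 18 bytes
json_size = len(json.dumps(payload).encode())  # 29 bytes
```

For real use there are full implementations such as the cbor2 package; the point here is only that CBOR's binary framing skips JSON's quotes, colons, and braces.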
Description
Closes #2989
Link to documentation:
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
To help with the number of pull requests: