@rwb27 rwb27 commented Jan 9, 2026

This PR makes use of the changes in #242 to serialise Blobs more neatly. It contains those commits and should be merged afterwards.

I refactored Blob more significantly, making a number of under-the-hood changes that I think make it much clearer. One of the early commits includes a working version of Blob that uses url_for but hasn't yet been refactored. However, I don't think there's much point reviewing two versions of the same thing, hence not splitting the PR.

  • Blob is no longer a BaseModel subclass. Instead, it's a regular class that also works as a pydantic type, just like URLFor.
  • Blob may now refer to local (bytes or file) or remote (URL) data, with BlobData subclasses for each.
  • ClientBlobOutput is gone; we can use Blob instead. That eliminates a whole module and is a step towards client and server types matching.
  • I've abandoned the use of Protocol in favour of base classes - I don't see many (if any) reasons to add new BlobData types, and if we do I don't see any reason they can't also inherit from BlobData.
  • Blob objects now serialise properly when returned from any endpoint in the server. Serialising them in other contexts requires the use of labthings_fastapi.testing.use_dummy_url_for.
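For readers unfamiliar with the shape described in the list above, here is a minimal sketch of a Blob that is always backed by a BlobData object, with local (bytes or file) and remote (URL) subclasses. The subclass names, constructor signatures and the `content` method are assumptions made for illustration and are not taken from the labthings_fastapi source.

```python
import urllib.request
from abc import ABC, abstractmethod
from typing import Optional


class BlobData(ABC):
    """Base class for whatever backs a Blob (illustrative only)."""

    def __init__(self, media_type: str) -> None:
        self.media_type = media_type

    @abstractmethod
    def content(self) -> bytes:
        """Return the blob's bytes."""


class BytesBlobData(BlobData):
    """Local data held in memory (hypothetical name)."""

    def __init__(self, data: bytes, media_type: str) -> None:
        super().__init__(media_type)
        self._data = data

    def content(self) -> bytes:
        return self._data


class FileBlobData(BlobData):
    """Local data stored in a file (hypothetical name)."""

    def __init__(self, path: str, media_type: str) -> None:
        super().__init__(media_type)
        self._path = path

    def content(self) -> bytes:
        with open(self._path, "rb") as f:
            return f.read()


class RemoteBlobData(BlobData):
    """Remote data that is downloaded the first time it is needed."""

    def __init__(self, href: str, media_type: str) -> None:
        super().__init__(media_type)
        self.href = href
        self._cache: Optional[bytes] = None

    def content(self) -> bytes:
        if self._cache is None:
            with urllib.request.urlopen(self.href) as response:
                self._cache = response.read()
        return self._cache


class Blob:
    """User-facing class: always backed by a BlobData, local or remote."""

    def __init__(self, data: BlobData) -> None:
        self._data = data

    @property
    def media_type(self) -> str:
        return self._data.media_type

    @property
    def content(self) -> bytes:
        return self._data.content()
```

With this shape, client and server code can treat every Blob the same way, for example `Blob(BytesBlobData(b"hello", "text/plain")).content`, without checking where the data lives.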

I've also removed the need to pass request to Invocations when generating their response - we can use URLFor to generate the URLs that need to show up in the output.
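The idea of generating URLs only at serialisation time can be sketched with a context variable. This is not the URLFor implementation from this PR: `DeferredURL`, `dummy_url_for` and the `urlfor://` output format are hypothetical names chosen for the example, and the sketch assumes that server middleware would set the context variable for each request.

```python
import contextlib
from contextvars import ContextVar
from typing import Callable, Iterator

# Assumption: server middleware would set this for the duration of a request.
_url_for: ContextVar[Callable[..., str]] = ContextVar("url_for")


class DeferredURL:
    """Stores an endpoint name and parameters; the URL is built on demand."""

    def __init__(self, endpoint: str, **params: str) -> None:
        self.endpoint = endpoint
        self.params = params

    def resolve(self) -> str:
        try:
            url_for = _url_for.get()
        except LookupError:
            raise RuntimeError(
                "No url_for is available outside of a request context."
            ) from None
        return url_for(self.endpoint, **self.params)


@contextlib.contextmanager
def dummy_url_for() -> Iterator[None]:
    """Install a fake URL generator, e.g. for tests (hypothetical helper)."""

    def fake(endpoint: str, **params: str) -> str:
        query = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
        return f"urlfor://{endpoint}/?{query}"

    token = _url_for.set(fake)
    try:
        yield
    finally:
        _url_for.reset(token)
```

Because the object only stores the endpoint name, nothing needs to be handed a `request` when the response is built; the URL is looked up from the ambient context when the value is serialised.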

This ended up being a larger change than I intended, but I think it results in a cleaner structure, and provides pydantic functionality in a way that doesn't mess up the structure of classes that client code needs to use.

@rwb27 rwb27 requested a review from julianstirling February 2, 2026 09:39

@julianstirling julianstirling left a comment

It may be my misunderstanding of the MR, but I am worried that actions that return blobs will save each blob to the blob data manager every time a blob is created. If that data is never garbage collected, a long-running action that repeatedly calls a blob-returning action could leak memory.

If this is not the case, some further comments or docs might be needed where the comments are.

rwb27 added 12 commits February 3, 2026 16:06
This commit makes use of the new `url_for` middleware to eliminate the Blob-specific context variables.

BlobData objects are now added to a singleton BlobManager when they are created, and the URL is filled in at serialisation time.

This is a slight simplification of the old behaviour, but it's equivalent in all the ways that matter.
Having now learned more about custom types in pydantic, I've done some more tidying here:

* Blob is no longer a BaseModel subclass. I've separated out the model (used for serialisation/validation) and the class that user code will interact with.
* BlobData is now a base class not a protocol, and there's a subclass for remote blob data that downloads on demand.

This removes most of the complicated logic from `Blob` around when we do and don't need a `BlobData`: a `Blob` is **always** backed by `BlobData` whether it's local or remote. This also means we can get rid of `ClientBlobOutput` and just use `Blob` instead.
This now correctly tells clients the media type, and uses a descriptive title. I believe it's now at least as good as the old schema.
We can now use one Blob class for client and server :)

I realised we had the potential to have inconsistencies between BlobData and the host Blob in the media type.

We now check the types match, and allow the BlobData to override the Blob's default if it's a matching but more specific type.
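A minimal sketch of the kind of check described above, assuming media types are plain `type/subtype` strings where `*` acts as a wildcard. The function name and the exact rules (for example, rejecting data whose type is less specific than the Blob's default) are assumptions for illustration and may differ from the real implementation.

```python
from typing import Tuple


def _split(media_type: str) -> Tuple[str, str]:
    """Split 'type/subtype'; a bare '*' is treated as '*/*'."""
    main, _, sub = media_type.partition("/")
    return main, sub or "*"


def resolve_media_type(blob_default: str, data_type: str) -> str:
    """Return the media type to use, or raise if the two are inconsistent.

    The data's type wins when it matches the Blob's default and is at
    least as specific, e.g. 'image/png' overrides 'image/*'.
    """
    default_parts = _split(blob_default)
    data_parts = _split(data_type)
    for default_part, data_part in zip(default_parts, data_parts):
        # A wildcard in the default accepts anything; otherwise parts must match.
        if default_part != "*" and default_part != data_part:
            raise ValueError(
                f"Media type {data_type!r} does not match {blob_default!r}."
            )
    return data_type
```

For example, `resolve_media_type("image/*", "image/png")` returns `"image/png"`, while `resolve_media_type("image/png", "text/plain")` raises `ValueError`.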

I've also taken a pass through the blob documentation to update it where needed. Happily, as this PR only touches implementation details, not much has changed.
I got rid of the conversion of "*" to None, I think it's clearer this way.

I also fixed a typo and ignored a codespell false positive.
The `codespell:ignore` directives made lines too long, and I'm happy that
"ser" is an abbreviation we are stuck with.
This gets full coverage of `blob.py` and checks a few things that weren't tested directly. I actually don't quite understand why coverage thought we weren't downloading the data as I'm fairly sure that was done already - but it's no bad thing to have an explicit test that doesn't go via the ThingServer.
This adds a sequence of events for blob serialisation.
BlobManager was confusingly named: it kept track of blobs but did not manage them.

I have kept the weak value dictionary of blob data, but now it's a class variable on `LocalBlobData` which feels more appropriate.
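The weak value dictionary is what addresses the memory-leak concern raised in review: the registry lets a download route look data up by ID, but does not keep the data alive on its own. A minimal sketch follows, assuming IDs are UUIDs; the attribute and method names are hypothetical rather than copied from `blob.py`.

```python
import uuid
import weakref


class LocalBlobData:
    """Local blob data that registers itself so it can be found by ID."""

    # Class-level registry holding only weak references: an entry disappears
    # once nothing else refers to the data object.
    _registry: "weakref.WeakValueDictionary[uuid.UUID, LocalBlobData]" = (
        weakref.WeakValueDictionary()
    )

    def __init__(self, data: bytes) -> None:
        self.id = uuid.uuid4()
        self.data = data
        LocalBlobData._registry[self.id] = self

    @classmethod
    def lookup(cls, blob_id: uuid.UUID) -> "LocalBlobData":
        """Return the live data object for an ID, or raise KeyError."""
        return cls._registry[blob_id]
```

As long as a Blob (or anything else) holds a reference to the data, `LocalBlobData.lookup(blob_id)` finds it; once the last reference is dropped, the entry is removed and the lookup raises `KeyError`.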

I have replaced the blob manager with a FastAPI router, which is a cleaner way to add the download_blob API route.

@julianstirling julianstirling left a comment

I'm happy with the changes. I think the endpoints for blobs rather than the manager is clearer.

The CI is failing, but this seems to just be Flake8 complaining about a docstring or two. The base_coverage job passed, so tests are fine. This should be ready once the docstring is fixed.

rwb27 commented Feb 9, 2026

> I'm happy with the changes. I think the endpoints for blobs rather than the manager is clearer.
>
> The CI is failing, but this seems to just be Flake8 complaining about a docstring or two. The base_coverage job passed, so tests are fine. This should be ready once the docstring is fixed.

Thanks - should be done now and I'll merge when it passes.

I am confused by this - it's possible `dmypy` confused me into changing it, but I have now changed it back and it's passing mypy.

This line should be fixed by #258 in any case.
@rwb27 rwb27 merged commit 51d2349 into main Feb 9, 2026
14 checks passed
@rwb27 rwb27 deleted the refactor-blob branch February 9, 2026 22:10
Development

Successfully merging this pull request may close these issues.

Make url_for easier to access from pydantic serialisation.
