Skip to content

Conversation

@edan-bainglass
Copy link
Member

@edan-bainglass edan-bainglass commented Oct 12, 2025

This PR discards the Pydantic models introduced in aiida-restapi in favor of those introduced in AiiDA in aiidateam/aiida-core#6255 and updated in aiidateam/aiida-core#6990.

Major changes

  • Routers now work with aiida-core models for most requests/responses
    • The nodes router uses a union of CreateModel classes discriminated by node_type
    • Node model handling is centeralized in a NodeModelRegistry class
  • Some separation of control introduced in the router layer:
    • Routers now only handle request/response processing
    • AiiDA CRUD logic is mostly delegated to a new repository layer (EntityRepository)
    • NodeRepository(EntityRepository) is introduced for AiiDA-node-specific CRUD logic
  • models.py module is discarded
  • Some modules only leveraged by graphQL are moved to that package
  • /[entities]/schema endpoints are introduced to all entities parameterized by "?which=[get|post]" (default=get)
    • /nodes/schema can also take "?type=[node-type-string]", e.g., "/nodes/schema?type=data.core.int.Int."
    • These endpoints return the JSON schema for the given entity taken from its aiida-core Model (get) / CreateModel (post)
  • Model fields that can be large are excluded from responses via entity.to_model(minimal=True). These fields are accessible via dedicated endpoints:
    • /computers/metadata
    • /nodes/{node_id}/[attributes|extras|repo/metadata]
  • Node repository metadata is given as objects of file name, type (FILE/DIRECTORY), size, binary (true/false), and a download url pointing at /nodes/{node_id}/repo/content?filename=[filename]
    • If any files exist in the repo, a zip option is added to the metadata list, which is downloadable via /repo/content without the filename parameter
  • Lastly, node POST requests are unified in a single /nodes endpoint capable of handling bare nodes (JSON requests), as well as one or more files (multipart requests)
    • The multipart path supports posting SinglefileData, ArrayData, FolderData, etc., as well as any node with files
    • SinglefileData can also be posted with a content attribute (str)
    • ArrayData can also be posted with an arrays attribute (single or dictionary of plain arrays)
    • Multipart posting is standardized (TBD) as follows:
      • params - the JSON part, i.e., node_type + attributes
      • files - an optional file entry - path is taken as the filename
      • files[<path>] - optional file to be stored at a specific path, e.g., files[test.txt], or (for FolderData) files[subfolder/test.txt]
      • client can append multiple files entries

Planned follow-up PRs

  • Switch from PK to UUIDs for accessing specific entities
  • Implement a QueryBuilder endpoint for general DB fetching
  • Implement read-only API mode
  • Lower priority:
    • Add server endpoints
      • General info
      • All entrypoints (currently at / endpoint)
    • Add statistics endpoints
    • Standardize request/response format against JSON-API for wider interoperability

Closes #91
Closes #92

@edan-bainglass edan-bainglass changed the title Use-aiida-orm-models Leverage AiiDA ORM models in REST API Oct 12, 2025
@edan-bainglass edan-bainglass marked this pull request as ready for review November 19, 2025 09:28
@edan-bainglass edan-bainglass marked this pull request as draft November 19, 2025 14:13
@edan-bainglass edan-bainglass marked this pull request as ready for review November 20, 2025 07:41
This was referenced Dec 2, 2025
Copy link
Member

@eimrek eimrek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @edan-bainglass. I think these changes are great, and i like the overall structure. Some extra high-level comments:

  • Some of the top level files (e.g. aiida_db_mapping) are only used in GraphQL now. Maybe for clarity they could be moved inside that subfolder.

  • As discussed earlier, the /nodes endpoint should not include /attributes and /extras/ by default as those can be large. But this also applies to other endpoints - e.g. for /groups i currently get a lot of output due to extras, but i would just want the list.

  • (also mentioned already before, but important: the uuid should be used instead of pk when retrieving singular entities).

  • Bit of a conceptual question: in the POST models, the attributes are not contained in a nested dictionary any more, so e.g.

    {
      "node_type": "data.core.int.Int",
      "value": 8
    } 
    

    instead of

    {
      "node_type": "data.core.int.Int",
      "attributes": {"value": 8}
    } 
    

    This means that for nodes that have a lot of attributes, it's more difficult to distinguish from other types of inputs (e.g. extras that are still in a dictionary). Additionally, this doesn't reflect the corresponding GET models. thoughts?

node = self.entity_cls.collection.get(pk=model.pk)

total_size = 0
for path in repo_metadata['o']:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the 'o' signify?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No Idea. This is from aiida-core. Object? See aiida.repository.common.File.serialize:

    def serialize(self) -> dict[str, typing.Any]:
        """Serialize the metadata into a JSON-serializable format.

        .. note:: the serialization format is optimized to reduce the size in bytes.

        :return: dictionary with the content metadata.
        """
        if self.file_type == FileType.DIRECTORY:
            if self.objects:
                return {'o': {key: obj.serialize() for key, obj in self.objects.items()}}
            return {}
        return {'k': self.key}

raise HTTPException(status_code=404, detail=f'Could not find any Computer with id {comp_id}')


@router.post(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

500 if i try to create a computer with a label that already exists. Would be great to check for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to update all try/except blocks, which currently (mostly) only handle Exception. Need to at least add a few obvious exceptions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this should be changed in aiida-core (the pydantic PR). The computer model should validate this. Then we get it via FastAPI automatically. I'll update the pydantic PR.


total_size = 0
for path in repo_metadata['o']:
size = node.base.repository.get_object_size(path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be slow for many files? (e.g. for the s3 backend or so on). Either way good to keep in a more specialized endpoint.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ends up as part of the repository metadata. which, like attributes and extras, I'm thinking of removing by default from the node GET, placing them back via dedicated endpoints, e.g., /metadata, etc.

raise HTTPException(status_code=404, detail=f'Could no find any node with id {nodes_id}')
raise HTTPException(status_code=404, detail=f'Could not find any node with id {node_id}')

if download_format is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice here to return the download formats for the node.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not so sure. This endpoint is to be hit AFTER the user knows the download formats, for which we have another endpoint. See the comment there regarding adding a per-type endpoint also.

:return: A dictionary with available download formats as keys and their descriptions as values.
"""
return resources.get_all_download_formats()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the node types still have the pipe symbol at the end. I guess we can to drop this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure 🙂 I left this endpoint as is. Good to know how they are used in practice. Also, I can add an endpoint for download formats per node type, and add a pre-constructed one to the response of /nodes/types. Again, for convenience, as node_type is no fun 🙂


@router.post('/nodes/singlefile', response_model=models.Node)
# TODO what about folderdata?
@router.post(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this endpoint should instead be unified the the /nodes POST somehow. Many nodes can have files inside them apart from FolderData and SingleFileData, and creating them might not make sense without uploading the files at the same time (e.g. BandsData, ...).

Copy link
Member Author

@edan-bainglass edan-bainglass Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss in person

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a look at the /nodes/file-upload endpoint's parameters. Unlike /nodes, it does not take a model as input, but rather a parameters Form + an UploadFile. This is not my work. See #25 (comment). I believe this makes this endpoint incompatible w.r.t the /nodes endpoint

Copy link
Member Author

@edan-bainglass edan-bainglass Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now unified under /nodes. Please test 🙏

@edan-bainglass
Copy link
Member Author

edan-bainglass commented Dec 4, 2025

Thanks for the review @eimrek. Most of your comments I already agree with (per our in-person discussion). I provide further comments in the review, and also here below 🙂

Thanks @edan-bainglass. I think these changes are great, and i like the overall structure. Some extra high-level comments:

  • Some of the top level files (e.g. aiida_db_mapping) are only used in GraphQL now. Maybe for clarity they could be moved inside that subfolder.

Will do 👍

  • As discussed earlier, the /nodes endpoint should not include /attributes and /extras/ by default as those can be large. But this also applies to other endpoints - e.g. for /groups i currently get a lot of output due to extras, but i would just want the list.

Will do, also for repo metadata (anything that can be large) 👍

  • (also mentioned already before, but important: the uuid should be used instead of pk when retrieving singular entities).

This was using PKs prior to this PR, so will keep it as is here. Will open another PR to move everything to UUIDs 👍

  • Bit of a conceptual question: in the POST models, the attributes are not contained in a nested dictionary any more, so e.g.
    {
      "node_type": "data.core.int.Int",
      "value": 8
    } 
    
    instead of
    {
      "node_type": "data.core.int.Int",
      "attributes": {"value": 8}
    } 
    
    This means that for nodes that have a lot of attributes, it's more difficult to distinguish from other types of inputs (e.g. extras that are still in a dictionary). Additionally, this doesn't reflect the corresponding GET models. thoughts?

GET returns NodeModel, whereas POST works with NodeCreateModel, or rather specific ones, e.g., IntCreateModel, therefor "doesn't reflect". The idea of moving attributes outside of the attributes dict is to enable validation. If they are provided in attributes they hit Node.Model.attributes, which is a dict, so the attributes aren't actually validated. But IntCreateModel has value in its model, so providing it flat WILL validate it against IntCreateModel. As for extras, well that's REALLY just a dict and will be validated as such against the inherited Node.Model.extras field.

@eimrek
Copy link
Member

eimrek commented Dec 4, 2025

GET returns NodeModel, whereas POST works with NodeCreateModel, or rather specific ones, e.g., IntCreateModel, therefor "doesn't reflect". The idea of moving attributes outside of the attributes dict is to enable validation. If they are provided in attributes they hit Node.Model.attributes, which is a dict, so the attributes aren't actually validated. But IntCreateModel has value in its model, so providing it flat WILL validate it against IntCreateModel. As for extras, well that's REALLY just a dict and will be validated as such against the inherited Node.Model.extras field.

Couldn't one in principle validate the attributes dict separately for each node type? if it's possible, it might be worth considering, e.g. for potential ideas in the future to just easily GET and POST nodes (e.g. to transfer via the web from one db to another). But of course if it makes the code very messy, this probably also is fine.

@edan-bainglass
Copy link
Member Author

edan-bainglass commented Dec 4, 2025

Couldn't one in principle validate the attributes dict separately for each node type? if it's possible, it might be worth considering, e.g. for potential ideas in the future to just easily GET and POST nodes (e.g. to transfer via the web from one db to another). But of course if it makes the code very messy, this probably also is fine.

Sure, but not via the built-in pydantic model system. It would be an ad-hoc system, which is potentially messier, as you say. But good to keep this in mind.

Update

Referencing here @eimrek's proposal detailed in aiidateam/aiida-core#7131

@edan-bainglass
Copy link
Member Author

@eimrek this is in a better state. In particular, have a look at the now-merged /nodes POST endpoint.

@edan-bainglass
Copy link
Member Author

@eimrek this is in a better state. In particular, have a look at the now-merged /nodes POST endpoint.

@eimrek turns out, though merging the endpoint works in practice, it removes much of FastAPIs ability to leverage pydantic for auto-documenting the endpoint. Basically, if you type the endpoint parameter with the model union, FastAPI can clear document all the possible node POST payloads, clearly. I attempted to restore this feature in 1be188d, only to discover that though it works for auto-documentation, it also breaks the merged mechanism. I unfortunately do not see a way around it. As a compromise between clear documentation and use, I am now strongly in favor of separating the endpoints (in the absence of a solution that I'm missing). I will do this now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

zipped all contents endpoint? Additional metadata attached to the endpoints for files.

2 participants