Skip to content

HERD Changes internal file#2111

Merged
rly merged 25 commits intodevfrom
herd_changes
Mar 24, 2026
Merged

HERD Changes internal file#2111
rly merged 25 commits intodevfrom
herd_changes

Conversation

@mavaylon1
Copy link
Copy Markdown
Contributor

@mavaylon1 mavaylon1 commented Jul 8, 2025

Allow HERD to be stored internally, but still allow external override

Future development:
On the surface to the user, HERD is a table and when you query before any I/O, you will see lists and tuples. Under the hood, HERD is written as a structured array and on read is no longer a list/tuple but a structured array. This was a bit surprising and so I spent some time to map over the array to a list on read in the get_item. However, this grew in complexity reaching __swap_refs in AbstractH5TableDataset. Without a clear easy solution, I am leaving this for future development. It is because of this the tests are structured the way they are in the asserts.

What was the reasoning behind this change? Please explain the changes briefly.

How to test the behavior?

Show how to reproduce the new behavior (can be a bug fix or a new feature)

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked running ruff check . && codespell from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

@mavaylon1
Copy link
Copy Markdown
Contributor Author

Future Development:
After reviewing the HERD paper, there needs be work on updating the IO on write. In HDMFIO write, it will go through and create a HERD instance or use a HERD instance to resolve the TermSetWrapper calls that were used in the file. This is currently stored externally. I am keeping HDMF as original as possible, with everything being stored external as we discussed. That being said, there needs to be work updating the write in NWBHDFIO to write it internally.

@mavaylon1
Copy link
Copy Markdown
Contributor Author

Review Notes: This should be good to go. It needs nwb schema to be merged first.

bendichter and others added 3 commits February 19, 2026 22:19
Move 'child' and 'required_name' keys from @DocVal to __nwbfields__
where they belong, fixing the import-time crash:
  "docval for external_resources: keys ['child', 'required_name']
   are not supported by docval"

Also update nwb-schema submodule to latest herd branch (with dev
merge conflict resolved).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The nwb-schema herd branch now includes hdmf-common-schema 1.9.0,
which moved HERD from hdmf-experimental to hdmf-common. This is
required so the NWB core namespace can find the HERD type.

Note: This also requires hdmf >= 4.3.2 (unreleased) which bundles
hdmf-common-schema 1.9.0. CI should install hdmf from dev branch
until 4.3.2 is released.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bendichter
Copy link
Copy Markdown
Collaborator

Status update on this PR

I've fixed two issues that were blocking this PR:

1. Fixed: docval crash on import (pynwb)

The external_resources parameter in NWBFile.__init__'s @docval used 'child': True and 'required_name': 'external_resources' — keys that belong in __nwbfields__, not @docval. This caused every CI job to fail with:

Exception: docval for external_resources: keys ['child', 'required_name'] are not supported by docval

Fixed by moving those keys to __nwbfields__ and adding a proper @docval entry with type: HERD.

2. Fixed: nwb-schema merge conflict and hdmf-common-schema version (nwb-schema)

  • Resolved the merge conflict on the herd branch of nwb-schema (PR HERD in NWBFile schema nwb-schema#646).
  • Updated the hdmf-common-schema submodule from 1.8.0 to 1.9.0. This is necessary because nwb.file.yaml references HERD via neurodata_type_inc: HERD, but in hdmf-common-schema 1.8.0, HERD lived in the hdmf-experimental namespace. The NWB core namespace only depends on hdmf-common, so it couldn't find HERD. In hdmf-common-schema 1.9.0 (Move HERD from experimental to common hdmf-dev/hdmf-common-schema#93), HERD was moved to hdmf-common, resolving this.

Remaining blocker: hdmf release needed

The remaining 8 CI failures (test_extension.py) all hit:

ValueError: No specification for 'HERD' in namespace 'core'

This is because the released hdmf (4.3.1) bundles hdmf-common-schema 1.8.0. The hdmf dev branch already includes hdmf-common-schema 1.9.0, and all tests pass locally with it. Once hdmf releases a new version, this PR just needs to pin that version and CI should go green.

hdmf 5.0.0 will bundle hdmf-common-schema 1.9.0, which moved HERD
from the hdmf-experimental namespace to hdmf-common. This is required
for the NWB core namespace to resolve the HERD type used in
nwb.file.yaml.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bendichter
Copy link
Copy Markdown
Collaborator

depends on #2141

rly and others added 5 commits March 5, 2026 18:32
- Rename external_herd/internal_herd to _external_herd/_internal_herd
- Use external_resources setter in __init__ to properly set parent
- Remove reset_herd flag; check _external_herd is not None instead
- Add docstrings to external_resources property, setter, and link_resources
- Fix test assertions for HERD keys array shape (1-D not 2-D)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolved conflict in requirements.txt by taking newer versions from dev.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.10%. Comparing base (bd9bc5a) to head (cba3474).
⚠️ Report is 2 commits behind head on dev.

Additional details and impacted files
@@           Coverage Diff           @@
##              dev    #2111   +/-   ##
=======================================
  Coverage   95.10%   95.10%           
=======================================
  Files          29       29           
  Lines        2942     2943    +1     
  Branches      443      443           
=======================================
+ Hits         2798     2799    +1     
  Misses         86       86           
  Partials       58       58           
Flag Coverage Δ
integration 72.85% <66.66%> (+<0.01%) ⬆️
unit 85.35% <100.00%> (+0.31%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rly rly marked this pull request as ready for review March 17, 2026 17:58
Comment thread src/pynwb/file.py Outdated
rly and others added 6 commits March 17, 2026 11:17
- Fix grammar and enhance HERD changelog entry, add @rly
- Use tempfile.TemporaryDirectory for test file cleanup
- Use realistic session_description in tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Combine duplicate hdmf.common imports in file.py
- Extract shared test setup into _create_nwbfile_with_herd helper
- Fix import ordering, quote consistency, and indentation in tests
- Use ISO 8601 duration format for age field

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Server requires at least version 0.74.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rly rly requested a review from oruebel March 17, 2026 20:10
@rly
Copy link
Copy Markdown
Contributor

rly commented Mar 17, 2026

A HERD tutorial will be added as part of #1984

Comment thread src/pynwb/file.py Outdated
If an external HERD has been linked via ``link_resources``, that object
is returned. Otherwise, the internal HERD set via ``__init__`` or the
setter is returned.
"""
Copy link
Copy Markdown
Contributor

@oruebel oruebel Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this setup, it looks like there is no way for a user to access the self._internal_herd once the self._external_herd is set. In general I think the whole _interal_herd and external_herd may be confusing. To clarify this a bit I would propose a few things:

  1. Rename self._external_herd to self._linked_herd. This to a) make the terminology consistent with how it used in the user-facing functions, e.g,. link_resources, and b) to avoid overloading the term external because it used both in external_resources to mean linking to things outside of the file and self._external_herd to mean data stored outside of the file.
  2. Add a function get_external_resources(bool linked) where a user can explictly say if they want the internal HERD or the linked HERD, (or maybe even a join of both, but that's not essential)
  3. Add documentation for what self._external_herd and self._internal_herd are.
  4. To be consistent, we may not want to use the term herd but just stick with self._linked_external_resources and self._external_resources

BTW, why is _internal_herd needed? Wouldn't I access that via general_external_resources?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those suggestions make sense.

The automatic __-joined naming of fields only applies to untyped spec components. Allowing that for typed spec components that are then remapped in the NWBFile ObjectMapper would require a bit of refactoring because there would be multiple mappings for the same spec, and then order matters.

Copy link
Copy Markdown
Contributor

@oruebel oruebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good. Just some inconsistency in terminology

rly and others added 3 commits March 20, 2026 10:18
- Rename _external_herd/_internal_herd to _linked_external_resources/_external_resources
- Rename herd parameter to external_resources in link_resources and setter
- Add get_external_resources(linked) method to access specific HERD
- Add test for get_external_resources

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove custom external_resources property/setter on NWBFile and rely on
the auto-generated property from __nwbfields__. The HERDManager mixin
is now an abstract interface in hdmf, and declaring external_resources
in __nwbfields__ with child=True satisfies it.

The external_resources attribute now always returns the owned HERD.
Use get_external_resources(linked=True) to access a linked HERD.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread src/pynwb/file.py Outdated
Comment on lines +579 to +588
def link_resources(self, external_resources):
"""Link an external HERD object as the external resources for this file.

The linked HERD will not be written on export; the original HERD
(if any) is preserved in the exported file. Use
``get_external_resources(linked=True)`` to access the linked HERD.
"""
self._linked_external_resources = external_resources

def get_external_resources(self, linked=False):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rly I'm wondering whether it would make sense to have this functionality to support linked external resources on the HERDManager base class instead. It just seems like this would be more generally useful functionality rather than being something that would be specific to NWBFile. We can leave this for now as is and make it a separate issue.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I hadn't thought of that. Let's just handle it now. I'll update the HDMF PR and this one.

oruebel
oruebel previously approved these changes Mar 23, 2026
Remove link_resources and get_external_resources from NWBFile since
these methods now live on the HERDManager base class in hdmf.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rly rly merged commit 286de2d into dev Mar 24, 2026
25 of 26 checks passed
@rly rly deleted the herd_changes branch March 24, 2026 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants