
Enrich BIDS cache with comprehensive metadata, HED annotations, and paper-sourced enrichment#974

Open
bruAristimunha wants to merge 7 commits into develop from feature/hed-bids-integration


@bruAristimunha (Collaborator) commented Feb 16, 2026

Summary

  • Enrich BIDS dataset cache with comprehensive structured metadata from the DatasetMetadata schema
  • Add HED (Hierarchical Event Descriptors) v8.4.0 annotations to BIDS event sidecars for machine-actionable event descriptions
  • Generate rich README files for each BIDS dataset from docstrings and metadata
  • Extract and apply detailed metadata from original research papers for 73 datasets, covering acquisition parameters, participant demographics, experimental protocols, documentation, signal processing, and performance metrics
  • Address remaining BIDS validator issues: 0 errors and fewer warnings on exported datasets
  • Fix update_docstring_list in utils.py to match section headers with a regex instead of string containment

Details

BIDS Metadata Enrichment

  • Populates dataset_description.json, participants.tsv, *_eeg.json, *_electrodes.tsv, and *_events.json with metadata from the DatasetMetadata schema
  • Adds sidecar fields for acquisition hardware, filter settings, electrode coordinates, and experimental context

HED Annotations

  • Declares HEDVersion: "8.4.0" in dataset_description.json
  • Adds paradigm-specific HED tag mappings for motor imagery, P300, SSVEP, c-VEP, and resting state events
  • Supports per-dataset HED tag overrides via ExperimentMetadata.hed_tags
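For reviewers less familiar with HED in BIDS: the schema version is declared once in dataset_description.json, and tags for a categorical column such as trial_type live in that column's HED map in events.json. A minimal sketch of both fragments follows; the event labels and tag strings are illustrative only, not the PR's actual paradigm mappings, and the BIDSVersion value is just an example.

```python
import json

# Illustrative labels and HED strings (assumed); the PR's real mappings may differ.
hed_map = {
    "left_hand": "Sensory-event, Experimental-stimulus, Label/left_hand",
    "right_hand": "Sensory-event, Experimental-stimulus, Label/right_hand",
}

# dataset_description.json declares which HED schema the annotations target.
dataset_description = {
    "Name": "Example dataset",
    "BIDSVersion": "1.9.0",
    "HEDVersion": "8.4.0",
}

# events.json: for a categorical column, "HED" maps each column value to tags.
events_sidecar = {
    "trial_type": {
        "Description": "Type of trial (cue shown to the participant).",
        "Levels": {label: f"Cue for {label}" for label in hed_map},
        "HED": hed_map,
    }
}

print(json.dumps(events_sidecar, indent=2))
```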

Paper-Sourced Metadata Enrichment

  • Enriched METADATA blocks for 73/74 datasets by extracting information from original research papers
  • Added missing schema imports across 31 dataset files
  • Corrected numerous metadata inaccuracies identified during paper review (wrong paradigms, participant counts, hardware specs, etc.)

README Generation

  • Auto-generates a comprehensive README for each BIDS output directory
  • Includes dataset description, paradigm details, acquisition setup, HED event tree visualization, and citation info

Test plan

  • All 269 BIDS enrichment tests pass (pytest moabb/tests/test_bids_enrichment.py)
  • All 95 dataset classes instantiate without errors
  • All 40 dataset modules import successfully
  • Pre-commit passes on all modified files
  • BIDS validator shows 0 errors on exported datasets


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 945a4eb4bb


Comment on lines 789 to 790
if metadata and metadata.experiment and metadata.experiment.hed_tags:
    return dict(metadata.experiment.hed_tags)


P1: Merge custom HED overrides with default event annotations

ExperimentMetadata.hed_tags is documented as an override map, but this early return treats it as a full replacement and skips paradigm/default/fallback resolution for all other events. If a dataset provides only one or two custom HED tags, every non-overridden event_id entry is dropped from the exported HED mapping, so events.json ends up missing annotations for valid trial types.
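One way to act on this, sketched under the assumption that default tags are resolved first and the per-dataset map is layered on top. The function and argument names are hypothetical and do not reproduce the code at lines 789-790.

```python
def merge_hed_overrides(event_ids, default_tags, overrides=None):
    """Return a HED map covering every event, with per-dataset overrides winning.

    event_ids: event labels from the dataset's event_id mapping.
    default_tags: tags already resolved from the paradigm lookup or fallback.
    overrides: optional per-dataset map (e.g. ExperimentMetadata.hed_tags).
    """
    overrides = dict(overrides or {})
    merged = {}
    for event in event_ids:
        if event in overrides:
            merged[event] = overrides[event]  # explicit override takes priority
        elif event in default_tags:
            merged[event] = default_tags[event]  # keep the default annotation
    return merged
```

With this shape, a dataset that overrides only one label still exports annotations for all of its other events.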


Comment on lines 853 to 854
if "HED" not in sidecar["trial_type"]:
    sidecar["trial_type"]["HED"] = hed_tags


P2: Merge HED mappings when events sidecar already has HED

This condition only writes HED tags when the trial_type block has no HED key at all, so files with an existing but partial HED map are never completed. In practice, if events.json already contains one custom/stale HED entry, additional event labels from hed_tags are silently omitted, leaving inconsistent per-event annotation coverage.
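A minimal sketch of a non-destructive merge, assuming existing (possibly custom) entries should be kept and only missing labels filled in. The helper name is hypothetical.

```python
def merge_hed_into_sidecar(sidecar, hed_tags, column="trial_type"):
    """Add HED entries for labels missing from sidecar[column]["HED"].

    Existing entries are preserved; the sidecar is updated in place and returned.
    """
    block = sidecar.setdefault(column, {})
    existing = block.setdefault("HED", {})
    for label, tags in hed_tags.items():
        existing.setdefault(label, tags)  # only fills labels that are absent
    return sidecar
```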


@bruAristimunha force-pushed the feature/hed-bids-integration branch from 7855f79 to 7fbd71f on February 16, 2026 20:28
Overhaul the BIDS cache export to inject rich metadata from dataset
METADATA into every layer of the BIDS structure, making cached datasets
self-describing and validator-compliant.

Sidecar enrichment (_build_sidecar_enrichment):
- EEG-specific fields: PowerLineFrequency, EEGReference, SamplingFrequency
- Hardware: Manufacturer, ManufacturersModelName (via _MANUFACTURER_LOOKUP)
- Recording: RecordingType, RecordingDuration, InstitutionName
- Filtering: SoftwareFilters with highpass/lowpass/notch details
- Task: TaskName, TaskDescription, Instructions, CogAtlasID
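To make the field mapping concrete, here is a rough sketch of turning acquisition metadata into EEG sidecar keys. AcquisitionInfo and MANUFACTURER_LOOKUP are hypothetical stand-ins (not the PR's schema or _MANUFACTURER_LOOKUP), and the filter parameter names are assumed; BIDS only requires SoftwareFilters to be an object of objects or "n/a".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup normalizing free-text amplifier names to (Manufacturer, Model).
MANUFACTURER_LOOKUP = {
    "g.usbamp": ("g.tec", "g.USBamp"),
    "brainamp": ("Brain Products", "BrainAmp"),
}


@dataclass
class AcquisitionInfo:
    """Hypothetical stand-in for the acquisition part of a METADATA block."""
    amplifier: Optional[str] = None
    line_freq: Optional[float] = None
    reference: Optional[str] = None
    highpass: Optional[float] = None
    lowpass: Optional[float] = None
    notch: Optional[float] = None


def build_sidecar_enrichment(acq):
    sidecar = {"PowerLineFrequency": acq.line_freq, "EEGReference": acq.reference}
    if acq.amplifier:
        match = MANUFACTURER_LOOKUP.get(acq.amplifier.strip().lower())
        if match:
            sidecar["Manufacturer"], sidecar["ManufacturersModelName"] = match
    filters = {}
    for name, freq in (("HighPass", acq.highpass), ("LowPass", acq.lowpass), ("Notch", acq.notch)):
        if freq is not None:
            filters[name] = {"Frequency (Hz)": freq}
    sidecar["SoftwareFilters"] = filters or "n/a"
    # Drop keys with no value instead of writing nulls into the sidecar.
    return {key: value for key, value in sidecar.items() if value is not None}
```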

Dataset description (_build_dataset_description_kwargs):
- Authors, License, Funding, ReferencesAndLinks, DOI
- HEDVersion "8.4.0" for HED annotation support
- Extra fields via _update_dataset_description_extra: Keywords,
  Acknowledgements, HowToAcknowledge, EthicsApprovals

Participants TSV (_update_participants_tsv):
- Per-subject sex/handedness from metadata lists or aggregate stats
- Age columns (mean/min/max) from metadata
- Species field (default "homo sapiens")

Electrodes TSV (_update_electrodes_tsv):
- material, manufacturer, model columns from acquisition metadata

HED annotations:
- 83 validated HED 8.4.0 tags across imagery (62), p300 (2), rstate (8),
  cvep (15) paradigms with dynamic SSVEP frequency generation
- Three-tier resolution: per-dataset override, static paradigm lookup,
  Label/ fallback for unmapped events
- Events.json sidecar patching with merge logic preserving existing entries
- Experimental-stimulus on MI cue events; Label qualifiers for
  disambiguation (pronation/supination, mental tasks, grasp types)

Schema additions (metadata/schema.py):
- AcquisitionMetadata: cap_manufacturer, cap_model, electrode_type,
  electrode_material, device_serial
- DocumentationMetadata: institution_address, institution_department,
  ethics_approval, acknowledgements, how_to_acknowledge, keywords
- ParticipantMetadata: sexes, handedness_list, species
- ExperimentMetadata: instructions, cog_atlas_id, hed_tags

Raw info enrichment (_enrich_raw_info_from_metadata):
- Sets subject_info sex/handedness from per-subject metadata
- Falls back to aggregate gender/handedness when homogeneous
- Sets line_freq from metadata when not present in raw
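The per-subject versus aggregate fallback might look like this. The integer codes follow MNE's subject_info convention; the helper name and the shape of the metadata arguments are hypothetical.

```python
# MNE's info["subject_info"] uses integer codes:
#   sex:  0 = unknown, 1 = male, 2 = female
#   hand: 1 = right, 2 = left, 3 = ambidextrous
SEX_CODES = {"m": 1, "male": 1, "f": 2, "female": 2}
HAND_CODES = {"r": 1, "right": 1, "l": 2, "left": 2, "a": 3, "ambidextrous": 3}


def resolve_subject_code(index, per_subject, aggregate, codes):
    """Return an MNE code for one subject, or None if it cannot be determined.

    per_subject: optional list with one label per subject (e.g. ["M", "F"]).
    aggregate: optional dict of label -> count for the whole cohort; used only
    when exactly one label has a nonzero count (a homogeneous cohort).
    """
    if per_subject and index < len(per_subject):
        return codes.get(str(per_subject[index]).strip().lower())
    if aggregate:
        present = [label for label, count in aggregate.items() if count]
        if len(present) == 1:
            return codes.get(present[0].strip().lower())
    return None
```

The resolved values could then be stored under the "sex" and "hand" keys of raw.info["subject_info"].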

Tests: 79 unit tests covering all enrichment helpers
@bruAristimunha force-pushed the feature/hed-bids-integration branch from 7fbd71f to 9481b8e on February 16, 2026 20:30
@bruAristimunha changed the title "Add HED annotations to BIDS export" to "Enrich BIDS cache with comprehensive metadata and HED annotations" on Feb 16, 2026
- Fix scans.json path: create at session level next to scans.tsv
  instead of inside eeg/ subdirectory with desc entity
- Add CogPOID, HeadCircumference to EEG sidecar enrichment
- Add StimulusPresentation support to events.json sidecar
- Add Description for all participants.tsv columns (participant_id,
  weight, height)
- Add head_circumference, cog_po_id, stimulus_presentation to
  metadata schema
- Rename _update_events_json_with_hed to _update_events_json_sidecar
  to reflect broader scope
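A sketch of the corrected scans.json placement, assuming a sub-<label>/ses-<label>/ layout; the helper name is hypothetical.

```python
from pathlib import Path


def session_scans_paths(root, subject, session):
    """Return (scans.tsv, scans.json) paths at the session level.

    The JSON sidecar shares the TSV's stem and directory, so it is never
    placed inside the eeg/ datatype folder or given an extra desc entity.
    """
    session_dir = Path(root) / f"sub-{subject}" / f"ses-{session}"
    scans_tsv = session_dir / f"sub-{subject}_ses-{session}_scans.tsv"
    return scans_tsv, scans_tsv.with_suffix(".json")
```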
Add _build_readme() that produces a rich plain-text README incorporating:
- Dataset class docstring (description and references)
- Core dataset attributes (code, paradigm, events, subjects, sessions)
- All metadata sections: Acquisition, Participants, Experiment,
  Paradigm-Specific, Data Structure, Preprocessing, Signal Processing,
  Cross-Validation, Performance, BCI Application, Tags, Documentation,
  External Links, and Abstract
- Standard BIDS references (MNE-BIDS, EEG-BIDS)

The README is written after write_raw_bids() to overwrite the mne_bids
boilerplate. Sections with no populated fields are omitted. Values of
'n/a' are excluded.
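The omission rules can be sketched as a small section renderer. The name render_section and the dashed-underline layout are assumptions, not necessarily what _build_readme() emits.

```python
def render_section(title, fields):
    """Render 'Key: value' lines under a title; return '' if nothing remains."""
    lines = []
    for key, value in fields.items():
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value if v not in (None, "n/a"))
        text = str(value).strip()
        if not text or text.lower() == "n/a":
            continue  # skip empty and 'n/a' values
        lines.append(f"{key}: {text}")
    if not lines:
        return ""  # omit sections with no populated fields
    return "\n".join([title, "-" * len(title), *lines, ""])
```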

Add _extract_references_from_docstring() to parse rst citation
directives (.. [N]) into clean plain-text references with proper
multi-line joining.
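A rough sketch of the reference extraction, assuming citations use the `.. [N]` form and wrapped lines are indented deeper than the directive. The function name is hypothetical, and the docstring in the test uses placeholder citations.

```python
import re

# Matches an rst citation directive such as ".. [1] Author, A. (2020). Title."
_CITATION = re.compile(r"^(\s*)\.\.\s+\[([^\]]+)\]\s*(.*)$")


def extract_references(docstring):
    """Return plain-text references from '.. [N]' directives, joining wrapped lines."""
    refs = []
    current = None  # parts of the citation being collected, or None
    base_indent = 0
    for line in docstring.splitlines():
        match = _CITATION.match(line)
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if match:
            if current:
                refs.append(" ".join(current))
            base_indent = len(match.group(1))
            text = match.group(3).strip()
            current = [text] if text else []
        elif current is not None and stripped and indent > base_indent:
            current.append(stripped)  # indented continuation line
        elif current is not None:
            if current:
                refs.append(" ".join(current))  # blank/dedented line ends it
            current = None
    if current:
        refs.append(" ".join(current))
    return refs
```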

Add 19 tests covering all README sections, reference extraction,
empty section omission, and n/a value filtering.
…on guard

Expand the has_content guard in the Documentation section to include
all rendered fields (senior_author, contact_info, associated_paper_doi,
institution_address, institution_department, ethics_approval,
acknowledgements, how_to_acknowledge, keywords). Also add missing
fields across other sections: device_serial, eog_type, head_circumference,
tasks, study_domain, sessions, contributing_labs, methodology, timestamps.

Add 14 parametrized test classes (144+ test cases) ensuring every
metadata field renders correctly in the README. All 242 tests pass.
Parse HED tag strings into tree structures and render them as ASCII
art with box-drawing characters in the README. Each event gets a
collapsible tree showing the tag hierarchy, with leaf-only groups
rendered inline for compactness.
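The parsing and rendering steps can be approximated as below: split on commas outside parentheses, recurse into groups, and print leaf-only groups inline. The names mirror the PR's helpers only loosely and are hypothetical; the "(...)" group marker and the sample HED string are illustrative and unvalidated.

```python
def split_hed_top_level(hed):
    """Split a HED string on commas that are not inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in hed:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def is_group(element):
    return element.startswith("(") and element.endswith(")")


def render_hed_tree(elements, prefix=""):
    """Render HED elements as box-drawing tree lines."""
    lines = []
    for i, element in enumerate(elements):
        last = i == len(elements) - 1
        branch = "└── " if last else "├── "
        if not is_group(element):
            lines.append(prefix + branch + element)
            continue
        children = split_hed_top_level(element[1:-1])
        if not any(is_group(c) for c in children):
            # Leaf-only group: render inline to keep the tree compact.
            lines.append(prefix + branch + "(" + ", ".join(children) + ")")
        else:
            lines.append(prefix + branch + "(...)")
            lines.extend(render_hed_tree(children, prefix + ("    " if last else "│   ")))
    return lines


hed = "Sensory-event, (Visual-presentation, (Image, Label/left_arrow)), Label/left"
print("\n".join(render_hed_tree(split_hed_top_level(hed))))
```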

New helper functions: _split_hed_top_level, _hed_element_to_tree,
_render_hed_tree. Adds 27 tests covering parsing, tree building,
rendering, and README integration. Total: 269 tests passing.
Extract detailed metadata from original research papers for all
dataset classes, enriching acquisition parameters, participant
demographics, experimental protocols, documentation, signal
processing details, and performance metrics.

Also fixes update_docstring_list in utils.py to use regex header
matching instead of string containment, and adds codespell ignore
words for false positives (aline, theses).
@bruAristimunha changed the title "Enrich BIDS cache with comprehensive metadata and HED annotations" to "Enrich BIDS cache with comprehensive metadata, HED annotations, and paper-sourced enrichment" on Feb 16, 2026
Populate events, file_format, sensor_type, and pathology fields
that can be derived from existing BaseDataset attributes without
requiring paper research. Covers 32 dataset files with 79 field
fills total, bringing overall fill rate from 51% to 59%.
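The fill-rate figures are presumably filled fields divided by total schema fields across datasets. A sketch of that computation under this assumed definition follows, using a hypothetical four-field record and treating None, "n/a", and empty containers as unfilled.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ExampleMetadata:
    """Hypothetical stand-in for a METADATA block."""
    events: Optional[dict] = None
    file_format: Optional[str] = None
    sensor_type: Optional[str] = None
    pathology: Optional[str] = None


def is_filled(value):
    return value is not None and value != "n/a" and value != [] and value != {}


def fill_rate(records):
    """Fraction of populated fields across all records (0.0 if none)."""
    total = filled = 0
    for record in records:
        for field in fields(record):
            total += 1
            filled += is_filled(getattr(record, field.name))
    return filled / total if total else 0.0
```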
