Enrich BIDS cache with comprehensive metadata, HED annotations, and paper-sourced enrichment by bruAristimunha · Pull Request #974 · NeuroTechX/moabb

bruAristimunha · 2026-02-16T20:09:33Z

Summary

Enrich BIDS dataset cache with comprehensive structured metadata from the DatasetMetadata schema
Add HED (Hierarchical Event Descriptors) v8.4.0 annotations to BIDS event sidecars for machine-actionable event descriptions
Generate rich README files for each BIDS dataset from docstrings and metadata
Extract and apply detailed metadata from original research papers for 73 datasets, covering acquisition parameters, participant demographics, experimental protocols, documentation, signal processing, and performance metrics
Address remaining BIDS validator warnings (0 errors, reduced warnings)
Fix update_docstring_list in utils.py to use regex header matching

Details

BIDS Metadata Enrichment

Populates dataset_description.json, participants.tsv, *_eeg.json, *_electrodes.tsv, and *_events.json with metadata from the DatasetMetadata schema
Adds sidecar fields for acquisition hardware, filter settings, electrode coordinates, and experimental context

HED Annotations

Declares HEDVersion: "8.4.0" in dataset_description.json
Adds paradigm-specific HED tag mappings for motor imagery, P300, SSVEP, c-VEP, and resting state events
Supports per-dataset HED tag overrides via ExperimentMetadata.hed_tags
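
For orientation, a minimal sketch of the two JSON files involved. Only the `HEDVersion` key and the `trial_type` → `HED` nesting come from this PR; the dataset name, BIDS version, event labels, and HED strings are illustrative placeholders and have not been run through a HED validator.

```python
import json

# dataset_description.json: only "HEDVersion" is taken from the PR text;
# "Name" and "BIDSVersion" are hypothetical placeholder values.
dataset_description = {
    "Name": "example-dataset",
    "BIDSVersion": "1.9.0",
    "HEDVersion": "8.4.0",
}

# *_events.json: HED strings are keyed by trial_type value.
# The labels and tag strings below are illustrative, not the PR's mappings.
events_sidecar = {
    "trial_type": {
        "Description": "Cue shown to the participant",
        "HED": {
            "left_hand": "Sensory-event, Visual-presentation, Label/left_hand",
            "right_hand": "Sensory-event, Visual-presentation, Label/right_hand",
        },
    }
}

print(json.dumps(dataset_description, indent=2))
print(json.dumps(events_sidecar, indent=2))
```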

Paper-Sourced Metadata Enrichment

Enriched METADATA blocks for 73/74 datasets by extracting information from original research papers
Added missing schema imports across 31 dataset files
Corrected numerous metadata inaccuracies identified during paper review (wrong paradigms, participant counts, hardware specs, etc.)

README Generation

Auto-generates a comprehensive README for each BIDS output directory
Includes dataset description, paradigm details, acquisition setup, HED event tree visualization, and citation info

Test plan

All 269 BIDS enrichment tests pass (pytest moabb/tests/test_bids_enrichment.py)
All 95 dataset classes instantiate without errors
All 40 dataset modules import successfully
Pre-commit passes on all modified files
BIDS validator shows 0 errors on exported datasets

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 945a4eb4bb


chatgpt-codex-connector · 2026-02-16T20:14:19Z

moabb/datasets/bids_interface.py

+    if metadata and metadata.experiment and metadata.experiment.hed_tags:
+        return dict(metadata.experiment.hed_tags)


Merge custom HED overrides with default event annotations

ExperimentMetadata.hed_tags is documented as an override map, but this early return treats it as a full replacement and skips paradigm/default/fallback resolution for all other events. If a dataset provides only one or two custom HED tags, every non-overridden event_id entry is dropped from the exported HED mapping, so events.json ends up missing annotations for valid trial types.
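
One way the suggested behaviour could look, as a sketch only: overrides take priority per event, and every other event still falls through to the paradigm lookup or a `Label/` fallback. The function name `resolve_hed_tags`, its parameters, and the label sanitising rule are assumptions for illustration, not the code in `bids_interface.py`.

```python
import re


def resolve_hed_tags(event_id, overrides=None, paradigm_defaults=None):
    """Return one HED string per event label (hypothetical helper).

    Priority per event: dataset override, then paradigm default,
    then a Label/ fallback built from the event name.
    """
    overrides = overrides or {}
    paradigm_defaults = paradigm_defaults or {}
    resolved = {}
    for name in event_id:
        if name in overrides:
            resolved[name] = overrides[name]
        elif name in paradigm_defaults:
            resolved[name] = paradigm_defaults[name]
        else:
            # Assumed sanitising rule: keep letters, digits, "_" and "-".
            safe = re.sub(r"[^A-Za-z0-9_-]", "-", name)
            resolved[name] = f"Label/{safe}"
    return resolved
```

With this shape, a dataset that overrides a single event keeps annotations for all of its other events instead of dropping them.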


chatgpt-codex-connector · 2026-02-16T20:14:19Z

moabb/datasets/bids_interface.py

+    if "HED" not in sidecar["trial_type"]:
+        sidecar["trial_type"]["HED"] = hed_tags


Merge HED mappings when events sidecar already has HED

This condition only writes HED tags when the trial_type block has no HED key at all, so files with an existing but partial HED map are never completed. In practice, if events.json already contains one custom/stale HED entry, additional event labels from hed_tags are silently omitted, leaving inconsistent per-event annotation coverage.
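
A sketch of a merge that completes a partial map while leaving existing entries untouched. The helper name `merge_hed_into_sidecar` is hypothetical, and letting existing entries win on conflict is an assumption chosen so that hand-written annotations are preserved.

```python
def merge_hed_into_sidecar(sidecar, hed_tags):
    """Fill in missing HED entries without overwriting existing ones."""
    trial_type = sidecar.setdefault("trial_type", {})
    existing = trial_type.get("HED")
    if not isinstance(existing, dict):
        # Missing or non-mapping HED value: start from an empty map.
        existing = {}
    # Later keys win in a dict merge, so existing entries take priority.
    trial_type["HED"] = {**hed_tags, **existing}
    return sidecar
```

If stale entries should instead be refreshed, reversing the merge order (`{**existing, **hed_tags}`) would make the new tags win.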


Overhaul the BIDS cache export to inject rich metadata from dataset METADATA into every layer of the BIDS structure, making cached datasets self-describing and validator-compliant.

Sidecar enrichment (_build_sidecar_enrichment):
- EEG-specific fields: PowerLineFrequency, EEGReference, SamplingFrequency
- Hardware: Manufacturer, ManufacturersModelName (via _MANUFACTURER_LOOKUP)
- Recording: RecordingType, RecordingDuration, InstitutionName
- Filtering: SoftwareFilters with highpass/lowpass/notch details
- Task: TaskName, TaskDescription, Instructions, CogAtlasID

Dataset description (_build_dataset_description_kwargs):
- Authors, License, Funding, ReferencesAndLinks, DOI
- HEDVersion "8.4.0" for HED annotation support
- Extra fields via _update_dataset_description_extra: Keywords, Acknowledgements, HowToAcknowledge, EthicsApprovals

Participants TSV (_update_participants_tsv):
- Per-subject sex/handedness from metadata lists or aggregate stats
- Age columns (mean/min/max) from metadata
- Species field (default "homo sapiens")

Electrodes TSV (_update_electrodes_tsv):
- material, manufacturer, model columns from acquisition metadata

HED annotations:
- 83 validated HED 8.4.0 tags across imagery (62), p300 (2), rstate (8), cvep (15) paradigms with dynamic SSVEP frequency generation
- Three-tier resolution: per-dataset override, static paradigm lookup, Label/ fallback for unmapped events
- Events.json sidecar patching with merge logic preserving existing entries
- Experimental-stimulus on MI cue events; Label qualifiers for disambiguation (pronation/supination, mental tasks, grasp types)

Schema additions (metadata/schema.py):
- AcquisitionMetadata: cap_manufacturer, cap_model, electrode_type, electrode_material, device_serial
- DocumentationMetadata: institution_address, institution_department, ethics_approval, acknowledgements, how_to_acknowledge, keywords
- ParticipantMetadata: sexes, handedness_list, species
- ExperimentMetadata: instructions, cog_atlas_id, hed_tags

Raw info enrichment (_enrich_raw_info_from_metadata):
- Sets subject_info sex/handedness from per-subject metadata
- Falls back to aggregate gender/handedness when homogeneous
- Sets line_freq from metadata when not present in raw

Tests: 79 unit tests covering all enrichment helpers

- Fix scans.json path: create at session level next to scans.tsv instead of inside eeg/ subdirectory with desc entity
- Add CogPOID, HeadCircumference to EEG sidecar enrichment
- Add StimulusPresentation support to events.json sidecar
- Add Description for all participants.tsv columns (participant_id, weight, height)
- Add head_circumference, cog_po_id, stimulus_presentation to metadata schema
- Rename _update_events_json_with_hed to _update_events_json_sidecar to reflect broader scope

Add _build_readme() that produces a rich plain-text README incorporating:
- Dataset class docstring (description and references)
- Core dataset attributes (code, paradigm, events, subjects, sessions)
- All metadata sections: Acquisition, Participants, Experiment, Paradigm-Specific, Data Structure, Preprocessing, Signal Processing, Cross-Validation, Performance, BCI Application, Tags, Documentation, External Links, and Abstract
- Standard BIDS references (MNE-BIDS, EEG-BIDS)

The README is written after write_raw_bids() to overwrite the mne_bids boilerplate. Sections with no populated fields are omitted. Values of 'n/a' are excluded.

Add _extract_references_from_docstring() to parse rst citation directives (.. [N]) into clean plain-text references with proper multi-line joining.

Add 19 tests covering all README sections, reference extraction, empty section omission, and n/a value filtering.
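
The reference extraction described above could be approximated as follows. This is a sketch, not the body of _extract_references_from_docstring(): the function name `extract_references`, the regex, and the rule that a blank line ends a reference are all assumptions.

```python
import re

# Matches an rst citation directive such as ".. [1] Some reference text".
_CITATION_RE = re.compile(r"^\s*\.\.\s+\[(\w+)\]\s*(.*)$")


def extract_references(docstring):
    """Collect ".. [N]" entries as single-line plain-text references."""
    refs = []
    current = None  # [label, text] of the reference being assembled
    for line in docstring.splitlines():
        match = _CITATION_RE.match(line)
        if match:
            if current is not None:
                refs.append(current)
            current = [match.group(1), match.group(2).strip()]
        elif current is not None:
            if line.strip():
                # Continuation line: join it onto the open reference.
                current[1] = f"{current[1]} {line.strip()}".strip()
            else:
                # Assumed rule: a blank line closes the reference.
                refs.append(current)
                current = None
    if current is not None:
        refs.append(current)
    return [f"[{label}] {text}" for label, text in refs]
```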

…on guard

Expand the has_content guard in the Documentation section to include all rendered fields (senior_author, contact_info, associated_paper_doi, institution_address, institution_department, ethics_approval, acknowledgements, how_to_acknowledge, keywords).

Also adds missing fields across other sections: device_serial, eog_type, head_circumference, tasks, study_domain, sessions, contributing_labs, methodology, timestamps.

Adds 14 parametrized test classes (144+ test cases) ensuring every metadata field renders correctly in the README. All 242 tests pass.

Parse HED tag strings into tree structures and render them as ASCII art with box-drawing characters in the README. Each event gets a collapsible tree showing the tag hierarchy, with leaf-only groups rendered inline for compactness. New helper functions: _split_hed_top_level, _hed_element_to_tree, _render_hed_tree. Adds 27 tests covering parsing, tree building, rendering, and README integration. Total: 269 tests passing.
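
A rough sketch of the parse-and-render idea, assuming HED strings are comma-separated with parentheses for grouping. The function names mirror the description but are not the PR's helpers, and the inline rendering of leaf-only groups is not reproduced here.

```python
def split_hed_top_level(hed_string):
    """Split on commas that are not nested inside parentheses."""
    parts, buf, depth = [], [], 0
    for ch in hed_string:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf).strip())
    return [p for p in parts if p]


def render_hed_tree(hed_string, prefix=""):
    """Render tags as tree lines; each parenthesised group is a subtree."""
    lines = []
    elements = split_hed_top_level(hed_string)
    for i, element in enumerate(elements):
        last = i == len(elements) - 1
        branch = "└── " if last else "├── "
        if element.startswith("(") and element.endswith(")"):
            lines.append(f"{prefix}{branch}(group)")
            child_prefix = prefix + ("    " if last else "│   ")
            lines.extend(render_hed_tree(element[1:-1], child_prefix))
        else:
            lines.append(f"{prefix}{branch}{element}")
    return lines
```

Calling `"\n".join(render_hed_tree(tag_string))` gives a block that can be pasted into a plain-text README.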

Extract detailed metadata from original research papers for all dataset classes, enriching acquisition parameters, participant demographics, experimental protocols, documentation, signal processing details, and performance metrics. Also fixes update_docstring_list in utils.py to use regex header matching instead of string containment, and adds codespell ignore words for false positives (aline, theses).

Populate events, file_format, sensor_type, and pathology fields that can be derived from existing BaseDataset attributes without requiring paper research. Covers 32 dataset files with 79 field fills total, bringing overall fill rate from 51% to 59%.

chatgpt-codex-connector bot reviewed Feb 16, 2026


bruAristimunha force-pushed the feature/hed-bids-integration branch from 7855f79 to 7fbd71f, February 16, 2026 20:28

bruAristimunha force-pushed the feature/hed-bids-integration branch from 7fbd71f to 9481b8e, February 16, 2026 20:30

bruAristimunha changed the title ~~Add HED annotations to BIDS export~~ Enrich BIDS cache with comprehensive metadata and HED annotations Feb 16, 2026

bruAristimunha added 5 commits February 16, 2026 21:57

bruAristimunha changed the title ~~Enrich BIDS cache with comprehensive metadata and HED annotations~~ Enrich BIDS cache with comprehensive metadata, HED annotations, and paper-sourced enrichment Feb 16, 2026

bruAristimunha wants to merge 7 commits into develop from feature/hed-bids-integration
