Enrich BIDS cache with comprehensive metadata, HED annotations, and paper-sourced enrichment#974
bruAristimunha wants to merge 7 commits into develop
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 945a4eb4bb
moabb/datasets/bids_interface.py
Outdated
    if metadata and metadata.experiment and metadata.experiment.hed_tags:
        return dict(metadata.experiment.hed_tags)
Merge custom HED overrides with default event annotations
ExperimentMetadata.hed_tags is documented as an override map, but this early return treats it as a full replacement and skips paradigm/default/fallback resolution for all other events. If a dataset provides only one or two custom HED tags, every non-overridden event_id entry is dropped from the exported HED mapping, so events.json ends up missing annotations for valid trial types.
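A minimal sketch of the merge behaviour this comment asks for, assuming the three-tier order described in the commit message (per-dataset override, then paradigm lookup, then a Label/ fallback). The function name `resolve_hed_tags`, its parameters, and the fallback sanitisation are hypothetical and are not taken from `bids_interface.py`; the tag strings in the usage note are placeholders, not validated HED.

```python
import re


def resolve_hed_tags(event_id, overrides=None, paradigm_defaults=None):
    """Return one HED string per event label in ``event_id``.

    Overrides are applied per label, so a dataset that customises a
    single event still gets annotations for all the others.
    """
    overrides = overrides or {}
    paradigm_defaults = paradigm_defaults or {}
    resolved = {}
    for label in event_id:
        if label in overrides:
            # Tier 1: per-dataset override wins for this label only.
            resolved[label] = overrides[label]
        elif label in paradigm_defaults:
            # Tier 2: static paradigm lookup.
            resolved[label] = paradigm_defaults[label]
        else:
            # Tier 3: assumed fallback format, with non-word characters
            # replaced so the value stays a single token.
            resolved[label] = "Label/" + re.sub(r"[^\w-]", "_", label)
    return resolved
```

With `event_id = {"left_hand": 1, "right_hand": 2, "rest": 3}`, an override for `left_hand` only, and paradigm defaults for both hands, every label still receives an entry: the override for `left_hand`, the default for `right_hand`, and `Label/rest` for the unmapped event.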
moabb/datasets/bids_interface.py
Outdated
    if "HED" not in sidecar["trial_type"]:
        sidecar["trial_type"]["HED"] = hed_tags
Merge HED mappings when events sidecar already has HED
This condition only writes HED tags when the trial_type block has no HED key at all, so files with an existing but partial HED map are never completed. In practice, if events.json already contains one custom/stale HED entry, additional event labels from hed_tags are silently omitted, leaving inconsistent per-event annotation coverage.
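One way to complete a partial HED map is a key-level merge rather than an all-or-nothing write. The sketch below is illustrative only: `merge_hed_into_sidecar` is a hypothetical helper, and letting existing entries win is an assumption (it matches the "preserving existing entries" wording in the commit message, but the project may prefer the opposite policy for stale entries).

```python
import copy


def merge_hed_into_sidecar(sidecar, hed_tags):
    """Return a copy of ``sidecar`` whose trial_type HED map covers
    every label in ``hed_tags`` without overwriting existing entries."""
    merged = copy.deepcopy(sidecar)  # leave the caller's dict untouched
    trial_type = merged.setdefault("trial_type", {})
    existing = trial_type.get("HED")
    if not isinstance(existing, dict):
        # Assumption: a missing or non-mapping HED value is replaced.
        existing = {}
    # Later keys win in a dict merge, so existing entries take priority.
    trial_type["HED"] = {**hed_tags, **existing}
    return merged
```

A sidecar that already holds an entry for `left` keeps that entry, while a missing `right` label is filled in from `hed_tags`.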
Force-pushed from 7855f79 to 7fbd71f
Overhaul the BIDS cache export to inject rich metadata from dataset METADATA into every layer of the BIDS structure, making cached datasets self-describing and validator-compliant.

Sidecar enrichment (_build_sidecar_enrichment):
- EEG-specific fields: PowerLineFrequency, EEGReference, SamplingFrequency
- Hardware: Manufacturer, ManufacturersModelName (via _MANUFACTURER_LOOKUP)
- Recording: RecordingType, RecordingDuration, InstitutionName
- Filtering: SoftwareFilters with highpass/lowpass/notch details
- Task: TaskName, TaskDescription, Instructions, CogAtlasID

Dataset description (_build_dataset_description_kwargs):
- Authors, License, Funding, ReferencesAndLinks, DOI
- HEDVersion "8.4.0" for HED annotation support
- Extra fields via _update_dataset_description_extra: Keywords, Acknowledgements, HowToAcknowledge, EthicsApprovals

Participants TSV (_update_participants_tsv):
- Per-subject sex/handedness from metadata lists or aggregate stats
- Age columns (mean/min/max) from metadata
- Species field (default "homo sapiens")

Electrodes TSV (_update_electrodes_tsv):
- material, manufacturer, model columns from acquisition metadata

HED annotations:
- 83 validated HED 8.4.0 tags across imagery (62), p300 (2), rstate (8), cvep (15) paradigms with dynamic SSVEP frequency generation
- Three-tier resolution: per-dataset override, static paradigm lookup, Label/ fallback for unmapped events
- Events.json sidecar patching with merge logic preserving existing entries
- Experimental-stimulus on MI cue events; Label qualifiers for disambiguation (pronation/supination, mental tasks, grasp types)

Schema additions (metadata/schema.py):
- AcquisitionMetadata: cap_manufacturer, cap_model, electrode_type, electrode_material, device_serial
- DocumentationMetadata: institution_address, institution_department, ethics_approval, acknowledgements, how_to_acknowledge, keywords
- ParticipantMetadata: sexes, handedness_list, species
- ExperimentMetadata: instructions, cog_atlas_id, hed_tags

Raw info enrichment (_enrich_raw_info_from_metadata):
- Sets subject_info sex/handedness from per-subject metadata
- Falls back to aggregate gender/handedness when homogeneous
- Sets line_freq from metadata when not present in raw

Tests: 79 unit tests covering all enrichment helpers
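The "falls back to aggregate when homogeneous" rule can be sketched roughly as follows. Everything here is an assumption made for illustration: the helper name `subject_sex`, the per-subject list, and the shape of the aggregate counts (a dict such as `{"male": 10, "female": 0}`) are hypothetical and may not match the actual schema.

```python
def subject_sex(subject_index, sexes=None, aggregate=None):
    """Return the sex for one subject, or None when it cannot be inferred.

    ``sexes`` is an assumed per-subject list; ``aggregate`` is an assumed
    mapping from category to subject count.
    """
    if sexes and 0 <= subject_index < len(sexes):
        # Per-subject metadata takes priority when available.
        return sexes[subject_index]
    if aggregate:
        present = [name for name, count in aggregate.items() if count]
        if len(present) == 1:
            # Homogeneous cohort: the aggregate applies to every subject.
            return present[0]
    # Mixed or unknown cohort: do not guess.
    return None
```

The design point is that an aggregate such as 6 male and 4 female says nothing about any individual subject, so only a single-category cohort is propagated.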
Force-pushed from 7fbd71f to 9481b8e
- Fix scans.json path: create at session level next to scans.tsv instead of inside eeg/ subdirectory with desc entity
- Add CogPOID, HeadCircumference to EEG sidecar enrichment
- Add StimulusPresentation support to events.json sidecar
- Add Description for all participants.tsv columns (participant_id, weight, height)
- Add head_circumference, cog_po_id, stimulus_presentation to metadata schema
- Rename _update_events_json_with_hed to _update_events_json_sidecar to reflect broader scope
Add _build_readme() that produces a rich plain-text README incorporating:
- Dataset class docstring (description and references)
- Core dataset attributes (code, paradigm, events, subjects, sessions)
- All metadata sections: Acquisition, Participants, Experiment, Paradigm-Specific, Data Structure, Preprocessing, Signal Processing, Cross-Validation, Performance, BCI Application, Tags, Documentation, External Links, and Abstract
- Standard BIDS references (MNE-BIDS, EEG-BIDS)

The README is written after write_raw_bids() to overwrite the mne_bids boilerplate. Sections with no populated fields are omitted. Values of 'n/a' are excluded.

Add _extract_references_from_docstring() to parse rst citation directives (.. [N]) into clean plain-text references with proper multi-line joining.

Add 19 tests covering all README sections, reference extraction, empty section omission, and n/a value filtering.
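A rough illustration of parsing rst citation directives with multi-line joining. This is a sketch under stated assumptions and not the PR's _extract_references_from_docstring(): the name `extract_references` is hypothetical, and it assumes continuation lines are indented relative to the `.. [N]` marker and that a blank or unindented line ends a citation.

```python
import inspect
import re

# Matches a citation start such as ".. [1] Some text"
_CITATION_START = re.compile(r"^\.\. \[(\w+)\]\s*(.*)$")


def extract_references(docstring):
    """Return plain-text references like '[1] Author, Title, 2020.'"""
    refs = []
    current = None  # text parts of the citation being collected
    # cleandoc removes the common indentation of a class docstring.
    for line in inspect.cleandoc(docstring or "").splitlines():
        match = _CITATION_START.match(line)
        if match:
            current = [match.group(2).strip()]
            refs.append((match.group(1), current))
        elif current is not None and line.strip() and line[0].isspace():
            # Indented, non-blank line: continuation of the citation.
            current.append(line.strip())
        else:
            # Blank or unindented line closes the current citation.
            current = None
    return [
        "[{}] {}".format(label, " ".join(part for part in parts if part))
        for label, parts in refs
    ]
```

A citation wrapped over two indented lines comes back as a single line, and text that follows a blank line is not absorbed into the last reference.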
…on guard

Expand the has_content guard in the Documentation section to include all rendered fields (senior_author, contact_info, associated_paper_doi, institution_address, institution_department, ethics_approval, acknowledgements, how_to_acknowledge, keywords).

Also adds missing fields across other sections: device_serial, eog_type, head_circumference, tasks, study_domain, sessions, contributing_labs, methodology, timestamps.

Adds 14 parametrized test classes (144+ test cases) ensuring every metadata field renders correctly in the README. All 242 tests pass.
Parse HED tag strings into tree structures and render them as ASCII art with box-drawing characters in the README. Each event gets a collapsible tree showing the tag hierarchy, with leaf-only groups rendered inline for compactness.

New helper functions: _split_hed_top_level, _hed_element_to_tree, _render_hed_tree.

Adds 27 tests covering parsing, tree building, rendering, and README integration. Total: 269 tests passing.
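The first parsing step, splitting a HED string on commas that sit outside any parenthesised group, might look like the sketch below. It is an independent illustration, not the PR's _split_hed_top_level; the name `split_top_level` and the choice to raise on unbalanced parentheses are assumptions.

```python
def split_top_level(hed_string):
    """Split on commas at nesting depth 0, keeping groups intact."""
    parts = []
    current = []
    depth = 0
    for char in hed_string:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' in HED string")
        if char == "," and depth == 0:
            # A top-level comma ends the current element.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    if depth != 0:
        raise ValueError("unbalanced '(' in HED string")
    parts.append("".join(current).strip())
    # Drop empty elements produced by stray or trailing commas.
    return [part for part in parts if part]
```

Each returned element is either a bare tag or a complete group; a tree builder can then strip the outer parentheses of a group and apply the same split recursively.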
Extract detailed metadata from original research papers for all dataset classes, enriching acquisition parameters, participant demographics, experimental protocols, documentation, signal processing details, and performance metrics. Also fixes update_docstring_list in utils.py to use regex header matching instead of string containment, and adds codespell ignore words for false positives (aline, theses).
Populate events, file_format, sensor_type, and pathology fields that can be derived from existing BaseDataset attributes without requiring paper research. Covers 32 dataset files with 79 field fills total, bringing overall fill rate from 51% to 59%.
Summary
- … DatasetMetadata schema
- … README files for each BIDS dataset from docstrings and metadata
- … update_docstring_list in utils.py to use regex header matching

Details

BIDS Metadata Enrichment
- … dataset_description.json, participants.tsv, *_eeg.json, *_electrodes.tsv, and *_events.json with metadata from the DatasetMetadata schema

HED Annotations
- … HEDVersion: "8.4.0" in dataset_description.json
- … ExperimentMetadata.hed_tags

Paper-Sourced Metadata Enrichment
- … METADATA blocks for 73/74 datasets by extracting information from original research papers

README Generation
- … README for each BIDS output directory

Test plan
- … pytest moabb/tests/test_bids_enrichment.py)