eegdash.bids_metadata module

BIDS metadata processing and query building utilities.

This module provides functions for building database queries from user parameters and enriching metadata records with participant information from BIDS datasets.

eegdash.bids_metadata.attach_participants_extras(raw: Any, description: Any, extras: dict[str, Any]) → None

Attach extra participant data to a raw object and its description.

Parameters:
  • raw (mne.io.Raw) – The MNE Raw object to be updated.

  • description (dict or pandas.Series) – The description object to be updated.

  • extras (dict) – Extra participant information to attach.

eegdash.bids_metadata.build_description(record: dict[str, Any], description_fields: list[str], description_precedence: str = 'record', participants_row: dict[str, Any] | None = None) → dict[str, Any]

Build a description dict for one record, merging participant metadata.

eegdash.bids_metadata.build_query_from_kwargs(*, allowed_fields: AbstractSet[str] | None = None, field_spec: Mapping[str, Mapping[str, Any]] | None = None, **kwargs: Any) → dict[str, Any]

Build and validate a MongoDB query from keyword arguments.

Converts user-friendly keyword arguments into a valid MongoDB query dictionary. Scalar values become exact matches; list-like values become $in queries.

Entity fields (subject, task, session, run) are queried at the top level since the inject script flattens these from nested entities.

Parameters:
  • allowed_fields (set of str, optional) – Override the default eegdash.const.ALLOWED_QUERY_FIELDS whitelist. Useful when querying a different collection (e.g. the datasets collection from search_datasets()).

  • field_spec (mapping of str to mapping, optional) –

    Per-field rule map describing how a friendly key translates to a MongoDB filter. Each rule is a dict with optional keys:

    • "paths" (sequence of str): DB field paths the key resolves to. When more than one path is given, the rule emits an $or over {path: value} per path. Default: [<key>].

    • "operator" (str, e.g. "$gte"): wrap the value in {operator: value} (range operators). Default: exact match.

    • "value_aliases" (callable): a function v -> list that returns the full list of values to OR-match (e.g. lambda v: [v, v.lower()] for a case-insensitive fallback, or lambda v: [int(v)] to coerce the type). Default: [v]. The result is deduplicated.

    Keys without a spec follow the legacy scalar / $in rules.

  • **kwargs – Query filters. Allowed keys are constrained by allowed_fields.

Returns:

A MongoDB query dictionary. Multiple OR-flavoured fields collect under a top-level $and; single-path single-value fields stay flat.

Return type:

dict

Raises:

ValueError – If an unsupported field is provided, or if a value is None/empty.
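The translation rules described above can be sketched in plain Python. The helper below, sketch_build_query, is hypothetical and is not the eegdash implementation: it skips the allowed_fields whitelist, the "age_min" key is made up for illustration, and the way it combines OR clauses with flat fields is one reading of the Returns description.

```python
from typing import Any, Mapping, Optional

LIST_LIKE = (list, tuple, set)


def _is_empty(value: Any) -> bool:
    # None, empty strings, and empty list-likes are all rejected.
    if value is None:
        return True
    return isinstance(value, (str, *LIST_LIKE)) and len(value) == 0


def sketch_build_query(
    field_spec: Optional[Mapping[str, Mapping[str, Any]]] = None, **kwargs: Any
) -> dict:
    """Hypothetical re-implementation of the documented rules (not eegdash code)."""
    field_spec = field_spec or {}
    flat = {}       # single-path, single-value filters
    or_groups = []  # one list of {path: condition} clauses per OR-flavoured field

    for key, value in kwargs.items():
        if _is_empty(value):
            raise ValueError(f"Empty value for field {key!r}")

        spec = field_spec.get(key)
        if spec is None:
            # Legacy rule: scalars match exactly, list-likes become $in.
            flat[key] = {"$in": list(value)} if isinstance(value, LIST_LIKE) else value
            continue

        paths = list(spec.get("paths", [key]))
        aliases = spec.get("value_aliases")
        values = list(aliases(value)) if aliases else [value]
        values = list(dict.fromkeys(values))  # deduplicate, keeping order
        operator = spec.get("operator")

        clauses = [
            {path: ({operator: v} if operator else v)}
            for path in paths
            for v in values
        ]
        if len(clauses) == 1:
            flat.update(clauses[0])
        else:
            or_groups.append(clauses)

    # Assumption: one OR group sits beside the flat fields; several go under $and.
    if len(or_groups) == 1:
        return {**flat, "$or": or_groups[0]}
    if or_groups:
        return {**flat, "$and": [{"$or": group} for group in or_groups]}
    return flat


print(sketch_build_query(subject="01", task=["rest", "oddball"]))
spec = {"age_min": {"paths": ["age"], "operator": "$gte"}}  # hypothetical key
print(sketch_build_query(field_spec=spec, age_min=18))
```

A rule whose aliases collapse to a single value after deduplication stays flat in this sketch, which mirrors the "single-path single-value fields stay flat" wording.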

eegdash.bids_metadata.enrich_from_participants(bids_root: str | Path, bidspath: Any, raw: Any, description: Any) → dict[str, Any]

Read participants.tsv and attach extra info for the subject.

Parameters:
  • bids_root (str or Path) – Root directory of the BIDS dataset.

  • bidspath (mne_bids.BIDSPath) – BIDSPath object for the current data file.

  • raw (mne.io.Raw) – The MNE Raw object to be updated.

  • description (dict or pandas.Series) – The description object to be updated.

Returns:

The extras that were attached.

Return type:

dict

eegdash.bids_metadata.find_key_in_nested_dict(data: Any, target_key: str) → Any

Recursively search nested dicts/lists for target_key (case- and separator-insensitive).
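A recursive search of this kind might look like the sketch below. The names sketch_find_key and _norm are hypothetical, and two details are assumptions rather than documented behaviour: keys are compared after lowercasing and replacing non-alphanumeric characters with underscores, and keys at the current level are checked before descending.

```python
import re
from typing import Any


def _norm(key: str) -> str:
    # Assumed normalization: lowercase, non-alphanumerics to underscores.
    return re.sub(r"[^a-z0-9]", "_", key.lower())


def sketch_find_key(data: Any, target_key: str) -> Any:
    """Return the first matching value, or None (hypothetical helper)."""
    wanted = _norm(target_key)
    if isinstance(data, dict):
        # Check keys at this level first.
        for key, value in data.items():
            if isinstance(key, str) and _norm(key) == wanted:
                return value
        # Then descend into nested containers.
        for value in data.values():
            found = sketch_find_key(value, target_key)
            if found is not None:
                return found
    elif isinstance(data, list):
        for item in data:
            found = sketch_find_key(item, target_key)
            if found is not None:
                return found
    return None


nested = {"info": [{"Participant-ID": "sub-01"}]}
print(sketch_find_key(nested, "participant_id"))
```

One limitation of this sketch is that a key whose stored value is None cannot be told apart from a missing key.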

eegdash.bids_metadata.get_entities_from_record(record: dict[str, Any], entities: tuple[str, ...] = ('subject', 'session', 'run', 'task')) → dict[str, Any]

Get multiple entity values from a record.

Parameters:
  • record (dict) – A record dictionary.

  • entities (tuple of str) – Entity names to extract.

Returns:

Dictionary of entity values (only non-None values included).

Return type:

dict
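As a rough illustration of the "only non-None values" rule, the sketch below collects entities from either a nested or a flat record. Both helper names are hypothetical, and checking the nested "entities" mapping before the top-level key is an assumption about lookup order.

```python
from typing import Any


def sketch_get_entity(record: dict, entity: str) -> Any:
    # Assumption: the nested "entities" mapping is consulted before the flat key.
    nested = record.get("entities")
    if isinstance(nested, dict) and nested.get(entity) is not None:
        return nested[entity]
    return record.get(entity)


def sketch_get_entities(
    record: dict, entities: tuple = ("subject", "session", "run", "task")
) -> dict:
    values = {name: sketch_get_entity(record, name) for name in entities}
    # Drop entities that are absent or explicitly None.
    return {name: value for name, value in values.items() if value is not None}


rec = {"entities": {"subject": "01", "task": "rest"}, "run": None}
print(sketch_get_entities(rec))
```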

eegdash.bids_metadata.get_entity_from_record(record: dict[str, Any], entity: str) → Any

Get an entity value from a record, supporting both v1 (flat) and v2 (nested) formats.

Parameters:
  • record (dict) – A record dictionary.

  • entity (str) – Entity name (e.g., “subject”, “task”, “session”, “run”).

Returns:

The entity value, or None if not found.

Return type:

Any

Examples

>>> # v2 record (nested)
>>> rec = {"entities": {"subject": "01", "task": "rest"}}
>>> get_entity_from_record(rec, "subject")
'01'
>>> # v1 record (flat)
>>> rec = {"subject": "01", "task": "rest"}
>>> get_entity_from_record(rec, "subject")
'01'

eegdash.bids_metadata.merge_participants_fields(description: dict[str, Any], participants_row: dict[str, Any] | None, description_fields: list[str] | None = None) → dict[str, Any]

Merge fields from a participants.tsv row into a description dict.

Parameters:
  • description (dict) – The description dictionary to enrich.

  • participants_row (dict or None) – A row from participants.tsv. If None, returns description unchanged.

  • description_fields (list of str, optional) – Specific fields to include (matched using normalized keys).

Returns:

The enriched description dictionary.

Return type:

dict
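The merge can be pictured with the sketch below. It is not the eegdash implementation: the helper names are hypothetical, and three choices are assumptions, namely that values already present in the description win, that merged keys keep the spelling used in participants.tsv, and that normalization lowercases and replaces non-alphanumeric characters with underscores.

```python
import re
from typing import Any, Optional


def _norm(key: str) -> str:
    # Assumed normalization, following the normalize_key description.
    return re.sub(r"[^a-z0-9]", "_", key.lower())


def sketch_merge_participants(
    description: dict,
    participants_row: Optional[dict],
    description_fields: Optional[list] = None,
) -> dict:
    if participants_row is None:
        return description  # documented: returned unchanged

    merged = dict(description)
    wanted = {_norm(f) for f in description_fields} if description_fields else None
    present = {_norm(k) for k in merged}

    for column, value in participants_row.items():
        norm = _norm(column)
        if wanted is not None and norm not in wanted:
            continue  # not one of the requested fields
        if norm in present:
            continue  # assumption: existing description values take precedence
        merged[column] = value
        present.add(norm)
    return merged


row = {"participant_id": "sub-01", "Age": 25, "Sex": "F"}
print(sketch_merge_participants({"subject": "01"}, row, ["age"]))
```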

eegdash.bids_metadata.merge_query(query: dict[str, Any] | None = None, require_query: bool = True, **kwargs) → dict[str, Any]

Merge a raw query dict with keyword arguments into a final query.

Parameters:
  • query (dict or None) – Raw MongoDB query dictionary. Pass {} to match all documents.

  • require_query (bool, default True) – If True, raise ValueError when neither query nor kwargs is provided.

  • **kwargs – User-friendly field filters (converted via build_query_from_kwargs).

Returns:

The merged MongoDB query.

Return type:

dict

Raises:

ValueError – If require_query=True and neither query nor kwargs is provided, or if conflicting constraints are detected.
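The merge logic can be approximated as follows. This is a sketch under stated assumptions, not the library code: sketch_merge_query is a hypothetical name, the keyword conversion is a simplified stand-in for build_query_from_kwargs, combining both parts under $and is an assumption, conflict detection is omitted, and "dataset" is an illustrative field name.

```python
from typing import Any, Optional


def sketch_merge_query(
    query: Optional[dict] = None, require_query: bool = True, **kwargs: Any
) -> dict:
    # Simplified stand-in for build_query_from_kwargs.
    kwargs_query = {
        key: {"$in": list(value)} if isinstance(value, (list, tuple, set)) else value
        for key, value in kwargs.items()
    }

    if query is None and not kwargs_query:
        if require_query:
            raise ValueError("Provide a query dict or keyword filters")
        return {}  # assumption: fall back to match-all when nothing is required

    if not query:
        return kwargs_query  # query is None or {}: only the kwargs constrain
    if not kwargs_query:
        return dict(query)
    # Assumption: both parts must hold, so they are combined under $and.
    return {"$and": [dict(query), kwargs_query]}


print(sketch_merge_query({}))  # explicit match-all
print(sketch_merge_query({"dataset": "ds1"}, subject="01"))
```

Distinguishing None from {} is what lets an explicit empty dict mean "match all documents" while a missing query still triggers the require_query check.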

eegdash.bids_metadata.normalize_key(key: str) → str

Normalize a string key for robust matching.

Converts the key to lowercase and replaces non-alphanumeric characters with underscores.
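A literal reading of this rule is sketched below. The name sketch_normalize_key is hypothetical, and whether the real function collapses runs of separators or trims leading and trailing underscores is not stated here, so this version does neither.

```python
import re


def sketch_normalize_key(key: str) -> str:
    # Each non-alphanumeric character becomes one underscore (assumption:
    # runs are not collapsed and edges are not trimmed).
    return re.sub(r"[^a-z0-9]", "_", key.lower())


print(sketch_normalize_key("Participant-ID"))  # participant_id
print(sketch_normalize_key("Age (years)"))     # age__years_
```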

eegdash.bids_metadata.participants_extras_from_tsv(bids_root: str | Path, subject: str, *, id_columns: tuple[str, ...] = ('participant_id', 'participant', 'subject'), na_like: tuple[str, ...] = ('', 'n/a', 'na', 'nan', 'unknown', 'none')) → dict[str, Any]

Extract additional participant information from participants.tsv.

Parameters:
  • bids_root (str or Path) – Root directory of the BIDS dataset.

  • subject (str) – Subject identifier.

  • id_columns (tuple of str) – Column names treated as identifiers (excluded from output).

  • na_like (tuple of str) – Values considered as “Not Available” (excluded).

Returns:

Extra participant information.

Return type:

dict
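The filtering described by id_columns and na_like can be illustrated with the standard csv module. This is a minimal sketch, not the eegdash code: sketch_participants_extras is a hypothetical name, matching both "01" and "sub-01" is an assumption, values are left as strings, and returning an empty dict for a missing file or subject is also an assumption.

```python
import csv
import tempfile
from pathlib import Path

ID_COLUMNS = ("participant_id", "participant", "subject")
NA_LIKE = ("", "n/a", "na", "nan", "unknown", "none")


def sketch_participants_extras(bids_root, subject, id_columns=ID_COLUMNS, na_like=NA_LIKE):
    tsv_path = Path(bids_root) / "participants.tsv"
    if not tsv_path.exists():
        return {}

    # Assumption: accept the identifier with or without the "sub-" prefix.
    bare = subject[4:] if subject.startswith("sub-") else subject
    candidates = {bare, f"sub-{bare}"}

    with tsv_path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if any(row.get(col) in candidates for col in id_columns):
                # Keep non-identifier columns whose value is not NA-like.
                return {
                    key: value
                    for key, value in row.items()
                    if key not in id_columns
                    and isinstance(value, str)
                    and value.strip().lower() not in na_like
                }
    return {}


# Usage with a throwaway dataset root.
with tempfile.TemporaryDirectory() as root:
    Path(root, "participants.tsv").write_text(
        "participant_id\tage\tsex\thandedness\n"
        "sub-01\t25\tF\tn/a\n"
        "sub-02\t31\tM\tR\n",
        encoding="utf-8",
    )
    extras = sketch_participants_extras(root, "01")
    print(extras)
```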

eegdash.bids_metadata.participants_row_for_subject(bids_root: str | Path, subject: str, id_columns: tuple[str, ...] = ('participant_id', 'participant', 'subject')) → Series | None

Load participants.tsv and return the row for a specific subject.

Parameters:
  • bids_root (str or Path) – Root directory of the BIDS dataset.

  • subject (str) – Subject identifier (e.g., “01” or “sub-01”).

  • id_columns (tuple of str) – Column names to search for the subject identifier.

Returns:

Subject’s data if found, otherwise None.

Return type:

pandas.Series or None

eegdash.bids_metadata.records_to_dataframe(records: Iterable[Mapping[str, Any]], columns: Sequence[str], aliases: Mapping[str, Sequence[str]] | None = None) → DataFrame

Project a list of MongoDB JSON records onto a fixed DataFrame layout.

Uses pandas.json_normalize() to flatten one level of nesting (so dotted alias paths like "clinical.group" resolve), then for each canonical column picks the first non-null value across its alias list. Records that are not mappings are skipped.

Parameters:
  • records (iterable of dict) – Raw JSON records (e.g., from EEGDash.find_datasets()).

  • columns (sequence of str) – Canonical column names in the order they should appear in the returned DataFrame.

  • aliases (mapping of str to sequence of str, optional) – For each canonical column, the ordered list of source field paths to look at (back-fill). Dotted paths supported via json_normalize. When omitted, the canonical column name itself is the only source.

Returns:

One row per mapping record, with exactly the columns listed in columns. Missing fields surface as None/NaN. Empty input returns an empty DataFrame with the right column set, so callers get a stable schema regardless of result size.

Return type:

pandas.DataFrame
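The alias back-fill can be illustrated without pandas. The sketch below returns a list of row dicts rather than a DataFrame and resolves dotted paths by hand instead of calling pandas.json_normalize(), so it only approximates the documented function. The helper names and the sample field names are hypothetical.

```python
from collections.abc import Mapping


def _lookup(record, dotted_path):
    """Resolve a path such as "clinical.group" through nested mappings."""
    if dotted_path in record:
        return record[dotted_path]
    current = record
    for part in dotted_path.split("."):
        if not isinstance(current, Mapping) or part not in current:
            return None
        current = current[part]
    return current


def sketch_project_records(records, columns, aliases=None):
    aliases = aliases or {}
    rows = []
    for record in records:
        if not isinstance(record, Mapping):
            continue  # non-mapping records are skipped
        row = {}
        for column in columns:
            sources = aliases.get(column, [column])
            candidates = (_lookup(record, path) for path in sources)
            # First non-null value across the alias list, else None.
            row[column] = next((v for v in candidates if v is not None), None)
        rows.append(row)
    return rows


records = [
    {"dataset_id": "ds1", "clinical": {"group": "control"}},
    "not a mapping",
    {"id": "ds2"},
]
aliases = {"dataset": ["dataset_id", "id"], "group": ["clinical.group"]}
print(sketch_project_records(records, ["dataset", "group"], aliases))
```

Every row carries every requested column, which is the same stable-schema property the Returns section describes for the DataFrame.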