eegdash.bids_metadata module
BIDS metadata processing and query building utilities.
This module provides functions for building database queries from user parameters and enriching metadata records with participant information from BIDS datasets.
- eegdash.bids_metadata.attach_participants_extras(raw: Any, description: Any, extras: dict[str, Any]) → None
Attach extra participant data to a raw object and its description.
- Parameters:
raw (mne.io.Raw) – The MNE Raw object to be updated.
description (dict or pandas.Series) – The description object to be updated.
extras (dict) – Extra participant information to attach.
- eegdash.bids_metadata.build_description(record: dict[str, Any], description_fields: list[str], description_precedence: str = 'record', participants_row: dict[str, Any] | None = None) → dict[str, Any]
Build a description dict for one record, merging participant metadata.
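The precedence rule can be illustrated with a small sketch. It is a hypothetical re-implementation, not eegdash's code: `build_description_sketch` is an illustrative name, and it assumes that `description_precedence='record'` means "record value wins, participants.tsv fills gaps" and that any other value flips that order.

```python
from __future__ import annotations

from typing import Any


def build_description_sketch(
    record: dict[str, Any],
    description_fields: list[str],
    description_precedence: str = "record",
    participants_row: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Hypothetical sketch of merging record and participant metadata."""
    participants_row = participants_row or {}
    description: dict[str, Any] = {}
    for field in description_fields:
        from_record = record.get(field)
        from_participants = participants_row.get(field)
        if description_precedence == "record":
            # Assumed: the record value wins; participants.tsv fills None gaps.
            value = from_record if from_record is not None else from_participants
        else:
            # Assumed: any other precedence lets participants.tsv win.
            value = from_participants if from_participants is not None else from_record
        description[field] = value
    return description
```

For example, a record with `age=None` picks up `"27"` from the participants row, while an explicit record age is kept under the default precedence.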
- eegdash.bids_metadata.build_query_from_kwargs(*, allowed_fields: AbstractSet[str] | None = None, field_spec: Mapping[str, Mapping[str, Any]] | None = None, **kwargs: Any) → dict[str, Any]
Build and validate a MongoDB query from keyword arguments.
Converts user-friendly keyword arguments into a valid MongoDB query dictionary. Scalar values become exact matches; list-like values become $in queries. Entity fields (subject, task, session, run) are queried at the top level because the inject script flattens them out of the nested entities.
- Parameters:
allowed_fields (set of str, optional) – Override the default eegdash.const.ALLOWED_QUERY_FIELDS whitelist. Useful when querying a different collection (e.g. the datasets collection from search_datasets()).
field_spec (mapping of str to mapping, optional) – Per-field rule map describing how a friendly key translates to a MongoDB filter. Each rule is a dict with optional keys:
"paths" (sequence of str): DB field paths the key resolves to. When more than one path is given, the rule emits an $or over {path: value} for each path. Default: [<key>].
"operator" (str, e.g. "$gte"): wrap the value in {operator: value} (range operators). Default: exact match.
"value_aliases" (callable): v -> list returns the full list of values to OR-match (e.g. lambda v: [v, v.lower()] for a case-insensitive fallback, or lambda v: [int(v)] to coerce). Default: just [v]. The result is deduplicated.
Keys without a spec follow the legacy scalar / $in rules.
**kwargs – Query filters. Allowed keys are constrained by allowed_fields.
- Returns:
A MongoDB query dictionary. When several fields expand to an $or, those clauses are collected under a top-level $and; fields with a single path and a single value stay flat.
- Return type:
dict[str, Any]
- Raises:
ValueError – If an unsupported field is provided, or if a value is None/empty.
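The translation rules above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not eegdash's own code: `build_query_sketch` is a hypothetical name, the `allowed_fields` whitelist check is omitted, a lone $or is assumed to sit at the top level (only several $or groups go under $and), and range operators are assumed to take exactly one value.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

_LIST_LIKE = (list, tuple, set)


def build_query_sketch(
    field_spec: Mapping[str, Mapping[str, Any]] | None = None,
    **kwargs: Any,
) -> dict[str, Any]:
    """Hypothetical sketch of the documented kwargs -> MongoDB rules."""
    field_spec = field_spec or {}
    query: dict[str, Any] = {}
    or_groups: list[dict[str, Any]] = []

    for key, value in kwargs.items():
        is_list = isinstance(value, _LIST_LIKE)
        if value is None or (is_list and not value):
            raise ValueError(f"{key!r} must not be None or empty")
        values = list(value) if is_list else [value]

        spec = field_spec.get(key, {})
        paths = list(spec.get("paths", [key]))
        aliases = spec.get("value_aliases", lambda v: [v])
        # Expand every value through the alias callable, then dedupe in order.
        expanded = list(dict.fromkeys(a for v in values for a in aliases(v)))

        if "operator" in spec:
            # Assumed: range operators wrap exactly one value, e.g. {"$gte": 18}.
            if len(expanded) != 1:
                raise ValueError(f"{key!r} needs a single value for {spec['operator']}")
            cond: Any = {spec["operator"]: expanded[0]}
        elif len(expanded) == 1 and not is_list:
            cond = expanded[0]  # scalar -> exact match
        else:
            cond = {"$in": expanded}  # list-like or multi-alias -> $in

        if len(paths) == 1:
            query[paths[0]] = cond  # single path stays flat
        else:
            or_groups.append({"$or": [{p: cond} for p in paths]})

    if len(or_groups) == 1:
        query.update(or_groups[0])
    elif or_groups:
        query["$and"] = or_groups
    return query
```

A rule such as `{"task": {"value_aliases": lambda v: [v, v.lower()]}}` turns `task="Rest"` into `{"task": {"$in": ["Rest", "rest"]}}`, while `task="rest"` dedupes to a plain exact match. The field names used in the tests are illustrative only.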
- eegdash.bids_metadata.enrich_from_participants(bids_root: str | Path, bidspath: Any, raw: Any, description: Any) → dict[str, Any]
Read participants.tsv and attach extra info for the subject.
- Parameters:
bids_root (str or Path) – Root directory of the BIDS dataset.
bidspath (mne_bids.BIDSPath) – BIDSPath object for the current data file.
raw (mne.io.Raw) – The MNE Raw object to be updated.
description (dict or pandas.Series) – The description object to be updated.
- Returns:
The extras that were attached.
- Return type:
dict[str, Any]
- eegdash.bids_metadata.find_key_in_nested_dict(data: Any, target_key: str) → Any
Recursively search nested dicts/lists for target_key (case- and separator-insensitive).
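A recursive search of this kind might look like the sketch below. It is hypothetical: `find_key_sketch` and `_norm` are illustrative names, the normalization (lowercase, non-alphanumeric runs to "_") is assumed to mirror normalize_key(), a dict's own keys are assumed to be checked before descending, and None is assumed to signal "not found".

```python
from __future__ import annotations

import re
from typing import Any


def _norm(key: str) -> str:
    # Assumed normalization: lowercase, collapse non-alphanumeric runs to "_".
    return re.sub(r"[^a-z0-9]+", "_", key.lower()).strip("_")


def find_key_sketch(data: Any, target_key: str) -> Any:
    """Hypothetical depth-first search for the first matching key's value."""
    target = _norm(target_key)
    if isinstance(data, dict):
        # Check this level's keys first, then descend into the values.
        for key, value in data.items():
            if isinstance(key, str) and _norm(key) == target:
                return value
        for value in data.values():
            found = find_key_sketch(value, target_key)
            if found is not None:
                return found
    elif isinstance(data, list):
        for item in data:
            found = find_key_sketch(item, target_key)
            if found is not None:
                return found
    return None
```

Because both sides are normalized, "Sampling-Frequency", "sampling_frequency" and "sampling frequency" all match one another.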
- eegdash.bids_metadata.get_entities_from_record(record: dict[str, Any], entities: tuple[str, ...] = ('subject', 'session', 'run', 'task')) → dict[str, Any]
Get multiple entity values from a record.
- eegdash.bids_metadata.get_entity_from_record(record: dict[str, Any], entity: str) → Any
Get an entity value from a record, supporting both v1 (flat) and v2 (nested) formats.
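- Parameters:
record (dict) – A v1 (flat) or v2 (nested) metadata record.
entity (str) – The entity name to look up (e.g. "subject").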
- Returns:
The entity value, or None if not found.
- Return type:
Any
Examples
>>> # v2 record (nested)
>>> rec = {"entities": {"subject": "01", "task": "rest"}}
>>> get_entity_from_record(rec, "subject")
'01'
>>> # v1 record (flat)
>>> rec = {"subject": "01", "task": "rest"}
>>> get_entity_from_record(rec, "subject")
'01'
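The v1/v2 fallback shown in the example can be sketched as below, together with a multi-entity helper in the spirit of get_entities_from_record(). Both functions are hypothetical illustrations; the lookup order (nested "entities" first, then the flat key) and the choice to map absent entities to None are assumptions.

```python
from __future__ import annotations

from typing import Any


def get_entity_sketch(record: dict[str, Any], entity: str) -> Any:
    # Assumed order: v2 nested "entities" dict first, then the v1 flat key.
    entities = record.get("entities")
    if isinstance(entities, dict) and entity in entities:
        return entities[entity]
    return record.get(entity)


def get_entities_sketch(
    record: dict[str, Any],
    entities: tuple[str, ...] = ("subject", "session", "run", "task"),
) -> dict[str, Any]:
    # One lookup per requested entity; absent entities map to None (assumed).
    return {name: get_entity_sketch(record, name) for name in entities}
```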
- eegdash.bids_metadata.merge_participants_fields(description: dict[str, Any], participants_row: dict[str, Any] | None, description_fields: list[str] | None = None) → dict[str, Any]
Merge fields from a participants.tsv row into a description dict.
- Parameters:
- Returns:
The enriched description dictionary.
- Return type:
dict[str, Any]
- eegdash.bids_metadata.merge_query(query: dict[str, Any] | None = None, require_query: bool = True, **kwargs) → dict[str, Any]
Merge a raw query dict with keyword arguments into a final query.
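- Parameters:
query (dict, optional) – A raw MongoDB query dict.
require_query (bool) – If True, raise when neither query nor kwargs is provided.
**kwargs – Keyword filters to merge into the query.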
- Returns:
The merged MongoDB query.
- Return type:
dict[str, Any]
- Raises:
ValueError – If require_query=True and neither query nor kwargs is provided, or if conflicting constraints are detected.
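A possible shape for the merge and its two error cases is sketched below. It is hypothetical: `merge_query_sketch` is an illustrative name, kwargs are assumed to use the simple scalar / $in conversion, and a "conflict" is assumed to mean the same top-level key constrained to two different values. The real function may detect conflicts differently.

```python
from __future__ import annotations

from typing import Any


def merge_query_sketch(
    query: dict[str, Any] | None = None,
    require_query: bool = True,
    **kwargs: Any,
) -> dict[str, Any]:
    """Hypothetical merge of a raw query dict with keyword filters."""
    if require_query and not query and not kwargs:
        raise ValueError("Provide a query dict and/or keyword filters")
    merged: dict[str, Any] = dict(query or {})
    for key, value in kwargs.items():
        # Assumed conversion: list-like -> $in, scalar -> exact match.
        cond = {"$in": list(value)} if isinstance(value, (list, tuple, set)) else value
        if key in merged and merged[key] != cond:
            # Same field constrained two different ways: treat as a conflict.
            raise ValueError(f"Conflicting constraints for {key!r}")
        merged[key] = cond
    return merged
```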
- eegdash.bids_metadata.normalize_key(key: str) → str
Normalize a string key for robust matching.
Converts the key to lowercase and replaces non-alphanumeric characters with underscores.
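A one-line regular-expression version of this normalization is sketched below. `normalize_key_sketch` is a hypothetical name, and two details are assumptions not stated above: runs of non-alphanumeric characters collapse into a single underscore, and leading/trailing underscores are stripped.

```python
import re


def normalize_key_sketch(key: str) -> str:
    # Lowercase, then turn each run of non-alphanumeric characters into "_".
    # Collapsing runs and stripping edge underscores are assumptions.
    return re.sub(r"[^a-z0-9]+", "_", key.lower()).strip("_")
```

With this rule, "Participant ID", "participant-id" and "participant_id" all normalize to the same key, which is what makes separator-insensitive matching possible.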
- eegdash.bids_metadata.participants_extras_from_tsv(bids_root: str | Path, subject: str, *, id_columns: tuple[str, ...] = ('participant_id', 'participant', 'subject'), na_like: tuple[str, ...] = ('', 'n/a', 'na', 'nan', 'unknown', 'none')) → dict[str, Any]
Extract additional participant information from participants.tsv.
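- Parameters:
bids_root (str or Path) – Root directory of the BIDS dataset.
subject (str) – The subject label to look up.
id_columns (tuple of str) – Candidate column names that identify the participant.
na_like (tuple of str) – Values treated as missing.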
- Returns:
Extra participant information.
- Return type:
dict[str, Any]
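Extraction of this kind can be sketched with the standard-library csv module. The sketch is hypothetical: `participants_extras_sketch` is an illustrative name, and it assumes that the subject may be given with or without the BIDS "sub-" prefix, that ID columns are excluded from the extras, that na_like matching is case-insensitive, and that a missing file or subject yields an empty dict.

```python
from __future__ import annotations

import csv
from pathlib import Path
from typing import Any


def participants_extras_sketch(
    bids_root: str | Path,
    subject: str,
    *,
    id_columns: tuple[str, ...] = ("participant_id", "participant", "subject"),
    na_like: tuple[str, ...] = ("", "n/a", "na", "nan", "unknown", "none"),
) -> dict[str, Any]:
    """Hypothetical sketch: non-missing, non-ID columns for one subject."""
    tsv = Path(bids_root) / "participants.tsv"
    if not tsv.exists():
        return {}
    # Accept the subject label with or without the BIDS "sub-" prefix.
    label = subject[4:] if subject.startswith("sub-") else subject
    wanted = {label, f"sub-{label}"}
    with tsv.open(newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            # Use the first ID column present in this file.
            id_col = next((c for c in id_columns if c in row), None)
            if id_col is None or (row[id_col] or "").strip() not in wanted:
                continue
            return {
                key: value
                for key, value in row.items()
                if isinstance(key, str)
                and key not in id_columns
                and (value or "").strip().lower() not in na_like
            }
    return {}
```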
- eegdash.bids_metadata.participants_row_for_subject(bids_root: str | Path, subject: str, id_columns: tuple[str, ...] = ('participant_id', 'participant', 'subject')) → Series | None
Load participants.tsv and return the row for a specific subject.
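- Parameters:
bids_root (str or Path) – Root directory of the BIDS dataset.
subject (str) – The subject label to look up.
id_columns (tuple of str) – Candidate column names that identify the participant.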
- Returns:
Subject’s data if found, otherwise None.
- Return type:
pandas.Series or None
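A pandas-based lookup returning a Series could look like the sketch below. It is hypothetical: `participants_row_sketch` is an illustrative name, and it assumes that "sub-" prefixes are ignored on both sides of the comparison, that the first ID column with a match wins, and that a missing file returns None. Reading with `dtype=str` keeps labels such as "01" from being parsed as the integer 1.

```python
from __future__ import annotations

from pathlib import Path

import pandas as pd


def participants_row_sketch(
    bids_root: str | Path,
    subject: str,
    id_columns: tuple[str, ...] = ("participant_id", "participant", "subject"),
) -> pd.Series | None:
    """Hypothetical sketch: first participants.tsv row matching the subject."""
    tsv = Path(bids_root) / "participants.tsv"
    if not tsv.exists():
        return None
    # Read everything as strings so labels like "01" keep their leading zero.
    df = pd.read_csv(tsv, sep="\t", dtype=str)
    label = subject[4:] if subject.startswith("sub-") else subject
    for col in id_columns:
        if col in df.columns:
            # Compare labels with any "sub-" prefix removed.
            ids = df[col].fillna("").str.strip().str.replace(r"^sub-", "", regex=True)
            matches = df[ids == label]
            if not matches.empty:
                return matches.iloc[0]
    return None
```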
- eegdash.bids_metadata.records_to_dataframe(records: Iterable[Mapping[str, Any]], columns: Sequence[str], aliases: Mapping[str, Sequence[str]] | None = None) → DataFrame
Project a list of MongoDB JSON records onto a fixed DataFrame layout.
Uses pandas.json_normalize() to flatten one level of nesting (so dotted alias paths like "clinical.group" resolve), then, for each canonical column, picks the first non-null value across its alias list. Records that are not mappings are skipped.
- Parameters:
records (iterable of dict) – Raw JSON records (e.g., from EEGDash.find_datasets()).
columns (sequence of str) – Canonical column names in the order they should appear in the returned DataFrame.
aliases (mapping of str to sequence of str, optional) – For each canonical column, the ordered list of source field paths to look at (back-fill). Dotted paths are supported via json_normalize. When omitted, the canonical column name itself is the only source.
- Returns:
One row per mapping record, with exactly the given columns. Missing fields surface as None/NaN. Empty input returns an empty DataFrame with the right column set, so callers get a stable schema regardless of result size.
- Return type:
pandas.DataFrame
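The projection and alias back-fill can be sketched as follows. This is an illustrative re-implementation, not eegdash's code: `records_to_dataframe_sketch` and `_is_missing` are hypothetical names, `max_level=1` is how this sketch limits `pandas.json_normalize` to one level of nesting, and only None and float NaN are treated as "null" (list or dict values always count as present).

```python
from __future__ import annotations

import math
from collections.abc import Iterable, Mapping, Sequence
from typing import Any

import pandas as pd


def _is_missing(value: Any) -> bool:
    # Only scalar None/NaN count as missing; lists and dicts are "present".
    return value is None or (isinstance(value, float) and math.isnan(value))


def records_to_dataframe_sketch(
    records: Iterable[Any],
    columns: Sequence[str],
    aliases: Mapping[str, Sequence[str]] | None = None,
) -> pd.DataFrame:
    """Hypothetical sketch of projecting JSON records onto fixed columns."""
    rows = [dict(r) for r in records if isinstance(r, Mapping)]  # skip non-mappings
    if not rows:
        # Stable schema even when there are no results.
        return pd.DataFrame(columns=list(columns))
    # max_level=1 flattens one level: {"clinical": {"group": x}} -> "clinical.group".
    flat_rows = pd.json_normalize(rows, max_level=1).to_dict("records")
    data: dict[str, list[Any]] = {}
    for col in columns:
        sources = (aliases or {}).get(col, [col])
        # First non-missing value across the alias list wins (back-fill).
        data[col] = [
            next((fr[s] for s in sources if s in fr and not _is_missing(fr[s])), None)
            for fr in flat_rows
        ]
    return pd.DataFrame(data, columns=list(columns))
```

The field names in the tests (`dataset_id`, `id`, `clinical.group`) are illustrative, not a documented record schema.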