Metadata and BIDS entities#
EEG-BIDS is the contract that lets EEGDash advertise 700+ datasets as a single queryable catalogue. Every recording the library knows about has been described using the same vocabulary — subject, session, task, run, acquisition system, electrode montage, units, sampling frequency — so that filters work the same way regardless of the originating lab or repository. This page explains what those entities mean, how they map to EEGDash query keywords, and where participant-level metadata enters the picture.
The EEG-BIDS specification was published as Pernet et al. (2019) [1]; if you have not skimmed the paper, the short version is that BIDS standardises file names, directory layout, and accompanying JSON/TSV sidecars so that any tool can walk a dataset without bespoke loaders. EEGDash reuses those entity names verbatim in its query API.
BIDS entities at a glance#
A BIDS-compliant EEG file path encodes the following entities (with the most common ones in bold):

- **subject** (`sub-XYZ`): one participant.
- **session** (`ses-N`): one acquisition appointment for that participant. A subject can have multiple sessions when data are collected across days.
- **task** (`task-ABC`): the experimental paradigm (resting state, oddball, steady-state visual stimulation, etc.).
- **run** (`run-K`): a repetition of the same task within one session.
- acquisition (`acq-XYZ`): an acquisition variant, such as a different amplifier or montage.
- processing (`proc-XYZ`): a derivative pipeline tag.
The entities are hierarchical: a recording is uniquely identified by the combination of dataset and the entities present in its filename. EEGDash’s catalogue stores these as separate fields in MongoDB, which is what makes queries like “all resting-state runs from subject sub-002 of ds002718” trivial to express.
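The entity encoding itself is easy to see in code. The sketch below pulls key-value pairs out of a BIDS-style filename with a regular expression; it is illustrative only (real pipelines should use a proper BIDS parser), and the example filename is made up:

```python
import re

def parse_bids_entities(filename: str) -> dict:
    """Extract key-value entities (sub-XYZ, ses-N, task-ABC, run-K, ...)
    from a BIDS filename. Illustrative sketch, not a full BIDS parser."""
    # BIDS encodes entities as key-value pairs joined by underscores,
    # e.g. sub-002_ses-01_task-rest_run-1_eeg.set
    return dict(re.findall(r"([a-z]+)-([a-zA-Z0-9]+)", filename))

entities = parse_bids_entities("sub-002_ses-01_task-rest_run-1_eeg.set")
# entities == {"sub": "002", "ses": "01", "task": "rest", "run": "1"}
```

The combination of these extracted entities plus the dataset id is exactly the unique key the catalogue stores per recording.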
Mapping BIDS entities to EEGDash query keywords#
EEGDash’s filter keywords are intentionally identical to the BIDS entity
names. The table below shows the mapping. A keyword accepts a single string
or a list of strings; lists are interpreted as MongoDB `$in` queries.
| BIDS entity | EEGDash query keyword | Notes |
|---|---|---|
| subject | `subject` | One label per recording. Pass a list to filter many subjects. |
| session | `session` | Optional. Many datasets have a single session and omit it. |
| task | `task` | Required by BIDS for EEG. Use … |
| run | `run` | Optional. Several runs may share a task and session. |
| dataset id | `dataset` | The BIDS dataset identifier (e.g. `ds002718`). |
| modality | … | Catalogue-level: … |
| sampling rate | … | From the BIDS sidecar JSON. Numeric. |
| channel count | … | From the channels TSV. Numeric. |
| n samples | … | Sample count of the underlying file. |
Anything not listed above is currently not a query field; if a record attribute matters but is not selectable, please open an issue. The authoritative list lives in the source as `eegdash.const.ALLOWED_QUERY_FIELDS`.
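The string-or-list behaviour described above can be sketched as a small helper. This is an illustration of the documented semantics (a plain string matches exactly, a list becomes an `$in` query), not EEGDash's internal code:

```python
def build_mongo_filter(**filters) -> dict:
    """Turn EEGDash-style keyword filters into a MongoDB filter document.
    Sketch of the documented string-or-list semantics."""
    query = {}
    for field, value in filters.items():
        if isinstance(value, (list, tuple)):
            # A list of labels becomes a MongoDB $in query.
            query[field] = {"$in": list(value)}
        else:
            # A single string matches exactly.
            query[field] = value
    return query

# e.g. resting-state runs from two subjects of ds002718
q = build_mongo_filter(dataset="ds002718", subject=["002", "003"], task="rest")
# q == {"dataset": "ds002718", "subject": {"$in": ["002", "003"]}, "task": "rest"}
```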
Participant metadata and `description_fields`#
The entities above describe recordings. A different layer of metadata describes the participant: age, sex, group, handedness, clinical diagnosis, and so on. In BIDS this lives in `participants.tsv` at the dataset root. EEGDash exposes those columns through the `description_fields` argument of `EEGDashDataset`.
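`participants.tsv` itself is just a plain tab-separated table with one row per subject. A toy example (made-up values and columns) shows the kind of fields `description_fields` can pull in:

```python
import io

import pandas as pd

# A toy participants.tsv as BIDS lays it out: tab-separated, one row per
# subject, keyed by participant_id. All values here are invented.
participants_tsv = io.StringIO(
    "participant_id\tage\tsex\tgroup\n"
    "sub-001\t23\tF\tcontrol\n"
    "sub-002\t31\tM\tpatient\n"
)
participants = pd.read_csv(participants_tsv, sep="\t")
# Columns such as "age" and "sex" are what description_fields merges
# into the per-recording description table.
print(participants)
```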
For each recording, the dataset object builds a description row that combines the BIDS entities (`subject`, `session`, `run`, `task`) with whatever participant fields you ask for:
```python
from eegdash import EEGDashDataset

ds = EEGDashDataset(
    cache_dir="./eegdash_cache",
    dataset="ds002718",
    description_fields=["subject", "session", "task", "age", "sex"],
)
print(ds.description.head())
```
This is the table you should plot, group by, and split on. Building a braindecode-compatible `WindowsDataset` will copy these descriptors onto each window automatically, so subject-aware splitters and metadata-driven analyses both read from the same source of truth.
Why standardised metadata matters#
Two practical consequences fall out of using BIDS as the primary metadata layer:
- **Cross-dataset queries are cheap.** Asking "every resting-state recording with at least 64 channels and a sampling rate above 250 Hz" is a single MongoDB query because every dataset is indexed by the same fields. Without BIDS this would require writing 700+ ad hoc loaders.
- **Splits are honest.** The leakage and evaluation pages (Leakage and evaluation) rely on the fact that `subject` and `session` are reliably populated. A splitter that uses `ds.description["subject"]` as the grouping key only works because BIDS forces every dataset to expose that label.
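A minimal subject-wise hold-out on a toy description table (made-up values) shows why that populated `subject` column matters: holding out whole subjects guarantees no participant appears on both sides of the split.

```python
import pandas as pd

# Toy description table; the point is the grouping key, not the values.
description = pd.DataFrame({
    "subject": ["001", "001", "002", "002", "003", "003"],
    "task": ["rest", "oddball"] * 3,
})
# Hold out whole subjects so no participant leaks across the split.
held_out = {"003"}
train = description[~description["subject"].isin(held_out)]
test = description[description["subject"].isin(held_out)]
assert set(train["subject"]).isdisjoint(set(test["subject"]))
```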
If you load data from a non-BIDS source, you can still use `EEGDashRaw` manually, but you lose the ability to discover, split, and merge across datasets. Spending an afternoon converting a folder of `.set` files to BIDS is almost always worth it; see the EEG-BIDS paper [1] for the full specification.
Further reading#
[1] Pernet, C. R., Appelhoff, S., Gorgolewski, K. J., Flandin, G., Phillips, C., Delorme, A., & Oostenveld, R. (2019). EEG-BIDS, an extension to the brain imaging data structure for electroencephalography. Scientific Data, 6, 103. https://doi.org/10.1038/s41597-019-0104-8
Gorgolewski, K. J., et al. (2016). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data, 3(1), 160044. https://doi.org/10.1038/sdata.2016.44
Cisotto, G., & Chicco, D. (2024). Ten quick tips for clinical electroencephalographic (EEG) data acquisition and signal processing. PeerJ Computer Science, 10, e2256. https://doi.org/10.7717/peerj-cs.2256