# EEGDash API Tutorial
This tutorial demonstrates how to use the EEGDash API to query and explore EEG recording metadata without downloading any data files.
- Initializing EEGDash: Create an EEGDash client to connect to the metadata database.
- Finding Records: Use find() to retrieve recording metadata for a specific dataset.
- Exploring Record Keys: Inspect the fields available in each record (e.g., subject, task, sampling_frequency, ntimes).
- Filtering Records: Narrow down results by applying additional query filters such as task, subject, or session.
- Basic Statistics: Compute summary statistics such as the number of subjects, recordings, and the total duration of a dataset.
## Initializing EEGDash
Creating an EEGDash instance opens a connection to the
EEGDash metadata database. No credentials are required for read-only access.
from eegdash import EEGDash
eegdash = EEGDash()
## Finding Records
Use find() to retrieve metadata records for all
recordings in a dataset. The method accepts a MongoDB-style query dictionary.
Only metadata is transferred at this stage — no EEG data is downloaded.
Note: Passing limit avoids unbounded pagination and keeps the query fast.
DATASET_ID = "ds003039"
try:
    records = eegdash.find({"dataset": DATASET_ID}, limit=50)
except Exception as exc:
    print(f"API unavailable ({exc}); using empty result set.")
    records = []
print(f"Found {len(records)} records for dataset {DATASET_ID}.")
Found 19 records for dataset ds003039.
## Exploring Record Keys
Each record is a dictionary containing metadata fields such as the subject identifier, task name, sampling frequency, and number of time points. Printing the keys of the first record gives an overview of available fields.
if records:
    print("Keys available in a record:")
    for key in sorted(records[0].keys()):
        print(f"  {key}: {records[0][key]!r}")
Keys available in a record:
_has_missing_files: False
_id: '695936adb52d41d9f98c408e'
bids_relpath: 'sub-019/eeg/sub-019_task-neurCorrYoung_eeg.set'
bidspath: 'ds003039/sub-019/eeg/sub-019_task-neurCorrYoung_eeg.set'
ch_names: ['Fp1', 'Fz', 'F3', 'F7', 'F9', 'FT9', 'FC1', 'C3', 'T7', 'TP9', 'CP5', 'CP1', 'Pz', 'P3', 'P7', 'O1', 'Oz', 'O2', 'P4', 'P8', 'TP10', 'CP6', 'CP2', 'Cz', 'C4', 'T8', 'FT10', 'FC2', 'F4', 'F8', 'F10', 'Fp2', 'Fpz', 'AF3', 'AF7', 'F5', 'FT7', 'FC3', 'C1', 'C5', 'TP7', 'CP3', 'P5', 'P9', 'PO9', 'PO7', 'PO3', 'I1', 'I2', 'PO4', 'PO8', 'PO10', 'P10', 'P6', 'CPz', 'CP4', 'TP8', 'C6', 'C2', 'FC4', 'FT8', 'F6', 'AF4', 'AF8', 'x_dir', 'y_dir', 'z_dir']
data_name: 'ds003039_sub-019_task-neurCorrYoung_eeg.set'
dataset: 'ds003039'
datatype: 'eeg'
digested_at: '2026-04-04T19:41:28.872782+00:00'
entities_mne: {'subject': '019', 'session': None, 'task': 'neurCorrYoung', 'run': None, 'acquisition': None}
extension: '.set'
nchans: 67
ntimes: 1777354
participant_tsv: {'gender': 'M', 'age': 20.0, 'handedness': 'R'}
recording_modality: ['eeg']
run: None
sampling_frequency: 500.0
session: None
storage: {'backend': 's3', 'base': 's3://openneuro.org/ds003039', 'raw_key': 'sub-019/eeg/sub-019_task-neurCorrYoung_eeg.set', 'dep_keys': ['sub-019/eeg/sub-019_task-neurCorrYoung_channels.tsv', 'sub-019/eeg/sub-019_task-neurCorrYoung_events.tsv', 'sub-019/eeg/sub-019_task-neurCorrYoung_events.json', 'sub-019/eeg/sub-019_task-neurCorrYoung_electrodes.tsv', 'sub-019/eeg/sub-019_task-neurCorrYoung_coordsystem.json', 'sub-019/eeg/sub-019_task-neurCorrYoung_eeg.json', 'sub-019/eeg/sub-019_task-neurCorrYoung_eeg.fdt']}
subject: '019'
suffix: 'eeg'
task: 'neurCorrYoung'
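To make the nested fields concrete, the sketch below works offline on an abridged copy of the record printed above. The helper names recording_duration_seconds and storage_uris are hypothetical, and joining storage["base"] with each key using "/" is an assumption about how those fields relate, not documented EEGDash behavior.

```python
# Abridged copy of the first record printed above; no network access needed.
record = {
    "dataset": "ds003039",
    "subject": "019",
    "task": "neurCorrYoung",
    "nchans": 67,
    "ntimes": 1777354,
    "sampling_frequency": 500.0,
    "participant_tsv": {"gender": "M", "age": 20.0, "handedness": "R"},
    "storage": {
        "backend": "s3",
        "base": "s3://openneuro.org/ds003039",
        "raw_key": "sub-019/eeg/sub-019_task-neurCorrYoung_eeg.set",
        "dep_keys": [
            "sub-019/eeg/sub-019_task-neurCorrYoung_channels.tsv",
            "sub-019/eeg/sub-019_task-neurCorrYoung_eeg.fdt",
        ],
    },
}


def recording_duration_seconds(rec):
    """Duration in seconds: number of samples divided by the sampling rate."""
    return rec["ntimes"] / rec["sampling_frequency"]


def storage_uris(rec):
    """Hypothetical helper: join the storage base with the raw and dependency keys.

    Assumes base and keys combine with a single "/" separator.
    """
    storage = rec["storage"]
    keys = [storage["raw_key"], *storage.get("dep_keys", [])]
    return [f"{storage['base']}/{key}" for key in keys]


duration = recording_duration_seconds(record)
# Nested fields may be absent in other records, so use .get() defensively.
age = record.get("participant_tsv", {}).get("age")
print(f"sub-{record['subject']}: {duration / 60:.1f} min, age {age}")
for uri in storage_uris(record):
    print(uri)
```

Using .get() for the nested participant_tsv lookup keeps the sketch from failing on records that lack participant information, which may or may not occur in practice.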
## Filtering Records
find() accepts MongoDB-style query operators such as $in.
You can pass keyword arguments as a shorthand for simple equality filters,
or combine a query dictionary with keyword filters.
The examples below show how to select recordings by task or by a list of subject identifiers.
# Filter by task using a keyword argument
try:
    task_records = eegdash.find({"dataset": DATASET_ID}, task="rest", limit=50)
except Exception:
    task_records = []
print(f"Records with task='rest': {len(task_records)}")

# Filter using the $in operator to select specific subjects
if records:
    subjects_of_interest = [r["subject"] for r in records[:3]]
    try:
        subject_records = eegdash.find(
            {"dataset": DATASET_ID, "subject": {"$in": subjects_of_interest}},
            limit=50,
        )
    except Exception:
        subject_records = []
    print(f"Records for subjects {subjects_of_interest}: {len(subject_records)}")
Records with task='rest': 0
Records for subjects ['019', '010', '017']: 3
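The equality and $in filters used above follow MongoDB-style matching. As a rough offline illustration of those semantics (not EEGDash's implementation, which evaluates queries against the metadata database), the hypothetical helper matches() below applies the same two kinds of condition to plain dictionaries. The subject identifiers and task name come from the outputs above; assigning that task to all three subjects is an assumption made for illustration.

```python
def matches(record, query):
    """Return True if record satisfies every condition in query.

    Illustrative only: supports plain equality and the $in operator.
    """
    for field, condition in query.items():
        value = record.get(field)
        if isinstance(condition, dict) and "$in" in condition:
            # $in: the field value must be one of the listed candidates.
            if value not in condition["$in"]:
                return False
        elif value != condition:
            # Plain equality on the field value.
            return False
    return True


# Illustrative records: subject IDs and the task name are taken from the
# outputs above; giving all three subjects the same task is an assumption.
sample_records = [
    {"dataset": "ds003039", "subject": "019", "task": "neurCorrYoung"},
    {"dataset": "ds003039", "subject": "010", "task": "neurCorrYoung"},
    {"dataset": "ds003039", "subject": "017", "task": "neurCorrYoung"},
]

by_task = [r for r in sample_records if matches(r, {"task": "rest"})]
by_subject = [
    r
    for r in sample_records
    if matches(r, {"dataset": "ds003039", "subject": {"$in": ["019", "017"]}})
]
print(len(by_task), len(by_subject))
```

An empty result for task "rest" is consistent with the record shown earlier, whose task is neurCorrYoung: an equality filter returns nothing when no record carries the requested value.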
## Computing Dataset Statistics
Because each record contains ntimes (number of samples) and
sampling_frequency (Hz), it is straightforward to compute the duration of
every recording and derive summary statistics for the whole dataset.
if records:
    durations = [r["ntimes"] / r["sampling_frequency"] for r in records]
    subjects = set(r["subject"] for r in records)
    print(
        f"{len(subjects)} subjects. "
        f"{len(records)} recordings. "
        f"{sum(durations) / 3600:.2f} hours."
    )
else:
    print("No records available — skipping statistics.")
19 subjects. 19 recordings. 17.51 hours.
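The calculation above indexes ntimes and sampling_frequency directly, which would raise an error if a record lacked either field. Whether such records exist in the database is not stated here, so the sketch below is only a defensive variant under that assumption. The helper summarize() is hypothetical, and only the first sample record uses values from the output above; the other two are made up for illustration.

```python
def summarize(records):
    """Summarize subject count, recording count, and total hours.

    Records without a usable ntimes or sampling_frequency still count as
    recordings but add no duration; they are tallied under "skipped".
    """
    total_seconds = 0.0
    skipped = 0
    subjects = set()
    for rec in records:
        subject = rec.get("subject")
        if subject is not None:
            subjects.add(subject)
        ntimes = rec.get("ntimes")
        sfreq = rec.get("sampling_frequency")
        if not ntimes or not sfreq:
            skipped += 1
            continue
        total_seconds += ntimes / sfreq
    return {
        "n_subjects": len(subjects),
        "n_recordings": len(records),
        "hours": total_seconds / 3600,
        "skipped": skipped,
    }


sample_records = [
    # Values from the record printed earlier in this tutorial.
    {"subject": "019", "ntimes": 1777354, "sampling_frequency": 500.0},
    # Made-up values for illustration (900000 samples at 500 Hz = 1800 s).
    {"subject": "010", "ntimes": 900000, "sampling_frequency": 500.0},
    # Made-up record with a missing sample count.
    {"subject": "017", "ntimes": None, "sampling_frequency": 500.0},
]

summary = summarize(sample_records)
print(
    f"{summary['n_subjects']} subjects. "
    f"{summary['n_recordings']} recordings. "
    f"{summary['hours']:.2f} hours. "
    f"{summary['skipped']} skipped."
)
```

Reporting the skipped count alongside the total makes it visible when the summed duration understates the dataset.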