# EEGDash API Tutorial

Estimated reading time: 3 minutes

This tutorial demonstrates how to use the EEGDash API to query and explore EEG recording metadata without downloading any data files. It covers five steps:

  1. Initializing EEGDash: Create an EEGDash client to connect to the metadata database.

  2. Finding Records: Use find() to retrieve recording metadata for a specific dataset.

  3. Exploring Record Keys: Inspect the fields available in each record (e.g., subject, task, sampling_frequency, ntimes).

  4. Filtering Records: Narrow down results by applying additional query filters such as task, subject, or session.

  5. Computing Dataset Statistics: Compute summary statistics such as the number of subjects, the number of recordings, and the total duration of a dataset.

## Initializing EEGDash

Creating an EEGDash instance opens a connection to the EEGDash metadata database. No credentials are required for read-only access.

from eegdash import EEGDash

eegdash = EEGDash()

## Finding Records

Use find() to retrieve metadata records for the recordings in a dataset. The method accepts a MongoDB-style query dictionary. Only metadata is transferred at this stage; no EEG data is downloaded.

Note

Passing limit avoids unbounded pagination and keeps the query fast.

DATASET_ID = "ds003039"

try:
    records = eegdash.find({"dataset": DATASET_ID}, limit=50)
except Exception as exc:
    print(f"API unavailable ({exc}); using empty result set.")
    records = []

print(f"Found {len(records)} records for dataset {DATASET_ID}.")
Found 19 records for dataset ds003039.
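
When a query is capped with limit, a result whose length equals the cap may be incomplete. The sketch below shows one way to flag that case. It assumes that limit is an upper bound on the number of returned records; the helper `may_be_truncated` is hypothetical (not part of EEGDash), and the records are stand-ins so the example runs without network access.

```python
def may_be_truncated(records, limit):
    """Return True when the result size reached the limit.

    Hypothetical helper, not part of EEGDash. Reaching the limit only
    suggests that more records may exist; it does not prove it.
    """
    return limit is not None and len(records) >= limit


# Stand-in records (19 items) instead of a live query.
fake_records = [{"subject": f"{i:03d}"} for i in range(1, 20)]

print(may_be_truncated(fake_records, limit=50))  # fewer records than the cap
print(may_be_truncated(fake_records, limit=19))  # record count equals the cap
```

In the run above, 19 records were returned with limit=50, so the result is unlikely to have been cut off.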

## Exploring Record Keys

Each record is a dictionary containing metadata fields such as the subject identifier, task name, sampling frequency, and number of time points. Printing the keys and values of the first record gives an overview of the available fields.

if records:
    print("Keys available in a record:")
    for key in sorted(records[0].keys()):
        print(f"  {key}: {records[0][key]!r}")
Keys available in a record:
  _has_missing_files: False
  _id: '695936adb52d41d9f98c408e'
  bids_relpath: 'sub-019/eeg/sub-019_task-neurCorrYoung_eeg.set'
  bidspath: 'ds003039/sub-019/eeg/sub-019_task-neurCorrYoung_eeg.set'
  ch_names: ['Fp1', 'Fz', 'F3', 'F7', 'F9', 'FT9', 'FC1', 'C3', 'T7', 'TP9', 'CP5', 'CP1', 'Pz', 'P3', 'P7', 'O1', 'Oz', 'O2', 'P4', 'P8', 'TP10', 'CP6', 'CP2', 'Cz', 'C4', 'T8', 'FT10', 'FC2', 'F4', 'F8', 'F10', 'Fp2', 'Fpz', 'AF3', 'AF7', 'F5', 'FT7', 'FC3', 'C1', 'C5', 'TP7', 'CP3', 'P5', 'P9', 'PO9', 'PO7', 'PO3', 'I1', 'I2', 'PO4', 'PO8', 'PO10', 'P10', 'P6', 'CPz', 'CP4', 'TP8', 'C6', 'C2', 'FC4', 'FT8', 'F6', 'AF4', 'AF8', 'x_dir', 'y_dir', 'z_dir']
  data_name: 'ds003039_sub-019_task-neurCorrYoung_eeg.set'
  dataset: 'ds003039'
  datatype: 'eeg'
  digested_at: '2026-04-04T19:41:28.872782+00:00'
  entities_mne: {'subject': '019', 'session': None, 'task': 'neurCorrYoung', 'run': None, 'acquisition': None}
  extension: '.set'
  nchans: 67
  ntimes: 1777354
  participant_tsv: {'gender': 'M', 'age': 20.0, 'handedness': 'R'}
  recording_modality: ['eeg']
  run: None
  sampling_frequency: 500.0
  session: None
  storage: {'backend': 's3', 'base': 's3://openneuro.org/ds003039', 'raw_key': 'sub-019/eeg/sub-019_task-neurCorrYoung_eeg.set', 'dep_keys': ['sub-019/eeg/sub-019_task-neurCorrYoung_channels.tsv', 'sub-019/eeg/sub-019_task-neurCorrYoung_events.tsv', 'sub-019/eeg/sub-019_task-neurCorrYoung_events.json', 'sub-019/eeg/sub-019_task-neurCorrYoung_electrodes.tsv', 'sub-019/eeg/sub-019_task-neurCorrYoung_coordsystem.json', 'sub-019/eeg/sub-019_task-neurCorrYoung_eeg.json', 'sub-019/eeg/sub-019_task-neurCorrYoung_eeg.fdt']}
  subject: '019'
  suffix: 'eeg'
  task: 'neurCorrYoung'
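
Several fields, such as participant_tsv and storage, are themselves dictionaries. The following sketch reads nested values from a hand-copied excerpt of the record printed above. Joining storage["base"] with storage["raw_key"] to form a file location is an assumption about how the two fields relate, not documented EEGDash behavior.

```python
# Excerpt of the record printed above; only a few fields are reproduced.
record = {
    "dataset": "ds003039",
    "subject": "019",
    "task": "neurCorrYoung",
    "nchans": 67,
    "participant_tsv": {"gender": "M", "age": 20.0, "handedness": "R"},
    "storage": {
        "backend": "s3",
        "base": "s3://openneuro.org/ds003039",
        "raw_key": "sub-019/eeg/sub-019_task-neurCorrYoung_eeg.set",
    },
}

# Chained .get() calls tolerate records where an optional field is absent.
age = record.get("participant_tsv", {}).get("age")

# Assumption: the raw file location is the storage base joined with raw_key.
storage = record["storage"]
raw_uri = f"{storage['base'].rstrip('/')}/{storage['raw_key']}"

print(f"sub-{record['subject']}: age={age}, channels={record['nchans']}")
print(raw_uri)
```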

## Filtering Records

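find() accepts MongoDB-style query operators such as $in. Simple equality filters can also be passed as keyword arguments, either on their own or combined with a query dictionary.

The examples below select recordings by task and by a list of subject identifiers. The task filter returns no records in this run because none of the recordings in this dataset has the task "rest"; the record shown above uses the task neurCorrYoung.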

# Filter by task using a keyword argument

try:
    task_records = eegdash.find({"dataset": DATASET_ID}, task="rest", limit=50)
except Exception:
    task_records = []
print(f"Records with task='rest': {len(task_records)}")

# Filter using the $in operator to select specific subjects
if records:
    subjects_of_interest = [r["subject"] for r in records[:3]]
    try:
        subject_records = eegdash.find(
            {"dataset": DATASET_ID, "subject": {"$in": subjects_of_interest}},
            limit=50,
        )
    except Exception:
        subject_records = []
    print(f"Records for subjects {subjects_of_interest}: {len(subject_records)}")
Records with task='rest': 0
Records for subjects ['019', '010', '017']: 3
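
To make the query semantics concrete, the sketch below re-implements equality and $in matching on plain dictionaries. It is an illustration only, not how EEGDash or MongoDB evaluates queries. The helper `matches` is hypothetical, and the sample records are simplified stand-ins (the task values for subjects 010 and 017 are assumed).

```python
def matches(record, query):
    """Return True if the record satisfies every condition in the query.

    Supports plain equality and the $in operator only.
    """
    for field, condition in query.items():
        value = record.get(field)
        if isinstance(condition, dict) and "$in" in condition:
            if value not in condition["$in"]:
                return False
        elif value != condition:
            return False
    return True


# Hypothetical records shaped like the ones returned by find().
sample = [
    {"dataset": "ds003039", "subject": "019", "task": "neurCorrYoung"},
    {"dataset": "ds003039", "subject": "010", "task": "neurCorrYoung"},
    {"dataset": "ds003039", "subject": "017", "task": "neurCorrYoung"},
]

by_subject = [r for r in sample if matches(r, {"subject": {"$in": ["019", "017"]}})]
by_task = [r for r in sample if matches(r, {"dataset": "ds003039", "task": "rest"})]

print([r["subject"] for r in by_subject])
print(len(by_task))
```

All conditions in a query must hold at once, which is why adding task="rest" to a dataset filter yields an empty result when no recording uses that task.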

## Computing Dataset Statistics

Because each record contains ntimes (number of samples) and sampling_frequency (Hz), it is straightforward to compute the duration of every recording and derive summary statistics for the whole dataset.

if records:
    durations = [r["ntimes"] / r["sampling_frequency"] for r in records]
    subjects = {r["subject"] for r in records}

    print(
        f"{len(subjects)} subjects. "
        f"{len(records)} recordings. "
        f"{sum(durations) / 3600:.2f} hours."
    )
else:
    print("No records available — skipping statistics.")
19 subjects. 19 recordings. 17.51 hours.
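
As a worked example of the duration formula (duration in seconds = ntimes / sampling_frequency), the sketch below uses the values from the single record printed earlier in this tutorial. The variable names are illustrative.

```python
# Values copied from the record printed earlier in this tutorial.
ntimes = 1777354            # number of samples in the recording
sampling_frequency = 500.0  # samples per second (Hz)

# Duration in seconds is the sample count divided by the sampling rate.
duration_s = ntimes / sampling_frequency

print(
    f"{duration_s:.3f} s = "
    f"{duration_s / 60:.2f} min = "
    f"{duration_s / 3600:.3f} h"
)
```

This recording lasts a little under one hour, which is consistent with the dataset average of roughly 0.92 hours per recording (17.51 hours over 19 recordings).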
