Working Offline with EEGDash#
Many HPC clusters restrict or block network access, and internet-enabled jobs often run on dedicated queues separate from the GPU queues. This tutorial shows how to use EEGChallengeDataset offline once a dataset is present on disk.
from pathlib import Path
import platformdirs
from eegdash.const import RELEASE_TO_OPENNEURO_DATASET_MAP
from eegdash.dataset.dataset import EEGChallengeDataset
# We'll use Release R2 as an example (HBN subset).
# EEGChallengeDataset uses a suffixed cache folder for the competition data
# (e.g., "-bdf-mini").
release = "R2"
dataset_id = RELEASE_TO_OPENNEURO_DATASET_MAP[release]
task = "RestingState"
# Choose a cache directory. This should be on a fast local filesystem.
cache_dir = Path(platformdirs.user_cache_dir("EEGDash"))
cache_dir.mkdir(parents=True, exist_ok=True)
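On a cluster you may prefer a scratch or project filesystem over the user cache. A minimal sketch, assuming a SCRATCH environment variable points at such a filesystem (the variable name is site-specific and purely illustrative):

import os

# Use scratch space if it is configured; otherwise keep the platformdirs cache.
scratch = os.environ.get("SCRATCH")  # site-specific; adjust for your cluster
if scratch is not None:
    cache_dir = Path(scratch) / "eegdash_cache"
    cache_dir.mkdir(parents=True, exist_ok=True)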
Step 1: Populate the local cache (Online)#
This block downloads the dataset from S3 to your local cache directory. Run this part on a machine with internet access. If the dataset is already on your disk at the specified cache_dir, you can comment out or skip this section. To keep this example self-contained, we prefetch the data here.
ds_online = EEGChallengeDataset(
release=release,
cache_dir=cache_dir,
task=task,
mini=True,
)
# Optional prefetch of all recordings (downloads everything to cache).
from joblib import Parallel, delayed
_ = Parallel(n_jobs=-1)(delayed(lambda d: d.raw)(d) for d in ds_online.datasets)
╭────────────────────── EEG 2025 Competition Data Notice ──────────────────────╮
│ This object loads the HBN dataset that has been preprocessed for the EEG │
│ Challenge: │
│ * Downsampled from 500Hz to 100Hz │
│ * Bandpass filtered (0.5-50 Hz) │
│ │
│ For full preprocessing applied for competition details, see: │
│ https://github.com/eeg2025/downsample-datasets │
│ │
│ The HBN dataset have some preprocessing applied by the HBN team: │
│ * Re-reference (Cz Channel) │
│ │
│ IMPORTANT: The data accessed via `EEGChallengeDataset` is NOT identical to │
│ what you get from EEGDashDataset directly. │
│ If you are participating in the competition, always use │
│ `EEGChallengeDataset` to ensure consistency with the challenge data. │
╰──────────────────────── Source: EEGChallengeDataset ─────────────────────────╯
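On flaky cluster networks, a parallel prefetch can occasionally fail partway through. A minimal sequential alternative with a simple retry, assuming (as above) that accessing .raw triggers the download; the helper name and retry settings are illustrative:

import time

def prefetch_with_retry(base_ds, n_retries=2, wait_s=5.0):
    """Try to download one recording, retrying a couple of times on failure."""
    for attempt in range(n_retries + 1):
        try:
            _ = base_ds.raw  # accessing .raw downloads the recording into the cache
            return True
        except Exception as exc:
            print(f"attempt {attempt + 1} failed: {exc}")
            time.sleep(wait_s)
    return False

failed = [d for d in ds_online.datasets if not prefetch_with_retry(d)]
print(f"{len(failed)} recording(s) could not be prefetched")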
Step 2: Basic Offline Usage#
Once the data is cached locally, you can interact with it without an internet connection. The key is to instantiate your dataset object with the download=False flag. This tells EEGChallengeDataset to look for data in the cache_dir instead of trying to connect to the database or S3.
# Here we check that the local cache folder exists
offline_root = cache_dir / f"{dataset_id}-bdf-mini"
print(f"Local dataset folder exists: {offline_root.exists()}\n{offline_root}")
ds_offline = EEGChallengeDataset(
release=release,
cache_dir=cache_dir,
task=task,
download=False,
)
print(f"Found {len(ds_offline.datasets)} recording(s) offline.")
if ds_offline.datasets:
print("First record bidspath:", ds_offline.datasets[0].record["bidspath"])
Local dataset folder exists: True
/home/runner/.cache/EEGDash/ds005506-bdf-mini
╭────────────────────── EEG 2025 Competition Data Notice ──────────────────────╮
│ This object loads the HBN dataset that has been preprocessed for the EEG │
│ Challenge: │
│ * Downsampled from 500Hz to 100Hz │
│ * Bandpass filtered (0.5-50 Hz) │
│ │
│ For full preprocessing applied for competition details, see: │
│ https://github.com/eeg2025/downsample-datasets │
│ │
│ The HBN dataset have some preprocessing applied by the HBN team: │
│ * Re-reference (Cz Channel) │
│ │
│ IMPORTANT: The data accessed via `EEGChallengeDataset` is NOT identical to │
│ what you get from EEGDashDataset directly. │
│ If you are participating in the competition, always use │
│ `EEGChallengeDataset` to ensure consistency with the challenge data. │
╰──────────────────────── Source: EEGChallengeDataset ─────────────────────────╯
Found 20 recording(s) offline.
First record bidspath: ds005506/sub-NDARAB793GL3/eeg/sub-NDARAB793GL3_task-RestingState_eeg.bdf
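Before launching an offline job, it can also be useful to sanity-check the cache directly on disk. A minimal sketch using only the standard library, assuming the cached files follow the BIDS *_eeg.bdf naming shown in the bidspath above:

# Count EEG recordings present in the local cache folder.
bdf_files = sorted(offline_root.rglob("*_eeg.bdf"))
print(f"{len(bdf_files)} BDF file(s) under {offline_root}")
for p in bdf_files[:3]:
    print("  ", p.relative_to(offline_root))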
Step 3: Filtering Entities Offline#
Even without a database connection, you can still filter your dataset by BIDS entities like subject, session, or task. When download=False, EEGChallengeDataset uses the BIDS directory structure and filenames to apply these filters. This example shows how to load data for a specific subject from the local cache.
ds_offline_sub = EEGChallengeDataset(
cache_dir=cache_dir,
release=release,
download=False,
subject="NDARAB793GL3",
)
print(f"Filtered by subject=NDARAB793GL3: {len(ds_offline_sub.datasets)} recording(s).")
if ds_offline_sub.datasets:
keys = ("dataset", "subject", "task", "run")
print("Records (dataset, subject, task, run):")
for idx, base_ds in enumerate(ds_offline_sub.datasets, start=1):
rec = base_ds.record
summary = ", ".join(f"{k}={rec.get(k)}" for k in keys)
print(f" {idx:03d}: {summary}")
╭────────────────────── EEG 2025 Competition Data Notice ──────────────────────╮
│ This object loads the HBN dataset that has been preprocessed for the EEG │
│ Challenge: │
│ * Downsampled from 500Hz to 100Hz │
│ * Bandpass filtered (0.5-50 Hz) │
│ │
│ For full preprocessing applied for competition details, see: │
│ https://github.com/eeg2025/downsample-datasets │
│ │
│ The HBN dataset have some preprocessing applied by the HBN team: │
│ * Re-reference (Cz Channel) │
│ │
│ IMPORTANT: The data accessed via `EEGChallengeDataset` is NOT identical to │
│ what you get from EEGDashDataset directly. │
│ If you are participating in the competition, always use │
│ `EEGChallengeDataset` to ensure consistency with the challenge data. │
╰──────────────────────── Source: EEGChallengeDataset ─────────────────────────╯
Filtered by subject=NDARAB793GL3: 1 recording(s).
Records (dataset, subject, task, run):
001: dataset=ds005506, subject=NDARAB793GL3, task=RestingState, run=None
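You can also drive the entity filter from what is actually in the cache. A minimal sketch that lists the cached subjects via the dataset's description table (a pandas DataFrame, as shown in Step 4.1) and reloads the first one offline; the variable names are illustrative:

# Subjects present in the locally cached release.
cached_subjects = sorted(ds_offline.description["subject"].unique())
print(f"{len(cached_subjects)} cached subject(s); first: {cached_subjects[0]}")

ds_first = EEGChallengeDataset(
    release=release,
    cache_dir=cache_dir,
    download=False,
    task=task,
    subject=cached_subjects[0],
)
print(f"{len(ds_first.datasets)} recording(s) for subject {cached_subjects[0]}")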
Step 4: Comparing Online vs. Offline Data#
As a sanity check, you can verify that the data loaded from your local cache is identical to the data fetched from the online sources. This section compares the shape of the raw data from the online and offline datasets to ensure they match. This is a good way to confirm your local cache is complete and correct.
If you have network access, run the block below to load both copies and compare their shapes.
raw_online = ds_online.datasets[0].raw
raw_offline = ds_offline.datasets[0].raw
print("online shape:", raw_online.get_data().shape)
print("offline shape:", raw_offline.get_data().shape)
print("shapes equal:", raw_online.get_data().shape == raw_offline.get_data().shape)
online shape: (129, 40800)
offline shape: (129, 40800)
shapes equal: True
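Beyond shapes, you can also compare the signal values themselves. A minimal sketch using NumPy (this loads both recordings fully into memory):

import numpy as np

data_online = ds_online.datasets[0].raw.get_data()
data_offline = ds_offline.datasets[0].raw.get_data()
print("values identical:", np.array_equal(data_online, data_offline))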
Step 4.1: Comparing Descriptions, Online vs. Offline Data#
If you have network access, run the block below to compare the dataset descriptions from the online and offline objects.
description_online = ds_online.description
description_offline = ds_offline.description
print(description_offline)
print(description_online)
print("Online description shape:", description_online.shape)
print("Offline description shape:", description_offline.shape)
print("Descriptions equal:", description_online.equals(description_offline))
subject task ... seqlearning8target symbolsearch
0 NDARAB793GL3 RestingState ... available available
1 NDARAM675UR8 RestingState ... unavailable available
2 NDARBM839WR5 RestingState ... available available
3 NDARBU730PN8 RestingState ... available available
4 NDARCT974NAJ RestingState ... available available
5 NDARCW933FD5 RestingState ... available available
6 NDARCZ770BRG RestingState ... available available
7 NDARDW741HCF RestingState ... unavailable available
8 NDARDZ058NZN RestingState ... unavailable available
9 NDAREC377AU2 RestingState ... available available
10 NDAREM500WWH RestingState ... unavailable available
11 NDAREV527ZRF RestingState ... available available
12 NDAREV601CE7 RestingState ... available available
13 NDARFF070XHV RestingState ... available available
14 NDARFR108JNB RestingState ... unavailable available
15 NDARFT305CG1 RestingState ... unavailable available
16 NDARGA056TMW RestingState ... available available
17 NDARGH775KF5 RestingState ... available available
18 NDARGJ878ZP4 RestingState ... unavailable available
19 NDARHA387FPM RestingState ... available available
[20 rows x 25 columns]
subject task ... seqlearning8target symbolsearch
0 NDARAB793GL3 RestingState ... available available
1 NDARAM675UR8 RestingState ... unavailable available
2 NDARBM839WR5 RestingState ... available available
3 NDARBU730PN8 RestingState ... available available
4 NDARCT974NAJ RestingState ... available available
5 NDARCW933FD5 RestingState ... available available
6 NDARCZ770BRG RestingState ... available available
7 NDARDW741HCF RestingState ... unavailable available
8 NDARDZ058NZN RestingState ... unavailable available
9 NDAREC377AU2 RestingState ... available available
10 NDAREM500WWH RestingState ... unavailable available
11 NDAREV527ZRF RestingState ... available available
12 NDAREV601CE7 RestingState ... available available
13 NDARFF070XHV RestingState ... available available
14 NDARFR108JNB RestingState ... unavailable available
15 NDARFT305CG1 RestingState ... unavailable available
16 NDARGA056TMW RestingState ... available available
17 NDARGH775KF5 RestingState ... available available
18 NDARGJ878ZP4 RestingState ... unavailable available
19 NDARHA387FPM RestingState ... available available
[20 rows x 25 columns]
Online description shape: (20, 25)
Offline description shape: (20, 25)
Descriptions equal: True
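If the descriptions ever disagree, pandas can show exactly which cells differ. A minimal sketch using DataFrame.compare:

# Empty result means the two description tables match cell for cell.
diff = description_online.compare(description_offline)
print("no differences" if diff.empty else diff)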
Notes and troubleshooting#
Working offline selects recordings by parsing BIDS filenames and the directory structure. Some database-only fields are unavailable, but entity filters (subject, session, task, run) usually suffice.
If you encounter issues, please open a GitHub issue so we can discuss.
Total running time of the script: (1 minute 14.010 seconds)
Estimated memory usage: 1023 MB