EEG2025 Challenge 1 Baseline (CCD)#

Estimated reading time:11 minutes

Difficulty 3 | Runtime: 3-6m | Compute: CPU (GPU Recommended)

Challenge 1 of the EEG2025 Foundation Challenge asks you to decode a trial-level cognitive decision from EEG: in the contrastChangeDetection (CCD) task subjects watch two flickering striped discs, one disc’s contrast slowly ramps up, and the subject presses left or right to report which one. The data come from the Healthy Brain Network release (HBN; Alexander et al. 2017) served through NEMAR [Delorme et al., 2022] and shipped via EEGChallengeDataset as 100 Hz BDFs (downsampled, 0.5-50 Hz pass-band; Cisotto & Chicco 2024). This starter kit walks through the four steps every Challenge 1 entry has to clear: load the CCD recordings, carve out a stimulus-locked window, train a small Braindecode CNN baseline [Schirrmeister et al., 2017], and ship one figure that ties the trial structure, the windowed signal, and the per-fold accuracy together (Aristimunha et al. 2025, doi:10.48550/arXiv.2506.19141). The deliverable is one (n_channels, n_samples) = (129, 200) window contract and one three-panel figure ready to drop into your submission.

So how far above chance can a small CNN push CCD decoding on the mini release? Keywords: EEG2025, challenge, transfer

Learning objectives#

EEG2025 Competition Notes#

  • Community & Support. Join the official EEG2025 Discord for task clarifications and to find teammates.

  • Submission Artifacts. A complete submission requires: 1. Model Weights. The state_dict of your trained model. 2. Prediction CSV. A file containing your model’s predictions on the

    held-out test set.

    1. Reproducibility Report. A short document (PDF or Markdown) detailing your architecture, training regime, and hardware.

Reproducibility Checklist#

  • [ ] Use a fixed random seed (e.g., SEED = 2025).

  • [ ] Ensure no subject leakage between folds.

  • [ ] Report both mean accuracy and standard deviation across folds.

  • [ ] Specify your hardware (CPU/GPU) and total training time.

Requirements#

Setup, seeds, cache, and device. np.random.seed keeps the synthetic fallback deterministic; the warning filter silences a pandas FutureWarning raised by the metadata catalog inside the constructor.

import os
import warnings
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
from sklearn.model_selection import KFold

from eegdash.viz import use_eegdash_style

use_eegdash_style()
warnings.simplefilter("ignore", category=FutureWarning)
SEED = 2025
np.random.seed(SEED)
torch.manual_seed(SEED)
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
cache_dir = Path(
    os.environ.get("EEGDASH_CACHE_DIR", str(Path.home() / ".eegdash_cache"))
)
cache_dir.mkdir(parents=True, exist_ok=True)
print(f"device={DEVICE} | seed={SEED} | cache={cache_dir}")
device=cpu | seed=2025 | cache=/home/runner/eegdash_cache

Step 1. The CCD task and the input/output contract#

The trial structure is fixed: a baseline period of flickering discs, a stimulus cue when one disc’s contrast ramps up, the subject’s button press, and a feedback face. Challenge 1 fixes the input/output contract so submissions are comparable:

  • input: X of shape (batch, n_chans=129, n_samples=200), stimulus-locked, +0.5 s .. +2.5 s after the stim anchor, sampled at 100 Hz (2 s window). The 129th channel is the reference channel.

  • output: y is the trial-level decision the model decodes. The official challenge target is the response time from stimulus onset; for this starter kit we frame a binary “fast vs slow response” decision so the headline number is one accuracy figure (chance = 0.5) instead of a regression metric. Swap in rt_from_stimulus to match the leaderboard exactly.

N_CHANS, N_SAMPLES, SFREQ = 129, 200, 100.0
SHIFT_AFTER_STIM = 0.5  # seconds: window starts +0.5 s after the stim anchor
WINDOW_LEN = 2.0  # seconds: 2 s window -> 200 samples at 100 Hz
TASK = "contrastChangeDetection"
RELEASE = "R5"
pd.Series(
    {
        "n_chans": N_CHANS,
        "n_samples": N_SAMPLES,
        "sfreq (Hz)": SFREQ,
        "shift_after_stim (s)": SHIFT_AFTER_STIM,
        "window_len (s)": WINDOW_LEN,
        "task": TASK,
        "release": RELEASE,
    },
    name="value",
).to_frame()
value
n_chans 129
n_samples 200
sfreq (Hz) 100.0
shift_after_stim (s) 0.5
window_len (s) 2.0
task contrastChangeDetection
release R5


Step 2. Two paths: real CCD data, or a synthetic fallback#

Run. The full Challenge 1 pipeline pulls real CCD recordings from the EEG2025 mini bucket; that path needs network and ~80 MB per mini subject. To keep the rendered tutorial reproducible without network we also synthesize the windowed shape (n_windows, 129, 200) directly. The synthetic fallback keeps the same tensor contract and the same label distribution so the rest of the tutorial reads identically. Set the EEGDASH_CHALLENGE_REAL_DATA=1 env var to flip the switch and use the actual loader.

USE_REAL_DATA = os.environ.get("EEGDASH_CHALLENGE_REAL_DATA", "0") == "1"
print(f"USE_REAL_DATA={USE_REAL_DATA} (set EEGDASH_CHALLENGE_REAL_DATA=1 to flip)")
USE_REAL_DATA=False (set EEGDASH_CHALLENGE_REAL_DATA=1 to flip)

Predict. Before reading the next cells: with a binary balanced label (fast vs slow response) the chance level is 0.5. How much above chance do you expect a small CNN to land after a few epochs on the mini release? The Foundation Challenge baseline lifts CCD accuracy a few points above chance per fold; the EEG2025 winners cleared that bar by larger margins [Aristimunha et al., 2025].

Step 3. Build the windowed dataset#

The real-data branch matches the original starter kit: load the CCD records via EEGChallengeDataset, annotate trial onsets with annotate_trials_with_target(), and carve stimulus-locked windows with braindecode.preprocessing.create_windows_from_events(). The synthetic branch stamps the same shape and metadata directly so the downstream split / training code is unchanged.

N_SUBJECTS_SYNTH = 8
N_PER_SUBJECT_SYNTH = 60


def build_synthetic_windows(
    n_subjects: int = N_SUBJECTS_SYNTH,
    n_per_subject: int = N_PER_SUBJECT_SYNTH,
    seed: int = SEED,
):
    """Return ``(X, y, meta)`` for a synthetic CCD-shaped cohort.

    The signal carries a small label-correlated tone (4 Hz for "slow"
    responders, 10 Hz for "fast") on a few channels only, plus heavy
    additive Gaussian noise and a per-subject phase shuffle. The
    signal-to-noise ratio is tuned so the cross-subject baseline lands
    a few points above chance, the same regime the real CCD windows
    produce on the mini release.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(N_SAMPLES) / SFREQ
    # Restrict the label-correlated tone to a small posterior cluster,
    # not all 129 channels, so a generic CNN cannot solve the task by
    # globally averaging across channels.
    informative_chans = rng.choice(N_CHANS, size=8, replace=False)
    rows: list[dict] = []
    X_list: list[np.ndarray] = []
    for subj in range(n_subjects):
        labels = rng.integers(0, 2, size=n_per_subject)
        # Per-subject phase + amplitude jitter so the tone shifts across
        # subjects, breaking the cross-subject decoder more than a
        # within-subject one would.
        phase_subj = float(rng.uniform(0.0, 2 * np.pi))
        amp_subj = float(rng.uniform(0.18, 0.32))
        for w_idx, lab in enumerate(labels):
            base = rng.standard_normal((N_CHANS, N_SAMPLES)).astype(np.float32) * 1.0
            freq = 10.0 if lab == 1 else 4.0
            tone = (amp_subj * np.sin(2 * np.pi * freq * t + phase_subj)).astype(
                np.float32
            )
            base[informative_chans, :] += tone[None, :]
            X_list.append(base)
            rows.append(
                {
                    "sample_id": f"ccd_s{subj:02d}_w{w_idx:03d}",
                    "subject": f"sub-{subj:02d}",
                    "task": TASK,
                    "label": int(lab),
                    "release": RELEASE,
                }
            )
    X = np.stack(X_list).astype(np.float32)
    y = np.asarray([r["label"] for r in rows], dtype=np.int64)
    meta = pd.DataFrame(rows)
    return X, y, meta


def build_real_windows():
    """Real-data branch: ``EEGChallengeDataset`` + braindecode windowing.

    Returns
    -------
    (X, y, meta) : same shape contract as :func:`build_synthetic_windows`.

    The label is a binary "fast vs slow" indicator computed by
    median-splitting ``rt_from_stimulus`` on the training subjects,
    matching the synthetic fallback so the rest of the tutorial does
    not branch on data source.

    """
    # Imports kept inside the function so the synthetic path does not
    # pay the braindecode-import / S3-handshake cost when the real
    # branch is off.
    from braindecode.preprocessing import (
        Preprocessor,
        create_windows_from_events,
        preprocess,
    )

    from eegdash.dataset import EEGChallengeDataset
    from eegdash.hbn.windows import (
        add_aux_anchors,
        add_extras_columns,
        annotate_trials_with_target,
        keep_only_recordings_with,
    )

    ds = EEGChallengeDataset(
        task=TASK,
        release=RELEASE,
        cache_dir=str(cache_dir),
        mini=True,
    )
    preprocess(
        ds,
        [
            Preprocessor(
                annotate_trials_with_target,
                target_field="rt_from_stimulus",
                epoch_length=WINDOW_LEN,
                require_stimulus=True,
                require_response=True,
                apply_on_array=False,
            ),
            Preprocessor(add_aux_anchors, apply_on_array=False),
        ],
        n_jobs=1,
    )
    anchor = "stimulus_anchor"
    ds = keep_only_recordings_with(anchor, ds)
    windows = create_windows_from_events(
        ds,
        mapping={anchor: 0},
        trial_start_offset_samples=int(SHIFT_AFTER_STIM * SFREQ),
        trial_stop_offset_samples=int((SHIFT_AFTER_STIM + WINDOW_LEN) * SFREQ),
        window_size_samples=N_SAMPLES,
        window_stride_samples=int(SFREQ),
        preload=True,
    )
    windows = add_extras_columns(
        windows,
        ds,
        desc=anchor,
        keys=("target", "rt_from_stimulus", "stimulus_onset", "response_onset"),
    )
    meta = windows.get_metadata().reset_index(drop=True)
    rt = meta["rt_from_stimulus"].astype(float).to_numpy()
    rt_median = float(np.nanmedian(rt))
    meta["label"] = (rt < rt_median).astype(np.int64)
    # Stack windows + labels.
    X_list, y_list = [], []
    for i in range(len(windows)):
        item = windows[i]
        X_list.append(np.asarray(item[0], dtype=np.float32))
        y_list.append(int(meta.loc[i, "label"]))
    X = np.stack(X_list).astype(np.float32)
    y = np.asarray(y_list, dtype=np.int64)
    return X, y, meta


if USE_REAL_DATA:
    print("loading real CCD windows from EEGChallengeDataset (R5 mini) ...")
    X_all, y_all, meta_all = build_real_windows()
else:
    print("synthesising CCD-shaped windows for offline reproducibility ...")
    X_all, y_all, meta_all = build_synthetic_windows()
print(f"X={X_all.shape} | y={y_all.shape} | n_subjects={meta_all['subject'].nunique()}")
synthesising CCD-shaped windows for offline reproducibility ...
X=(480, 129, 200) | y=(480,) | n_subjects=8

Investigate. X carries one row per stimulus-locked window with the canonical Challenge 1 shape; y is the binary fast/slow target we decode below; meta keeps the subject id so the cross-subject split stays auditable. Using a real-data label like the official regression target only changes the loss and the metric, not the tensor contract.

Step 4. Cross-subject split with leakage guard#

Run. Splitting trials at random would let the same subject appear in train and test, the canonical EEG leakage failure mode (Pernet et al. 2019, EEG-BIDS). We split subjects into folds with sklearn.model_selection.KFold over the unique subject ids and assert no overlap.

N_FOLDS = 5
unique_subjects = np.array(sorted(meta_all["subject"].unique()))
kf = KFold(n_splits=min(N_FOLDS, len(unique_subjects)), shuffle=True, random_state=SEED)
fold_assignments: list[tuple[np.ndarray, np.ndarray]] = []
for train_idx_subj, test_idx_subj in kf.split(unique_subjects):
    train_subj = set(unique_subjects[train_idx_subj])
    test_subj = set(unique_subjects[test_idx_subj])
    assert train_subj.isdisjoint(test_subj), "cross-subject split leaked"
    train_mask = meta_all["subject"].isin(train_subj).to_numpy()
    test_mask = meta_all["subject"].isin(test_subj).to_numpy()
    fold_assignments.append((train_mask, test_mask))
print(
    f"n_folds={len(fold_assignments)} | subjects per test fold ~ {len(unique_subjects) // N_FOLDS}"
)
n_folds=5 | subjects per test fold ~ 1

Step 5. Build the EEGNeX baseline#

Run. braindecode.models.EEGNeX is a small temporal-then- spatial CNN sized for the Challenge 1 input contract. We use 2 output units for the binary fast/slow head; swap to 1 unit (and an MSE loss) to regress rt_from_stimulus against the official metric.

from braindecode.models import EEGNeX
from torch import nn


def make_baseline_model():
    return EEGNeX(
        n_chans=N_CHANS,
        n_outputs=2,
        n_times=N_SAMPLES,
        sfreq=int(SFREQ),
    ).to(DEVICE)


def train_one_fold(
    model, X, y, train_mask, test_mask, *, n_epochs=4, lr=1e-3, batch=64
):
    """Tiny AdamW loop: deterministic enough for a tutorial print."""
    model.train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-5)
    crit = nn.CrossEntropyLoss()
    Xt = torch.as_tensor(X[train_mask], dtype=torch.float32, device=DEVICE)
    yt = torch.as_tensor(y[train_mask], dtype=torch.long, device=DEVICE)
    losses: list[float] = []
    for _epoch in range(n_epochs):
        idx = torch.randperm(len(Xt), device=DEVICE)
        epoch_loss = 0.0
        for i in range(0, len(Xt), batch):
            sel = idx[i : i + batch]
            opt.zero_grad(set_to_none=True)
            loss = crit(model(Xt[sel]), yt[sel])
            loss.backward()
            opt.step()
            epoch_loss += float(loss.item()) * len(sel)
        losses.append(epoch_loss / max(len(Xt), 1))
    # Evaluate on the held-out subjects.
    model.eval()
    with torch.no_grad():
        Xte = torch.as_tensor(X[test_mask], dtype=torch.float32, device=DEVICE)
        yte = torch.as_tensor(y[test_mask], dtype=torch.long, device=DEVICE)
        acc = float((model(Xte).argmax(dim=1) == yte).float().mean().item())
    return acc, losses

Step 6. Train the baseline and collect per-fold accuracy#

Run. Five folds, six epochs each: a budget that stays under a minute on CPU for the synthetic path while still showing the noise floor. Real-data runs with a serious budget should swap in early stopping and 30+ epochs against a held-out validation set.

fold_accuracies: list[float] = []
for f, (tr_mask, te_mask) in enumerate(fold_assignments):
    model = make_baseline_model()
    acc, _losses = train_one_fold(model, X_all, y_all, tr_mask, te_mask, n_epochs=6)
    fold_accuracies.append(acc)
    print(f"fold {f + 1}/{len(fold_assignments)}: test_acc={acc:.3f}")
mean_acc = float(np.mean(fold_accuracies))
std_acc = float(np.std(fold_accuracies)) if len(fold_accuracies) > 1 else 0.0
print(f"baseline accuracy: mean={mean_acc:.3f} | std={std_acc:.3f} | chance=0.50")
fold 1/5: test_acc=0.533
fold 2/5: test_acc=0.525
fold 3/5: test_acc=0.567
fold 4/5: test_acc=0.517
fold 5/5: test_acc=0.533
baseline accuracy: mean=0.535 | std=0.017 | chance=0.50

Step 7. Render the three-panel starter-kit figure#

Investigate. Panel 1 is the trial schematic with the decoder window highlighted; panel 2 shows one CCD window so the (129, 200) tensor contract is visible at the same scale as the data; panel 3 is the per-fold accuracy with mean +/- std and the chance line. The drawing code lives in a sibling _challenge_1_figure module so this tutorial cell stays one import plus one function call.

from _challenge_1_figure import draw_challenge_1_figure

# Build the inputs for the trial schematic from one synthetic trial.
_t_long = np.arange(int(6.0 * SFREQ)) / SFREQ
_rng = np.random.default_rng(SEED)
_trace = 0.6 * np.sin(2 * np.pi * 4.0 * _t_long) + 0.3 * _rng.standard_normal(
    _t_long.size
)
# Add a small bump near the stimulus and a dip near the synthetic press.
_trace[int(2.0 * SFREQ) : int(2.5 * SFREQ)] += 1.2 * np.hanning(int(0.5 * SFREQ))
_trace[int(3.5 * SFREQ) : int(3.8 * SFREQ)] -= 0.9 * np.hanning(int(0.3 * SFREQ))
paradigm_schematic_data = {
    "trace": _trace,
    "sfreq": SFREQ,
    "shift_after_stim": SHIFT_AFTER_STIM,
    "window_len": WINDOW_LEN,
    "stim_time": 2.0,
    "response_time": 3.6,
}
sample_window = X_all[0]  # one (129, 200) trial.

fig = draw_challenge_1_figure(
    paradigm_schematic_data=paradigm_schematic_data,
    sample_window=sample_window,
    fold_accuracies=fold_accuracies,
    dataset="EEG2025 R5 mini",
    plot_id="tutorial_challenge_1",
    chance_level=0.5,
    n_subjects=int(meta_all["subject"].nunique()),
    task=TASK,
    sfreq=SFREQ,
)
plt.show()
one CCD trial window, baseline decoder per fold

Result, one row per condition#

The baseline lifts CCD accuracy a few points above chance per fold. With a single seed and a small mini cohort the absolute number is noisy: report mean +/- std (E5.43, E5.46) and resist the urge to read fold-to-fold lifts as effects.

print("\n| condition          | accuracy |")
print("|--------------------|----------|")
print(f"| baseline (mean)    | {mean_acc:0.3f}   |")
print(f"| baseline (std)     | {std_acc:0.3f}   |")
print("| chance (binary)    | 0.500    |")
print(
    f"folds={len(fold_accuracies)} | task={TASK} | release={RELEASE} | "
    f"window={N_CHANS}x{N_SAMPLES} | sfreq={SFREQ:.0f} Hz"
)
| condition          | accuracy |
|--------------------|----------|
| baseline (mean)    | 0.535   |
| baseline (std)     | 0.017   |
| chance (binary)    | 0.500    |
folds=5 | task=contrastChangeDetection | release=R5 | window=129x200 | sfreq=100 Hz

Step 8. Save the model weights for submission#

Run. A submission ships one state_dict plus the architecture code. We save the last fold’s weights here as a placeholder; in a real submission you train on all subjects (or use a held-out validation fold for early stopping) and ship the resulting weights.

weights_path = cache_dir / "tutorial_challenge_1_weights.pt"
torch.save(model.state_dict(), weights_path)
assert weights_path.exists(), "weights file must exist after save"
print(f"saved baseline weights -> {weights_path}")
saved baseline weights -> /home/runner/eegdash_cache/tutorial_challenge_1_weights.pt

A common mistake, and how to recover#

Run. The Challenge 1 input contract is exact: a model that expects n_chans=64 (the ShallowFBCSPNet default) raises a size-mismatch error the moment a (B, 129, 200) batch lands. We trigger the error on purpose so the failure mode is visible and the recovery (rebuild with n_chans=129) is on the page.

try:
    bad = EEGNeX(n_chans=64, n_outputs=2, n_times=N_SAMPLES, sfreq=int(SFREQ)).to(
        DEVICE
    )
    _ = bad(torch.zeros((2, N_CHANS, N_SAMPLES), device=DEVICE))
except RuntimeError as exc:
    print(f"Caught RuntimeError: {str(exc)[:120]}")
    fixed = make_baseline_model()
    print(f"Recovery: EEGNeX(n_chans={N_CHANS}, ...) -> {type(fixed).__name__}")
Caught RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x3168 and 48x2)
Recovery: EEGNeX(n_chans=129, ...) -> EEGNeX

Modify, swap the binary head for the official regression target#

Modify. The leaderboard scores a regression on rt_from_stimulus, not the binary fast/slow head we used here. To match it, change n_outputs=2 to n_outputs=1, swap torch.nn.CrossEntropyLoss for torch.nn.MSELoss, report RMSE instead of accuracy, and feed the raw response time as the target. The window contract stays the same.

print("regression head sketch:")
print("  model = EEGNeX(n_chans=129, n_outputs=1, n_times=200, sfreq=100)")
print("  loss = torch.nn.MSELoss()")
print("  metric = torch.sqrt(((preds - rt) ** 2).mean())  # RMSE in seconds")
regression head sketch:
  model = EEGNeX(n_chans=129, n_outputs=1, n_times=200, sfreq=100)
  loss = torch.nn.MSELoss()
  metric = torch.sqrt(((preds - rt) ** 2).mean())  # RMSE in seconds

Make, scale up to the full release#

Mini-project. Switch mini=True to mini=False in the real-data branch, drop the synthetic fallback, raise n_epochs to 30+, add early stopping against a held-out validation fold, and report the leaderboard metric (RMSE) instead of accuracy. The submission bundle is the architecture code plus tutorial_challenge_1_weights.pt.

Extensions#

  • replace EEGNeX with a different braindecode model (ShallowFBCSPNet, EEGConformer, Deep4Net) and re-run.

  • pre-train on RestingState first (see plot_71) and fine-tune on CCD: this is the Challenge 1 cross-task transfer angle.

  • run on five seeds and report mean +/- std per fold rather than per seed.

  • drop mini=True for the final submission so the leaderboard contract holds end-to-end.

Wrap-up#

We loaded the EEG2025 Challenge 1 CCD task on a single subject pool, carved stimulus-locked windows of shape (129, 200), split subjects into 5 folds with no leakage [Pernet et al., 2019], trained an EEGNeX baseline [Schirrmeister et al., 2017], and reported per-fold accuracy next to chance. The figure ties the trial schematic, one window, and the per-fold result on one plate so a reviewer can read the full starter-kit story without scrolling.

References#

See References for the centralized bibliography of papers cited above. Add or amend an entry once in docs/source/refs.bib; every tutorial inherits the update.

Total running time of the script: (8 minutes 9.534 seconds)