Features vs. deep learning#

A recurring question in EEG decoding is whether to engineer features — band power, common spatial patterns (CSP), Riemannian covariance — or hand the raw signal to a convolutional or transformer-style network and let it learn its own representation. The answer depends on three things: how much data you have, how stationary the signal is, and how much inductive bias you can afford to bake in.

The honest summary is that neither family dominates universally. On small, single-task, single-cohort decoding problems, well-tuned feature pipelines are often competitive with, and frequently better than, deep nets trained from scratch. On large, heterogeneous corpora, and on tasks where the relevant features are not known a priori, end-to-end deep models clearly win. Schirrmeister et al. (2017) remains the clearest demonstration that ConvNets can match expert feature pipelines on motor-imagery decoding, but their result holds only with sufficient data and careful regularisation.

When handcrafted features tend to win#

Pick features when at least two of the following are true:

  • You have fewer than ~50 subjects. A logistic regression or SVM on 10 well-chosen features fits with minimal regularisation. A convolutional network has hundreds of thousands of parameters and needs orders of magnitude more data to avoid memorising subject identity (see Leakage and evaluation).

  • The relevant rhythm is known. If you are decoding alpha-band modulation, log-power in 8–13 Hz is a near-optimal feature; a deep net will rediscover it in the best case and miss it in the worst.

  • You need interpretability. Feature pipelines come with named, reportable inputs (“alpha at Pz”, “central-mu lateralisation”). Deep features come with saliency maps that are notoriously hard to read.

  • You need cross-dataset transfer. Riemannian and CSP-based pipelines have well-understood invariances (channel permutation, reference change). A vanilla ConvNet trained on one montage can fail on another for trivial reasons.

  • You are CPU-bound. Feature pipelines fit on a laptop in seconds.

The How do I turn EEG windows into a band-power feature matrix? tutorial shows the simplest version of this argument: build a band-power table, fit a logistic regression, and read off the result.
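A minimal sketch of that argument, using synthetic noise as a stand-in for real windows (the band choices and array shapes here are illustrative, not the tutorial's exact recipe): build a band-power matrix with Welch's method, then fit a logistic regression on top.

```python
import numpy as np
from scipy.signal import welch
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
fs = 128
# Synthetic stand-in: 60 windows x 4 channels x 2 seconds at 128 Hz.
windows = rng.standard_normal((60, 4, 2 * fs))
y = rng.integers(0, 2, size=60)

def log_band_power(windows, fs, lo, hi):
    """Log mean Welch PSD in [lo, hi) Hz, per window and channel."""
    freqs, psd = welch(windows, fs=fs, axis=-1)
    band = (freqs >= lo) & (freqs < hi)
    return np.log(psd[..., band].mean(axis=-1))

# One column per (channel, band): a small, named, reportable feature set.
bands = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
X = np.hstack([log_band_power(windows, fs, lo, hi) for lo, hi in bands.values()])
print(X.shape)  # (60, 12): 4 channels x 3 bands

clf = LogisticRegression(max_iter=1000).fit(X, y)
```

The whole pipeline is a dozen lines, runs on a CPU in well under a second, and every column of `X` has a name you can put in a figure caption.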

When deep learning tends to win#

Pick a deep model when at least two of the following are true:

  • You have hundreds of subjects. Foundation-model-style deep decoders need large, diverse training sets to learn subject-invariant filters. EEGDash’s HBN-derived corpora make this regime accessible for the first time.

  • The relevant feature is not known. Tasks like cognitive workload, emotional state, or fatigue rarely have a single canonical band; an end-to-end model can carve out a useful representation that no human has named.

  • You can afford to combine multiple datasets. Deep models gain more from data diversity than from data volume; one large dataset is worth less than three medium ones with different montages.

  • You can apply augmentation. Mixup, channel dropout, time masking, and frequency masking close most of the gap between deep nets and hand-tuned features on small data; without augmentation, the deep model is usually under-regularised.

  • You will fine-tune downstream. A pre-trained deep model is a reusable asset; a hand-tuned feature pipeline is bespoke per task.

The How do I train a leakage-safe baseline classifier on EEG? tutorial trains a small braindecode ConvNet on the same data the feature tutorials use, so that you can compare the two pipelines head to head.
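To make the augmentation point concrete, here is a sketch of two of the listed augmentations, channel dropout and time masking, written directly against NumPy arrays. The batch is synthetic and the masking parameters are arbitrary; libraries such as braindecode provide ready-made augmentation transforms, which this does not reproduce.

```python
import numpy as np

def channel_dropout(batch, p=0.2, rng=None):
    """Zero each channel of each window independently with probability p."""
    rng = rng or np.random.default_rng()
    keep = rng.random((batch.shape[0], batch.shape[1], 1)) >= p
    return batch * keep

def time_mask(batch, max_len=32, rng=None):
    """Zero one random contiguous time span per window."""
    rng = rng or np.random.default_rng()
    out = batch.copy()
    n_windows, _, n_samples = batch.shape
    for i in range(n_windows):
        length = int(rng.integers(1, max_len + 1))
        start = int(rng.integers(0, n_samples - length + 1))
        out[i, :, start:start + length] = 0.0
    return out

rng = np.random.default_rng(0)
batch = rng.standard_normal((8, 4, 256))  # windows x channels x samples
augmented = time_mask(channel_dropout(batch, rng=rng), rng=rng)
print(augmented.shape)  # (8, 4, 256)
```

Applied fresh at every training step, transforms like these force the network to stop relying on any single channel or time span, which is most of what regularises a deep EEG model at small N.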

The “feature first” rule#

A reliable practical workflow is to make the feature pipeline mandatory. Before you commit to a deep architecture for a new dataset:

  1. Build a one-page feature pipeline (band power, ratios, simple covariance summary). The How do I turn EEG windows into a band-power feature matrix? recipe is enough.

  2. Fit a logistic regression or shallow tree on top.

  3. Use exactly the same split you will later use for the deep model (preferably subject-aware; see Leakage and evaluation).

  4. Record the score and the variance across folds.

Now you have a baseline. Anything that costs ten times more compute should outperform it on more than just the headline number — it should beat it under cross-subject evaluation, with smaller variance, and on held-out cohorts. If it does not, the feature pipeline is your deliverable. You will save weeks of GPU time, and your paper will be honest.
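Steps 2–4 of the workflow above fit in a few lines. Everything here is a stand-in (random features, invented subject labels), but the subject-aware `GroupKFold` split and the fold-variance bookkeeping are the part that matters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 12))      # step 1: pretend these are band-power features
y = rng.integers(0, 2, size=120)
subjects = np.repeat(np.arange(6), 20)  # 6 subjects, 20 windows each

# Steps 2-3: shallow model, subject-aware split (no subject appears in
# both the train and test side of any fold).
scores = cross_val_score(
    LogisticRegression(max_iter=1000),
    X, y,
    cv=GroupKFold(n_splits=6),
    groups=subjects,
)

# Step 4: record the score AND the variance across folds.
print(f"baseline accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

Any deep model that follows must be evaluated against the same `cv` and `groups` objects, or the comparison is meaningless.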

The How do classical EEG markers compose on top of one Welch PSD? and How do I push EEGDash features through a scikit-learn Pipeline? tutorials extend the feature baseline to richer models (gradient boosting, full scikit-learn pipelines) without leaving the feature-engineering regime.

What the literature says#

Three observations recur across the EEG-deep-learning reviews:

  • Architecture matters less than regularisation and split discipline. Roy et al. (2019) survey 156 deep EEG papers and find no consistent architecture winner; what changes results is whether the split respected subject identity (it often did not).

  • At small N, ConvNets and feature pipelines are within noise. Schirrmeister et al. (2017) explicitly tune their ConvNet to match FBCSP on motor imagery; both reach high accuracy and the gap is dataset-dependent.

  • Subject-invariant claims require evidence, not only architecture language. A model labelled “subject-invariant” must be evaluated cross-subject on a held-out cohort, which loops back to Leakage and evaluation.

The takeaway is not “always use features” or “always use deep nets”. It is that the choice is an experiment in itself, and the only way to make it honestly is to run both pipelines under the same evaluation.

Further reading#

  • Roy, Y., Banville, H., Albuquerque, I., Gramfort, A., Falk, T. H., & Faubert, J. (2019). Deep learning-based electroencephalography analysis: a systematic review. Journal of Neural Engineering, 16(5), 051001. https://doi.org/10.1088/1741-2552/ab260c

  • Lawhern, V. J., Solon, A. J., Waytowich, N. R., Gordon, S. M., Hung, C. P., & Lance, B. J. (2018). EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces. Journal of Neural Engineering, 15(5), 056013. https://doi.org/10.1088/1741-2552/aace8c

  • Cisotto, G., & Chicco, D. (2024). Ten quick tips for clinical electroencephalographic (EEG) data acquisition and signal processing. PeerJ Computer Science, 10, e2256. https://doi.org/10.7717/peerj-cs.2256

  • Schirrmeister, R. T., Springenberg, J. T., Fiederer, L. D. J., Glasstetter, M., Eggensperger, K., Tangermann, M., Hutter, F., Burgard, W., & Ball, T. (2017). Deep learning with convolutional neural networks for EEG decoding and visualization. Human Brain Mapping, 38(11), 5391–5420. https://doi.org/10.1002/hbm.23730