NM000195: EEG dataset, 12 subjects#

Mixture of LLP and EM for a visual matrix speller (ERP) dataset from Hübner et al. 2018

Access recordings and metadata through EEGDash.

Citation: David Hübner, Thibault Verhoeven, Klaus-Robert Müller, Pieter-Jan Kindermans, Michael Tangermann (2018). Mixture of LLP and EM for a visual matrix speller (ERP) dataset from Hübner et al. 2018.

Modality: EEG Subjects: 12 Recordings: 360 License: CC-BY-4.0 Source: NeMAR

Metadata: Complete (90%)

Quickstart#

Install

pip install eegdash

Access the data

from eegdash.dataset import NM000195

dataset = NM000195(cache_dir="./data")
# Get the raw object of the first recording
raw = dataset.datasets[0].raw
print(raw.info)

Filter by subject

dataset = NM000195(cache_dir="./data", subject="01")

Advanced query

dataset = NM000195(
    cache_dir="./data",
    query={"subject": {"$in": ["01", "02"]}},
)

Iterate recordings

for rec in dataset:
    print(rec.subject, rec.raw.info['sfreq'])
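
Inspect recording metadata

Recording-level metadata are exposed via dataset.description (see the Notes in the API Reference below); treating the result as a printable table is an assumption here.

print(dataset.description)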

If you use this dataset in your research, please cite the original authors.

BibTeX

@dataset{nm000195,
  title = {Mixture of LLP and EM for a visual matrix speller (ERP) dataset from Hübner et al. 2018},
  author = {David Hübner and Thibault Verhoeven and Klaus-Robert Müller and Pieter-Jan Kindermans and Michael Tangermann},
  year = {2018},
}

About This Dataset#


Mixture of LLP and EM for a visual matrix speller (ERP) dataset from Hübner et al. 2018 [1]_.

Dataset Overview

  • Code: Huebner2018

  • Paradigm: p300

  • DOI: 10.1109/MCI.2018.2807039

  • Subjects: 12

  • Sessions per subject: 3

  • Events: Target=10002, NonTarget=10001

  • Trial interval: [-0.2, 0.7] s (see the epoching sketch after this list)

  • Session IDs: 0, 1, 2

  • File format: BrainVision
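
As a hedged illustration of how the event codes and trial interval above can be used for epoching with MNE-Python (the annotation-to-event mapping produced by mne.events_from_annotations is an assumption about how the markers are stored):

import mne
from eegdash.dataset import NM000195

dataset = NM000195(cache_dir="./data", subject="01")
raw = dataset.datasets[0].raw

# Events are derived from the recording's annotations; the resulting mapping is
# assumed to expose the Target / NonTarget markers listed above.
events, event_id = mne.events_from_annotations(raw)

# Epoch around stimulus onset using the documented trial interval [-0.2, 0.7] s
epochs = mne.Epochs(raw, events, event_id=event_id,
                    tmin=-0.2, tmax=0.7, baseline=(-0.2, 0.0), preload=True)
print(epochs)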

Acquisition

  • Sampling rate: 1000.0 Hz

  • Number of channels: 31

  • Channel types: eeg=31, misc=6

  • Channel names: C3, C4, CP1, CP2, CP5, CP6, Cz, EOGvu, F10, F3, F4, F7, F8, F9, FC1, FC2, FC5, FC6, Fp1, Fp2, Fz, O1, O2, P10, P3, P4, P7, P8, P9, Pz, T7, T8, x_EMGl, x_GSR, x_Optic, x_Pulse, x_Respi

  • Montage: extended 10-20

  • Hardware: BrainAmp DC

  • Software: BBCI toolbox

  • Reference: nose

  • Sensor type: Ag/AgCl

  • Line frequency: 50.0 Hz

  • Impedance threshold: 20.0 kOhm

  • Cap manufacturer: EasyCap

Participants

  • Number of subjects: 12

  • Health status: healthy

  • Age: mean=26, min=19, max=31

  • Gender distribution: female=8, male=4

  • BCI experience: mixed

  • Species: human

Experimental Protocol

  • Paradigm: p300

  • Number of classes: 2

  • Class labels: Target, NonTarget

  • Trial duration: 17.0 s

  • Tasks: copy-spelling

  • Study design: Visual ERP copy-spelling task using a modified 6x6 grid extended with 10 # symbols as visual blanks, using flexible highlighting scheme with two interleaved sequences to enable unsupervised learning methods (EM, LLP, MIX)

  • Feedback type: visual

  • Stimulus type: modified matrix speller with flexible highlighting

  • Stimulus modalities: visual

  • Primary modality: visual

  • Mode: online

  • Instructions: copy-spelling task - spell German sentence ‘Franzy jagt im Taxi quer durch das’

  • Stimulus presentation: SOA 250 ms; stimulus duration 100 ms; ISI 150 ms; highlighting: combination of brightness enhancement, rotation, enlargement and trichromatic grid overlay; distance to screen 80 cm; screen size 24 inches

HED Event Annotations

Schema: HED 8.4.0 | Browse: https://www.hedtags.org/hed-schema-browser

Target
  ├─ Sensory-event
  ├─ Experimental-stimulus
  ├─ Visual-presentation
  └─ Target

NonTarget
  ├─ Sensory-event
  ├─ Experimental-stimulus
  ├─ Visual-presentation
  └─ Non-target

Paradigm-Specific Parameters

  • Detected paradigm: p300

  • Number of targets: 46

  • Inter-stimulus interval: 150.0 ms

  • Stimulus onset asynchrony: 250.0 ms

Data Structure

  • Trials: 35

  • Blocks per session: 3

  • Trials context: 35 characters per block (one trial = spelling one character), 3 blocks per session (one block per unsupervised algorithm: EM, LLP, MIX in pseudo-randomized order)

Preprocessing

  • Data state: raw

  • Preprocessing applied: False

Signal Processing

  • Classifiers: EM (Expectation-Maximization), LLP (Learning from Label Proportions), MIX (mixture of EM and LLP), shrinkage-regularized LDA (Ledoit-Wolf), Bayesian least square regression

  • Feature extraction: mean amplitudes in six temporal intervals per channel
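
A rough numpy sketch of these mean-amplitude features, assuming an epochs object such as the one built in the Dataset Overview example (resampled to 100 Hz as in the original study):

import numpy as np

# epochs: mne.Epochs object from the earlier epoching sketch
# Six post-stimulus intervals (ms) used for the mean-amplitude features
intervals_ms = [(50, 120), (121, 200), (201, 280),
                (281, 380), (381, 530), (531, 700)]

data = epochs.get_data()   # shape: (n_epochs, n_channels, n_times)
times = epochs.times       # seconds relative to stimulus onset

features = []
for lo, hi in intervals_ms:
    mask = (times >= lo / 1000.0) & (times <= hi / 1000.0)
    features.append(data[:, :, mask].mean(axis=2))  # mean amplitude per channel

X = np.concatenate(features, axis=1)   # (n_epochs, n_channels * 6)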

Cross-Validation

  • Method: leave-one-character-out for offline analysis (sketched below); online sequential testing

  • Evaluation type: online, within_session, unsupervised_learning
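
A hedged scikit-learn sketch of the leave-one-character-out analysis with a shrinkage-regularized LDA; X is the feature matrix from the sketch above, while y (Target/NonTarget labels) and characters (the spelled-character index of each epoch) are assumed to be available from the epoching step:

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Shrinkage-regularized LDA (Ledoit-Wolf), as in the original study
clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")

# Each fold holds out all epochs belonging to one spelled character
scores = cross_val_score(clf, X, y, groups=characters,
                         cv=LeaveOneGroupOut(), scoring="roc_auc")
print(scores.mean())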

Performance (Original Study)

  • Accuracy: 80.0%

  • MIX AUC after 7 characters: 80.0

  • Time to 80% accuracy: 168 s

  • Epochs to 80% accuracy: 476

  • Characters to 80% accuracy: 7

BCI Application

  • Applications: speller, communication

  • Environment: controlled laboratory

  • Online feedback: True

Tags

  • Pathology: Healthy

  • Modality: Visual

  • Type: Research

Documentation

  • DOI: 10.5281/zenodo.192684

  • Associated paper DOI: 10.1109/MCI.2018.2807039

  • License: CC-BY-4.0

  • Investigators: David Hübner, Thibault Verhoeven, Klaus-Robert Müller, Pieter-Jan Kindermans, Michael Tangermann

  • Contact: p.kindermans@tu-berlin.de; michael.tangermann@blbt.uni-freiburg.de

  • Institution: University of Freiburg

  • Department: Brain State Decoding Lab

  • Address: Brain State Decoding Lab, University of Freiburg, Freiburg, GERMANY

  • Country: DE

  • Repository: Zenodo

  • Data URL: https://zenodo.org/record/5831879

  • Publication year: 2018

  • Funding: BrainLinks-BrainTools Cluster of Excellence funded by the German Research Foundation (DFG), grant number EXC 1086; bwHPC initiative, grant INST 39/963-1 FUGG; European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement NO 657679; Special Research Fund of Ghent University; DFG (DFG SPP 1527, MU 987/14-1); Federal Ministry for Education and Research (BMBF No. 2017-0-00451); Brain Korea 21 Plus Program by the Institute for Information & Communications Technology Promotion (IITP) grant (1IS14013A) funded by the Korean government

  • Ethics approval: University Medical Center Freiburg ethics committee

  • Keywords: unsupervised learning, brain-computer interface, event-related potentials, P300 speller, expectation-maximization, learning from label proportions, MIX method, EEG

Abstract

One of the fundamental challenges in brain-computer interfaces (BCIs) is to tune a brain signal decoder to reliably detect a user’s intention. While information about the decoder can partially be transferred between subjects or sessions, optimal decoding performance can only be reached with novel data from the current session. Thus, it is preferable to learn from unlabeled data gained from the actual usage of the BCI application instead of conducting a calibration recording prior to BCI usage. We review such unsupervised machine learning methods for BCIs based on event-related potentials of the electroencephalogram. We present results of an online study with twelve healthy participants controlling a visual speller. Online performance is reported for three completely unsupervised learning methods: (1) learning from label proportions, (2) an expectation-maximization approach and (3) MIX, which combines the strengths of the two other methods. After a short ramp-up, we observed that the MIX method not only defeats its two unsupervised competitors but even performs on par with a state-of-the-art regularized linear discriminant analysis trained on the same number of data points and with full label access. With this online study, we deliver the best possible proof in BCI that an unsupervised decoding method can in practice render a supervised method unnecessary. This is possible despite skipping the calibration, without losing much performance and with the prospect of continuous improvement over a session. Thus, our findings pave the way for a transition from supervised to unsupervised learning methods in BCIs based on event-related potentials.

Methodology

Online study comparing three unsupervised learning methods (EM, LLP, MIX) for a P300 speller. Twelve healthy volunteers (8 female, 4 male, mean age 26, range 19-31 years) participated in a single session each. Subjects spelled the German sentence 'Franzy jagt im Taxi quer durch das' (35 characters) in three blocks, each using a different unsupervised algorithm in pseudo-randomized order. Each trial (spelling one character) consisted of 68 highlighting events with 250 ms SOA and 100 ms stimulus duration (ISI = 150 ms).

The speller used a modified 6x6 grid with 36 normal characters extended with 10 # symbols as visual blanks (46 symbols in total). Two interleaved highlighting sequences were used: S1 highlighted only normal characters, while S2 highlighted both normal characters and # symbols, creating different known target-to-non-target ratios to enable learning from label proportions. Highlighting consisted of brightness enhancement, rotation, enlargement and a trichromatic grid overlay. Classifiers were randomly initialized at block start and updated after each trial; no labeled data were provided during the online session. Participants sat 80 cm from a 24-inch screen.

EEG was recorded from 31 passive Ag/AgCl electrodes (EasyCap) placed according to the extended 10-20 system, with impedances kept below 20 kOhm. Signals were recorded and amplified by a BrainAmp DC amplifier at a 1 kHz sampling rate using the BBCI toolbox in Matlab. Data were band-pass filtered (0.5-8 Hz, 3rd-order Chebyshev Type II), downsampled to 100 Hz, epoched to [-200, 700] ms relative to stimulus onset, and baseline corrected using the [-200, 0] ms interval. Features were mean amplitudes of six time intervals ([50-120], [121-200], [201-280], [281-380], [381-530], [531-700] ms post-stimulus) per channel. No artifact rejection was applied; participants were instructed to avoid artifacts.

Performance metrics were spelling accuracy and AUC for target vs. non-target discrimination. The MIX method achieved ~80% accuracy after ~7 characters (168 seconds, 476 epochs) and performed comparably to a supervised regularized LDA trained on the same amount of labeled data after 10+ characters. Ethics approval was obtained from the University Medical Center Freiburg ethics committee. Participants were compensated 8 Euros per hour for the ~3 hour session (including EEG setup).
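
A hedged MNE-Python sketch of the band-pass and downsampling steps described above; the stopband attenuation of the Chebyshev Type II filter is not specified in the metadata and is assumed here:

from eegdash.dataset import NM000195

dataset = NM000195(cache_dir="./data", subject="01")
raw = dataset.datasets[0].raw
raw.load_data()

# Band-pass 0.5-8 Hz with a 3rd-order Chebyshev Type II IIR filter
# (a 40 dB stopband attenuation is an assumption, not stated in the paper)
raw.filter(l_freq=0.5, h_freq=8.0, method="iir",
           iir_params=dict(order=3, ftype="cheby2", rs=40))

# Downsample to 100 Hz before epoching, as in the original analysis
raw.resample(100)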

References

Huebner, D., Verhoeven, T., Mueller, K. R., Kindermans, P. J., & Tangermann, M. (2018). Unsupervised learning for brain-computer interfaces based on event-related potentials: Review and online comparison [research frontier]. IEEE Computational Intelligence Magazine, 13(2), 66-77. https://doi.org/10.1109/MCI.2018.2807039

Appelhoff, S., Sanderson, M., Brooks, T., Vliet, M., Quentin, R., Holdgraf, C., Chaumon, M., Mikulan, E., Tavabi, K., Hochenberger, R., Welke, D., Brunner, C., Rockhill, A., Larson, E., Gramfort, A., & Jas, M. (2019). MNE-BIDS: Organizing electrophysiological data into the BIDS format and facilitating their analysis. Journal of Open Source Software, 4, 1896. https://doi.org/10.21105/joss.01896

Pernet, C. R., Appelhoff, S., Gorgolewski, K. J., Flandin, G., Phillips, C., Delorme, A., & Oostenveld, R. (2019). EEG-BIDS, an extension to the brain imaging data structure for electroencephalography. Scientific Data, 6, 103. https://doi.org/10.1038/s41597-019-0104-8

Generated by MOABB 1.5.0 (Mother of All BCI Benchmarks): https://github.com/NeuroTechX/moabb

Dataset Information#

Dataset ID

NM000195

Title

Mixture of LLP and EM for a visual matrix speller (ERP) dataset from Hübner et al. 2018

Author (year)

Hubner2018

Canonical

Huebner2018

Importable as

NM000195, Hubner2018, Huebner2018

Year

2018

Authors

David Hübner, Thibault Verhoeven, Klaus-Robert Müller, Pieter-Jan Kindermans, Michael Tangermann

License

CC-BY-4.0

Citation / DOI

Unknown

Source links

OpenNeuro | NeMAR | Source URL

Found an issue with this dataset?

If you encounter any problems with this dataset (missing files, incorrect metadata, loading errors, etc.), please let us know!

Report an Issue on GitHub

Technical Details#

Subjects & recordings
  • Subjects: 12

  • Recordings: 360

  • Tasks: 1

Channels & sampling rate
  • Channels: 31

  • Sampling rate (Hz): 1000.0

  • Duration (hours): 15.32

Tags
  • Pathology: Healthy

  • Modality: Visual

  • Type: Attention

Files & format
  • Size on disk: 4.8 GB

  • File count: 360

  • Format: BIDS

License & citation
  • License: CC-BY-4.0

  • DOI: —

Provenance

API Reference#

Use the NM000195 class to access this dataset programmatically.

class eegdash.dataset.NM000195(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)[source]#

Bases: EEGDashDataset

Mixture of LLP and EM for a visual matrix speller (ERP) dataset from Hübner et al. 2018.

Study:

nm000195 (NeMAR)

Author (year):

Hubner2018

Canonical:

Huebner2018

Also importable as: NM000195, Hubner2018, Huebner2018.

Modality: eeg; Experiment type: Attention; Subject type: Healthy. Subjects: 12; recordings: 360; tasks: 1.

Parameters:
  • cache_dir (str | Path) – Directory where data are cached locally.

  • query (dict | None) – Additional MongoDB-style filters to AND with the dataset selection. Must not contain the key dataset.

  • s3_bucket (str | None) – Base S3 bucket used to locate the data.

  • **kwargs (dict) – Additional keyword arguments forwarded to EEGDashDataset.

data_dir#

Local dataset cache directory (cache_dir / dataset_id).

Type:

Path

query#

Merged query with the dataset filter applied.

Type:

dict

records#

Metadata records used to build the dataset, if pre-fetched.

Type:

list[dict] | None

Notes

Each item is a recording; recording-level metadata are available via dataset.description. query supports MongoDB-style filters on fields in ALLOWED_QUERY_FIELDS and is combined with the dataset filter. Dataset-specific caveats are not provided in the summary metadata.

References

OpenNeuro dataset: https://openneuro.org/datasets/nm000195

NeMAR dataset: https://nemar.org/dataexplorer/detail?dataset_id=nm000195

Examples

>>> from eegdash.dataset import NM000195
>>> dataset = NM000195(cache_dir="./data")
>>> recording = dataset[0]
>>> raw = recording.load()
__init__(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)[source]#
save(path, overwrite=False)[source]#

Save the dataset to disk.

Parameters:
  • path (str or Path) – Destination file path.

  • overwrite (bool, default False) – If True, overwrite existing file.

Return type:

None
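
A minimal usage sketch (the destination path is illustrative):

>>> dataset.save("./nm000195_export", overwrite=True)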

See Also#