
Iss. 134 · 20 subjects · 1525 recordings · CC-BY-NC-ND-4.0

Dataset Brief · Alljoined-1.6M

NM000134: EEG dataset, 20 subjects

Name: Alljoined-1.6M
Published: 2025-01-01
License: CC-BY-NC-ND-4.0


Access recordings and metadata through EEGDash.

Citation: Jonathan Xu, Ugo Bruzadin Nunes, Wangshu Jiang, Samuel Ryther, Jordan Pringle, Paul S. Scotti, Arnaud Delorme, Reese Kneeland (2025). Alljoined-1.6M. 10.82901/nemar.nm000134

Modality: EEG · Subjects: 20 · Recordings: 1525 · License: CC-BY-NC-ND-4.0 · Source: NeMAR

Metadata: Complete (100%)

20-participant EEG dataset — Alljoined-1.6M.

Data & curation Jonathan Xu · Ugo Bruzadin Nunes · Wangshu Jiang · Samuel Ryther · Jordan Pringle · Paul S. Scotti · …
Year 2025 · Distributed via NeMAR

EEG · 32 ch · 256 Hz · BIDS 1.9.0 · Task: images · 5 sessions

Layer 01 · Study

What was asked

Hypothesis, independent & dependent variables, paradigm, cohort, and the editorial caveats around what the recordings can and cannot answer.

Layer 02 · Signal · BIDS

What was recorded

Sidecars, channels & electrodes, coordinate system, event semantics, and quality stats from the NEMAR pipeline when available.

Layer 03 · Training · ML

What you can train on

Recommended access modes — MNE Raw, braindecode windows, PyTorch DataLoader — plus the targets the metadata makes addressable.

§ 01 Access · Get started

Quickstart

Install

pip install eegdash

Access the data

from eegdash.dataset import NM000134

dataset = NM000134(cache_dir="./data")
# Get the raw object of the first recording
raw = dataset.datasets[0].raw
print(raw.info)

Query & Filter

Filter by subject

dataset = NM000134(cache_dir="./data", subject="01")

Advanced query

dataset = NM000134(
    cache_dir="./data",
    query={"subject": {"$in": ["01", "02"]}},
)
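The `query` argument takes MongoDB-style operators. To illustrate what the `$in` filter above selects, here is a minimal pure-Python sketch of the matching semantics. The `matches` function and the sample records are hypothetical and are not part of EEGDash, which evaluates queries against its own catalog.

```python
def matches(record: dict, query: dict) -> bool:
    """Return True if `record` satisfies a MongoDB-style `query`.

    Only plain equality and the `$in` operator are handled in this sketch.
    """
    for field, condition in query.items():
        value = record.get(field)
        if isinstance(condition, dict) and "$in" in condition:
            if value not in condition["$in"]:
                return False
        elif value != condition:
            return False
    return True


# Hypothetical metadata records, for illustration only.
records = [
    {"subject": "01", "session": "01"},
    {"subject": "02", "session": "03"},
    {"subject": "07", "session": "01"},
]
query = {"subject": {"$in": ["01", "02"]}}
selected = [r for r in records if matches(r, query)]
print([r["subject"] for r in selected])
```

Under these assumptions the filter keeps only the records whose `subject` is one of the listed labels.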

Iterate recordings

# Each entry of dataset.datasets is one recording (as in the example above).
for rec in dataset.datasets:
    print(rec.subject, rec.raw.info['sfreq'])

Cite This Dataset

If you use this dataset in your research, please cite the original authors.

BibTeX

@dataset{nm000134,
  title = {Alljoined-1.6M},
  author = {Jonathan Xu and Ugo Bruzadin Nunes and Wangshu Jiang and Samuel Ryther and Jordan Pringle and Paul S. Scotti and Arnaud Delorme and Reese Kneeland},
  doi = {10.82901/nemar.nm000134},
  url = {https://doi.org/10.82901/nemar.nm000134},
}

§ 02 Study · The README

About This Dataset

Alljoined-1.6M is a large-scale EEG dataset of neural responses to rapid serial visual presentation (RSVP) of natural images, recorded using a consumer-grade 32-channel EMOTIV FLEX2 system. Twenty healthy adult participants (ages 23-63; 15 male, 5 female) each completed four recording sessions, generating over 1.6 million visual stimulus trials in total.

The dataset was designed to evaluate whether deep neural network-based brain-computer interface (BCI) research and semantic decoding methods can be effectively conducted with affordable consumer-grade EEG systems (approximately $2.2k versus $35-60k for research-grade systems).

Reference: Xu, J., Bruzadin Nunes, U., Jiang, W., Ryther, S., Pringle, J., Scotti, P. S., Delorme, A., & Kneeland, R. (2025). Alljoined-1.6M: A Million-Trial EEG-Image Dataset for Evaluating Affordable Brain-Computer Interfaces. https://doi.org/10.48550/arXiv.2508.18571


Alljoined-1.6M: Million-Trial EEG Dataset with Consumer-Grade Hardware

Overview

Recording Setup

Equipment: EMOTIV FLEX2, 32-channel sintered Ag/AgCl gel-based electrodes
Connectivity: wireless Bluetooth 5.2
Sampling rate: 256 Hz (resampled to 250 Hz in published analyses)
Montage: extended 10-20 system, focused on occipital/visual regions
Channels: Cz, Fp1, F7, F3, CP5, CP1, P1, P3, P5, P7, PO9, PO7, PO3, O1, O9, Pz, POz, Oz, O10, O2, PO4, PO8, PO10, P8, P6, P4, P2, CP2, CP6, F4, F8, Fp2
Firmware filters: dual 50/60 Hz notch filter (built into EMOTIV firmware)
Cost: approximately $2.2k (roughly 16x to 27x cheaper than the $35-60k research-grade systems cited above)
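The sampling-rate entry mentions resampling from 256 Hz to 250 Hz. As a small sketch of the arithmetic involved (not the authors' pipeline), the rational resampling factor and the resulting sample count for a nominal 5-minute run can be derived as follows. The name `resampled_length` is hypothetical, and the ceiling-rounding convention is an assumption.

```python
from fractions import Fraction

ORIGINAL_HZ = 256   # native sampling rate stated above
TARGET_HZ = 250     # rate used in the published analyses

ratio = Fraction(TARGET_HZ, ORIGINAL_HZ)   # reduces to 125/128
up, down = ratio.numerator, ratio.denominator


def resampled_length(n_samples: int) -> int:
    """Number of samples after rational resampling, rounded up."""
    return -(-n_samples * up // down)  # ceiling division


n_in = 5 * 60 * ORIGINAL_HZ  # a nominal 5-minute run
print(up, down, n_in, resampled_length(n_in))
```

The `up` and `down` factors are the kind of integers a polyphase resampler such as `scipy.signal.resample_poly` takes as arguments.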

Task Paradigm

Rapid Serial Visual Presentation (RSVP) with orthogonal oddball detection. Each trial consisted of an image presented for 100 ms, followed by 100 ms of blank screen (200 ms total cycle). A small semi-transparent red fixation dot (0.2 x 0.2 degrees, 50% opacity) was present throughout.

Oddball detection: participants pressed a button when they detected catch trials featuring a Woody (Toy Story) character, which appeared in approximately 6% of sequences. Detection window was up to 2 seconds post-sequence. This task maintained engagement without biasing perception toward specific image categories. Viewing distance: 60 cm; viewing angle: 7 degrees.
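The 200 ms cycle does not correspond to a whole number of samples at 256 Hz, which matters when converting event onsets to sample indices. The sketch below works through that arithmetic for nominal onsets. The constant names and `onset_to_sample` are illustrative assumptions, and real onsets should be read from the events.tsv files.

```python
SFREQ = 256.0          # Hz, native sampling rate of the recordings
IMAGE_MS = 100         # image on-screen time
BLANK_MS = 100         # blank interval after each image
CYCLE_MS = IMAGE_MS + BLANK_MS  # 200 ms between consecutive image onsets

samples_per_cycle = SFREQ * CYCLE_MS / 1000.0   # 51.2, not an integer


def onset_to_sample(onset_s: float, sfreq: float = SFREQ) -> int:
    """Nearest sample index for an event onset given in seconds."""
    return int(round(onset_s * sfreq))


# Nominal onsets of the first five images in a sequence starting at t = 0.
onsets = [i * CYCLE_MS / 1000.0 for i in range(5)]
samples = [onset_to_sample(t) for t in onsets]
print(samples_per_cycle, samples)
```

With nearest-sample rounding, consecutive onsets end up 51 or 52 samples apart rather than at a fixed spacing.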

Stimulus Set

16,740 unique images from the THINGS database (26,000 total images across 1,854 object categories), identical to the THINGS-EEG2 stimulus set for direct comparison.

- Test images: shown 80 times per participant (4 sessions x 4 test blocks x 5 presentations)
- Training images: shown 4-5 times per participant
- Randomization: constrained so no image repeats within 2 intervening items
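The repetition count and the randomization constraint can be checked with a few lines of code. The sketch below assumes one reading of the constraint: an image may not reappear when two or fewer other items separate the two presentations. The function name and the example sequences are hypothetical.

```python
TEST_PRESENTATIONS = 4 * 4 * 5  # sessions x test blocks x presentations


def respects_repeat_constraint(sequence, window=2):
    """True if no item reappears with `window` or fewer items in between."""
    for i, image_id in enumerate(sequence):
        start = max(0, i - (window + 1))
        if image_id in sequence[start:i]:
            return False
    return True


print(TEST_PRESENTATIONS)
print(respects_repeat_constraint([1, 2, 3, 1]))     # two items in between
print(respects_repeat_constraint([1, 2, 3, 4, 1]))  # three items in between
```

If the intended reading is instead "at least two intervening items", the same check applies with `window=1`.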

The stimulus images live under /stimuli/<NNNNN>.jpg (5-digit zero-padded, IDs 00000 through 16748). They mirror the upstream HuggingFace Alljoined/Alljoined-1.6M’s stimuli.zip, which is itself the THINGS image set used by THINGS-EEG2 (see nm000232). Each events.tsv row whose trial_type starts with stim_test,<id>,... resolves to stimuli/<id:05d>.jpg. A small Python helper is provided to make alignment one line:

# Run from the dataset root so that the code/ helper and the sub-*/ paths resolve.
import pandas as pd
from code.align_stimuli import StimulusAligner

aligner = StimulusAligner('.')  # '.' is the dataset root
events = pd.read_csv('sub-01/ses-01/eeg/sub-01_ses-01_task-images_run-01_events.tsv', sep='\t')
paths = aligner.paths_for_events(events)        # list[Path | None]; None for non-stim_test rows
img   = aligner.image_for_event(events.iloc[0]) # PIL.Image, or None

Run python code/smoke_test.py from the dataset root to verify that every stim_test reference resolves to an existing image (currently 328,364/328,364, i.e. 100%).
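For readers who want the mapping rule without the bundled helper, the following stand-alone sketch applies the documented convention (`stim_test,<id>,...` resolves to `stimuli/<id:05d>.jpg`). It is not the shipped `StimulusAligner`; the function name `stimulus_path` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def stimulus_path(trial_type: str, root: str = "stimuli") -> Optional[PurePosixPath]:
    """Map a `stim_test,<id>,...` trial_type to its image path, else None."""
    fields = trial_type.split(",")
    if fields[0] != "stim_test" or len(fields) < 2:
        return None
    image_id = int(fields[1])  # second field is the image ID
    return PurePosixPath(root) / f"{image_id:05d}.jpg"


print(stimulus_path("stim_test,42,-1,7"))
print(stimulus_path("oddball,..."))
```

Rows that do not start with `stim_test` return `None`, mirroring the behavior described for `paths_for_events`.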

Subjects, Sessions, and Runs

20 subjects, 4 sessions each (sub-08 has an additional session ses-02old, a retake of session 2). Each session contains 19 RSVP blocks (runs), approximately 5 minutes each. The first 4 runs per session present test images; the remaining 15 runs present training images.

Total: 83,520 image trials per subject; approximately 1.6 million trials across all 20 participants.

| Subject | Sessions | Runs | Notes |
|---------|----------|------|-------|
| sub-01 | 4 | 76 | |
| sub-02 | 4 | 76 | |
| sub-03 | 4 | 76 | |
| sub-04 | 4 | 76 | |
| sub-05 | 4 | 76 | |
| sub-06 | 4 | 76 | |
| sub-07 | 4 | 76 | |
| sub-08 | 5 | 81 | Includes ses-02old (session 2 retake) |
| sub-09 | 4 | 76 | |
| sub-10 | 4 | 76 | |
| sub-11 | 4 | 76 | |
| sub-12 | 4 | 76 | |
| sub-13 | 4 | 76 | |
| sub-14 | 4 | 76 | |
| sub-15 | 4 | 76 | |
| sub-16 | 4 | 76 | |
| sub-17 | 4 | 76 | |
| sub-18 | 4 | 76 | |
| sub-19 | 4 | 76 | |
| sub-20 | 4 | 76 | |
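The run and trial totals above can be cross-checked against the 1525 recordings reported for the dataset. This is a bookkeeping sketch using only the numbers stated in this section; the variable names are illustrative.

```python
SESSIONS_PER_SUBJECT = 4
RUNS_PER_SESSION = 19
N_SUBJECTS = 20
TRIALS_PER_SUBJECT = 83_520

runs_per_subject = SESSIONS_PER_SUBJECT * RUNS_PER_SESSION  # 76
runs = {f"sub-{i:02d}": runs_per_subject for i in range(1, N_SUBJECTS + 1)}
runs["sub-08"] = 81  # value listed in the table (includes ses-02old)

total_runs = sum(runs.values())
total_trials = TRIALS_PER_SUBJECT * N_SUBJECTS
print(total_runs, total_trials)
```

The total of 1525 runs matches the recording count, and sub-08's 81 runs would imply five runs in ses-02old if its other four sessions are complete.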

Participants were recruited from San Francisco via local platforms (Craigslist 55%, Instawork 35%) and filtered from an initial pool of 48 for high behavioral engagement. Mean oddball detection performance: 88% AUC (+/- 1% SE).

Data Format

Raw continuous EEG recordings are stored as European Data Format (EDF) files, the native export format of the EMOTIV FLEX2 system (16-bit resolution). Only the 32 EEG channels are retained; EMOTIV metadata channels (timestamps, counters, contact quality, motion sensors, etc.) were excluded during conversion. Per-run files:

| Path | Description |
|------|-------------|
| `sub-XX/ses-YY/eeg/sub-XX_ses-YY_task-images_run-ZZ_eeg.edf` | Raw EEG |
| `sub-XX/ses-YY/eeg/sub-XX_ses-YY_task-images_run-ZZ_events.tsv` | Events |
| `sub-XX/ses-YY/eeg/sub-XX_ses-YY_task-images_run-ZZ_events.json` | Event metadata |
| `sub-XX/ses-YY/eeg/sub-XX_ses-YY_task-images_run-ZZ_channels.tsv` | Channels |
| `sub-XX/ses-YY/eeg/sub-XX_ses-YY_task-images_run-ZZ_eeg.json` | Recording parameters |
| `sub-XX/ses-YY/eeg/sub-XX_ses-YY_space-CapTrak_coordsystem.json` | Coordinate system |
| `sub-XX/ses-YY/eeg/sub-XX_ses-YY_space-CapTrak_electrodes.tsv` | Electrode positions |
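The per-run naming pattern in the table can be generated programmatically. The sketch below builds the relative paths for one run from its labels; `run_files` and its dictionary keys are hypothetical names, and the labels are assumed to be passed already zero-padded (for example "01", or "02old" for the sub-08 retake).

```python
from pathlib import PurePosixPath


def run_files(subject: str, session: str, run: str, task: str = "images") -> dict:
    """Relative BIDS paths of the per-run files for one recording."""
    eeg_dir = PurePosixPath(f"sub-{subject}") / f"ses-{session}" / "eeg"
    stem = f"sub-{subject}_ses-{session}_task-{task}_run-{run}"
    return {
        "eeg": eeg_dir / f"{stem}_eeg.edf",
        "events": eeg_dir / f"{stem}_events.tsv",
        "events_json": eeg_dir / f"{stem}_events.json",
        "channels": eeg_dir / f"{stem}_channels.tsv",
        "eeg_json": eeg_dir / f"{stem}_eeg.json",
    }


files = run_files("01", "01", "01")
print(files["events"])
```

The session-level CapTrak files are left out because they carry no task or run entities.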

Event annotations in the events.tsv files use the following trial_type format from the EMOTIV recording system:

- stim_test,{image_id},-1,{trial}: test image presentation
- oddball,...: oddball (catch) trial
- behav,...: behavioral response (button press)
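Because the event kind is the text before the first comma of trial_type, events can be tallied with the standard library alone. The rows below are synthetic and exist only to make the sketch runnable; the onset and duration values are invented, and the elided `...` payloads are kept as written above.

```python
import csv
import io
from collections import Counter

# Synthetic rows for illustration only; real files have many more rows.
SAMPLE_TSV = (
    "onset\tduration\ttrial_type\n"
    "0.0\t0.1\tstim_test,12,-1,1\n"
    "0.2\t0.1\tstim_test,873,-1,2\n"
    "0.4\t0.1\toddball,...\n"
    "1.1\t0.0\tbehav,...\n"
)


def event_kind(trial_type: str) -> str:
    """Return the label before the first comma (stim_test, oddball, behav, ...)."""
    return trial_type.split(",", 1)[0]


rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
counts = Counter(event_kind(row["trial_type"]) for row in rows)
print(dict(counts))
```

Any prefix not listed above is passed through unchanged, so unexpected event kinds show up in the tally rather than being dropped.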

Source Data

The sourcedata/ directory contains the original EMOTIV JSON metadata files from each recording block. These files include the raw EMOTIV marker data with precise timestamps, UUIDs, and port information as recorded by the EMOTIV software. They are the original, unprocessed recording artifacts from the EMOTIV system, not derived products, and are stored in sourcedata/ per BIDS conventions.

sourcedata/sub-XX/ses-YY/eeg/sub-XX_ses-YY_task-images_run-ZZ_recording.json

Code

The code/ directory contains the original Alljoined-1.6M analysis code, cloned from Alljoined/Alljoined-1.6M.

BIDS Conversion

Converted to BIDS by Yahya Shirazi (Swartz Center for Computational Neuroscience, UC San Diego) using MNE-BIDS and custom scripts.

- Source data: HuggingFace https://huggingface.co/datasets/Alljoined/Alljoined-1.6M
- EMOTIV channel Afz renamed to AFz (standard 10-20 capitalization)
- Session label session_02 old sanitized to ses-02old for BIDS compliance
- 95 EMOTIV metadata channels excluded (only 32 EEG channels retained)
- Conversion validated with round-trip integrity checks (data amplitude, per-channel correlation, sampling frequency, event count, and event timing)

License and Terms of Use

This dataset is distributed under CC-BY-NC-ND-4.0 (Creative Commons Attribution-NonCommercial-NoDerivatives 4.0) with the following additional terms imposed by the Alljoined team. By using this dataset you agree to all conditions below.

1. Researcher shall use the Dataset only for non-commercial research and educational purposes, in accordance with Alljoined’s Terms of Use.
2. No Warranties: Alljoined makes no representations or warranties regarding the Dataset, including but not limited to warranties of non-infringement or fitness for a particular purpose.
3. Full Responsibility: Researcher accepts full responsibility for his or her use of the Dataset and shall defend and indemnify Alljoined, including their employees, officers and agents, against any and all claims arising from Researcher’s use of the Dataset.
4. Privacy Compliance: Researcher shall comply with Alljoined’s Privacy Policy and ensure that any use of the Dataset respects the privacy rights of individuals whose data may be included.
5. Sharing Rights: Researcher may provide research associates and colleagues with access to the Dataset provided that they first agree to be bound by these terms and conditions.
6. Termination Rights: Alljoined reserves the right to terminate Researcher’s access to the Dataset at any time.
7. Commercial Entity Binding: If Researcher is employed by a for-profit, commercial entity, Researcher’s employer shall also be bound by these terms and conditions, and Researcher hereby represents that he or she is fully authorized to enter into this agreement on behalf of such employer.
8. Governing Law: The law of the State of California shall apply to all disputes under this agreement.

- Full terms: https://www.alljoined.com/terms-of-use
- Privacy policy: https://www.alljoined.com/privacy-policy

References

Xu, J., Bruzadin Nunes, U., Jiang, W., Ryther, S., Pringle, J., Scotti, P. S., Delorme, A., & Kneeland, R. (2025). Alljoined-1.6M: A Million-Trial EEG-Image Dataset for Evaluating Affordable Brain-Computer Interfaces. https://doi.org/10.48550/arXiv.2508.18571

Xu, J.*, Aristimunha, B.*, Feucht, M. E.*, Qian, E., Liu, C., Shahjahan, T., … & Nestor, A. (2024). Alljoined–A dataset for EEG-to-Image decoding. Workshop Data Curation and Augmentation in Medical Imaging at 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1–9. https://doi.org/10.48550/arXiv.2404.05553

NEMAR Metadata


License: CC-BY-NC-ND-4.0

Authors:

Jonathan Xu
Ugo Bruzadin Nunes
Wangshu Jiang
Samuel Ryther
Jordan Pringle
… and 3 more

Versions:

Version	DOI	Released
`current`	10.82901/nemar.nm000134

§ 03 Cohort · Participants

Cohort

Dataset Statistics

Channel counts: 32 ch (n=1525 recordings)

Sampling frequencies: 256.0 Hz (n=1525 recordings)

Total recording duration: 129 h
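As a consistency check, the total duration and the recording count imply a mean recording length close to the roughly 5 minutes per run stated in the README. The sketch uses only the figures reported above; the variable names are illustrative.

```python
TOTAL_HOURS = 129      # total recording duration reported above
N_RECORDINGS = 1525
SFREQ = 256.0          # Hz

mean_minutes = TOTAL_HOURS * 60 / N_RECORDINGS
mean_samples = mean_minutes * 60 * SFREQ   # samples per channel, on average
print(round(mean_minutes, 2), round(mean_samples))
```

Since the 129 h figure is rounded, the result is an estimate rather than an exact per-run length.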

§ 04 Signal · Electrodes & trace

Signal · Electrodes & live trace

Fig. 01 Signal & montage 32 ch · EEG · 256 Hz · 20 subjects, 1525 recordings


Electrode layout — EEG · 32 sensors — 32 channels

NEMAR Processing Statistics

The plots below are generated by NEMAR’s automated EEG pipeline. The histogram shows pipeline success for data cleaning and ICA decomposition, the percentage of data frames and EEG channels retained after artefact removal, line noise per channel (RMS, dB), and the age/gender distribution of participants.

HED event descriptors word cloud

§ 05 Manifest · BIDS tree

Manifest

File Explorer

Browse the BIDS file structure of this dataset. Records are fetched on demand from the EEGDash catalog the first time you open the explorer.


§ 06 API · Programmatic access

API Reference

class eegdash.dataset.NM000134(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)

Bases: EEGDashDataset

Source: eegdash/dataset/registry.py

Alljoined-1.6M

Study: nm000134 (NeMAR)
Author (year): Xu2025
Canonical: —

Also importable as: NM000134, Xu2025.

Modality: eeg; Subject type: Unknown. Subjects: 20; recordings: 1525; tasks: 1.

Parameters:

cache_dir (str | Path) – Directory where data are cached locally.
query (dict | None) – Additional MongoDB-style filters to AND with the dataset selection. Must not contain the key dataset.
s3_bucket (str | None) – Base S3 bucket used to locate the data.
**kwargs (dict) – Additional keyword arguments forwarded to EEGDashDataset.

data_dir

Local dataset cache directory (cache_dir / dataset_id).

Type: Path

query

Merged query with the dataset filter applied.

Type: dict

records

Metadata records used to build the dataset, if pre-fetched.

Type: list[dict] | None

Notes

Each item is a recording; recording-level metadata are available via dataset.description. query supports MongoDB-style filters on fields in ALLOWED_QUERY_FIELDS and is combined with the dataset filter. Dataset-specific caveats are not provided in the summary metadata.

References

OpenNeuro dataset: https://openneuro.org/datasets/nm000134
NeMAR dataset: https://nemar.org/dataexplorer/detail?dataset_id=nm000134
DOI: https://doi.org/10.82901/nemar.nm000134

Examples

>>> from eegdash.dataset import NM000134
>>> dataset = NM000134(cache_dir="./data")
>>> recording = dataset[0]
>>> raw = recording.load()

__init__(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)

save(path: str, overwrite: bool = False, offset: int = 0)

Save datasets to files by creating one subdirectory for each dataset:

path/
    0/
        0-raw.fif | 0-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json  (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)
    1/
        1-raw.fif | 1-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json  (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)

Parameters:

path (str) – Directory in which subdirectories are created to store the -raw.fif | -epo.fif and .json files.
overwrite (bool) – Whether to delete old subdirectories that will be saved to in this call.
offset (int) – If provided, the integer is added to the id of the dataset in the concat. This is useful in the setting of very large datasets, where one dataset has to be processed and saved at a time to account for its original position.
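To make the role of `offset` concrete, the sketch below lists the subdirectories one would expect under `path`. It assumes, based on the parameter description above, that each subdirectory is named after the dataset's position plus `offset`; `expected_subdirs` is a hypothetical helper, not a library function.

```python
from pathlib import PurePosixPath


def expected_subdirs(path: str, n_datasets: int, offset: int = 0):
    """Subdirectory names under `path`, one per dataset, shifted by `offset`."""
    return [PurePosixPath(path) / str(i + offset) for i in range(n_datasets)]


# First batch of three recordings, then a second batch of two saved with offset=3.
first = [str(p) for p in expected_subdirs("out", 3)]
second = [str(p) for p in expected_subdirs("out", 2, offset=3)]
print(first, second)
```

Under that assumption, saving batches with increasing offsets fills one directory without name collisions.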

Access modes: MNE → braindecode → PyTorch → ML

- .raw (mne): MNE Raw object; standard tools (filter, epoch, ICA, plot_psd).
- BaseConcatDataset (braindecode): Each record is a lazy BaseDataset from braindecode, windowed via create_windows_from_events.
- DataLoader (pytorch): Wraps the windowed dataset into a PyTorch DataLoader; supports parallel workers and on-the-fly augmentations.
- Zarr cache (zarr): Optional braindecode Zarr mirror for fast resume; persisted to cache_dir.
- Hugging Face (huggingface): Pre-bundled mirror at EEGDash/nm000134; pull with datasets.load_dataset("EEGDash/nm000134").
- Croissant 1.0 (mlcommons): Machine-readable JSON-LD descriptor, NM000134.croissant.json (MLCommons schema, ingestible by PyTorch / TensorFlow / JAX).

Examples using EEGDash (curated · start here)

- Find datasets with the EEGDash API: Query the catalogue, filter by task or modality, list candidates.
- Load one EEG recording: Resolve a single record to an MNE Raw with channels and events.
- EEG recording to PyTorch DataLoader: Wrap braindecode windows in a DataLoader for model training.
- Preprocess EEG and create windows: Filter, resample, epoch, and persist the windowed dataset.
- Save and reload prepared data: Cache a windowed dataset to disk and reattach it without recompute.
- Download a dataset locally: Prefetch BIDS files to a local cache and validate the layout.

Swap any load_dataset(...) call for nm000134 to reproduce the tutorial on this dataset.

Citation

Jonathan Xu, Ugo Bruzadin Nunes, Wangshu Jiang, Samuel Ryther, Jordan Pringle, … (2025). Alljoined-1.6M. 10.82901/nemar.nm000134

Provenance

¹Contributed to nemar in BIDS format.

²Curated & ingested by the EEGDash catalog; see CITATION.cff for canonical reference.

³Persistent identifier: 10.82901/nemar.nm000134.

Related & sibling datasets

- NM000229: EEG · 29 subj
- NM000157: EEG · 19 subj
- ON004362: EEG · 109 subj
- ON003190: EEG · 19 subj
- NM000203: EEG · 13 subj


BIDS

BIDS 1.9.0

Sidecars

events · events.json · channels · eeg.json

Provenance

CC-BY-NC-ND-4.0 · 10.82901/nemar.nm000134

Machine-readable

schema.org/Dataset · Croissant

Mirrors

OpenNeuro · NEMAR · HuggingFace · Paper

Dataset ID	`NM000134`
Title	Alljoined-1.6M
Author (year)	`Xu2025`
Canonical	—
Importable as	`NM000134`, `Xu2025`
Year	2025
Authors	Jonathan Xu, Ugo Bruzadin Nunes, Wangshu Jiang, Samuel Ryther, Jordan Pringle, Paul S. Scotti, Arnaud Delorme, Reese Kneeland
License	CC-BY-NC-ND-4.0
Citation / DOI	10.82901/nemar.nm000134
Source links	OpenNeuro \| NeMAR

See Also#