NM000255: EEG dataset, 30 subjects#
The Brain, Body, and Behaviour Dataset (1.0.0) - Experiment 2
Access recordings and metadata through EEGDash.
Citation: Jens Madsen, Nikhil Kuppa, Lucas Parra (—). The Brain, Body, and Behaviour Dataset (1.0.0) - Experiment 2.
Modality: eeg Subjects: 30 Recordings: 291 License: CC BY 4.0 Source: nemar
Metadata: Complete (90%)
Quickstart#
Install
pip install eegdash
Access the data
from eegdash.dataset import NM000255
dataset = NM000255(cache_dir="./data")
# Get the raw object of the first recording
raw = dataset.datasets[0].raw
print(raw.info)
Filter by subject
dataset = NM000255(cache_dir="./data", subject="01")
Advanced query
dataset = NM000255(
    cache_dir="./data",
    query={"subject": {"$in": ["01", "02"]}},
)
Iterate recordings
for rec in dataset:
    print(rec.subject, rec.raw.info['sfreq'])
If you use this dataset in your research, please cite the original authors.
BibTeX
@dataset{nm000255,
  title  = {The Brain, Body, and Behaviour Dataset (1.0.0) - Experiment 2},
  author = {Jens Madsen and Nikhil Kuppa and Lucas Parra},
}
About This Dataset#
The Brain, Body, and Behaviour Dataset - Experiment 2
Summary:
Description: Subjects watched five videos, knowing they’d be tested afterward. After each video, they answered 11 to 12 factual multiple-choice questions. Videos and questions were presented in random order.
Subjects: 31. Sessions: 2
1. Attentive - Watch videos with focus and answer questions after
2. Distracted - Watch videos while counting backwards in your head, no test after watching
Tasks (Stimuli)
Experiment 2
| **Stimulus ID** | **Name** | **URL** |
|-----------------|------------------------------------------|---------------------------------------------------------|
| Stim-01 | Why are Stars Star-Shaped | [Watch Here](https://www.youtube.com/embed/VVAKFJ8VVp4) |
| Stim-02 | How Modern Light Bulbs Work | [Watch Here](https://www.youtube.com/embed/oCEKMEeZXug) |
| Stim-03 | The Immune System Explained – Bacteria | [Watch Here](https://www.youtube.com/embed/zQGOcOUBi6s) |
| Stim-04 | Who Invented the Internet - And Why | [Watch Here](https://www.youtube.com/embed/21eFwbb48sE) |
| Stim-05 | Why Do We Have More Boys Than Girls | [Watch Here](https://www.youtube.com/embed/3IaYhG11ckA) |
Modalities Recorded:
- Gaze (X, Y)
- Pupil Size
- Blinks
- Saccades
- EOG
- EEG
- ECG
Experiment Setup:
In Experiment 2, subjects watched 5 different informative videos, not knowing they would be questioned about the content of each video all together at the end of the video-watching; this made the experiment a test of incidental learning. We collected ECG, EEG, EOG, head motion, gaze coordinates, and pupil size.
Sessions:
- Ses-01 contains these signals recorded while subjects watched the videos in an attentive condition. They answered questions pertaining to each video after watching the videos one at a time.
- Ses-02 contains the data recorded while subjects watched all of the videos again in a distracted condition, in the same order as Ses-01. The distraction from the stimuli was to silently count backwards from a random prime number in steps of 7. No questions were asked after this session.
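The Ses-02 distraction task can be illustrated with a short sketch; the starting number and sequence length below are arbitrary examples, not values from the dataset:

```python
def distraction_counts(start: int, n: int) -> list[int]:
    """Generate the counting-backwards-by-7 sequence used as the
    Ses-02 distraction task. `start` stands in for the random prime
    each subject received; both arguments are illustrative."""
    return [start - 7 * i for i in range(n)]

print(distraction_counts(89, 4))  # [89, 82, 75, 68]
```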
Questionnaires:
Questionnaires and answers to them can be found in the phenotype/ directory.
1. stimuli_questionnaire:
The stimuli_questionnaire tsv and json files contain the questions, answer options, and correct answers.
- “Domain” questions test general domain knowledge and were asked before the subject watched a video.
- “Memory” questions test memory and were asked after the subject finished watching the video; they pertain directly to the video content.
2. asrs_questionnaire:
These tsv and json files contain the questions and answers for the ASRS questionnaire, an adult ADHD symptom checklist. Answers range from 0 (never) to 4 (very often).
- The test has two parts (Part A and Part B) and two scoring scales. The screen test covers just the first 6 questions (Part A); the full test covers all 18 questions (Part A + Part B). Despite having fewer questions, the screen test (first 6 questions) is a stronger indicator of ADHD prevalence than the full test, which is why both scoring scales are based on the screen test.
- Scale one is out of 6: each question contributes 1 point depending on the frequency of the symptom occurrence, and a score over 4 indicates a high chance of ADHD prevalence. Scale two is out of 24: the 5 frequencies of symptom occurrence (never, rarely, sometimes, often, very often) are assigned scores 0-4 in increasing order, and each question contributes the score of its chosen frequency. On this scale, the threshold for high ADHD prevalence is 18 out of 24.
The following are the question numbers for inattentive ADHD and hyperactive ADHD:
inattentive_questions = [1, 2, 3, 4, 7, 8, 9, 10, 11, 12]
hyperactive_questions = [5, 6, 13, 14, 15, 16, 17, 18]
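The two scoring scales described above can be sketched in Python. The per-question cutoff for the 6-point scale is not stated in the README, so the `point_threshold` used here (a response of "sometimes" or more often) is an assumption for illustration:

```python
INATTENTIVE_QUESTIONS = [1, 2, 3, 4, 7, 8, 9, 10, 11, 12]
HYPERACTIVE_QUESTIONS = [5, 6, 13, 14, 15, 16, 17, 18]

def asrs_screen_scores(responses, point_threshold=2):
    """Score ASRS Part A (the first 6 responses, each 0=never .. 4=very often).

    Returns (scale_of_6, scale_of_24). On the first scale a score over 4
    suggests high ADHD prevalence; on the second, the threshold is 18/24.
    The per-question cutoff `point_threshold` is an assumption, not from
    the README.
    """
    part_a = responses[:6]
    scale_of_6 = sum(1 for r in part_a if r >= point_threshold)
    scale_of_24 = sum(part_a)
    return scale_of_6, scale_of_24

print(asrs_screen_scores([4, 3, 2, 1, 0, 4]))  # (4, 14)
```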
General BIDS dataset structure overview for all MEVD experiments
Each BIDS dataset (one per experiment) has files at the root directory that describe the dataset, its participants, and related metadata - dataset_description.json, participants.tsv, and participants.json - providing essential information about the study and participants to anyone working with the dataset. In the root directory, the raw data of each participant is organized by subject (sub-XX) and then further divided into sessions (ses-XX) to accommodate multi-session data collection (some experiments have 2 sessions: attentive and distracted). Inside each session folder, you’ll find modality-specific subfolders, such as:
- eeg: Contains electroencephalogram data files (.bdf), event logs (events.tsv), and additional metadata (.json) describing the experiment and recording conditions.
- beh: Contains physiological recordings such as ECG (electrocardiogram) and EOG (electrooculogram), stored in compressed .tsv.gz files with accompanying metadata in .json files.
- eyetrack: Contains eye-tracking recordings (gaze coordinates and pupil size) and head-movement data, stored in compressed .tsv.gz files with accompanying metadata in .json files.
| Modality | Filename Format | Data File Extension | Metadata | Notes |
|-----------|---------------------------------------------------------------------|---------------------|----------------------------------------------|--------------------------------------------------------------------------------------------|
| EEG | `sub_xx-ses_xx-task-stimxx_{file_of_interest.extension}` | `.bdf` | Exists for each file as a `.json` file | There are event files with a `.tsv` extension that include start and end times in seconds.|
| Beh | `sub_xx-ses_xx-task-stimxx_recording-{modality}_physio.{extension}` | `.tsv.gz` | Exists for each file as a `.json` file | |
| EyeTrack | `sub_xx-ses_xx-task-stimxx_{modality}_eyetrack.{extension}` | `.tsv.gz` | Exists for each file as a `.json` file | |
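As an illustration of the naming patterns in the table, here is a small path builder. Note the table writes separators as `sub_xx-ses_xx-...`, while standard BIDS entity separators (`sub-xx_ses-xx_...`) are assumed below; the exact suffix is also an assumption, so verify against the actual files:

```python
def eeg_recording_path(sub: str, ses: str, stim: str) -> str:
    """Build the expected relative path of one raw EEG recording.

    Assumes standard BIDS separators; the `_eeg.bdf` suffix follows the
    data-file extension given in the table above.
    """
    name = f"sub-{sub}_ses-{ses}_task-stim{stim}_eeg.bdf"
    return f"sub-{sub}/ses-{ses}/eeg/{name}"

print(eeg_recording_path("01", "01", "01"))
# sub-01/ses-01/eeg/sub-01_ses-01_task-stim01_eeg.bdf
```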
There is also a derivatives directory containing preprocessed data derived from the raw recordings, such as filtered heart-rate data or preprocessed physiological signals, making it easy to apply advanced analyses. Files in this directory follow the same BIDS structure (subject-wise → session-wise → modality-wise). A brief overview of what you can expect in the derivatives directory:
- eeg: Filtered electroencephalogram (EEG) data files (.bdf).
- beh: Heart beats (r-peak timestamps synchronized with the stimulus), heart rates, filtered ECG, and breath rates, stored in compressed .tsv.gz files.
- eyetrack: Saccades (timestamps), saccade rates, fixations (timestamps), fixation rates, blinks (timestamps), blink rates, and filtered pupil and gaze files, all stored in compressed .tsv.gz files.
How to Navigate the Dataset
Top-Level Files: Files like dataset_description.json and participants.tsv give you an overview of the study and participants, serving as your starting point when exploring a dataset.
Derivatives Folder: Processed data in this folder is organized in the same way as the raw data in the BIDS directory.
Subject Folders (sub-XX): Inside these folders, data is organized by individual participants, providing separate directories for each person involved in the study.
Session Folders (ses-XX): For longitudinal or multi-session studies, session folders contain the raw and derived data for each session, making it easy to track and analyze data collected over time.
Modality-Specific Subfolders: Each session is further split into subfolders according to data modalities (e.g., EEG, behavior/physio), helping you to quickly locate the data of interest, whether it’s brain recordings, heart rate data, or eye movement information.
Tasks: Each subject is exposed to different stimuli during data collection, and BIDS uses “tasks” in filenames to clearly differentiate recordings based on these experimental conditions. This makes it easy to identify, retrieve, and analyze data associated with specific tasks across multiple modalities.
Dataset Overview
The table below breaks down the total hours of raw data available for each modality across the complete dataset:
| Modality | Hours |
|----------------------------|---------|
| EEG | 65 |
| ECG | 93.5 |
| EOG | 94.35 |
| Head | 92.46 |
| Gaze | 110.56 |
| Pupil | 110.56 |
| Respiration | 44.2 |
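For a quick sense of scale, the per-modality totals can be summed (values transcribed from the table above):

```python
# Hours of raw data per modality, as listed in the table above.
raw_hours = {
    "EEG": 65.0, "ECG": 93.5, "EOG": 94.35, "Head": 92.46,
    "Gaze": 110.56, "Pupil": 110.56, "Respiration": 44.2,
}
total_hours = sum(raw_hours.values())
print(round(total_hours, 2))  # 610.63
```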
Dataset Information#
| Field | Value |
|----------------|---------------------------------------------------------------|
| Dataset ID | NM000255 |
| Title | The Brain, Body, and Behaviour Dataset (1.0.0) - Experiment 2 |
| Author (year) | Madsen2024_E2 |
| Canonical | — |
| Importable as | NM000255, Madsen2024_E2 |
| Year | — |
| Authors | Jens Madsen, Nikhil Kuppa, Lucas Parra |
| License | CC BY 4.0 |
| Citation / DOI | Unknown |
| Source links | OpenNeuro, NeMAR, Source URL |
Found an issue with this dataset?
If you encounter any problems with this dataset (missing files, incorrect metadata, loading errors, etc.), please let us know!
Technical Details#
Subjects: 30
Recordings: 291
Tasks: 5
Channels: 64
Sampling rate (Hz): 128
Duration (hours): 0.0404166666666666
Pathology: Not specified
Modality: eeg
Type: —
Size on disk: 5.3 GB
File count: 291
Format: BIDS
License: CC BY 4.0
DOI: —
API Reference#
Use the NM000255 class to access this dataset programmatically.
- class eegdash.dataset.NM000255(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)[source]#
  Bases: EEGDashDataset
  The Brain, Body, and Behaviour Dataset (1.0.0) - Experiment 2
  - Study: nm000255 (NeMAR)
  - Author (year): Madsen2024_E2
  - Canonical: —
  - Also importable as: NM000255, Madsen2024_E2
  - Modality: eeg. Subjects: 30; recordings: 291; tasks: 5.
  - Parameters:
    - cache_dir (str | Path) – Directory where data are cached locally.
    - query (dict | None) – Additional MongoDB-style filters to AND with the dataset selection. Must not contain the key dataset.
    - s3_bucket (str | None) – Base S3 bucket used to locate the data.
    - **kwargs (dict) – Additional keyword arguments forwarded to EEGDashDataset.
- data_dir#
  Local dataset cache directory (cache_dir / dataset_id).
  Type: Path
- query#
  Merged query with the dataset filter applied.
  Type: dict
- records#
  Metadata records used to build the dataset, if pre-fetched.
  Type: list[dict] | None
Notes
Each item is a recording; recording-level metadata are available via dataset.description. query supports MongoDB-style filters on fields in ALLOWED_QUERY_FIELDS and is combined with the dataset filter. Dataset-specific caveats are not provided in the summary metadata.
References
OpenNeuro dataset: https://openneuro.org/datasets/nm000255
NeMAR dataset: https://nemar.org/dataexplorer/detail?dataset_id=nm000255
Examples
>>> from eegdash.dataset import NM000255
>>> dataset = NM000255(cache_dir="./data")
>>> recording = dataset[0]
>>> raw = recording.load()
See Also#
eegdash.dataset.EEGDashDataset, eegdash.dataset