NM000255: EEG dataset, 30 subjects#
The Brain, Body, and Behaviour Dataset (1.0.0) - Experiment 2
Access recordings and metadata through EEGDash.
Citation: Jens Madsen, Nikhil Kuppa, Lucas Parra (—). The Brain, Body, and Behaviour Dataset (1.0.0) - Experiment 2.
Modality: eeg Subjects: 30 Recordings: 291 License: CC BY 4.0 Source: nemar
Metadata: Complete (90%)
Quickstart#
Install
pip install eegdash
Access the data
from eegdash.dataset import NM000255
dataset = NM000255(cache_dir="./data")
# Get the raw object of the first recording
raw = dataset.datasets[0].raw
print(raw.info)
Filter by subject
dataset = NM000255(cache_dir="./data", subject="01")
Advanced query
dataset = NM000255(
    cache_dir="./data",
    query={"subject": {"$in": ["01", "02"]}},
)
Iterate recordings
for rec in dataset:
    print(rec.subject, rec.raw.info['sfreq'])
If you use this dataset in your research, please cite the original authors.
BibTeX
@dataset{nm000255,
  title  = {The Brain, Body, and Behaviour Dataset (1.0.0) - Experiment 2},
  author = {Jens Madsen and Nikhil Kuppa and Lucas Parra},
}
About This Dataset#
The Brain, Body, and Behaviour Dataset - Experiment 2
Summary:
Description: Subjects watched five videos, knowing they’d be tested afterward. After each video, they answered 11 to 12 factual multiple-choice questions. Videos and questions were presented in random order.
Subjects: 31. Sessions: 2
1. Attentive - Watch videos with focus and answer questions after
2. Distracted - Watch videos while counting backwards in your head, no test after watching
Tasks (Stimuli)
Experiment 2
| **Stimulus ID** | **Name** | **URL** |
|-----------------|------------------------------------------|---------------------------------------------------------|
| Stim-01 | Why are Stars Star-Shaped | [Watch Here](https://www.youtube.com/embed/VVAKFJ8VVp4) |
| Stim-02 | How Modern Light Bulbs Work | [Watch Here](https://www.youtube.com/embed/oCEKMEeZXug) |
| Stim-03 | The Immune System Explained – Bacteria | [Watch Here](https://www.youtube.com/embed/zQGOcOUBi6s) |
| Stim-04 | Who Invented the Internet - And Why | [Watch Here](https://www.youtube.com/embed/21eFwbb48sE) |
| Stim-05 | Why Do We Have More Boys Than Girls | [Watch Here](https://www.youtube.com/embed/3IaYhG11ckA) |
Modalities Recorded:
- Gaze (X, Y)
- Pupil Size
- Blinks
- Saccades
- EOG
- EEG
- ECG
Experiment Setup:
In Experiment 2, subjects watched 5 different informative videos, not knowing they would be questioned about the content of each video all together at the end of the video-watching; this made the experiment a test of incidental learning. We collected ECG, EEG, EOG, head motion, gaze coordinates, and pupil size.
Sessions:
- Ses-01 contains these signals recorded while subjects watched the videos in an attentive condition. They answered questions pertaining to each video after watching the videos one at a time.
- Ses-02 contains the data recorded while subjects watched all of the videos again in a distracted condition, in the same order as Ses-01. The distraction from the stimuli was to silently count backwards from a random prime number in steps of 7. No questions were asked after this session.
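The Ses-02 distraction task can be illustrated with a short sketch; the starting number and sequence length below are arbitrary examples, not values from the dataset:

```python
def distraction_counts(start: int, n: int) -> list[int]:
    """Generate the counting-backwards-by-7 sequence used as the
    Ses-02 distraction task. `start` stands in for the random prime
    each subject received; both arguments are illustrative."""
    return [start - 7 * i for i in range(n)]

print(distraction_counts(89, 4))  # [89, 82, 75, 68]
```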
Questionnaires:
Questionnaires and answers to them can be found in the phenotype/ directory.
1. stimuli_questionnaire:
The stimuli_questionnaire tsv and json files contain the questions, answer options, and correct answers.
- “Domain” questions test general domain knowledge and were asked before the subject watched a video.
- “Memory” questions test memory and were asked after the subject finished watching the video; they pertain directly to the video content.
2. asrs_questionnaire:
These tsv and json files contain the questions and answers for the ASRS questionnaire, an adult ADHD symptom checklist. Answers range from 0 (never) to 4 (very often).
- The test has two parts (Part A and Part B) and two scoring scales. The screen test covers just the first 6 questions (Part A); the full test covers all 18 questions (Part A + Part B). Despite having fewer questions, the screen test (first 6 questions) is a stronger indicator of ADHD prevalence than the full test, which is why both scoring scales are based on the screen test.
- Scale one is out of 6: each question contributes 1 point depending on the frequency of the symptom occurrence, and a score over 4 indicates a high chance of ADHD prevalence. Scale two is out of 24: the 5 frequencies of symptom occurrence (never, rarely, sometimes, often, very often) are assigned scores 0-4 in increasing order, and each question contributes the score of its chosen frequency. On this scale, the threshold for high ADHD prevalence is 18 out of 24.
The following are the question numbers for inattentive ADHD and hyperactive ADHD:
inattentive_questions = [1, 2, 3, 4, 7, 8, 9, 10, 11, 12]
hyperactive_questions = [5, 6, 13, 14, 15, 16, 17, 18]
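The two scoring scales described above can be sketched in Python. The per-question cutoff for the 6-point scale is not stated in the README, so the `point_threshold` used here (a response of "sometimes" or more often) is an assumption for illustration:

```python
INATTENTIVE_QUESTIONS = [1, 2, 3, 4, 7, 8, 9, 10, 11, 12]
HYPERACTIVE_QUESTIONS = [5, 6, 13, 14, 15, 16, 17, 18]

def asrs_screen_scores(responses, point_threshold=2):
    """Score ASRS Part A (the first 6 responses, each 0=never .. 4=very often).

    Returns (scale_of_6, scale_of_24). On the first scale a score over 4
    suggests high ADHD prevalence; on the second, the threshold is 18/24.
    The per-question cutoff `point_threshold` is an assumption, not from
    the README.
    """
    part_a = responses[:6]
    scale_of_6 = sum(1 for r in part_a if r >= point_threshold)
    scale_of_24 = sum(part_a)
    return scale_of_6, scale_of_24

print(asrs_screen_scores([4, 3, 2, 1, 0, 4]))  # (4, 14)
```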
General BIDS dataset structure overview for all MEVD experiments
Each BIDS dataset (one per experiment) has files at the root directory that describe the dataset, its participants, and related metadata - dataset_description.json, participants.tsv, and participants.json - providing essential information about the study and participants to anyone working with the dataset. In the root directory, the raw data of each participant is organized by subject (sub-XX) and then further divided into sessions (ses-XX) to accommodate multi-session data collection (some experiments have 2 sessions: attentive and distracted). Inside each session folder, you’ll find modality-specific subfolders, such as:
- eeg: Contains electroencephalogram data files (.bdf), event logs (events.tsv), and additional metadata (.json) describing the experiment and recording conditions.
- beh: Contains physiological recordings such as ECG (electrocardiogram) and EOG (electrooculogram), stored in compressed .tsv.gz files with accompanying metadata in .json files.
- eyetrack: Contains eye-tracking recordings (gaze coordinates and pupil size) and head-movement data, stored in compressed .tsv.gz files with accompanying metadata in .json files.
| Modality | Filename Format | Data File Extension | Metadata | Notes |
|-----------|---------------------------------------------------------------------|---------------------|----------------------------------------------|--------------------------------------------------------------------------------------------|
| EEG | `sub_xx-ses_xx-task-stimxx_{file_of_interest.extension}` | `.bdf` | Exists for each file as a `.json` file | There are event files with a `.tsv` extension that include start and end times in seconds.|
| Beh | `sub_xx-ses_xx-task-stimxx_recording-{modality}_physio.{extension}` | `.tsv.gz` | Exists for each file as a `.json` file | |
| EyeTrack | `sub_xx-ses_xx-task-stimxx_{modality}_eyetrack.{extension}` | `.tsv.gz` | Exists for each file as a `.json` file | |
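As an illustration of the naming patterns in the table, here is a small path builder. Note the table writes separators as `sub_xx-ses_xx-...`, while standard BIDS entity separators (`sub-xx_ses-xx_...`) are assumed below; the exact suffix is also an assumption, so verify against the actual files:

```python
def eeg_recording_path(sub: str, ses: str, stim: str) -> str:
    """Build the expected relative path of one raw EEG recording.

    Assumes standard BIDS separators; the `_eeg.bdf` suffix follows the
    data-file extension given in the table above.
    """
    name = f"sub-{sub}_ses-{ses}_task-stim{stim}_eeg.bdf"
    return f"sub-{sub}/ses-{ses}/eeg/{name}"

print(eeg_recording_path("01", "01", "01"))
# sub-01/ses-01/eeg/sub-01_ses-01_task-stim01_eeg.bdf
```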
There is also a derivatives directory containing preprocessed data derived from the raw recordings, such as filtered heart-rate data or preprocessed physiological signals, making it easy to apply advanced analyses. Files in this directory follow the same BIDS structure (subject-wise → session-wise → modality-wise). A brief overview of what you can expect in the derivatives directory:
- eeg: Filtered electroencephalogram (EEG) data files (.bdf).
- beh: Heart beats (r-peak timestamps synchronized with the stimulus), heart rates, filtered ECG, and breath rates, stored in compressed .tsv.gz files.
- eyetrack: Saccades (timestamps), saccade rates, fixations (timestamps), fixation rates, blinks (timestamps), blink rates, and filtered pupil and gaze files, all stored in compressed .tsv.gz files.
How to Navigate the Dataset
Top-Level Files: Files like dataset_description.json and participants.tsv give you an overview of the study and participants, serving as your starting point when exploring a dataset.
Derivatives Folder: Processed data in this folder is organized in the same way as the raw data in the BIDS directory.
Subject Folders (sub-XX): Inside these folders, data is organized by individual participants, providing separate directories for each person involved in the study.
Session Folders (ses-XX): For longitudinal or multi-session studies, session folders contain the raw and derived data for each session, making it easy to track and analyze data collected over time.
Modality-Specific Subfolders: Each session is further split into subfolders according to data modalities (e.g., EEG, behavior/physio), helping you to quickly locate the data of interest, whether it’s brain recordings, heart rate data, or eye movement information.
Tasks: Each subject is exposed to different stimuli during data collection, and BIDS uses “tasks” in filenames to clearly differentiate recordings based on these experimental conditions. This makes it easy to identify, retrieve, and analyze data associated with specific tasks across multiple modalities.
Dataset Overview
The table below breaks down the total hours of raw data available for each modality across the complete dataset:
| Modality | Hours |
|----------------------------|---------|
| EEG | 65 |
| ECG | 93.5 |
| EOG | 94.35 |
| Head | 92.46 |
| Gaze | 110.56 |
| Pupil | 110.56 |
| Respiration | 44.2 |
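For a quick sense of scale, the per-modality totals can be summed (values transcribed from the table above):

```python
# Hours of raw data per modality, as listed in the table above.
raw_hours = {
    "EEG": 65.0, "ECG": 93.5, "EOG": 94.35, "Head": 92.46,
    "Gaze": 110.56, "Pupil": 110.56, "Respiration": 44.2,
}
total_hours = sum(raw_hours.values())
print(round(total_hours, 2))  # 610.63
```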
Dataset Information#
| Field | Value |
|----------------|---------------------------------------------------------------|
| Dataset ID | NM000255 |
| Title | The Brain, Body, and Behaviour Dataset (1.0.0) - Experiment 2 |
| Author (year) | Madsen2024_E2 |
| Canonical | — |
| Importable as | NM000255, Madsen2024_E2 |
| Year | — |
| Authors | Jens Madsen, Nikhil Kuppa, Lucas Parra |
| License | CC BY 4.0 |
| Citation / DOI | Unknown |
| Source links | OpenNeuro, NeMAR, Source URL |
Found an issue with this dataset?
If you encounter any problems with this dataset (missing files, incorrect metadata, loading errors, etc.), please let us know!
Technical Details#
Subjects: 30
Recordings: 291
Tasks: 5
Channels: 64
Sampling rate (Hz): 128
Duration (hours): 0.0404166666666666
Pathology: Not specified
Modality: eeg
Type: —
Size on disk: 5.3 GB
File count: 291
Format: BIDS
License: CC BY 4.0
DOI: —
API Reference#
Use the NM000255 class to access this dataset programmatically.
- class eegdash.dataset.NM000255(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)[source]#
  Bases: EEGDashDataset
  The Brain, Body, and Behaviour Dataset (1.0.0) - Experiment 2
  - Study: nm000255 (NeMAR)
  - Author (year): Madsen2024_E2
  - Canonical: —
  - Also importable as: NM000255, Madsen2024_E2
  - Modality: eeg. Subjects: 30; recordings: 291; tasks: 5.
  - Parameters:
    - cache_dir (str | Path) – Directory where data are cached locally.
    - query (dict | None) – Additional MongoDB-style filters to AND with the dataset selection. Must not contain the key dataset.
    - s3_bucket (str | None) – Base S3 bucket used to locate the data.
    - **kwargs (dict) – Additional keyword arguments forwarded to EEGDashDataset.
- data_dir#
  Local dataset cache directory (cache_dir / dataset_id).
  Type: Path
- query#
  Merged query with the dataset filter applied.
  Type: dict
- records#
  Metadata records used to build the dataset, if pre-fetched.
  Type: list[dict] | None
Notes
Each item is a recording; recording-level metadata are available via dataset.description. query supports MongoDB-style filters on fields in ALLOWED_QUERY_FIELDS and is combined with the dataset filter. Dataset-specific caveats are not provided in the summary metadata.
References
OpenNeuro dataset: https://openneuro.org/datasets/nm000255
NeMAR dataset: https://nemar.org/dataexplorer/detail?dataset_id=nm000255
Examples
>>> from eegdash.dataset import NM000255
>>> dataset = NM000255(cache_dir="./data")
>>> recording = dataset[0]
>>> raw = recording.load()
See Also#
eegdash.dataset.EEGDashDataset, eegdash.dataset