DS007629: EEG dataset, 1 subject#
ROAMM
Access recordings and metadata through EEGDash.
Citation: Haorui Sun, Ardyn Vivienne Olszko, Niharika Singh, David C. Jangraw (2026). ROAMM. 10.18112/openneuro.ds007629.v1.0.2
Modality: eeg | Subjects: 1 | Recordings: 5 | License: CC0 | Source: openneuro
Metadata: Complete (100%)
Quickstart#
Install
pip install eegdash
Access the data
from eegdash.dataset import DS007629
dataset = DS007629(cache_dir="./data")
# Get the raw object of the first recording
raw = dataset.datasets[0].raw
print(raw.info)
Filter by subject
dataset = DS007629(cache_dir="./data", subject="01")
Advanced query
dataset = DS007629(
    cache_dir="./data",
    query={"subject": {"$in": ["01", "02"]}},
)
Iterate recordings
for rec in dataset:
    print(rec.subject, rec.raw.info['sfreq'])
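The iteration pattern above can be wrapped in a small helper that groups sampling rates by subject. The following is a minimal sketch, not part of EEGDash: `sampling_rates_by_subject` and `make_fake_recording` are hypothetical names, and the stand-in objects mimic only the two attributes used in the loop above (`rec.subject` and `rec.raw.info['sfreq']`), so the example runs without downloading data.

```python
from collections import defaultdict
from types import SimpleNamespace


def sampling_rates_by_subject(recordings):
    """Group the sampling rates seen in each recording by subject label."""
    rates = defaultdict(set)
    for rec in recordings:
        # Same attribute access as the loop above: rec.subject and rec.raw.info["sfreq"].
        rates[rec.subject].add(float(rec.raw.info["sfreq"]))
    return {subject: sorted(values) for subject, values in rates.items()}


def make_fake_recording(subject, sfreq):
    """Hypothetical stand-in exposing only the two attributes used here."""
    return SimpleNamespace(subject=subject, raw=SimpleNamespace(info={"sfreq": sfreq}))


# Five stand-in recordings for subject "01" at 256 Hz, mirroring the catalog summary.
fake_recordings = [make_fake_recording("01", 256.0) for _ in range(5)]
summary = sampling_rates_by_subject(fake_recordings)
print(summary)
```

If the real recording objects expose the same attributes, passing `dataset` instead of `fake_recordings` should give the equivalent summary.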
If you use this dataset in your research, please cite the original authors.
BibTeX
@dataset{ds007629,
  title = {ROAMM},
  author = {Haorui Sun and Ardyn Vivienne Olszko and Niharika Singh and David C. Jangraw},
  doi = {10.18112/openneuro.ds007629.v1.0.2},
  url = {https://doi.org/10.18112/openneuro.ds007629.v1.0.2},
}
About This Dataset#
ROAMM: Reading Observed At Mindless Moments
**ROAMM** is a large-scale multimodal dataset featuring simultaneous **EEG and eye-tracking** data collected during naturalistic reading with **span-level mind-wandering annotations**. ROAMM provides a benchmark dataset for mind-wandering (MW) detection and EEG-to-text decoding tasks, and enables the study of attention-related degradation in language decoding from brain activity in naturalistic reading.
Dataset Status
- **Synchronized ML Dataset:** For researchers looking for the pre-processed, synchronized EEG and eye-tracking data (Pickle format), please navigate to derivatives/synced/.
- **Linguistic Content:** Reading materials (words with coordinate information) are stored in derivatives/stimuli/wiki_stories. Each word is assigned a unique key to enable mapping fixated words back to their original corpus.
- **Raw EEG (BIDS):** **Work in Progress.** We are currently converting the full raw EEG dataset for all participants into BIDS-compliant format.
Project Details
- Task: Naturalistic reading of standardized articles with a retrospective self-report paradigm (ReMind task).
- Participants: 44 subjects (50+ hours of data).
- Modalities:
  - EEG (BioSemi ActiveTwo, 64 channels).
  - Simultaneous eye-tracking (SR Research EyeLink 1000 Plus).
  - Span-level mind-wandering annotations.
  - Reading comprehension scores (page-level, multiple-choice questions).
Structure
This repository follows the Brain Imaging Data Structure (BIDS).
- participants.tsv: Demographic information (age, sex, handedness, ADHD/Reading Disability status).
- derivatives/synced/: Synchronized multi-modal data frames ready for Machine Learning pipelines.
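As a minimal sketch of how the Pickle files under derivatives/synced/ might be located and read, the helpers below use only the standard library. Several things here are assumptions: the `.pkl` extension, the helper names `list_synced_files` and `load_synced`, and the demo file name and payload are all hypothetical, since the README does not describe the file naming or the object stored inside each file.

```python
import pickle
import tempfile
from pathlib import Path


def list_synced_files(root, pattern="*.pkl"):
    """Return pickle files found under <root>/derivatives/synced, sorted by path."""
    synced_dir = Path(root) / "derivatives" / "synced"
    return sorted(synced_dir.rglob(pattern))


def load_synced(path):
    """Unpickle one file. Only unpickle files obtained from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Demo on a throwaway directory so the sketch runs without downloading anything.
with tempfile.TemporaryDirectory() as tmp:
    synced = Path(tmp) / "derivatives" / "synced"
    synced.mkdir(parents=True)
    # Hypothetical file name and payload, for illustration only.
    with open(synced / "example.pkl", "wb") as handle:
        pickle.dump({"note": "placeholder"}, handle)

    found = list_synced_files(tmp)
    loaded = load_synced(found[0])

print([p.name for p in found], loaded)
```

With a local copy of the dataset, `root` would be the dataset directory. If the synchronized data frames are pandas objects, pandas needs to be installed for unpickling to succeed.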
Publication & Citation
The dataset paper describing the collection, synchronization, and baseline modeling of this data will be available online shortly. Once published, please use the citation provided here to credit the work.
Dataset Information#
Dataset ID | DS007629 |
Title | ROAMM |
Author (year) | Not specified |
Canonical | Not specified |
Importable as | DS007629 |
Year | 2026 |
Authors | Haorui Sun, Ardyn Vivienne Olszko, Niharika Singh, David C. Jangraw |
License | CC0 |
Citation / DOI | doi:10.18112/openneuro.ds007629.v1.0.2 |
Source links | OpenNeuro, NeMAR, Source URL |
Technical Details#
Subjects: 1
Recordings: 5
Tasks: 1
Channels: 64
Sampling rate (Hz): 256.0
Duration (hours): 0.9863834635416666
Pathology: Not specified
Modality: —
Type: —
Size on disk: 223.8 MB
File count: 5
Format: BIDS
License: CC0
DOI: doi:10.18112/openneuro.ds007629.v1.0.2
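The figures listed above can be cross-checked with a little arithmetic. The sketch below converts the listed duration to minutes and estimates the number of samples implied by the listed sampling rate. It assumes the duration is the total across all five recordings and that every recording uses the same rate and channel count, as the summary suggests. The variable names are illustrative only.

```python
DURATION_HOURS = 0.9863834635416666  # "Duration (hours)" from the list above
SAMPLING_RATE_HZ = 256.0             # "Sampling rate (Hz)"
N_RECORDINGS = 5                     # "Recordings"
N_CHANNELS = 64                      # "Channels"

total_seconds = DURATION_HOURS * 3600
total_minutes = total_seconds / 60

# Samples per channel summed over all recordings, then values across all channels.
samples_per_channel = round(total_seconds * SAMPLING_RATE_HZ)
total_values = samples_per_channel * N_CHANNELS

# Average length of one recording, assuming the total is split over five files.
mean_minutes_per_recording = total_minutes / N_RECORDINGS

print(f"total duration: {total_minutes:.1f} min")
print(f"samples per channel (all recordings): {samples_per_channel}")
print(f"values across {N_CHANNELS} channels: {total_values}")
print(f"mean recording length: {mean_minutes_per_recording:.1f} min")
```

The total rounds to 59 minutes, which agrees with the total recording duration reported under Dataset Statistics.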
Electrode Layout#
Electrode layout: EEG, 64 sensors (64 channels)
Dataset Statistics#
Age distribution (n=44, range 18–64 yr)
Sex distribution
Channel counts: 64 ch (n=5 recordings)
Sampling frequencies: 256.0 Hz (n=5 recordings)
Total recording duration: 59 min
NEMAR Processing Statistics#
The plots below are generated by NEMAR’s automated EEG pipeline. The histogram shows pipeline success for data cleaning and ICA decomposition, the percentage of data frames and EEG channels retained after artefact removal, line noise per channel (RMS, dB), and the age/gender distribution of participants.
HED event descriptors word cloud
File Explorer#
Browse the BIDS file structure of this dataset. Records are fetched on demand from the EEGDash catalog the first time you open the explorer.
API Reference#
Use the DS007629 class to access this dataset programmatically.
- class eegdash.dataset.DS007629(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)
Bases: EEGDashDataset
ROAMM
- Study: ds007629 (OpenNeuro)
- Author (year): not specified
- Canonical: not specified
Also importable as: DS007629.
Modality: eeg. Subjects: 1; recordings: 5; tasks: 1.
- Parameters:
cache_dir (str | Path) – Directory where data are cached locally.
query (dict | None) – Additional MongoDB-style filters to AND with the dataset selection. Must not contain the key dataset.
s3_bucket (str | None) – Base S3 bucket used to locate the data.
**kwargs (dict) – Additional keyword arguments forwarded to EEGDashDataset.
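The constraint on `query` can be illustrated with a small pre-flight check. This is a sketch under stated assumptions: `subject_query` and `check_query` are hypothetical helpers, not part of EEGDash, and the check only mirrors the documented rule that the key `dataset` must not appear in the query. How the library itself enforces that rule is not shown in this page.

```python
def subject_query(subjects):
    """Build a MongoDB-style `$in` filter on the subject field."""
    return {"subject": {"$in": [str(s) for s in subjects]}}


def check_query(query):
    """Reject a query that uses the reserved `dataset` key; return it otherwise."""
    if "dataset" in query:
        raise ValueError("query must not contain the key 'dataset'")
    return query


# A valid filter: restrict the selection to subject "01".
query = check_query(subject_query(["01"]))
print(query)

# An invalid filter: the dataset is already fixed by the class itself.
try:
    check_query({"dataset": "ds007629"})
except ValueError as err:
    print("rejected:", err)
```

The resulting dictionary has the same shape as the `query` argument in the Quickstart, so it could be passed as `DS007629(cache_dir="./data", query=query)`.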
- data_dir#
Local dataset cache directory (cache_dir / dataset_id).
- Type: Path
Notes
Each item is a recording; recording-level metadata are available via dataset.description. query supports MongoDB-style filters on fields in ALLOWED_QUERY_FIELDS and is combined with the dataset filter. Dataset-specific caveats are not provided in the summary metadata.
References
- OpenNeuro dataset: https://openneuro.org/datasets/ds007629
- NeMAR dataset: https://nemar.org/dataexplorer/detail?dataset_id=ds007629
- DOI: https://doi.org/10.18112/openneuro.ds007629.v1.0.2
Examples
>>> from eegdash.dataset import DS007629
>>> dataset = DS007629(cache_dir="./data")
>>> recording = dataset[0]
>>> raw = recording.load()
- __init__(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)
- save(path: str, overwrite: bool = False, offset: int = 0)
Save datasets to files by creating one subdirectory for each dataset:
path/
    0/
        0-raw.fif | 0-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)
    1/
        1-raw.fif | 1-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)
- Parameters:
path (str) – Directory in which subdirectories are created to store the -raw.fif | -epo.fif and .json files.
overwrite (bool) – Whether to delete old subdirectories that will be saved to in this call.
offset (int) – If provided, the integer is added to the id of the dataset in the concat. This is useful in the setting of very large datasets, where one dataset has to be processed and saved at a time to account for its original position.
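To make the effect of `offset` concrete, the sketch below lists the signal-file paths one would expect from the layout described above. It is an illustration, not the library's implementation: `expected_save_layout` is a hypothetical helper, and it assumes that the file-name prefix follows the shifted subdirectory index, which the layout suggests for offset 0 but which this page does not state explicitly for other offsets.

```python
def expected_save_layout(n_datasets, offset=0, kind="raw"):
    """List the <index>/<index>-<kind>.fif paths suggested by the layout above."""
    if kind not in ("raw", "epo"):
        raise ValueError("kind must be 'raw' or 'epo'")
    # Assumption: subdirectory name and file prefix are both (position + offset).
    return [f"{i + offset}/{i + offset}-{kind}.fif" for i in range(n_datasets)]


# Default numbering starts at 0, as in the documented tree.
print(expected_save_layout(2))

# With an offset, a chunk saved later keeps its position in the full dataset.
print(expected_save_layout(2, offset=10))
```

This is the situation the `offset` description refers to: when a very large dataset is processed and saved one piece at a time, each piece can be written under the index it would have had in the full concatenation.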
See Also#
eegdash.dataset.EEGDashDataset
eegdash.dataset