DS007629: EEG dataset, 1 subject
ROAMM
Citation: Haorui Sun, Ardyn Vivienne Olszko, Niharika Singh, David C. Jangraw (n.d.). ROAMM. 10.18112/openneuro.ds007629.v1.1.0
ROAMM: a one-participant EEG dataset.
Quickstart
Install
pip install eegdash
Access the data
from eegdash.dataset import DS007629
dataset = DS007629(cache_dir="./data")
# Get the raw object of the first recording
raw = dataset.datasets[0].raw
print(raw.info)
Filter by subject
dataset = DS007629(cache_dir="./data", subject="10014")
Advanced query
dataset = DS007629(
    cache_dir="./data",
    query={"subject": {"$in": ["10014"]}},
)
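The query argument uses MongoDB-style operators. As a rough illustration only, the sketch below emulates how a plain equality test or an `$in` filter selects recording records on the client side. The record dictionaries and the `matches` helper are hypothetical; the real filtering is done by the EEGDash catalog, not by this code.

```python
from typing import Any


def matches(record: dict[str, Any], query: dict[str, Any]) -> bool:
    """Hypothetical helper: True if `record` satisfies every field in `query`.

    Supports plain equality and the `$in` operator only.
    """
    for field, condition in query.items():
        value = record.get(field)
        if isinstance(condition, dict) and "$in" in condition:
            # $in: the field value must be one of the listed values
            if value not in condition["$in"]:
                return False
        elif value != condition:
            # plain equality
            return False
    return True


# Hypothetical catalog records (field names and values are illustrative).
records = [
    {"dataset": "ds007629", "subject": "10014", "run": "1"},
    {"dataset": "ds007629", "subject": "10014", "run": "2"},
    {"dataset": "ds000000", "subject": "01", "run": "1"},
]

query = {"dataset": "ds007629", "subject": {"$in": ["10014"]}}
selected = [r for r in records if matches(r, query)]
print(len(selected))  # 2
```

All conditions in one query dictionary must hold at once, which is why the third record is excluded by its dataset field even though the subject filter alone would not reject it.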
Iterate recordings
# Each element of dataset.datasets is one recording
for rec in dataset.datasets:
    print(rec.description["subject"], rec.raw.info["sfreq"])
If you use this dataset in your research, please cite the original authors.
BibTeX
@dataset{ds007629,
title = {ROAMM},
author = {Haorui Sun and Ardyn Vivienne Olszko and Niharika Singh and David C. Jangraw},
doi = {10.18112/openneuro.ds007629.v1.1.0},
url = {https://doi.org/10.18112/openneuro.ds007629.v1.1.0},
}
About This Dataset
**ROAMM** is a large-scale multimodal dataset featuring simultaneous **EEG and eye-tracking** data collected during naturalistic reading, with **span-level mind-wandering annotations**. ROAMM provides a benchmark dataset for mind-wandering (MW) detection and EEG-to-text decoding, and enables the study of how attention-related lapses degrade language decoding from brain activity during naturalistic reading.
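For MW detection, span-level annotations typically have to be turned into per-sample (or per-window) labels. A minimal sketch, assuming spans are given as hypothetical (onset, offset) pairs in seconds from the recording start and using the 256 Hz sampling rate reported for this dataset; the actual annotation format in ROAMM may differ.

```python
import math

SFREQ = 256.0  # sampling frequency reported for this dataset's recordings


def spans_to_labels(n_samples: int, spans: list[tuple[float, float]],
                    sfreq: float = SFREQ) -> list[int]:
    """Return a 0/1 label per sample: 1 inside any mind-wandering span.

    `spans` are hypothetical (onset_s, offset_s) pairs in seconds,
    treated as half-open intervals: onset included, offset excluded.
    """
    labels = [0] * n_samples
    for onset, offset in spans:
        # sample i (time i / sfreq) is inside the span iff onset <= t < offset
        start = max(0, math.ceil(onset * sfreq))
        stop = min(n_samples, math.ceil(offset * sfreq))
        for i in range(start, stop):
            labels[i] = 1
    return labels


labels = spans_to_labels(n_samples=10 * 256, spans=[(2.0, 3.5), (8.0, 9.0)])
print(sum(labels))  # 640
```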
**Synchronized ML dataset:** Researchers looking for the preprocessed, synchronized EEG and eye-tracking data (Pickle format) will find it in:
derivatives/synced/
ROAMM: Reading Observed At Mindless Moments
**Linguistic content:** Reading materials (words with coordinate information) are stored in derivatives/stimuli/wiki_stories. Each word is assigned a unique key so that fixated words can be mapped back to their original corpus.
**Raw EEG (BIDS):** Work in progress. We are currently converting the full raw EEG dataset for all participants into BIDS-compliant format.
Project Details
Task: Naturalistic reading of standardized articles with a retrospective self-report paradigm (ReMind task).
Participants: 44 subjects (50+ hours of data).
Modalities:
- EEG (BioSemi ActiveTwo, 64 channels).
- Simultaneous eye tracking (SR Research EyeLink 1000 Plus).
- Span-level mind-wandering annotations.
- Reading comprehension scores (page-level, multiple-choice questions).
Structure
This repository follows the Brain Imaging Data Structure (BIDS).
- participants.tsv: Demographic information (age, sex, handedness, ADHD/reading disability status).
- derivatives/synced/: Synchronized multimodal data frames ready for machine learning pipelines.
Publication & Citation
The dataset paper describing the collection, synchronization, and baseline modeling of this data will be available online shortly. Once published, please use the citation provided here to credit the work.
Cohort
Dataset Statistics
Age distribution by gender (n=1, age 21 yr)
Sex composition
Channel counts: 64 ch (n=5 recordings)
Sampling frequencies: 256.0 Hz (n=5 recordings)
Total recording duration: 59 min
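These statistics fix the approximate size of the data. A back-of-the-envelope estimate, assuming all 59 minutes are continuous 64-channel data at 256 Hz and that samples are held as float64 once loaded into memory (MNE's default); the duration is rounded, so the figures are approximate.

```python
SFREQ_HZ = 256.0
N_CHANNELS = 64
N_RECORDINGS = 5
TOTAL_MIN = 59  # total recording duration, rounded to whole minutes

samples_per_channel = int(TOTAL_MIN * 60 * SFREQ_HZ)  # samples in each channel
total_values = samples_per_channel * N_CHANNELS        # across all channels
float64_bytes = total_values * 8                       # 8 bytes per float64 sample
mean_run_min = TOTAL_MIN / N_RECORDINGS                # average recording length

print(samples_per_channel)          # 906240
print(total_values)                 # 57999360
print(round(float64_bytes / 1e6))   # 464  (about 464 MB in memory)
print(mean_run_min)                 # 11.8
```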
Signal · Electrodes & live trace#
Representative recording: sub-10014 · task-ReMind · run-1, one of 5 recordings from 1 subject in this dataset.
Electrode layout: EEG, 64 channels
NEMAR Processing Statistics
The plots below are generated by NEMAR's automated EEG pipeline. They show pipeline success for data cleaning and ICA decomposition, the percentage of data frames and EEG channels retained after artefact removal, line noise per channel (RMS, dB), and the age/gender distribution of participants.
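NEMAR's exact line-noise metric is not described here. As an illustration only, one common way to express line noise per channel is to estimate the amplitude of the mains-frequency component and convert its RMS to decibels. The sketch assumes 60 Hz mains (an assumption, not stated in the source), microvolt units, and a 1 µV dB reference. It projects the signal onto a cosine/sine pair at the line frequency (a single DFT bin), which is exact when the window holds a whole number of cycles.

```python
import math


def line_noise_rms_db(x: list[float], sfreq: float, line_hz: float = 60.0) -> float:
    """Estimate the RMS of the line-frequency component of `x`, in dB re 1 unit.

    Projects `x` onto cos/sin at `line_hz`; exact when the window spans
    an integer number of line cycles. 60 Hz is an assumed default.
    """
    n = len(x)
    c = sum(v * math.cos(2 * math.pi * line_hz * i / sfreq) for i, v in enumerate(x))
    s = sum(v * math.sin(2 * math.pi * line_hz * i / sfreq) for i, v in enumerate(x))
    amplitude = 2.0 * math.hypot(c, s) / n  # peak amplitude of the sinusoid
    rms = amplitude / math.sqrt(2.0)        # RMS of a sinusoid = peak / sqrt(2)
    return 20.0 * math.log10(rms)


sfreq = 256.0
# One second of synthetic data: a 10 µV, 60 Hz sinusoid plus a 5 µV, 10 Hz rhythm.
x = [10.0 * math.sin(2 * math.pi * 60 * i / sfreq)
     + 5.0 * math.sin(2 * math.pi * 10 * i / sfreq) for i in range(256)]
print(round(line_noise_rms_db(x, sfreq), 2))  # 16.99
```

The 10 Hz component does not leak into the estimate because, over one full second, it is orthogonal to the 60 Hz basis.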
HED event descriptors word cloud
Manifest
Full dataset metadata table
| Field | Value |
|---|---|
| Dataset ID | ds007629 |
| Title | ROAMM |
| Author (year) | n/a |
| Canonical | n/a |
| Importable as | DS007629 |
| Year | n/a |
| Authors | Haorui Sun, Ardyn Vivienne Olszko, Niharika Singh, David C. Jangraw |
| License | CC0 |
| Citation / DOI | 10.18112/openneuro.ds007629.v1.1.0 |
| Source links | OpenNeuro, NeMAR, Source URL |
API Reference
class eegdash.dataset.DS007629(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)
ROAMM
- Study: ds007629 (OpenNeuro)
- Author (year): n/a
- Canonical: n/a
- Also importable as: DS007629
- Modality: EEG. Subjects: 1; recordings: 5; tasks: 1.
Parameters:
cache_dir (str | Path) – Directory where data are cached locally.
query (dict | None) – Additional MongoDB-style filters, combined (AND) with the dataset selection. Must not contain the key dataset.
s3_bucket (str | None) – Base S3 bucket used to locate the data.
**kwargs (dict) – Additional keyword arguments forwarded to EEGDashDataset.
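The query parameter is ANDed with the dataset selection and must not contain the key dataset. A minimal sketch of that rule, using a hypothetical `build_query` helper and a plain `$and` combination; EEGDash's actual implementation may build the combined filter differently.

```python
from typing import Any, Optional


def build_query(dataset_id: str, query: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Hypothetical helper: AND a user filter with the fixed dataset filter."""
    base = {"dataset": dataset_id}
    if not query:
        return base
    if "dataset" in query:
        # The dataset is fixed by the class (here DS007629), so overriding it is rejected.
        raise ValueError("query must not contain the key 'dataset'")
    return {"$and": [base, query]}


print(build_query("ds007629", {"subject": "10014"}))
# {'$and': [{'dataset': 'ds007629'}, {'subject': '10014'}]}
```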
data_dir
Local dataset cache directory (cache_dir / dataset_id).
Type: Path
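Per the attribute description, data_dir is the cache directory joined with the dataset ID. For the quickstart's cache_dir the path works out as follows (pathlib only, no EEGDash calls; the variable names are illustrative):

```python
from pathlib import Path

cache_dir = Path("./data")  # the cache_dir passed to DS007629(...)
dataset_id = "ds007629"
data_dir = cache_dir / dataset_id  # cache_dir / dataset_id
print(data_dir.as_posix())  # data/ds007629
```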
Notes
Each item is a recording; recording-level metadata are available via dataset.description. query supports MongoDB-style filters on fields in ALLOWED_QUERY_FIELDS and is combined with the dataset filter. Dataset-specific caveats are not provided in the summary metadata.
References
OpenNeuro dataset: https://openneuro.org/datasets/ds007629
NeMAR dataset: https://nemar.org/dataexplorer/detail?dataset_id=ds007629
DOI: https://doi.org/10.18112/openneuro.ds007629.v1.1.0
Examples
>>> from eegdash.dataset import DS007629
>>> dataset = DS007629(cache_dir="./data")
>>> recording = dataset[0]
>>> raw = recording.load()
__init__(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)
save(path: str, overwrite: bool = False, offset: int = 0)
Save datasets to files by creating one subdirectory for each dataset:
path/
    0/
        0-raw.fif | 0-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)
    1/
        1-raw.fif | 1-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)
Parameters:
path (str) – Directory in which subdirectories are created to store the -raw.fif | -epo.fif and .json files.
overwrite (bool) – Whether to delete old subdirectories that will be saved to in this call.
offset (int) – If provided, the integer is added to the id of each dataset in the concat. This is useful for very large datasets that must be processed and saved one dataset at a time, because it preserves each dataset's original position.
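To make the directory layout and the effect of offset concrete, the sketch below lists the core files save() is documented to write, relative to path. It is a hypothetical helper that mirrors the layout description only; it writes nothing and is not the library's code. The optional *_kwargs.json and target_name.json files are left out for brevity.

```python
def expected_save_layout(n_datasets: int, offset: int = 0, kind: str = "raw") -> list[str]:
    """Hypothetical helper: core files save() writes, relative to `path`.

    `kind` is "raw" or "epo", matching the <id>-raw.fif | <id>-epo.fif entries.
    Optional *_kwargs.json and target_name.json files are omitted.
    """
    paths = []
    for i in range(n_datasets):
        idx = i + offset  # offset shifts every subdirectory id
        paths.append(f"{idx}/{idx}-{kind}.fif")
        paths.append(f"{idx}/description.json")
    return paths


# Saving recordings one at a time: the third recording (index 2) keeps id 2.
print(expected_save_layout(1, offset=2))
# ['2/2-raw.fif', '2/description.json']
```

Without the offset, saving each recording in its own call would write every one of them to subdirectory 0/, and the per-call overwrite flag would then clobber earlier results.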
braindecode: each recording is a braindecode BaseDataset, windowed via create_windows_from_events. PyTorch: the DataLoader supports parallel workers and on-the-fly augmentations. Swap any load_dataset(...) call for ds007629 to reproduce the tutorial on this dataset.
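Windowing splits each continuous recording into fixed-size training examples. The sketch below computes the start index of every full fixed-length, possibly overlapping window in pure Python. It illustrates the arithmetic only: it is closer to fixed-length windowing (as in braindecode's create_fixed_length_windows) than to the event-locked create_windows_from_events, and it is not braindecode's implementation. The 2 s window, 1 s stride, and 60 s excerpt are arbitrary example values.

```python
def window_starts(n_samples: int, window_size: int, stride: int) -> list[int]:
    """Start index of every full window that fits in n_samples.

    A trailing partial window is dropped.
    """
    if window_size > n_samples:
        return []
    return list(range(0, n_samples - window_size + 1, stride))


sfreq = 256                # sampling rate reported for this dataset
n_samples = 60 * sfreq     # a hypothetical 60 s excerpt
starts = window_starts(n_samples, window_size=2 * sfreq, stride=1 * sfreq)
print(len(starts), starts[:3], starts[-1])  # 59 [0, 256, 512] 14848
```

The count follows (n_samples - window_size) // stride + 1 = (15360 - 512) // 256 + 1 = 59.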
Citation
Haorui Sun, Ardyn Vivienne Olszko, Niharika Singh, David C. Jangraw (n.d.). ROAMM. 10.18112/openneuro.ds007629.v1.1.0
Provenance
¹Contributed to OpenNeuro in BIDS format.
²Curated & ingested by the EEGDash catalog; see CITATION.cff for canonical reference.
³Persistent identifier: 10.18112/openneuro.ds007629.v1.1.0.
See Also
eegdash.dataset.EEGDashDataset
eegdash.dataset