DS006104: EEG dataset, 24 subjects#
EEG dataset for speech decoding
Access recordings and metadata through EEGDash.
Citation: João Pedro Carvalho Moreira, Vinícius Rezende Carvalho, Eduardo Mazoni Andrade Marçal Mendes, Ariah Fallah, Terrence J. Sejnowski, Claudia Lainscsek, Lindy Comstock (2025). EEG dataset for speech decoding. 10.18112/openneuro.ds006104.v1.0.1
Modality: EEG · Subjects: 24 · Recordings: 56 · License: CC0 · Source: OpenNeuro
Metadata: Complete (100%)
Quickstart#
Install
pip install eegdash
Access the data
from eegdash.dataset import DS006104
dataset = DS006104(cache_dir="./data")
# Get the raw object of the first recording
raw = dataset.datasets[0].raw
print(raw.info)
Filter by subject
dataset = DS006104(cache_dir="./data", subject="01")
Advanced query
dataset = DS006104(
    cache_dir="./data",
    query={"subject": {"$in": ["01", "02"]}},
)
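For intuition, here is a minimal sketch of how a MongoDB-style `$in` filter selects records. This is pure Python written for illustration, not EEGDash's actual query implementation:

```python
def matches(record: dict, query: dict) -> bool:
    """Check a record against a flat MongoDB-style query.

    Supports exact matches and the `$in` operator, which is all the
    advanced-query example above uses.
    """
    for field, cond in query.items():
        value = record.get(field)
        if isinstance(cond, dict) and "$in" in cond:
            # `$in`: the field value must be one of the listed values
            if value not in cond["$in"]:
                return False
        elif value != cond:
            return False
    return True

records = [{"subject": "01"}, {"subject": "02"}, {"subject": "03"}]
query = {"subject": {"$in": ["01", "02"]}}
print([r["subject"] for r in records if matches(r, query)])  # ['01', '02']
```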
Iterate recordings
for rec in dataset:
    print(rec.subject, rec.raw.info["sfreq"])
If you use this dataset in your research, please cite the original authors.
BibTeX
@dataset{ds006104,
  title  = {EEG dataset for speech decoding},
  author = {João Pedro Carvalho Moreira and Vinícius Rezende Carvalho and Eduardo Mazoni Andrade Marçal Mendes and Ariah Fallah and Terrence J. Sejnowski and Claudia Lainscsek and Lindy Comstock},
  doi    = {10.18112/openneuro.ds006104.v1.0.1},
  url    = {https://doi.org/10.18112/openneuro.ds006104.v1.0.1},
}
About This Dataset#
EEG dataset for speech decoding
Dataset Overview
This dataset contains EEG recordings from a phoneme discrimination task with TMS. The data were collected during two related studies in 2019 and 2021.
Study 1 (2019, Session 01):
- 8 participants (P01-P08)
- Focus on CV and VC phoneme pairs
- 2 blocks: CV pairs and VC pairs
- TMS targeted to LipM1 (-56, -8, 46) and TongueM1 (-60, -10, 25)
Study 2 (2021, Session 02):
- 16 participants (S01-S16)
- Expanded to include single phonemes and phoneme triplets
- 4 blocks: single phonemes, CV pairs, real words, and pseudowords
- Additional TMS targets included Broca’s area (BA 44: -51, 7, 23) and verbal memory region (BA 6: -46, 1, 41)
Task Description
Participants listened to speech sounds and identified stimuli with a button-press response. The stimuli included:
1. Single phonemes - Consonants (/b/, /p/, /d/, /t/, /s/, /z/) and vowels (/i/, /E/, /A/, /u/, /oU/)
2. Phoneme pairs - CV and VC combinations of the phonemes
3. Phoneme triplets - Real words and pseudowords constructed of CVC sequences
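As an illustration, the CV/VC stimulus space implied by these phoneme sets can be enumerated with a short sketch. The phoneme lists are taken from the task description above; the full consonant-by-vowel crossing is an assumption, since the exact pairings used in the study are specified in the BIDS events files, not here:

```python
from itertools import product

# Phoneme sets as listed in the task description
consonants = ["b", "p", "d", "t", "s", "z"]
vowels = ["i", "E", "A", "u", "oU"]

# CV pairs: consonant followed by vowel (assumed full crossing)
cv_pairs = [c + v for c, v in product(consonants, vowels)]
# VC pairs: vowel followed by consonant
vc_pairs = [v + c for v, c in product(vowels, consonants)]

print(len(cv_pairs), len(vc_pairs))  # 30 30
```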
TMS Methodology
Detailed information about TMS parameters can be found in the sourcedata/tms_metadata/tms_parameters.json file. TMS was applied using a Magstim Super Rapid Plus1 stimulator with a figure-of-eight 40 mm coil. Stimulation was delivered at 110% of the resting motor threshold as paired pulses with a 50 ms interpulse interval. Detailed information about the methodology and results can be found in the associated publication: Moreira et al., “An open-access EEG dataset for speech decoding: Exploring the role of articulation and coarticulation”.
Directory Structure
The dataset follows the BIDS convention with the following structure:
/sub-[subject]/ses-[session]/eeg/
where subject is P01-P08 for Study 1 and S01-S16 for Study 2, and session is 01 for Study 1 and 02 for Study 2.
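The subject/session scheme above can be sketched as a small path helper. The function below is hypothetical (not part of EEGDash or the dataset tooling); it only encodes the naming rules just described:

```python
def eeg_dir(subject: str) -> str:
    """Build the BIDS EEG directory for a subject of this dataset.

    Study 1 subjects are P01-P08 (session 01); Study 2 subjects are
    S01-S16 (session 02), per the directory description above.
    """
    session = "01" if subject.startswith("P") else "02"
    return f"sub-{subject}/ses-{session}/eeg/"

print(eeg_dir("P03"))  # sub-P03/ses-01/eeg/
print(eeg_dir("S12"))  # sub-S12/ses-02/eeg/
```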
Contact Information
For questions about this dataset, please contact Lindy Comstock at lbcomstock@ucla.edu
Dataset Information#
Dataset ID | ds006104
Title | EEG dataset for speech decoding
Author (year) | Moreira2025
Canonical | —
Importable as | DS006104, Moreira2025
Year | 2025
Authors | João Pedro Carvalho Moreira, Vinícius Rezende Carvalho, Eduardo Mazoni Andrade Marçal Mendes, Ariah Fallah, Terrence J. Sejnowski, Claudia Lainscsek, Lindy Comstock
License | CC0
Citation / DOI | 10.18112/openneuro.ds006104.v1.0.1
Source links | OpenNeuro, NeMAR, Source URL
Found an issue with this dataset?
If you encounter any problems with this dataset (missing files, incorrect metadata, loading errors, etc.), please let us know!
Technical Details#
Subjects: 24
Recordings: 56
Tasks: 3
Channels: 61 (53 recordings), 83 (3 recordings)
Sampling rate (Hz): 2000.0
Duration (hours): 50.76
Pathology: Healthy
Modality: Auditory
Type: Perception
Size on disk: 43.0 GB
File count: 56
Format: BIDS
License: CC0
DOI: doi:10.18112/openneuro.ds006104.v1.0.1
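As a quick sanity check on the numbers above (pure arithmetic, not an EEGDash feature), the average recording length follows from the total duration and the recording count:

```python
total_hours = 50.757  # total duration reported above
n_recordings = 56

avg_minutes = total_hours * 60 / n_recordings
print(f"{avg_minutes:.1f} min per recording on average")  # ~54.4 min
```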
API Reference#
Use the DS006104 class to access this dataset programmatically.
class eegdash.dataset.DS006104(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)#
Bases: EEGDashDataset

EEG dataset for speech decoding.

Study: ds006104 (OpenNeuro)
Author (year): Moreira2025
Canonical: —
Also importable as: DS006104, Moreira2025
Modality: eeg; Experiment type: Perception; Subject type: Healthy. Subjects: 24; recordings: 56; tasks: 3.

Parameters:
cache_dir (str | Path) – Directory where data are cached locally.
query (dict | None) – Additional MongoDB-style filters to AND with the dataset selection. Must not contain the key dataset.
s3_bucket (str | None) – Base S3 bucket used to locate the data.
**kwargs (dict) – Additional keyword arguments forwarded to EEGDashDataset.
data_dir#
Local dataset cache directory (cache_dir / dataset_id).
Type: Path

query#
Merged query with the dataset filter applied.
Type: dict

records#
Metadata records used to build the dataset, if pre-fetched.
Type: list[dict] | None
Notes
Each item is a recording; recording-level metadata are available via dataset.description. query supports MongoDB-style filters on fields in ALLOWED_QUERY_FIELDS and is combined with the dataset filter. Dataset-specific caveats are not provided in the summary metadata.

References
OpenNeuro dataset: https://openneuro.org/datasets/ds006104
NeMAR dataset: https://nemar.org/dataexplorer/detail?dataset_id=ds006104
DOI: https://doi.org/10.18112/openneuro.ds006104.v1.0.1
Examples
>>> from eegdash.dataset import DS006104
>>> dataset = DS006104(cache_dir="./data")
>>> recording = dataset[0]
>>> raw = recording.load()
See Also#
eegdash.dataset.EEGDashDataset, eegdash.dataset