DS004356: EEG dataset, 22 subjects
Subcortical responses to music and speech are alike while cortical responses diverge
Citation: Tong Shan, Madeline S. Cappelloni, Ross K. Maddox (2024). Subcortical responses to music and speech are alike while cortical responses diverge. 10.18112/openneuro.ds004356.v2.2.1
22-participant EEG dataset — Subcortical responses to music and speech are alike while cortical responses diverge.
Quickstart
Install
pip install eegdash
Access the data
from eegdash.dataset import DS004356
dataset = DS004356(cache_dir="./data")
# Get the raw object of the first recording
raw = dataset.datasets[0].raw
print(raw.info)
Filter by subject
dataset = DS004356(cache_dir="./data", subject="01")
Advanced query
dataset = DS004356(
    cache_dir="./data",
    query={"subject": {"$in": ["01", "02"]}},
)
Iterate recordings
for rec in dataset:
    print(rec.subject, rec.raw.info['sfreq'])
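To make the MongoDB-style filter in the advanced query concrete, here is a minimal pure-Python sketch of how an `$in` condition selects records. It is not how eegdash evaluates queries; `matches` and the sample `records` are hypothetical names used only for illustration.

```python
# Toy illustration of MongoDB-style matching; NOT eegdash's implementation.
# `matches` and `records` are hypothetical names made up for this sketch.

def matches(record: dict, query: dict) -> bool:
    """Return True if `record` satisfies every condition in `query`."""
    for field, condition in query.items():
        value = record.get(field)
        if isinstance(condition, dict) and "$in" in condition:
            # {"field": {"$in": [...]}} matches when the value is in the list
            if value not in condition["$in"]:
                return False
        elif value != condition:
            # {"field": value} is a plain equality test
            return False
    return True


records = [
    {"subject": "01", "task": "MusicvsSpeech"},
    {"subject": "02", "task": "MusicvsSpeech"},
    {"subject": "03", "task": "MusicvsSpeech"},
]

query = {"subject": {"$in": ["01", "02"]}}
selected = [r["subject"] for r in records if matches(r, query)]
print(selected)  # ['01', '02']
```

All conditions in the query must hold at once, which mirrors how the `query` argument is described as being ANDed with the dataset selection.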
If you use this dataset in your research, please cite the original authors.
BibTeX
@dataset{ds004356,
  title = {Subcortical responses to music and speech are alike while cortical responses diverge},
  author = {Tong Shan and Madeline S. Cappelloni and Ross K. Maddox},
  doi = {10.18112/openneuro.ds004356.v2.2.1},
  url = {https://doi.org/10.18112/openneuro.ds004356.v2.2.1},
}
About This Dataset
Please contact the following authors for further information:
Tong Shan (email: tshan@ur.rochester.edu)
Ross K. Maddox (email: rmaddox@ur.rochester.edu)
The goal of this study is to derive the auditory brainstem response (ABR) from continuous music and speech stimuli using a deconvolution method. Data were collected from June to August 2021.
README
Details related to access to the data
The details of the experiment can be found in Shan et al. (2024). The experiment had two phases. In the first phase, ten one-minute trials of clicks were presented to the subjects. In the second phase, 12 s clips of 12 stimulus types (six genres of music and six types of speech) were presented, with 40 trials per type in shuffled order. There was a 0.5 s pause between trials.
The code for stimulus preprocessing and EEG analysis is available on GitHub: maddoxlab/Music_vs_Speech_abr
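The session structure described above can be multiplied out to get a feel for how much stimulus time each subject heard. This is a rough sketch using only the numbers in the README; it ignores setup time and breaks, and counting pauses only between consecutive trials is an assumption.

```python
# Worked arithmetic for the session structure described in the README.
# The totals ignore setup time and breaks; counting pauses only *between*
# trials (n_trials - 1 of them) is an assumption of this sketch.

CLICK_TRIALS = 10        # phase 1: one-minute click trials
CLICK_TRIAL_S = 60.0
STIM_TYPES = 12          # phase 2: six music genres + six speech types
TRIALS_PER_TYPE = 40
CLIP_S = 12.0
PAUSE_S = 0.5

phase1_s = CLICK_TRIALS * CLICK_TRIAL_S
n_trials = STIM_TYPES * TRIALS_PER_TYPE
phase2_stim_s = n_trials * CLIP_S
phase2_pause_s = (n_trials - 1) * PAUSE_S

print(n_trials)             # 480 trials in phase 2
print(phase1_s / 60)        # 10.0 minutes of clicks
print(phase2_stim_s / 60)   # 96.0 minutes of music and speech
print(phase2_pause_s)       # 239.5 seconds of pauses
```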
Format
This dataset is formatted according to the EEG Brain Imaging Data Structure (BIDS). It includes EEG recordings from subject 001 to subject 024 (excluding subject 014 and subject 021) in raw BrainVision format (a .eeg, .vhdr, and .vmrk triplet) and stimulus files in .wav format.
For some subjects (sub-03 & sub-19), there are two “runs” of data: the first run (run-01) contains only the click phase (phase 1), and the second run includes the data for the ABR analysis. Triggers with a value of “1” marked the onset of each stimulus, and shortly afterwards triggers with values of “4” or “8” were stamped to indicate the stimulus type and the trial number out of 40. This was done by converting the decimal trial number to bits, denoted b, then calculating 2 ** (b + 2). Triggers of “999” denote the start of a new segment of EEG. The trial numbers and further event metadata are specified in each *_eeg_events.tsv file, which is sufficient to determine which trial corresponded to which stimulus type and which file.
Subjects
24 subjects participated in this study.
Subject inclusion criteria
1. Age between 18 and 40.
2. Normal hearing: audiometric thresholds of 20 dB HL or better from 500 to 8000 Hz.
3. Speak English as their primary language.
4. Self-reported normal or correctable-to-normal vision.
Subject exclusion criteria
1. Subject 014 self-withdrew partway through the experiment.
2. Subject 021 was excluded because of technical problems during data collection that led to unusable data.
After excluding these two subjects, 22 subjects (11 male and 11 female) with an age of 22.7 ± 5.1 (mean ± SD) years were included in the analysis. Please see subjects.tsv for more demographic information.
Apparatus
Subjects were seated in a sound-isolating booth on a chair in front of a 24-inch BenQ monitor with a viewing distance of approximately 60 cm. Stimuli were presented at an average level of 65 dB SPL and a sampling rate of 48000 Hz through ER-2 insert earphones plugged into an RME Babyface Pro digital sound card. The stimulus presentation for the experiment was controlled by a Python script using a custom package, expyfun.
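The 4/8 trigger code described under Format can be sketched as a small encoder and decoder. The README states only that each bit b is stamped as 2 ** (b + 2); the 6-bit width and the most-significant-bit-first order used here are assumptions, and `encode_trial` and `decode_trial` are hypothetical helper names. The *_eeg_events.tsv files remain the authoritative source for trial metadata.

```python
# Sketch of the 4/8 trigger code: each bit b of a number is stamped as the
# trigger value 2 ** (b + 2), so bit 0 -> 4 and bit 1 -> 8.
# ASSUMPTIONS: 6-bit width and most-significant-bit-first order.

def encode_trial(trial: int, n_bits: int = 6) -> list[int]:
    """Turn a trial number into a sequence of 4/8 trigger values."""
    if not 0 <= trial < 2 ** n_bits:
        raise ValueError(f"trial {trial} does not fit in {n_bits} bits")
    bits = [(trial >> shift) & 1 for shift in range(n_bits - 1, -1, -1)]
    return [2 ** (b + 2) for b in bits]


def decode_trial(triggers: list[int]) -> int:
    """Invert encode_trial: map 4 -> 0 and 8 -> 1, then read as binary."""
    trial = 0
    for value in triggers:
        if value not in (4, 8):
            raise ValueError(f"unexpected trigger value {value}")
        bit = value.bit_length() - 3  # 4 -> 0, 8 -> 1
        trial = (trial << 1) | bit
    return trial


codes = encode_trial(37)
print(codes)                # [8, 4, 4, 8, 4, 8]
print(decode_trial(codes))  # 37
```

Using 4 and 8 rather than 1 and 2 keeps the identification bits distinct from the onset trigger value of “1”.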
Cohort
Dataset Statistics
Age distribution by gender (n=22, range 19–37 yr, mean 22.7 yr)
Sex composition
Channel counts: 34 ch (n=24 recordings)
Sampling frequencies: 10000.0 Hz (n=24 recordings)
Total recording duration: 46 h
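As a rough sense of scale, the statistics above can be multiplied out to estimate the in-memory size of the full dataset. The sketch assumes every sample is held as a 64-bit float (the default in-memory dtype in MNE-Python); the 46 h figure is rounded, so the result is an estimate only.

```python
# Rough in-memory size of the whole dataset, from the statistics above.
# ASSUMPTION: samples are stored as float64 (8 bytes) once loaded.

TOTAL_HOURS = 46
SFREQ_HZ = 10_000
N_CHANNELS = 34
N_RECORDINGS = 24
BYTES_PER_SAMPLE = 8  # float64

samples_per_channel = TOTAL_HOURS * 3600 * SFREQ_HZ
total_bytes = samples_per_channel * N_CHANNELS * BYTES_PER_SAMPLE
mean_recording_gb = total_bytes / N_RECORDINGS / 1e9

print(samples_per_channel)           # 1656000000
print(round(total_bytes / 1e9, 1))   # 450.4 (GB, all recordings)
print(round(mean_recording_gb, 1))   # 18.8 (GB, average recording)
```

At this size, loading one recording at a time, or downsampling before further processing, is likely to be more practical than holding everything in memory.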
Signal · Electrodes & live trace
Electrode layout — EEG · 32 sensors — 32 channels
NEMAR Processing Statistics
The plots below are generated by NEMAR’s automated EEG pipeline. The histogram shows pipeline success for data cleaning and ICA decomposition, the percentage of data frames and EEG channels retained after artefact removal, line noise per channel (RMS, dB), and the age/gender distribution of participants.
HED event descriptors word cloud
Manifest
Full dataset metadata table
Dataset ID | ds004356
Title | Subcortical responses to music and speech are alike while cortical responses diverge
Author (year) | Shan2022
Canonical | —
Importable as | DS004356, Shan2022
Year | 2024
Authors | Tong Shan, Madeline S. Cappelloni, Ross K. Maddox
License | CC0
Citation / DOI | 10.18112/openneuro.ds004356.v2.2.1
Source links | OpenNeuro, NeMAR, Source URL
API Reference
class eegdash.dataset.DS004356(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)
Source: eegdash/dataset/registry.py
Subcortical responses to music and speech are alike while cortical responses diverge
Study: ds004356 (OpenNeuro)
Author (year): Shan2022
Canonical: —
Also importable as: DS004356, Shan2022.
Modality: eeg. Subjects: 22; recordings: 24; tasks: 1.
Parameters:
cache_dir (str | Path) – Directory where data are cached locally.
query (dict | None) – Additional MongoDB-style filters to AND with the dataset selection. Must not contain the key dataset.
s3_bucket (str | None) – Base S3 bucket used to locate the data.
**kwargs (dict) – Additional keyword arguments forwarded to EEGDashDataset.
data_dir
Local dataset cache directory (cache_dir / dataset_id).
Type: Path
Notes
Each item is a recording; recording-level metadata are available via dataset.description. query supports MongoDB-style filters on fields in ALLOWED_QUERY_FIELDS and is combined with the dataset filter. Dataset-specific caveats are not provided in the summary metadata.
References
OpenNeuro dataset: https://openneuro.org/datasets/ds004356
NeMAR dataset: https://nemar.org/dataexplorer/detail?dataset_id=ds004356
DOI: https://doi.org/10.18112/openneuro.ds004356.v2.2.1
NEMAR citation count: 2
Examples
>>> from eegdash.dataset import DS004356
>>> dataset = DS004356(cache_dir="./data")
>>> recording = dataset[0]
>>> raw = recording.load()
__init__(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)
save(path: str, overwrite: bool = False, offset: int = 0)
Save datasets to files by creating one subdirectory for each dataset:
path/
    0/
        0-raw.fif | 0-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)
    1/
        1-raw.fif | 1-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)
Parameters:
path (str) – Directory in which subdirectories are created to store the -raw.fif | -epo.fif and .json files.
overwrite (bool) – Whether to delete old subdirectories that will be saved to in this call.
offset (int) – If provided, the integer is added to the id of the dataset in the concat. This is useful in the setting of very large datasets, where one dataset has to be processed and saved at a time to account for its original position.
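The effect of `offset` on the subdirectory names can be sketched without touching any data. `subdir_names` is a hypothetical helper that only mimics the naming scheme described above (dataset index plus offset); it is not part of eegdash or braindecode and writes nothing to disk.

```python
# Sketch of how `offset` shifts the subdirectory names written by save().
# `subdir_names` is a hypothetical helper for illustration only.
from pathlib import Path


def subdir_names(path: str, n_datasets: int, offset: int = 0) -> list[Path]:
    """Return the per-dataset subdirectories save() is described as creating."""
    return [Path(path) / str(i + offset) for i in range(n_datasets)]


# Saving a large collection in two chunks of three datasets each:
first = subdir_names("out", n_datasets=3)             # out/0, out/1, out/2
second = subdir_names("out", n_datasets=3, offset=3)  # out/3, out/4, out/5
print([p.name for p in first + second])  # ['0', '1', '2', '3', '4', '5']
```

Passing the running count of already-saved datasets as `offset` keeps the second chunk from overwriting the first.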
braindecode: BaseDataset from braindecode, windowed via create_windows_from_events.
pytorch: DataLoader; supports parallel workers and on-the-fly augmentations.
huggingface: datasets.load_dataset("EEGDash/ds004356"). Swap any load_dataset(...) call for ds004356 to reproduce the tutorial on this dataset.
Provenance
¹Contributed to openneuro in BIDS format.
²Curated & ingested by the EEGDash catalog; see CITATION.cff for canonical reference.
³Persistent identifier: 10.18112/openneuro.ds004356.v2.2.1.
See Also
eegdash.dataset.EEGDashDataset
eegdash.dataset