DS005555: eeg dataset, 128 subjects#
The Bitbrain Open Access Sleep (BOAS) dataset
Citation: Eduardo López-Larraz, María Sierra-Torralba, Sergio Clemente, Galit Fierro, David Oriol, Javier Minguez, Luis Montesano, Jens G. Klinzing (2015). The Bitbrain Open Access Sleep (BOAS) dataset. 10.18112/openneuro.ds005555.v1.1.1
128-participant EEG dataset — The Bitbrain Open Access Sleep (BOAS) dataset.
Quickstart#
Install
pip install eegdash
Access the data
from eegdash.dataset import DS005555
dataset = DS005555(cache_dir="./data")
# Get the raw object of the first recording
raw = dataset.datasets[0].raw
print(raw.info)
Filter by subject
dataset = DS005555(cache_dir="./data", subject="01")
Advanced query
dataset = DS005555(
cache_dir="./data",
query={"subject": {"$in": ["01", "02"]}},
)
Iterate recordings
for rec in dataset:
print(rec.subject, rec.raw.info['sfreq'])
If you use this dataset in your research, please cite the original authors.
BibTeX
@dataset{ds005555,
title = {The Bitbrain Open Access Sleep (BOAS) dataset},
author = {Eduardo López-Larraz and María Sierra-Torralba and Sergio Clemente and Galit Fierro and David Oriol and Javier Minguez and Luis Montesano and Jens G. Klinzing},
doi = {10.18112/openneuro.ds005555.v1.1.1},
url = {https://doi.org/10.18112/openneuro.ds005555.v1.1.1},
}
About This Dataset#
The Bitbrain Open Access Sleep (BOAS) dataset.
This project aimed at bridging the gap between gold-standard clinical sleep monitoring and emerging wearable EEG technologies. The dataset contains data from **128 nights**in which participants were simultaneously monitored with two technologies: a ** Brain Quick Plus Evolution PSG system by Micromed**and a **wearable EEG headband by Bitbrain**. The Micromed PSG system records a comprehensive and clinically validated set of physiological sleep parameters, while the Bitbrain wearable EEG headband offers a user-friendly, self-administered alternative, limited to forehead EEG electrodes, movement sensors, and photo-plethysmography. **Data from both systems were acquired simultaneously**, allowing for direct comparison and validation of the wearable EEG device against the established PSG standard. This dual-recording approach provides a rich resource for evaluating the performance and potential of wearable EEG technology in sleep studies.
Human sleep scoring: To ensure robust and reliable sleep staging, we followed a rigorous labeling process. Three expert sleep scorers independently annotated the PSG recordings**following criteria developed by the American Academy of Sleep Medicine (AASM) (Berry et al., 2015). From the resulting three scorings, a ** consensus label was derived: each epoch of sleep data received the label scored by at least two of the scorers. In cases where all three scorers had given different labels, a fourth scorer made the final decision. This consensus labeling approach addresses the inherent variability in human-derived sleep scoring, with an estimated inter-scorer agreement of approximately 85% (Danker-Hopfe et al., 2009; Rosenberg and Van Hout, 2013). Automatic scoring: We used the human expert consensus labels to train a deep learning model (Esparza-Iaizzo et al., 2024). By implementing a cross-validation procedure, we trained and validated the model separately on the PSG and wearable EEG datasets. The model achieved an 87.13% match between human-consensus and network-provided labels for the PSG data, and an 86.71% match for the wearable EEG data.
README
Our dataset includes: 1. PSG recordings**from 128 nights (files ending with “*psg_eeg.edf*”), 2. **Wearable EEG recordings**from the same nights (files ending with “*headband_eeg.edf*”), 3. **Human-consensus sleep stage labels, obtained from the PSG recordings (”stage_hum” in the PSG data’s event files), 4. AI-generated sleep stage labels, separately obtained from PSG recordings and from wearable EEG recordings (”stage_ai” in both the PSG and headband data’s event files). 5. **Further meta data**for each recording (i.e., the participants’ age, sex, and BMI, provided in the file “participants.tsv”)
Participants
View full README
README
Our dataset includes: 1. PSG recordings**from 128 nights (files ending with “*psg_eeg.edf*”), 2. **Wearable EEG recordings**from the same nights (files ending with “*headband_eeg.edf*”), 3. **Human-consensus sleep stage labels, obtained from the PSG recordings (”stage_hum” in the PSG data’s event files), 4. AI-generated sleep stage labels, separately obtained from PSG recordings and from wearable EEG recordings (”stage_ai” in both the PSG and headband data’s event files). 5. **Further meta data**for each recording (i.e., the participants’ age, sex, and BMI, provided in the file “participants.tsv”)
Participants
Participants were members of the general population, provided written informed consent, and received economic compensation of 50€ per night.
In order to represent the general population, we recruited a broad spectrum of participants along the dimensions of age, sex, and body mass index. We did not recruit patients with particular health conditions but only excluded severe conditions that could have affected the feasibility or safety of the study. In detail, inclusion and exclusion criteria were as follows. Inclusion criteria - Age > 18 years, - Sufficient knowledge of Spanish to understand the explanatory text, the consent form and study-related instructions.
Exclusion criteria - Current severe medical interventions or medication, - History of severe neurological or psychiatric disorders, - Severe health problems in the last 12 months (especially neurological or cardiac disorders), - Current pregnancy or nursing, - Use of psychotropic medication, benzodiazepines, gamma-hydroxybutyric acid, and similar drugs before or during the study.
Format
The dataset is formatted according to the Brain Imaging Data Structure (BIDS). Please note that while the recordings are named from sub-1 up to sub-128, some come from the same participants. 108 unique individuals participated in the recordings, data of which can be matched using the pid (= unique participant ID) property in the file “participants.tsv” The folder of each recording contains the data recorded with the PSG (”sub-xx_task-Sleep_acq-psg_eeg.edf”) and with the wearable EEG headband (”sub-xx_task-Sleep_acq-headband_eeg.edf”). Channel groups Not all recordings contain data from all available sensors. The full list of available sensors for each recording can be obtained on the “channels.tsv” file. Channels in this file are coded in groups: - PSG_EEG: Electroencephalography recorded with the PSG system. Channels available are F3, F4, C3, C4, O1, O2 (PSG_F3, PSG_F4, PSG_C3, PSG_C4, PSG_O1, PSG_O2). - PSG_EOG: Electrooculography signals recorded with the PSG system. The location of the EOG electrodes was lateral of the eyes; one slightly lower than the participant’s left eye and one slightly higher than the participant’s right eye (according to AASM guidelines). For recordings containing only one EOG channel (PSG_EOG), the electrodes were recorded as a bipolar derivation. If two EOG channels are present (PSG_EOGR, PSG_EOGL), both electrodes were referenced against the left mastoid. - PSG_EMG: Electromyography signals recorded with the PSG system. Data contain a single EMG channel (PSG_EMG), which is the result of a bipolar derivation of two chin electrodes. - PSG_BELTS: Breathing activity recorded by the PSG system using abdominal and thoracic breathing belts (PSG_ABD, PSG_THOR). - PSG_THER: Respiratory airflow recorded with the PSG system using a thermistor (PSG_THER). - PSG_CAN: Respiratory airflow recorded with the PSG system using a nasal cannula (PSG_CAN). - PSG_PPG: Photopletismographic (PPG) activity recorded with the PSG system. Channels available are pulse (PSG_PULSE), heart beat (PSG_BEAT) and oxygen saturation (PSG_SPO2). - HB_EEG: Electroencephalography recorded with the wearable EEG headband. Headband channels are approximately located at AF7 and AF8 (HB_1, HB_2). - HB_IMU: Movement activity recorded by an Inertial Measurement Unit (IMU) in the headband. Signals are derived from an accelerometer (HB_IMU_1, HB_IMU_2, HB_IMU_3) and gyroscope (HB_IMU_4, HB_IMU_5, HB_IMU_6), both recording signals for all three spatial dimensions. - HB_PULSE: Pulse activity recorded with the wearable EEG headband using a PPG sensor (HB_PULSE).
Sleep staging labels The sleep stage labels for each recording are coded as events in corresponding event files (stage_hum and stage_ai; see above). Stages are coded as follows: - 0: Wake, - 1: NonREM sleep stage 1 (N1), - 2: NonREM sleep stage 2 (N2), - 3: NonREM sleep stage 3 (N3), - 4: REM sleep, - 8: PSG disconnections (e.g., due to bathroom breaks; human-scored only) - -2: Artifacts and missing data (AI-scored only)
References
Berry, R. B., Brooks, R., Gamaldo, C. E., Harding, S. M., Lloyd, R. M., Marcus, C. L., et al. (2015). The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications, Version 2.2. Darien, Illinois.
Danker-Hopfe, H., Anderer, P., Zeitlhofer, J., Boeck, M., Dorn, H., Gruber, G., et al. (2009). Interrater reliability for sleep scoring according to the Rechtschaffen & Kales and the new AASM standard. J. Sleep Res. 18, 74–84. doi: 10.1111/j.1365-2869.2008.00700.x. Esparza-Iaizzo, M., Sierra-Torralba, M., Klinzing, J. G., Minguez, J., Montesano, L., and López-Larraz, E. (2024). Automatic sleep scoring for real-time monitoring and stimulation in individuals with and without sleep apnea. bioRxiv, 2024.06.12.597764. doi: 10.1101/2024.06.12.597764.
Rosenberg, R. S., and Van Hout, S. (2013). The American Academy of Sleep Medicine inter-scorer reliability program: sleep stage scoring. J. Clin. sleep Med. 9, 81–87. doi: 10.5664/jcsm.2350.
Contact
If you have any questions or comments, please contact:
Eduardo López-Larraz: eduardo.lopez@bitbrain.com Jens G. Klinzing: jens.klinzing@bitbrain.com
Cohort#
Dataset Statistics#
Age distribution by gender (n=127, range 18–82 yr, mean 42.0 yr)
Sex composition
Channel counts (ch)
Sampling frequencies: 256.0 Hz (n=256 recordings)
Total recording duration: 2002 h
Signal · Electrodes & live trace#
Live trace viewer — sub-13 · task-Sleep
Showing one representative recording out of
128 subjects and 256 recordings in this dataset.
Browse the full set on OpenNeuro;
drop any other _eeg.{set,edf,bdf,vhdr} file onto the
viewer (or pass ?eeg=<url>) to inspect it.
No scalp electrode layout is currently indexed for this dataset. Once the eegdash montage registry ingests it, the interactive viewer will appear here automatically.
NEMAR Processing Statistics#
The plots below are generated by NEMAR’s automated EEG pipeline. The histogram shows pipeline success for data cleaning and ICA decomposition, the percentage of data frames and EEG channels retained after artefact removal, line noise per channel (RMS, dB), and the age/gender distribution of participants.
HED event descriptors word cloud
Manifest#
File Explorer#
Browse the BIDS file structure of this dataset. Records are fetched on demand from the EEGDash catalog the first time you open the explorer.
Full dataset metadata table
Dataset ID |
|
Title |
The Bitbrain Open Access Sleep (BOAS) dataset |
Author (year) |
|
Canonical |
— |
Importable as |
|
Year |
2015 |
Authors |
Eduardo López-Larraz, María Sierra-Torralba, Sergio Clemente, Galit Fierro, David Oriol, Javier Minguez, Luis Montesano, Jens G. Klinzing |
License |
CC0 |
Citation / DOI |
|
Source links |
OpenNeuro | NeMAR | Source URL |
Copy-paste BibTeX
@dataset{ds005555,
title = {The Bitbrain Open Access Sleep (BOAS) dataset},
author = {Eduardo López-Larraz and María Sierra-Torralba and Sergio Clemente and Galit Fierro and David Oriol and Javier Minguez and Luis Montesano and Jens G. Klinzing},
doi = {10.18112/openneuro.ds005555.v1.1.1},
url = {https://doi.org/10.18112/openneuro.ds005555.v1.1.1},
}
API Reference#
eegdash.datasetEEGDashDatasetDS005555 · LopezLarraz2024eegdash/dataset/registry.py · [source ↗]- class eegdash.dataset.DS005555(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)[source]#
The Bitbrain Open Access Sleep (BOAS) dataset
- Study:
ds005555(OpenNeuro)- Author (year):
LopezLarraz2024- Canonical:
—
Also importable as:
DS005555,LopezLarraz2024.Modality:
eeg; Experiment type:Sleep; Subject type:Healthy. Subjects: 128; recordings: 256; tasks: 1.- Parameters:
cache_dir (str | Path) – Directory where data are cached locally.
query (dict | None) – Additional MongoDB-style filters to AND with the dataset selection. Must not contain the key
dataset.s3_bucket (str | None) – Base S3 bucket used to locate the data.
**kwargs (dict) – Additional keyword arguments forwarded to
EEGDashDataset.
- data_dir#
Local dataset cache directory (
cache_dir / dataset_id).- Type:
Path
Notes
Each item is a recording; recording-level metadata are available via
dataset.description.querysupports MongoDB-style filters on fields inALLOWED_QUERY_FIELDSand is combined with the dataset filter. Dataset-specific caveats are not provided in the summary metadata.References
OpenNeuro dataset: https://openneuro.org/datasets/ds005555 NeMAR dataset: https://nemar.org/dataexplorer/detail?dataset_id=ds005555 DOI: https://doi.org/10.18112/openneuro.ds005555.v1.1.1 NEMAR citation count: 0
Examples
>>> from eegdash.dataset import DS005555 >>> dataset = DS005555(cache_dir="./data") >>> recording = dataset[0] >>> raw = recording.load()
- __init__(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)[source]#
- save(path: str, overwrite: bool = False, offset: int = 0)[source]#
Save datasets to files by creating one subdirectory for each dataset:
path/ 0/ 0-raw.fif | 0-epo.fif description.json raw_preproc_kwargs.json (if raws were preprocessed) window_kwargs.json (if this is a windowed dataset) window_preproc_kwargs.json (if windows were preprocessed) target_name.json (if target_name is not None and dataset is raw) 1/ 1-raw.fif | 1-epo.fif description.json raw_preproc_kwargs.json (if raws were preprocessed) window_kwargs.json (if this is a windowed dataset) window_preproc_kwargs.json (if windows were preprocessed) target_name.json (if target_name is not None and dataset is raw)
- Parameters:
path (str) –
- Directory in which subdirectories are created to store
-raw.fif | -epo.fif and .json files to.
overwrite (bool) – Whether to delete old subdirectories that will be saved to in this call.
offset (int) – If provided, the integer is added to the id of the dataset in the concat. This is useful in the setting of very large datasets, where one dataset has to be processed and saved at a time to account for its original position.
BaseDataset from braindecode — windowed via create_windows_from_events.braindecodeDataLoader; supports parallel workers and on-the-fly augmentations.pytorchdatasets.load_dataset("EEGDash/ds005555").huggingfaceSwap any load_dataset(...) call for ds005555 to reproduce the tutorial on this dataset.
Citation
Eduardo López-Larraz, María Sierra-Torralba, Sergio Clemente, Galit Fierro, David Oriol, … (2015). The Bitbrain Open Access Sleep (BOAS) dataset. 10.18112/openneuro.ds005555.v1.1.1
Provenance
¹Contributed to openneuro in BIDS format.
²Curated & ingested by the EEGDash catalog; see CITATION.cff for canonical reference.
³Persistent identifier: 10.18112/openneuro.ds005555.v1.1.1.
See Also#
eegdash.dataset.EEGDashDataseteegdash.dataset