DS006923: EEG dataset, 140 subjects
Dataset of Electroencephalograms of Juvenile Offenders
Citation: Aura Polo, Elmer León, Mariana Pino-Melgarejo, Julie Viloria-Porto (20). Dataset of Electroencephalograms of Juvenile Offenders. 10.18112/openneuro.ds006923.v1.0.0
A 140-participant EEG dataset of juvenile offenders and non-offender controls.
Quickstart
Install
pip install eegdash
Access the data
from eegdash.dataset import DS006923
dataset = DS006923(cache_dir="./data")
# Get the raw object of the first recording
raw = dataset.datasets[0].raw
print(raw.info)
Filter by subject
# Subject labels carry the group suffix (e.g. sub-2012cg -> "2012cg")
dataset = DS006923(cache_dir="./data", subject="2012cg")
Advanced query
dataset = DS006923(
cache_dir="./data",
query={"subject": {"$in": ["01", "02"]}},
)
Iterate recordings
for rec in dataset:
    print(rec.subject, rec.raw.info['sfreq'])
If you use this dataset in your research, please cite the original authors.
BibTeX
@dataset{ds006923,
title = {Dataset of Electroencephalograms of Juvenile Offenders},
author = {Aura Polo and Elmer León and Mariana Pino-Melgarejo and Julie Viloria-Porto},
doi = {10.18112/openneuro.ds006923.v1.0.0},
url = {https://doi.org/10.18112/openneuro.ds006923.v1.0.0},
}
About This Dataset
Dataset of Electroencephalograms of Juvenile Offenders
Project’s name
Development of a multiparametric intelligent system for recognizing patterns associated with neurocognitive dysfunctions in young people in conflict with the law in the department of Atlántico (2021)
Authors and acknowledgment
Aura Polo, Elmer León, Mariana Pino-Melgarejo and Julie Viloria-Porto.
Thanks to Ronald Ruiz for his assistance during the data collection process, and to Sergio Miranda for his dedication to data processing and cleaning.
Work team
* MAGMA Ingeniería research group
* Hogares Claret Foundation
Institutions
Institución Universitaria de Barranquilla (sede Soledad)
Universidad del Magdalena
Universidad Autónoma del Caribe
Description
This repository contains resting-state EEG data from 140 participants, recorded with a Biosemi ActiveTwo system:
- 74 juvenile offenders (JO)
- 66 non-offender juvenile controls
Exclusion criteria: psychiatric treatment; dental or orthodontic appliances.
Recruitment: JO participants were recruited through the Hogares Claret Foundation (Centro de Reeducación el Oasis and Fundación Luz de Esperanza); controls were recruited from the Institución Nacional de Educación Media INEM Miguel Antonio Caro (Barranquilla).
Contents of the dataset
Core Files
dataset_description.json: General information about the study
participants.json: Demographic and group assignment data
participants.tsv: Demographic and group assignment data in table format
Features Data (EEG_JO_Dataset/code)
Feature file nomenclature
Files are named using the pattern:
FR_Dats_band_{BAND}_EP_{EYESTATE}_{EPOCH#}_can_{CHANNEL}.xlsx
| Component | Example | Description |
|--------------------|-------------|---------------------------------------------------------------------------|
| **FR_Dats_band** | Fixed | Prefix = "Feature Results Dataset" |
| **{BAND}** | `ALFA` | EEG frequency band: `ALFA` = Alpha (8-13 Hz); `BETA` = Beta (13-30 Hz); `DELTA` = Delta (1-4 Hz); `THETA` = Theta (4-8 Hz) |
| **EP_{EYESTATE}_** | `EP_C_` | Eye state during epoch: `C` = Eyes closed; `O` = Eyes open |
| **{EPOCH#}** | `1` | Epoch number (1 or 2); two epochs per eye state |
| **can_** | Fixed | "Channel" prefix |
| **{CHANNEL}** | `A1` | Electrode position (ABCD system): first letter = A, B, C or D; number = electrode ID (1-32) |
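The naming pattern above is regular enough to parse mechanically. The sketch below assumes file names follow the table exactly (upper-case band names, eye state `C`/`O`, epoch 1 or 2, channel A1-D32); `FeatureFile` and `parse_feature_filename` are hypothetical helpers, not part of EEGDash or the dataset's own code.

```python
import re
from typing import NamedTuple

# FR_Dats_band_{BAND}_EP_{EYESTATE}_{EPOCH#}_can_{CHANNEL}.xlsx
FEATURE_FILE_RE = re.compile(
    r"^FR_Dats_band_(?P<band>ALFA|BETA|DELTA|THETA)"
    r"_EP_(?P<eye>[CO])_(?P<epoch>[12])"
    r"_can_(?P<channel>[ABCD](?:[1-9]|[12][0-9]|3[0-2]))\.xlsx$"  # A1..D32
)

# Band limits in Hz, copied from the table above.
BAND_RANGES_HZ = {"DELTA": (1, 4), "THETA": (4, 8), "ALFA": (8, 13), "BETA": (13, 30)}


class FeatureFile(NamedTuple):
    band: str
    band_hz: tuple
    eyes_closed: bool
    epoch: int
    channel: str


def parse_feature_filename(name: str) -> FeatureFile:
    """Split a feature file name into its components (hypothetical helper)."""
    m = FEATURE_FILE_RE.match(name)
    if m is None:
        raise ValueError(f"not a feature file name: {name!r}")
    band = m["band"]
    return FeatureFile(
        band=band,
        band_hz=BAND_RANGES_HZ[band],
        eyes_closed=m["eye"] == "C",
        epoch=int(m["epoch"]),
        channel=m["channel"],
    )


print(parse_feature_filename("FR_Dats_band_THETA_EP_O_2_can_C15.xlsx"))
```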
File Contents:
Each Excel file contains 7 features for the specified band/channel/epoch combination:
1. Mean Power
2. RMS of PSD
3. Standard Deviation
4. Min Power
5. Max Power
6. Skewness
7. Kurtosis
Examples:
- FR_Dats_band_ALFA_EP_C_1_can_A1.xlsx: Alpha band features, first closed-eyes epoch, channel A1 (frontal electrode 1)
- FR_Dats_band_THETA_EP_O_2_can_C15.xlsx: Theta band features, second open-eyes epoch, channel C15 (posterior electrode 15)
- FR_Dats_band_BETA_EP_C_2_can_B7.xlsx: Beta band features, second closed-eyes epoch, channel B7 (central electrode 7)
Dataset Structure:
- 4 epochs per subject:
  - 2 closed-eyes: EP_C_1, EP_C_2
  - 2 open-eyes: EP_O_1, EP_O_2
- 128 channels (A1-D32)
- 4 frequency bands
- Total files per subject: 4 epochs × 128 channels × 4 bands = 2,048 files
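As a check on that arithmetic, the full set of expected file names for one subject can be enumerated from the components listed above. This is a sketch: `expected_feature_files` and `missing_files` are hypothetical helpers, and since the on-disk location of each subject's files is not specified here, they work on bare file names only.

```python
from itertools import product

BANDS = ["ALFA", "BETA", "DELTA", "THETA"]
EPOCHS = [("C", 1), ("C", 2), ("O", 1), ("O", 2)]  # 2 closed-eyes + 2 open-eyes
CHANNELS = [f"{bank}{i}" for bank in "ABCD" for i in range(1, 33)]  # A1..D32


def expected_feature_files():
    """All band x epoch x channel file names for one subject."""
    return [
        f"FR_Dats_band_{band}_EP_{eye}_{n}_can_{ch}.xlsx"
        for band, (eye, n), ch in product(BANDS, EPOCHS, CHANNELS)
    ]


def missing_files(present):
    """Expected names that are absent from `present`, in canonical order."""
    present = set(present)
    return [name for name in expected_feature_files() if name not in present]


print(len(expected_feature_files()))  # 4 bands x 4 epochs x 128 channels
```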
EEG Data
EEG_JO_Dataset/
├── code/
├── sub-{Subject ID}{Group}/
| ├── eeg/
| | ├── sub-{Subject ID}{Group}_coordsystem.json
| | ├── sub-{Subject ID}{Group}_electrodes.tsv
| | ├── sub-{Subject ID}{Group}_task-{Task Name}_acq-{Datatype}_eeg.json # Epoched data sidecar json
| | ├── sub-{Subject ID}{Group}_task-{Task Name}_acq-{Datatype}_eeg.set # Epoched data
| | ├── sub-{Subject ID}{Group}_task-{Task Name}_channels.tsv
| | ├── sub-{Subject ID}{Group}_task-{Task Name}_desc-{Datatype}_eeg.json # Preprocessed data sidecar json
| | └── sub-{Subject ID}{Group}_task-{Task Name}_desc-{Datatype}_eeg.set # Preprocessed data
├── ...
├── CHANGES
├── dataset_description.json
├── participants.json
├── participants.tsv
└── README.md
File Nomenclature
| Denomination | Value | Description |
|-----------------------|-----------------|------------------------------------------------------------------|
| `sub-` | Fixed | Subject prefix |
| `{Subject ID}` | `1005` | **Unique identifier** of the participant |
| `{Group}` | `cg`/`sg`/`sg2` | **Group**: `cg`=control, `sg`=study group 1, `sg2`=study group 2 |
| `{Task Name}` | `restingstate` | **Task name** (resting state) |
| `acq-` `desc-` | `acq-`/`desc-` | **Label**: `acq-` = acquisition, `desc-` = description |
| `{Datatype}` | `epochs`/`preprocessed` | Acquisition type |
| `eeg` | Electroencephalography data | Data type |
| Extension | `.set` | **File type**: EEGLAB dataset (processed data) |
Examples
- sub-1005sg_task-restingstate_acq-epochs_eeg.set: epoched EEG for study group 1, subject 005 (full ID 1005)
- sub-1005sg_task-restingstate_desc-preprocessing_eeg.set: preprocessed EEG for study group 1, subject 005 (full ID 1005)
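The subject ID, group suffix and processing stage can be recovered from a file name with a regular expression. A minimal sketch, assuming names follow the nomenclature table (a numeric subject ID immediately followed by `cg`, `sg` or `sg2`); `parse_eeg_filename` is a hypothetical helper and only matches the `_eeg.set`/`_eeg.json` files, not `channels.tsv` or `electrodes.tsv`.

```python
import re

BIDS_EEG_RE = re.compile(
    r"^sub-(?P<subject_id>\d+)(?P<group>cg|sg2|sg)"   # try sg2 before sg
    r"_task-(?P<task>[a-zA-Z0-9]+)"
    r"_(?P<label>acq|desc)-(?P<datatype>[a-zA-Z0-9]+)"
    r"_eeg\.(?P<ext>set|json)$"
)

GROUP_NAMES = {"cg": "control", "sg": "study group 1", "sg2": "study group 2"}


def parse_eeg_filename(name: str) -> dict:
    """Return the BIDS-style components of an EEG file name (hypothetical helper)."""
    m = BIDS_EEG_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized EEG file name: {name!r}")
    info = m.groupdict()
    info["group_name"] = GROUP_NAMES[info["group"]]
    return info


print(parse_eeg_filename("sub-1005sg_task-restingstate_acq-epochs_eeg.set"))
```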
Methods
EEG Acquisition
Device: Biosemi ActiveTwo system
Electrodes: 128 channels (radial placement, 10-20 system reference)
Additional channels: EOG, ECG recorded
Sampling rate: 2048 Hz (downsampled to 128 Hz during preprocessing)
Online filtering: 0.1-100 Hz bandpass
Setup:
- Participants seated and awake
- Continuous monitoring for movements/sleep
- Event markers via serial communication (paradigm triggers)
Paradigms
The dataset contains only resting-state recordings.
- Resting State (RS):
  - Total duration: 12 minutes
  - Sequence:
    - 4 min alternating eyes closed/open (COCO: Closed-Open-Closed-Open)
    - 8 min eyes closed (excluded from the current dataset)
  - Segment trimming (to avoid transition artifacts):
    - 5 s after each event onset
    - 5 s before each event offset
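The trimming rule amounts to an interval computation on event times. The sketch assumes each eyes-closed/eyes-open block is given as a `(label, onset_s, offset_s)` tuple in seconds; `trim_segments` is a hypothetical helper, and the one-minute block lengths in the usage example are invented for illustration, since only the 4-minute total of the alternating sequence is stated.

```python
def trim_segments(events, sfreq, trim_s=5.0):
    """Return (label, start_sample, stop_sample) for each block after trimming.

    events: iterable of (label, onset_s, offset_s) tuples, times in seconds.
    trim_s: seconds dropped after each onset and before each offset.
    """
    trimmed = []
    for label, onset_s, offset_s in events:
        start_s = onset_s + trim_s
        stop_s = offset_s - trim_s
        if stop_s <= start_s:
            # Block too short to keep anything once both ends are trimmed.
            continue
        trimmed.append((label, round(start_s * sfreq), round(stop_s * sfreq)))
    return trimmed


# Hypothetical COCO timing with 1-minute blocks at the 128 Hz preprocessed rate.
events = [("C", 0, 60), ("O", 60, 120), ("C", 120, 180), ("O", 180, 240)]
for label, start, stop in trim_segments(events, sfreq=128):
    print(label, start, stop, (stop - start) / 128, "s")
```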
Preprocessing pipeline (EEGLAB/MATLAB)
Visual inspection:
- Raw data review using BDFreader
- Identification of bad channels/artifacts
Downsampling:
- 2048 Hz → 128 Hz (resting-state data)
Re-referencing:
- Average reference (used instead of the earlobe reference, which failed)
Filtering:
- Band-pass FIR: 1-40 Hz
- High-pass: 1 Hz (0.5 Hz cutoff, 425 points)
- Low-pass: 40 Hz (45 Hz cutoff, 45 points)
Artifact Removal:
- Bad channel rejection:
  - Flat signals > 5 s
  - SD > 4
  - Correlation < 0.8 with neighbors
- ASR (Artifact Subspace Reconstruction)
- ICA + ICLabel (components classified as > 90% non-brain were removed)
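The downsampling, re-referencing and filtering steps can be approximated in Python with SciPy. This is a sketch, not the authors' EEGLAB code: `resample_poly` and `firwin` stand in for EEGLAB's resampling and windowed-sinc FIR functions, edge handling differs, and the cutoffs and filter lengths are copied from the list above. Bad-channel rejection, ASR and ICA/ICLabel are not reproduced. `hamming_fir_taps` is a hypothetical helper showing how such filter lengths arise.

```python
import math

import numpy as np
from scipy.signal import firwin, resample_poly

FS_RAW, FS_TARGET = 2048, 128   # acquisition and downsampled rates (README)
HP_TAPS, HP_CUTOFF = 425, 0.5   # high-pass: 0.5 Hz cutoff, 425 points
LP_TAPS, LP_CUTOFF = 45, 45.0   # low-pass: 45 Hz cutoff, 45 points


def hamming_fir_taps(transition_hz, fs):
    """Rule of thumb for Hamming FIRs: order ~ 3.3 * fs / transition width."""
    order = math.ceil(3.3 * fs / transition_hz)
    order += order % 2  # round up to an even order -> odd number of taps
    return order + 1


def downsample(data, fs_in=FS_RAW, fs_out=FS_TARGET):
    """Polyphase resampling (scipy applies its own anti-aliasing FIR)."""
    return resample_poly(data, up=1, down=fs_in // fs_out, axis=-1)


def average_reference(data):
    """Subtract the instantaneous mean over channels (rows) from every channel."""
    return data - data.mean(axis=0, keepdims=True)


def fir_bandpass(data, fs=FS_TARGET):
    """Hamming-windowed FIR high-pass then low-pass, delay-compensated."""
    if data.shape[-1] < HP_TAPS:
        raise ValueError("signal shorter than the high-pass filter")
    # firwin uses a Hamming window by default; its cutoff is the -6 dB point.
    hp = firwin(HP_TAPS, HP_CUTOFF, pass_zero=False, fs=fs)
    lp = firwin(LP_TAPS, LP_CUTOFF, fs=fs)
    out = np.empty(data.shape, dtype=float)
    for ch in range(data.shape[0]):
        # mode="same" with an odd-length symmetric kernel removes the group
        # delay, so the output is not shifted in time; edges are zero-padded.
        x = np.convolve(data[ch], hp, mode="same")
        out[ch] = np.convolve(x, lp, mode="same")
    return out


def preprocess(raw_2048hz):
    """Downsample -> average reference -> 1-40 Hz band-pass (channels x samples)."""
    return fir_bandpass(average_reference(downsample(raw_2048hz)))
```

The listed lengths are consistent with that rule of thumb: a 1 Hz transition at 128 Hz gives an order of 422.4, rounded up to an even 424 (425 taps), and a 10 Hz transition gives 45 taps. This suggests EEGLAB's `pop_eegfiltnew` defaults were used, although the README does not say so.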
Feature Extraction
PSD Calculation: Welch’s method (50% overlap, Hamming window)
Frequency bands:
- Delta (δ): 1-4 Hz
- Theta (θ): 4-8 Hz
- Alpha (α): 8-13 Hz
- Beta (β): 13-30 Hz
Features per band/channel:
1. Mean Power
2. RMS of PSD
3. Standard Deviation
4. Minimum Power
5. Maximum Power
6. Skewness
7. Kurtosis
Feature volume: 14,336 features/subject (4 bands × 128 channels × 4 segments × 7 features)
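The feature definitions leave some choices open: the Welch segment length, whether band edges are inclusive, and which estimators are used for skewness and kurtosis. A sketch under stated assumptions: 2-s Hamming segments with 50% overlap, statistics computed over the PSD bins inside each band with both edges inclusive, sample standard deviation, and biased, non-excess kurtosis (assumed to mirror MATLAB's defaults). `band_features` is a hypothetical helper; it reproduces the feature count, not necessarily the authors' values.

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import kurtosis, skew

BANDS_HZ = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
FEATURES = ("mean_power", "rms_psd", "std", "min_power", "max_power",
            "skewness", "kurtosis")
N_CHANNELS, N_SEGMENTS = 128, 4


def band_features(x, fs=128.0, seg_s=2.0):
    """Seven PSD statistics per band for one channel/segment (1-D signal x)."""
    nperseg = int(seg_s * fs)  # assumption: 2-s Welch segments
    freqs, psd = welch(x, fs=fs, window="hamming",
                       nperseg=nperseg, noverlap=nperseg // 2)  # 50% overlap
    features = {}
    for band, (lo, hi) in BANDS_HZ.items():
        p = psd[(freqs >= lo) & (freqs <= hi)]  # assumption: inclusive edges
        features[band] = {
            "mean_power": p.mean(),
            "rms_psd": np.sqrt(np.mean(p ** 2)),
            "std": p.std(ddof=1),                    # sample SD
            "min_power": p.min(),
            "max_power": p.max(),
            "skewness": skew(p),                     # biased estimator
            "kurtosis": kurtosis(p, fisher=False),   # non-excess (Pearson)
        }
    return features


# 4 bands x 128 channels x 4 segments x 7 features per subject
N_FEATURES = len(BANDS_HZ) * N_CHANNELS * N_SEGMENTS * len(FEATURES)
print(N_FEATURES)  # 14336
```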
Technical Specifications
Processing Hardware:
- Intel Core i5-9400F @ 2.9 GHz
- 16 GB RAM
- Windows 10 (64-bit)
Software:
- MATLAB 2020a
- EEGLAB toolbox
- Python (scikit-learn, pandas for feature selection)
Processing Time: ~10 minutes/subject
Funding
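This research was funded by the SISTEMA GENERAL DE REGALÍAS - SGR (General Royalties System) and the MINISTERIO DE CIENCIA TECNOLOGÍA E INNOVACIÓN - MINCIENCIAS (Ministry of Science, Technology and Innovation) of Colombia, within the project “Development of a multiparametric intelligent system for recognizing patterns associated with neurocognitive dysfunctions in young people in conflict with the law in the department of Atlántico” (“Desarrollo de un sistema inteligente multiparamétrico para el reconocimiento de patrones asociados a disfunciones neurocognitivas en jóvenes en conflicto con la ley en el departamento del Atlántico”), grant number BPIN 2020000100006.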
Support
Correspondence: Aura Polo (apolol@unimagdalena.edu.co); Elmer León (elmerleondb@unimagdalena.edu.co); Julie Viloria-Porto (julieviloriapp@unimagdalena.edu.co)
Cohort
Dataset Statistics
Age distribution (n=140, range 14–19 yr, mean 16.7 yr)
Sex composition: not reported per subject
Channel counts: 128 ch (n=280 recordings)
Sampling frequencies: 128.0 Hz (n=280 recordings)
Total recording duration: 37 h
Signal: electrodes and live trace
[Interactive viewer: representative recording sub-2012cg (task-restingstate), one of 280 recordings from 140 subjects; electrode layout with 128 EEG sensors. The full set is available on OpenNeuro.]
NEMAR Processing Statistics
The plots below are generated by NEMAR’s automated EEG pipeline. The histogram shows pipeline success for data cleaning and ICA decomposition, the percentage of data frames and EEG channels retained after artefact removal, line noise per channel (RMS, dB), and the age/gender distribution of participants.
Manifest
Full dataset metadata table
| Field | Value |
|---|---|
| Dataset ID | |
| Title | Dataset of Electroencephalograms of Juvenile Offenders |
| Author (year) | |
| Canonical | — |
| Importable as | |
| Year | 20 |
| Authors | Aura Polo, Elmer León, Mariana Pino-Melgarejo, Julie Viloria-Porto |
| License | CC0 |
| Citation / DOI | |
| Source links | OpenNeuro, NeMAR, Source URL |
API Reference
class eegdash.dataset.DS006923(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)
Source: eegdash/dataset/registry.py
Dataset of Electroencephalograms of Juvenile Offenders
- Study: ds006923 (OpenNeuro)
- Author (year): Polo2025
- Canonical: —
- Also importable as: DS006923, Polo2025
- Modality: eeg; Experiment type: Clinical/Intervention; Subject type: Other. Subjects: 140; recordings: 280; tasks: 1.
- Parameters:
cache_dir (str | Path) – Directory where data are cached locally.
query (dict | None) – Additional MongoDB-style filters to AND with the dataset selection. Must not contain the key `dataset`.
s3_bucket (str | None) – Base S3 bucket used to locate the data.
**kwargs (dict) – Additional keyword arguments forwarded to EEGDashDataset.
- data_dir
Local dataset cache directory (cache_dir / dataset_id).
- Type: Path
Notes
Each item is a recording; recording-level metadata are available via dataset.description. query supports MongoDB-style filters on fields in ALLOWED_QUERY_FIELDS and is combined with the dataset filter. Dataset-specific caveats are not provided in the summary metadata.
References
- OpenNeuro dataset: https://openneuro.org/datasets/ds006923
- NeMAR dataset: https://nemar.org/dataexplorer/detail?dataset_id=ds006923
- DOI: https://doi.org/10.18112/openneuro.ds006923.v1.0.0
Examples
>>> from eegdash.dataset import DS006923
>>> dataset = DS006923(cache_dir="./data")
>>> recording = dataset[0]
>>> raw = recording.load()
- __init__(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)
- save(path: str, overwrite: bool = False, offset: int = 0)
Save datasets to files by creating one subdirectory for each dataset:
path/
    0/
        0-raw.fif | 0-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)
    1/
        1-raw.fif | 1-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)
- Parameters:
path (str) – Directory in which subdirectories are created to store the -raw.fif | -epo.fif and .json files.
overwrite (bool) – Whether to delete old subdirectories that will be saved to in this call.
offset (int) – If provided, the integer is added to the id of the dataset in the concat. This is useful for very large datasets, where one dataset has to be processed and saved at a time while keeping its original position.
- braindecode: BaseDataset from braindecode, windowed via create_windows_from_events.
- pytorch: DataLoader; supports parallel workers and on-the-fly augmentations.
- huggingface: datasets.load_dataset("EEGDash/ds006923"). Swap any load_dataset(...) call for ds006923 to reproduce the tutorial on this dataset.
Provenance
¹Contributed to OpenNeuro in BIDS format.
²Curated & ingested by the EEGDash catalog; see CITATION.cff for canonical reference.
³Persistent identifier: 10.18112/openneuro.ds006923.v1.0.0.
See Also
- eegdash.dataset.EEGDashDataset
- eegdash.dataset