
Iss. 6803 · 63 subjects · 126 recordings · CC0

Dataset Brief · NeuroTechs Dataset for Stem Skills

DS006803: EEG dataset, 63 subjects

NeuroTechs Dataset for Stem Skills

Citation: Tania Yareni Pech-Canul, Roberto Guajardo, Luis Fernando Acosta-Soto, Mónica Sofía Margoya-Constantino, Juan Pablo Rosado-Aíza, Luz María Alonso-Valerdi (20). NeuroTechs Dataset for Stem Skills. 10.18112/openneuro.ds006803.v1.1.1

63-participant EEG dataset — NeuroTechs Dataset for Stem Skills.

Data & curation Tania Yareni Pech-Canul · Roberto Guajardo · Luis Fernando Acosta-Soto · Mónica Sofía Margoya-Constantino · Juan Pablo Rosado-Aíza · Luz María Alonso-Valerdi
Year 20 · Distributed via OpenNeuro

EEG · 8 ch · 250 Hz · BIDS 1.8.0 · Task: STEMSKILLS · 2 sessions · Healthy · Visual · Learning

Layer 01 · Study

What was asked

Hypothesis, independent & dependent variables, paradigm, cohort, and the editorial caveats around what the recordings can and cannot answer.

Layer 02 · Signal · BIDS

What was recorded

Sidecars, channels & electrodes, coordinate system, event semantics, and quality stats from the NEMAR pipeline when available.

Layer 03 · Training · ML

What you can train on

Recommended access modes — MNE Raw, braindecode windows, PyTorch DataLoader — plus the targets the metadata makes addressable.

§ 01 Access · Get started

Quickstart

Get Started

Install

pip install eegdash

Access the data

from eegdash.dataset import DS006803

dataset = DS006803(cache_dir="./data")
# Get the raw object of the first recording
raw = dataset.datasets[0].raw
print(raw.info)

Query & Filter

Filter by subject

# Subject labels in this dataset follow the XXc / XXe format (e.g. sub-28e)
dataset = DS006803(cache_dir="./data", subject="28e")

Advanced query

# One experimental (28e) and one control (14c) subject
dataset = DS006803(
    cache_dir="./data",
    query={"subject": {"$in": ["28e", "14c"]}},
)

Iterate recordings

for rec in dataset:
    print(rec.subject, rec.raw.info['sfreq'])

Cite This Dataset

If you use this dataset in your research, please cite the original authors.

BibTeX

@dataset{ds006803,
  title = {NeuroTechs Dataset for Stem Skills},
  author = {Tania Yareni Pech-Canul and Roberto Guajardo and Luis Fernando Acosta-Soto and Mónica Sofía Margoya-Constantino and Juan Pablo Rosado-Aíza and Luz María Alonso-Valerdi},
  doi = {10.18112/openneuro.ds006803.v1.1.1},
  url = {https://doi.org/10.18112/openneuro.ds006803.v1.1.1},
}

§ 02 Study · The README

About This Dataset

README

Details related to access to the data

Contact person

Juan Pablo Rosado Aíza (jprosadoa@gmail.com), ORCID 0009-0004-5690-1753

Practical information to access the data

The data units are in microvolts, converted from the raw values returned by the Unicorn API for Python.

Overview

Evaluating STEM skills in students

Year(s) that the project ran

May to July 2025

Brief overview of the tasks in the experiment

Participants answered a computer-based test run in PsychoPy. The paradigm includes a 2-minute basal state (minute 1 with eyes closed, minute 2 with eyes open) followed by one section per skill evaluated: four math sections, one per basic operation (addition, subtraction, multiplication and division), one programming section and one spatial ability section. Each section ran until either the time or the questions ran out. There was a 30-second break between sections.

The event markers with each question, answer and time can be found within each subject folder. The purpose of the paradigm is to compare different class groups and their global performance. The purpose of the EEG data is to support analysis of band activity that may help explain differences between the groups. The experimental group took classes using interactive tools such as Google Colab during class.

Description of the contents of the dataset

8-channel EEG data for 63 subjects: 23 experimental “intervention” subjects and 40 control subjects. Both raw (Session 1) and preprocessed (Session 2) data are provided. All EEG data start at second 3, because seconds 0-3 were cut in preprocessing. The timestamps in all event markers use this time base (a timestamp of second 3 corresponds to sample 1; second 4 corresponds to sample 251).

Independent variables

Groups for the subjects.

Dependent variables

Performance, EEG data.

Control variables

Time of participation (end of semester), place of data acquisition, status as student.
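The time base described in the dataset contents above (second 3 maps to sample 1 and second 4 to sample 251, at 250 Hz) can be sketched as a small conversion helper. The function and constant names are illustrative and are not part of the dataset or of EEGDash; the zero-based variant is an added assumption for use with array indexing.

```python
SFREQ_HZ = 250.0    # sampling rate reported for all recordings
CUT_SECONDS = 3.0   # seconds removed from the start of every recording


def timestamp_to_sample(timestamp_s, one_based=True):
    """Map an event-marker timestamp (in seconds) to a sample number."""
    offset = round((timestamp_s - CUT_SECONDS) * SFREQ_HZ)
    if offset < 0:
        raise ValueError("timestamp falls inside the removed first 3 seconds")
    # The README counts samples from 1; array indexing counts from 0.
    return offset + 1 if one_based else offset


print(timestamp_to_sample(3.0))                   # 1
print(timestamp_to_sample(4.0))                   # 251
print(timestamp_to_sample(4.0, one_based=False))  # 250
```

The zero-based form is the one to use when slicing a NumPy array or an MNE Raw object.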

Methods

Subjects

All subjects are either experimental or control. IDs follow the format XXc for control and XXe for experimental.

Subject inclusion/exclusion criteria: only students enrolled in the course at hand.

Participants 1e, 3e, 4e, 6e, 9e, 10e, 12e, 14e, 15e, 24e, 25e, 33e, 34e, 36e, 37e, 39e, 40e, 41e, 14c and 16c were outliers on RMS voltage.
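A minimal sketch of how the ID convention and the outlier list could be used when selecting subjects. The helper names are hypothetical, and the handling of a "sub-" prefix and of zero-padded numbers is an assumption about how the IDs might appear in file names; only the XXc / XXe format and the outlier list come from the README.

```python
import re

# RMS-voltage outliers, copied from the README.
RMS_OUTLIERS = {
    "1e", "3e", "4e", "6e", "9e", "10e", "12e", "14e", "15e", "24e",
    "25e", "33e", "34e", "36e", "37e", "39e", "40e", "41e", "14c", "16c",
}

# Optional "sub-" prefix and leading zeros are tolerated (assumption).
_ID_PATTERN = re.compile(r"^(?:sub-)?0*(\d+)([ce])$")


def parse_subject(subject_id):
    """Return (number, group) for IDs such as '28e', 'sub-14c' or '01e'."""
    match = _ID_PATTERN.match(subject_id)
    if match is None:
        raise ValueError(f"unexpected subject ID: {subject_id!r}")
    number, suffix = match.groups()
    group = "control" if suffix == "c" else "experimental"
    return int(number), group


def is_rms_outlier(subject_id):
    """True if the subject is on the README's RMS outlier list."""
    number, group = parse_subject(subject_id)
    return f"{number}{group[0]}" in RMS_OUTLIERS


print(parse_subject("sub-14c"))  # (14, 'control')
print(is_rms_outlier("28e"))     # False
```

Whether to drop the listed outliers or only flag them is an analysis decision that the README leaves open.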

Apparatus

The recording was performed in a closed room with a single researcher present to give instructions and answer any questions. There was a laptop, and the EEG device was mounted using conductive gel.

Initial setup

Participants first signed a consent form on paper. Impedance was then measured with the UHB recorder software until all signals were reported as “good” by the software.

The subjects then answered the test.

Task organization

The test's sections were neither randomized nor counterbalanced; the order is as described above. The questions within each section were randomized.

Task details

Each question answered has a code, an answer and a timestamp, which can be found in the corresponding main section file for each subject. The questions themselves with codes and correct answers can be found in the stimuli folder.

Additional data acquired

Average cycle data for female subjects was calculated for each group, anonymously. Refer to extra_metadata.xlsx.

Experimental location

All data were collected in a controlled environment.

Missing data

Data for subjects 17c, 30e, 32e and 35e were lost during acquisition. All records start at second 3, instead of second 0, to eliminate connectivity noise and drift at the beginning. The basal state lasted 123 seconds to account for this, so the first 120 seconds of each record correspond to the basal state.
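As a sketch of how the basal period could be located in a stored record: the README says the first 120 seconds are basal state, with one minute eyes closed followed by one minute eyes open. Placing the boundary at exactly 60 seconds of the stored record is an assumption, since the README does not say which part of the original 123 seconds the removed 3 seconds came from. The names below are illustrative.

```python
SFREQ_HZ = 250        # sampling rate reported for all recordings
BASAL_SECONDS = 120   # basal state remaining after the 3 s cut
HALF_SECONDS = BASAL_SECONDS // 2


def basal_segments(sfreq=SFREQ_HZ):
    """Return zero-based (start, stop) sample ranges; stop is exclusive."""
    boundary = HALF_SECONDS * sfreq
    end = BASAL_SECONDS * sfreq
    return {
        "eyes_closed": (0, boundary),  # assumed: first 60 s of the record
        "eyes_open": (boundary, end),  # assumed: following 60 s
    }


print(basal_segments())
# {'eyes_closed': (0, 15000), 'eyes_open': (15000, 30000)}
```

With an MNE Raw object, each range can be passed to `raw.get_data(start=start, stop=stop)` to extract the corresponding samples.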

All responses to “OR4” in the spatial ability section are invalid, because the correct answer is not among the options. This question was excluded from all calculations shown in extra_metadata.xlsx.
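A minimal sketch of scoring responses while skipping the invalid question. Only the code "OR4" comes from the README; the record layout (`code` and `answer` keys), the other question codes and the answers below are made up for illustration and do not reflect the actual event files.

```python
INVALID_CODES = {"OR4"}  # the correct answer is not among the options


def score_responses(responses, answer_key):
    """Return (n_correct, n_scored), skipping invalid question codes."""
    n_correct = 0
    n_scored = 0
    for response in responses:
        code = response["code"]
        if code in INVALID_CODES:
            continue  # excluded from all calculations
        n_scored += 1
        if response["answer"] == answer_key[code]:
            n_correct += 1
    return n_correct, n_scored


# Made-up example data, not taken from the dataset.
responses = [
    {"code": "OR3", "answer": "B"},
    {"code": "OR4", "answer": "A"},
    {"code": "OR5", "answer": "C"},
]
answer_key = {"OR3": "B", "OR5": "D"}
print(score_responses(responses, answer_key))  # (1, 2)
```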

§ 03 Cohort · Participants

Cohort

Dataset Statistics

Age distribution (n=63, range 18–24 yr, mean 19.5 yr · sex per subject not reported)


Sex composition

Female	37
Male	26
F : M ratio	1.42 : 1

59% female · n = 63 subjects with reported sex.

Channel counts: 8 ch (n=126 recordings)

Sampling frequencies: 250.0 Hz (n=126 recordings)

Total recording duration: 45 h

§ 04 Signal · Electrodes & trace

Signal · Electrodes & live trace

Fig. 01 Signal & montage 8 ch · EEG · 250 Hz · 63 subjects, 126 recordings

Live trace viewer — sub-28e · ses-2 · task-STEMSKILLS


Electrode layout — EEG · 8 sensors — 8 channels

NEMAR Processing Statistics

The plots below are generated by NEMAR’s automated EEG pipeline. The histogram shows pipeline success for data cleaning and ICA decomposition, the percentage of data frames and EEG channels retained after artefact removal, line noise per channel (RMS, dB), and the age/gender distribution of participants.

HED event descriptors word cloud

§ 05 Manifest · BIDS tree

Manifest

File Explorer

Browse the BIDS file structure of this dataset. Records are fetched on demand from the EEGDash catalog the first time you open the explorer.


§ 06 API · Programmatic access

API Reference

Signature

class eegdash.dataset.DS006803(cache_dir, query=None, s3_bucket=None, **kwargs)

Bases: EEGDashDataset

Author (year): PechCanul2025
Canonical: (none)
Importable as: DS006803 · PechCanul2025
Source: eegdash/dataset/registry.py

class eegdash.dataset.DS006803(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)

NeuroTechs Dataset for Stem Skills

Study: ds006803 (OpenNeuro)
Author (year): PechCanul2025
Canonical: (none)

Also importable as: DS006803, PechCanul2025.

Modality: eeg; Experiment type: Learning; Subject type: Healthy. Subjects: 63; recordings: 126; tasks: 1.

Parameters:

cache_dir (str | Path) – Directory where data are cached locally.
query (dict | None) – Additional MongoDB-style filters to AND with the dataset selection. Must not contain the key dataset.
s3_bucket (str | None) – Base S3 bucket used to locate the data.
**kwargs (dict) – Additional keyword arguments forwarded to EEGDashDataset.

data_dir

Local dataset cache directory (cache_dir / dataset_id).

Type: Path

query

Merged query with the dataset filter applied.

Type: dict

records

Metadata records used to build the dataset, if pre-fetched.

Type: list[dict] | None

Notes

Each item is a recording; recording-level metadata are available via dataset.description. query supports MongoDB-style filters on fields in ALLOWED_QUERY_FIELDS and is combined with the dataset filter. Dataset-specific caveats are not provided in the summary metadata.
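The query rules described above (MongoDB-style filters, no `dataset` key) can be sketched as a small builder that assembles the dictionary before it is passed to the class. This is a hypothetical helper, not part of EEGDash. It assumes that `session` is among ALLOWED_QUERY_FIELDS and that the session labels are "1" (raw) and "2" (preprocessed); neither is confirmed by this page beyond the ses-2 label shown for the example recording.

```python
def build_query(subjects=None, session=None, extra=None):
    """Assemble a MongoDB-style filter for the `query` parameter."""
    query = dict(extra or {})
    if "dataset" in query:
        # The dataset filter is applied by the class itself.
        raise ValueError("query must not contain the key 'dataset'")
    if subjects is not None:
        query["subject"] = {"$in": list(subjects)}
    if session is not None:
        query["session"] = str(session)  # assumed field name and label
    return query


query = build_query(subjects=["28e", "14c"], session=2)
print(query)
# {'subject': {'$in': ['28e', '14c']}, 'session': '2'}

# Possible use (requires eegdash and network access):
# from eegdash.dataset import DS006803
# dataset = DS006803(cache_dir="./data", query=query)
```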

References

OpenNeuro dataset: https://openneuro.org/datasets/ds006803
NeMAR dataset: https://nemar.org/dataexplorer/detail?dataset_id=ds006803
DOI: https://doi.org/10.18112/openneuro.ds006803.v1.1.1

Examples

>>> from eegdash.dataset import DS006803
>>> dataset = DS006803(cache_dir="./data")
>>> recording = dataset[0]
>>> raw = recording.load()

__init__(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)

save(path: str, overwrite: bool = False, offset: int = 0)

Save datasets to files by creating one subdirectory for each dataset:

path/
    0/
        0-raw.fif | 0-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json  (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)
    1/
        1-raw.fif | 1-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json  (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)

Parameters:

path (str) – Directory in which subdirectories are created to store the -raw.fif | -epo.fif and .json files.
overwrite (bool) – Whether to delete old subdirectories that will be saved to in this call.
offset (int) – If provided, the integer is added to the id of the dataset in the concat. This is useful in the setting of very large datasets, where one dataset has to be processed and saved at a time to account for its original position.
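The role of `offset` can be illustrated without touching the disk. The sketch below assumes, based on the parameter description above, that each saved recording's subdirectory is named by its position in the current chunk plus `offset`. The helper names and the chunk size of 50 are arbitrary choices for illustration.

```python
def subdir_names(n_datasets, offset=0):
    """Subdirectory names expected from one save(path, offset=offset) call."""
    return [str(offset + i) for i in range(n_datasets)]


def chunk_offsets(n_total, chunk_size):
    """Yield (offset, size) pairs covering n_total recordings in chunks."""
    for offset in range(0, n_total, chunk_size):
        yield offset, min(chunk_size, n_total - offset)


# 126 recordings saved in chunks of 50 give offsets 0, 50 and 100.
for offset, size in chunk_offsets(126, 50):
    names = subdir_names(size, offset)
    print(offset, names[0], names[-1])
# 0 0 49
# 50 50 99
# 100 100 125
```

Saving chunk by chunk in this way keeps each recording at its original position in the full collection.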

Access modes: MNE → braindecode → PyTorch → ML

.raw (mne): MNE Raw object for standard tools (filter, epoch, ICA, plot_psd).

BaseConcatDataset (braindecode): Each record is a lazy BaseDataset from braindecode, windowed via create_windows_from_events.

DataLoader (pytorch): Wraps the windowed dataset into a PyTorch DataLoader; supports parallel workers and on-the-fly augmentations.

Zarr cache (zarr): Optional braindecode Zarr mirror for fast resume; persisted to cache_dir.

Hugging Face (huggingface): Pre-bundled mirror at EEGDash/ds006803; pull with datasets.load_dataset("EEGDash/ds006803").

Croissant 1.0 (mlcommons): Machine-readable JSON-LD descriptor, DS006803.croissant.json (MLCommons schema, ingestible by PyTorch / TensorFlow / JAX).
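To give a sense of what windowing produces before the data reach a DataLoader, the index arithmetic for fixed-length windows can be sketched in plain Python. This is not the event-based create_windows_from_events mentioned above and does not call braindecode; the function name, the 2-second window and the 1-second stride are arbitrary choices for illustration.

```python
SFREQ_HZ = 250  # sampling rate reported for all recordings


def fixed_windows(n_samples, window_s=2.0, stride_s=1.0, sfreq=SFREQ_HZ):
    """Return zero-based (start, stop) pairs for full windows only."""
    size = int(round(window_s * sfreq))
    step = int(round(stride_s * sfreq))
    if size <= 0 or step <= 0:
        raise ValueError("window and stride must be positive")
    # Windows that would run past the end of the record are dropped.
    return [(s, s + size) for s in range(0, n_samples - size + 1, step)]


windows = fixed_windows(n_samples=30000)  # e.g. the 120 s basal period
print(len(windows), windows[0], windows[-1])
# 119 (0, 500) (29500, 30000)
```

With 8 channels, each such window would correspond to an array of shape (8, 500).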

Examples using EEGDash (curated · start here)

Find datasets with the EEGDash API: Query the catalogue, filter by task or modality, list candidates.

Load one EEG recording: Resolve a single record to an MNE Raw with channels and events.

EEG recording to PyTorch DataLoader: Wrap braindecode windows in a DataLoader for model training.

Preprocess EEG and create windows: Filter, resample, epoch, and persist the windowed dataset.

Save and reload prepared data: Cache a windowed dataset to disk and reattach it without recompute.

Download a dataset locally: Prefetch BIDS files to a local cache and validate the layout.

Swap any load_dataset(...) call for ds006803 to reproduce the tutorial on this dataset.


Provenance

¹Contributed to OpenNeuro in BIDS format.

²Curated & ingested by the EEGDash catalog; see CITATION.cff for canonical reference.

³Persistent identifier: 10.18112/openneuro.ds006803.v1.1.1.

Related & sibling datasets

DS004166: EEG · 71 subj
DS004147: EEG · 12 subj
DS006802: EEG · 24 subj
DS004017: EEG · 21 subj
NM000339: EEG · 62 subj


BIDS	BIDS 1.8.0
Sidecars	channels · electrodes · coordsystem · eeg.json
Provenance	CC0 · 10.18112/openneuro.ds006803.v1.1.1
Machine-readable	schema.org/Dataset · Croissant
Mirrors	OpenNeuro · NEMAR · HuggingFace

Dataset ID	`DS006803`
Title	NeuroTechs Dataset for Stem Skills
Author (year)	`PechCanul2025`
Canonical	—
Importable as	`DS006803`, `PechCanul2025`
Year	20
Authors	Tania Yareni Pech-Canul, Roberto Guajardo, Luis Fernando Acosta-Soto, Mónica Sofía Margoya-Constantino, Juan Pablo Rosado-Aíza, Luz María Alonso-Valerdi
License	CC0
Citation / DOI	doi:10.18112/openneuro.ds006803.v1.1.1
Source links	OpenNeuro | NeMAR | Source URL
