DS006803#

NeuroTechs Dataset for Stem Skills

Access recordings and metadata through EEGDash.

Citation: Tania Yareni Pech-Canul, Roberto Guajardo, Luis Fernando Acosta-Soto, Mónica Sofía Margoya-Constantino, Juan Pablo Rosado-Aíza, Luz María Alonso-Valerdi (2025). NeuroTechs Dataset for Stem Skills. 10.18112/openneuro.ds006803.v1.0.1

Modality: eeg Subjects: 63 Recordings: 636 License: CC0 Source: openneuro

Metadata: Complete (100%)

Quickstart#

Install

pip install eegdash

Access the data

from eegdash.dataset import DS006803

dataset = DS006803(cache_dir="./data")
# Get the raw object of the first recording
raw = dataset.datasets[0].raw
print(raw.info)

Filter by subject

dataset = DS006803(cache_dir="./data", subject="01")

Advanced query

dataset = DS006803(
    cache_dir="./data",
    query={"subject": {"$in": ["01", "02"]}},
)

Iterate recordings

for rec in dataset:
    print(rec.subject, rec.raw.info['sfreq'])

If you use this dataset in your research, please cite the original authors.

BibTeX

@dataset{ds006803,
  title = {NeuroTechs Dataset for Stem Skills},
  author = {Tania Yareni Pech-Canul and Roberto Guajardo and Luis Fernando Acosta-Soto and Mónica Sofía Margoya-Constantino and Juan Pablo Rosado-Aíza and Luz María Alonso-Valerdi},
  doi = {10.18112/openneuro.ds006803.v1.0.1},
  url = {https://doi.org/10.18112/openneuro.ds006803.v1.0.1},
}

About This Dataset#

README

Details related to access to the data

  • Contact person

Juan Pablo Rosado Aíza jprosadoa@gmail.com

ORCID 0009-0004-5690-1753

  • Practical information to access the data

The data units are in microvolts, transformed from raw Unicorn API for Python values.

Overview

Evaluating STEM skills in students

  • Year(s) that the project ran

2025 May - July

  • Brief overview of the tasks in the experiment

Participants answered a computer-based test delivered through PsychoPy. The paradigm includes a 2-minute basal state (minute 1 with eyes closed, minute 2 with eyes open) followed by one section per skill evaluated: four math sections, one per basic operation (addition, subtraction, multiplication, and division), one programming section, and one spatial-ability section. Each section ran until either its time or its questions ran out, with a 30-second break between sections.

The event markers for each question, answer, and time can be found within each subject folder. The paradigm is designed to compare different class groups and their overall performance; the EEG data allow imaging the brain for analysis of band activity that may help explain differences between the groups. The experimental group took classes using interactive tools such as Google Colab.
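If the per-subject marker files follow tabular conventions similar to BIDS `*_events.tsv` files, they can be loaded with pandas. The column names and values below are purely illustrative assumptions, not taken from the actual files; only the timing conventions (recordings start at second 3, basal state occupies the first 120 s) come from this README:

```python
import io
import pandas as pd

# Illustrative stand-in for a subject's marker file; the real files live in
# each subject folder and their exact columns may differ.
events_tsv = io.StringIO(
    "onset\tduration\ttrial_type\n"
    "3.0\t0.0\tbasal_eyes_closed\n"
    "63.0\t0.0\tbasal_eyes_open\n"
    "123.0\t0.0\tmath_sum\n"
)
events = pd.read_csv(events_tsv, sep="\t")

# Keep only markers after the basal state: recordings start at second 3 and
# the first 120 s are basal, so task markers begin at onset >= 123 s.
task_events = events[events["onset"] >= 123.0]
print(task_events["trial_type"].tolist())
```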

  • Description of the contents of the dataset

8-channel EEG data for 63 subjects: 23 experimental ("intervention") subjects and 40 control subjects. Both raw (Session 1) and preprocessed (Session 2) data are provided. All EEG data start at second 3, since seconds 0–3 were cut in preprocessing. The timestamps in all event markers use this time signature (a timestamp at second 3 corresponds to sample 1; second 4 corresponds to sample 251).
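The timestamp-to-sample mapping stated above (second 3 is sample 1, second 4 is sample 251, at 250 Hz) can be sketched as a small helper; the function name is illustrative:

```python
SFREQ = 250.0  # sampling rate from the dataset metadata

def timestamp_to_sample(t_seconds: float, sfreq: float = SFREQ) -> int:
    """Map an event-marker timestamp (seconds) to a 1-based sample index.

    The first three seconds were cut in preprocessing, so a timestamp at
    second 3 corresponds to sample 1, and second 4 to sample 251 at 250 Hz.
    """
    return int(round((t_seconds - 3.0) * sfreq)) + 1

# Sanity checks against the values quoted in the README:
print(timestamp_to_sample(3))  # 1
print(timestamp_to_sample(4))  # 251
```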

  • Independent variables

Groups for the subjects.

  • Dependent variables

Performance, EEG data.

  • Control variables

Time of participation (End of semester), place for data acquisition, status as student.

Methods

Subjects

Every subject belongs to either the experimental or the control group; subject IDs follow the format XXc for control subjects and XXe for experimental subjects.

  • Subject inclusion/exclusion criteria

Only students enrolled in the course at hand were included.

Apparatus

The experiment was performed in a closed room with a single researcher present to give instructions and answer any questions. A laptop ran the test, and the EEG device was mounted using conductive gel.

Initial setup

Signing the paper consent form was the first step; afterwards, impedance measurements were made using the UHB recorder software until all signals were marked “good” in the software.

The subjects then answered the test.

Task organization

The test’s sections are neither randomized nor counterbalanced; the order is as described above. The questions within each section were randomized.

Task details

Each answered question has a code, an answer, and a timestamp, which can be found in the corresponding main section file for each subject. The questions themselves, with codes and correct answers, can be found in the stimuli folder. Due to a programming quirk, the appearance time of the first programming question and of the first spatial-ability question is lost; it should be calculated as the final answer time of the previous section plus 30 seconds. After that, the answer time for each question is found under “question time”, and the appearance time of the next question is found under “Answer time” of the previous question. This discrepancy shall be fixed in the next release.
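The reconstruction rule for the lost first-question appearance times can be sketched directly from the README's description (previous section's final answer time plus the 30-second break); the function name is an illustrative assumption:

```python
BREAK_SECONDS = 30.0  # inter-section break reported in the README

def first_question_onset(prev_section_last_answer_time: float) -> float:
    """Recover the lost appearance time of the first programming or
    spatial-ability question: the final answer time of the previous
    section plus the 30-second inter-section break."""
    return prev_section_last_answer_time + BREAK_SECONDS

# E.g., if the previous section's last answer was logged at 200.0 s:
print(first_question_onset(200.0))  # 230.0
```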

Additional data acquired

Average cycle data for female subjects was calculated for each group, anonymously.

Experimental location

All data were collected in a controlled environment.

Missing data

Subjects 17c, 30e, 32e, and 35e were lost during acquisition. All records start at second 3 instead of second 0, to eliminate connectivity noise and drift at the beginning. The basal state lasted 123 seconds to account for this, so the first 120 seconds of each recording correspond to the basal state.

Dataset Information#

Dataset ID

DS006803

Title

NeuroTechs Dataset for Stem Skills

Year

2025

Authors

Tania Yareni Pech-Canul, Roberto Guajardo, Luis Fernando Acosta-Soto, Mónica Sofía Margoya-Constantino, Juan Pablo Rosado-Aíza, Luz María Alonso-Valerdi

License

CC0

Citation / DOI

doi:10.18112/openneuro.ds006803.v1.0.1

Source links

OpenNeuro | NeMAR | Source URL

Copy-paste BibTeX
@dataset{ds006803,
  title = {NeuroTechs Dataset for Stem Skills},
  author = {Tania Yareni Pech-Canul and Roberto Guajardo and Luis Fernando Acosta-Soto and Mónica Sofía Margoya-Constantino and Juan Pablo Rosado-Aíza and Luz María Alonso-Valerdi},
  doi = {10.18112/openneuro.ds006803.v1.0.1},
  url = {https://doi.org/10.18112/openneuro.ds006803.v1.0.1},
}

Found an issue with this dataset?

If you encounter any problems with this dataset (missing files, incorrect metadata, loading errors, etc.), please let us know!

Report an Issue on GitHub

Technical Details#

Subjects & recordings
  • Subjects: 63

  • Recordings: 636

  • Tasks: 1

Channels & sampling rate
  • Channels: 8

  • Sampling rate (Hz): 250.0

  • Duration (hours): 0.0

Tags
  • Pathology: Healthy

  • Modality: Visual

  • Type: Learning

Files & format
  • Size on disk: 1.4 GB

  • File count: 636

  • Format: BIDS

License & citation
  • License: CC0

  • DOI: doi:10.18112/openneuro.ds006803.v1.0.1

Provenance

API Reference#

Use the DS006803 class to access this dataset programmatically.

class eegdash.dataset.DS006803(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)[source]#

Bases: EEGDashDataset

OpenNeuro dataset ds006803. Modality: eeg; Experiment type: Learning; Subject type: Healthy. Subjects: 63; recordings: 126; tasks: 1.

Parameters:
  • cache_dir (str | Path) – Directory where data are cached locally.

  • query (dict | None) – Additional MongoDB-style filters to AND with the dataset selection. Must not contain the key dataset.

  • s3_bucket (str | None) – Base S3 bucket used to locate the data.

  • **kwargs (dict) – Additional keyword arguments forwarded to EEGDashDataset.

data_dir#

Local dataset cache directory (cache_dir / dataset_id).

Type:

Path

query#

Merged query with the dataset filter applied.

Type:

dict

records#

Metadata records used to build the dataset, if pre-fetched.

Type:

list[dict] | None

Notes

Each item is a recording; recording-level metadata are available via dataset.description. query supports MongoDB-style filters on fields in ALLOWED_QUERY_FIELDS and is combined with the dataset filter. Dataset-specific caveats are not provided in the summary metadata.

References

OpenNeuro dataset: https://openneuro.org/datasets/ds006803

NeMAR dataset: https://nemar.org/dataexplorer/detail?dataset_id=ds006803

Examples

>>> from eegdash.dataset import DS006803
>>> dataset = DS006803(cache_dir="./data")
>>> recording = dataset[0]
>>> raw = recording.load()
__init__(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)[source]#
save(path, overwrite=False)[source]#

Save the dataset to disk.

Parameters:
  • path (str or Path) – Destination file path.

  • overwrite (bool, default False) – If True, overwrite existing file.

Return type:

None

See Also#