eegdash.data_utils

Data utilities and dataset classes for EEG data handling.

This module provides core dataset classes for working with EEG data in the EEGDash ecosystem, including classes for individual recordings and collections of datasets. It integrates with braindecode for machine learning workflows and handles data loading from both local and remote sources.

Classes

EEGDashBaseDataset(record, cache_dir[, ...])

A single EEG recording dataset.

EEGBIDSDataset([data_dir, dataset])

An interface to a local BIDS dataset containing EEG recordings.

EEGDashBaseRaw(input_fname, metadata[, ...])

MNE BaseRaw wrapper for automatic S3 data fetching.

class eegdash.data_utils.EEGDashBaseDataset(record: dict[str, Any], cache_dir: str, s3_bucket: str | None = None, **kwargs)

Bases: BaseDataset

A single EEG recording dataset.

Represents a single EEG recording, typically hosted on a remote server (like AWS S3) and cached locally upon first access. This class is a subclass of braindecode.datasets.BaseDataset and can be used with braindecode’s preprocessing and training pipelines.

Parameters:
  • record (dict) – A fully resolved metadata record for the data to load.

  • cache_dir (str) – The local directory where the data will be cached.

  • s3_bucket (str, optional) – The S3 bucket to download data from. If not provided, defaults to the OpenNeuro bucket.

  • **kwargs – Additional keyword arguments passed to the braindecode.datasets.BaseDataset constructor.

property raw: BaseRaw

The MNE Raw object for this recording.

Accessing this property triggers the download and caching of the data if it has not been accessed before.

Returns:

The loaded MNE Raw object.

Return type:

mne.io.BaseRaw
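The download-on-first-access behaviour described for raw can be pictured with a small, library-free sketch. LazyRecording and its fetch callable are hypothetical stand-ins (plain bytes replace the MNE Raw object and the S3 download); this is not the eegdash implementation.

```python
from pathlib import Path
import tempfile


class LazyRecording:
    """Toy stand-in for a dataset whose data is fetched on first access."""

    def __init__(self, cache_path: Path, fetch):
        self._cache_path = cache_path
        # Hypothetical callable that writes the recording to cache_path.
        self._fetch = fetch
        self._raw = None

    @property
    def raw(self) -> bytes:
        # Load at most once per object; download only if the cache is empty.
        if self._raw is None:
            if not self._cache_path.exists():
                self._fetch(self._cache_path)
            self._raw = self._cache_path.read_bytes()
        return self._raw


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        recording = LazyRecording(
            Path(tmp) / "recording.bin",
            fetch=lambda path: path.write_bytes(b"eeg"),
        )
        print(len(recording.raw))  # first access triggers the fetch
```

Constructing the object stays cheap under this pattern, and a second object pointing at the same cache path reuses the file already on disk.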

class eegdash.data_utils.EEGBIDSDataset(data_dir=None, dataset='')

Bases: object

An interface to a local BIDS dataset containing EEG recordings.

This class centralizes interactions with a BIDS dataset on the local filesystem, providing methods to parse metadata, find files, and retrieve BIDS-related information.

Parameters:
  • data_dir (str or Path) – The path to the local BIDS dataset directory.

  • dataset (str) – A name for the dataset (e.g., “ds002718”).

ALLOWED_FILE_FORMAT = ['eeglab', 'brainvision', 'biosemi', 'european']

RAW_EXTENSIONS = {'.bdf': ['.bdf'], '.edf': ['.edf'], '.set': ['.set', '.fdt'], '.vhdr': ['.eeg', '.vhdr', '.vmrk', '.dat', '.raw']}

METADATA_FILE_EXTENSIONS = ['eeg.json', 'channels.tsv', 'electrodes.tsv', 'events.tsv', 'events.json']

check_eeg_dataset() -> bool

Check if the BIDS dataset contains EEG data.

Returns:

True if the dataset’s modality is EEG, False otherwise.

Return type:

bool

get_bids_metadata_files(filepath: str | Path, metadata_file_extension: str) -> list[Path]

Retrieve all metadata files that apply to a given data file.

Follows the BIDS inheritance principle to find all relevant metadata files (e.g., channels.tsv, eeg.json) for a specific recording.

Parameters:
  • filepath (str or Path) – The path to the data file.

  • metadata_file_extension (str) – The extension of the metadata file to search for (e.g., “channels.tsv”).

Returns:

A list of paths to the matching metadata files.

Return type:

list of Path
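As a rough illustration of the inheritance lookup, the sketch below walks from the recording's folder up to the dataset root and collects every file whose name ends with the requested extension. find_inherited_files and build_demo_dataset are hypothetical helpers, and the sketch matches on the filename ending only; the full BIDS rule also compares filename entities, which is omitted here.

```python
from pathlib import Path
import tempfile


def find_inherited_files(data_file: Path, dataset_root: Path, suffix: str) -> list[Path]:
    """Collect files ending in `suffix`, from the data file's folder up to the root.

    Results are ordered from the most specific directory to the dataset root.
    """
    data_file = data_file.resolve()
    dataset_root = dataset_root.resolve()
    matches: list[Path] = []
    current = data_file.parent
    while True:
        matches.extend(
            sorted(p for p in current.iterdir() if p.is_file() and p.name.endswith(suffix))
        )
        # Stop at the dataset root, or at the filesystem root as a safety net.
        if current == dataset_root or current == current.parent:
            break
        current = current.parent
    return matches


def build_demo_dataset(root: Path) -> Path:
    """Create a tiny BIDS-like tree with channels.tsv at two levels."""
    eeg_dir = root / "sub-01" / "eeg"
    eeg_dir.mkdir(parents=True)
    (root / "task-rest_channels.tsv").write_text("name\ttype\n")
    (eeg_dir / "sub-01_task-rest_channels.tsv").write_text("name\ttype\n")
    data_file = eeg_dir / "sub-01_task-rest_eeg.set"
    data_file.write_text("")
    return data_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data_file = build_demo_dataset(root)
        for path in find_inherited_files(data_file, root, "channels.tsv"):
            print(path.relative_to(root.resolve()))
```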

load_and_preprocess_raw(raw_file: str, preprocess: bool = False) -> ndarray

Load and optionally preprocess a raw data file.

This is a utility function for testing or debugging, not for general use.

Parameters:
  • raw_file (str) – Path to the raw EEGLAB file (.set).

  • preprocess (bool, default False) – If True, apply a high-pass filter, notch filter, and resample the data.

Returns:

The loaded and processed data as a NumPy array.

Return type:

numpy.ndarray

get_files() -> list[str]

Get all EEG recording file paths in the BIDS dataset.

Returns:

A list of file paths for all valid EEG recordings.

Return type:

list of str
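One plausible way to enumerate recordings is to search the tree for files carrying the BIDS _eeg suffix and keep only those whose extension is a key of RAW_EXTENSIONS. The sketch below does that with pathlib; list_eeg_recordings and PRIMARY_EXTENSIONS are illustrative names, and the actual filtering rules used by get_files may differ.

```python
from pathlib import Path
import tempfile

# Keys of RAW_EXTENSIONS as listed on this page; companion files
# such as .fdt or .vmrk are deliberately not treated as recordings.
PRIMARY_EXTENSIONS = (".set", ".edf", ".bdf", ".vhdr")


def list_eeg_recordings(root: Path) -> list[str]:
    """Return sorted paths of files named *_eeg.<ext> with a primary extension."""
    return sorted(
        str(p)
        for p in root.rglob("*_eeg.*")
        if p.is_file() and p.suffix in PRIMARY_EXTENSIONS
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        eeg_dir = Path(tmp) / "sub-01" / "eeg"
        eeg_dir.mkdir(parents=True)
        for name in ("sub-01_task-rest_eeg.set", "sub-01_task-rest_eeg.fdt"):
            (eeg_dir / name).write_text("")
        print(list_eeg_recordings(Path(tmp)))
```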

resolve_bids_json(json_files: list[str]) -> dict

Resolve BIDS JSON inheritance and merge files.

Parameters:

json_files (list of str) – A list of JSON file paths, ordered from the lowest (most specific) level of the BIDS hierarchy to the highest (most general).

Returns:

A dictionary containing the merged JSON data.

Return type:

dict
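The merge can be sketched in a few lines, assuming (as the BIDS inheritance principle prescribes) that a key defined in a more specific file overrides the same key from a more general one. merge_bids_json is a hypothetical name used only for this illustration and is not taken from eegdash.

```python
import json
from pathlib import Path
import tempfile


def merge_bids_json(json_files: list[str]) -> dict:
    """Merge JSON sidecars given in most-specific-first order."""
    merged: dict = {}
    # Apply the most general file first so that later, more specific
    # files overwrite any keys they share with it.
    for path in reversed(json_files):
        with open(path, encoding="utf-8") as handle:
            merged.update(json.load(handle))
    return merged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        specific = Path(tmp) / "sub-01_task-rest_eeg.json"
        general = Path(tmp) / "task-rest_eeg.json"
        specific.write_text(json.dumps({"SamplingFrequency": 500}))
        general.write_text(
            json.dumps({"SamplingFrequency": 250, "PowerLineFrequency": 60})
        )
        print(merge_bids_json([str(specific), str(general)]))
```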

get_bids_file_attribute(attribute: str, data_filepath: str) -> Any

Retrieve a specific attribute from BIDS metadata.

Parameters:
  • attribute (str) – The name of the attribute to retrieve (e.g., “sfreq”, “subject”).

  • data_filepath (str) – The path to the data file.

Returns:

The value of the requested attribute, or None if not found.

Return type:

Any

channel_labels(data_filepath: str) -> list[str]

Get a list of channel labels from channels.tsv.

Parameters:

data_filepath (str) – The path to the data file.

Returns:

A list of channel names.

Return type:

list of str

channel_types(data_filepath: str) -> list[str]

Get a list of channel types from channels.tsv.

Parameters:

data_filepath (str) – The path to the data file.

Returns:

A list of channel types.

Return type:

list of str

num_times(data_filepath: str) -> int

Get the number of time points in the recording.

Calculated as SamplingFrequency multiplied by RecordingDuration, both read from eeg.json.

Parameters:

data_filepath (str) – The path to the data file.

Returns:

The approximate number of time points.

Return type:

int
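The arithmetic is a single multiplication of the sampling rate (Hz) by the duration (seconds). The sketch below assumes the result is truncated to an integer, which is one reason the count is only approximate; the rounding actually used by num_times is not stated on this page, and approximate_num_times is a hypothetical helper.

```python
def approximate_num_times(eeg_json: dict) -> int:
    """Estimate the sample count from merged eeg.json metadata."""
    sampling_frequency = eeg_json["SamplingFrequency"]  # in Hz
    recording_duration = eeg_json["RecordingDuration"]  # in seconds
    # Truncation toward zero is an assumption made for this sketch.
    return int(sampling_frequency * recording_duration)


if __name__ == "__main__":
    print(approximate_num_times({"SamplingFrequency": 500, "RecordingDuration": 120.5}))
```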

subject_participant_tsv(data_filepath: str) -> dict[str, Any]

Get the participants.tsv record for a subject.

Parameters:

data_filepath (str) – The path to a data file belonging to the subject.

Returns:

A dictionary of the subject’s information from participants.tsv.

Return type:

dict
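A minimal sketch of this kind of lookup: extract the sub-<label> entity from the data file path and return the row of participants.tsv whose participant_id matches it. participant_record is a hypothetical helper operating on the file's text; how eegdash handles missing subjects or value types is not documented here, so the empty-dict fallback is an assumption.

```python
import csv
import io
import re


def participant_record(participants_tsv: str, data_filepath: str) -> dict[str, str]:
    """Return the participants.tsv row for the subject named in the path."""
    match = re.search(r"sub-[A-Za-z0-9]+", data_filepath)
    if match is None:
        return {}
    subject = match.group(0)
    reader = csv.DictReader(io.StringIO(participants_tsv), delimiter="\t")
    for row in reader:
        if row.get("participant_id") == subject:
            return dict(row)
    # Assumed fallback when the subject is not listed.
    return {}


if __name__ == "__main__":
    table = "participant_id\tage\tsex\nsub-01\t25\tF\nsub-02\t31\tM\n"
    print(participant_record(table, "ds/sub-02/eeg/sub-02_task-rest_eeg.set"))
```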

eeg_json(data_filepath: str) -> dict[str, Any]

Get the merged eeg.json metadata for a data file.

Parameters:

data_filepath (str) – The path to the data file.

Returns:

The merged eeg.json metadata.

Return type:

dict

channel_tsv(data_filepath: str) -> dict[str, Any]

Get the channels.tsv metadata as a dictionary.

Parameters:

data_filepath (str) – The path to the data file.

Returns:

The channels.tsv data, with columns as keys.

Return type:

dict
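To make "columns as keys" concrete, the sketch below turns the text of a channels.tsv file into a dictionary that maps each column name to the list of its values. read_tsv_columns is a hypothetical helper, and the exact value container returned by channel_tsv (lists, or some other mapping) is an assumption.

```python
import csv
import io


def read_tsv_columns(text: str) -> dict[str, list[str]]:
    """Parse tab-separated text into {column name: [values in row order]}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    columns: dict[str, list[str]] = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in columns:
            columns[name].append(row[name])
    return columns


if __name__ == "__main__":
    channels = "name\ttype\tunits\nFz\tEEG\tuV\nCz\tEEG\tuV\nEOG1\tEOG\tuV\n"
    table = read_tsv_columns(channels)
    print(table["name"])  # channel labels
    print(table["type"])  # channel types
```

With this layout, the name and type columns correspond to what channel_labels and channel_types are documented to return.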

class eegdash.data_utils.EEGDashBaseRaw(input_fname: str, metadata: dict[str, Any], preload: bool = False, *, cache_dir: str | None = None, bids_dependencies: list[str] = [], verbose: Any = None)

Bases: BaseRaw

MNE BaseRaw wrapper for automatic S3 data fetching.

This class extends mne.io.BaseRaw to automatically fetch data from an S3 bucket and cache it locally when data is first accessed. It is intended for internal use within the EEGDash ecosystem.

Parameters:
  • input_fname (str) – The path to the file on the S3 bucket (relative to the bucket root).

  • metadata (dict) – The metadata record for the recording, containing information like sampling frequency, channel names, etc.

  • preload (bool, default False) – If True, preload the data into memory.

  • cache_dir (str, optional) – Local directory for caching data. If None, a default directory is used.

  • bids_dependencies (list of str, default []) – A list of BIDS metadata files to download alongside the main recording.

  • verbose (str, int, or None, default None) – The MNE verbosity level.

See also

mne.io.BaseRaw

The base class for Raw objects in MNE.