eegdash.dataset.DS004657#

class eegdash.dataset.DS004657(cache_dir: str, query: dict | None = None, s3_bucket: str | None = None, **kwargs)[source]#

Bases: EEGDashDataset

OpenNeuro dataset ds004657.

Modality: Motor | Type: Decision-making | Subjects: 24

This dataset contains 24 subjects with 119 recordings across 1 task. Total duration: 27.205 hours. Dataset size: 43.06 GB.

dataset  | #Subj | #Chan | #Classes | Freq (Hz)  | Duration (H) | Size
ds004657 | 24    | 64    | 1        | 1024, 8192 | 27.205       | 43.06 GB

Short overview of dataset ds004657; more details are available in the NEMAR documentation.

This dataset class provides convenient access to the ds004657 dataset through the EEGDash interface. It inherits all functionality from EEGDashDataset with the dataset filter pre-configured.

Parameters:
  • cache_dir (str) – Directory to cache downloaded data.

  • query (dict, optional) – Additional MongoDB-style filters to AND with the dataset selection. Must not contain the key dataset.

  • s3_bucket (str, optional) – Base S3 bucket used to locate the data.

  • **kwargs – Additional arguments passed to the base dataset class.

See also

EEGDashDataset

Base dataset class with full API documentation

Notes

More details are available in the NEMAR documentation.

Examples

Basic usage:

>>> from eegdash.dataset import DS004657
>>> dataset = DS004657(cache_dir="./data")
>>> print(f"Number of recordings: {len(dataset)}")

Load a specific recording:

>>> if len(dataset) > 0:
...     recording = dataset[0]
...     raw = recording.load()
...     print(f"Sampling rate: {raw.info['sfreq']} Hz")
...     print(f"Number of channels: {len(raw.ch_names)}")

Filter by additional criteria:

>>> # Get subset with specific task or subject
>>> filtered_dataset = DS004657(
...     cache_dir="./data",
...     query={"task": "RestingState"}  # if applicable
... )


property cummulative_sizes#
static cumsum(sequence)[source]#
property description: DataFrame#
get_metadata() DataFrame[source]#

Concatenate the metadata and description of the wrapped Epochs.

Returns:

metadata – DataFrame containing as many rows as there are windows in the BaseConcatDataset, with the metadata and description information for each window.

Return type:

pd.DataFrame
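A minimal sketch of inspecting the combined metadata, assuming dataset is a windowed (Epochs-backed) concat, which is what get_metadata() expects:

>>> metadata = dataset.get_metadata()
>>> print(metadata.shape)  # one row per window
>>> print(metadata.columns.tolist())  # metadata plus description columns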

save(path: str, overwrite: bool = False, offset: int = 0)[source]#

Save datasets to files by creating one subdirectory for each dataset:

path/
    0/
        0-raw.fif | 0-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)
    1/
        1-raw.fif | 1-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)

Parameters:
  • path (str) – Directory in which subdirectories are created to store the -raw.fif | -epo.fif and .json files.

  • overwrite (bool) – Whether to delete old subdirectories that will be saved to in this call.

  • offset (int) – If provided, the integer is added to the id of the dataset in the concat. This is useful in the setting of very large datasets, where one dataset has to be processed and saved at a time to account for its original position.
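A minimal sketch of saving the wrapped datasets to disk; the output directory here is an assumed placeholder and is created beforehand so the per-dataset subdirectories can be written into it:

>>> import os
>>> os.makedirs("./saved_ds004657", exist_ok=True)  # hypothetical output directory
>>> dataset.save(path="./saved_ds004657", overwrite=True)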

set_description(description: dict | DataFrame, overwrite: bool = False)[source]#

Update (add or overwrite) the dataset description.

Parameters:
  • description (dict | pd.DataFrame) – Description in the form key: value where the length of the value has to match the number of datasets.

  • overwrite (bool) – Has to be True if a key in description already exists in the dataset description.
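A minimal sketch of attaching an extra description column; the column name and values are illustrative, and the list length matches the number of wrapped datasets as required:

>>> dataset.set_description(
...     {"site": ["labA"] * len(dataset.datasets)},  # hypothetical per-dataset label
...     overwrite=True,
... )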

split(by: str | list[int] | list[list[int]] | dict[str, list[int]] | None = None, property: str | None = None, split_ids: list[int] | list[list[int]] | dict[str, list[int]] | None = None) dict[str, BaseConcatDataset][source]#

Split the dataset based on information listed in its description.

The format could be based on a DataFrame or based on indices.

Parameters:
  • by (str | list | dict) – If by is a string, splitting is performed based on the description DataFrame column with this name. If by is a (list of) list of integers, the position in the first list corresponds to the split id and the integers to the datapoints of that split. If by is a dict, each key is used as a name in the returned splits dict and each value should be a list of int.

  • property (str) – Some property which is listed in the info DataFrame.

  • split_ids (list | dict) – List of indices to be combined in a subset. It can be a list of int or a list of list of int.

Returns:

splits – A dictionary with the name of the split (a string) as key and the dataset as value.

Return type:

dict
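A minimal sketch of both splitting modes; the "subject" column is an assumption and should be replaced by any column actually present in dataset.description:

>>> by_subject = dataset.split(by="subject")  # one split per unique subject value
>>> print(list(by_subject)[:3])
>>> custom = dataset.split(by={"first": [0], "rest": list(range(1, len(dataset.datasets)))})
>>> print(len(custom["first"].datasets), len(custom["rest"].datasets))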

property target_transform#
property transform#
datasets: list[Dataset[_T_co]]#
cumulative_sizes: list[int]#
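A minimal sketch of how the concat bookkeeping attributes relate, assuming a non-empty dataset:

>>> print(len(dataset.datasets))         # number of wrapped recordings
>>> print(dataset.cumulative_sizes[-1])  # equals len(dataset)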

Dataset Information#

  • Dataset ID: DS004657

  • Summary: Modality: Motor | Type: Decision-making

  • Number of Subjects: 24

  • Number of Recordings: 119

  • Number of Tasks: 1

  • Number of Channels: 64

  • Sampling Frequencies (Hz): 1024, 8192

  • Total Duration (hours): 27.205

  • Dataset Size: 43.06 GB

  • OpenNeuro: ds004657

  • NeMAR: ds004657


Usage Example#

from eegdash.dataset import DS004657

dataset = DS004657(cache_dir="./data")
print(f"Number of recordings: {len(dataset)}")

if len(dataset):
    recording = dataset[0]
    raw = recording.load()
    print(f"Sampling rate: {raw.info['sfreq']} Hz")
    print(f"Channels: {len(raw.ch_names)}")

See Also#