EEGDashDataset
- class eegdash.EEGDashDataset(cache_dir: str | Path, query: dict[str, Any] = None, description_fields: list[str] | None = None, s3_bucket: str | None = None, records: list[dict] | None = None, download: bool = True, n_jobs: int = -1, eeg_dash_instance: Any = None, **kwargs)
Bases: BaseConcatDataset
Create a new EEGDashDataset from a given query or local BIDS dataset directory and dataset name. An EEGDashDataset is a pooled collection of EEGDashBaseDataset instances (individual recordings) and is a subclass of braindecode’s BaseConcatDataset.
- Parameters:
cache_dir (str | Path) – Directory where data are cached locally.
query (dict | None) – Raw MongoDB query to filter records. If provided, it is merged with keyword filtering arguments (see **kwargs) using logical AND. You must provide at least a dataset (either in query or as a keyword argument). Only fields in ALLOWED_QUERY_FIELDS are considered for filtering.
description_fields (list[str]) – Fields to extract from each record and include in dataset descriptions (e.g., “subject”, “session”, “run”, “task”).
s3_bucket (str | None) – Optional S3 bucket URI (e.g., “s3://mybucket”) to use instead of the default OpenNeuro bucket when downloading data files.
records (list[dict] | None) – Pre-fetched metadata records. If provided, the dataset is constructed directly from these records and no MongoDB query is performed.
download (bool, default True) – If False, load from local BIDS files only. Local data are expected under cache_dir / dataset; no DB or S3 access is attempted.
n_jobs (int) – Number of parallel jobs to use where applicable (-1 uses all cores).
eeg_dash_instance (EEGDash | None) – Optional existing EEGDash client to reuse for DB queries. If None, a new client is created on demand. Not used when download is False.
**kwargs (dict) – Additional keyword arguments serving two purposes:
Filtering: any keys present in ALLOWED_QUERY_FIELDS are treated as query filters (e.g., dataset, subject, task, …).
Dataset options: remaining keys are forwarded to EEGDashBaseDataset.
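The way query and **kwargs combine can be illustrated with a minimal, standalone sketch. It is not eegdash's implementation: the field set ALLOWED_FIELDS, the helper build_query, the list-to-"$in" conversion, and the preload option name are all assumptions made for illustration. It only shows the documented behaviour of AND-merging filters, requiring a dataset, and passing leftover keys through as dataset options.

```python
from __future__ import annotations

from typing import Any

# Hypothetical subset of filterable fields; the real set is eegdash's
# ALLOWED_QUERY_FIELDS, whose exact contents are not reproduced here.
ALLOWED_FIELDS = {"dataset", "subject", "session", "run", "task"}


def build_query(
    query: dict[str, Any] | None = None, **kwargs: Any
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (merged_query, remaining_options) for the given arguments."""
    filters: dict[str, Any] = {}
    options: dict[str, Any] = {}
    for key, value in kwargs.items():
        if key in ALLOWED_FIELDS:
            # Assumption: a list of values is matched with MongoDB's "$in".
            if isinstance(value, (list, tuple, set)):
                filters[key] = {"$in": list(value)}
            else:
                filters[key] = value
        else:
            # Anything that is not a filter is treated as a dataset option.
            options[key] = value

    # A dataset must be named either in the raw query or as a keyword.
    if "dataset" not in filters and "dataset" not in (query or {}):
        raise ValueError("a 'dataset' filter is required")

    if query and filters:
        merged = {"$and": [query, filters]}  # logical AND of both sources
    else:
        merged = dict(query or filters)
    return merged, options


merged, options = build_query(
    {"task": "RestingState"},
    dataset="ds002718",
    subject=["012", "013"],
    preload=True,  # hypothetical option name, forwarded untouched
)
print(merged)
print(options)
```

Using an explicit "$and" keeps the raw query intact, so operators the caller wrote in query are never overwritten by a keyword filter on the same field.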
Examples
Basic usage with dataset and subject filtering:
>>> from eegdash import EEGDashDataset
>>> dataset = EEGDashDataset(
...     cache_dir="./data",
...     dataset="ds002718",
...     subject="012"
... )
>>> print(f"Number of recordings: {len(dataset)}")
Filter by multiple subjects and specific task:
>>> subjects = ["012", "013", "014"]
>>> dataset = EEGDashDataset(
...     cache_dir="./data",
...     dataset="ds002718",
...     subject=subjects,
...     task="RestingState"
... )
Load and inspect EEG data from recordings:
>>> if len(dataset) > 0:
...     recording = dataset[0]
...     raw = recording.load()
...     print(f"Sampling rate: {raw.info['sfreq']} Hz")
...     print(f"Number of channels: {len(raw.ch_names)}")
...     print(f"Duration: {raw.times[-1]:.1f} seconds")
Advanced filtering with raw MongoDB queries:
>>> from eegdash import EEGDashDataset
>>> query = {
...     "dataset": "ds002718",
...     "subject": {"$in": ["012", "013"]},
...     "task": "RestingState"
... }
>>> dataset = EEGDashDataset(cache_dir="./data", query=query)
Working with dataset collections and braindecode integration:
>>> # EEGDashDataset is a braindecode BaseConcatDataset
>>> for i, recording in enumerate(dataset):
...     if i >= 2:  # limit output
...         break
...     print(f"Recording {i}: {recording.description}")
...     raw = recording.load()
...     print(f" Channels: {len(raw.ch_names)}, Duration: {raw.times[-1]:.1f}s")
- save(path, overwrite=False)
Save the dataset to disk.
- Parameters:
path (str or Path) – Destination file path.
overwrite (bool, default False) – If True, overwrite existing file.
- Return type:
None
- property cummulative_sizes
- property description: DataFrame
- get_metadata() -> DataFrame
Concatenate the metadata and description of the wrapped Epochs.
- Returns:
metadata – DataFrame containing as many rows as there are windows in the BaseConcatDataset, with the metadata and description information for each window.
- Return type:
pd.DataFrame
- set_description(description: dict | DataFrame, overwrite: bool = False)
Update (add or overwrite) the dataset description.
- Parameters:
description (dict | pd.DataFrame) – Description in the form key: value where the length of the value has to match the number of datasets.
overwrite (bool) – Has to be True if a key in description already exists in the dataset description.
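The two constraints stated for set_description (one value per dataset, and overwrite=True for keys that already exist) can be sketched as a standalone check. The helper description_problems and the example column names below are hypothetical and written only to illustrate the rules. This is not braindecode's code.

```python
from __future__ import annotations


def description_problems(
    description: dict[str, list],
    n_datasets: int,
    existing_keys: set[str],
    overwrite: bool = False,
) -> list[str]:
    """Collect violations of the two documented set_description rules."""
    problems = []
    for key, values in description.items():
        # Rule 1: each value must supply one entry per wrapped dataset.
        if len(values) != n_datasets:
            problems.append(
                f"{key!r}: expected {n_datasets} values, got {len(values)}"
            )
        # Rule 2: replacing an existing key requires overwrite=True.
        if key in existing_keys and not overwrite:
            problems.append(f"{key!r} already exists; pass overwrite=True")
    return problems


existing = {"subject", "task"}
print(description_problems({"site": ["A", "B"]}, 2, existing))
print(description_problems({"subject": ["012"]}, 2, existing))
```

The first call reports nothing; the second reports both a length mismatch and an attempt to replace an existing key without overwrite.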
- split(by: str | list[int] | list[list[int]] | dict[str, list[int]] | None = None, property: str | None = None, split_ids: list[int] | list[list[int]] | dict[str, list[int]] | None = None) -> dict[str, BaseConcatDataset]
Split the dataset based on information listed in its description.
The format could be based on a DataFrame or based on indices.
- Parameters:
by (str | list | dict) – If by is a string, splitting is performed based on the description DataFrame column with this name. If by is a (list of) list of integers, the position in the first list corresponds to the split id and the integers to the datapoints of that split. If a dict, then each key will be used in the returned splits dict and each value should be a list of int.
property (str) – Some property which is listed in the info DataFrame.
split_ids (list | dict) – List of indices to be combined in a subset. It can be a list of int or a list of list of int.
- Returns:
splits – A dictionary with the name of the split (a string) as key and the dataset as value.
- Return type:
dict
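The three accepted forms of by can be illustrated with a small pure-Python sketch that returns recording indices instead of datasets. The helper split_indices and the list-of-dicts stand-in for the description DataFrame are assumptions for illustration only, and string keys are used because the documented return value names each split with a string. The real method returns BaseConcatDataset objects.

```python
from __future__ import annotations

from collections import defaultdict


def split_indices(
    descriptions: list[dict], by: str | list | dict
) -> dict[str, list[int]]:
    """Map split names to recording indices for each form of `by`."""
    if isinstance(by, str):
        # Group recordings by the value found in the named description column.
        groups: dict[str, list[int]] = defaultdict(list)
        for index, row in enumerate(descriptions):
            groups[str(row[by])].append(index)
        return dict(groups)
    if isinstance(by, dict):
        # Keys become split names; values are the indices of each split.
        return {str(name): list(indices) for name, indices in by.items()}
    # A flat list of ints is a single split; a list of lists is one split
    # per inner list, named after its position in the outer list.
    if by and isinstance(by[0], int):
        by = [by]
    return {str(pos): list(indices) for pos, indices in enumerate(by)}


rows = [{"subject": "012"}, {"subject": "013"}, {"subject": "012"}]
print(split_indices(rows, "subject"))
print(split_indices(rows, [[0, 1], [2]]))
print(split_indices(rows, {"train": [0, 1], "valid": [2]}))
```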
- property target_transform
- property transform
- datasets: list[Dataset[_T_co]]
- cumulative_sizes: list[int]
Usage Example
from eegdash import EEGDashDataset
dataset = EEGDashDataset(cache_dir="./data", dataset="ds002718")
print(f"Number of recordings: {len(dataset)}")
See Also
eegdash.dataset