eegdash.features.datasets module
- class eegdash.features.datasets.FeaturesConcatDataset(list_of_ds: list[FeaturesDataset] | None = None, target_transform: Callable | None = None)[source]
Bases: BaseConcatDataset
A concatenated dataset of FeaturesDataset objects.
This class holds a list of FeaturesDataset instances and allows them to be treated as a single, larger dataset. It provides methods for splitting, saving, and performing DataFrame-like operations (e.g., mean, var, fillna) across all contained datasets; a construction sketch follows the parameter list below.
- Parameters:
list_of_ds (list of FeaturesDataset) – A list of FeaturesDataset objects to concatenate.
target_transform (callable, optional) – A function to apply to the target values before they are returned.
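A minimal construction sketch, using synthetic feature values and the four metadata columns required by FeaturesDataset (documented below); the helper name make_ds and the feature columns are illustrative:

```python
import pandas as pd

from eegdash.features.datasets import FeaturesConcatDataset, FeaturesDataset


def make_ds(n: int, subject: str) -> FeaturesDataset:
    # Synthetic per-window feature values; one row per EEG window.
    features = pd.DataFrame({
        "alpha_power": [0.1 * i for i in range(n)],
        "beta_power": [0.2 * i for i in range(n)],
    })
    # FeaturesDataset requires these four metadata columns (see below).
    metadata = pd.DataFrame({
        "i_window_in_trial": list(range(n)),
        "i_start_in_trial": [i * 256 for i in range(n)],
        "i_stop_in_trial": [(i + 1) * 256 for i in range(n)],
        "target": [i % 2 for i in range(n)],
    })
    return FeaturesDataset(features, metadata=metadata,
                           description={"subject": subject})


concat_ds = FeaturesConcatDataset([make_ds(4, "S1"), make_ds(6, "S2")])
```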
- count(numeric_only: bool = False, n_jobs: int = 1) Series [source]
Count non-NA cells for each feature column.
- Parameters:
numeric_only (bool, default False) – Include only float, int, boolean columns.
n_jobs (int, default 1) – Number of jobs to run in parallel.
- Returns:
The count of non-NA cells for each column.
- Return type:
pandas.Series
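For example, continuing the concat_ds sketch above:

```python
non_na = concat_ds.count(numeric_only=True, n_jobs=2)
print(non_na)  # pandas.Series: one count per feature column
```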
- drop(*args, **kwargs) None [source]
Drop specified labels from rows or columns in-place. See pandas.DataFrame.drop().
- dropna(*args, **kwargs) None [source]
Remove missing values in-place. See pandas.DataFrame.dropna().
- fillna(*args, **kwargs) None [source]
Fill NA/NaN values in-place. See pandas.DataFrame.fillna().
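A short usage sketch for these in-place methods (drop, dropna, fillna), again on the hypothetical concat_ds from the construction example; the column name is illustrative:

```python
concat_ds.fillna(0.0)                   # fill NaNs in every contained dataset
concat_ds.drop(columns=["beta_power"])  # drop a feature column everywhere
```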
- get_metadata() DataFrame [source]
Get the metadata of all datasets as a single DataFrame.
Concatenates the metadata from all contained datasets and adds columns from their description attributes.
- Returns:
A DataFrame containing the metadata for every sample in the concatenated dataset.
- Return type:
pandas.DataFrame
- Raises:
TypeError – If any of the contained datasets is not a FeaturesDataset.
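For example:

```python
md = concat_ds.get_metadata()
# One row per sample; per-window metadata columns plus description
# entries such as "subject" from the construction sketch above.
print(md[["target", "subject"]].head())
```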
- interpolate(*args, **kwargs) None [source]
Interpolate values in-place. See pandas.DataFrame.interpolate().
- join(concat_dataset: FeaturesConcatDataset, **kwargs) None [source]
Join feature columns with another FeaturesConcatDataset in-place.
- Parameters:
concat_dataset (FeaturesConcatDataset) – The dataset to join with. Must have the same number of datasets, and each corresponding dataset must have the same length.
**kwargs – Keyword arguments to pass to pandas.DataFrame.join().
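A hedged sketch, assuming a hypothetical other_ds built from the same windows (same number of contained datasets, equal lengths) but carrying different feature columns:

```python
# other_ds: a FeaturesConcatDataset aligned with concat_ds (hypothetical).
concat_ds.join(other_ds, rsuffix="_v2")  # kwargs go to pandas.DataFrame.join
```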
- mean(numeric_only: bool = False, n_jobs: int = 1) Series [source]
Compute the mean for each feature column.
- Parameters:
numeric_only (bool, default False) – Include only float, int, boolean columns.
n_jobs (int, default 1) – Number of jobs to run in parallel.
- Returns:
The mean of each column.
- Return type:
pandas.Series
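For example, computing per-column means in parallel (std, var, and count follow the same calling pattern):

```python
feature_means = concat_ds.mean(numeric_only=True, n_jobs=4)
print(feature_means["alpha_power"])
```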
- replace(*args, **kwargs) None [source]
Replace values in-place. See pandas.DataFrame.replace().
- save(path: str, overwrite: bool = False, offset: int = 0) None [source]
Save the concatenated dataset to a directory.
Creates a directory structure where each contained dataset is saved in its own numbered subdirectory.
path/
    0/
        0-feat.parquet
        metadata_df.pkl
        description.json
        ...
    1/
        1-feat.parquet
        ...
- Parameters:
path (str) – The directory where the dataset will be saved.
overwrite (bool, default False) – If True, any existing subdirectories that conflict with the new ones will be removed.
offset (int, default 0) – An integer to add to the subdirectory names. Useful for saving datasets in chunks.
- Raises:
ValueError – If the dataset is empty.
FileExistsError – If a subdirectory already exists and overwrite is False.
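A saving sketch; the commented second call is hypothetical and only illustrates how offset avoids clashing subdirectory names when saving in chunks:

```python
import tempfile

out_dir = tempfile.mkdtemp()
concat_ds.save(out_dir, overwrite=True)
# Hypothetical: append a second chunk after the existing subdirectories.
# more_ds.save(out_dir, offset=len(concat_ds.datasets))
```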
- split(by: str | list[int] | list[list[int]] | dict[str, list[int]]) dict[str, FeaturesConcatDataset] [source]
Split the dataset into subsets.
The splitting can be done based on a column in the description DataFrame or by providing explicit indices for each split.
- Parameters:
by (str or list or dict) –
If a string, it is treated as a column name in the description DataFrame, and one split is created per unique value of that column.
If a list of integers, a single split is created containing the datasets at the specified indices.
If a list of lists of integers, multiple splits are created, one for each sublist of indices.
If a dictionary, keys are used as split names and values are lists of dataset indices.
- Returns:
A dictionary where keys are split names and values are the new FeaturesConcatDataset subsets.
- Return type:
dict[str, FeaturesConcatDataset]
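A split sketch, assuming each contained dataset's description carries a "subject" entry as in the construction example above:

```python
by_subject = concat_ds.split("subject")   # e.g. keys "S1" and "S2"
named = concat_ds.split({"train": [0], "test": [1]})
train_ds = named["train"]                 # a FeaturesConcatDataset
```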
- std(ddof: int = 1, numeric_only: bool = False, eps: float = 0, n_jobs: int = 1) Series [source]
Compute the standard deviation for each feature column.
- Parameters:
ddof (int, default 1) – Delta Degrees of Freedom.
numeric_only (bool, default False) – Include only float, int, boolean columns.
eps (float, default 0) – A small epsilon value to add to the variance before taking the square root to avoid numerical instability.
n_jobs (int, default 1) – Number of jobs to run in parallel.
- Returns:
The standard deviation of each column.
- Return type:
pandas.Series
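For example, a small eps guards the square root against tiny negative variances caused by floating-point error (var below takes the same arguments, minus eps):

```python
feature_stds = concat_ds.std(ddof=1, eps=1e-12, n_jobs=2)
```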
- to_dataframe(include_metadata: bool | str | List[str] = False, include_target: bool = False, include_crop_inds: bool = False) DataFrame [source]
Convert the dataset to a single pandas DataFrame.
- Parameters:
include_metadata (bool or str or list of str, default False) – If True, include all metadata columns. If a string or list of strings, include only the specified metadata columns.
include_target (bool, default False) – If True, include the ‘target’ column.
include_crop_inds (bool, default False) – If True, include window cropping index columns.
- Returns:
A DataFrame containing the features and requested metadata.
- Return type:
pandas.DataFrame
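For example, exporting features plus selected columns (assuming the "subject" description entry from the sketch above):

```python
df = concat_ds.to_dataframe(include_metadata=["subject"], include_target=True)
print(df.head())
```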
- var(ddof: int = 1, numeric_only: bool = False, n_jobs: int = 1) Series [source]
Compute the variance for each feature column.
- Parameters:
ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof.
numeric_only (bool, default False) – Include only float, int, boolean columns.
n_jobs (int, default 1) – Number of jobs to run in parallel.
- Returns:
The variance of each column.
- Return type:
pandas.Series
- zscore(ddof: int = 1, numeric_only: bool = False, eps: float = 0, n_jobs: int = 1) None [source]
Apply z-score normalization to numeric columns in-place.
- Parameters:
ddof (int, default 1) – Delta Degrees of Freedom for variance calculation.
numeric_only (bool, default False) – Include only float, int, boolean columns.
eps (float, default 0) – Epsilon for numerical stability.
n_jobs (int, default 1) – Number of jobs to run in parallel for statistics computation.
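A typical call before model fitting, standardizing every numeric feature column in-place:

```python
concat_ds.zscore(eps=1e-12, n_jobs=4)
```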
- class eegdash.features.datasets.FeaturesDataset(features: DataFrame, metadata: DataFrame | None = None, description: dict | Series | None = None, transform: Callable | None = None, raw_info: Dict | None = None, raw_preproc_kwargs: Dict | None = None, window_kwargs: Dict | None = None, window_preproc_kwargs: Dict | None = None, features_kwargs: Dict | None = None)[source]
Bases: EEGWindowsDataset
A dataset of features extracted from EEG windows.
This class holds features in a pandas DataFrame and provides an interface compatible with braindecode’s dataset structure. Each row in the feature DataFrame corresponds to a single sample (e.g., an EEG window).
- Parameters:
features (pandas.DataFrame) – A DataFrame where each row is a sample and each column is a feature.
metadata (pandas.DataFrame, optional) – A DataFrame containing metadata for each sample, indexed consistently with features. Must include columns ‘i_window_in_trial’, ‘i_start_in_trial’, ‘i_stop_in_trial’, and ‘target’.
description (dict or pandas.Series, optional) – Additional high-level information about the dataset (e.g., subject ID).
transform (callable, optional) – A function or transform to apply to the feature data on-the-fly.
raw_info (dict, optional) – Information about the original raw recording, for provenance.
raw_preproc_kwargs (dict, optional) – Keyword arguments used for preprocessing the raw data.
window_kwargs (dict, optional) – Keyword arguments used for windowing the data.
window_preproc_kwargs (dict, optional) – Keyword arguments used for preprocessing the windowed data.
features_kwargs (dict, optional) – Keyword arguments used for feature extraction.
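A stand-alone construction sketch with synthetic values; np.log1p is an arbitrary illustrative transform:

```python
import numpy as np
import pandas as pd

from eegdash.features.datasets import FeaturesDataset

features = pd.DataFrame({"alpha_power": [0.5, 0.7, 0.2]})
metadata = pd.DataFrame({
    "i_window_in_trial": [0, 1, 2],
    "i_start_in_trial": [0, 256, 512],
    "i_stop_in_trial": [256, 512, 768],
    "target": [0, 1, 0],
})
ds = FeaturesDataset(
    features,
    metadata=metadata,
    description={"subject": "S1"},
    transform=np.log1p,  # applied to the feature values on-the-fly
)
```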