eegdash.features.datasets module#

class eegdash.features.datasets.FeaturesConcatDataset(list_of_ds: list[FeaturesDataset] | None = None, target_transform: Callable | None = None)[source]

Bases: BaseConcatDataset

A concatenated dataset of FeaturesDataset objects.

This class holds a list of FeaturesDataset instances and allows them to be treated as a single, larger dataset. It provides methods for splitting, saving, and performing DataFrame-like operations (e.g., mean, var, fillna) across all contained datasets.

Parameters:
  • list_of_ds (list of FeaturesDataset) – A list of FeaturesDataset objects to concatenate.

  • target_transform (callable, optional) – A function to apply to the target values before they are returned.

count(numeric_only: bool = False, n_jobs: int = 1) Series[source]

Count non-NA cells for each feature column.

Parameters:
  • numeric_only (bool, default False) – Include only float, int, boolean columns.

  • n_jobs (int, default 1) – Number of jobs to run in parallel.

Returns:

The count of non-NA cells for each column.

Return type:

pandas.Series
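The per-column behavior mirrors pandas.DataFrame.count(), with non-NA counts summed across the contained datasets. A minimal pandas sketch of the underlying semantics (feature and column names here are illustrative, not part of the API):

```python
import numpy as np
import pandas as pd

# Two hypothetical per-dataset feature frames with some missing values.
feats_a = pd.DataFrame({"alpha_power": [1.0, np.nan], "label": ["a", "b"]})
feats_b = pd.DataFrame({"alpha_power": [2.0, 3.0], "label": [None, "c"]})

# Summing the per-dataset non-NA counts gives the concatenated count.
total = feats_a.count() + feats_b.count()

# numeric_only=True drops the non-numeric "label" column first.
numeric_total = (feats_a.count(numeric_only=True)
                 + feats_b.count(numeric_only=True))
```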

drop(*args, **kwargs) None[source]

Drop specified labels from rows or columns in-place. See pandas.DataFrame.drop().

dropna(*args, **kwargs) None[source]

Remove missing values in-place. See pandas.DataFrame.dropna().

fillna(*args, **kwargs) None[source]

Fill NA/NaN values in-place. See pandas.DataFrame.fillna().

get_metadata() DataFrame[source]

Get the metadata of all datasets as a single DataFrame.

Concatenates the metadata from all contained datasets and adds columns from their description attributes.

Returns:

A DataFrame containing the metadata for every sample in the concatenated dataset.

Return type:

pandas.DataFrame

Raises:

TypeError – If any of the contained datasets is not a FeaturesDataset.

interpolate(*args, **kwargs) None[source]

Interpolate values in-place. See pandas.DataFrame.interpolate().

join(concat_dataset: FeaturesConcatDataset, **kwargs) None[source]

Join columns with other FeaturesConcatDataset in-place.

Parameters:
  • concat_dataset (FeaturesConcatDataset) – The dataset to join with. Must have the same number of datasets, and each corresponding dataset must have the same length.

  • **kwargs – Keyword arguments to pass to pandas.DataFrame.join().
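Since the join is applied pairwise to each corresponding pair of feature DataFrames, its semantics follow pandas.DataFrame.join. A pandas-only sketch (feature names are illustrative):

```python
import pandas as pd

# Hypothetical feature frames produced by two pipelines over the same windows.
spectral = pd.DataFrame({"alpha_power": [0.5, 0.7]})
temporal = pd.DataFrame({"hjorth_mobility": [1.1, 0.9]})

# FeaturesConcatDataset.join applies a pandas join dataset-by-dataset,
# which is why the two collections must line up window-for-window.
joined = spectral.join(temporal)
```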

mean(numeric_only: bool = False, n_jobs: int = 1) Series[source]

Compute the mean for each feature column.

Parameters:
  • numeric_only (bool, default False) – Include only float, int, boolean columns.

  • n_jobs (int, default 1) – Number of jobs to run in parallel.

Returns:

The mean of each column.

Return type:

pandas.Series

replace(*args, **kwargs) None[source]

Replace values in-place. See pandas.DataFrame.replace().

save(path: str, overwrite: bool = False, offset: int = 0) None[source]

Save the concatenated dataset to a directory.

Creates a directory structure where each contained dataset is saved in its own numbered subdirectory.

path/
    0/
        0-feat.parquet
        metadata_df.pkl
        description.json
        ...
    1/
        1-feat.parquet
        ...
Parameters:
  • path (str) – The directory where the dataset will be saved.

  • overwrite (bool, default False) – If True, any existing subdirectories that conflict with the new ones will be removed.

  • offset (int, default 0) – An integer to add to the subdirectory names. Useful for saving datasets in chunks.

Raises:
  • ValueError – If the dataset is empty.

  • FileExistsError – If a subdirectory already exists and overwrite is False.
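The role of offset can be sketched by reproducing the documented layout with empty placeholder files: with offset=2, a two-dataset collection lands in subdirectories "2" and "3", so a second chunk can be saved next to an earlier one without name clashes. This is a sketch of the naming scheme only, not the actual serialization code:

```python
import tempfile
from pathlib import Path

# Recreate the documented on-disk layout with empty placeholder files.
root = Path(tempfile.mkdtemp())
offset = 2
for i in range(2):
    sub = root / str(i + offset)
    sub.mkdir()
    (sub / f"{i + offset}-feat.parquet").touch()
    (sub / "metadata_df.pkl").touch()
    (sub / "description.json").touch()

subdirs = sorted(p.name for p in root.iterdir())
```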

split(by: str | list[int] | list[list[int]] | dict[str, list[int]]) dict[str, FeaturesConcatDataset][source]

Split the dataset into subsets.

The splitting can be done based on a column in the description DataFrame or by providing explicit indices for each split.

Parameters:

by (str or list or dict) –

  • If a string, a split is created for each unique value in the description column named by by.

  • If a list of integers, a single split is created containing the datasets at the specified indices.

  • If a list of lists of integers, multiple splits are created, one for each sublist of indices.

  • If a dictionary, keys are used as split names and values are lists of dataset indices.

Returns:

A dictionary where keys are split names and values are the new FeaturesConcatDataset subsets.

Return type:

dict[str, FeaturesConcatDataset]
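The four forms of by can be sketched over a toy description table. The exact split-name convention for the list forms (stringified positions) is an assumption carried over from braindecode's BaseConcatDataset.split; check the source if the names matter:

```python
import pandas as pd

# Toy description table: one row per contained dataset.
description = pd.DataFrame({"subject": ["s1", "s1", "s2"]})

# by="subject": one split per unique value, keyed by that value.
by_column = {str(k): list(g.index) for k, g in description.groupby("subject")}

# by=[0, 2]: a single split holding the datasets at those indices.
by_list = {"0": [0, 2]}

# by=[[0, 1], [2]]: one split per sublist, named by position.
by_nested = {str(i): idx for i, idx in enumerate([[0, 1], [2]])}

# by={"train": [0, 1], "test": [2]}: names taken from the dict keys.
by_dict = {"train": [0, 1], "test": [2]}
```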

std(ddof: int = 1, numeric_only: bool = False, eps: float = 0, n_jobs: int = 1) Series[source]

Compute the standard deviation for each feature column.

Parameters:
  • ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof.

  • numeric_only (bool, default False) – Include only float, int, boolean columns.

  • eps (float, default 0) – A small epsilon value to add to the variance before taking the square root to avoid numerical instability.

  • n_jobs (int, default 1) – Number of jobs to run in parallel.

Returns:

The standard deviation of each column.

Return type:

pandas.Series
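The effect of ddof and eps can be sketched with plain pandas/numpy, which is presumably what the computation reduces to per column:

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0])

# ddof picks the divisor N - ddof: ddof=1 gives the sample estimate,
# ddof=0 the population estimate.
sample_var = x.var(ddof=1)      # sum of squared deviations / (4 - 1)
population_var = x.var(ddof=0)  # sum of squared deviations / 4

# eps is added under the square root, which keeps the result finite and
# nonzero when a feature column is (near-)constant.
eps = 1e-12
constant = pd.Series([5.0, 5.0, 5.0])
std_safe = np.sqrt(constant.var(ddof=0) + eps)
```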

to_dataframe(include_metadata: bool | str | List[str] = False, include_target: bool = False, include_crop_inds: bool = False) DataFrame[source]

Convert the dataset to a single pandas DataFrame.

Parameters:
  • include_metadata (bool or str or list of str, default False) – If True, include all metadata columns. If a string or list of strings, include only the specified metadata columns.

  • include_target (bool, default False) – If True, include the ‘target’ column.

  • include_crop_inds (bool, default False) – If True, include window cropping index columns.

Returns:

A DataFrame containing the features and requested metadata.

Return type:

pandas.DataFrame

var(ddof: int = 1, numeric_only: bool = False, n_jobs: int = 1) Series[source]

Compute the variance for each feature column.

Parameters:
  • ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof.

  • numeric_only (bool, default False) – Include only float, int, boolean columns.

  • n_jobs (int, default 1) – Number of jobs to run in parallel.

Returns:

The variance of each column.

Return type:

pandas.Series

zscore(ddof: int = 1, numeric_only: bool = False, eps: float = 0, n_jobs: int = 1) None[source]

Apply z-score normalization to numeric columns in-place.

Parameters:
  • ddof (int, default 1) – Delta Degrees of Freedom for variance calculation.

  • numeric_only (bool, default False) – Include only float, int, boolean columns.

  • eps (float, default 0) – Epsilon for numerical stability.

  • n_jobs (int, default 1) – Number of jobs to run in parallel for statistics computation.
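The transformation applied per numeric column is the usual (x - mean) / sqrt(var + eps), with the statistics computed across all contained datasets. A single-DataFrame sketch of that formula (not the in-place implementation itself):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"alpha_power": [1.0, 2.0, 3.0]})

# Z-score each numeric column: subtract the mean, divide by the
# eps-stabilized standard deviation.
eps = 0.0
mean = df.mean(numeric_only=True)
std = np.sqrt(df.var(ddof=1, numeric_only=True) + eps)
zscored = (df - mean) / std
```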

class eegdash.features.datasets.FeaturesDataset(features: DataFrame, metadata: DataFrame | None = None, description: dict | Series | None = None, transform: Callable | None = None, raw_info: Dict | None = None, raw_preproc_kwargs: Dict | None = None, window_kwargs: Dict | None = None, window_preproc_kwargs: Dict | None = None, features_kwargs: Dict | None = None)[source]

Bases: EEGWindowsDataset

A dataset of features extracted from EEG windows.

This class holds features in a pandas DataFrame and provides an interface compatible with braindecode’s dataset structure. Each row in the feature DataFrame corresponds to a single sample (e.g., an EEG window).

Parameters:
  • features (pandas.DataFrame) – A DataFrame where each row is a sample and each column is a feature.

  • metadata (pandas.DataFrame, optional) – A DataFrame containing metadata for each sample, indexed consistently with features. Must include columns ‘i_window_in_trial’, ‘i_start_in_trial’, ‘i_stop_in_trial’, and ‘target’.

  • description (dict or pandas.Series, optional) – Additional high-level information about the dataset (e.g., subject ID).

  • transform (callable, optional) – A function or transform to apply to the feature data on-the-fly.

  • raw_info (dict, optional) – Information about the original raw recording, for provenance.

  • raw_preproc_kwargs (dict, optional) – Keyword arguments used for preprocessing the raw data.

  • window_kwargs (dict, optional) – Keyword arguments used for windowing the data.

  • window_preproc_kwargs (dict, optional) – Keyword arguments used for preprocessing the windowed data.

  • features_kwargs (dict, optional) – Keyword arguments used for feature extraction.
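A sketch of the two DataFrames a FeaturesDataset expects, with the four required metadata columns; feature names and values are illustrative, and the construction call is shown commented out so the snippet runs without eegdash installed:

```python
import pandas as pd

# One row per EEG window; columns are extracted features (names illustrative).
features = pd.DataFrame({
    "alpha_power": [0.5, 0.7],
    "hjorth_mobility": [1.1, 0.9],
})

# Metadata must be indexed consistently with `features` and must include
# the four required columns.
metadata = pd.DataFrame({
    "i_window_in_trial": [0, 1],
    "i_start_in_trial": [0, 200],
    "i_stop_in_trial": [200, 400],
    "target": [0, 1],
})

# With eegdash installed, construction would then look like:
# from eegdash.features.datasets import FeaturesDataset
# ds = FeaturesDataset(features, metadata=metadata,
#                      description={"subject": "s1"})

required = {"i_window_in_trial", "i_start_in_trial", "i_stop_in_trial", "target"}
```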