eegdash.features.datasets module
- class eegdash.features.datasets.FeaturesConcatDataset(list_of_ds: list[FeaturesDataset] | None = None, target_transform: Callable | None = None)[source]
Bases: BaseConcatDataset
A concatenated dataset of FeaturesDataset objects.
This class holds a list of FeaturesDataset instances and allows them to be treated as a single, larger dataset. It provides methods for splitting, saving, and performing DataFrame-like operations (e.g., mean, var, fillna) across all contained datasets; a construction sketch follows the parameter list below.
- Parameters:
list_of_ds (list of FeaturesDataset) – A list of FeaturesDataset objects to concatenate.
target_transform (callable, optional) – A function to apply to the target values before they are returned.
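A minimal construction sketch, using synthetic feature values and the four metadata columns required by FeaturesDataset (documented below); the helper name make_ds and the feature columns are illustrative:

```python
import pandas as pd

from eegdash.features.datasets import FeaturesConcatDataset, FeaturesDataset


def make_ds(n: int, subject: str) -> FeaturesDataset:
    # Synthetic per-window feature values; one row per EEG window.
    features = pd.DataFrame({
        "alpha_power": [0.1 * i for i in range(n)],
        "beta_power": [0.2 * i for i in range(n)],
    })
    # FeaturesDataset requires these four metadata columns (see below).
    metadata = pd.DataFrame({
        "i_window_in_trial": list(range(n)),
        "i_start_in_trial": [i * 256 for i in range(n)],
        "i_stop_in_trial": [(i + 1) * 256 for i in range(n)],
        "target": [i % 2 for i in range(n)],
    })
    return FeaturesDataset(features, metadata=metadata,
                           description={"subject": subject})


concat_ds = FeaturesConcatDataset([make_ds(4, "S1"), make_ds(6, "S2")])
```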
- count(numeric_only: bool = False, n_jobs: int = 1) Series [source]
Count non-NA cells for each feature column.
- Parameters:
numeric_only (bool, default False) – Include only float, int, boolean columns.
n_jobs (int, default 1) – Number of jobs to run in parallel.
- Returns:
The count of non-NA cells for each column.
- Return type:
pandas.Series
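For example, continuing the concat_ds sketch above:

```python
non_na = concat_ds.count(numeric_only=True, n_jobs=2)
print(non_na)  # pandas.Series: one count per feature column
```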
- drop(*args, **kwargs) None [source]
Drop specified labels from rows or columns in-place. See pandas.DataFrame.drop().
- dropna(*args, **kwargs) None [source]
Remove missing values in-place. See pandas.DataFrame.dropna().
- fillna(*args, **kwargs) None [source]
Fill NA/NaN values in-place. See pandas.DataFrame.fillna().
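A short usage sketch for these in-place methods (drop, dropna, fillna), again on the hypothetical concat_ds from the construction example; the column name is illustrative:

```python
concat_ds.fillna(0.0)                   # fill NaNs in every contained dataset
concat_ds.drop(columns=["beta_power"])  # drop a feature column everywhere
```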
- get_metadata() DataFrame [source]
Get the metadata of all datasets as a single DataFrame.
Concatenates the metadata from all contained datasets and adds columns from their description attributes.
- Returns:
A DataFrame containing the metadata for every sample in the concatenated dataset.
- Return type:
pandas.DataFrame
- Raises:
TypeError – If any of the contained datasets is not a FeaturesDataset.
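For example:

```python
md = concat_ds.get_metadata()
# One row per sample; per-window metadata columns plus description
# entries such as "subject" from the construction sketch above.
print(md[["target", "subject"]].head())
```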
- interpolate(*args, **kwargs) None [source]
Interpolate values in-place. See pandas.DataFrame.interpolate().
- join(concat_dataset: FeaturesConcatDataset, **kwargs) None [source]
Join feature columns with another FeaturesConcatDataset in-place.
- Parameters:
concat_dataset (FeaturesConcatDataset) – The dataset to join with. Must have the same number of datasets, and each corresponding dataset must have the same length.
**kwargs – Keyword arguments to pass to pandas.DataFrame.join().
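A hedged sketch, assuming a hypothetical other_ds built from the same windows (same number of contained datasets, equal lengths) but carrying different feature columns:

```python
# other_ds: a FeaturesConcatDataset aligned with concat_ds (hypothetical).
concat_ds.join(other_ds, rsuffix="_v2")  # kwargs go to pandas.DataFrame.join
```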
- mean(numeric_only: bool = False, n_jobs: int = 1) Series [source]
Compute the mean for each feature column.
- Parameters:
numeric_only (bool, default False) – Include only float, int, boolean columns.
n_jobs (int, default 1) – Number of jobs to run in parallel.
- Returns:
The mean of each column.
- Return type:
pandas.Series
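For example, computing per-column means in parallel (std, var, and count follow the same calling pattern):

```python
feature_means = concat_ds.mean(numeric_only=True, n_jobs=4)
print(feature_means["alpha_power"])
```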
- replace(*args, **kwargs) None [source]
Replace values in-place. See pandas.DataFrame.replace().
- save(path: str, overwrite: bool = False, offset: int = 0) None [source]
Save the concatenated dataset to a directory.
Creates a directory structure where each contained dataset is saved in its own numbered subdirectory.
path/
    0/
        0-feat.parquet
        metadata_df.pkl
        description.json
        ...
    1/
        1-feat.parquet
        ...
- Parameters:
path (str) – The directory where the dataset will be saved.
overwrite (bool, default False) – If True, any existing subdirectories that conflict with the new ones will be removed.
offset (int, default 0) – An integer to add to the subdirectory names. Useful for saving datasets in chunks.
- Raises:
ValueError – If the dataset is empty.
FileExistsError – If a subdirectory already exists and overwrite is False.
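A saving sketch; the commented second call is hypothetical and only illustrates how offset avoids clashing subdirectory names when saving in chunks:

```python
import tempfile

out_dir = tempfile.mkdtemp()
concat_ds.save(out_dir, overwrite=True)
# Hypothetical: append a second chunk after the existing subdirectories.
# more_ds.save(out_dir, offset=len(concat_ds.datasets))
```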
- split(by: str | list[int] | list[list[int]] | dict[str, list[int]]) dict[str, FeaturesConcatDataset] [source]
Split the dataset into subsets.
The splitting can be done based on a column in the description DataFrame or by providing explicit indices for each split.
- Parameters:
by (str or list or dict) –
If a string, it is treated as a column name in the description DataFrame, and one split is created per unique value of that column.
If a list of integers, a single split is created containing the datasets at the specified indices.
If a list of lists of integers, multiple splits are created, one for each sublist of indices.
If a dictionary, keys are used as split names and values are lists of dataset indices.
- Returns:
A dictionary where keys are split names and values are the new FeaturesConcatDataset subsets.
- Return type:
dict[str, FeaturesConcatDataset]
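A split sketch, assuming each contained dataset's description carries a "subject" entry as in the construction example above:

```python
by_subject = concat_ds.split("subject")   # e.g. keys "S1" and "S2"
named = concat_ds.split({"train": [0], "test": [1]})
train_ds = named["train"]                 # a FeaturesConcatDataset
```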
- std(ddof: int = 1, numeric_only: bool = False, eps: float = 0, n_jobs: int = 1) Series [source]
Compute the standard deviation for each feature column.
- Parameters:
ddof (int, default 1) – Delta Degrees of Freedom.
numeric_only (bool, default False) – Include only float, int, boolean columns.
eps (float, default 0) – A small epsilon value to add to the variance before taking the square root to avoid numerical instability.
n_jobs (int, default 1) – Number of jobs to run in parallel.
- Returns:
The standard deviation of each column.
- Return type:
pandas.Series
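For example, a small eps guards the square root against tiny negative variances caused by floating-point error (var below takes the same arguments, minus eps):

```python
feature_stds = concat_ds.std(ddof=1, eps=1e-12, n_jobs=2)
```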
- to_dataframe(include_metadata: bool | str | List[str] = False, include_target: bool = False, include_crop_inds: bool = False) DataFrame [source]
Convert the dataset to a single pandas DataFrame.
- Parameters:
include_metadata (bool or str or list of str, default False) – If True, include all metadata columns. If a string or list of strings, include only the specified metadata columns.
include_target (bool, default False) – If True, include the ‘target’ column.
include_crop_inds (bool, default False) – If True, include window cropping index columns.
- Returns:
A DataFrame containing the features and requested metadata.
- Return type:
pandas.DataFrame
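For example, exporting features plus selected columns (assuming the "subject" description entry from the sketch above):

```python
df = concat_ds.to_dataframe(include_metadata=["subject"], include_target=True)
print(df.head())
```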
- var(ddof: int = 1, numeric_only: bool = False, n_jobs: int = 1) Series [source]
Compute the variance for each feature column.
- Parameters:
ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof.
numeric_only (bool, default False) – Include only float, int, boolean columns.
n_jobs (int, default 1) – Number of jobs to run in parallel.
- Returns:
The variance of each column.
- Return type:
pandas.Series
- zscore(ddof: int = 1, numeric_only: bool = False, eps: float = 0, n_jobs: int = 1) None [source]
Apply z-score normalization to numeric columns in-place.
- Parameters:
ddof (int, default 1) – Delta Degrees of Freedom for variance calculation.
numeric_only (bool, default False) – Include only float, int, boolean columns.
eps (float, default 0) – Epsilon for numerical stability.
n_jobs (int, default 1) – Number of jobs to run in parallel for statistics computation.
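A typical call before model fitting, standardizing every numeric feature column in-place:

```python
concat_ds.zscore(eps=1e-12, n_jobs=4)
```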
- class eegdash.features.datasets.FeaturesDataset(features: DataFrame, metadata: DataFrame | None = None, description: dict | Series | None = None, transform: Callable | None = None, raw_info: Dict | None = None, raw_preproc_kwargs: Dict | None = None, window_kwargs: Dict | None = None, window_preproc_kwargs: Dict | None = None, features_kwargs: Dict | None = None)[source]
Bases: EEGWindowsDataset
A dataset of features extracted from EEG windows.
This class holds features in a pandas DataFrame and provides an interface compatible with braindecode’s dataset structure. Each row in the feature DataFrame corresponds to a single sample (e.g., an EEG window).
- Parameters:
features (pandas.DataFrame) – A DataFrame where each row is a sample and each column is a feature.
metadata (pandas.DataFrame, optional) – A DataFrame containing metadata for each sample, indexed consistently with features. Must include columns ‘i_window_in_trial’, ‘i_start_in_trial’, ‘i_stop_in_trial’, and ‘target’.
description (dict or pandas.Series, optional) – Additional high-level information about the dataset (e.g., subject ID).
transform (callable, optional) – A function or transform to apply to the feature data on-the-fly.
raw_info (dict, optional) – Information about the original raw recording, for provenance.
raw_preproc_kwargs (dict, optional) – Keyword arguments used for preprocessing the raw data.
window_kwargs (dict, optional) – Keyword arguments used for windowing the data.
window_preproc_kwargs (dict, optional) – Keyword arguments used for preprocessing the windowed data.
features_kwargs (dict, optional) – Keyword arguments used for feature extraction.
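A stand-alone construction sketch with synthetic values; np.log1p is an arbitrary illustrative transform:

```python
import numpy as np
import pandas as pd

from eegdash.features.datasets import FeaturesDataset

features = pd.DataFrame({"alpha_power": [0.5, 0.7, 0.2]})
metadata = pd.DataFrame({
    "i_window_in_trial": [0, 1, 2],
    "i_start_in_trial": [0, 256, 512],
    "i_stop_in_trial": [256, 512, 768],
    "target": [0, 1, 0],
})
ds = FeaturesDataset(
    features,
    metadata=metadata,
    description={"subject": "S1"},
    transform=np.log1p,  # applied to the feature values on-the-fly
)
```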