eegdash.features
Functions
- bivariate_feature – Decorator to mark a feature as bivariate.
- get_all_feature_extractors – Get a list of all available FeatureExtractor classes.
- get_all_feature_kinds – Get a list of all available feature 'kind' classes.
- get_all_features – Get a list of all available feature functions.
- get_feature_kind – Get the 'kind' of a feature function.
- get_feature_predecessors – Get the dependency hierarchy for a feature or feature extractor.
- load_features_concat_dataset – Load a stored FeaturesConcatDataset from a directory.
- extract_features – Extract features from a concatenated dataset of windows.
- fit_feature_extractors – Fit trainable feature extractors on a dataset.
Classes
- FeaturesConcatDataset – A concatenated dataset of FeaturesDataset objects.
- FeaturesDataset – A dataset of features extracted from EEG windows.
- FeatureKind – A decorator to specify the kind of a feature.
- FeaturePredecessor – A decorator to specify parent extractors for a feature function.
- BivariateFeature – A feature kind for operations on pairs of channels.
- DirectedBivariateFeature – A feature kind for directed operations on pairs of channels.
- FeatureExtractor – A composite feature extractor that applies multiple feature functions.
- MultivariateFeature – A mixin for features that operate on multiple channels.
- TrainableFeature – Abstract base class for features that require training.
- UnivariateFeature – A feature kind for operations applied to each channel independently.
- class eegdash.features.FeaturesConcatDataset(list_of_ds: list[FeaturesDataset] | None = None, target_transform: Callable | None = None)
Bases: BaseConcatDataset
A concatenated dataset of FeaturesDataset objects.
This class holds a list of FeaturesDataset instances and treats them as a single, larger dataset. It provides methods for splitting, saving, and performing DataFrame-like operations (e.g., mean, var, fillna) across all contained datasets.
- Parameters:
list_of_ds (list of FeaturesDataset) – A list of FeaturesDataset objects to concatenate.
target_transform (callable, optional) – A function to apply to the target values before they are returned.
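BaseConcatDataset comes from braindecode, which, as far as we know, builds on PyTorch's ConcatDataset. That class stores cumulative dataset lengths and bisects them to find which member holds a given sample. The sketch below shows that lookup under this assumption. `locate` is a hypothetical helper, not part of eegdash.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(global_idx: int, lengths: list[int]) -> tuple[int, int]:
    """Map a global sample index to (dataset index, index within that dataset).

    Hypothetical helper: assumes members are indexed back to back,
    as in torch.utils.data.ConcatDataset.
    """
    cumulative = list(accumulate(lengths))  # lengths [3, 2, 4] -> [3, 5, 9]
    total = cumulative[-1] if cumulative else 0
    if global_idx < 0:  # support negative indexing like a Python list
        global_idx += total
    if not 0 <= global_idx < total:
        raise IndexError("index out of range")
    ds_idx = bisect_right(cumulative, global_idx)
    start = cumulative[ds_idx - 1] if ds_idx > 0 else 0
    return ds_idx, global_idx - start
```

For three datasets of lengths 3, 2 and 4, sample 5 is the first window of the third dataset, so `locate(5, [3, 2, 4])` gives `(2, 0)`.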
- count(numeric_only: bool = False, n_jobs: int = 1) → Series
Count non-NA cells for each feature column.
- Parameters:
numeric_only (bool, default False) – Include only float, int, boolean columns.
n_jobs (int, default 1) – Number of jobs to run in parallel.
- Returns:
The count of non-NA cells for each column.
- Return type:
pandas.Series
- drop(*args, **kwargs) → None
Drop specified labels from rows or columns in-place. See pandas.DataFrame.drop().
- dropna(*args, **kwargs) → None
Remove missing values in-place. See pandas.DataFrame.dropna().
- get_metadata() → DataFrame
Get the metadata of all datasets as a single DataFrame.
Concatenates the metadata from all contained datasets and adds columns from their description attributes.
- Returns:
A DataFrame containing the metadata for every sample in the concatenated dataset.
- Return type:
pandas.DataFrame
- Raises:
TypeError – If any of the contained datasets is not a FeaturesDataset.
- interpolate(*args, **kwargs) → None
Interpolate values in-place. See pandas.DataFrame.interpolate().
- join(concat_dataset: FeaturesConcatDataset, **kwargs) → None
Join columns with another FeaturesConcatDataset in-place.
- Parameters:
concat_dataset (FeaturesConcatDataset) – The dataset to join with. Must have the same number of datasets, and each corresponding dataset must have the same length.
**kwargs – Keyword arguments to pass to pandas.DataFrame.join().
- mean(numeric_only: bool = False, n_jobs: int = 1) → Series
Compute the mean for each feature column.
- Parameters:
numeric_only (bool, default False) – Include only float, int, boolean columns.
n_jobs (int, default 1) – Number of jobs to run in parallel.
- Returns:
The mean of each column.
- Return type:
pandas.Series
- save(path: str, overwrite: bool = False, offset: int = 0) → None
Save the concatenated dataset to a directory.
Creates a directory structure where each contained dataset is saved in its own numbered subdirectory.
path/
    0/
        0-feat.parquet
        metadata_df.pkl
        description.json
        ...
    1/
        1-feat.parquet
        ...
- Parameters:
path (str) – The directory where the dataset will be saved.
overwrite (bool, default False) – If True, any existing subdirectories that conflict with the new ones will be removed.
offset (int, default 0) – An integer to add to the subdirectory names. Useful for saving datasets in chunks.
- Raises:
ValueError – If the dataset is empty.
FileExistsError – If a subdirectory already exists and overwrite is False.
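The directory rules above (one subdirectory per dataset named by its index plus `offset`, an error on conflicts unless `overwrite` is set, and an error for an empty dataset) can be sketched with pathlib. `prepare_save_dirs` is hypothetical. It only creates empty directories and writes none of the parquet, pickle or JSON files. Checking every conflict before creating anything is a design choice of this sketch, not documented eegdash behavior.

```python
import shutil
from pathlib import Path


def prepare_save_dirs(path, n_datasets, overwrite=False, offset=0):
    """Create path/<i + offset>/ for each of n_datasets datasets (hypothetical)."""
    if n_datasets == 0:
        raise ValueError("cannot save an empty dataset")
    root = Path(path)
    root.mkdir(parents=True, exist_ok=True)
    targets = [root / str(i + offset) for i in range(n_datasets)]
    conflicts = [t for t in targets if t.exists()]
    if conflicts and not overwrite:
        raise FileExistsError(f"{conflicts[0]} already exists; pass overwrite=True")
    for t in conflicts:
        shutil.rmtree(t)  # remove conflicting subdirectories before rewriting them
    for t in targets:
        t.mkdir()
    return targets
```

When saving in chunks, passing the number of datasets already written as `offset` keeps subdirectory names from colliding.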
- split(by: str | list[int] | list[list[int]] | dict[str, list[int]]) → dict[str, FeaturesConcatDataset]
Split the dataset into subsets.
The splitting can be done based on a column in the description DataFrame or by providing explicit indices for each split.
- Parameters:
by (str or list or dict) –
If a string, it names a column of the description DataFrame, and one split is created for each unique value in that column.
If a list of integers, a single split is created containing the datasets at the specified indices.
If a list of lists of integers, multiple splits are created, one for each sublist of indices.
If a dictionary, keys are used as split names and values are lists of dataset indices.
- Returns:
A dictionary where keys are split names and values are the new FeaturesConcatDataset subsets.
- Return type:
dict[str, FeaturesConcatDataset]
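The four accepted forms of `by` can be illustrated by computing only the dataset indices for each split. In this sketch a list of dicts stands in for the description DataFrame. The split-name convention (stringified column values, or list positions for list input) is an assumption modeled on braindecode's `BaseConcatDataset.split` and should be checked against the installed version. `split_indices` is hypothetical.

```python
from collections import defaultdict


def split_indices(description, by):
    """Return {split_name: [dataset indices]} for the four documented forms of `by`.

    description: one dict per dataset, a stand-in for the description DataFrame.
    """
    if isinstance(by, str):  # group datasets by a description column
        groups = defaultdict(list)
        for i, row in enumerate(description):
            groups[str(row[by])].append(i)
        return dict(groups)
    if isinstance(by, dict):  # explicit split names
        return {str(name): list(ids) for name, ids in by.items()}
    if by and all(isinstance(i, int) for i in by):
        by = [by]  # a flat list of ints becomes a single split
    return {str(k): list(ids) for k, ids in enumerate(by)}
```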
- std(ddof: int = 1, numeric_only: bool = False, eps: float = 0, n_jobs: int = 1) → Series
Compute the standard deviation for each feature column.
- Parameters:
ddof (int, default 1) – Delta Degrees of Freedom.
numeric_only (bool, default False) – Include only float, int, boolean columns.
eps (float, default 0) – A small epsilon value to add to the variance before taking the square root to avoid numerical instability.
n_jobs (int, default 1) – Number of jobs to run in parallel.
- Returns:
The standard deviation of each column.
- Return type:
pandas.Series
- to_dataframe(include_metadata: bool | str | List[str] = False, include_target: bool = False, include_crop_inds: bool = False) → DataFrame
Convert the dataset to a single pandas DataFrame.
- Parameters:
include_metadata (bool or str or list of str, default False) – If True, include all metadata columns. If a string or list of strings, include only the specified metadata columns.
include_target (bool, default False) – If True, include the ‘target’ column.
include_crop_inds (bool, default False) – If True, include window cropping index columns.
- Returns:
A DataFrame containing the features and requested metadata.
- Return type:
pandas.DataFrame
- var(ddof: int = 1, numeric_only: bool = False, n_jobs: int = 1) → Series
Compute the variance for each feature column.
- Parameters:
ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof.
numeric_only (bool, default False) – Include only float, int, boolean columns.
n_jobs (int, default 1) – Number of jobs to run in parallel.
- Returns:
The variance of each column.
- Return type:
pandas.Series
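Because these statistics accept `n_jobs`, one plausible design computes a per-dataset summary in parallel and then merges the summaries. The sketch below uses the standard parallel-variance merge formula on (count, mean, sum of squared deviations) triples for a single column. It also shows `eps` being added to the variance before the square root, as documented for std(). Whether eegdash merges statistics exactly this way is an assumption, and all function names are hypothetical.

```python
import math


def summarize(values):
    """Per-dataset summary (n, mean, m2), where m2 is the sum of squared deviations."""
    n = len(values)
    mean = sum(values) / n if n else 0.0
    m2 = sum((v - mean) ** 2 for v in values)
    return n, mean, m2


def combine(parts):
    """Merge (n, mean, m2) summaries with the parallel-variance update."""
    n_tot, mean_tot, m2_tot = 0, 0.0, 0.0
    for n, mean, m2 in parts:
        if n == 0:
            continue
        delta = mean - mean_tot
        new_n = n_tot + n
        mean_tot += delta * n / new_n
        m2_tot += m2 + delta ** 2 * n_tot * n / new_n
        n_tot = new_n
    return n_tot, mean_tot, m2_tot


def pooled_var(parts, ddof=1):
    n, _, m2 = combine(parts)
    return m2 / (n - ddof) if n > ddof else math.nan  # divisor is N - ddof


def pooled_std(parts, ddof=1, eps=0.0):
    return math.sqrt(pooled_var(parts, ddof) + eps)  # eps added before the root
```

For the datasets `[1, 2, 3]` and `[10, 20]`, the merged mean is 7.2 and the sample variance (ddof=1) is 63.7, the same values obtained by pooling all five numbers directly.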
- zscore(ddof: int = 1, numeric_only: bool = False, eps: float = 0, n_jobs: int = 1) → None
Apply z-score normalization to numeric columns in-place.
- Parameters:
ddof (int, default 1) – Delta Degrees of Freedom for variance calculation.
numeric_only (bool, default False) – Include only float, int, boolean columns.
eps (float, default 0) – Epsilon for numerical stability.
n_jobs (int, default 1) – Number of jobs to run in parallel for statistics computation.
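For a single column, z-scoring subtracts the mean and divides by the standard deviation. This sketch assumes `eps` plays the same role here as in std(): it is added to the variance before the square root. Under that assumption a constant column maps to zeros instead of raising a division error. `zscore_column` is hypothetical and returns a new list rather than modifying data in place.

```python
import math


def zscore_column(values, ddof=1, eps=0.0):
    """Return (v - mean) / sqrt(var + eps) for one column of values."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - ddof)
    scale = math.sqrt(var + eps)  # assumption: eps added to the variance, as in std()
    return [(v - mean) / scale for v in values]
```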
- class eegdash.features.FeaturesDataset(features: DataFrame, metadata: DataFrame | None = None, description: dict | Series | None = None, transform: Callable | None = None, raw_info: Dict | None = None, raw_preproc_kwargs: Dict | None = None, window_kwargs: Dict | None = None, window_preproc_kwargs: Dict | None = None, features_kwargs: Dict | None = None)
Bases: EEGWindowsDataset
A dataset of features extracted from EEG windows.
This class holds features in a pandas DataFrame and provides an interface compatible with braindecode’s dataset structure. Each row in the feature DataFrame corresponds to a single sample (e.g., an EEG window).
- Parameters:
features (pandas.DataFrame) – A DataFrame where each row is a sample and each column is a feature.
metadata (pandas.DataFrame, optional) – A DataFrame containing metadata for each sample, indexed consistently with features. Must include columns ‘i_window_in_trial’, ‘i_start_in_trial’, ‘i_stop_in_trial’, and ‘target’.
description (dict or pandas.Series, optional) – Additional high-level information about the dataset (e.g., subject ID).
transform (callable, optional) – A function or transform to apply to the feature data on-the-fly.
raw_info (dict, optional) – Information about the original raw recording, for provenance.
raw_preproc_kwargs (dict, optional) – Keyword arguments used for preprocessing the raw data.
window_kwargs (dict, optional) – Keyword arguments used for windowing the data.
window_preproc_kwargs (dict, optional) – Keyword arguments used for preprocessing the windowed data.
features_kwargs (dict, optional) – Keyword arguments used for feature extraction.
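FeaturesDataset subclasses braindecode's EEGWindowsDataset, whose items are, to our understanding, `(X, y, crop_inds)` triples. The crop indices are built from the `i_window_in_trial`, `i_start_in_trial` and `i_stop_in_trial` metadata columns listed above. The toy stand-in below follows that assumption and uses plain lists instead of DataFrames. `ToyFeaturesDataset` is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyFeaturesDataset:
    """Hypothetical stand-in: one feature row and one metadata dict per window."""

    features: list  # one list of feature values per window
    metadata: list  # one dict per window with the target and crop indices
    transform: Optional[Callable] = None

    def __len__(self):
        return len(self.features)

    def __getitem__(self, i):
        x = self.features[i]
        if self.transform is not None:
            x = self.transform(x)  # applied on the fly, as described for `transform`
        meta = self.metadata[i]
        crop_inds = [meta["i_window_in_trial"], meta["i_start_in_trial"], meta["i_stop_in_trial"]]
        return x, meta["target"], crop_inds
```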
- class eegdash.features.FeatureKind(feature_kind: MultivariateFeature)
Bases: object
A decorator to specify the kind of a feature.
This decorator attaches a “feature kind” (e.g., univariate, bivariate) to a feature extraction function.
- Parameters:
feature_kind (MultivariateFeature) – An instance of a feature kind class, such as UnivariateFeature or BivariateFeature.
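get_all_features finds feature functions by their `feature_kind` attribute, so a decorator like this one can simply attach that attribute and return the function unchanged. The sketch below is a class-based decorator under that assumption, paired with a reader in the spirit of get_feature_kind. `FeatureKindSketch`, `UnivariateKind`, `get_kind` and `signal_mean` are hypothetical names, not the eegdash implementation.

```python
class UnivariateKind:
    """Hypothetical stand-in for a feature kind such as UnivariateFeature."""


class FeatureKindSketch:
    """Class-based decorator that tags a function with a feature kind."""

    def __init__(self, feature_kind):
        self.feature_kind = feature_kind

    def __call__(self, func):
        func.feature_kind = self.feature_kind  # attach the kind as a plain attribute
        return func


def get_kind(func):
    """Read back the kind attached by the decorator."""
    return func.feature_kind


@FeatureKindSketch(UnivariateKind())
def signal_mean(x):
    return sum(x) / len(x)
```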
- class eegdash.features.FeaturePredecessor(*parent_extractor_type: List[Type])
Bases: object
A decorator to specify parent extractors for a feature function.
This decorator attaches a list of parent extractor types to a feature extraction function. This information can be used to build a dependency graph of features.
- Parameters:
*parent_extractor_type (list of Type) – A list of feature extractor classes (subclasses of FeatureExtractor) that this feature depends on.
- eegdash.features.bivariate_feature(func: Callable, directed: bool = False) → Callable
Decorator to mark a feature as bivariate.
This decorator specifies that the feature operates on pairs of channels.
- Parameters:
func (callable) – The feature extraction function to decorate.
directed (bool, default False) – If True, the feature is directed (e.g., connectivity from channel A to B is different from B to A). If False, the feature is undirected.
- Returns:
The decorated function with the appropriate bivariate feature kind attached.
- Return type:
callable
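In the documented signature `directed` comes after `func`. When decorating, one way to pass `directed=True` is therefore `functools.partial`. Whether eegdash also accepts a bare `@bivariate_feature(directed=True)` form is not stated here. The sketch chooses between an undirected and a directed kind and attaches it as an attribute. All class and function names are hypothetical stand-ins.

```python
from functools import partial


class BivariateKind:
    """Hypothetical stand-in for BivariateFeature."""


class DirectedBivariateKind(BivariateKind):
    """Hypothetical stand-in for DirectedBivariateFeature."""


def bivariate_feature_sketch(func, directed=False):
    """Attach an undirected or directed bivariate kind to `func`."""
    func.feature_kind = DirectedBivariateKind() if directed else BivariateKind()
    return func


@bivariate_feature_sketch
def correlation(x, y):
    return 0.0  # placeholder body


@partial(bivariate_feature_sketch, directed=True)
def transfer(x, y):
    return 0.0  # placeholder body for a direction-dependent measure
```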
- class eegdash.features.BivariateFeature(*args, channel_pair_format: str = '{}<>{}')
Bases: MultivariateFeature
A feature kind for operations on pairs of channels.
- Parameters:
channel_pair_format (str, default="{}<>{}") – A format string used to create feature names from pairs of channel names.
- class eegdash.features.DirectedBivariateFeature(*args, channel_pair_format: str = '{}<>{}')
Bases: BivariateFeature
A feature kind for directed operations on pairs of channels.
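Feature names for channel pairs come from `channel_pair_format`. A multivariate kind then turns the flat array of values into a dictionary keyed by those names. The sketch assumes undirected features visit each unordered pair once (itertools.combinations) and directed features visit both orderings (itertools.permutations). Which pairs eegdash actually enumerates is not documented here. Both helpers are hypothetical.

```python
from itertools import combinations, permutations


def pair_feature_names(channels, directed=False, channel_pair_format="{}<>{}"):
    """Build one feature name per channel pair (assumed pairing scheme)."""
    pairs = permutations(channels, 2) if directed else combinations(channels, 2)
    return [channel_pair_format.format(a, b) for a, b in pairs]


def name_values(values, names):
    """Turn a flat array of per-pair (or per-channel) values into a named dict."""
    if len(values) != len(names):
        raise ValueError("one value per name is required")
    return dict(zip(names, values))
```

With channels `["Fz", "Cz", "Pz"]`, the undirected names are `Fz<>Cz`, `Fz<>Pz` and `Cz<>Pz`. The directed variant yields six names, because `Fz<>Cz` and `Cz<>Fz` are kept apart.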
- class eegdash.features.FeatureExtractor(feature_extractors: Dict[str, Callable], **preprocess_kwargs: Dict)
Bases: TrainableFeature
A composite feature extractor that applies multiple feature functions.
This class orchestrates the application of a dictionary of feature extraction functions to input data. It can handle nested extractors, pre-processing, and trainable features.
- Parameters:
feature_extractors (dict[str, callable]) – A dictionary where keys are feature names and values are the feature extraction functions or other FeatureExtractor instances.
**preprocess_kwargs – Keyword arguments to be passed to the preprocess method.
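A composite extractor can be pictured as follows: preprocess the input once with the stored keyword arguments, then apply each named feature function to the result. Nested extractors return dictionaries, which are flattened with the parent's key as a prefix. The identity `preprocess`, the `_` separator and `CompositeExtractorSketch` itself are assumptions for illustration, not eegdash's implementation.

```python
class CompositeExtractorSketch:
    """Hypothetical: preprocess once, then apply each named feature function."""

    def __init__(self, feature_extractors, **preprocess_kwargs):
        self.feature_extractors = feature_extractors
        self.preprocess_kwargs = preprocess_kwargs

    def preprocess(self, x, **kwargs):
        return x  # identity here; a subclass might compute a spectrum instead

    def __call__(self, x):
        z = self.preprocess(x, **self.preprocess_kwargs)
        out = {}
        for name, func in self.feature_extractors.items():
            result = func(z)
            if isinstance(result, dict):  # nested extractor: prefix its feature names
                out.update({f"{name}_{k}": v for k, v in result.items()})
            else:
                out[name] = result
        return out
```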
- class eegdash.features.MultivariateFeature
Bases: object
A mixin for features that operate on multiple channels.
This class provides a __call__ method that converts a feature array into a dictionary with named features, where names are derived from channel names.
- class eegdash.features.TrainableFeature
Bases: ABC
Abstract base class for features that require training.
This ABC defines the interface for feature extractors that need to be fitted on data before they can be used. It includes methods for fitting the feature extractor and for resetting its state.
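This page names `partial_fit` (called once per batch) and `fit` (called after all data has been seen, which marks the feature as fitted). The sketch below puts that lifecycle in an ABC, with a toy centering feature that learns a mean. The reset method name `clear` and both class names are hypothetical.

```python
from abc import ABC, abstractmethod


class TrainableSketch(ABC):
    """Hypothetical interface: accumulate with partial_fit, finalize with fit."""

    def __init__(self):
        self.clear()

    @abstractmethod
    def clear(self):
        """Reset all learned state."""

    @abstractmethod
    def partial_fit(self, batch):
        """Update internal statistics from one batch."""

    def fit(self):
        """Mark the feature as fitted once all batches have been seen."""
        self._is_fitted = True


class Centering(TrainableSketch):
    """Toy trainable feature: subtracts the mean learned during fitting."""

    def clear(self):
        self._is_fitted = False
        self.total, self.count = 0.0, 0

    def partial_fit(self, batch):
        self.total += sum(batch)
        self.count += len(batch)

    def fit(self):
        self.mean = self.total / self.count
        super().fit()

    def __call__(self, batch):
        if not self._is_fitted:
            raise RuntimeError("fit() must be called before use")
        return [x - self.mean for x in batch]
```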
- class eegdash.features.UnivariateFeature
Bases: MultivariateFeature
A feature kind for operations applied to each channel independently.
- eegdash.features.get_all_feature_extractors() → list[tuple[str, type[FeatureExtractor]]]
Get a list of all available FeatureExtractor classes.
Scans the eegdash.features.feature_bank module for all classes that subclass FeatureExtractor.
- Returns:
A list of (name, class) tuples for all discovered feature extractors, including the base FeatureExtractor itself.
- Return type:
list[tuple[str, type[FeatureExtractor]]]
- eegdash.features.get_all_feature_kinds() → list[tuple[str, type[MultivariateFeature]]]
Get a list of all available feature ‘kind’ classes.
Scans the eegdash.features.extractors module for all classes that subclass MultivariateFeature.
- Returns:
A list of (name, class) tuples for all discovered feature kinds.
- Return type:
list[tuple[str, type[MultivariateFeature]]]
- eegdash.features.get_all_features() → list[tuple[str, Callable]]
Get a list of all available feature functions.
Scans the eegdash.features.feature_bank module for functions that have been decorated to have a feature_kind attribute.
- Returns:
A list of (name, function) tuples for all discovered features.
- Return type:
list[tuple[str, callable]]
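Discovery of this kind can be done with the standard `inspect` module: list a module's functions and keep those that carry a `feature_kind` attribute. The sketch applies that idea to any module object. Whether eegdash filters in exactly this way is an assumption, and `find_features` is hypothetical.

```python
import inspect


def find_features(module):
    """Return (name, function) pairs for functions tagged with feature_kind.

    inspect.getmembers returns members sorted by name.
    """
    return [
        (name, obj)
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if hasattr(obj, "feature_kind")
    ]
```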
- eegdash.features.get_feature_kind(feature: Callable) → MultivariateFeature
Get the ‘kind’ of a feature function.
The feature kind (e.g., univariate, bivariate) is typically attached by a decorator.
- Parameters:
feature (callable) – The feature function to inspect.
- Returns:
An instance of the feature kind (e.g., UnivariateFeature()).
- Return type:
MultivariateFeature
- eegdash.features.get_feature_predecessors(feature_or_extractor: Callable) → list
Get the dependency hierarchy for a feature or feature extractor.
This function recursively traverses the parent_extractor_type attribute of a feature or extractor to build a list representing its dependency lineage.
- Parameters:
feature_or_extractor (callable) – The feature function or FeatureExtractor class to inspect.
- Returns:
A nested list representing the dependency tree. For a simple linear chain, this will be a flat list from the specific feature up to the base FeatureExtractor. For multiple dependencies, it will contain tuples of sub-dependencies.
- Return type:
list
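The traversal described above can be sketched as a short recursion over `parent_extractor_type`. A single parent continues a flat chain, several parents yield a tuple of sub-chains, and an object without parents ends the chain. The exact nesting eegdash returns may differ. The example classes only mirror the parent tuples shown for HilbertFeatureExtractor and EntropyFeatureExtractor on this page, and every name here is hypothetical. Note that `getattr` also sees inherited class attributes.

```python
def predecessors(obj):
    """Walk parent_extractor_type links from obj up to a root (illustrative)."""
    parents = getattr(obj, "parent_extractor_type", ())
    if not parents:  # reached a root, e.g. the base extractor
        return [obj]
    if len(parents) == 1:  # linear chain: keep the list flat
        return [obj] + predecessors(parents[0])
    return [obj, tuple(predecessors(p) for p in parents)]  # one sub-chain per parent


class BaseExtractor:
    pass


class HilbertExtractor(BaseExtractor):
    parent_extractor_type = (BaseExtractor,)


class EntropyExtractor(BaseExtractor):
    parent_extractor_type = (BaseExtractor, HilbertExtractor)
```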
- eegdash.features.load_features_concat_dataset(path: str | Path, ids_to_load: list[int] | None = None, n_jobs: int = 1) → FeaturesConcatDataset
Load a stored FeaturesConcatDataset from a directory.
This function reconstructs a FeaturesConcatDataset by loading individual FeaturesDataset instances from subdirectories within the given path. It uses joblib for parallel loading.
- Parameters:
path (str or pathlib.Path) – The path to the directory where the dataset was saved. This directory should contain subdirectories (e.g., “0”, “1”, “2”, …) for each individual dataset.
ids_to_load (list of int, optional) – A list of specific dataset IDs (subdirectory names) to load. If None, all subdirectories in the path will be loaded.
n_jobs (int, default 1) – The number of jobs to use for parallel loading. -1 means using all processors.
- Returns:
A concatenated dataset containing the loaded FeaturesDataset instances.
- Return type:
FeaturesConcatDataset
- eegdash.features.extract_features(concat_dataset: BaseConcatDataset, features: FeatureExtractor | Dict[str, Callable] | List[Callable], *, batch_size: int = 512, n_jobs: int = 1) → FeaturesConcatDataset
Extract features from a concatenated dataset of windows.
This function applies a feature extractor to each WindowsDataset within a BaseConcatDataset in parallel and returns a FeaturesConcatDataset with the results.
- Parameters:
concat_dataset (BaseConcatDataset) – A concatenated dataset of WindowsDataset or EEGWindowsDataset instances.
features (FeatureExtractor or dict or list) – The feature extractor(s) to apply. Can be a FeatureExtractor instance, a dictionary of named feature functions, or a list of feature functions.
batch_size (int, default 512) – The size of batches to use for feature extraction.
n_jobs (int, default 1) – The number of parallel jobs to use for extracting features from the datasets.
- Returns:
A new concatenated dataset containing the extracted features.
- Return type:
FeaturesConcatDataset
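`batch_size` controls how many windows are passed to the extractor at once. Per-batch results are then concatenated into full feature columns. The sketch shows only this batching loop for a single dataset. It assumes a feature function that returns `{name: [one value per window]}`, and it leaves out the per-dataset parallelism that `n_jobs` controls. `extract_in_batches` is hypothetical.

```python
def extract_in_batches(windows, feature_fn, batch_size=512):
    """Apply feature_fn to consecutive batches and concatenate per-feature results."""
    columns = {}
    for start in range(0, len(windows), batch_size):
        batch = windows[start:start + batch_size]
        for name, values in feature_fn(batch).items():
            columns.setdefault(name, []).extend(values)  # append this batch's values
    return columns
```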
- eegdash.features.fit_feature_extractors(concat_dataset: BaseConcatDataset, features: FeatureExtractor | Dict[str, Callable] | List[Callable], batch_size: int = 8192) → FeatureExtractor
Fit trainable feature extractors on a dataset.
If the provided feature extractor (or any of its sub-extractors) is trainable (i.e., subclasses TrainableFeature), this function iterates through the dataset to fit it.
- Parameters:
concat_dataset (BaseConcatDataset) – The dataset to use for fitting the feature extractors.
features (FeatureExtractor or dict or list) – The feature extractor(s) to fit.
batch_size (int, default 8192) – The batch size to use when iterating through the dataset for fitting.
- Returns:
The fitted feature extractor.
- Return type:
FeatureExtractor
- class eegdash.features.EntropyFeatureExtractor(feature_extractors: Dict[str, Callable], **preprocess_kwargs: Dict)
Bases: FeatureExtractor
- parent_extractor_type = (<class 'eegdash.features.extractors.FeatureExtractor'>, <class 'eegdash.features.feature_bank.signal.HilbertFeatureExtractor'>)
- class eegdash.features.CoherenceFeatureExtractor(feature_extractors: Dict[str, Callable], **preprocess_kwargs: Dict)
Bases: FeatureExtractor
- eegdash.features.connectivity_magnitude_square_coherence(f, c, bands={'alpha': (8, 12), 'beta': (12, 30), 'delta': (1, 4.5), 'theta': (4.5, 8)})
- eegdash.features.connectivity_imaginary_coherence(f, c, bands={'alpha': (8, 12), 'beta': (12, 30), 'delta': (1, 4.5), 'theta': (4.5, 8)})
- eegdash.features.connectivity_lagged_coherence(f, c, bands={'alpha': (8, 12), 'beta': (12, 30), 'delta': (1, 4.5), 'theta': (4.5, 8)})
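These three functions take frequencies `f`, a value `c` and a `bands` dictionary of (low, high) limits in Hz. The sketch uses the usual textbook definitions and assumes that `c` is complex coherency. Magnitude-squared coherence is |c|², imaginary coherence is Im(c), and lagged coherence (squared form) is Im(c)² / (1 − Re(c)²). Each measure is then averaged over the frequencies inside every band. Whether eegdash uses these exact formulas, averages before or after computing them, and treats band edges as `low <= f < high` are all assumptions, and both helpers are hypothetical.

```python
import math


def coherence_measures(c):
    """Per-frequency measures from complex coherency c, assuming |c| <= 1."""
    msc = abs(c) ** 2  # magnitude-squared coherence
    imcoh = c.imag  # imaginary part of coherency
    denom = 1 - c.real ** 2
    lagged = c.imag ** 2 / denom if denom > 0 else 0.0  # lagged coherence, squared form
    return msc, imcoh, lagged


def band_average(f, values, bands):
    """Mean of `values` over frequencies with low <= f < high, for each band."""
    out = {}
    for band, (low, high) in bands.items():
        in_band = [v for fv, v in zip(f, values) if low <= fv < high]
        out[band] = sum(in_band) / len(in_band) if in_band else math.nan
    return out
```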
- class eegdash.features.CommonSpatialPattern
Bases: TrainableFeature
- feature_kind = <eegdash.features.extractors.MultivariateFeature object>
- fit()
Finalize the training of the feature extractor.
This method should be called after all data has been seen via partial_fit. It marks the feature as fitted.
- class eegdash.features.HilbertFeatureExtractor(feature_extractors: Dict[str, Callable], **preprocess_kwargs: Dict)
Bases: FeatureExtractor
- parent_extractor_type = (<class 'eegdash.features.extractors.FeatureExtractor'>,)
- class eegdash.features.SpectralFeatureExtractor(feature_extractors: Dict[str, Callable], **preprocess_kwargs: Dict)
Bases: FeatureExtractor
- class eegdash.features.NormalizedSpectralFeatureExtractor(feature_extractors: Dict[str, Callable], **preprocess_kwargs: Dict)
Bases: FeatureExtractor
- parent_extractor_type = (<class 'eegdash.features.feature_bank.spectral.SpectralFeatureExtractor'>,)
- class eegdash.features.DBSpectralFeatureExtractor(feature_extractors: Dict[str, Callable], **preprocess_kwargs: Dict)
Bases: FeatureExtractor
- parent_extractor_type = (<class 'eegdash.features.feature_bank.spectral.SpectralFeatureExtractor'>,)