eegdash package
Subpackages
Submodules
Module contents
EEGDash: A comprehensive platform for EEG data management and analysis.
EEGDash provides a unified interface for accessing, querying, and analyzing large-scale EEG datasets. It integrates with cloud storage and REST APIs to streamline EEG research workflows.
- exception eegdash.DataIntegrityError(message: str, record: dict[str, Any] | None = None, issues: list[str] | None = None, authors: list[str] | None = None, contact_info: list[str] | None = None, source_url: str | None = None)[source]
Bases: EEGDashError
Raised when a dataset record has known data integrity issues.
This exception is raised when attempting to load a record that has been flagged during ingestion as having missing companion files or other integrity problems.
- record
The problematic record metadata.
- Type:
dict
- issues
List of specific integrity issues found.
- Type:
list[str]
- authors
Dataset authors who can be contacted about the issue.
- Type:
list[str]
- contact_info
Contact information for reporting the issue.
- Type:
list[str] | None
- source_url
URL to the dataset source for reporting issues.
- Type:
str | None
Examples
>>> try:
...     dataset.raw  # Attempt to load data
... except DataIntegrityError as e:
...     print(f"Cannot load: {e.issues}")
...     print(f"Contact authors: {e.authors}")
- classmethod from_record(record: dict[str, Any]) DataIntegrityError[source]
Create a DataIntegrityError from a record with integrity issues.
- Parameters:
record (dict) – Record containing _data_integrity_issues and optionally _dataset_authors, _dataset_contact, _source_url.
- Returns:
Exception with all relevant context.
- Return type:
DataIntegrityError
- log_error() None[source]
Log the error using the EEGDash logger with rich formatting.
- log_warning() None[source]
Log the integrity issues as warnings (non-blocking).
- print_rich(console: Console | None = None) None[source]
Print a rich formatted version of the error to the console.
- Parameters:
console (Console, optional) – Rich console to print to. If None, creates a new one.
- classmethod warn_from_record(record: dict[str, Any]) None[source]
Log a warning about data integrity issues without raising an exception.
Use this when you want to warn about issues but still allow loading.
- Parameters:
record (dict) – Record containing _data_integrity_issues and optionally _dataset_authors, _dataset_contact, _source_url.
- class eegdash.EEGChallengeDataset(release: str, cache_dir: str, mini: bool = True, query: dict | None = None, s3_bucket: str | None = None, **kwargs)[source]
Bases: EEGDashDataset
A dataset helper for the EEG 2025 Challenge.
This class simplifies access to the EEG 2025 Challenge datasets. It is a specialized version of EEGDashDataset that is pre-configured for the challenge’s data releases. It automatically maps a release name (e.g., “R1”) to the corresponding OpenNeuro dataset and handles the selection of subject subsets (e.g., the “mini” release).
- Parameters:
release (str) – The name of the challenge release to load. Must be one of the keys in RELEASE_TO_OPENNEURO_DATASET_MAP (e.g., “R1”, “R2”, …, “R11”).
cache_dir (str) – The local directory where the dataset will be downloaded and cached.
mini (bool, default True) – If True, the dataset is restricted to the official “mini” subset of subjects for the specified release. If False, all subjects for the release are included.
query (dict, optional) – An additional MongoDB-style query to apply as a filter. This query is combined with the release and subject filters using a logical AND. The query must not contain the dataset key, as this is determined by the release parameter.
s3_bucket (str, optional) – The base S3 bucket URI where the challenge data is stored. Defaults to the official challenge bucket.
**kwargs – Additional keyword arguments that are passed directly to the EEGDashDataset constructor.
- Raises:
ValueError – If the specified release is unknown, or if the query argument contains a dataset key. Also raised if mini is True and a requested subject is not part of the official mini-release subset.
See also
EEGDashDataset – The base class for creating datasets from queries.
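The argument checks described above (unknown release, forbidden dataset key) can be sketched as a small helper. This is illustrative rather than the library’s actual code, and the release map below is a stand-in for the real RELEASE_TO_OPENNEURO_DATASET_MAP:

```python
def validate_challenge_args(release, query=None, release_map=None):
    """Illustrative sketch of EEGChallengeDataset's argument checks."""
    release_map = release_map or {}
    if release not in release_map:
        raise ValueError(f"Unknown release: {release!r}")
    if query and "dataset" in query:
        raise ValueError(
            "query must not contain a 'dataset' key; "
            "it is determined by the 'release' parameter"
        )
    # The effective filter combines the mapped dataset with the user query.
    return {"dataset": release_map[release], **(query or {})}
```

Under this sketch, passing a query that already pins a dataset, or an unmapped release name, fails fast with a ValueError, matching the Raises section above.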
- class eegdash.EEGDash(*, database: str = 'eegdash', api_url: str | None = None, auth_token: str | None = None)[source]
Bases: object
High-level interface to the EEGDash metadata database.
Provides methods to query, insert, and update metadata records stored in the EEGDash database via a REST API gateway.
For working with collections of recordings as PyTorch datasets, prefer EEGDashDataset.
Create a new EEGDash client.
- Parameters:
database (str, default "eegdash") – Name of the MongoDB database to connect to. Common values: "eegdash" (production), "eegdash_staging" (staging), "eegdash_v1" (legacy archive).
api_url (str, optional) – Override the default API URL. If not provided, uses the default public endpoint or the EEGDASH_API_URL environment variable.
auth_token (str, optional) – Authentication token for admin write operations. Not required for public read operations.
Examples
>>> eegdash = EEGDash()  # production
>>> eegdash = EEGDash(database="eegdash_staging")  # staging
>>> records = eegdash.find({"dataset": "ds002718"})
- count(query: dict[str, Any] = None, /, **kwargs) int[source]
Count documents matching the query.
- Parameters:
query (dict, optional) – Complete query dictionary. This is a positional-only argument.
**kwargs – User-friendly field filters (same as find()).
- Returns:
Number of matching documents.
- Return type:
int
Examples
>>> eeg = EEGDash()
>>> count = eeg.count({})  # count all
>>> count = eeg.count(dataset="ds002718")  # count by dataset
- exists(query: dict[str, Any] = None, /, **kwargs) bool[source]
Check if at least one record matches the query.
- Parameters:
query (dict, optional) – Complete query dictionary. This is a positional-only argument.
**kwargs – User-friendly field filters (same as find()).
- Returns:
True if at least one matching record exists; False otherwise.
- Return type:
bool
Examples
>>> eeg = EEGDash()
>>> eeg.exists(dataset="ds002718")  # check by dataset
>>> eeg.exists({"data_name": "ds002718_sub-001_eeg.set"})  # check by data_name
- find(query: dict[str, Any] = None, /, **kwargs) list[Mapping[str, Any]][source]
Find records in the collection.
Examples
>>> from eegdash import EEGDash
>>> eegdash = EEGDash()
>>> eegdash.find({"dataset": "ds002718", "subject": {"$in": ["012", "013"]}})  # pre-built query
>>> eegdash.find(dataset="ds002718", subject="012")  # keyword filters
>>> eegdash.find(dataset="ds002718", subject=["012", "013"])  # sequence -> $in
>>> eegdash.find({})  # fetch all (use with care)
>>> eegdash.find({"dataset": "ds002718"}, subject=["012", "013"])  # combine query + kwargs (AND)
- Parameters:
query (dict, optional) – Complete MongoDB query dictionary. This is a positional-only argument.
**kwargs – User-friendly field filters that are converted to a MongoDB query. Values can be scalars (e.g., "sub-01") or sequences (translated to $in queries). Special parameters: limit (int) and skip (int) for pagination.
- Returns:
DB records that match the query.
- Return type:
list of dict
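The keyword-to-query translation described above (scalars become exact matches, sequences become $in clauses) can be sketched as follows. This is an illustrative reimplementation of the documented behavior, not the library’s internal code:

```python
def kwargs_to_query(**filters):
    """Translate user-friendly field filters into a MongoDB-style query."""
    query = {}
    for field, value in filters.items():
        if isinstance(value, (list, tuple, set)):
            query[field] = {"$in": list(value)}  # sequence -> $in clause
        else:
            query[field] = value  # scalar -> exact match
    return query
```

Under this sketch, kwargs_to_query(dataset="ds002718", subject=["012", "013"]) yields {"dataset": "ds002718", "subject": {"$in": ["012", "013"]}}.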
- find_datasets(query: dict[str, Any] | None = None, limit: int = 1000) list[Mapping[str, Any]][source]
Find datasets matching query.
- Parameters:
query (dict) – Filter query.
limit (int) – Max number of datasets to return.
- Returns:
List of dataset metadata documents.
- Return type:
list of dict
- find_one(query: dict[str, Any] = None, /, **kwargs) Mapping[str, Any] | None[source]
Find a single record matching the query.
- Parameters:
query (dict, optional) – Complete query dictionary. This is a positional-only argument.
**kwargs – User-friendly field filters (same as find()).
- Returns:
The first matching record, or None if no match.
- Return type:
dict or None
Examples
>>> eeg = EEGDash()
>>> record = eeg.find_one(data_name="ds002718_sub-001_eeg.set")
- get_dataset(dataset_id: str) Mapping[str, Any] | None[source]
Fetch metadata for a specific dataset.
- Parameters:
dataset_id (str) – The unique identifier of the dataset (e.g., ‘ds002718’).
- Returns:
The dataset metadata document, or None if not found.
- Return type:
dict or None
- insert(records: dict[str, Any] | list[dict[str, Any]]) int[source]
Insert one or more records (requires auth_token).
- Parameters:
records (dict or list of dict) – A single record or list of records to insert.
- Returns:
Number of records inserted.
- Return type:
int
Examples
>>> eeg = EEGDash(auth_token="...")
>>> eeg.insert({"dataset": "ds001", "subject": "01", ...})  # single
>>> eeg.insert([record1, record2, record3])  # batch
- update_dataset(dataset_id: str, update: dict[str, Any]) int[source]
Update metadata for a specific dataset (requires auth_token).
- Parameters:
dataset_id (str) – The unique identifier of the dataset (e.g., ‘ds002718’).
update (dict) – Dictionary of fields to update.
- Returns:
Number of documents modified (0 or 1).
- Return type:
int
Examples
>>> eeg = EEGDash(auth_token="...")
>>> eeg.update_dataset("ds002718", {"clinical.is_clinical": True})
- update_field(query: dict[str, Any] = None, /, *, update: dict[str, Any], **kwargs) tuple[int, int][source]
Update fields on records matching the query (requires auth_token).
Use this to add or modify fields across matching records, e.g., after re-extracting entities with an improved algorithm.
- Parameters:
query (dict, optional) – Filter query to match records. This is a positional-only argument.
update (dict) – Fields to update. Keys are field names, values are new values.
**kwargs – User-friendly field filters (same as find()).
- Returns:
Number of records matched and actually modified.
- Return type:
tuple of (matched_count, modified_count)
Examples
>>> eeg = EEGDash(auth_token="...")
>>> # Update entities for all records in a dataset
>>> eeg.update_field({"dataset": "ds002718"}, update={"entities": {"subject": "01"}})
>>> # Using kwargs for filter
>>> eeg.update_field(dataset="ds002718", update={"entities": new_entities})
>>> # Combine query + kwargs
>>> eeg.update_field({"dataset": "ds002718"}, subject="01", update={"entities": new_entities})
- class eegdash.EEGDashDataset(cache_dir: str | Path, query: dict[str, Any] = None, description_fields: list[str] | None = None, s3_bucket: str | None = None, records: list[dict] | None = None, download: bool = True, n_jobs: int = -1, eeg_dash_instance: Any = None, database: str | None = None, auth_token: str | None = None, on_error: str = 'raise', **kwargs)[source]
Bases: BaseConcatDataset
Create a new EEGDashDataset from a given query or local BIDS dataset directory and dataset name. An EEGDashDataset is a pooled collection of EEGDashBaseDataset instances (individual recordings) and is a subclass of braindecode’s BaseConcatDataset.
Examples
Basic usage with dataset and subject filtering:
>>> from eegdash import EEGDashDataset
>>> dataset = EEGDashDataset(
...     cache_dir="./data",
...     dataset="ds002718",
...     subject="012"
... )
>>> print(f"Number of recordings: {len(dataset)}")
Filter by multiple subjects and specific task:
>>> subjects = ["012", "013", "014"]
>>> dataset = EEGDashDataset(
...     cache_dir="./data",
...     dataset="ds002718",
...     subject=subjects,
...     task="RestingState"
... )
Load and inspect EEG data from recordings:
>>> if len(dataset) > 0:
...     recording = dataset[0]
...     raw = recording.load()
...     print(f"Sampling rate: {raw.info['sfreq']} Hz")
...     print(f"Number of channels: {len(raw.ch_names)}")
...     print(f"Duration: {raw.times[-1]:.1f} seconds")
Advanced filtering with raw MongoDB queries:
>>> from eegdash import EEGDashDataset
>>> query = {
...     "dataset": "ds002718",
...     "subject": {"$in": ["012", "013"]},
...     "task": "RestingState"
... }
>>> dataset = EEGDashDataset(cache_dir="./data", query=query)
Working with dataset collections and braindecode integration:
>>> # EEGDashDataset is a braindecode BaseConcatDataset
>>> for i, recording in enumerate(dataset):
...     if i >= 2:  # limit output
...         break
...     print(f"Recording {i}: {recording.description}")
...     raw = recording.load()
...     print(f"  Channels: {len(raw.ch_names)}, Duration: {raw.times[-1]:.1f}s")
- Parameters:
cache_dir (str | Path) – Directory where data are cached locally.
query (dict | None) – Raw MongoDB query to filter records. If provided, it is merged with keyword filtering arguments (see **kwargs) using logical AND. You must provide at least a dataset (either in query or as a keyword argument). Only fields in ALLOWED_QUERY_FIELDS are considered for filtering.
dataset (str) – Dataset identifier (e.g., "ds002718"). Required if query does not already specify a dataset.
task (str | list[str]) – Task name(s) to filter by (e.g., "RestingState").
subject (str | list[str]) – Subject identifier(s) to filter by (e.g., "NDARCA153NKE").
session (str | list[str]) – Session identifier(s) to filter by (e.g., "1").
run (str | list[str]) – Run identifier(s) to filter by (e.g., "1").
description_fields (list[str]) – Fields to extract from each record and include in dataset descriptions (e.g., "subject", "session", "run", "task").
s3_bucket (str | None) – Optional S3 bucket URI (e.g., “s3://mybucket”) to use instead of the default OpenNeuro bucket when downloading data files.
records (list[dict] | None) – Pre-fetched metadata records. If provided, the dataset is constructed directly from these records and no MongoDB query is performed.
download (bool, default True) – If False, load from local BIDS files only. Local data are expected under cache_dir / dataset; no DB or S3 access is attempted.
n_jobs (int) – Number of parallel jobs to use where applicable (-1 uses all cores).
eeg_dash_instance (EEGDash | None) – Optional existing EEGDash client to reuse for DB queries. If None, a new client is created on demand; no client is used when download is False.
database (str | None) – Database name to use (e.g., “eegdash”, “eegdash_staging”). If None, uses the default database.
auth_token (str | None) – Authentication token for accessing protected databases. Required for staging or admin operations.
on_error (str, default "raise") – How to handle DataIntegrityError when accessing .raw on individual recordings: "raise" (default) propagates the exception; "warn" logs the error as a warning and sets .raw to None; "skip" silently sets .raw to None. Use drop_bad() after iteration to remove skipped recordings.
**kwargs (dict) – Additional keyword arguments serving two purposes:
Filtering: any keys present in ALLOWED_QUERY_FIELDS are treated as query filters (e.g., dataset, subject, task, …).
Dataset options: remaining keys are forwarded to EEGDashRaw.
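The on_error dispatch can be pictured with the following hedged sketch; the real handling lives inside the recording objects, and this helper is illustrative only:

```python
import logging

def handle_integrity_error(err, on_error="raise"):
    """Sketch of the documented on_error modes for DataIntegrityError:
    'raise' propagates the exception, 'warn' logs it and returns None,
    'skip' returns None silently."""
    if on_error == "raise":
        raise err
    if on_error == "warn":
        logging.getLogger("eegdash").warning("Data integrity issue: %s", err)
        return None
    if on_error == "skip":
        return None
    raise ValueError(f"Invalid on_error value: {on_error!r}")
```

In both the "warn" and "skip" modes the recording ends up with .raw set to None, which is why drop_bad() is the recommended follow-up.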
- property cumulative_sizes: list[int]
Recompute cumulative sizes from current dataset lengths.
Overrides the cached version from BaseConcatDataset because individual dataset lengths can change after lazy raw loading (the n_times estimated from JSON metadata may differ from the actual n_times in the raw file).
- download_all(n_jobs: int | None = None) None[source]
Download missing remote files in parallel.
- Parameters:
n_jobs (int | None) – Number of parallel workers to use. If None, defaults to self.n_jobs.
- drop_bad() list[dict][source]
Remove skipped datasets and return their records.
Call after accessing .raw on all datasets (e.g., after iteration or preprocessing) to clean up the dataset list.
- Returns:
Records that were removed because loading failed.
- Return type:
list of dict
- drop_short(min_samples: int) list[dict][source]
Remove recordings shorter than min_samples and return their records.
This is useful when downstream processing (e.g., fixed-length windowing) requires a minimum number of samples per recording. Recordings whose .raw is None (failed to load) are also dropped.
- Parameters:
min_samples (int) – Minimum number of time-domain samples a recording must have to be kept.
- Returns:
Records that were removed.
- Return type:
list of dict
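The selection rule described above can be sketched over plain dicts (illustrative only; the real entries are recording objects with a .raw attribute and sample counts):

```python
def split_short(recordings, min_samples):
    """Partition recordings into (kept, dropped) per the drop_short rule:
    drop anything whose raw failed to load (raw is None) or that has
    fewer than min_samples time-domain samples."""
    kept, dropped = [], []
    for rec in recordings:
        if rec.get("raw") is None or rec.get("n_times", 0) < min_samples:
            dropped.append(rec)
        else:
            kept.append(rec)
    return kept, dropped
```

For example, with min_samples=100, a 1000-sample recording is kept while a 10-sample recording and a failed-load recording are both dropped.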
- save(path, overwrite=False)[source]
Save the dataset to disk.
- Parameters:
path (str or Path) – Destination file path.
overwrite (bool, default False) – If True, overwrite existing file.
- Return type:
None