eegdash package
Subpackages
Submodules
Module contents
EEGDash: A comprehensive platform for EEG data management and analysis.
EEGDash provides a unified interface for accessing, querying, and analyzing large-scale EEG datasets. It integrates with cloud storage and REST APIs to streamline EEG research workflows.
- exception eegdash.DataIntegrityError(message: str, record: dict[str, Any] | None = None, issues: list[str] | None = None, authors: list[str] | None = None, contact_info: list[str] | None = None, source_url: str | None = None)
Bases: EEGDashError
Raised when a dataset record has known data integrity issues.
This exception is raised when attempting to load a record that has been flagged during ingestion as having missing companion files or other integrity problems.
- record
The problematic record metadata.
- Type:
dict[str, Any] | None
- source_url
URL to the dataset source for reporting issues.
- Type:
str | None
Examples
>>> try:
...     dataset.raw  # Attempt to load data
... except DataIntegrityError as e:
...     print(f"Cannot load: {e.issues}")
...     print(f"Contact authors: {e.authors}")
- classmethod from_record(record: dict[str, Any]) -> DataIntegrityError
Create a DataIntegrityError from a record with integrity issues.
- Parameters:
record (dict) – Record containing _data_integrity_issues and optionally _dataset_authors, _dataset_contact, _source_url.
- Returns:
Exception with all relevant context.
- Return type:
DataIntegrityError
- print_rich(console: Console | None = None) -> None
Print a rich formatted version of the error to the console.
- Parameters:
console (Console, optional) – Rich console to print to. If None, creates a new one.
- classmethod warn_from_record(record: dict[str, Any]) -> None
Log a warning about data integrity issues without raising an exception.
Use this when you want to warn about issues but still allow loading.
- Parameters:
record (dict) – Record containing _data_integrity_issues and optionally _dataset_authors, _dataset_contact, _source_url.
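The difference between raising through from_record and only warning through warn_from_record can be sketched without the library. The class below is a hypothetical stand-in, not the eegdash implementation; only the record keys (_data_integrity_issues, _dataset_authors, _dataset_contact, _source_url) come from the documentation above, and the message wording is invented.

```python
import logging
from typing import Any

logger = logging.getLogger("integrity_sketch")


class IntegritySketchError(Exception):
    """Hypothetical stand-in for DataIntegrityError (not the eegdash class)."""

    def __init__(self, message, record=None, issues=None, authors=None,
                 contact_info=None, source_url=None):
        super().__init__(message)
        self.record = record
        self.issues = issues or []
        self.authors = authors or []
        self.contact_info = contact_info or []
        self.source_url = source_url

    @classmethod
    def from_record(cls, record: dict[str, Any]) -> "IntegritySketchError":
        # Pull the documented underscore-prefixed keys out of the record.
        issues = record.get("_data_integrity_issues", [])
        return cls(
            f"{len(issues)} integrity issue(s) found",  # invented wording
            record=record,
            issues=issues,
            authors=record.get("_dataset_authors"),
            contact_info=record.get("_dataset_contact"),
            source_url=record.get("_source_url"),
        )

    @classmethod
    def warn_from_record(cls, record: dict[str, Any]) -> None:
        # Build the same context, but log it instead of raising.
        err = cls.from_record(record)
        logger.warning("%s: %s", err, err.issues)


record = {
    "_data_integrity_issues": ["missing companion file"],
    "_dataset_authors": ["A. Author"],
    "_source_url": None,
}
err = IntegritySketchError.from_record(record)
```

A caller that wants loading to stop would raise the object returned by from_record; a caller that wants to continue would call warn_from_record and proceed.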
- class eegdash.EEGChallengeDataset(release: str, cache_dir: str, mini: bool = True, query: dict | None = None, s3_bucket: str | None = None, **kwargs)
Bases: EEGDashDataset
A dataset helper for the EEG 2025 Challenge.
This class simplifies access to the EEG 2025 Challenge datasets. It is a specialized version of EEGDashDataset that is pre-configured for the challenge’s data releases. It automatically maps a release name (e.g., “R1”) to the corresponding OpenNeuro dataset and handles the selection of subject subsets (e.g., “mini” release).
- Parameters:
release (str) – The name of the challenge release to load. Must be one of the keys in RELEASE_TO_OPENNEURO_DATASET_MAP (e.g., “R1”, “R2”, …, “R11”).
cache_dir (str) – The local directory where the dataset will be downloaded and cached.
mini (bool, default True) – If True, the dataset is restricted to the official “mini” subset of subjects for the specified release. If False, all subjects for the release are included.
query (dict, optional) – An additional MongoDB-style query to apply as a filter. This query is combined with the release and subject filters using a logical AND. The query must not contain the dataset key, as this is determined by the release parameter.
s3_bucket (str, optional) – The base S3 bucket URI where the challenge data is stored. Defaults to the official challenge bucket.
**kwargs – Additional keyword arguments that are passed directly to the EEGDashDataset constructor.
- Raises:
ValueError – If the specified release is unknown, or if the query argument contains a dataset key. Also raised if mini is True and a requested subject is not part of the official mini-release subset.
See also
EEGDashDataset – The base class for creating datasets from queries.
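The validation rules described for release and query can be sketched as follows. This is an illustration under stated assumptions, not the library's code: RELEASE_MAP_SKETCH and build_challenge_query are hypothetical names, the dataset identifiers are placeholders because the real contents of RELEASE_TO_OPENNEURO_DATASET_MAP are not given here, and the use of $and to express the logical AND is assumed. The mini-subset subject check is left out.

```python
from typing import Any, Optional

# Placeholder mapping: the real RELEASE_TO_OPENNEURO_DATASET_MAP values are
# not stated in this document, so these dataset IDs are made up.
RELEASE_MAP_SKETCH = {"R1": "ds-placeholder-1", "R2": "ds-placeholder-2"}


def build_challenge_query(
    release: str, query: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Combine the release-derived dataset filter with an optional user query."""
    if release not in RELEASE_MAP_SKETCH:
        raise ValueError(f"Unknown release: {release!r}")
    query = query or {}
    if "dataset" in query:
        raise ValueError("query must not contain 'dataset'; it is set by release")
    release_filter = {"dataset": RELEASE_MAP_SKETCH[release]}
    if not query:
        return release_filter
    # Assumed representation of "combined using a logical AND".
    return {"$and": [release_filter, query]}
```

For example, build_challenge_query("R1", {"task": "RestingState"}) yields an $and of the release filter and the task filter, while passing a query that already names a dataset raises ValueError.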
- class eegdash.EEGDash(*, database: str = 'eegdash', api_url: str | None = None, auth_token: str | None = None)
Bases: object
High-level interface to the EEGDash metadata database.
Provides methods to query, insert, and update metadata records stored in the EEGDash database via a REST API gateway.
For working with collections of recordings as PyTorch datasets, prefer EEGDashDataset.
Create a new EEGDash client.
- Parameters:
database (str, default "eegdash") – Name of the MongoDB database to connect to. Common values: "eegdash" (production), "eegdash_staging" (staging), "eegdash_v1" (legacy archive).
api_url (str, optional) – Override the default API URL. If not provided, uses the default public endpoint or the EEGDASH_API_URL environment variable.
auth_token (str, optional) – Authentication token for admin write operations. Not required for public read operations.
Examples
>>> eegdash = EEGDash()  # production
>>> eegdash = EEGDash(database="eegdash_staging")  # staging
>>> records = eegdash.find({"dataset": "ds002718"})
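One plausible way to resolve api_url is sketched below. The precedence shown (explicit argument, then the EEGDASH_API_URL environment variable, then a built-in default) is an assumption, since the documentation only says that the default endpoint or the environment variable is used. resolve_api_url and DEFAULT_API_URL_SKETCH are hypothetical, and the default URL is a placeholder rather than the real endpoint.

```python
import os
from typing import Optional

# Placeholder only; the real default endpoint is not stated in this document.
DEFAULT_API_URL_SKETCH = "https://api.example.invalid"


def resolve_api_url(api_url: Optional[str] = None) -> str:
    """Pick the API URL: explicit argument, then environment, then default."""
    if api_url:
        return api_url
    # Assumed precedence: the environment variable overrides the default.
    return os.environ.get("EEGDASH_API_URL", DEFAULT_API_URL_SKETCH)
```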
- count(query: dict[str, Any] = None, /, **kwargs) -> int
Count documents matching the query.
- Parameters:
query (dict, optional) – Complete query dictionary. This is a positional-only argument.
**kwargs – User-friendly field filters (same as find()).
- Returns:
Number of matching documents.
- Return type:
int
Examples
>>> eeg = EEGDash()
>>> count = eeg.count({})  # count all
>>> count = eeg.count(dataset="ds002718")  # count by dataset
- exists(query: dict[str, Any] = None, /, **kwargs) -> bool
Check if at least one record matches the query.
- Parameters:
query (dict, optional) – Complete query dictionary. This is a positional-only argument.
**kwargs – User-friendly field filters (same as find()).
- Returns:
True if at least one matching record exists; False otherwise.
- Return type:
bool
Examples
>>> eeg = EEGDash()
>>> eeg.exists(dataset="ds002718")  # check by dataset
>>> eeg.exists({"data_name": "ds002718_sub-001_eeg.set"})  # check by data_name
- find(query: dict[str, Any] = None, /, **kwargs) -> list[Mapping[str, Any]]
Find records in the collection.
Examples
>>> from eegdash import EEGDash
>>> eegdash = EEGDash()
>>> eegdash.find({"dataset": "ds002718", "subject": {"$in": ["012", "013"]}})  # pre-built query
>>> eegdash.find(dataset="ds002718", subject="012")  # keyword filters
>>> eegdash.find(dataset="ds002718", subject=["012", "013"])  # sequence -> $in
>>> eegdash.find({})  # fetch all (use with care)
>>> eegdash.find({"dataset": "ds002718"}, subject=["012", "013"])  # combine query + kwargs (AND)
- Parameters:
query (dict, optional) – Complete MongoDB query dictionary. This is a positional-only argument.
**kwargs – User-friendly field filters that are converted to a MongoDB query. Values can be scalars (e.g., "sub-01") or sequences (translated to $in queries). Special parameters: limit (int) and skip (int) for pagination.
- Returns:
DB records that match the query.
- Return type:
list[Mapping[str, Any]]
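The conversion from keyword filters to a MongoDB query described above can be sketched as follows. kwargs_to_query and merge_query are hypothetical helpers, not eegdash functions; the use of $and for combining a pre-built query with keyword filters is an assumption, and the special limit and skip parameters are not handled here.

```python
from collections.abc import Sequence
from typing import Any, Optional


def kwargs_to_query(**filters: Any) -> dict[str, Any]:
    """Turn keyword filters into a MongoDB-style query dictionary."""
    query: dict[str, Any] = {}
    for field, value in filters.items():
        # Strings are sequences too, so treat them as scalars explicitly.
        if isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
            query[field] = {"$in": list(value)}
        else:
            query[field] = value
    return query


def merge_query(
    query: Optional[dict[str, Any]] = None, **filters: Any
) -> dict[str, Any]:
    """AND a pre-built query with keyword filters (assumed to use $and)."""
    parts = [q for q in (query, kwargs_to_query(**filters)) if q]
    if not parts:
        return {}
    if len(parts) == 1:
        return parts[0]
    return {"$and": parts}
```

With these helpers, merge_query({"dataset": "ds002718"}, subject=["012", "013"]) produces a query that requires both the dataset match and membership of subject in the given list.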
- find_datasets(query: dict[str, Any] | None = None, limit: int = 1000) -> list[Mapping[str, Any]]
Find datasets matching query.
- find_one(query: dict[str, Any] = None, /, **kwargs) -> Mapping[str, Any] | None
Find a single record matching the query.
- Parameters:
query (dict, optional) – Complete query dictionary. This is a positional-only argument.
**kwargs – User-friendly field filters (same as find()).
- Returns:
The first matching record, or None if no match.
- Return type:
dict or None
Examples
>>> eeg = EEGDash()
>>> record = eeg.find_one(data_name="ds002718_sub-001_eeg.set")
- get_dataset(dataset_id: str) -> Mapping[str, Any] | None
Fetch metadata for a specific dataset.
- insert(records: dict[str, Any] | list[dict[str, Any]]) -> int
Insert one or more records (requires auth_token).
- Parameters:
records (dict or list of dict) – A single record or list of records to insert.
- Returns:
Number of records inserted.
- Return type:
int
Examples
>>> eeg = EEGDash(auth_token="...")
>>> eeg.insert({"dataset": "ds001", "subject": "01", ...})  # single
>>> eeg.insert([record1, record2, record3])  # batch
- search_datasets(*, modality: str | None = None, task: str | None = None, clinical_group: str | None = None, source: str | None = None, n_subjects_min: int | None = None, license: str | None = None, limit: int = 100)
Search the dataset catalogue with friendly keyword filters.
Convenience wrapper around find_datasets() that translates a small set of human-friendly keyword arguments into a MongoDB-style query and returns a tidy summary pandas.DataFrame. This is the metadata-only entry point used by tutorials such as plot_00_first_search.
- Parameters:
modality (str, optional) – Filter by recording modality (e.g., "eeg", "meeg"). Matched case-insensitively against the modality field.
task (str, optional) – Filter by BIDS task name (e.g., "rest", "FacePerception").
clinical_group (str, optional) – Filter by clinical cohort label (e.g., "healthy", "adhd"). Matched against clinical.group (nested) and falls back to the flat clinical_group field.
source (str, optional) – Filter by data source (e.g., "openneuro", "nemar", "hbn"). Matched against source and provider fields.
n_subjects_min (int, optional) – Minimum number of subjects in the dataset. Maps to {"n_subjects": {"$gte": n_subjects_min}}.
license (str, optional) – Filter by data license (e.g., "CC0", "CC-BY-4.0"). Matched against the license field.
limit (int, default 100) – Maximum number of datasets to return.
- Returns:
One row per matching dataset with summary columns: dataset_id, name, modality, task, n_subjects, source, license, dataset_doi. Missing fields surface as None. The frame is empty (zero rows) when nothing matches.
- Return type:
pandas.DataFrame
Notes
search_datasets does not download any signal bytes; only small JSON catalogue documents are transferred. Pair with EEGDashDataset once a candidate dataset is chosen.
Examples
>>> client = EEGDash()
>>> df = client.search_datasets(modality="eeg", n_subjects_min=10)
>>> df = client.search_datasets(task="rest", source="openneuro")
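The translation from friendly keywords to a MongoDB-style query might look roughly like the sketch below. Only the n_subjects_min mapping to $gte is stated in the documentation; the anchored case-insensitive regex for modality, the $or clauses for the two-field matches, and the $and wrapper are assumptions, and build_search_query is a hypothetical helper rather than part of eegdash.

```python
import re
from typing import Any, Optional


def build_search_query(
    *,
    modality: Optional[str] = None,
    task: Optional[str] = None,
    clinical_group: Optional[str] = None,
    source: Optional[str] = None,
    n_subjects_min: Optional[int] = None,
    license: Optional[str] = None,
) -> dict[str, Any]:
    """Translate keyword filters into a MongoDB-style query (illustrative)."""
    clauses: list[dict[str, Any]] = []
    if modality is not None:
        # Assumed: case-insensitive exact match via an anchored regex.
        pattern = f"^{re.escape(modality)}$"
        clauses.append({"modality": {"$regex": pattern, "$options": "i"}})
    if task is not None:
        clauses.append({"task": task})
    if clinical_group is not None:
        # Assumed: nested field or flat fallback field, expressed with $or.
        clauses.append({"$or": [{"clinical.group": clinical_group},
                                {"clinical_group": clinical_group}]})
    if source is not None:
        clauses.append({"$or": [{"source": source}, {"provider": source}]})
    if n_subjects_min is not None:
        # Documented mapping.
        clauses.append({"n_subjects": {"$gte": n_subjects_min}})
    if license is not None:
        clauses.append({"license": license})
    if not clauses:
        return {}
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}
```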
- update_dataset(dataset_id: str, update: dict[str, Any]) -> int
Update metadata for a specific dataset (requires auth_token).
- Parameters:
dataset_id (str)
update (dict[str, Any])
- Returns:
Number of documents modified (0 or 1).
- Return type:
int
Examples
>>> eeg = EEGDash(auth_token="...")
>>> eeg.update_dataset("ds002718", {"clinical.is_clinical": True})
- update_field(query: dict[str, Any] = None, /, *, update: dict[str, Any], **kwargs) -> tuple[int, int]
Update fields on records matching the query (requires auth_token).
Use this to add or modify fields across matching records, e.g., after re-extracting entities with an improved algorithm.
- Parameters:
- Returns:
Number of records matched and actually modified.
- Return type:
tuple of (matched_count, modified_count)
Examples
>>> eeg = EEGDash(auth_token="...")
>>> # Update entities for all records in a dataset
>>> eeg.update_field({"dataset": "ds002718"}, update={"entities": {"subject": "01"}})
>>> # Using kwargs for filter
>>> eeg.update_field(dataset="ds002718", update={"entities": new_entities})
>>> # Combine query + kwargs
>>> eeg.update_field({"dataset": "ds002718"}, subject="01", update={"entities": new_entities})
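The two numbers in the returned tuple can differ: a record that matches the filter but already holds the new value counts as matched and not as modified. The in-memory sketch below illustrates that distinction only. update_records is a hypothetical helper that supports plain equality filters, not the REST-backed method documented above.

```python
from typing import Any

_MISSING = object()  # sentinel so that an absent key never equals a value


def update_records(
    records: list[dict[str, Any]],
    query: dict[str, Any],
    update: dict[str, Any],
) -> tuple[int, int]:
    """Apply update to records matching query; return (matched, modified)."""
    matched = modified = 0
    for rec in records:
        if all(rec.get(k, _MISSING) == v for k, v in query.items()):
            matched += 1
            # Only count a modification when at least one field changes.
            if any(rec.get(k, _MISSING) != v for k, v in update.items()):
                rec.update(update)
                modified += 1
    return matched, modified


records = [
    {"dataset": "ds002718", "subject": "01", "entities": {"subject": "01"}},
    {"dataset": "ds002718", "subject": "02", "entities": {}},
    {"dataset": "ds000001", "subject": "01"},
]
result = update_records(
    records, {"dataset": "ds002718"}, {"entities": {"subject": "01"}}
)
```

Here two records match the dataset filter, but only the second one changes, so result is (2, 1).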
- class eegdash.EEGDashDataset(cache_dir: str | Path, query: dict[str, Any] = None, description_fields: list[str] | None = None, s3_bucket: str | None = None, records: list[dict] | None = None, download: bool = True, n_jobs: int = -1, eeg_dash_instance: Any = None, database: str | None = None, auth_token: str | None = None, on_error: str = 'raise', max_concurrency: int = 20, description_precedence: str = 'participant_tsv', **kwargs)
Bases: BaseConcatDataset
Create a new EEGDashDataset from a given query or local BIDS dataset directory and dataset name. An EEGDashDataset is a pooled collection of EEGDashBaseDataset instances (individual recordings) and is a subclass of braindecode’s BaseConcatDataset.
Examples
Basic usage with dataset and subject filtering:
>>> from eegdash import EEGDashDataset
>>> dataset = EEGDashDataset(
...     cache_dir="./data",
...     dataset="ds002718",
...     subject="012"
... )
>>> print(f"Number of recordings: {len(dataset)}")
Filter by multiple subjects and specific task:
>>> subjects = ["012", "013", "014"]
>>> dataset = EEGDashDataset(
...     cache_dir="./data",
...     dataset="ds002718",
...     subject=subjects,
...     task="RestingState"
... )
Load and inspect EEG data from recordings:
>>> if len(dataset) > 0:
...     recording = dataset[0]
...     raw = recording.load()
...     print(f"Sampling rate: {raw.info['sfreq']} Hz")
...     print(f"Number of channels: {len(raw.ch_names)}")
...     print(f"Duration: {raw.times[-1]:.1f} seconds")
Advanced filtering with raw MongoDB queries:
>>> from eegdash import EEGDashDataset
>>> query = {
...     "dataset": "ds002718",
...     "subject": {"$in": ["012", "013"]},
...     "task": "RestingState"
... }
>>> dataset = EEGDashDataset(cache_dir="./data", query=query)
Working with dataset collections and braindecode integration:
>>> # EEGDashDataset is a braindecode BaseConcatDataset
>>> for i, recording in enumerate(dataset):
...     if i >= 2:  # limit output
...         break
...     print(f"Recording {i}: {recording.description}")
...     raw = recording.load()
...     print(f"  Channels: {len(raw.ch_names)}, Duration: {raw.times[-1]:.1f}s")
- Parameters:
cache_dir (str | Path) – Directory where data are cached locally.
query (dict | None) – Raw MongoDB query to filter records. If provided, it is merged with keyword filtering arguments (see **kwargs) using logical AND. You must provide at least a dataset (either in query or as a keyword argument). Only fields in ALLOWED_QUERY_FIELDS are considered for filtering.
dataset (str) – Dataset identifier (e.g., "ds002718"). Required if query does not already specify a dataset.
task (str | list[str]) – Task name(s) to filter by (e.g., "RestingState").
subject (str | list[str]) – Subject identifier(s) to filter by (e.g., "NDARCA153NKE").
session (str | list[str]) – Session identifier(s) to filter by (e.g., "1").
run (str | list[str]) – Run identifier(s) to filter by (e.g., "1").
description_fields (list[str]) – Fields to extract from each record and include in dataset descriptions (e.g., “subject”, “session”, “run”, “task”).
s3_bucket (str | None) – Optional S3 bucket URI (e.g., “s3://mybucket”) to use instead of the default OpenNeuro bucket when downloading data files.
records (list[dict] | None) – Pre-fetched metadata records. If provided, the dataset is constructed directly from these records and no MongoDB query is performed.
download (bool, default True) – If False, load from local BIDS files only. Local data are expected under cache_dir / dataset; no DB or S3 access is attempted.
n_jobs (int) – Number of parallel jobs to use where applicable (-1 uses all cores).
eeg_dash_instance (EEGDash | None) – Optional existing EEGDash client to reuse for DB queries. If None, a new client is created on demand. Not used when download is False.
database (str | None) – Database name to use (e.g., “eegdash”, “eegdash_staging”). If None, uses the default database.
auth_token (str | None) – Authentication token for accessing protected databases. Required for staging or admin operations.
max_concurrency (int, default 20) – Maximum number of parallel S3 transfer connections used when downloading data. Higher values speed up large/multi-file downloads but consume more bandwidth.
on_error (str, default "raise") – How to handle DataIntegrityError when accessing .raw on individual recordings:
  - "raise" (default): propagate the exception.
  - "warn": log the error as a warning and set .raw to None.
  - "skip": silently set .raw to None.
Skipped recordings are flagged via ds._skipped so callers can filter them out with a list comprehension after iteration.
description_precedence (str, default "participant_tsv") – Which source wins when the same field appears in both the record and the embedded participant_tsv data:
  - "participant_tsv" (default): the participant_tsv value overwrites the record value, including None values.
  - "record": the record-level value is kept.
Raises ValueError if not one of the above.
**kwargs (dict) – Additional keyword arguments serving two purposes:
  - Filtering: any keys present in ALLOWED_QUERY_FIELDS are treated as query filters (e.g., dataset, subject, task, …).
  - Dataset options: remaining keys are forwarded to EEGDashRaw.
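The on_error policies and the description_precedence rule can be illustrated with a minimal sketch. Everything named here (IntegritySketchError, load_with_policy, merge_description) is hypothetical and stands in for library internals that this page does not show; rejecting an unknown on_error value with ValueError is also an assumption, since the documentation states that behaviour only for description_precedence.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("eegdash_sketch")


class IntegritySketchError(Exception):
    """Hypothetical stand-in for DataIntegrityError."""


def load_with_policy(
    loader: Callable[[], Any], on_error: str = "raise"
) -> Optional[Any]:
    """Call loader and apply the 'raise' / 'warn' / 'skip' policy."""
    if on_error not in {"raise", "warn", "skip"}:
        # Assumption: unknown policies are rejected up front.
        raise ValueError(f"Unknown on_error value: {on_error!r}")
    try:
        return loader()
    except IntegritySketchError as exc:
        if on_error == "raise":
            raise
        if on_error == "warn":
            logger.warning("Recording not loaded: %s", exc)
        return None  # both 'warn' and 'skip' yield None


def merge_description(
    record: dict[str, Any],
    participant_tsv: dict[str, Any],
    precedence: str = "participant_tsv",
) -> dict[str, Any]:
    """Merge two field sources; the later mapping in each merge wins."""
    if precedence == "participant_tsv":
        # participant_tsv overwrites the record, even with None values.
        return {**record, **participant_tsv}
    if precedence == "record":
        return {**participant_tsv, **record}
    raise ValueError(f"Unknown description_precedence: {precedence!r}")
```

With the default precedence, merge_description({"age": 10}, {"age": None}) gives {"age": None}, which mirrors the documented note that participant_tsv values win even when they are None.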
- property cumulative_sizes: list[int]
Recompute cumulative sizes from current dataset lengths.
Overrides the cached version from BaseConcatDataset because individual dataset lengths can change after lazy raw loading (estimated ntimes from JSON metadata may differ from actual n_times in the raw file).
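Why recomputing matters can be seen in a small sketch. LazyLengthDataset and ConcatSketch are hypothetical classes, not braindecode or eegdash types; they only mimic a recording whose length is estimated from metadata until the raw file is loaded.

```python
from itertools import accumulate


class LazyLengthDataset:
    """Hypothetical recording whose length is an estimate until loaded."""

    def __init__(self, estimated: int, actual: int) -> None:
        self._estimated = estimated
        self._actual = actual
        self._loaded = False

    def load(self) -> None:
        self._loaded = True

    def __len__(self) -> int:
        return self._actual if self._loaded else self._estimated


class ConcatSketch:
    """Recomputes cumulative sizes on every access instead of caching them."""

    def __init__(self, datasets) -> None:
        self.datasets = list(datasets)

    @property
    def cumulative_sizes(self) -> list[int]:
        # Running total of the current lengths of the member datasets.
        return list(accumulate(len(d) for d in self.datasets))


concat = ConcatSketch([LazyLengthDataset(100, 98), LazyLengthDataset(50, 50)])
before = concat.cumulative_sizes   # uses the estimated length of the first item
concat.datasets[0].load()
after = concat.cumulative_sizes    # reflects the actual length after loading
```

A cached list would still report the estimated boundaries after loading, so index lookups near the end of the first recording could land in the wrong dataset.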