eegdash.api module

High-level interface to the EEGDash metadata database.

This module provides the EEGDash class, the primary entry point to the EEGDash ecosystem. It offers methods to query, insert, and update metadata records stored in the EEGDash database through a REST API.

class eegdash.api.EEGDash(*, database: str = 'eegdash', api_url: str | None = None, auth_token: str | None = None)

Bases: object

High-level interface to the EEGDash metadata database.

Provides methods to query, insert, and update metadata records stored in the EEGDash database via a REST API gateway.

For working with collections of recordings as PyTorch datasets, prefer EEGDashDataset.

Create a new EEGDash client.

Parameters:
  • database (str, default "eegdash") – Name of the MongoDB database to connect to. Common values: "eegdash" (production), "eegdash_staging" (staging), "eegdash_v1" (legacy archive).

  • api_url (str, optional) – Override the default API URL. If not provided, uses the default public endpoint or the EEGDASH_API_URL environment variable.

  • auth_token (str, optional) – Authentication token for admin write operations. Not required for public read operations.
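The api_url fallback described above can be pictured with a small helper. This is only a sketch: resolve_api_url and DEFAULT_API_URL are hypothetical names, the placeholder URL is not the real public endpoint, and the assumption that the environment variable takes precedence over the built-in default is not confirmed by this page.

```python
import os

# Placeholder only: the real default endpoint is not stated in this document.
DEFAULT_API_URL = "https://api.example.invalid"


def resolve_api_url(api_url=None, environ=None):
    """Pick the API URL: explicit argument, then EEGDASH_API_URL, then default.

    The order between the environment variable and the default is an
    assumption made for this sketch.
    """
    environ = os.environ if environ is None else environ
    if api_url:
        return api_url
    return environ.get("EEGDASH_API_URL") or DEFAULT_API_URL
```

Passing a mapping as environ keeps the helper easy to test without touching the real process environment.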

Examples

>>> eegdash = EEGDash()  # production
>>> eegdash = EEGDash(database="eegdash_staging")  # staging
>>> records = eegdash.find({"dataset": "ds002718"})

count(query: dict[str, Any] = None, /, **kwargs) → int

Count documents matching the query.

Parameters:
  • query (dict, optional) – Complete query dictionary. This is a positional-only argument.

  • **kwargs – User-friendly field filters (same as find()).

Returns:

Number of matching documents.

Return type:

int

Examples

>>> eeg = EEGDash()
>>> count = eeg.count({})  # count all
>>> count = eeg.count(dataset="ds002718")  # count by dataset

exists(query: dict[str, Any] = None, /, **kwargs) → bool

Check if at least one record matches the query.

Parameters:
  • query (dict, optional) – Complete query dictionary. This is a positional-only argument.

  • **kwargs – User-friendly field filters (same as find()).

Returns:

True if at least one matching record exists; False otherwise.

Return type:

bool

Examples

>>> eeg = EEGDash()
>>> eeg.exists(dataset="ds002718")  # check by dataset
>>> eeg.exists({"data_name": "ds002718_sub-001_eeg.set"})  # check by data_name

find(query: dict[str, Any] = None, /, **kwargs) → list[Mapping[str, Any]]

Find records in the collection.

Examples

>>> from eegdash import EEGDash
>>> eegdash = EEGDash()
>>> eegdash.find({"dataset": "ds002718", "subject": {"$in": ["012", "013"]}})  # pre-built query
>>> eegdash.find(dataset="ds002718", subject="012")  # keyword filters
>>> eegdash.find(dataset="ds002718", subject=["012", "013"])  # sequence -> $in
>>> eegdash.find({})  # fetch all (use with care)
>>> eegdash.find({"dataset": "ds002718"}, subject=["012", "013"])  # combine query + kwargs (AND)

Parameters:
  • query (dict, optional) – Complete MongoDB query dictionary. This is a positional-only argument.

  • **kwargs – User-friendly field filters that are converted to a MongoDB query. Values can be scalars (e.g., "sub-01") or sequences (translated to $in queries). Special parameters: limit (int) and skip (int) for pagination.
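To make the keyword-to-query translation concrete, here is a minimal sketch of how such a conversion might look. build_query is a hypothetical helper, not the library's implementation; it ignores the special limit and skip parameters, and the use of $and to combine a pre-built query with keyword filters is an assumption based on the "(AND)" note in the examples.

```python
from __future__ import annotations

from typing import Any


def build_query(query: dict[str, Any] | None = None, /, **filters: Any) -> dict[str, Any]:
    """Translate keyword filters into a MongoDB-style query (illustrative only)."""
    kw_query: dict[str, Any] = {}
    for field, value in filters.items():
        if isinstance(value, (list, tuple)):
            # Sequences become membership tests.
            kw_query[field] = {"$in": list(value)}
        else:
            # Scalars become equality matches.
            kw_query[field] = value
    if query and kw_query:
        # Assumed combination rule: both parts must match.
        return {"$and": [query, kw_query]}
    return kw_query or dict(query or {})
```

For instance, build_query(subject=["012", "013"]) yields {"subject": {"$in": ["012", "013"]}}.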

Returns:

DB records that match the query.

Return type:

list of dict
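Because find() accepts limit and skip, large result sets can be walked page by page. The sketch below shows one way to do that against any find-like callable; iter_records and fake_find are hypothetical names, and fake_find is an in-memory stand-in so the loop can run without a network connection. Whether the service guarantees a stable ordering across pages is not stated here.

```python
from typing import Any, Callable, Iterator


def iter_records(find: Callable[..., list], page_size: int = 100, **filters: Any) -> Iterator[Any]:
    """Yield all matches by requesting successive pages via limit/skip."""
    skip = 0
    while True:
        page = find(**filters, limit=page_size, skip=skip)
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        skip += page_size


# Hypothetical in-memory stand-in for EEGDash.find, used only to exercise the loop.
_FAKE = [{"dataset": "ds002718", "subject": f"{i:03d}"} for i in range(7)]


def fake_find(limit: int, skip: int, **filters: Any) -> list:
    matches = [r for r in _FAKE if all(r.get(k) == v for k, v in filters.items())]
    return matches[skip:skip + limit]


records = list(iter_records(fake_find, page_size=3, dataset="ds002718"))
```

With a real client, the bound method (for example EEGDash().find) would take the place of fake_find, assuming limit and skip behave as documented above.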

find_datasets(query: dict[str, Any] | None = None, limit: int = 1000) → list[Mapping[str, Any]]

Find datasets matching query.

Parameters:
  • query (dict, optional) – MongoDB-style filter query.

  • limit (int, default 1000) – Maximum number of datasets to return.

Returns:

List of dataset metadata documents.

Return type:

list of dict

find_one(query: dict[str, Any] = None, /, **kwargs) → Mapping[str, Any] | None

Find a single record matching the query.

Parameters:
  • query (dict, optional) – Complete query dictionary. This is a positional-only argument.

  • **kwargs – User-friendly field filters (same as find()).

Returns:

The first matching record, or None if no match.

Return type:

dict or None
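The relationship between find(), find_one(), count(), and exists() can be illustrated with a toy in-memory collection. Everything below is a stand-in written for this sketch (the RECORDS list and the four functions are hypothetical and only support equality filters); it shows the documented return shapes, not how the service computes them.

```python
from __future__ import annotations

from typing import Any, Mapping

# Hypothetical in-memory stand-in for the remote collection.
RECORDS = [
    {"dataset": "ds002718", "subject": "012"},
    {"dataset": "ds002718", "subject": "013"},
]


def find(query: Mapping[str, Any]) -> list[dict]:
    """Return records whose fields equal every key/value pair in query."""
    return [r for r in RECORDS if all(r.get(k) == v for k, v in query.items())]


def find_one(query: Mapping[str, Any]) -> dict | None:
    """Return the first match, or None when nothing matches."""
    matches = find(query)
    return matches[0] if matches else None


def count(query: Mapping[str, Any]) -> int:
    return len(find(query))


def exists(query: Mapping[str, Any]) -> bool:
    return find_one(query) is not None
```

Since find_one() may return None, callers should check the result before indexing into it.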

Examples

>>> eeg = EEGDash()
>>> record = eeg.find_one(data_name="ds002718_sub-001_eeg.set")

get_dataset(dataset_id: str) → Mapping[str, Any] | None

Fetch metadata for a specific dataset.

Parameters:

dataset_id (str) – The unique identifier of the dataset (e.g., "ds002718").

Returns:

The dataset metadata document, or None if not found.

Return type:

dict or None

insert(records: dict[str, Any] | list[dict[str, Any]]) → int

Insert one or more records (requires auth_token).

Parameters:

records (dict or list of dict) – A single record or list of records to insert.

Returns:

Number of records inserted.

Return type:

int

Examples

>>> eeg = EEGDash(auth_token="...")
>>> eeg.insert({"dataset": "ds001", "subject": "01", ...})  # single
>>> eeg.insert([record1, record2, record3])  # batch

search_datasets(*, modality: str | None = None, task: str | None = None, clinical_group: str | None = None, source: str | None = None, n_subjects_min: int | None = None, license: str | None = None, limit: int = 100)

Search the dataset catalogue with friendly keyword filters.

Convenience wrapper around find_datasets() that translates a small set of human-friendly keyword arguments into a MongoDB-style query and returns a tidy summary pandas.DataFrame. This is the metadata-only entry point used by tutorials such as plot_00_first_search.

Parameters:
  • modality (str, optional) – Filter by recording modality (e.g., "eeg", "meeg"). Matched case-insensitively against the modality field.

  • task (str, optional) – Filter by BIDS task name (e.g., "rest", "FacePerception").

  • clinical_group (str, optional) – Filter by clinical cohort label (e.g., "healthy", "adhd"). Matched against clinical.group (nested) and falls back to the flat clinical_group field.

  • source (str, optional) – Filter by data source (e.g., "openneuro", "nemar", "hbn"). Matched against source and provider fields.

  • n_subjects_min (int, optional) – Minimum number of subjects in the dataset. Maps to {"n_subjects": {"$gte": n_subjects_min}}.

  • license (str, optional) – Filter by data license (e.g., "CC0", "CC-BY-4.0"). Matched against the license field.

  • limit (int, default 100) – Maximum number of datasets to return.
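The parameter descriptions above suggest a straightforward mapping from keyword arguments to a MongoDB-style query. The sketch below is one plausible reading, not the library's code: build_dataset_query is a hypothetical helper, the anchored $regex used for case-insensitive modality matching is an assumption, and so is the task field name. Only the n_subjects clause is taken verbatim from the documentation.

```python
from __future__ import annotations

import re
from typing import Any


def build_dataset_query(
    *,
    modality: str | None = None,
    task: str | None = None,
    clinical_group: str | None = None,
    source: str | None = None,
    n_subjects_min: int | None = None,
    license: str | None = None,
) -> dict[str, Any]:
    """Turn friendly keyword filters into a MongoDB-style query (illustrative)."""
    clauses: list[dict[str, Any]] = []
    if modality is not None:
        # Case-insensitive exact match, written as an anchored regex (assumed).
        clauses.append({"modality": {"$regex": f"^{re.escape(modality)}$", "$options": "i"}})
    if task is not None:
        clauses.append({"task": task})  # field name assumed
    if clinical_group is not None:
        # Nested field first, flat field as fallback.
        clauses.append({"$or": [{"clinical.group": clinical_group}, {"clinical_group": clinical_group}]})
    if source is not None:
        clauses.append({"$or": [{"source": source}, {"provider": source}]})
    if n_subjects_min is not None:
        clauses.append({"n_subjects": {"$gte": n_subjects_min}})
    if license is not None:
        clauses.append({"license": license})
    if not clauses:
        return {}
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}
```

Collecting clauses in a list and wrapping them in $and avoids key collisions when more than one filter needs its own $or.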

Returns:

One row per matching dataset with summary columns: dataset_id, name, modality, task, n_subjects, source, license, dataset_doi. Missing fields surface as None. The frame is empty (zero rows) when nothing matches.
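As a rough picture of how catalogue documents could be flattened into those summary columns, consider the sketch below. summarize is a hypothetical helper and the sample document is invented for illustration; the only behaviour taken from the description above is that absent fields come back as None.

```python
SUMMARY_COLUMNS = (
    "dataset_id", "name", "modality", "task",
    "n_subjects", "source", "license", "dataset_doi",
)


def summarize(doc: dict) -> dict:
    """Keep only the summary columns; fields missing from doc become None."""
    return {col: doc.get(col) for col in SUMMARY_COLUMNS}


# Invented catalogue document, used only to exercise the helper.
docs = [{"dataset_id": "ds000000", "modality": "eeg", "extra": "ignored"}]
rows = [summarize(d) for d in docs]
```

Passing such rows to pandas.DataFrame(rows, columns=SUMMARY_COLUMNS) would give a frame with these columns, and an empty rows list would give zero rows with the same columns.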

Return type:

pandas.DataFrame

Notes

search_datasets does not download any signal bytes; only small JSON catalogue documents are transferred. Pair with EEGDashDataset once a candidate dataset is chosen.

Examples

>>> client = EEGDash()
>>> df = client.search_datasets(modality="eeg", n_subjects_min=10)
>>> df = client.search_datasets(task="rest", source="openneuro")

update_dataset(dataset_id: str, update: dict[str, Any]) → int

Update metadata for a specific dataset (requires auth_token).

Parameters:
  • dataset_id (str) – The unique identifier of the dataset (e.g., "ds002718").

  • update (dict) – Dictionary of fields to update.

Returns:

Number of documents modified (0 or 1).

Return type:

int
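Dotted keys such as "clinical.is_clinical" conventionally address nested fields in MongoDB-style updates. The following in-memory sketch shows that convention and why a repeated identical update could report 0 modified documents. apply_set and the sample document are hypothetical, and whether the service follows exactly these semantics is an assumption.

```python
def apply_set(doc: dict, update: dict) -> int:
    """Apply a flat update with dotted keys in place.

    Returns 1 if any value changed, else 0.
    """
    changed = False
    for path, value in update.items():
        *parents, leaf = path.split(".")
        target = doc
        for key in parents:
            # Walk down, creating intermediate objects as needed.
            target = target.setdefault(key, {})
        if leaf not in target or target[leaf] != value:
            target[leaf] = value
            changed = True
    return int(changed)


# Invented dataset document for illustration.
dataset = {"dataset_id": "ds002718", "clinical": {"group": "healthy"}}
first = apply_set(dataset, {"clinical.is_clinical": True})   # value is new
second = apply_set(dataset, {"clinical.is_clinical": True})  # nothing changes
```

After both calls, the nested clinical object holds both the original group value and the new is_clinical flag.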

Examples

>>> eeg = EEGDash(auth_token="...")
>>> eeg.update_dataset("ds002718", {"clinical.is_clinical": True})

update_field(query: dict[str, Any] = None, /, *, update: dict[str, Any], **kwargs) → tuple[int, int]

Update fields on records matching the query (requires auth_token).

Use this to add or modify fields across matching records, e.g., after re-extracting entities with an improved algorithm.

Parameters:
  • query (dict, optional) – Filter query to match records. This is a positional-only argument.

  • update (dict) – Fields to update. Keys are field names, values are new values.

  • **kwargs – User-friendly field filters (same as find()).

Returns:

The number of records matched and the number actually modified, in that order.

Return type:

tuple of (matched_count, modified_count)
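The difference between the two counts is easiest to see on a toy example: a record that already holds the new value is matched but not modified. The sketch below mimics that behaviour in memory. update_matching, the reviewed field, and the sample records are all hypothetical, and the semantics are assumed to follow the usual MongoDB convention.

```python
def update_matching(records, query, update):
    """Apply update to records matching query; return (matched, modified)."""
    matched = modified = 0
    for record in records:
        if all(record.get(k) == v for k, v in query.items()):
            matched += 1
            # Only count the record as modified if some value really changes.
            if any(record.get(k) != v for k, v in update.items()):
                record.update(update)
                modified += 1
    return matched, modified


# Invented records for illustration.
records = [
    {"dataset": "ds002718", "subject": "01", "reviewed": True},
    {"dataset": "ds002718", "subject": "02"},
    {"dataset": "other", "subject": "01"},
]
matched_count, modified_count = update_matching(
    records, {"dataset": "ds002718"}, {"reviewed": True}
)
```

Comparing the two numbers after a bulk update is a cheap sanity check: a modified count lower than the matched count means some records already had the target values.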

Examples

>>> eeg = EEGDash(auth_token="...")
>>> # Update entities for all records in a dataset
>>> eeg.update_field({"dataset": "ds002718"}, update={"entities": {"subject": "01"}})
>>> # Using kwargs for filter
>>> eeg.update_field(dataset="ds002718", update={"entities": new_entities})
>>> # Combine query + kwargs
>>> eeg.update_field({"dataset": "ds002718"}, subject="01", update={"entities": new_entities})