Domino Data API

Note

These APIs are a preview feature, not officially supported.

Training Set

Domino TrainingSet client library.

exception domino_data.training_sets.client.SchemaMismatchException[source]

This exception is raised when the TrainingSet data columns do not match the metadata.

exception domino_data.training_sets.client.ServerException(message: str, server_msg: str)[source]

This exception is raised when the TrainingSet server rejects a request.

Parameters:
  • message (str) –

  • server_msg (str) –

domino_data.training_sets.client.create_training_set_version(training_set_name: str, df: DataFrame, description: str | None = None, key_columns: List[str] | None = None, target_columns: List[str] | None = None, exclude_columns: List[str] | None = None, monitoring_meta: MonitoringMeta | None = None, meta: Mapping[str, str] | None = None, **kwargs) TrainingSetVersion[source]

Create a TrainingSetVersion.

Parameters:
  • training_set_name (str) – Name of the TrainingSet this version belongs to. training_set_name must be a string containing only alphanumeric characters in the basic Latin alphabet including dash and underscore: [-A-Za-z_].

  • df (DataFrame) – A DataFrame holding the data.

  • description (str | None) – Description of this version.

  • key_columns (List[str] | None) – Names of columns that represent IDs for retrieving features.

  • target_columns (List[str] | None) – Target variables for prediction.

  • exclude_columns (List[str] | None) – Columns to exclude when generating the training DataFrame.

  • monitoring_meta (MonitoringMeta | None) – Monitoring specific metadata.

  • meta (Mapping[str, str] | None) – User defined metadata.

  • **kwargs – Arbitrary keyword arguments.

Returns:

The created TrainingSetVersion

Return type:

TrainingSetVersion
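As a usage sketch of the signature above, the helper below forwards a DataFrame to create_training_set_version using only the documented keyword arguments. The helper name register_version, the column names, and the ts_client parameter are illustrative assumptions: ts_client is expected to be the domino_data.training_sets.client module, and it is passed in so that the sketch can run against a stand-in without a Domino deployment.

```python
from types import SimpleNamespace


def register_version(ts_client, df, name):
    """Create a new TrainingSetVersion from df (hypothetical helper).

    ts_client is assumed to be the domino_data.training_sets.client module.
    The column names below are placeholders for columns present in df.
    """
    return ts_client.create_training_set_version(
        training_set_name=name,
        df=df,
        description="example upload",
        key_columns=["id"],          # row identifiers
        target_columns=["label"],    # prediction target
        exclude_columns=["notes"],   # left out of the training DataFrame
        meta={"source": "example"},  # user defined metadata
    )


# Stand-in client that echoes its keyword arguments, so the sketch runs offline.
_echo_client = SimpleNamespace(create_training_set_version=lambda **kw: kw)
sent = register_version(_echo_client, df=None, name="my-training-set")
print(sorted(sent))
```

With the real library, the call would presumably be register_version(client, df, "my-training-set") after importing client from domino_data.training_sets.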

domino_data.training_sets.client.delete_training_set(name: str) bool[source]

Delete a TrainingSet.

Note: This deletes the TrainingSet only if it has no versions.

Parameters:

name (str) – Name of the TrainingSet.

Returns:

True if TrainingSet was deleted.

Return type:

bool

domino_data.training_sets.client.delete_training_set_version(training_set_name: str, number: int) bool[source]

Deletes a TrainingSetVersion.

Parameters:
  • training_set_name (str) – Name of the TrainingSet.

  • number (int) – TrainingSetVersion number.

Returns:

True if TrainingSetVersion was deleted.

Return type:

bool
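Because delete_training_set only succeeds when the TrainingSet has no versions, a full cleanup has to remove the versions first. The sketch below shows one possible ordering using the documented functions. The helper name and the ts_client parameter (assumed to be the domino_data.training_sets.client module) are hypothetical, and the sketch assumes a single list call returns every version, which holds only while the count stays under the default limit of 10000.

```python
from types import SimpleNamespace


def delete_training_set_with_versions(ts_client, name):
    """Delete every version of a TrainingSet, then the TrainingSet itself."""
    versions = ts_client.list_training_set_versions(training_set_name=name)
    for version in versions:
        ts_client.delete_training_set_version(name, version.number)
    return ts_client.delete_training_set(name)


class _FakeClient:
    """In-memory stand-in for the client module, used only to run the sketch."""

    def __init__(self, numbers):
        self.numbers = list(numbers)

    def list_training_set_versions(self, training_set_name=None):
        return [SimpleNamespace(number=n) for n in self.numbers]

    def delete_training_set_version(self, training_set_name, number):
        self.numbers.remove(number)
        return True

    def delete_training_set(self, name):
        # The stand-in reports success only when no versions remain.
        return not self.numbers


fake = _FakeClient([1, 2, 3])
deleted = delete_training_set_with_versions(fake, "my-training-set")
print(deleted)
```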

domino_data.training_sets.client.get_training_set(name: str) TrainingSet[source]

Get a TrainingSet by name.

Parameters:

name (str) – Name of the training set.

Returns:

The TrainingSet, if found.

Return type:

TrainingSet

domino_data.training_sets.client.get_training_set_version(training_set_name: str, number: int) TrainingSetVersion[source]

Gets a TrainingSetVersion by version number.

Parameters:
  • training_set_name (str) – Name of the TrainingSet.

  • number (int) – Version number.

Returns:

The requested TrainingSetVersion.

Return type:

TrainingSetVersion

domino_data.training_sets.client.list_training_set_versions(meta: Mapping[str, str] | None = None, training_set_name: str | None = None, training_set_meta: Mapping[str, str] | None = None, asc: bool = True, offset: int = 0, limit: int = 10000) List[TrainingSetVersion][source]

List training set versions.

Parameters:
  • meta (Mapping[str, str] | None) – Version metadata.

  • training_set_name (str | None) – Training set name.

  • training_set_meta (Mapping[str, str] | None) – Training set metadata.

  • asc (bool) – Sort order by creation time: True for ascending, False for descending.

  • offset (int) – Offset.

  • limit (int) – Limit.

Returns:

A list of matching TrainingSetVersions.

Return type:

List[TrainingSetVersion]

domino_data.training_sets.client.list_training_sets(meta: Mapping[str, str] | None = None, asc: bool = True, offset: int = 0, limit: int = 10000) List[TrainingSet][source]

Query training sets.

Parameters:
  • meta (Mapping[str, str] | None) – Metadata key-value pairs to match.

  • asc (bool) – Sort order by creation time: True for ascending, False for descending.

  • offset (int) – Offset.

  • limit (int) – Limit.

Returns:

A list of matching TrainingSets.

Return type:

List[TrainingSet]

domino_data.training_sets.client.update_training_set(updated: TrainingSet) TrainingSet[source]

Update a TrainingSet.

Parameters:

updated (TrainingSet) – Updated TrainingSet.

Returns:

The updated TrainingSet from the server.

Return type:

TrainingSet

domino_data.training_sets.client.update_training_set_version(version: TrainingSetVersion) TrainingSetVersion[source]

Updates this TrainingSetVersion.

Parameters:

version (TrainingSetVersion) – TrainingSetVersion to update.

Returns:

The updated TrainingSetVersion from the server.

Return type:

TrainingSetVersion

class domino_data.training_sets.model.MonitoringMeta(timestamp_columns: ~typing.List[str] = <factory>, categorical_columns: ~typing.List[str] = <factory>, ordinal_columns: ~typing.List[str] = <factory>)[source]

Monitoring Meta.

For more details about the parameters, refer to TrainingSetVersion.

Parameters:
  • timestamp_columns (List[str]) – Timestamp columns.

  • categorical_columns (List[str]) – Categorical columns.

  • ordinal_columns (List[str]) – Ordinal columns. Currently, ordinal columns are skipped by the Model Monitor.

class domino_data.training_sets.model.TrainingSet(name: str, project_id: str, description: str | None = None, meta: ~typing.Mapping[str, str] = <factory>)[source]

A Training Set.

Parameters:
  • name (str) – Unique name of the TrainingSet.

  • description (str | None) – Description of the TrainingSet.

  • meta (Mapping[str, str]) – User defined metadata.

  • project_id (str) –

class domino_data.training_sets.model.TrainingSetVersion(training_set_name: str, number: int, description: str | None = None, key_columns: ~typing.List[str] = <factory>, target_columns: ~typing.List[str] = <factory>, exclude_columns: ~typing.List[str] = <factory>, all_columns: ~typing.List[str] = <factory>, monitoring_meta: ~domino_data.training_sets.model.MonitoringMeta = <factory>, meta: ~typing.Mapping[str, str] = <factory>, path: str | None = None, container_path: str | None = None, pending: bool = True)[source]

A Training Set Version.

Any columns that are not inside key_columns, exclude_columns, MonitoringMeta.categorical_columns, MonitoringMeta.timestamp_columns, or MonitoringMeta.ordinal_columns are automatically marked as numerical columns.
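The rule above can be read as a set difference. The function below is a sketch of that reading only, not the library's implementation; the function name and the sample column names are made up for illustration.

```python
def numerical_columns(all_columns, key_columns=(), exclude_columns=(),
                      categorical_columns=(), timestamp_columns=(),
                      ordinal_columns=()):
    """Return the columns that would be treated as numerical under the rule."""
    claimed = (
        set(key_columns)
        | set(exclude_columns)
        | set(categorical_columns)
        | set(timestamp_columns)
        | set(ordinal_columns)
    )
    # Keep the original column order from all_columns.
    return [column for column in all_columns if column not in claimed]


cols = numerical_columns(
    all_columns=["id", "age", "income", "country", "signup_ts", "notes"],
    key_columns=["id"],
    exclude_columns=["notes"],
    categorical_columns=["country"],
    timestamp_columns=["signup_ts"],
)
print(cols)  # ['age', 'income']
```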

Parameters:
  • number (int) – The TrainingSetVersion number.

  • training_set_name (str) – Name of the TrainingSet this version belongs to.

  • description (str | None) – Description of this version.

  • key_columns (List[str]) – Row identifier columns.

  • target_columns (List[str]) –

    Prediction columns.

    • For classification models, this must be a categorical column. Be sure to also include this column in MonitoringMeta.categorical_columns.

    • For regression models, it must be a numerical column.

  • exclude_columns (List[str]) – Any columns that should be excluded.

  • all_columns (List[str]) – Names of all columns in the DataFrame.

  • monitoring_meta (MonitoringMeta) – Monitoring specific metadata.

  • meta (Mapping[str, str]) – User defined metadata.

  • path (str | None) –

  • container_path (str | None) –

  • pending (bool) –

load_raw_pandas() DataFrame[source]

Get the raw dataframe.

Return type:

DataFrame

load_training_pandas() DataFrame[source]

Get a pandas dataframe for training.

The returned DataFrame excludes the columns listed in key_columns and exclude_columns.

Return type:

DataFrame

Datasource

Datasource module.

class domino_data.data_sources.BoardingPass(datasource_id: str, query: str, config: Dict[str, str], credential: Dict[str, str])[source]

Represent a query request to the Datasource Proxy service.

Parameters:
  • datasource_id (str) –

  • query (str) –

  • config (Dict[str, str]) –

  • credential (Dict[str, str]) –

to_json() str[source]

Serialize self to JSON.

Return type:

str

class domino_data.data_sources.DataSourceClient(api_key: str | None = NOTHING, token_file: str | None = NOTHING, token_url: str | None = NOTHING)[source]

API client and bindings.

Parameters:
  • api_key (str | None) –

  • token_file (str | None) –

  • token_url (str | None) –

execute(datasource_id: str, query: str, config: Dict[str, str], credential: Dict[str, str]) Result[source]

Execute a given query against a datasource.

Parameters:
  • datasource_id (str) – unique identifier of a datasource

  • query (str) – SQL query to execute

  • config (Dict[str, str]) – overwrite configuration dictionary

  • credential (Dict[str, str]) – overwrite credential dictionary

Returns:

Result entity encapsulating execution response

Raises:

DominoError – if the proxy fails to query or return data

Return type:

Result

get_datasource(name: str) Datasource[source]

Fetch a datasource by name.

Parameters:

name (str) – unique identifier of a datasource

Returns:

Datasource entity with given name

Raises:

Exception – If the response from Domino is not 200

Return type:

Datasource

get_key_url(datasource_id: str, object_key: str, is_read_write: bool, config: Dict[str, str], credential: Dict[str, str]) str[source]

Request a signed URL for a given datasource and object key.

Parameters:
  • datasource_id (str) – unique identifier of a datasource

  • object_key (str) – unique identifier of key to retrieve

  • is_read_write (bool) – whether the signed URL allows write or not.

  • config (Dict[str, str]) – overwrite configuration dictionary

  • credential (Dict[str, str]) – overwrite credential dictionary

Returns:

Signed URL of the requested object.

Raises:
  • Exception – if the response from the Proxy is not 200

  • UnauthenticatedError – if the request has invalid authentication

Return type:

str

list_keys(datasource_id: str, prefix: str, page_size: int, config: Dict[str, str], credential: Dict[str, str]) List[str][source]

List keys in a datasource.

Parameters:
  • datasource_id (str) – unique identifier of a datasource

  • prefix (str) – prefix to filter keys with

  • page_size (int) – number of objects to return

  • config (Dict[str, str]) – overwrite configuration dictionary

  • credential (Dict[str, str]) – overwrite credential dictionary

Returns:

List of keys as strings

Raises:
  • Exception – if the response from the Proxy is not 200

  • UnauthenticatedError – if the request has invalid authentication

Return type:

List[str]

class domino_data.data_sources.Datasource(auth_type: str, client: DataSourceClient, config: Dict[str, Any], datasource_type: str, identifier: str, name: str, owner: str)[source]

Represents a Domino datasource.

Parameters:
  • auth_type (str) –

  • client (DataSourceClient) –

  • config (Dict[str, Any]) –

  • datasource_type (str) –

  • identifier (str) –

  • name (str) –

  • owner (str) –

classmethod from_dto(client: DataSourceClient, dto: DatasourceDto) Datasource[source]

Build a datasource from a given DTO.

Parameters:
  • client (DataSourceClient) –

  • dto (DatasourceDto) –

Return type:

Datasource

http() Client[source]

Singleton http client built for the datasource.

Return type:

Client

pool_manager() PoolManager[source]

Urllib3 pool manager for range downloads.

Return type:

PoolManager

reset_config() None[source]

Reset the configuration override.

Return type:

None

update(config: ADLSConfig | AzureBlobStorageConfig | BigQueryConfig | ClickHouseConfig | DatabricksConfig | DB2Config | DruidConfig | GCSConfig | GenericJDBCConfig | GenericS3Config | GreenplumConfig | IgniteConfig | MariaDBConfig | MongoDBConfig | MySQLConfig | NetezzaConfig | OracleConfig | PalantirConfig | PostgreSQLConfig | RedshiftConfig | S3Config | SAPHanaConfig | SingleStoreConfig | SQLServerConfig | SnowflakeConfig | SynapseConfig | TabularS3GlueConfig | TeradataConfig | TrinoConfig | VerticaConfig | Config) None[source]

Store configuration override for future query calls.

Parameters:

config (ADLSConfig | AzureBlobStorageConfig | BigQueryConfig | ClickHouseConfig | DatabricksConfig | DB2Config | DruidConfig | GCSConfig | GenericJDBCConfig | GenericS3Config | GreenplumConfig | IgniteConfig | MariaDBConfig | MongoDBConfig | MySQLConfig | NetezzaConfig | OracleConfig | PalantirConfig | PostgreSQLConfig | RedshiftConfig | S3Config | SAPHanaConfig | SingleStoreConfig | SQLServerConfig | SnowflakeConfig | SynapseConfig | TabularS3GlueConfig | TeradataConfig | TrinoConfig | VerticaConfig | Config) – specific datasource config class

Return type:

None

exception domino_data.data_sources.DominoError[source]

Base exception for known errors.

class domino_data.data_sources.ObjectStoreDatasource(auth_type: str, client: DataSourceClient, config: Dict[str, Any], datasource_type: str, identifier: str, name: str, owner: str)[source]

Represents an object store type datasource.

Parameters:
  • auth_type (str) –

  • client (DataSourceClient) –

  • config (Dict[str, Any]) –

  • datasource_type (str) –

  • identifier (str) –

  • name (str) –

  • owner (str) –

Object(key: str) _Object[source]

Return an object with given key and datasource client.

Parameters:

key (str) –

Return type:

_Object

download(object_key: str, filename: str, max_workers: int = 10) None[source]

Download object content to file located at filename.

The file will be created if it does not exist.

Parameters:
  • object_key (str) – unique key of object

  • filename (str) – path of file to write content to

  • max_workers (int) – max parallelism for high speed download

Return type:

None

download_file(object_key: str, filename: str) None[source]

Download object content to file located at filename.

The file will be created if it does not exist.

Parameters:
  • object_key (str) – unique key of object

  • filename (str) – path of file to write content to.

Return type:

None

download_fileobj(object_key: str, fileobj: Any) None[source]

Download object content to a file-like object.

Parameters:
  • object_key (str) – unique key of object

  • fileobj (Any) – A file-like object to download into. At a minimum, it must implement the write method and must accept bytes.

Return type:

None

get(object_key: str) bytes[source]

Get object content as bytes.

Parameters:

object_key (str) – unique key of object

Returns:

object content as bytes

Return type:

bytes

get_key_url(object_key: str, is_read_write: bool = False) str[source]

Get a signed URL for the given key.

Parameters:
  • object_key (str) – unique identifier of object to get signed URL for.

  • is_read_write (bool) – whether the URL should allow writes or not.

Returns:

Signed URL for given key

Return type:

str

list_objects(prefix: str = '', page_size: int = 1000) List[_Object][source]

List objects in the object store datasource.

Parameters:
  • prefix (str) – optional prefix to filter objects

  • page_size (int) – optional number of objects to fetch

Returns:

List of objects

Return type:

List[_Object]

put(object_key: str, content: bytes) None[source]

Upload content to object at given key.

Parameters:
  • object_key (str) – unique key of object

  • content (bytes) – bytes content

Return type:

None

upload_file(object_key: str, filename: str) None[source]

Upload content of file at filename to object at given key.

Parameters:
  • object_key (str) – unique key of object

  • filename (str) – path of file to upload.

Return type:

None

upload_fileobj(object_key: str, fileobj: Any) None[source]

Upload the content of a file-like object to the object at the given key.

Parameters:
  • object_key (str) – unique key of object

  • fileobj (Any) – A file-like object to upload from. At a minimum, it must implement the read method and must return bytes.

Return type:

None

class domino_data.data_sources.Result(client: DataSourceClient, reader: FlightStreamReader, statement: str)[source]

Represents a query result.

Parameters:
  • client (DataSourceClient) –

  • reader (FlightStreamReader) –

  • statement (str) –

to_pandas() DataFrame[source]

Load and transform the result into a pandas DataFrame.

Returns:

pandas DataFrame loaded with the entire result set

Return type:

DataFrame

to_parquet(where: Any) None[source]

Load and serialize the result to a local parquet file.

Parameters:

where (Any) – path or file-like object.

Return type:

None

class domino_data.data_sources.TabularDatasource(auth_type: str, client: DataSourceClient, config: Dict[str, Any], datasource_type: str, identifier: str, name: str, owner: str)[source]

Represents a tabular type datasource.

Parameters:
  • auth_type (str) –

  • client (DataSourceClient) –

  • config (Dict[str, Any]) –

  • datasource_type (str) –

  • identifier (str) –

  • name (str) –

  • owner (str) –

query(query: str) Result[source]

Execute a query against the datasource.

Parameters:

query (str) – SQL statement to execute

Returns:

Result entity wrapping dataframe

Return type:

Result
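Putting the pieces above together, a typical query path is get_datasource, then query, then to_pandas. The helper below sketches that chain. Its name is hypothetical, ds_client is assumed to be a DataSourceClient instance, and it assumes get_datasource returns a TabularDatasource when the named datasource is tabular. The stand-in objects only let the example run offline.

```python
from types import SimpleNamespace


def query_to_pandas(ds_client, datasource_name, sql):
    """Fetch a datasource by name, run a SQL statement, and load the result."""
    datasource = ds_client.get_datasource(datasource_name)
    result = datasource.query(sql)
    return result.to_pandas()


# Stand-ins that mimic the documented method names; they return plain lists
# instead of a pandas DataFrame so no third-party package is needed here.
_fake_result = SimpleNamespace(to_pandas=lambda: [{"n": 1}])
_fake_source = SimpleNamespace(query=lambda sql: _fake_result)
_fake_client = SimpleNamespace(get_datasource=lambda name: _fake_source)

rows = query_to_pandas(_fake_client, "my-warehouse", "SELECT 1 AS n")
print(rows)
```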

exception domino_data.data_sources.UnauthenticatedError[source]

Raised when a request has invalid authentication. Used to handle exponential backoff.

domino_data.data_sources.load_aws_credentials(location: str, profile: str = '') Dict[str, str][source]

Load AWS credentials from given location and profile.

Parameters:
  • location (str) – location of file that contains token.

  • profile (str) – profile to load.

Returns:

{
    CredElem.ACCESSKEYID.value: "access_key_id",
    CredElem.SECRETACCESSKEY.value: "secret_access_key",
    CredElem.SESSIONTOKEN.value: "session_token",
}

Raises:

DominoError – if the provided location is not a valid file

Return type:

Dict[str, str]

domino_data.data_sources.load_oauth_credentials() Dict[str, str][source]

Load an OAuth token from the sidecar container or a local file.

Returns:

{CredElem.TOKEN.value: "token"}

Raises:

DominoError – if the provided location is not a valid file

Return type:

Dict[str, str]

Authentication

Authentication classes for HTTP and Flight clients.

class domino_data.auth.AuthMiddleware(api_key: str | None, jwt: str | None)[source]

Middleware for authenticating flight requests.

Parameters:
  • api_key (str | None) –

  • jwt (str | None) –

call_completed(_)[source]

No implementation. TODO logging.

received_headers(_)[source]

No implementation.

sending_headers()[source]

Return authentication headers.

class domino_data.auth.AuthMiddlewareFactory(api_key: str | None, token_file: str | None, token_url: str | None)[source]

Middleware Factory for authenticating flight requests.

Parameters:
  • api_key (str | None) –

  • token_file (str | None) –

  • token_url (str | None) –

start_call(_)[source]

Called at the start of an RPC.

class domino_data.auth.AuthenticatedClient(base_url: str, api_key: str | None, token_file: str | None, token_url: str | None, *, cookies: Dict[str, str] = NOTHING, headers: Dict[str, str] = NOTHING, timeout: float = 5.0, verify_ssl: str | bool | SSLContext = True)[source]

A client that authenticates all requests with either the API Key or JWT.

Parameters:
  • base_url (str) –

  • api_key (str | None) –

  • token_file (str | None) –

  • token_url (str | None) –

  • cookies (Dict[str, str]) –

  • headers (Dict[str, str]) –

  • timeout (float) –

  • verify_ssl (str | bool | SSLContext) –

get_headers() Dict[str, str][source]

Get headers with either JWT or API Key.

Return type:

Dict[str, str]

class domino_data.auth.ProxyClient(base_url: str, api_key: str | None, token_file: str | None, token_url: str | None, client_source: str | None, run_id: str | None, *, cookies: Dict[str, str] = NOTHING, headers: Dict[str, str] = NOTHING, timeout: float = 5.0, verify_ssl: str | bool | SSLContext = True)[source]

A client that authenticates all requests but with Proxy headers.

Parameters:
  • base_url (str) –

  • api_key (str | None) –

  • token_file (str | None) –

  • token_url (str | None) –

  • client_source (str | None) –

  • run_id (str | None) –

  • cookies (Dict[str, str]) –

  • headers (Dict[str, str]) –

  • timeout (float) –

  • verify_ssl (str | bool | SSLContext) –

get_headers() Dict[str, str][source]

Get headers with either JWT or API Key.

Return type:

Dict[str, str]

domino_data.auth.get_jwt_token(url: str) str[source]

Get a Domino token from the local sidecar API.

Parameters:

url (str) – base url of sidecar container serving token API

Returns:

JWT as string

Raises:

HTTPStatusError – if the API returns an error

Return type:

str