Version: Next

ADBC Data Connector

ADBC (Arrow Database Connectivity) is a columnar, minimal-overhead alternative to JDBC/ODBC for analytical data access. It transfers data using Apache Arrow, avoiding serialization overhead between the database driver and Spice.

The ADBC data connector dynamically loads any ADBC-compatible driver at runtime and provides federated SQL query access through a managed connection pool. It supports both read and write operations, and pushes filters, projections, and limits down to the source database.

Drivers are available for BigQuery, Trino, Snowflake, Amazon Redshift, Databricks, and more. See ADBC Driver Foundry for the full list.

datasets:
  - from: adbc:my_table
    name: my_table
    params:
      adbc_driver: bigquery
      adbc_uri: "bigquery:///my-gcp-project"
      adbc_driver_options: >-
        adbc.bigquery.sql.dataset_id=my_dataset

Prerequisites

An ADBC-compatible driver must be installed on the system where Spice runs. Spice loads the driver shared library by name (e.g., bigquery, trino, snowflake, redshift) or by an explicit file path.

The recommended way to install drivers is with dbc, the command-line tool for installing and managing ADBC drivers:

# Install dbc
curl -LsSf https://dbc.columnar.tech/install.sh | sh

# Install drivers
dbc install bigquery
dbc install trino
dbc install redshift

See the dbc documentation for other installation methods (pip, Homebrew, Windows MSI) and platform support.

Spice includes built-in SQL dialect support for BigQuery, translating federated queries into BigQuery-compatible SQL automatically.

Configuration

from

The from field takes the form adbc:table_name, where table_name is the name of the table to read from the connected database.

Table name casing depends on the source database. For example, Snowflake uses uppercase table names by default, while BigQuery and Trino use lowercase. When adbc_catalog and adbc_schema are not set, fully qualified names can be specified in the from field.
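For example, a fully qualified name can address a Trino table through its catalog and schema directly in from (the catalog, schema, and host names here are illustrative):

datasets:
  - from: adbc:hive.default.my_table
    name: my_table
    params:
      adbc_driver: trino
      adbc_uri: "trino://trino.example.com:8080"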

name

The dataset name, used as the table name within Spice.

datasets:
  - from: adbc:my_table
    name: my_table
    params:
      adbc_driver: bigquery
      adbc_uri: "bigquery:///my-gcp-project"
      adbc_driver_options: >-
        adbc.bigquery.sql.dataset_id=my_dataset

SELECT COUNT(*) FROM my_table;

+----------+
| count(*) |
+----------+
| 6001215  |
+----------+

The dataset name cannot be a reserved keyword.

params

| Parameter Name | Description |
| --- | --- |
| adbc_driver | Required. The ADBC driver name (e.g., bigquery, trino, snowflake, redshift). |
| adbc_uri | Required. Database URI or connection string for the ADBC driver. In-memory URIs (e.g., :memory:) are not supported. |
| adbc_driver_path | Optional. Absolute path to the ADBC driver shared library. When omitted, the driver is loaded by name from the system library path. |
| adbc_username | Optional. Username for database authentication. Supports Secrets Stores. |
| adbc_password | Optional. Password for database authentication. Supports Secrets Stores. |
| adbc_driver_options | Optional. Semicolon-delimited key-value pairs of driver-specific options. See Driver Options. |
| adbc_catalog | Optional. Sets the default catalog for the connection. |
| adbc_schema | Optional. Sets the default schema for the connection. |
| connection_pool_size | Optional. Maximum number of connections in the connection pool. Default: 5. |
| connection_pool_min_idle | Optional. Minimum number of idle connections in the pool. Default: 1. |

In-memory databases

In-memory database URIs (e.g., :memory: or URIs containing mode=memory) are not supported.

Driver Options (adbc_driver_options)

The adbc_driver_options parameter passes driver-specific configuration as semicolon-delimited key=value pairs. Each key is automatically prefixed with adbc. if it does not already start with that prefix.

For example, the following two configurations are equivalent:

# Without adbc. prefix (prefix is added automatically)
adbc_driver_options: bigquery.sql.project_id=my-project;bigquery.sql.dataset_id=my_dataset

# With explicit adbc. prefix
adbc_driver_options: adbc.bigquery.sql.project_id=my-project;adbc.bigquery.sql.dataset_id=my_dataset

Trailing semicolons are permitted. Entries without an = sign or with an empty key are ignored.
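These parsing rules can be sketched as follows. This is an illustrative Python model of the documented behavior (semicolon splitting, automatic adbc. prefixing, ignoring malformed entries), not Spice's actual implementation:

```python
def parse_driver_options(raw: str) -> dict[str, str]:
    """Model of the documented adbc_driver_options parsing rules."""
    options = {}
    for entry in raw.split(";"):
        entry = entry.strip()
        if "=" not in entry:
            continue  # entries without an '=' sign are ignored
        key, _, value = entry.partition("=")
        key = key.strip()
        if not key:
            continue  # entries with an empty key are ignored
        if not key.startswith("adbc."):
            key = "adbc." + key  # prefix is added automatically
        options[key] = value.strip()
    return options

# Both spellings produce the same option keys:
print(parse_driver_options(
    "bigquery.sql.project_id=my-project;adbc.bigquery.sql.dataset_id=my_dataset;"
))
```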

For multi-line readability, use YAML's >- folded block scalar:

adbc_driver_options: >-
  adbc.bigquery.sql.project_id=my-project;
  adbc.bigquery.sql.dataset_id=my_dataset

Driver-specific options are documented at ADBC Driver Foundry.

BigQuery Driver Options

The BigQuery ADBC driver accepts the following options through adbc_driver_options.

Connection

| Option Key | Description |
| --- | --- |
| adbc.bigquery.sql.project_id | The Google Cloud project ID. |
| adbc.bigquery.sql.dataset_id | The BigQuery dataset ID. |
| adbc.bigquery.sql.table_id | The BigQuery table ID. |
| adbc.bigquery.sql.location | The BigQuery location (e.g., US). |

Authentication

| Option Key | Description |
| --- | --- |
| adbc.bigquery.sql.auth_type | Authentication type. One of the values below. Default: adbc.bigquery.sql.auth_type.auth_bigquery. |
| adbc.bigquery.sql.auth_credentials | Credential data: file path or JSON string, depending on auth_type. |
| adbc.bigquery.sql.auth.client_id | OAuth client ID (for oauth_client_ids auth type). |
| adbc.bigquery.sql.auth.client_secret | OAuth client secret (for oauth_client_ids auth type). |
| adbc.bigquery.sql.auth.refresh_token | OAuth refresh token (for user_authentication auth type). |
| adbc.bigquery.sql.auth.quota_project | Quota project for billing. |

Supported auth_type values:

| Value | Description |
| --- | --- |
| adbc.bigquery.sql.auth_type.auth_bigquery | Default BigQuery authentication. |
| adbc.bigquery.sql.auth_type.json_credential_file | Service account JSON key file. |
| adbc.bigquery.sql.auth_type.json_credential_string | Service account JSON as a string. |
| adbc.bigquery.sql.auth_type.json_credentials | JSON credentials as a byte array. |
| adbc.bigquery.sql.auth_type.user_authentication | User-based OAuth authentication. |
| adbc.bigquery.sql.auth_type.app_default_credentials | Google Application Default Credentials (ADC). |
| adbc.bigquery.sql.auth_type.oauth_client_ids | OAuth client ID credentials. |

Service Account Impersonation

| Option Key | Description |
| --- | --- |
| adbc.bigquery.sql.impersonate.target_principal | Service account email to impersonate. |
| adbc.bigquery.sql.impersonate.delegates | Delegation chain (comma-separated service account emails). |
| adbc.bigquery.sql.impersonate.scopes | OAuth 2.0 scopes (comma-separated). |
| adbc.bigquery.sql.impersonate.lifetime | Token lifetime duration (e.g., 3600s). |
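As a sketch, impersonation options are combined with an auth type in adbc_driver_options; the project ID and service account email below are illustrative placeholders:

datasets:
  - from: adbc:my_table
    name: my_table
    params:
      adbc_driver: bigquery
      adbc_uri: "bigquery:///my-gcp-project"
      adbc_driver_options: >-
        adbc.bigquery.sql.auth_type=adbc.bigquery.sql.auth_type.app_default_credentials;
        adbc.bigquery.sql.impersonate.target_principal=etl-runner@my-gcp-project.iam.gserviceaccount.com;
        adbc.bigquery.sql.impersonate.lifetime=3600s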

Query

| Option Key | Type | Description | Default |
| --- | --- | --- | --- |
| adbc.bigquery.sql.query.parameter_mode | string | Query parameter mode: positional (?) or named (@p). | positional |
| adbc.bigquery.sql.query.destination_table | string | Destination table for query results. | |
| adbc.bigquery.sql.query.default_project_id | string | Default project ID for queries. | |
| adbc.bigquery.sql.query.default_dataset_id | string | Default dataset ID for queries. | |
| adbc.bigquery.sql.query.create_disposition | string | Table creation behavior (e.g., CREATE_IF_NEEDED). | |
| adbc.bigquery.sql.query.write_disposition | string | Table write behavior (e.g., WRITE_TRUNCATE). | |
| adbc.bigquery.sql.query.disable_query_cache | bool | Disable query cache. | false |
| adbc.bigquery.sql.query.disable_flattened_results | bool | Disable flattened results. | false |
| adbc.bigquery.sql.query.allow_large_results | bool | Allow large query results. | false |
| adbc.bigquery.sql.query.priority | string | Query priority (BATCH or INTERACTIVE). | |
| adbc.bigquery.sql.query.max_billing_tier | int | Maximum billing tier. | |
| adbc.bigquery.sql.query.max_bytes_billed | int | Maximum bytes billed. | |
| adbc.bigquery.sql.query.use_legacy_sql | bool | Use legacy SQL syntax. | false |
| adbc.bigquery.sql.query.dry_run | bool | Execute a dry run (no data returned). | false |
| adbc.bigquery.sql.query.create_session | bool | Create a session for the query. | false |
| adbc.bigquery.sql.query.job_timeout | int | Job timeout in milliseconds. | |
| adbc.bigquery.sql.query.result_buffer_size | int | Result buffer size. | 200 |
| adbc.bigquery.sql.query.prefetch_concurrency | int | Number of concurrent prefetch operations. | 10 |

BigQuery also supports connection string URIs as the adbc_uri value:

bigquery:///my-project-123
bigquery:///my-project-123?OAuthType=1&AuthCredentials=/path/to/key.json
bigquery:///my-project-123?DatasetId=analytics&Location=US

See the BigQuery ADBC driver documentation for the full reference.

Trino Driver Options

The Trino ADBC driver uses a connection string URI as the adbc_uri value:

trino://[user[:password]@]host[:port][/catalog[/schema]][?param=value&...]

Examples:

trino://trino.example.com:8080/hive/default
trino://user:pass@trino.example.com:8443/postgresql/public?SSL=true
trino://user@localhost:8443/memory/default?SSLVerification=NONE

By default, connections use HTTPS. Add SSL=false to use HTTP. For self-signed certificates, add SSLVerification=NONE.

See the Trino ADBC driver documentation for the full reference.

Catalog and Schema

The adbc_catalog and adbc_schema parameters set connection-level defaults that apply to all queries on the connection. These map to the ADBC standard connection options adbc.connection.catalog and adbc.connection.db_schema.

How these map to source database concepts varies by driver:

| Driver | adbc_catalog | adbc_schema |
| --- | --- | --- |
| BigQuery | GCP project ID | BigQuery dataset |
| Trino | Trino catalog | Schema within catalog |
| Redshift | Redshift database | Redshift schema |
| Snowflake | Snowflake database | Snowflake schema |

datasets:
  - from: adbc:my_table
    name: my_table
    params:
      adbc_driver: bigquery
      adbc_uri: "bigquery:///my-gcp-project"
      adbc_catalog: my-gcp-project
      adbc_schema: my_dataset

Connection Pooling

The ADBC connector maintains a pool of database connections for concurrent query execution. The pool is configured with:

  • connection_pool_size: The maximum number of connections. Increase for workloads with many concurrent queries.
  • connection_pool_min_idle: The minimum number of idle connections kept open to reduce connection setup latency.

Both values must be positive integers. A connection_pool_min_idle greater than connection_pool_size is coerced to connection_pool_size.
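The validation and coercion rule can be sketched as follows. This is an illustrative Python model of the documented behavior, not Spice's actual implementation:

```python
def effective_pool_settings(pool_size: int = 5, min_idle: int = 1) -> tuple[int, int]:
    """Model of the documented pool-size rules: both values must be
    positive, and min_idle is coerced down to pool_size if larger."""
    if pool_size < 1 or min_idle < 1:
        raise ValueError("pool settings must be positive integers")
    return pool_size, min(min_idle, pool_size)

print(effective_pool_settings(5, 8))   # min_idle 8 is coerced to pool_size 5
print(effective_pool_settings(10, 2))  # values within bounds are kept as-is
```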

Query Pushdown

The ADBC connector pushes SQL operations down to the source database when possible, reducing the amount of data transferred:

  • Filter pushdown: WHERE clauses are pushed to the source.
  • Projection pushdown: Only the columns referenced in the query are fetched.
  • Limit pushdown: LIMIT clauses are applied at the source.

No special configuration is required. Pushdown happens automatically when the source database supports the operation.
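For example, a query like the following (column names are illustrative) would transfer only the two referenced columns, only rows matching the WHERE clause, and at most 100 rows from the source database:

SELECT order_id, total
FROM my_table
WHERE total > 100
LIMIT 100;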

Auth

Authentication varies by driver. Credentials can be provided through adbc_username, adbc_password, adbc_driver_options, the connection URI, or through Secrets Stores.

For BigQuery, authentication typically uses Google Cloud Application Default Credentials or a service account JSON file passed through adbc_driver_options.

For Trino, credentials are embedded in the connection URI (trino://user:pass@host/catalog).

For drivers that accept adbc_username and adbc_password (e.g., Redshift, Snowflake), these can be set through environment variables:

SPICE_SECRET_ADBC_USERNAME=myuser \
SPICE_SECRET_ADBC_PASSWORD=mypassword \
spice run
datasets:
  - from: adbc:my_table
    name: my_table
    params:
      adbc_driver: redshift
      adbc_uri: "redshift://my-cluster.region.redshift.amazonaws.com:5439/dev"
      adbc_username: ${secrets:ADBC_USERNAME}
      adbc_password: ${secrets:ADBC_PASSWORD}

Examples

BigQuery

Connect to a BigQuery table using Application Default Credentials. Spice translates federated queries into BigQuery-compatible SQL automatically.

# Install the driver
dbc install bigquery

# Authenticate with Google Cloud
gcloud auth application-default login
datasets:
  - from: adbc:my_table
    name: my_table
    params:
      adbc_driver: bigquery
      adbc_uri: "bigquery:///my-gcp-project?DatasetId=my_dataset"

BigQuery with Service Account

Authenticate with a service account JSON key file:

datasets:
  - from: adbc:my_table
    name: my_table
    params:
      adbc_driver: bigquery
      adbc_uri: "bigquery:///my-gcp-project"
      adbc_catalog: my-gcp-project
      adbc_schema: my_dataset
      adbc_driver_options: >-
        adbc.bigquery.sql.auth_type=adbc.bigquery.sql.auth_type.json_credential_file;
        adbc.bigquery.sql.auth_credentials=/path/to/service-account.json

BigQuery with Service Account JSON Secret

Authenticate using a service account JSON string stored in a Secrets Store:

datasets:
  - from: adbc:my_table
    name: my_table
    params:
      adbc_driver: bigquery
      adbc_uri: "bigquery:///my-gcp-project"
      adbc_driver_options: >-
        adbc.bigquery.sql.auth_type=adbc.bigquery.sql.auth_type.json_credential_string;
        adbc.bigquery.sql.auth_credentials=${secrets:BIGQUERY_SERVICE_ACCOUNT_JSON};
        adbc.bigquery.sql.dataset_id=my_dataset

Trino

Connect to a Trino cluster:

dbc install trino
datasets:
  - from: adbc:my_table
    name: my_table
    params:
      adbc_driver: trino
      adbc_uri: "trino://trino.example.com:8080/hive/default?SSL=false"

Trino with Authentication

Connect over HTTPS with HTTP Basic authentication:

datasets:
  - from: adbc:my_table
    name: my_table
    params:
      adbc_driver: trino
      adbc_uri: "trino://user:${secrets:TRINO_PASSWORD}@trino.example.com:8443/hive/default"
      connection_pool_size: 10

For self-signed certificates, add SSLVerification=NONE:

datasets:
  - from: adbc:my_table
    name: my_table
    params:
      adbc_driver: trino
      adbc_uri: "trino://user@trino.internal:8443/hive/default?SSLVerification=NONE"

Amazon Redshift

Connect to a Redshift Provisioned cluster:

dbc install redshift
datasets:
  - from: adbc:my_table
    name: my_table
    params:
      adbc_driver: redshift
      adbc_uri: "redshift://admin:${secrets:REDSHIFT_PASSWORD}@my-cluster.region.redshift.amazonaws.com:5439/dev"

Redshift Serverless with IAM

datasets:
  - from: adbc:my_table
    name: my_table
    params:
      adbc_driver: redshift
      adbc_uri: "redshift:///dev?cluster_type=redshift-serverless&workgroup_name=my-workgroup"

Custom Driver Path

When the ADBC driver shared library is not on the system library path (e.g., not installed via dbc), specify its location with adbc_driver_path:

datasets:
  - from: adbc:my_table
    name: my_table
    params:
      adbc_driver: bigquery
      adbc_driver_path: /opt/drivers/libadbc_driver_bigquery.so
      adbc_uri: "bigquery:///my-gcp-project?DatasetId=my_dataset"