8 posts tagged with "postgres"

PostgreSQL database topics and usage

View All Tags

Spice v2.0-rc.5 (May 27, 2026)

May 27, 2026 · 30 min read

Jack Eadie

Token Plumber at Spice AI

Spice v2.0-rc.5 is now available! 🔥

v2.0.0-rc.5 is the fifth release candidate for advanced testing of v2.0, building on v2.0.0-rc.4.

This release completes the mTLS implementation across server endpoints and outbound connectors, adds MongoDB Change Streams and durable Kafka offset persistence as new CDC sources, expands DML write-back to PostgreSQL, Snowflake, and Arrow, promotes DuckLake to Beta, introduces user-defined functions, on-demand dataset loading, unified query cancellation, dynamic HTTP request headers and subquery-driven request parameters, provider-aware LLM prompt caching, and a long list of Cayenne performance improvements.

Highlights in this release candidate include:

Spice Cayenne — CDC throughput, compaction and scan caching, synchronized partition commits, join filter propagation, parallel Vortex writes, lock-free deletion caches
Mutual TLS (mTLS) — TLS cert hot-reload, public mTLS for HTTP and Flight (channel + identity modes), mTLS client certs for FlightSQL and Spice.ai connectors
MongoDB Change Streams — native real-time CDC for MongoDB, no Debezium or Kafka required
Kafka CDC offsets — offsets persisted in sidecar tables for durable, resumable Kafka CDC
PostgreSQL DML — INSERT, UPDATE, DELETE write-back on PostgreSQL datasets
Snowflake DML — INSERT, UPDATE, DELETE write-back on Snowflake datasets
Arrow Primary Key Upserts — native upsert path using primary key matching
DuckLake promoted to Beta — with INSERT support on catalog tables
User-Defined Functions — define SQL UDFs in spicepods, plus remote UDFs over HTTP (Spice.ai Enterprise)
Spatial SQL UDFs — optional geospatial UDFs (ST_*) for geometry workloads
On-Demand Dataset Loading — datasets can be deferred and loaded on first reference
Unified Query Cancellation — Ctrl-C and HTTP request cancellation propagate across all execution paths
Dynamic HTTP Connector — pass-through request headers, subquery-driven params, and JSON schema decomposition
HTTP Rate-Control persistence — rate-limit state persisted in object storage across restarts
refresh_mode: snapshot — point-in-time snapshot acceleration with SQLite/Turso WAL flushing
Storage-profile accelerator tuning — accelerators auto-tune defaults based on local SSD, EBS-class disk, or tmpfs
Provider-Aware LLM Prompt Caching — automatic prompt caching for OpenAI-compatible providers that support it
Responses API — support across all model providers with streaming response.output_text.delta, plus Authorization: Bearer header support

What's New in v2.0.0-rc.5

Cayenne Improvements

Significant performance work across Spice Cayenne-backed catalogs and accelerators.

Ingest throughput: End-to-end improvements to CDC ingest, background compaction, and a new scan-result cache for hot reads; parallel Vortex partition writes; lock-free deletion caches with bloom-prefiltered probes; background retention with CDC pipelining; SQLite metastore pool scaled to 32 for high-concurrency mutation workloads.
Data inlining: Small writes are serialized as Arrow IPC and committed directly into the Cayenne metastore (cayenne_inlined_data), bypassing the staged Vortex write path for low-latency ingest. Inline upserts atomically rewrite existing inline rows instead of emitting side delete markers, and inline data remains query-visible via an in-memory union scan with a generation-keyed decode cache. Inline rows are checkpointed to Vortex when row, segment, or byte thresholds are reached. Defaults are refresh-mode aware: inline writes are enabled by default for high-frequency caching, changes, and fast append workloads and disabled for full, snapshot, and slower append.
Query planning: Join filter propagation across equi-join keys (gated behind runtime.params.cayenne_filter_propagation), range fallback for large join filters, hot-path clone elimination, and IN-list rewrites for large filter lists.
Correctness: Synchronized partition commits across partitions, correct NULL-sentinel handling for nullable partition expressions (e.g. bucket(N, col)), Vortex panic fix on highly compressible data, and live reads through expired protected snapshots.
Catalog and platform: Refresh-mode-aware compaction defaults, rejection of non-distributed Cayenne catalog configurations, and a vendored Vortex DataFusion integration for faster iteration on the Cayenne planner.

Mutual TLS (mTLS)

Spice.ai Enterprise feature. See Enterprise Security.

Spice now supports full mutual TLS for both HTTP and Arrow Flight endpoints.

TLS cert hot-reload (#10727): The Spice runtime watches for SIGHUP and reloads TLS certificates without restarting, enabling cert rotation with zero downtime.

Public mTLS for HTTP and Flight (#10753): Two client_auth_mode values control how the server handles client certificates:

request — optional mTLS: the server requests a client cert but accepts connections without one (useful for migration windows).
required — strict mTLS: the server requires a valid client cert signed by the configured CA.

mTLS client certs for FlightSQL and Spice.ai connectors (#10764): Outbound connections from the FlightSQL and Spice.ai data connectors can now present client certificates for mutual authentication with upstream services.

Example configuration:

runtime:
  tls:
    enabled: true
    certificate_file: /etc/spice/tls/server.crt
    key_file: /etc/spice/tls/server.key
    client_auth_mode: required
    client_auth_ca_file: /etc/spice/tls/client-ca.crt

MongoDB Change Streams

MongoDB datasets configured with refresh_mode: changes now stream changes from MongoDB Change Streams into any local accelerator (#10813), providing real-time CDC without Debezium or Kafka.

Example configuration:

datasets:
  - from: mongodb:my_collection
    name: my_collection
    params:
      host: my-cluster.mongodb.net
      db: mydb
    acceleration:
      enabled: true
      engine: duckdb
      refresh_mode: changes

CDC Improvements

See Change Data Capture (CDC) for an overview of CDC in Spice.

Kafka CDC offset persistence (#10823): Kafka CDC offsets are persisted in sidecar tables for durable, resumable streams. On restart or failover, Spice resumes from the last committed offset.
Pipelined CDC ingestion (#10676): Source reads overlap with batch apply, with additional batching, envelope coalescing, and nullability propagation improvements across the apply pipeline.
Debezium schema evolution fix (#10144): Schema changes in Debezium-sourced datasets no longer break dataset initialization on reload (fixes #9782).

PostgreSQL DML Support

The PostgreSQL data connector now supports write-back via INSERT, UPDATE, and DELETE operations (#10446). Combined with the existing read-side federation, PostgreSQL-backed datasets can serve as full read/write tables. The PostgreSQL Catalog connector additionally exposes foreign-key metadata for NSQL and query planning (#10849).

Snowflake DML Support

The Snowflake data connector now supports write-back via INSERT, UPDATE, and DELETE operations (#10747), complementing its existing read capabilities.

Arrow Primary Key Upserts

Arrow-accelerated tables now support native upsert operations using primary key matching (#10749), providing efficient update-or-insert semantics for in-memory datasets.

DuckLake Promoted to Beta

The DuckLake Catalog and Data Connector are promoted to Beta quality (#10743).

DuckLake catalog tables with read_write access now support INSERT operations (#10744), enabling full read/write workflows against DuckLake-backed catalogs. The DuckLake connector also gains a series of correctness fixes for downcast, module registration, schema discovery, and S3 credentials (#10650).

User-Defined Functions

Spice now supports user-defined functions (UDFs) as a first-class spicepod component (#10571), letting you define reusable SQL functions in the spicepod or invoke remote functions over HTTP. The runtime also gains table user functions with HTTP server gating (#10675).

A security fix closes a remote-UDF SSRF vector (#10757).

Spatial SQL UDFs

Spice now ships an optional set of geospatial SQL UDFs (ST_*) for geometry workloads (#10833). The functions are gated behind a build feature and can be invoked from any SQL surface.

On-Demand Dataset Loading

Datasets can now be marked for on-demand loading (#10629). Deferred datasets are registered with a declared schema at startup (#10669) and only fully resolve when first referenced, reducing startup time and memory footprint for spicepods with many seldom-used datasets.

Spicepods also gain columns[].type and columns[].nullable (#10661) with a lenient type parser for declaring schemas inline.

Unified Query Cancellation

All query execution paths — HTTP, Flight, FlightSQL, MCP, and internal — now honour a unified cancellation signal (#10390). When a client disconnects, presses Ctrl-C in the REPL, or cancels an in-flight HTTP request, the corresponding query is cancelled end-to-end, freeing resources promptly.

Dynamic HTTP Connector

The HTTP data connector gains dynamic request headers parameterised from query predicates (#10604), subquery-driven request parameters for fan-out queries (#10636), HTTP response metadata as queryable columns via JSON schema decomposition (#10679), no-limit pagination (#10673), and shared rate-control across HTTP-based connectors using the same backend host (#10648).

HTTP Rate-Control Persistence

The HTTP rate-control state (per-endpoint throttle counters) is now persisted in object storage (#10697), ensuring rate limits survive restarts and are consistent across replicas. Rate-control metrics now use an origin label rather than the connector name for cleaner aggregation (#10689).

The metrics HTTP endpoint (/metrics) is also independently rate-limited (#10162) to prevent scraping from impacting query serving.

`refresh_mode: snapshot`

Spice.ai Enterprise feature. See Acceleration Snapshots.

A new refresh_mode: snapshot provides point-in-time snapshot acceleration (#10651), with SQLite and Turso WAL flushing and a Cayenne metastore slice integration so accelerated readers see a consistent snapshot while writes continue.

Storage-Profile Accelerator Tuning

Acceleration configs gain a new storage_profile field (#10913) with values auto (default), local_ssd, ebs, and tmpfs. Under auto, the runtime detects whether the acceleration store is backed by local SSD, EBS-class network disk, or tmpfs, and applies storage-aware defaults across DuckDB, partitioned DuckDB, SQLite, Turso, and Cayenne file-mode accelerators. Explicit per-accelerator parameters always override the profile defaults.

Provider-Aware LLM Prompt Caching

LLM calls automatically use provider-aware prompt caching (#10645) when the configured model provider supports it (e.g., Anthropic, OpenAI). System prompts and tool descriptions are marked for caching so repeated invocations within the cache window reuse the provider-side cached prefix, reducing latency and cost.

A new searchable registry mode for LLM tools (#10647) lets agents discover tools by semantic search rather than enumerating all tools in the system prompt, which scales to large tool inventories.

Responses API Improvements

The Responses API is now supported across all configured model providers (#10724). Streaming delta events via response.output_text.delta are also supported (#10828). The runtime now also accepts Authorization: Bearer headers in addition to x-api-key, bumps async-openai, and stops populating FunctionToolCall.id so OpenAI-compatible servers can assign the ID themselves (#10911).

Distributed Cluster Improvements

Spice.ai Enterprise feature. See High Availability.

Per-request executor readiness gate (#10860): /v1/ready on schedulers waits for a configurable quorum of executors before returning healthy, enabling proper rolling deployments.
Ballista S3 shuffle reads under cluster mode (#10910): The shuffle reader builds its S3 client from the executor pod's environment, matching the writer. Async queries with runtime.params.shuffle_location: s3://... now complete instead of failing with AccessDenied on shuffle fetches.
Flattened scheduler config (#10450): runtime.scheduler.partition_management.* fields are flattened directly onto runtime.scheduler and renamed under the canonical "partition assignment" terminology. See Breaking Changes.

Caching & Search

Improvements across Caching and Search:

Per-principal cache namespacing (#10702): SQL, search, and caching-accelerator caches are now namespaced per authenticated principal, so cached results never cross identity boundaries.
DuckDB HNSW vector indexes (#10695, #10674, #10668): DuckDB-accelerated views support HNSW vector indexes for vector search, vector search SQL is rewritten to activate HNSW_INDEX_SCAN, and HNSW indexes are preserved across data refresh.

Security Improvements

See Authentication and TLS for configuring Spice security.

API key timing-position leak and remote-UDF SSRF (#10757): Closed a timing-based position-disclosure leak in API key comparison and blocked SSRF via remote UDF endpoint parameters.
Configurable allowed_hosts for MCP (#10638): MCP servers can be restricted to an explicit allowlist of upstream hosts.

SQL, Query, and Developer Experience

See the SQL Reference for the full SQL surface area.

SQL REPL expanded view (#10797): Toggle \x in the REPL for a vertical key-value layout on wide result sets.
FlightSQL Substrait plan support (#10761): The Spice runtime now implements CommandStatementSubstraitPlan, enabling clients that submit plans as Substrait-encoded protobuf.
MCP auth for streamable HTTP tools (#10927): Streamable HTTP MCP tools support native authentication via mcp_auth_token and mcp_headers, both with full Spice secret expansion.
Elasticsearch FTS engine config and index lifecycle (#10672): Direct FTS engine configuration plus index lifecycle and ingestion controls for the Elasticsearch connector.
Self-hosted Spice connector (#10546): Connect Spice to another self-hosted Spice runtime as a federated source.

Connector Bug Fixes

Notable correctness fixes across the Data Connectors: DynamoDB Streams retry on transient errors (#10794) and typed-NULL handling in DML (#10511); ScyllaDB physical filter pushdown disabled to fix incorrect results (#10772); MSSQL TOP N pushdown for non-nullable sort columns (#10621); DuckLake include filter applied (#10738); DuckDB DELETE/UPDATE on full and caching refresh modes (#10632); checked arithmetic for Turso integer-millis and timestamp-to-nanosecond conversions (#10786, #10666); and Flight GetFlightInfo/DoGet schema parity (#10864). See the Changelog for the full list.

Dependency Updates

Dependency / Component	Version
DuckDB	v1.5.2
Iceberg	v0.9.1
Turso	v0.6.0
Vortex	v0.69.0

Contributors

Breaking Changes

Flattened runtime.scheduler configuration (#10450): The nested runtime.scheduler.partition_management block has been flattened and renamed to use the canonical "partition assignment" terminology. Migrate as follows:

# Before
runtime:
  scheduler:
    partition_management:
      interval: 30s
      max_assignments_per_cycle: 16
      discovery_timeout: 10s

# After
runtime:
  scheduler:
    partition_assignment_interval: 30s
    max_assignments_per_interval: 16
    partition_discovery_timeout: 10s

Cookbook Updates

No new cookbook recipes.

The Spice Cookbook includes 86 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v2.0.0-rc.5, use one of the following methods:

CLI:

spice upgrade v2.0.0-rc.5

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:2.0.0-rc.5 image:

docker pull spiceai/spiceai:2.0.0-rc.5

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai --version 2.0.0-rc.5

AWS Marketplace:

Spice is available in the AWS Marketplace.

What's Changed

Changelog

Enable DML support for PostgreSQL data connector by @phillipleblanc in #10446
feat(postgres): support inline PEM sslrootcert by @claudespice in #10578
Add foreign key metadata discovery to PostgreSQL Catalog by @sgrebnov in #10849
Add Snowflake DML support by @lukekim in #10747
Add MongoDB Change Streams support by @lukekim in #10813
Add user-defined functions by @lukekim in #10571
Add table user functions and gate HTTP servers by @lukekim in #10675
feat: add on-demand dataset loading by @phillipleblanc in #10629
feat(runtime): declared-schema deferred datasets by @phillipleblanc in #10669
feat(spicepod, runtime): add columns[].type / nullable + lenient type parser by @phillipleblanc in #10661
Replace external smb crate with internal SMB 3.1.1 client by @phillipleblanc in #10516
Add unified query cancellation across all paths by @lukekim in #10390
Add dynamic HTTP request headers by @lukekim in #10604
feat(http): Support dynamic HTTP connector request params from subqueries by @lukekim in #10636
feat(http): pass through HTTP metadata columns with JSON schema decomposition by @lukekim in #10679
Add nolimit HTTP pagination max pages by @lukekim in #10673
Add shared HTTP rate control for connectors by @lukekim in #10648
Use origin label instead of name for HTTP rate control metrics by @lukekim in #10689
fix(http): reject OR across different HTTP filter columns by @lukekim in #10625
Add provider-aware LLM prompt caching by @lukekim in #10645
Add searchable registry mode for LLM tools by @lukekim in #10647
feat: refresh_mode: snapshot + SQLite/Turso WAL flush + Cayenne metastore slice by @phillipleblanc in #10651
feat: per-principal cache namespacing for SQL/search/caching-accelerator by @lukekim in #10702
Add self-hosted Spice connector support by @phillipleblanc in #10546
Add Delta Lake Azure tenant parameter by @phillipleblanc in #10671
Support OAuth2 client credentials in 'spice cloud login' by @ewgenius in #10586
Add configurable allowed_hosts for MCP by @lukekim in #10638
fix: make Helm chart probes configurable by @peasee in #10696
Strip high-cardinality datasets dim from anonymous telemetry by @lukekim in #10711
feat(elasticsearch): direct FTS engine config + index lifecycle and ingestion controls by @lukekim in #10672
Add DuckDB HNSW vector index support for accelerated views by @sgrebnov in #10695
Rewrite DuckDB vector search SQL to activate HNSW_INDEX_SCAN by @sgrebnov in #10674
Fix DuckDB HNSW vector indexes lost after data refresh by @sgrebnov in #10668
Fix DuckDB DELETE/UPDATE on full and caching refresh mode datasets by @phillipleblanc in #10632
Fix DuckLake connector: downcast, module registration, schema discovery, and S3 credentials by @sgrebnov in #10650
Fix federation pushing denied functions inside subqueries to remote engines by @phillipleblanc in #10692
fix(caching): honour refresh_on_startup: always in caching mode by @phillipleblanc in #10594
fix(iceberg): rebuild storage factory when Hadoop catalog scheme is inferred by @sgrebnov in #10601
Pipeline CDC ingestion: overlap source reads with batch apply by @lukekim in #10676
fix: add NULL check to CDC primary key extraction by @lukekim in #10684
Properly handle nullability during CDC processing by @krinart in #10803
Flatten scheduler config and rename partition management → partition assignment by @lukekim in #10450
Improve NSQL UX and harden internal LLM tools by @lukekim in #10715
Support Responses API across model providers by @lukekim in #10724
Update xAI default model and handle Grok model retirements by @Jeadie in #10723
Improve cli table layout by @krinart in #10725
TLS cert hot-reload (mTLS plan M1) by @phillipleblanc in #10727
Fix DuckLake catalog include filter being ignored by @phillipleblanc in #10738
Promote DuckLake Catalog and Data Connector to Beta quality by @sgrebnov in #10743
feat(ducklake): Support INSERT on catalog tables with read_write access by @sgrebnov in #10744
perf(cdc): coalesce envelopes and overlap commits in apply pipeline by @lukekim in #10745
feat: Allow full version tags in spicepod version by @peasee in #10748
Add Arrow primary key upserts by @lukekim in #10749
fix(snapshot): keep refresh_mode snapshot read-only by @phillipleblanc in #10752
feat(tls): public mTLS for HTTP and Flight (channel + identity modes) by @phillipleblanc in #10753
perf(cayenne): lock-free deletion caches with bloom-prefiltered probe by @lukekim in #10756
fix(security): close API key timing-position leak and remote-UDF SSRF by @lukekim in #10757
Fix 'wait_until_dependent_tables_are_ready' for catalogs by @phillipleblanc in #10758
Fixes for views and resolved tables on 'spice refresh' CLI by @phillipleblanc in #10759
Implement FlightSQL CommandStatementSubstraitPlan support by @lukekim in #10761
feat(connectors): mTLS client cert support for flightsql and spiceai connectors by @phillipleblanc in #10764
Allow arbitrary filenames when specifying spicepod path + kind validation by @krinart in #10777
fix: ignore field metadata in schema compatibility check in index_table_scan by @Jeadie in #10778
Display pushed-down limits in EXPLAIN TREE output by @lukekim in #10779
fix: enable streaming append for Kafka with Cayenne accelerator by @lukekim in #10780
fix: bound chunked-index intermediate batch size to prevent OOM by @phillipleblanc in #10783
fix: label all columns in spice cloud metrics table output by @claudespice in #10784
fix: use checked arithmetic for Turso integer-millis timestamp read path by @claudespice in #10786
fix: use checked arithmetic in timestamp-to-nanosecond conversions by @claudespice in #10666
Upgrade to DuckDB v1.5.2 by @sgrebnov in #10788
Improve CDC ingestion performance by @lukekim in #10789
Fix tool_search/tool_invoke spans by @lukekim in #10791
Add Cayenne inline mutations and benchmark coverage by @lukekim in #10792
Ensure we always resolve table names in distributed mode/metadata by @Jeadie in #10793
Remove permanent errors from DynamoDB Streams by @krinart in #10794
Add expanded view mode for wide table display in SQL REPL by @lukekim in #10797
Fix Cayenne CDC schema mismatch error by @sgrebnov in #10800
Executors should create catalog tables on join by @Jeadie in #10807
Add compressed file support for listing connectors by @lukekim in #10809
Improve Cayenne mutation, scan, and inline memtable scaling by @lukekim in #10811
Add range fallback for large join filters by @lukekim in #10816
Improve Cayenne join filter pushdown by @lukekim in #10818
Synchronize Cayenne partition commits across partitions by @phillipleblanc in #10819
fix: Deny nondistributed cayenne catalog by @peasee in #10821
Enable parallel Cayenne Vortex writes by @lukekim in #10822
Expand Arrow type handling in formatting and Elasticsearch by @lukekim in #10825
Add response.output_text.delta to responses API by @krinart in #10828
feat(cayenne): add join filter propagation and no-spill Q21 planning by @lukekim in #10840
Upgrade Turso to v0.6.0 by @sgrebnov in #10843
feat(cli): add spice feedback command to open community Slack by @lukekim in #10856
Upgrade iceberg to v0.9.1 by @sgrebnov in #10859
feat(cluster): per-request executor readiness gate on /v1/ready by @phillipleblanc in #10860
fix: Require dim-side statistics for CayennePropagateFilterAcrossEquiJoinKeys by @sgrebnov in #10863
fix: Debezium schema evolution breaks dataset init on reload by @claudespice in #10144
fix(mssql): Push topK limit to SQL Server for non-nullable sort columns by @Jeadie in #10621
fix(ScyllaDB): disable physical filter pushdown by @sgrebnov in #10772
fix: handle typed NULLs and prevent overflow in DynamoDB DML type conversions by @krinart in #10511
fix: use InsertOp::Overwrite in DynamoDB bootstrap scan_and_overwrite_accelerator by @krinart in #10639
Improve DynamoDB Bootstrap performance by @krinart in #10616
fix: preserve field and schema metadata in Vortex type transformation by @lukekim in #10628
fix: GH connector - explicitly use AWS LC RS crypto provider for jwt by @phillipleblanc in #10619
fix: add snapshot mode guards to delete_from/update and delegate DML in SwappableTableProvider by @phillipleblanc in #10685
Persist HTTP rate-control state in object storage by @lukekim in #10697
Rate limit metrics HTTP endpoint by @lukekim in #10162
feat(geo): add optional spatial SQL UDF support by @lukekim in #10833
feat(cayenne): CDC throughput, compaction, scan caching, and benchmarks by @lukekim in #10852
fix(cayenne): fix Vortex panic on highly compressible data by @sgrebnov in #10855
fix(cayenne): Read live protected snapshots after cleanup grace period by @sgrebnov in #10901
fix: Disable Cayenne HashJoin rewriter optimizer by @sgrebnov in #10882
Fix GetFlightInfo vs DoGet Flight Schema by @krinart in #10864
fix(search): preserve column casing in /v1/search primary key plumbing by @claudespice in #10909
fix(object-store): dedupe s3 url style auto-detection log by @phillipleblanc in #10898
Improve Spice CLI manifest editing and direct command modes by @lukekim in #10815
Persist Kafka CDC offsets in sidecar tables by @lukekim in #10823
feat(task-history): record Ballista stages for distributed queries by @phillipleblanc in #10831
Add '#[deny(clippy::missing_trait_methods)]' to wrapper/delegation trait impls by @Jeadie in #10795
Optimize Cayenne catalog maintenance paths by @lukekim in #10904
Centralize DuckDB settings for accelerator by @ewgenius in #10895
deps(ballista): bump to 47e2b494 to fix S3 shuffle reads under cluster mode by @phillipleblanc in #10910
Authorization header + Bump async-openai + responses_adapter fix by @krinart in #10911
Tune accelerators by storage profile by @lukekim in #10913
feat: add dataset-level on_schema_change config by @lukekim in #10908
Handle NULL sentinel for nullable partition expressions by @Jeadie in #10880
fix: Remove Cayenne Catalog from catalog registration by @peasee in #10914
Add catalog name to foreign key metadata in postgres catalog by @Jeadie in #10917
Cayenne perf: eliminate redundant clones, PK point-lookup fanout fix, IN-list rewrite + microbench coverage by @lukekim in #10916
fix(turso-shared): retry on Turso BEGIN CONCURRENT "Write-write conflict" by @lukekim in #10946
Vendor Vortex DataFusion for Cayenne by @lukekim in #10933
perf(cayenne): background retention + enable CDC pipelining for retention-configured tables by @lukekim in #10936
feat(cayenne): scale metastore pool to 32 + vs_duckdb_scaling benches (1→128 concurrency, sqlite + turso lanes) by @lukekim in #10943
feat(mcp): support auth for streamable HTTP tools by @phillipleblanc in #10927
Explicit error if v1/search requests a table without search index by @Jeadie in #10968
Fix spicepod loading failure when directory name contains dots by @sgrebnov in #10958
Extend append tests with arrow engine configurations by @sgrebnov in #10959
Remove dataset on_schema_change Policy from rc.5 release notes by @sgrebnov in #10964
Skip tpcds_q78 for Cayenne engine at SF100 by @sgrebnov in #10966
fix: Update benchmark snapshots May-20 by @app/github-actions in #10952
Fix #10951: UdtfExec invariant Vec lengths must match children count by @phillipleblanc in #10953
docs(release): update v2.0.0-rc.5 notes with latest trunk PRs by @lukekim in #10949
Remove eval related things for v2.0.0 by @Jeadie in #10945
build(deps): bump ubuntu from 24.04 to 26.04 in the docker-dependencies group by @app/dependabot in #10883
fix: Add publish = false to chbench-driver by @sgrebnov in #10939

Full Changelog: https://github.com/spiceai/spiceai/compare/v2.0.0-rc.4...v2.0.0-rc.5

Spice v2.0-rc.4 (Apr 30, 2026)

May 1, 2026 · 22 min read

William Croxson

Senior Software Engineer at Spice AI

Announcing the release of Spice v2.0-rc.4! 🚀

v2.0.0-rc.4 is the fourth release candidate for advanced testing of v2.0, building on v2.0.0-rc.3.

Highlights in this release candidate include:

Elasticsearch Data Connector (Alpha) with native hybrid search (BM25 full-text + kNN vector + RRF)
PostgreSQL Native CDC via WAL logical replication, eliminating the need for Debezium or Kafka
Multi-vector Embeddings with MaxSim for ColBERT-style late-interaction retrieval
Rerank UDTF for hybrid search pipelines with automatic query propagation
HashiCorp Vault and Azure Key Vault Secret Stores for enterprise secret management
DuckDB Vector Engine with HNSW index support
Azure Cosmos DB Connector (RC), Git Connector promoted to RC
MCP Streamable HTTP transport
Read-only API Key Enforcement on Flight DoGet and async query paths

What's New in v2.0.0-rc.4

Elasticsearch Data Connector (Alpha, Spice.ai Enterprise)

The new Elasticsearch data connector enables querying Elasticsearch indexes as SQL tables with full hybrid search support. Currently available in Spice.ai Enterprise.

Key capabilities:

SQL Table Access: Query any Elasticsearch index with standard SQL via a native DataFusion TableProvider.
kNN Vector Search: Use the vector_search() UDTF against Elasticsearch-backed vector fields.
BM25 Full-Text Search: Use the text_search() UDTF for native Elasticsearch full-text queries.
Hybrid Search: Combine kNN and BM25 results with the rrf() UDTF for reciprocal rank fusion.
Elasticsearch as a Vector Engine: Accelerated datasets can use Elasticsearch as the backing vector engine for embedding storage and retrieval.

Example configuration:

datasets:
  - from: elasticsearch:my_index
    name: my_data
    params:
      elasticsearch_endpoint: https://my-cluster.es.io:9200
      elasticsearch_username: ${secrets:es_user}
      elasticsearch_password: ${secrets:es_password}

PostgreSQL Native Replication via WAL

Postgres datasets configured with refresh_mode: changes can now stream changes directly from PostgreSQL logical replication (WAL) into any local accelerator without Debezium or Kafka required.

Key capabilities:

Native Logical Replication: Uses pgoutput decoding to stream INSERT/UPDATE/DELETE events.
Automatic Slot Management: Each Spice replica creates a distinct replication slot (spice_<dataset>_<hash>), so multi-replica deployments work automatically. Publications are shared.
Bootstrap Snapshot: An initial REPEATABLE READ snapshot seeds the accelerator before replication begins.
LSN Acknowledgement: The LsnCommitter sends durable LSN back to Postgres so WAL segments are reclaimed.
All Accelerators Supported: Works with DuckDB, SQLite, Postgres, Cayenne, and Arrow accelerators.

Example configuration:

datasets:
  - from: postgres:my_table
    name: my_table
    params:
      pg_host: localhost
      pg_port: 5432
      pg_db: mydb
      pg_publication: my_publication   # optional; auto-created if omitted
    acceleration:
      enabled: true
      engine: duckdb
      refresh_mode: changes

Multi-vector Embeddings with MaxSim (Late Interaction)

Column-level embeddings now support list-of-string columns, producing one embedding vector per list element and enabling ColBERT-style late-interaction retrieval.

Key capabilities:

Multi-vector per Row: Columns of type List<String> produce List<FixedSizeList<F32, D>> — one embedding per list element.
MaxSim / Mean / Sum Scoring: Per-row score is the max, mean, or sum cosine over the list elements. Default is MaxSim (ColBERT).
_match Column: Returns the specific list element that produced the highest cosine similarity.
No Schema Changes Required: Works with existing embedding configurations; activates automatically for list-type columns.

Rerank UDTF for Hybrid Search

A new rerank() table-valued function reorders scored results from vector_search, text_search, or rrf by a reranker model's relevance judgements. See Search Functionality for an overview of search UDTFs.

Key capabilities:

Auto Query Propagation: The query string is automatically inherited from a nested search UDTF — no repetition required.
Any Chat Model as Reranker: Any registered chat/completion model can serve as a reranker via the built-in LlmRerank adapter (listwise prompt by default; pointwise available).
Filter and Projection Pushdown: The RerankExec physical node supports pushdown, reducing data movement.
Extensible: A new RerankerModelStore sits alongside ChatModelStore and EmbeddingModelStore; native providers (Cohere, Voyage, BGE) can be added without runtime plumbing changes.

SELECT * FROM rerank(
    rrf(vector_search('my_table', 'query text'), text_search('my_table', 'query text')),
    document => content
) LIMIT 10;

New Secret Stores: HashiCorp Vault and Azure Key Vault

Two new enterprise-grade Secret Stores are now available.

HashiCorp Vault (hashicorp_vault):

KV v2 (default) and KV v1 mount support.
Auth methods: token, approle, kubernetes, jwt.
Token leases are cached and automatically re-acquired on expiry.

secrets:
  - from: hashicorp_vault:secret/my-app
    name: my_secrets
    params:
      hashicorp_vault_addr: https://vault.example.com
      hashicorp_vault_auth_method: approle
      hashicorp_vault_role_id: ${env:VAULT_ROLE_ID}
      hashicorp_vault_secret_id: ${secrets:vault_secret_id}

Azure Key Vault (azure_keyvault):

Per-key caching with single-flight fetch coalescing.
Auth methods: service principal, managed identity, workload identity, Azure CLI, or auto-detect.
Supports sovereign clouds via endpoint parameter.

secrets:
  - from: azure_keyvault:my-vault
    name: my_secrets
    params:
      azure_keyvault_auth_method: managed_identity

DuckDB Vector Engine

DuckDB-accelerated tables can now use DuckDB's HNSW index for vector search via the vector_engine: duckdb option, enabling fast approximate nearest-neighbor search without an external vector store.

Example configuration:

datasets:
  - from: postgres:public.documents
    name: documents
    columns:
      - name: content
        embeddings:
          - from: hf_minilm
            row_id: id
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
    vectors:
      enabled: true
      engine: duckdb
      params:
        duckdb_distance_metric: cosine
        duckdb_hnsw_m: 16
        duckdb_hnsw_ef_construction: 64
        duckdb_hnsw_ef_search: 32

embeddings:
  - from: huggingface:huggingface.co/minishlab/potion-base-2M
    name: hf_minilm

New and Promoted Connectors

Azure Cosmos DB (Alpha):

A new read-only Azure Cosmos DB NoSQL / Core SQL API connector built on the azure_data_cosmos 0.30 SDK. Supports cross-partition scans, schema inference from document samples, and key-based auth (connection string or account endpoint + key).

Git Connector (RC):

The Git data connector is promoted to RC status with HTTPS/SSH auth (git_token, git_username/git_password, git_ssh_key), Git LFS support (enable_lfs), and per-repo connection resilience (semaphore, bounded retries with exponential backoff, permanent-error circuit breaking).

DynamoDB Write Support (DML)

DynamoDB datasets now support write-back via INSERT, UPDATE, and DELETE operations, complementing the existing read and CDC streaming capabilities.

MCP Streamable HTTP Transport

The MCP server has been upgraded to rmcp 1.5.0 and switched to the Streamable HTTP transport (/v1/mcp), replacing the previous SSE-based endpoint. The client-side transport is updated to StreamableHttpClientTransport.

Security Improvements

Read-only API Key Enforcement: API keys with read-only scope are now strictly enforced on the Flight DoGet path and on async query endpoints, preventing write operations from being issued under a read-only key.

GitHub Workflow Hardening: CI workflows have been hardened with improved security posture to reduce supply-chain risk.

Developer Experience Improvements

Actionable Config Errors: Parameter typos, missing secret references, and unknown engine names now produce specific, actionable error messages with Levenshtein-based suggestions, rather than silent drops or generic "missing required parameter" messages.
spice init Improvements: Written spicepods now include a yaml-language-server: $schema=... directive for IDE completions. Creation messages print regardless of log level.
REPL Improvements: Log filter honors RUST_LOG when -v is not passed; version banner moves to stderr and prints only on an interactive TTY.
403 / 401 Routing: HTTP 403 responses route to a new PermissionDenied variant; 401 messages point at spice login / SPICE_API_KEY.

OpenTelemetry Improvements

See Observability & Monitoring and the runtime.telemetry reference for full configuration details.

Metric Name Prefix: Configure a prefix for all exported OTLP metric names via runtime.telemetry.metric_prefix.
Delta Temporality Default: The OTLP push exporter now defaults to delta temporality, matching Prometheus and most backends.
Resource Attributes: runtime.telemetry.properties are applied as OTLP resource attributes on exported metrics.

Full-text Search Performance

Tantivy full-text search ingestion performance is significantly improved with better batch handling and a rollback-on-error path.

SQL and Query Engine

DataFusion Upgrade: Updated to a newer DataFusion revision with additional bug fixes and performance improvements.
Views on DDL Catalogs: DDL-defined catalogs (e.g., Unity Catalog) can now expose and query views.
flatten_json / json_tree / expand_maps UDTFs: New table-valued functions for JSON transformation, map expansion, and schema decomposition in query pipelines. See JSON Functions and Operators.
cosine_distance Pushdown to DuckDB: cosine_distance is now pushed down to DuckDB accelerators via array_cosine_distance.
Snowflake Type Support: Added support for OBJECT, MAP, GEOGRAPHY, GEOMETRY, VECTOR, and TIMESTAMP_LTZ types in the Snowflake connector.
MySQL Zero-Date Behavior: The MySQL connector adds a new mysql_zero_date_behavior parameter (null or error) controlling how MySQL zero-date values (0000-00-00) are handled.
Databricks Timeouts: The Databricks connector adds new connect_timeout and client_timeout parameters for sql_warehouse mode.

Dependency Updates

Dependency / Component	Version / Update
DataFusion	Updated
rmcp	v1.5.0 (from fork pin)
mistral.rs	Updated
openssl	0.10.78

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

No new cookbook recipes.

The Spice Cookbook includes 86 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v2.0.0-rc.4, use one of the following methods:

CLI:

spice upgrade v2.0.0-rc.4

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:2.0.0-rc.4 image:

docker pull spiceai/spiceai:2.0.0-rc.4

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai --version 2.0.0-rc.4

AWS Marketplace:

Spice is available in the AWS Marketplace.

What's Changed

Changelog

Integrate spiceio and makefile_targets into pr.yml by @lukekim in #10357
ci: skip artifact compression for test binaries/archives by @lukekim in #10381
chore(deps): bump spiceai/candle, spiceai/mistral.rs, aws-lc-rs, tantivy, rand by @lukekim in #10379
Bump datafusion-table-providers (#10375) by @lukekim in #10384
fix: Update Search integration test snapshots by @app/github-actions in #10376
v2.0.0-rc.3 preparation by @ewgenius in #10382
fix(spicepod): JSON schema accepts string or {name: expr} for partition_by by @lukekim in #10352
fix: Use ROUND for Turso decimal BETWEEN comparisons (fixes #9872) by @claudespice in #10360
Revert "v2.0.0-rc.3 preparation" from trunk by @ewgenius in #10386
Add on_schema_resolved dataset ready state by @lukekim in #10368
feat: Add Elasticsearch data connector with hybrid search support by @lukekim in #10258
ci: bump test archive upload compression-level to 1 by @lukekim in #10388
feat(git-connector): promote Git connector to RC status by @lukekim in #10385
feat(postgres): stream WAL directly to Spice accelerators by @lukekim in #10364
Add schema decomposition to the HTTP connector by @lukekim in #10393
fix(cayenne): Skip catalog refresh state reload for existing providers by @sgrebnov in #10396
Make cayenne-flightsql tool by @Jeadie in #10356
build(deps): bump the github-actions-dependencies group with 2 updates by @app/dependabot in #10398
Update openapi.json by @app/github-actions in #10272
Merge develop to trunk — 2026-04-19 by @claudespice in #10407
feat(otel): default OTLP push exporter to delta temporality by @phillipleblanc in #10412
fix: Restore analyzer rule ordering to run federation before type coercion by @sgrebnov in #10415
fix: Map Utf8/LargeUtf8 to STRING in Databricks/Spark SQL dialects by @sgrebnov in #10420
feat(otel): add metric name prefix at runtime.telemetry.metric_prefix by @phillipleblanc in #10418
fix: Map LargeUtf8 to VARCHAR in Athena ODBC dialect by @sgrebnov in #10419
feat(cluster): connector-driven object store registration on executors by @phillipleblanc in #10414
build(deps): bump ubuntu from 22.04 to 24.04 in the docker-dependencies group by @app/dependabot in #10397
fix: Update benchmark snapshots Apr 20 by @app/github-actions in #10417
feat(otel): apply runtime.telemetry.properties as resource attributes on exported metrics by @phillipleblanc in #10416
Publish RC releases to DockerHub; upgrade runners to ubuntu-24.04 by @lukekim in #10428
feat: Add Azure Cosmos DB (NoSQL) data connector (RC) by @lukekim in #10392
feat(datafusion): flatten_json_properties + json_tree UDTFs by @lukekim in #10406
Harden /v1/tools and /v1/nsql against unauthenticated / LLM-driven SQL by @lukekim in #10365
feat(embeddings): multi-vector embeddings with MaxSim + late-interaction by @lukekim in #10408
Update GH runners for CUDA builds by @ewgenius in #10432
fix(delta_lake): register object stores on cluster executors by @phillipleblanc in #10436
DF-native DML by @krinart in #10327
ci: run Build and Test on spiceai-macos; split install jobs by profile by @lukekim in #10434
Improve search UDTFs: text_search, vector_search, rrf by @lukekim in #10387
fix(model2vec): Improve robustness of model loading for sentence-transformers layouts by @sgrebnov in #10444
Merge develop to trunk — 2026-04-21 by @claudespice in #10448
Enable filter pushdown for vector_search UDTF by @sgrebnov in #10447
Support Snowflake OBJECT, MAP, GEOGRAPHY, GEOMETRY, VECTOR, TIMESTAMP_LTZ types by @lukekim in #10451
Fix Databricks tests by @krinart in #10449
fix(cluster): forward register_object_stores through connector wrappers by @phillipleblanc in #10460
Fixes for vector-search by @krinart in #10455
Add expand_maps option and flatten_json UDTF by @lukekim in #10452
fix: Update Search integration test snapshots by @app/github-actions in #10458
Fix physical codec decode ambiguity for empty protobuf messages by @sgrebnov in #10466
chore(logging): demote s3_single_file_cached skip refresh log to debug by @phillipleblanc in #10467
Enable filter pushdown for rrf UDTF by @sgrebnov in #10465
feat(cluster): consolidate distributed state into cluster.json by @phillipleblanc in #10463
feat(cayenne): Add column statistics and data inlining by @lukekim in #10314
docs(copilot): flag missing wrapper delegation when adding default trait methods by @phillipleblanc in #10461
Wire Elasticsearch vector engine write path through acceleration by @lukekim in #10453
Add helm lint CI by @ewgenius in #10468
Fix Azure and GCS acceleration snapshot object store credential handling by @phillipleblanc in #10486
Update spicepod.schema.json by @app/github-actions in #10485
fix(secrets): harden AWS Secrets Manager secret store by @lukekim in #10478
Update datafusion-ballista crate by @sgrebnov in #10488
feat(secrets): add ParameterSpec and more params for AWS secrets manager by @phillipleblanc in #10487
Add rerank UDTF for hybrid search with query auto-propagation by @lukekim in #10469
Fix flatten_json_properties by @krinart in #10475
fix: preserve field and schema metadata in expand_views_schema by @claudespice in #10494
Upgrade rmcp to upstream 1.5.0; switch MCP server to Streamable HTTP by @lukekim in #10491
fix: handle Snowflake TIMESTAMP_LTZ wire format and prevent nanosecond overflow by @claudespice in #10493
Lint parity in Makefile by @krinart in #10492
Add connect_timeout/client_timeout params to Databricks sql_warehouse mode by @lukekim in #10495
fix(tracing): suppress opentelemetry INFO logs at all verbosity levels by @lukekim in #10497
DynamoDB DML by @krinart in #10470
feat(cayenne): native vector search via SIMD similarity UDFs by @lukekim in #10456
fix(cli): suppress banner for all JSON-producing cloud subcommands (fixes #10498) by @claudespice in #10510
fix(deps): bump openssl to 0.10.78 by @phillipleblanc in #10509
fix(s3): quiet AWS SDK credential probe when no region is configured by @phillipleblanc in #10506
fix(cdc): emit ready signal on caught-up Kafka/Debezium streams (#5201) by @phillipleblanc in #10504
runtime-cluster crate + Run partition discovery before forwarding refresh to executors by @krinart in #10490
Update lint-rust target to use --keep-going by @Jeadie in #10508
Add TPC-H SF100 s3[parquet]-duckdb[file] benchmark spicepod by @lukekim in #10524
Remove dev-profile install steps from pr.yml by @Jeadie in #10507
fix: add missing NULL check on Timestamp path in append refresh by @claudespice in #10518
fix: return error on Decimal128/256 overflow instead of silently dropping scale by @claudespice in #10519
fix: delegate update and delete_from in IndexedTableProvider and EmbeddingTable by @claudespice in #10520
feat(devx): make config errors, CLI, and REPL lead users to success by @lukekim in #10489
fix(rerank): defer execution to RerankExec, enable filters and projection pushdown by @sgrebnov in #10514
fix(llms): support Gemma models with missing attention_bias config field by @lukekim in #10523
Fix vector_search silently ignoring named limit/column/include_score args by @sgrebnov in #10527
fix: split unsupported filters locally in scan() for UseSource mode by @ewgenius in #10528
feat(secrets): add Azure Key Vault secret store by @lukekim in #10496
Bump mistralrs by @krinart in #10532
Fix benchmark configurations and CI build issues by @sgrebnov in #10535
Fix catalog query overrides for MySQL and MSSQL benchmarks by @sgrebnov in #10543
For Cayenne, preserve matched columns for MERGE ... ON <cols> by @Jeadie in #10340
build(deps): bump the aws-sdk group across 1 directory with 5 updates by @app/dependabot in #10538
docs: update AI agent instructions (git workflow + Rust 1.94) by @lukekim in #10544
fix: Update tpch benchmark snapshots by @app/github-actions in #10529
fix: Update tpch benchmark snapshots for accelerated/s3[parquet]-duckdb[file].yaml by @app/github-actions in #10525
Extract runtime-datafusion from runtime by @krinart in #10545
Use generic DML extension planner for Cayenne by @Jeadie in #10437
fix: Update Search integration test snapshots by @app/github-actions in #10552
Fix security and correctness audit issues by @lukekim in #10526
fix(MySQL): revert MySQL result column reorder to fix federated query failures by @sgrebnov in #10557
Fix protoc installation by @krinart in #10566
fix: Disable Ballista dynamic filters on HashJoinExec by @peasee in #10548
Support views on DDL catalogs by @Jeadie in #10554
Update datafusion by @Jeadie in #10422
Improve full-text search indexing performance by @sgrebnov in #10464
feat(mysql): add mysql_zero_date_behavior parameter (null|error) by @phillipleblanc in #10573
fix(snowflake): declare private_key in connector PARAMETERS (fixes #10517) by @claudespice in #10559
Honour CARGO_TARGET_DIR in Makefiles by @Jeadie in #10569
Enable cosine_distance pushdown to DuckDB accelerator via array_cosine_distance by @sgrebnov in #10564
fix: Update test snapshots by @app/github-actions in #10570
fix: Update tpch benchmark snapshots by @app/github-actions in #10560
feat(snapshots): make snapshots an optional feature by @phillipleblanc in #10574
Enforce read-only API key restrictions on Flight DoGet and async query paths by @Jeadie in #10551
Improved security posture on Github workflows by @Jeadie in #10556
fix: Update datafusion-table-providers to improve SqlTable filter pushdown by @sgrebnov in #10595
feat(secrets): add HashiCorp Vault secret store by @phillipleblanc in #10561
fix: delegate update() in UpsertDedupTableProvider to inner provider by @claudespice in #10593
Add DuckDB vector engine support by @lukekim in #10562
Sharepoint - add object-store listing connector with expanded auth and write support by @lukekim in #10473
fix: Install protoc from source by @peasee in #10597

Full Changelog: https://github.com/spiceai/spiceai/compare/v2.0.0-rc.3...v2.0.0-rc.4

Spice v1.5.2 (Aug 11, 2025)

August 12, 2025 · 7 min read

Kevin Zimmerman

Principal Software Engineer at Spice AI

Announcing the release of Spice v1.5.2! 🛠️

Spice v1.5.2 introduces a new Amazon Bedrock Models Provider for converse API (Nova) compatible models, AWS Redshift support using the Postgres data connector, and Hadoop Catalog Support for Iceberg tables along with several bug fixes and improvements.

What's New in v1.5.2

Amazon Bedrock Models Provider: Adds a new Amazon Bedrock LLM Provider. Models compatible with the Converse API (Nova) are supported.

Amazon Bedrock provides access to a range of foundation models for generative AI. Spice supports using Bedrock-hosted models by specifying the bedrock prefix in the from field and configuring the required parameters.

Supported Model IDs:

amazon.nova-lite-v1:0
amazon.nova-micro-v1:0
amazon.nova-premier-v1:0
amazon.nova-pro-v1:0

Refer to the Amazon Bedrock documentation for details on available models and cross-region inference profiles.

Example Spicepod.yaml:

models:
  - from: bedrock:us.amazon.nova-lite-v1:0
    name: novash
    params:
      aws_region: us-east-1
      aws_access_key_id: ${ secrets:AWS_ACCESS_KEY_ID }
      aws_secret_access_key: ${ secrets:AWS_SECRET_ACCESS_KEY }
      bedrock_guardrail_identifier: arn:aws:bedrock:abcdefg012927:0123456789876:guardrail/hello
      bedrock_guardrail_version: DRAFT
      bedrock_trace: enabled
      bedrock_temperature: 42

For more information, see the Amazon Bedrock Documentation.

AWS Redshift Support for Postgres Data Connector: Spice now supports connecting to Amazon Redshift using the PostgreSQL data connector. Redshift is a columnar OLAP database compatible with PostgreSQL, allowing you to use the same connector and configuration parameters.

To connect to Redshift, use the format postgres:schema.table in your Spicepod and set the connection parameters to match your Redshift cluster settings.

Example Spicepod.yaml:

# Example datasets for Redshift TPCH tables
datasets:
  - from: postgres:public.customer
    name: customer
    params:
      pg_host: ${secrets:PG_HOST}
      pg_port: 5439
      pg_sslmode: prefer
      pg_db: dev
      pg_user: ${secrets:PG_USER}
      pg_pass: ${secrets:PG_PASS}
  - from: postgres:public.lineitem
    name: lineitem
    params:
      pg_host: ${secrets:PG_HOST}
      pg_port: 5439
      pg_sslmode: prefer
      pg_db: dev
      pg_user: ${secrets:PG_USER}
      pg_pass: ${secrets:PG_PASS}

Redshift types are mapped to PostgreSQL types. See the PostgreSQL connector documentation for details on supported types and configuration.

Hadoop Catalog Support for Iceberg: The Iceberg Data and Catalog connectors now support connecting to Hadoop catalogs on filesystem (file://) or S3 object storage (s3://, s3a://). This enables connecting to Iceberg catalogs without a separate catalog provider service.

Example Spicepod.yaml:

catalogs:
  - from: iceberg:file:///tmp/hadoop_warehouse/
    name: local_hadoop
  - from: iceberg:s3://my-bucket/hadoop_warehouse/
    name: s3_hadoop

  # Example datasets
  - from: iceberg:file:///data/hadoop_warehouse/test/my_table_1
    name: local_hadoop
  - from: iceberg:s3://my-bucket/hadoop_warehouse/test/my_table_2
    name: s3_hadoop

For more details, see the Iceberg Data Connector documentation and the Iceberg Catalog Connector documentation.

Parquet Reader: Optional Parquet Page Index: Fixed an issue where the Parquet reader, using arrow-rs and DataFusion, errored on files missing page indexes, despite the Parquet spec allowing optional indexes. The Spice team contributed optional page index support to arrow-rs (PR #6) and configurable handling in DataFusion (PR #93). A new runtime parameter, parquet_page_index, makes Parquet Page Indexes configurable in Spice:

runtime:
  params:
    parquet_page_index: required # Options: required, skip, auto

required: (Default) Errors if page indexes are absent.
skip: Ignores page indexes, potentially reducing query performance.
auto: Uses page indexes if available; skips otherwise.

This improves compatibility and query flexibility for Parquet datasets.

Contributors

Breaking Changes

Amazon S3 Vectors Vector Engine: Amazon S3 Vectors is currently a preview AWS service. A recent update to the Amazon S3 Vectors service API introduced a breaking change that affects the integration when projecting (selecting) the embedding column. This results in the following error:

Json error: whilst decoding field 'data': expected [ got nullReceived only partial JSON payload from QueryVectors

The issue is expected to be resolved in the next release of Spice. A current workaround is to limit queries to non-embedding columns.

i.e. instead of:

SELECT url, title, scored, body_embedding
FROM vector_search(pulls, 'bugs in DuckDB', 4)
WHERE state = 'OPEN'
ORDER BY score DESC
LIMIT 4;

Remove the *_embedding column from the projection. E.g.

SELECT url, title, scored
FROM vector_search(pulls, 'bugs in DuckDB', 4)
WHERE state = 'OPEN'
ORDER BY score DESC
LIMIT 4;

This issue and workaround also applies to SELECT * FROM vector_search(..). E.g.

SELECT *
FROM vector_search(pulls, 'bugs in DuckDB', 4)
WHERE state = 'OPEN'
ORDER BY score DESC
LIMIT 4;

Cookbook Updates

Added Amazon Redshift Support to the Postgres Data Connector cookbook: Connect to tables in Amazon Redshift.

The Spice Cookbook includes 75 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.5.2, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.5.2 image:

docker pull spiceai/spiceai:1.5.2

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is also now available in the AWS Marketplace!

What's Changed

Dependencies

No major dependency updates.

Changelog

fixes for databricks OpenAI compatibility (#6629) by @Jeadie in #6629
Update spicepod.schema.json (#6632) by @app/github-actions in #6632
Remove 'stream_options' from databricks LLMs (#6637) by @Jeadie in #6637
Move retry and rate limiting logic for Amazon bedrock out of embeddings. (#6626) by @Jeadie in #6626
Disable Metal precomplation in integration_llms.yml (#6649) by @Jeadie in #6649
fix: Hadoop integration test (#6660) by @peasee in #6660
feat: Add Hadoop Catalog Data Component (#6658) by @peasee in #6658
update datafusion-table-providers to latest spiceai tag (#6661) by @mach-kernel in #6661
feat: Add Hadoop Catalog connectors for Iceberg (#6659) by @peasee in #6659
Make FullTextSearchExec robust to RecordBatch column ordering. (#6675) by @Jeadie in #6675
Make 'runtime-object-store' crate (#6674) by @Jeadie in #6674
fix: Support include for Iceberg (#6663) by @peasee in #6663
feat: Add Hadoop TPCH benchmark (#6678) by @peasee in #6678
feat: Add Hadoop metadata_path parameter (#6680) by @peasee in #6680
fix: Automatically infer Hadoop warehouse scheme (#6681) by @peasee in #6681
Amazon Bedrock, specifically Nova models (#6673) by @Jeadie in [#6673](https://github.com/spiceai/spiceai/pull/6673
fix perplexity_auth_token parameters for web_search (#6685) by @Jeadie in #6685
Fix AWS Auth issue (#6699) by @Advayp in #6699
Limit Concurrent Requests for GitHub (#6672) by @Advayp in #6672
Add runtime parameter to enable more permissive parquet reading when page indexes are missing (#6716) by @phillipleblanc in #6716
Improve Flight REPL error messages (#6696) by @lukekim in #6696
Fixes from search tests (#6710) by @Jeadie in #6710

Spice v1.0.3 (Feb 10, 2025)

February 10, 2025 · 3 min read

Phillip LeBlanc

Co-Founder and CTO of Spice AI

Announcing the release of Spice v1.0.3 🛠️

Spice v1.0.3 provides several bug fixes, including a fix for the initial data load period when a retention policy has been set, and a new unsupported_type_action: string parameter to auto-convert unsupported types to strings.

Highlights in v1.0.3

PostgreSQL Data Connector: New unsupported_type_action: string parameter that auto-converts unsupported types such as JSONB to strings.

Contributors

@phillipleblanc
@Sevenannn
@sgrebnov
@peasee
@Jeadie
@lukekim

Breaking Changes

No breaking changes.

Cookbook Updates

Updated Kubernetes Deployment Recipe
Updated Data Retention Recipe

Upgrading

To upgrade to v1.0.3, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.0.3 image:

docker pull spiceai/spiceai:1.0.3

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

No major dependency changes.

Changelog

- For local models, use 'content=""' instead of None by @Jeadie and @phillipleblanc in https://github.com/spiceai/spiceai/pull/4646
- Perplexity Sonar LLM component by @Jeadie and @lukekim in https://github.com/spiceai/spiceai/pull/4673
- Update async openai fork & support reasoning effort parameter by @Sevenannn and @phillipleblanc in https://github.com/spiceai/spiceai/pull/4679
- Web search tool by @Jeadie and @lukekim in https://github.com/spiceai/spiceai/pull/4687
- Setup tpc-extension by @ewgenius and @phillipleblanc in https://github.com/spiceai/spiceai/pull/4690
- fix: Use PostgreSQL interval style for Spice.ai by @peasee and @phillipleblanc in https://github.com/spiceai/spiceai/pull/4716
- Fix spice upgrade command by @Sevenannn and @sgrebnov in https://github.com/spiceai/spiceai/pull/4699
- Fix bug: Ensure refresh only retrieves data within the retention period by @sgrebnov and @phillipleblanc in https://github.com/spiceai/spiceai/pull/4717
- Implement unsupported_type_action: string for Postgres JSONB support by @phillipleblanc in https://github.com/spiceai/spiceai/pull/4719
- Fix the get latest release logic by @Sevenannn and @phillipleblanc in https://github.com/spiceai/spiceai/pull/4721
- add 'accelerated_refresh' to 'spice trace' allowlist by @Jeadie and @phillipleblanc in https://github.com/spiceai/spiceai/pull/4711
- Update version to 1.0.3 by @phillipleblanc in https://github.com/spiceai/spiceai/pull/4731
- Truncate embedding columns within sampling tool by @Jeadie in https://github.com/spiceai/spiceai/pull/4722
- Validate primary key columns during accelerated dataset initialization by @sgrebnov in https://github.com/spiceai/spiceai/pull/4736

**Full Changelog**: https://github.com/spiceai/spiceai/compare/v1.0.2...v1.0.3

Resources

Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Slack or by email to get involved.

Twitter: @spice_ai
Slack: spiceai.org/slack
Telegram: Spice AI Discussion
Reddit: https://www.reddit.com/r/spiceai
Email: hey@spice.ai

Spice v1.0-stable (Jan 20, 2025)

January 20, 2025 · 11 min read

William Croxson

Senior Software Engineer at Spice AI

🎉 After 47 releases, Spice.ai OSS has reached production readiness with the 1.0-stable milestone!

The core runtime and features such as query federation, query acceleration, catalog integration, search and AI-inference have all graduated to stable status along with key component graduations across data connectors, data accelerators, catalog connectors, and AI model providers.

Highlights in v1.0-stable

Stable Data Connectors: The following data connectors have graduated to Stable:
- Delta Lake
- MySQL
- Dremio
- PostgreSQL
- Databricks (mode: delta_lake)
- DuckDB
- S3
Stable Data Accelerators: The following data accelerators have graduated to Stable:
- DuckDB
- Arrow
Unity Catalog Connector: Graduated to Stable.
Databricks (mode: spark_connect) Data Connector: Graduated to Beta.
Beta Catalog Connectors: The Iceberg and Databricks catalog connectors graduated to Beta.
OpenAI Model & Embeddings Provider: Graduated to Release Candidate (RC).
Alpha Model Providers: The Anthropic and xAI (Grok) model providers graduated to Alpha.

Breaking Changes

Default Runtime Version: The CLI will install the GPU accelerated AI-capable Runtime by default (if supported), when running spice install or spice run. To force-install the non-GPU version, run spice install ai --cpu.
Default OpenAI Model: The default OpenAI model has updated to gpt-4o-mini.
Identifier Normalization: Unquoted identifiers such as table names are no longer normalized to lowercase. Identifiers will now retain their exact case as provided.
Sandboxed Docker Image: The Runtime Docker Image now runs the spiced process as the nobody user in a minimal chroot sandbox.
Insecure S3 and ABFS endpoints: The S3 and ABFS connectors now enforce insecure endpoint checks, preventing HTTP endpoints unless allow_http is explicitly enabled. Refer to the documentation for details.

Dependencies

No major dependency changes.

Upgrading

To upgrade to v1.0.0, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.0.0 image:

docker pull spiceai/spiceai:1.0.0

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

Contributors

@peasee
@ewgenius
@Jeadie
@Sevenannn
@lukekim
@phillipleblanc
@sgrebnov

What's Changed

- feat: Update load test criteria, testoperator updates by @peasee in <https://github.com/spiceai/spiceai/pull/4311>
- Update helm for v1.0.0-rc.5 by @ewgenius in <https://github.com/spiceai/spiceai/pull/4313>
- Update spicepod.schema.json by @github-actions in <https://github.com/spiceai/spiceai/pull/4318>
- Bump version to v1.0.0, update SECURITY.md by @ewgenius in <https://github.com/spiceai/spiceai/pull/4314>
- Initial criteria for models, embeddings by @Jeadie in <https://github.com/spiceai/spiceai/pull/4223>
- Update benchmark snapshots by @github-actions in <https://github.com/spiceai/spiceai/pull/4321>
- Add dremio param for running load test by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4315>
- Promote Databricks (mode: delta_lake) connector to stable by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4328>
- Handle failed query in load test by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4327>
- feat: Use load test hours for baseline query sets by @peasee in <https://github.com/spiceai/spiceai/pull/4334>
- Fix typo in 1.0.0-rc.5 release notes by @ewgenius in <https://github.com/spiceai/spiceai/pull/4329>
- feat: add testoperator data consistency by @peasee in <https://github.com/spiceai/spiceai/pull/4319>
- docs: Release DuckDB connector stable by @peasee in <https://github.com/spiceai/spiceai/pull/4335>
- Fix DocumentDB -> DynamoDB by @lukekim in <https://github.com/spiceai/spiceai/pull/4339>
- Update benchmark snapshots by @github-actions in <https://github.com/spiceai/spiceai/pull/4337>
- fix: Download hits.parquet from MinIO for benchmark by @peasee in <https://github.com/spiceai/spiceai/pull/4338>
- Update openapi.json by @github-actions in <https://github.com/spiceai/spiceai/pull/4341>
- Remove evil averages by @lukekim in <https://github.com/spiceai/spiceai/pull/4343>
- Don't run builds on non-code changes by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4344>
- Remove streaming requirement from Databricks spark Beta and Spark connector Beta by @ewgenius in <https://github.com/spiceai/spiceai/pull/4345>
- Update s3 tpcds spicepods by @ewgenius in <https://github.com/spiceai/spiceai/pull/4346>
- Explicitly set required scale factor for throughput and load tests by @ewgenius in <https://github.com/spiceai/spiceai/pull/4347>
- Fix s3 tpcds dataset name by @ewgenius in <https://github.com/spiceai/spiceai/pull/4348>
- Promote Iceberg Catalog Connector to Beta by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4350>
- Update s3 clickbench benchmark snapshots by @ewgenius in <https://github.com/spiceai/spiceai/pull/4351>
- fix: DuckDB clickbench on zero results by @peasee in <https://github.com/spiceai/spiceai/pull/4349>
- Add integration test with snapshots for databricks catalog connector by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4353>
- refactor: Remove on zero results from benchmarks, add data consistency workflow by @peasee in <https://github.com/spiceai/spiceai/pull/4354>
- Fix Bug: No field named body_embedding when do vector search with refresh sql containing subset of columns by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4297>
- docs: Update roadmap by @peasee in <https://github.com/spiceai/spiceai/pull/4364>
- feat: Release accelerators stable by @peasee in <https://github.com/spiceai/spiceai/pull/4361>
- Add TPCH/TPCDS test spicepods for MySQL by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4365>
- Catch when an insecure (http) S3 and ABFS data connectors endpoint is used without specifying the `allow_http` parameter by @ewgenius in <https://github.com/spiceai/spiceai/pull/4363>
- Update ROADMAP - Iceberg catalog alpha for v1.0 by @ewgenius in <https://github.com/spiceai/spiceai/pull/4367>
- Promote databricks catalog and databricks (spark_connect) connector to beta by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4369>
- Update Roadmap - Iceberg beta by @ewgenius in <https://github.com/spiceai/spiceai/pull/4373>
- Build CUDA binaries for Linux by @Jeadie in <https://github.com/spiceai/spiceai/pull/4320>
- Promote Nvidia NIM as Alpha by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4380>
- Promote xai to alpha by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4381>
- Update stable criteria for object store based connectors by @ewgenius in <https://github.com/spiceai/spiceai/pull/4383>
- Testoperator: http consistency and overhead tests, fixes and ci by @ewgenius in <https://github.com/spiceai/spiceai/pull/4382>
- Promote S3 Data Connector to Stable by @ewgenius in <https://github.com/spiceai/spiceai/pull/4385>
- Download platform-supported CUDA binary version on Linux by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4356>
- Fix http consistency test workflow, add overhead workflow by @ewgenius in <https://github.com/spiceai/spiceai/pull/4387>
- feat: Add Postgres test spicepods by @peasee in <https://github.com/spiceai/spiceai/pull/4388>
- Fix typos + specific in model criteria; Make explicit alpha/beta tests for LLMS in `crates/llms/tests`.  by @Jeadie in <https://github.com/spiceai/spiceai/pull/4377>
- Fix federation bug for correlated subqueries of deeply nested Dremio tables by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4389>
- Fix http overhead workflow by @ewgenius in <https://github.com/spiceai/spiceai/pull/4390>
- Tweak model tests, fix embedding input by @ewgenius in <https://github.com/spiceai/spiceai/pull/4391>
- Promote Dremio to Stable quality by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4392>
- Add beta functionality tests for embedding models. by @Jeadie in <https://github.com/spiceai/spiceai/pull/4352>
- docs: Release postgres connector stable by @peasee in <https://github.com/spiceai/spiceai/pull/4398>
- Increase timeout for model response in E2E tests by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4399>
- Disable ident normalization (i.e. `SELECT MyColumn from table` works) by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4400>
- Preserve schema metadata by @ewgenius in <https://github.com/spiceai/spiceai/pull/4402>
- Make models integration tests tracing less verbose by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4403>
- Fix `cuda` feature build on Windows by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4404>
- Promote MySQL to Stable by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4406>
- docs: Release Delta Lake and Unity catalog by @peasee in <https://github.com/spiceai/spiceai/pull/4405>
- Use `gpt-4o-mini` as a default model for openai provider by @ewgenius in <https://github.com/spiceai/spiceai/pull/4410>
- Fix streaming for Openai and Anthropic by @Jeadie in <https://github.com/spiceai/spiceai/pull/4409>
- Tweak model loading and missing tool errors messages by @ewgenius in <https://github.com/spiceai/spiceai/pull/4412>
- Spice CLI: fallback to CPU build for unsupported GPU Compute Capability by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4407>
- Build Windows CUDA binaries as part of `build_and_release` workflow by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4386>
- Update docs link by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4416>
- feat: Add CPU models install escape hatch by @peasee in <https://github.com/spiceai/spiceai/pull/4419>
- Handle OpenAI API Errors by @ewgenius in <https://github.com/spiceai/spiceai/pull/4417>
- Update spice cli to use `GH_TOKEN` or `GITHUB_TOKEN` env variables when calling releases api by @ewgenius in <https://github.com/spiceai/spiceai/pull/4175>
- Implement secure sandboxing for Docker image by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4411>
- Automatically install supported CUDA binary on Windows by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4420>
- Metrics for LLMs+ embeddings by @Jeadie in <https://github.com/spiceai/spiceai/pull/4418>
- Jeadie/25 01 17/beta perf by @Jeadie in <https://github.com/spiceai/spiceai/pull/4397>
- Pass GitHub token to all CI steps calling spice run by @ewgenius in <https://github.com/spiceai/spiceai/pull/4423>
- Run the models integration tests on PRs by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4421>
- Run CUDA builds in a separate workflow by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4430>
- Promote OpenAI models and embeddings providers to RC by @ewgenius in <https://github.com/spiceai/spiceai/pull/4432>
- Update link to retrieval-augmented generation (RAG) details by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4433>
- Unity catalog should strip parameter prefix before passing parameters to delta lake factory by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4436>
- Update quickstart traces to match current version by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4435>
- Update Supported Embeddings Providers Readme section by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4434>
- Local models can stream tools by @Jeadie in <https://github.com/spiceai/spiceai/pull/4429>
- fix: Use MetricsCollector::show() for HTTP testoperator commands by @peasee in <https://github.com/spiceai/spiceai/pull/4442>
- Fix run query action by @ewgenius in <https://github.com/spiceai/spiceai/pull/4444>
- Default to AI-enabled runtime for `spice run`/`spice install` by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4443>
- Change no spicepod.yaml log to warning by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4447>
- refactor: Update Catalog Connector error messages by @peasee in <https://github.com/spiceai/spiceai/pull/4441>
- Fix panic when converting OTel metrics by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4449>
- refactor: Update model errors by @peasee in <https://github.com/spiceai/spiceai/pull/4446>
- Update spiceai/mistral.rs to silence metadata logs by @ewgenius in <https://github.com/spiceai/spiceai/pull/4452>
- fix xAI; don't use openai defaults by @Jeadie in <https://github.com/spiceai/spiceai/pull/4450>
- Improves the UX of using huggingface models by @phillipleblanc in <https://github.com/spiceai/spiceai/pull/4451>
- Add GH Workflow to test `spice ai` runtime installation by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4448>
- fix: Use specific model errors where available by @peasee in <https://github.com/spiceai/spiceai/pull/4454>
- Detect and report unsupported embedding column type during dataset registration by @sgrebnov in <https://github.com/spiceai/spiceai/pull/4456>
- Handle Errors by @Jeadie in <https://github.com/spiceai/spiceai/pull/4455>
- Catch and report negative openai_temperature error by @Sevenannn in <https://github.com/spiceai/spiceai/pull/4453>
- Clarify release check error message if it is caused by wrong GH token by @ewgenius in <https://github.com/spiceai/spiceai/pull/4458>

**Full Changelog**: <https://github.com/spiceai/spiceai/compare/v1.0.0-rc.5...v1.0.0>

Resources

Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Slack or by email to get involved.

Twitter: @spice_ai
Slack: spiceai.org/slack
Telegram: Spice AI Discussion
Reddit: https://www.reddit.com/r/spiceai
Email: hey@spice.ai

Spice v0.15.1-alpha (July 8, 2024)

July 8, 2024 · 5 min read

Luke Kim

Founder and CEO of Spice AI

The v0.15.1-alpha minor release focuses on enhancing stability, performance, and usability. Memory usage has been significantly improved for the postgres and duckdb acceleration engines which now use stream processing. A new Delta Lake Data Connector has been added, sharing a delta-kernel-rs based implementation with the Databricks Data Connector supporting deletion vectors.

Highlights

Improved memory usage for PostgreSQL and DuckDB acceleration engines: Large dataset acceleration with PostgreSQL and DuckDB engines has reduced memory consumption by streaming data directly to the accelerated table as it is read from the source.

Delta Lake Data Connector: A new Delta Lake Data Connector has been added for using Delta Lake outside of Databricks.

ODBC Data Connector Streaming: The ODBC Data Connector now streams results, reducing memory usage, and improving performance.

GraphQL Object Unnesting: The GraphQL Data Connector can automatically unnest objects from GraphQL queries using the unnest_depth parameter.

Breaking Changes

None.

Contributors

What's Changed

Dependencies

The MySQL, PostgreSQL, SQLite and DuckDB DataFusion TableProviders developed by Spice AI have been donated to the datafusion-contrib/datafusion-table-providers community repository.

From the v0.15.1-alpha release, a new dependency is taken on datafusion-contrib/datafusion-table-providers

Commits

Update acknowledgements by @github-actions in https://github.com/spiceai/spiceai/pull/1842
Update ROADMAP.md - Remove v0.15.0-alpha roadmap items. by @digadeesh in https://github.com/spiceai/spiceai/pull/1843
update helm chart for v0.15.0-alpha by @y-f-u in https://github.com/spiceai/spiceai/pull/1845
update cargo.toml and version.txt to 0.15.1-alpha (for next release) by @digadeesh in https://github.com/spiceai/spiceai/pull/1844
Fix check for outdated Cargo.lock & update Cargo.lock by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1846
Add Debezium to README by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1847
use snmalloc as global allocator by @y-f-u in https://github.com/spiceai/spiceai/pull/1848
Various improvements for mistral.rs by @Jeadie in https://github.com/spiceai/spiceai/pull/1831
Enable streaming for accelerated tables refresh (common logic) by @sgrebnov in https://github.com/spiceai/spiceai/pull/1863
Use in-memory DB pool for DuckDB functions by @Jeadie in https://github.com/spiceai/spiceai/pull/1849
Generate Spicepod JSON Schema by @ewgenius in https://github.com/spiceai/spiceai/pull/1865
Update http param names by @Jeadie in https://github.com/spiceai/spiceai/pull/1872
Replace DuckDB, PostgreSQL, Sqlite and MySQL providers with the datafusion-table-providers crate by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1873
Remove more dead code moved to datafusion-table-providers by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1874
feat: Optimize ODBC for streaming results by @peasee in https://github.com/spiceai/spiceai/pull/1862
Fix how models uses secrets by @Jeadie in https://github.com/spiceai/spiceai/pull/1875
fix: Add support for varying duplicate columns behavior in GraphQL unnesting by @peasee in https://github.com/spiceai/spiceai/pull/1876
fix: Remove GraphQL duplicate rename support by @peasee in https://github.com/spiceai/spiceai/pull/1877
fix: Remove Overwrite GraphQL duplicates behavior by @peasee in https://github.com/spiceai/spiceai/pull/1882
fix: Use tokio mpsc channels for ODBC streaming by @peasee in https://github.com/spiceai/spiceai/pull/1883
Upgrade table providers to enable DuckDB streaming write by @sgrebnov in https://github.com/spiceai/spiceai/pull/1884
Update ROADMAP.md - Add debezium (alpha) to connector list. by @digadeesh in https://github.com/spiceai/spiceai/pull/1880
Allow defining user for mysql data connector via secrets by @sgrebnov in https://github.com/spiceai/spiceai/pull/1886
Replace delta-rs with delta-kernel-rs and add new delta data connector. by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1878
Update README images by @lukekim in https://github.com/spiceai/spiceai/pull/1890
Handle deletion vectors for delta tables by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1891
Rename delta to delta_lake by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1892
Add where is the AI to the FAQ. by @lukekim in https://github.com/spiceai/spiceai/pull/1885
update df table providers rev version by @y-f-u in https://github.com/spiceai/spiceai/pull/1889
Enable other cloud providers for delta_lake integration by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1893
Add CLI parameters for logging into Databricks with Azure/GCP cloud storage by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1894
Bump zerovec from 0.10.2 to 0.10.4 by @dependabot in https://github.com/spiceai/spiceai/pull/1896
Add 'Content-Type' to metrics exporter to be prometheus exposition format compliant by @sgrebnov in https://github.com/spiceai/spiceai/pull/1897
Update enforce-labels.yml so it accepts depdenabot updates with kind/… by @digadeesh in https://github.com/spiceai/spiceai/pull/1898

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.15.0-alpha...v0.15.1-alpha

Resources

Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Slack or by email to get involved.

Twitter: @spice_ai
Slack: spiceai.org/slack
Telegram: Spice AI Discussion
Reddit: https://www.reddit.com/r/spiceai
Email: hey@spice.ai

Spice v0.14.1-alpha (June 24, 2024)

June 24, 2024 · 5 min read

Luke Kim

Founder and CEO of Spice AI

The v0.14.1-alpha release is focused on quality, stability, and type support with improvements in PostgreSQL, DuckDB, and GraphQL data connectors.

Highlights

PostgreSQL acceleration and data connector: Support for Composite Types and UUID data types.
DuckDB acceleration and data connector: Support for LargeUTF8 and DuckDB functions.
GraphQL data connector: Improved error handling on invalid query syntax.
Refresh SQL: Improved stability when overwriting STRUCT data types.

Breaking Changes

None.

New Contributors

@phungleson made their first contribution in https://github.com/spiceai/spiceai/pull/1750
@peasee made their first contribution in https://github.com/spiceai/spiceai/pull/1769

Contributors

@lukekim
@y-f-u
@ewgenius
@phillipleblanc
@Jeadie
@sgrebnov
@gloomweaver
@phungleson
@peasee
@digadeesh

What's Changed

Dependencies

No major dependency updates.

Commits

Update Helm to v0.14.0-alpha by @sgrebnov in https://github.com/spiceai/spiceai/pull/1720
Update version to 0.14.1-alpha by @sgrebnov in https://github.com/spiceai/spiceai/pull/1721
Use spiceai/async-openai to solve Deserialize issue in v1/embed by @Jeadie in https://github.com/spiceai/spiceai/pull/1707
Add greatest least user defined functions by @y-f-u in https://github.com/spiceai/spiceai/pull/1722
default timeunit to be seconds when time column is a numeric column by @y-f-u in https://github.com/spiceai/spiceai/pull/1727
use system conf to construct dns resolver by @y-f-u in https://github.com/spiceai/spiceai/pull/1728
fix a bug that dataset refresh api does not work for table with schema by @y-f-u in https://github.com/spiceai/spiceai/pull/1729
Move secret crate to runtime module by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1723
Return schema in get_flight_info_simple by @gloomweaver in https://github.com/spiceai/spiceai/pull/1724
Refactor vector search component of v1/assist into a VectorSearch struct by @Jeadie in https://github.com/spiceai/spiceai/pull/1699
Update ROADMAP.md. Fix a broken link for the "Get in touch" link. by @digadeesh in https://github.com/spiceai/spiceai/pull/1725
Secret keys in params should be case insensitive by @ewgenius in https://github.com/spiceai/spiceai/pull/1737
expose error log when refresh encountered some issue, also add more debug logs by @y-f-u in https://github.com/spiceai/spiceai/pull/1739
Support Struct in PostgreSQL accelerator by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1733
rewrite refresh append update dedup logic using arrow comparators by @y-f-u in https://github.com/spiceai/spiceai/pull/1743
Add health checks when loading (llms, embeddings) by @Jeadie in https://github.com/spiceai/spiceai/pull/1738
Support DuckDB function in DuckDB datasets by @Jeadie in https://github.com/spiceai/spiceai/pull/1742
Update version of spiceai/duckdb-rs, support LargeUTF8 by @Jeadie in https://github.com/spiceai/spiceai/pull/1746
Split refresh into coordination and execution layers by @sgrebnov in https://github.com/spiceai/spiceai/pull/1744
bump duckdb rs git sha to resolve duckdb incorrect null value issue by @y-f-u in https://github.com/spiceai/spiceai/pull/1747
cargo.lock file update with #1747 duckdb-rs sha by @y-f-u in https://github.com/spiceai/spiceai/pull/1748
Fix error when GraphQL error locations is missing by @phungleson in https://github.com/spiceai/spiceai/pull/1750
Tweak refresh scheduling logic by @sgrebnov in https://github.com/spiceai/spiceai/pull/1749
Ensure tonic package is in duckdb feature by @Jeadie in https://github.com/spiceai/spiceai/pull/1756
Change tonic::async_trait -> async_trait::async_trait by @Jeadie in https://github.com/spiceai/spiceai/pull/1757
Streaming in v1/chat/completion by @Jeadie in https://github.com/spiceai/spiceai/pull/1741
Add refresh_retry_enabled/max_attempts acceleration params by @sgrebnov in https://github.com/spiceai/spiceai/pull/1753
Implement refresh retry based on fibonacci backoff (not enabled) by @sgrebnov in https://github.com/spiceai/spiceai/pull/1752
Add VSCode debug target to debug runtime benchmark test by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1760
update spiceai datafusion to include more unparser rules by @y-f-u in https://github.com/spiceai/spiceai/pull/1764
Show UUID types as String instead of base64 binary. by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1767
docs: Add linux contributor guide for setup by @peasee in https://github.com/spiceai/spiceai/pull/1769
Do not expose connection url on object store error by @ewgenius in https://github.com/spiceai/spiceai/pull/1761
Support secrets in llm and embeddings params by @ewgenius in https://github.com/spiceai/spiceai/pull/1770
Bump github.com/hashicorp/go-retryablehttp from 0.7.1 to 0.7.7 by @dependabot in https://github.com/spiceai/spiceai/pull/1775
Update ROADMAP.md with latest roadmap changes for v0.15.0 by @digadeesh in https://github.com/spiceai/spiceai/pull/1773
Update acknowledgements by @github-actions in https://github.com/spiceai/spiceai/pull/1776
Strip kwarg '=' in DuckDB function parsing by @Jeadie in https://github.com/spiceai/spiceai/pull/1777

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.14.0-alpha...v0.14.1-alpha

Resources

Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Slack or by email to get involved.

Twitter: @spice_ai
Slack: spiceai.org/slack
Telegram: Spice AI Discussion
Reddit: https://www.reddit.com/r/spiceai
Email: hey@spice.ai

Spice v0.11-alpha (April 15, 2024)

April 15, 2024 · 4 min read

Sergei Grebnov

Senior Software Engineer at Spice AI

The Spice v0.11-alpha release significantly improves the Databricks data connector with Databricks Connect (Spark Connect) support, adds the DuckDB data connector, and adds the AWS Secrets Manager secret store. In addition, enhanced control over accelerated dataset refreshes, improved SSL security for MySQL and PostgreSQL connections, and overall stability improvements have been added.

Highlights in v0.11-alpha

DuckDB data connector: Use DuckDB databases or connections as a data source.

AWS Secrets Manager Secret Store: Use AWS Secrets Managers as a secret store.

Custom Refresh SQL: Specify a custom SQL query for dataset refresh using refresh_sql.

Dataset Refresh API: Trigger a dataset refresh using the new CLI command spice refresh or via API.

Expanded SSL support for Postgres: SSL mode now supports disable, require, prefer, verify-ca, verify-full options with the default mode changed to require. Added pg_sslrootcert parameter for setting a custom root certificate and the pg_insecure parameter is no longer supported.

Databricks Connect: Choose between using Spark Connect or Delta Lake when using the Databricks data connector for improved performance.

Improved SSL support for Postgres: ssl mode now supports disable, require, prefer, verify-ca, verify-full options with default mode changed to require. Added pg_sslrootcert parameter to allow setting custom root cert for postgres connector, pg_insecure parameter is no longer supported as redundant.

Internal architecture refactor: The internal architecture of spiced was refactored to simplify the creation data components and to improve alignment with DataFusion concepts.

New Contributors

@edmondop's first contribution github.com/spiceai/spiceai/pull/1110!

Contributors

@phillipleblanc
@Jeadie
@ewgenius
@sgrebnov
@y-f-u
@lukekim
@digadeesh
@Sevenannn
@gloomweaver
@ahirner

New in this release

Fixes MySQL NULL values by @gloomweaver in https://github.com/spiceai/spiceai/pull/1067
Fixes PostgreSQL NULL values for NUMERIC by @gloomweaver in https://github.com/spiceai/spiceai/pull/1068
Adds Custom Refresh SQL support by @lukekim and @phillipleblanc in https://github.com/spiceai/spiceai/pull/1073
Adds DuckDB data connector by @Sevenannn in https://github.com/spiceai/spiceai/pull/1085
Adds AWS Secrets Manager secret store by @sgrebnov in https://github.com/spiceai/spiceai/pull/1063, https://github.com/spiceai/spiceai/pull/1064
Adds Dataset refresh API by @sgrebnov in https://github.com/spiceai/spiceai/pull/1075, https://github.com/spiceai/spiceai/pull/1078, https://github.com/spiceai/spiceai/pull/1083
Adds spice refresh CLI command for dataset refresh by @sgrebnov in https://github.com/spiceai/spiceai/pull/1112
Adds TEXT and DECIMAL types support and properly handling NULL for MySQL by @gloomweaver in https://github.com/spiceai/spiceai/pull/1067
Adds MySQL DATE and TINYINT types support for MySQL by @ewgenius in https://github.com/spiceai/spiceai/pull/1065
Adds ssl_rootcert_path parameter for MySql data connector by @ewgenius in https://github.com/spiceai/spiceai/pull/1079
Adds LargeUtf8 support and explicitly passing the schema to data accelerator SqlTable by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1077
Adds Ability to configure data retention for accelerated datasets by @y-f-u in https://github.com/spiceai/spiceai/issues/1086
Adds Custom SSL certificates for PostgreSQL data connector by @ewgenius in https://github.com/spiceai/spiceai/pull/1081
Adds Conditional compile for Dremio by @ahirner in https://github.com/spiceai/spiceai/pull/1100
Adds Ability for Databricks connector to use spark-connect-rs as the mechanism to execute queries against the Databricks by @edmondop in https://github.com/spiceai/spiceai/pull/1110
Adds Ability to choose between Spark Connect and Delta Lake implementation for Databricks by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1115/files
Updates Databricks login parameters by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1113
Updates Architecture to simplify data components development by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1040
Updates Improved readability of GitHub Actions test job names by @lukekim in https://github.com/spiceai/spiceai/pull/1071
Updates Upgrade Arrow, DataFusion, Tonic dependencies by @phillipleblanc in https://github.com/spiceai/spiceai/pull/1097
Updates Handling non-string spicepod params by @ewgenius in https://github.com/spiceai/spiceai/pull/1098
Updates Optional features compile: duckdb, databricks by @ahirner in https://github.com/spiceai/spiceai/pull/1100
Updates Helm version to 0.1.3 by @Jeadie in https://github.com/spiceai/spiceai/pull/1120
Removes pg_insecure parameter support from Postgres by ewgenius in https://github.com/spiceai/spiceai/pull/1081

Full Changelog: https://github.com/spiceai/spiceai/compare/v0.10.2-alpha...v0.11.0-alpha

Resources

Community

Spice.ai started with the vision to make AI easy for developers. We are building Spice.ai in the open and with the community. Reach out on Slack or by email to get involved.

Twitter: @spice_ai
Slack: spiceai.org/slack
Telegram: Spice AI Discussion
Reddit: https://www.reddit.com/r/spiceai
Email: hey@spice.ai

What's New in v2.0.0-rc.5​

Cayenne Improvements​

Mutual TLS (mTLS)​

MongoDB Change Streams​

CDC Improvements​

PostgreSQL DML Support​

Snowflake DML Support​

Arrow Primary Key Upserts​

DuckLake Promoted to Beta​

User-Defined Functions​

Spatial SQL UDFs​

On-Demand Dataset Loading​

Unified Query Cancellation​

Dynamic HTTP Connector​

HTTP Rate-Control Persistence​

refresh_mode: snapshot​

Storage-Profile Accelerator Tuning​

Provider-Aware LLM Prompt Caching​

Responses API Improvements​

Distributed Cluster Improvements​

Caching & Search​

Security Improvements​

SQL, Query, and Developer Experience​

Connector Bug Fixes​

Dependency Updates​

Contributors​

Breaking Changes​

Cookbook Updates​

Upgrading​

What's Changed​

Changelog​

What's New in v2.0.0-rc.4​

Elasticsearch Data Connector (Alpha, Spice.ai Enterprise)​

PostgreSQL Native Replication via WAL​

Multi-vector Embeddings with MaxSim (Late Interaction)​

Rerank UDTF for Hybrid Search​

New Secret Stores: HashiCorp Vault and Azure Key Vault​

DuckDB Vector Engine​

New and Promoted Connectors​

DynamoDB Write Support (DML)​

MCP Streamable HTTP Transport​

Security Improvements​

Developer Experience Improvements​

OpenTelemetry Improvements​

Full-text Search Performance​

SQL and Query Engine​

Dependency Updates​

Contributors​

Breaking Changes​

Cookbook Updates​

Upgrading​

What's Changed​

Changelog​

What's New in v1.5.2​

Contributors​

Breaking Changes​

Cookbook Updates​

Upgrading​

What's Changed​

Dependencies​

Changelog​

Highlights in v1.0.3​

Contributors​

Breaking Changes​

Cookbook Updates​

Upgrading​

What's Changed​

Dependencies​

Changelog​

Resources​

Community​

Highlights in v1.0-stable​

Breaking Changes​

Dependencies​

Upgrading​

Contributors​

What's Changed​

Resources​

Community​

Highlights​

What's New in v2.0.0-rc.5

Cayenne Improvements

Mutual TLS (mTLS)

MongoDB Change Streams

CDC Improvements

PostgreSQL DML Support

Snowflake DML Support

Arrow Primary Key Upserts

DuckLake Promoted to Beta

User-Defined Functions

Spatial SQL UDFs

On-Demand Dataset Loading

Unified Query Cancellation

Dynamic HTTP Connector

HTTP Rate-Control Persistence

`refresh_mode: snapshot`

Storage-Profile Accelerator Tuning

Provider-Aware LLM Prompt Caching

Responses API Improvements

Distributed Cluster Improvements

Caching & Search

Security Improvements

SQL, Query, and Developer Experience

Connector Bug Fixes

Dependency Updates

Contributors

Breaking Changes

Cookbook Updates

Upgrading

What's Changed

Changelog

What's New in v2.0.0-rc.4

Elasticsearch Data Connector (Alpha, Spice.ai Enterprise)

PostgreSQL Native Replication via WAL

Multi-vector Embeddings with MaxSim (Late Interaction)

Rerank UDTF for Hybrid Search

New Secret Stores: HashiCorp Vault and Azure Key Vault

DuckDB Vector Engine

New and Promoted Connectors

DynamoDB Write Support (DML)

MCP Streamable HTTP Transport

Security Improvements

Developer Experience Improvements

OpenTelemetry Improvements

Full-text Search Performance

SQL and Query Engine

Dependency Updates

Contributors

Breaking Changes

Cookbook Updates

Upgrading

What's Changed

Changelog

What's New in v1.5.2

Contributors

Breaking Changes

Cookbook Updates

Upgrading

What's Changed

Dependencies

Changelog

Highlights in v1.0.3

Contributors

Breaking Changes

Cookbook Updates

Upgrading

What's Changed

Dependencies

Changelog

Resources

Community

Highlights in v1.0-stable

Breaking Changes

Dependencies

Upgrading

Contributors

What's Changed

Resources

Community

Highlights