DuckDB Data Connector Deployment Guide
Production operating guide for the DuckDB data connector (used to federate queries against an existing DuckDB database file).
Authentication & Secrets
DuckDB is an embedded engine; the connector reads a local DuckDB database file. No network authentication is involved.
| Parameter | Description |
|---|---|
| `duckdb_open` | Absolute path to the DuckDB database file. If omitted, the connector uses in-memory mode. |
Protect the DuckDB file with filesystem permissions. Store it on encrypted storage (LUKS/dm-crypt, EBS encryption, etc.) for data-at-rest protection. For data loaded from cloud object stores inside DuckDB, configure AWS/Azure/GCS credentials via DuckDB extensions rather than Spice parameters.
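The permission advice above can be checked before deployment. The following is a minimal sketch that assumes a Linux/POSIX host and Python 3. `check_duckdb_file` is a hypothetical helper, not part of Spice or DuckDB. It flags a `duckdb_open` value that is not absolute, and a file that grants any group or other permission bits.

```python
import os
import stat
from typing import List, Optional


def check_duckdb_file(duckdb_open: Optional[str]) -> List[str]:
    """Return warnings for a planned duckdb_open value (hypothetical helper, not a Spice API)."""
    if duckdb_open is None:
        # Per the parameter table, omitting duckdb_open means in-memory mode: no file to protect.
        return ["duckdb_open omitted: connector runs in in-memory mode"]
    warnings = []
    if not os.path.isabs(duckdb_open):
        warnings.append(f"{duckdb_open!r} is not an absolute path")
    # os.stat raises FileNotFoundError if the file does not exist yet.
    mode = stat.S_IMODE(os.stat(duckdb_open).st_mode)
    # Any group/other bit lets users other than the owner read, write, or execute the file.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        warnings.append(f"{duckdb_open!r} is accessible to group/other users (mode {oct(mode)})")
    return warnings
```

If the check reports group/other access, `os.chmod(path, 0o600)` restricts the file to its owner. Spice then needs to run as that owning user. Encryption at rest remains a separate, storage-level control.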
Resilience Controls
File Concurrency
DuckDB allows either one process with read-write access or multiple processes with read-only access to a database file. If another process holds the write lock on the file, the connector returns an I/O error on open. Co-locate the writer and the Spice reader on the same host, or use DuckDB's read-only mode (`access_mode: read_only`) when federating a file produced by an upstream ETL job.
Crash Recovery
DuckDB's WAL provides crash recovery for any process that wrote to the file. The Spice connector does not itself write (the data connector is read-only; the DuckDB accelerator is distinct and handles write paths).
Capacity & Sizing
- Memory: By default, DuckDB sets its memory limit from system memory (80% of physical RAM). For constrained environments, set the `memory_limit` pragma via the connection string.
- Disk: Plan for 1.5–2× the raw data size to cover DuckDB's storage overhead, the WAL, and temporary spill files created during query execution.
- Temporary spill: Large queries spill to DuckDB's temp directory. Ensure adequate disk space, and set `temp_directory` to a fast local volume if the default location (alongside the database file) is on slow storage.
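The 1.5–2× disk guideline can be turned into a quick capacity check. The following is a minimal sketch with hypothetical helper names (`disk_budget`, `has_headroom`). It assumes the checked path sits on the volume that holds both the database file and the temp directory. Point it at the `temp_directory` volume separately if you moved spill files elsewhere.

```python
import shutil
from typing import Tuple

GIB = 1024 ** 3


def disk_budget(raw_bytes: int, low: float = 1.5, high: float = 2.0) -> Tuple[int, int]:
    """Return (minimum, comfortable) bytes to provision, using the 1.5-2x rule of thumb."""
    return int(raw_bytes * low), int(raw_bytes * high)


def has_headroom(volume_path: str, raw_bytes: int) -> bool:
    """True if the filesystem containing volume_path has at least the upper budget free."""
    _, upper = disk_budget(raw_bytes)
    return shutil.disk_usage(volume_path).free >= upper
```

For example, 100 GiB of raw data yields a budget of 150 to 200 GiB. `has_headroom` compares against the upper bound, so spill-heavy queries still have margin.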
Metrics
The DuckDB connector does not register connector-specific instruments. Monitor it via Spice's query metrics (`query_duration_ms`, `query_processed_rows`). See Component Metrics for general configuration.
For DuckDB-internal metrics, use DuckDB's `duckdb_memory()` table function and `PRAGMA database_size` via a SQL query against the connector.
Task History
DuckDB queries participate in task history through DataFusion's execution-plan spans.
Known Limitations
- Read-only via the data connector: For a writable, Spice-managed DuckDB, use the DuckDB accelerator instead.
- Single-writer: A DuckDB file cannot be written by two processes concurrently. Coordinate writers out-of-band.
- Version compatibility: DuckDB files are tied to the DuckDB binary version. Upgrading DuckDB in Spice may require regenerating older database files.
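Before upgrading, it can help to record which storage version an existing file was written with. The following is a minimal sketch under an explicit assumption about the on-disk layout: an 8-byte block checksum, the magic bytes `DUCK` at offset 8, then the storage version as a little-endian unsigned 64-bit integer. Verify this layout against the storage documentation for your DuckDB release. `read_storage_version` is a hypothetical helper, and the mapping from storage version numbers to DuckDB releases is not covered here.

```python
import struct
from typing import Optional

# ASSUMED header layout (check against the storage documentation for your DuckDB release):
# bytes 0-7 block checksum, bytes 8-11 magic b"DUCK", bytes 12-19 storage version as a
# little-endian unsigned 64-bit integer.
MAGIC = b"DUCK"
MAGIC_OFFSET = 8
HEADER_LEN = MAGIC_OFFSET + len(MAGIC) + 8


def read_storage_version(path: str) -> Optional[int]:
    """Return the storage version in a DuckDB file header, or None if the magic bytes are missing."""
    with open(path, "rb") as f:
        header = f.read(HEADER_LEN)
    if len(header) < HEADER_LEN or header[MAGIC_OFFSET:MAGIC_OFFSET + len(MAGIC)] != MAGIC:
        return None  # Too short, or not a DuckDB file under the assumed layout.
    (version,) = struct.unpack_from("<Q", header, MAGIC_OFFSET + len(MAGIC))
    return version
```

Compare the returned number across the files you federate before and after a Spice upgrade. A file that returns `None` is unlikely to be a DuckDB database at all.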
Troubleshooting
| Symptom | Likely cause | Resolution |
|---|---|---|
| IO Error: Could not set lock on file | Another process holds the DuckDB write lock. | Ensure only one writer; open in read-only mode if Spice should not hold a write lock. |
| Catalog Error: Table ... does not exist | Table name mismatch, or the database is not at the expected path. | Run `SELECT * FROM information_schema.tables` via the connector to list tables. |
| Queries spill aggressively; performance is slow | Working set exceeds memory. | Increase system memory or set a smaller batch size; point `temp_directory` at faster storage. |
| Serialization Error: Failed to deserialize ... database ... not a valid database | DuckDB version mismatch. | Upgrade or downgrade Spice's DuckDB version to match the file producer. |
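For log triage, the troubleshooting table can be encoded as a simple substring lookup. The following is a minimal sketch. `diagnose` and `KNOWN_ERRORS` are hypothetical names, and substring matching is only a heuristic. It assumes the error text surfaced by Spice still contains the DuckDB phrases shown in the table.

```python
from typing import List, Optional, Tuple

# Substrings taken from the troubleshooting table; first match wins.
KNOWN_ERRORS: List[Tuple[str, str]] = [
    ("Could not set lock on file",
     "Another process holds the DuckDB write lock: ensure only one writer, or open read-only."),
    ("does not exist",
     "Table name mismatch or wrong path: list tables via information_schema.tables."),
    ("not a valid database",
     "DuckDB version mismatch: align Spice's DuckDB version with the file producer."),
]


def diagnose(message: str) -> Optional[str]:
    """Return the suggested resolution for a known DuckDB error message, or None."""
    for needle, advice in KNOWN_ERRORS:
        if needle in message:
            return advice
    return None
```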
