Datavor v3.1: SQL Server CDC and streaming into Snowflake

Real-time change data capture is one of Datavor's most-used features: tail a source database's transaction log and stream every insert, update, and delete to a target as it happens. Until now, that meant Postgres (via logical decoding of the WAL) and MySQL (via the binlog). v3.1 widens both ends of the pipe with a new source and a smarter target.

SQL Server joins as a CDC source

SQL Server has had native Change Data Capture built in for years, and v3.1 puts Datavor on top of it. Enable CDC on your database and tables once, and Datavor can stream changes from SQL Server just like it does from Postgres and MySQL.

Setup is two T-SQL statements plus a running SQL Server Agent, all standard SQL Server administration:

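-- 1. Enable CDC on the database (once, as a sysadmin).
--    Run this in the source database itself, not in master.
EXEC sys.sp_cdc_enable_db;

-- 2. Enable CDC per table (as db_owner).
--    @role_name = NULL means no gating role restricts access to the change data.
EXEC sys.sp_cdc_enable_table
  @source_schema = N'dbo',
  @source_name  = N'orders',
  @role_name    = NULL;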

With the SQL Server Agent capture job running, that's it. When you start a CDC stream, Datavor reads changes the way Microsoft recommends — by polling the CDC table-valued functions, not by reading the underlying change tables directly:

SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_orders(
  @from_lsn, @to_lsn, N'all update old'
) ORDER BY __$start_lsn, __$seqval;

There are good reasons to go through the functions rather than the raw change tables:

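The (@from_lsn, @to_lsn) bracket lets Datavor resume cleanly from the last checkpoint, with no gaps and no replays. Both ends of the bracket are inclusive, so a resumed poll starts one LSN past the checkpoint (SQL Server provides sys.fn_cdc_increment_lsn for exactly this)
N'all update old' returns both halves of an UPDATE (the old row as __$operation 3, the new row as __$operation 4), which Datavor pairs into a single change event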
The function is Microsoft's stable, supported interface — the change-table layout can shift between SQL Server versions, but the function signature doesn't
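To make the pairing step concrete, here is a minimal TypeScript sketch of how the rows returned with N'all update old' could be folded into change events. The operation codes (1 delete, 2 insert, 3 update-before, 4 update-after) are the ones SQL Server documents for this function. The CdcRow and ChangeEvent shapes and the pairChanges name are hypothetical and not Datavor's actual internals. The sketch assumes rows arrive in the ORDER BY shown above, so the two halves of an UPDATE are adjacent with the old row first.

```typescript
// Operation codes documented for cdc.fn_cdc_get_all_changes_<capture_instance>.
const OP_DELETE = 1;
const OP_INSERT = 2;
const OP_UPDATE_BEFORE = 3; // only returned with N'all update old'
const OP_UPDATE_AFTER = 4;

// Hypothetical shape of one result row; the LSN is carried as a hex string.
interface CdcRow {
  startLsn: string; // __$start_lsn
  operation: number; // __$operation
  data: Record<string, unknown>; // the captured columns
}

// Hypothetical event shape for illustration, not Datavor's wire format.
type ChangeEvent =
  | { kind: "insert"; after: Record<string, unknown> }
  | { kind: "delete"; before: Record<string, unknown> }
  | { kind: "update"; before: Record<string, unknown>; after: Record<string, unknown> };

// Assumes rows are ordered so that an update-before row is immediately
// followed by its update-after row from the same transaction.
function pairChanges(rows: CdcRow[]): ChangeEvent[] {
  const events: ChangeEvent[] = [];
  let pendingBefore: CdcRow | null = null;

  for (const row of rows) {
    if (pendingBefore !== null && row.operation !== OP_UPDATE_AFTER) {
      throw new Error("update-before row was not followed by its update-after row");
    }
    if (row.operation === OP_INSERT) {
      events.push({ kind: "insert", after: row.data });
    } else if (row.operation === OP_DELETE) {
      events.push({ kind: "delete", before: row.data });
    } else if (row.operation === OP_UPDATE_BEFORE) {
      pendingBefore = row; // hold the old image until the new image arrives
    } else if (row.operation === OP_UPDATE_AFTER) {
      if (pendingBefore === null || pendingBefore.startLsn !== row.startLsn) {
        throw new Error("update-after row has no matching update-before row");
      }
      events.push({ kind: "update", before: pendingBefore.data, after: row.data });
      pendingBefore = null;
    } else {
      throw new Error("unexpected __$operation value: " + row.operation);
    }
  }
  if (pendingBefore !== null) {
    throw new Error("batch ended with an unpaired update-before row");
  }
  return events;
}
```

Failing loudly on an unpaired half is a deliberate choice in this sketch: a silent skip would turn a malformed batch into a lost update.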

Datavor polls on a configurable interval and resumes from the last LSN it successfully processed. SQL Server is now a first-class CDC source alongside Postgres and MySQL — see the full setup on the CDC page.
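The resume logic can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not Datavor's implementation: the function names are hypothetical, and LSNs (binary(10) in SQL Server) are represented as 20-character hex strings without the 0x prefix. Because the CDC function treats both ends of the bracket as inclusive, the sketch starts the next poll one past the checkpoint, mirroring what sys.fn_cdc_increment_lsn does on the server. The minLsn and maxLsn arguments stand for the values of sys.fn_cdc_get_min_lsn and sys.fn_cdc_get_max_lsn.

```typescript
// Add one to a 10-byte LSN carried as a 20-character hex string.
function incrementLsn(lsnHex: string): string {
  if (!/^[0-9a-fA-F]{20}$/.test(lsnHex)) {
    throw new Error("not a 10-byte hex LSN: " + lsnHex);
  }
  const lower = lsnHex.toLowerCase();
  let i = lower.length - 1;
  // Trailing "f" digits roll over to "0" and carry into the next digit.
  while (i >= 0 && lower.charAt(i) === "f") {
    i--;
  }
  if (i < 0) {
    throw new Error("LSN overflow");
  }
  const bumped = (parseInt(lower.charAt(i), 16) + 1).toString(16);
  return lower.slice(0, i) + bumped + "0".repeat(lower.length - 1 - i);
}

interface LsnBracket {
  fromLsn: string;
  toLsn: string;
}

// checkpoint: last LSN fully processed, or null on the first run.
// Returns null when there is nothing new to read, so the caller can skip
// the query instead of passing an invalid (from > to) range.
function nextBracket(
  checkpoint: string | null,
  minLsn: string,
  maxLsn: string
): LsnBracket | null {
  const fromLsn = checkpoint === null ? minLsn.toLowerCase() : incrementLsn(checkpoint);
  const toLsn = maxLsn.toLowerCase();
  // Fixed-width lowercase hex strings compare correctly as plain strings.
  return fromLsn > toLsn ? null : { fromLsn, toLsn };
}
```

A poll loop would call nextBracket on each tick, run the query only when it returns a bracket, and store toLsn as the new checkpoint once the batch is safely applied.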

Streaming CDC into Snowflake, without the bill

The other half of v3.1 is about where change data lands. Snowflake is a fantastic warehouse and a terrible place to write one row at a time — every single-row INSERT is expensive, and a naive CDC stream that applies one change per statement will burn credits and crawl. So Datavor's Snowflake target doesn't do that.

Instead, it adaptively batches changes based on volume and applies each batch with a single set-based operation:

Small batches (under ~100 rows) go in as a multi-row INSERT INTO <temp> VALUES (…),(…)
Larger batches stage a compressed CSV — PUT file://… with AUTO_COMPRESS=TRUE, then COPY INTO <temp>
Either way, the batch is applied to the target table with one MERGE INTO target USING temp
Deletes resolve as a set-based DELETE … WHERE (pk_cols) IN (…)
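As a rough illustration of the steps above, the following TypeScript sketch picks a load path by batch size and builds the MERGE statement that applies a temp table to the target. Everything named here is hypothetical: the function names, the exact cutoff of 100 (taken from the "under ~100 rows" figure), and the choice to update every non-key column. It also assumes simple unquoted identifiers and rejects anything else. It is not Datavor's actual SQL generator.

```typescript
type LoadStrategy = "multi-row-insert" | "stage-and-copy";

// Assumed cutoff, based on the "under ~100 rows" figure in the text.
const SMALL_BATCH_ROWS = 100;

function chooseLoadStrategy(rowCount: number): LoadStrategy {
  return rowCount < SMALL_BATCH_ROWS ? "multi-row-insert" : "stage-and-copy";
}

// Accept only simple unquoted identifiers so names can be spliced into SQL safely.
function assertIdent(identifier: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_$]*$/.test(identifier)) {
    throw new Error("unsupported identifier: " + identifier);
  }
  return identifier;
}

// Build one MERGE that upserts every row of the temp table into the target.
function buildMergeSql(
  target: string,
  temp: string,
  keyCols: string[],
  dataCols: string[]
): string {
  if (keyCols.length === 0) {
    throw new Error("MERGE needs at least one key column");
  }
  const keys = keyCols.map((c) => assertIdent(c));
  const data = dataCols.map((c) => assertIdent(c));
  const allCols = keys.concat(data);

  const lines = [
    "MERGE INTO " + assertIdent(target) + " t USING " + assertIdent(temp) + " s",
    "ON " + keys.map((c) => "t." + c + " = s." + c).join(" AND "),
  ];
  if (data.length > 0) {
    lines.push("WHEN MATCHED THEN UPDATE SET " + data.map((c) => c + " = s." + c).join(", "));
  }
  lines.push(
    "WHEN NOT MATCHED THEN INSERT (" + allCols.join(", ") + ") VALUES (" +
      allCols.map((c) => "s." + c).join(", ") + ")"
  );
  return lines.join("\n");
}
```

For a table keyed on id with status and total columns, buildMergeSql("orders", "orders_tmp", ["id"], ["status", "total"]) yields a single statement that updates matching rows and inserts the rest, which is what keeps the per-batch cost at one set-based operation.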

Every batch is split into chunks of at most 10,000 rows to stay within Snowflake's guidance on MERGE source size, and it all runs over the standard Snowflake Node.js driver (snowflake-sdk) on a single warehouse session. The batch window adapts to load: quiet streams flush promptly so freshness stays high, and busy streams batch larger so throughput stays efficient.
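The chunking and the adaptive window can be sketched as two small TypeScript helpers. Only the 10,000-row chunk size comes from the text above. The window policy (doubling the delay when a window was busy, halving it when quiet) and every number in it are assumptions made up for illustration, as are the function names. Datavor's real heuristics may look quite different.

```typescript
// Chunk size stated in the text.
const MAX_CHUNK_ROWS = 10000;

// Split a batch into consecutive chunks of at most `size` rows.
function chunkRows<T>(rows: T[], size: number = MAX_CHUNK_ROWS): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new Error("chunk size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let start = 0; start < rows.length; start += size) {
    chunks.push(rows.slice(start, start + size));
  }
  return chunks;
}

// Hypothetical policy knobs; none of these values come from Datavor.
interface WindowPolicy {
  minMs: number; // shortest wait between flushes
  maxMs: number; // longest wait between flushes
  busyRows: number; // rows per window at which a stream counts as busy
}

const DEFAULT_WINDOW_POLICY: WindowPolicy = { minMs: 250, maxMs: 5000, busyRows: 1000 };

// Busy windows double the delay (bigger batches); quiet windows halve it
// (fresher data). The result is clamped to the policy bounds.
function nextFlushDelayMs(
  rowsInLastWindow: number,
  currentDelayMs: number,
  policy: WindowPolicy = DEFAULT_WINDOW_POLICY
): number {
  const proposed =
    rowsInLastWindow >= policy.busyRows ? currentDelayMs * 2 : currentDelayMs / 2;
  return Math.min(policy.maxMs, Math.max(policy.minMs, Math.round(proposed)));
}
```

With these defaults, a stream that delivered 5,000 rows in a 1,000 ms window would wait 2,000 ms before the next flush, while an idle stream would drift down to the 250 ms floor.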

⬡ THE PAYOFF

You get near-real-time freshness in Snowflake without the per-row credit burn — which means streaming CDC into a columnar warehouse actually makes economic sense, not just technical sense.

Two ends of the same pipe

Taken together, v3.1 is a reach release. More sources can feed Datavor's CDC engine, and one of the most popular warehouse targets now accepts streaming changes in a way that respects how it's actually billed.

Fivetran moves your data.
Datavor understands it — and now reaches further.

Both features ship in v3.1, free to use. Dig into the mechanics on the CDC page, or just tell your AI: "stream the orders table from SQL Server to Snowflake with CDC." That's the whole setup.

