Sub-second replication driven by Postgres logical WAL, MySQL binlog, and SQL Server's native CDC. Tail your source's transaction log, stream INSERTs, UPDATEs, and DELETEs to any target as they happen. Set up in one sentence through any MCP-compatible AI tool.
CDC isn't always the right answer. But when freshness matters — dashboards, ML features, alerting, replication for failover — batch sync's overnight or hourly cadence stops being acceptable.
The moment a row changes in your source, it's in your target. Dashboards reflect reality. Alerts fire on the change that just happened, not the one from this morning.
Hourly batches mean your analytics is between 0 and 60 minutes behind. Daily syncs mean 0 to 24 hours behind. Some questions can't be answered with stale data — fraud detection, capacity planning, ops monitoring.
Batch syncs keyed on updated_at miss hard-deleted rows and lose the intermediate state when the same row is updated twice between batches. The transaction log records every committed change, in commit order, exactly once.
For Postgres and MySQL, Datavor doesn't poll your tables or install triggers — it reads the same transaction log the database uses for its own crash-recovery and replication (Postgres's write-ahead log, MySQL's binlog) and converts the raw entries into MCP-friendly change events. For SQL Server, it polls Microsoft's native CDC functions on an interval — the recommended, version-stable path. Either way, you get ordered change events with a resumable checkpoint.
Open a replication slot (Postgres) or register as a replica (MySQL). The source streams new transactions to Datavor over a long-lived connection.
WAL records and binlog events use database-specific binary formats. Datavor parses each into a normalized JSON event with table, op, before, after, and LSN.
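To make the normalized shape concrete, here is a minimal Python sketch of such an event. The field names follow the list above; the class name, the op codes, and the example values are illustrative assumptions, not Datavor's actual schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical normalized change event (not Datavor's actual schema)."""
    table: str
    op: str                           # assumed codes: "insert", "update", "delete"
    before: Optional[dict[str, Any]]  # row image before the change (None for inserts)
    after: Optional[dict[str, Any]]   # row image after the change (None for deletes)
    lsn: str                          # log position, e.g. a Postgres LSN like "0/16B3748"

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


event = ChangeEvent(
    table="public.orders",
    op="update",
    before={"id": 42, "status": "pending"},
    after={"id": 42, "status": "shipped"},
    lsn="0/16B3748",
)
print(event.to_json())
```

Carrying both row images lets a consumer tell what changed without a lookup against the target.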
Events are batched and applied to the target in transaction order. UPSERT for INSERTs, conditional UPDATE for UPDATEs, DELETE WHERE for DELETEs.
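The apply step can be sketched against an in-memory SQLite target. This is an illustration under assumptions, not Datavor's implementation: the orders table, the event dictionaries, and the reading of "conditional UPDATE" as an update guarded by the before image are all hypothetical.

```python
import sqlite3


def apply_batch(conn, events):
    """Apply events in order inside one transaction (all or nothing)."""
    with conn:  # commits on success, rolls back if any statement raises
        for ev in events:
            if ev["op"] == "insert":
                # UPSERT: replaying the same insert does not fail or duplicate.
                conn.execute(
                    "INSERT INTO orders (id, status) VALUES (:id, :status) "
                    "ON CONFLICT(id) DO UPDATE SET status = excluded.status",
                    ev["after"],
                )
            elif ev["op"] == "update":
                # Assumed meaning of "conditional": only update the row if it
                # still matches the before image.
                conn.execute(
                    "UPDATE orders SET status = :new WHERE id = :id AND status = :old",
                    {
                        "id": ev["after"]["id"],
                        "new": ev["after"]["status"],
                        "old": ev["before"]["status"],
                    },
                )
            elif ev["op"] == "delete":
                conn.execute("DELETE FROM orders WHERE id = ?", (ev["before"]["id"],))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
events = [
    {"op": "insert", "before": None, "after": {"id": 1, "status": "pending"}},
    {"op": "insert", "before": None, "after": {"id": 2, "status": "pending"}},
    {"op": "update", "before": {"id": 1, "status": "pending"},
     "after": {"id": 1, "status": "shipped"}},
    {"op": "delete", "before": {"id": 2, "status": "pending"}, "after": None},
]
apply_batch(conn, events)
rows = conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
print(rows)
```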
Datavor records the last LSN successfully applied. If the stream restarts, it resumes from exactly there — no duplicates, no gaps.
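A rough sketch of why a saved position makes restarts safe: if the source replays events from before the checkpoint, the consumer skips them. Log positions are plain integers here for brevity, and the class and field names are hypothetical.

```python
class CheckpointedApplier:
    """Skips anything at or below the last applied position."""

    def __init__(self):
        # In practice the checkpoint must be stored durably, ideally in the
        # same transaction as the applied rows. Kept in memory for brevity.
        self.checkpoint = 0
        self.applied = []

    def apply(self, events):
        for ev in events:
            if ev["pos"] <= self.checkpoint:
                continue  # already applied before the restart: no duplicate
            self.applied.append(ev["id"])
            self.checkpoint = ev["pos"]


stream = [{"pos": p, "id": f"ev{p}"} for p in (10, 20, 30, 40)]

applier = CheckpointedApplier()
applier.apply(stream[:3])   # stream drops after position 30
applier.apply(stream[1:])   # source replays from position 20 after reconnect
print(applier.applied, applier.checkpoint)
```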
Snowflake hates row-by-row writes — every single-row INSERT is expensive, and a naïve CDC stream that applies one change at a time will melt your credits and crawl. Datavor's Snowflake target doesn't do that.
Instead, it adaptively batches changes based on volume and lands them with a single set-based apply. Small batches go in as a multi-row INSERT into a temp table; larger ones stage a compressed CSV via PUT + COPY INTO. Either way, the batch is then applied to the target with one MERGE INTO — and deletes resolve as a set-based DELETE … WHERE pk IN (…).
You get near-real-time freshness without the per-row credit burn, so streaming CDC into Snowflake actually makes economic sense. Everything runs over the standard Snowflake SDK on a single warehouse session, chunked to stay within Snowflake's MERGE source-size guidance — quiet streams flush promptly, busy streams batch larger.
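The batching idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the row threshold, the function names, and the step that collapses a batch to the last change per key (useful because a MERGE source with duplicate keys is ambiguous). It is not Datavor's actual code.

```python
SMALL_BATCH_ROWS = 1000  # hypothetical cut-over point between the two staging paths


def collapse(changes):
    """Keep only the last change per primary key; changes arrive in log order."""
    latest = {}
    for ch in changes:
        latest[ch["pk"]] = ch  # a later change to the same key wins
    upserts = [c["row"] for c in latest.values() if c["op"] != "delete"]
    deletes = [c["pk"] for c in latest.values() if c["op"] == "delete"]
    return upserts, deletes


def staging_method(n_rows):
    """Pick how to land the batch in a temp table before the MERGE."""
    return "multi_row_insert" if n_rows <= SMALL_BATCH_ROWS else "put_copy_into"


def merge_sql(target, stage, pk, columns):
    """Build one set-based MERGE from the staged rows into the target."""
    sets = ", ".join(f"{c} = s.{c}" for c in columns if c != pk)
    cols = ", ".join(columns)
    vals = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {target} t USING {stage} s ON t.{pk} = s.{pk} "
        f"WHEN MATCHED THEN UPDATE SET {sets} "
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})"
    )


changes = [
    {"op": "insert", "pk": 1, "row": {"id": 1, "status": "pending"}},
    {"op": "update", "pk": 1, "row": {"id": 1, "status": "shipped"}},
    {"op": "insert", "pk": 2, "row": {"id": 2, "status": "pending"}},
    {"op": "delete", "pk": 2, "row": None},
]
upserts, deletes = collapse(changes)
print(staging_method(len(upserts)), upserts, deletes)
print(merge_sql("orders", "orders_stage", "id", ["id", "status"]))
```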
Datavor reads transaction logs that your source database already produces — but most databases ship with logging optimized for crash recovery, not replication. A small config change unlocks CDC. We tell you exactly what.
postgresql.conf. Restart required.
wal_level = logical
max_replication_slots = 4   # room for Datavor + others
max_wal_senders = 4
-- One-off, as superuser
CREATE ROLE datavor_cdc REPLICATION LOGIN PASSWORD '****';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO datavor_cdc;
host replication datavor_cdc 10.0.0.0/8 md5
That's it. Datavor creates the replication slot automatically when you call start_cdc.
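If you want to confirm the Postgres settings before starting a stream, a small preflight check is enough. The sketch below assumes you have already read the three values (for example with SHOW wal_level) into a dictionary; the function name and messages are hypothetical.

```python
def preflight(settings: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means ready for CDC."""
    problems = []
    if settings.get("wal_level") != "logical":
        problems.append("wal_level must be 'logical' (restart required)")
    for name in ("max_replication_slots", "max_wal_senders"):
        if int(settings.get(name, "0")) < 1:
            problems.append(f"{name} must be at least 1")
    return problems


print(preflight({"wal_level": "replica",
                 "max_replication_slots": "4",
                 "max_wal_senders": "0"}))
```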
my.cnf under [mysqld]. Restart required.
server-id = 1
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
expire_logs_days = 7   # keep enough for catch-up; deprecated in MySQL 8.0 in favor of binlog_expire_logs_seconds
CREATE USER 'datavor_cdc'@'%' IDENTIFIED BY '****';
GRANT REPLICATION SLAVE, REPLICATION CLIENT, SELECT ON *.* TO 'datavor_cdc'@'%';
SHOW VARIABLES LIKE 'log_bin'; -- expect: ON
Datavor registers as a replica when start_cdc runs.
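Enable CDC on the database, then on each table you want to stream. Enabling it on the database requires sysadmin.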
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'orders', @role_name = NULL;
When start_cdc runs, Datavor polls SQL Server's CDC functions (cdc.fn_cdc_get_all_changes_*) on a configurable interval and resumes from the last LSN.
-- New in v3.1: SQL Server is now a CDC source, not just a sync source/target.
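The polling model for SQL Server can be sketched as a simple loop. The fetch callable is a hypothetical stand-in for a query against cdc.fn_cdc_get_all_changes_* that returns only changes after the given LSN; integers stand in for SQL Server's binary LSNs, and max_polls exists only so the example terminates.

```python
import time


def poll_changes(fetch, handle, start_lsn, interval_s=1.0, max_polls=None):
    """Poll for changes after the last seen LSN and return the new checkpoint."""
    last = start_lsn
    polls = 0
    while max_polls is None or polls < max_polls:
        for lsn, change in fetch(last):  # changes strictly after `last`, in order
            handle(change)
            last = lsn                   # advance only after the change is handled
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
    return last


# Fake change table standing in for the CDC functions.
log = [(1, "a"), (2, "b"), (3, "c")]
seen = []
last = poll_changes(
    fetch=lambda after: [(lsn, ch) for lsn, ch in log if lsn > after],
    handle=seen.append,
    start_lsn=1,          # resume point saved by a previous run
    interval_s=0,
    max_polls=2,
)
print(seen, last)
```

A shorter interval lowers latency at the cost of more queries against the source.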
These are measured numbers from real CDC streams, not synthetic benchmarks. Conditions stated honestly below.
Conditions: Postgres 16 source on AWS db.m6i.large, target on db.m6i.xlarge, same VPC. Workload: ~5k transactions/sec on a 12-column orders table. Datavor v3.1 on a single npx datavor process. p99 spikes correspond to source-side checkpoint flushes. Your numbers will vary with table width, network, and concurrent load.
CDC is exposed as three MCP tools. Tell your AI to start a stream and it does — no YAML, no UI clicking. The Web UI shows live status; the tools are the control surface. Full reference in the docs.
| Tool | Purpose |
|---|---|
| start_cdc | Open a CDC stream from a source database's WAL/binlog to a target. Creates the replication slot or registers as a replica; resumes from saved LSN if one exists. |
| stop_cdc | End a running CDC stream cleanly. Flushes pending events, saves the LSN checkpoint, releases the replication slot. |
| cdc_status | Inspect a stream: events processed, current source-to-target lag, last event timestamp, apply failures. Same data the Web UI's CDC monitor shows. |
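Under the hood, an MCP client invokes a tool with a JSON-RPC tools/call request. The sketch below builds one for start_cdc. The envelope follows the MCP specification; the argument names (source, target, tables) are assumptions for illustration, so check the docs for the real parameters.

```python
import json


def tool_call(request_id, name, arguments):
    """Build an MCP tools/call request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call(
    1,
    "start_cdc",
    # Hypothetical argument names; the real tool schema may differ.
    {"source": "prod-pg", "target": "analytics-pg", "tables": ["orders"]},
)
print(json.dumps(request, indent=2))
```

You normally never write this yourself: the AI tool produces the call from your sentence.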
CDC adds complexity. Anyone selling you "set and forget" replication is lying. Datavor handles these well — but you should know about them.
If your CDC consumer falls behind, Postgres retains WAL until you catch up. Run out of disk before catching up → outage.
cdc_status always shows slot state.
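To see how much WAL a slot is holding back, compare the server's current LSN with the slot's restart_lsn (Postgres exposes both, and pg_wal_lsn_diff does the same arithmetic server-side). A minimal sketch of that calculation, with a hypothetical alert threshold:

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a Postgres LSN like '0/3000000' to a byte offset."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL the server must keep for this slot."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


def slot_at_risk(current_lsn, restart_lsn, max_retained_bytes):
    """True when retained WAL exceeds a budget you choose (hypothetical policy)."""
    return retained_wal_bytes(current_lsn, restart_lsn) > max_retained_bytes


retained = retained_wal_bytes("0/3000000", "0/1000000")
print(retained, slot_at_risk("0/3000000", "0/1000000", 16 * 1024 * 1024))
```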
If expire_logs_days rolls binlogs faster than you consume, your stream's checkpoint becomes invalid and the stream can't resume.
get_suggestions.
Adding a column on the source mid-stream means later events have data the target doesn't have a place to put.
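Detecting this kind of drift is mostly a set difference between the columns in an incoming event and the columns the target has. The sketch below is illustrative only: the type mapping and the generated ALTER statements are assumptions, not what Datavor actually suggests.

```python
# Hypothetical mapping from Python value types to target column types.
TYPE_FOR = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE PRECISION", str: "TEXT"}


def suggest_alters(table, event_row, target_columns):
    """Suggest one ADD COLUMN per event column the target is missing."""
    missing = sorted(set(event_row) - set(target_columns))
    return [
        f"ALTER TABLE {table} ADD COLUMN {col} "
        f"{TYPE_FOR.get(type(event_row[col]), 'TEXT')}"
        for col in missing
    ]


suggestions = suggest_alters(
    "orders",
    {"id": 1, "status": "new", "coupon_code": "SPRING", "gift": True},
    {"id", "status"},
)
print(suggestions)
```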
If apply fails on some rows (FK violations, type mismatches), naïve CDC either stops or silently drops events. Both are bad.
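A third option is to set the failing event aside and keep going. Here is a minimal sketch of that quarantine pattern; the function names and the shape of the quarantine record are hypothetical.

```python
def apply_with_quarantine(events, apply_one):
    """Apply each event; park failures with their error instead of stopping."""
    applied = 0
    quarantined = []
    for ev in events:
        try:
            apply_one(ev)
            applied += 1
        except Exception as exc:
            # Keep the event and the reason so it can be inspected and retried.
            quarantined.append({"event": ev, "error": str(exc)})
    return applied, quarantined


def fake_apply(ev):
    """Stand-in for a real target write; rejects one row to show the flow."""
    if ev["customer_id"] is None:
        raise ValueError("FK violation: customer_id is required")


events = [
    {"id": 1, "customer_id": 7},
    {"id": 2, "customer_id": None},
    {"id": 3, "customer_id": 9},
]
applied, quarantined = apply_with_quarantine(events, fake_apply)
print(applied, quarantined)
```

One caveat with any scheme like this: later events for a quarantined row may depend on the one that failed, so quarantined events need review rather than blind replay.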
Don't want to watch a dashboard? Datavor's External Alerting pushes cdc_error and cdc_stopped events to Slack or any webhook the moment a stream has trouble.
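For a sense of what lands on the webhook side, here is a sketch that builds (but does not send) a Slack-style POST. The event types come from the text above; the message layout, function name, and URL are placeholders, not Datavor's actual payload.

```python
import json
import urllib.request

ALERT_TYPES = {"cdc_error", "cdc_stopped"}


def build_alert_request(webhook_url, event_type, stream, detail):
    """Build a JSON POST carrying a Slack-compatible {"text": ...} body."""
    if event_type not in ALERT_TYPES:
        raise ValueError(f"unknown alert type: {event_type}")
    body = json.dumps({"text": f"[{event_type}] stream {stream}: {detail}"})
    return urllib.request.Request(
        webhook_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_alert_request(
    "https://example.invalid/hook",  # placeholder URL
    "cdc_error",
    "orders",
    "apply failed on 3 rows",
)
print(req.get_method(), req.data.decode("utf-8"))
```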
| Capability | Datavor | Fivetran | Airbyte | Debezium |
|---|---|---|---|---|
| Sub-second latency | ✓ | ~min | ~min | ✓ |
| Postgres WAL · MySQL binlog · SQL Server CDC | ✓ | ✓ | ✓ | ✓ |
| Set up via natural language | ✓ | — | — | — |
| Runs locally — no cloud account needed | ✓ | — | OSS | ✓ |
| No Kafka required | ✓ | ✓ | ✓ | — |
| Schema-change auto-suggestion | ✓ | ✓ | partial | — |
| Per-event fault tolerance + quarantine | ✓ | — | — | — |
| Pricing scales with rows? (it shouldn't) | No | MAR-based | rows | No |
"Stream the orders table from prod-pg to analytics-pg with CDC, starting now." That's the whole setup, once Datavor is installed.