3 MCP tools · sub-second latency · WAL · binlog · SQL Server

Change Data Capture.
Real-time, real-simple.

Sub-second replication driven by Postgres logical WAL, MySQL binlog, and SQL Server's native CDC. Tail your source's transaction log, stream INSERTs, UPDATEs, and DELETEs to any target as they happen. Set up in one sentence through any MCP-compatible AI tool.


Why not just batch sync?

CDC isn't always the right answer. But when freshness matters — dashboards, ML features, alerting, replication for failover — batch sync's overnight or hourly cadence stops being acceptable.

240ms
CDC latency

Real-time.

The moment a row changes in your source, it's in your target. Dashboards reflect reality. Alerts fire on the change that just happened, not the one from this morning.

1hr+
Batch sync cadence

Stale.

Hourly batches mean your analytics is between 0 and 60 minutes behind. Daily syncs mean 0 to 24 hours behind. Some questions can't be answered with stale data — fraud detection, capacity planning, ops monitoring.

100%
Change capture

Lossless.

Batch syncs based on updated_at miss hard deletes, which leave no row behind to carry a timestamp, and collapse intermediate states when the same row is updated twice between batches. WAL/binlog captures every committed change, in order, exactly once.
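A toy illustration of that gap, using a made-up change history between two batch runs. The `Change` type and the sample data are hypothetical; the point is only to compare what a `WHERE updated_at > since` poll can see with what a log reader sees.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Change:
    op: str                 # "insert", "update", or "delete"
    pk: int
    status: Optional[str]
    ts: int                 # logical commit time

# Hypothetical history between two batch runs (at ts=0 and ts=10).
log = [
    Change("insert", 1, "pending", 1),
    Change("update", 1, "paid", 2),
    Change("update", 1, "refunded", 3),
    Change("insert", 2, "pending", 4),
    Change("delete", 2, None, 5),
]

def table_state(changes):
    """Replay the log to get the rows a query would see afterwards."""
    rows = {}
    for c in changes:
        if c.op == "delete":
            rows.pop(c.pk, None)
        else:
            rows[c.pk] = {"status": c.status, "updated_at": c.ts}
    return rows

def batch_poll(rows, since):
    """What a poll on updated_at > since would return."""
    return {pk: r for pk, r in rows.items() if r["updated_at"] > since}

seen_by_batch = batch_poll(table_state(log), since=0)
seen_by_cdc = [(c.op, c.pk, c.status) for c in log]

print(seen_by_batch)   # only the final state of row 1
print(seen_by_cdc)     # all five changes, in order
```

The poll sees one row in its final state. The "paid" state of row 1 and the whole life of row 2 never show up, while the log reader gets all five changes.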

How Datavor's CDC actually works.

For Postgres and MySQL, Datavor doesn't poll your tables or install triggers — it reads the same transaction log the database uses for its own crash-recovery and replication (Postgres's write-ahead log, MySQL's binlog) and converts the raw entries into MCP-friendly change events. For SQL Server, it polls Microsoft's native CDC functions on an interval — the recommended, version-stable path. Either way, you get ordered change events with a resumable checkpoint.

[Diagram: SOURCE DB (Postgres / MySQL, WAL / binlog transaction log) → streaming replication → DATAVOR CDC Engine (parse WAL/binlog, map to events, checkpoint LSN) → change events in order → TARGET (warehouse / DB, apply INSERT / UPDATE / DELETE)]
① TAIL

Connect to the log

Open a replication slot (Postgres) or register as a replica (MySQL). The source streams new transactions to Datavor over a long-lived connection.

② PARSE

Decode entries

WAL records and binlog events use database-specific binary formats. Datavor parses each into a normalized JSON event with table, op, before, after, and LSN.

③ APPLY

Write to target

Events are batched and applied to the target in transaction order. UPSERT for INSERTs, conditional UPDATE for UPDATEs, DELETE WHERE for DELETEs.

④ CHECKPOINT

Save position

Datavor records the last LSN successfully applied. If the stream restarts, it resumes from exactly there — no duplicates, no gaps.
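The four steps can be sketched end to end with an in-memory SQLite target. This is a minimal illustration under stated assumptions, not Datavor's implementation: the `ChangeEvent` class, the integer LSNs, the `checkpoint` table, and the `orders` schema are all hypothetical, and SQLite's `INSERT OR REPLACE` stands in for an upsert.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeEvent:
    table: str
    op: str                  # "insert" | "update" | "delete"
    before: Optional[dict]   # row image before the change (None for inserts)
    after: Optional[dict]    # row image after the change (None for deletes)
    lsn: int                 # simplified to an int for this sketch

def apply_event(db, ev):
    if ev.op == "insert":
        # Upsert, so replaying an insert is harmless.
        db.execute("INSERT OR REPLACE INTO orders (id, status) "
                   "VALUES (:id, :status)", ev.after)
    elif ev.op == "update":
        db.execute("UPDATE orders SET status = :status WHERE id = :id", ev.after)
    elif ev.op == "delete":
        db.execute("DELETE FROM orders WHERE id = :id", ev.before)

def run(db, events):
    (last,) = db.execute("SELECT lsn FROM checkpoint").fetchone()
    for ev in sorted(events, key=lambda e: e.lsn):
        if ev.lsn <= last:
            continue          # already applied before the restart
        with db:              # change and checkpoint commit together
            apply_event(db, ev)
            db.execute("UPDATE checkpoint SET lsn = ?", (ev.lsn,))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE checkpoint (lsn INTEGER NOT NULL)")
db.execute("INSERT INTO checkpoint VALUES (0)")
db.commit()

events = [
    ChangeEvent("orders", "insert", None, {"id": 1, "status": "pending"}, 101),
    ChangeEvent("orders", "update", {"id": 1, "status": "pending"},
                {"id": 1, "status": "paid"}, 102),
    ChangeEvent("orders", "insert", None, {"id": 2, "status": "pending"}, 103),
    ChangeEvent("orders", "delete", {"id": 2, "status": "pending"}, None, 104),
]

run(db, events)
run(db, events)   # simulated restart: every event is at or below the checkpoint

rows = db.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
(saved_lsn,) = db.execute("SELECT lsn FROM checkpoint").fetchone()
print(rows, saved_lsn)
```

Writing the checkpoint in the same transaction as the change is what makes the replay in the second `run` a no-op. That only works when the checkpoint lives in the target itself; a checkpoint stored elsewhere would need idempotent applies instead.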

⬡ NEW IN v3.1 CDC → Snowflake batch writer

Streaming changes into a columnar warehouse.

Snowflake hates row-by-row writes — every single-row INSERT is expensive, and a naïve CDC stream that applies one change at a time will melt your credits and crawl. Datavor's Snowflake target doesn't do that.

Instead, it adaptively batches changes based on volume and lands them with a single set-based apply. Small batches go in as a multi-row INSERT into a temp table; larger ones stage a compressed CSV via PUT + COPY INTO. Either way, the batch is then applied to the target with one MERGE INTO — and deletes resolve as a set-based DELETE … WHERE pk IN (…).

You get near-real-time freshness without the per-row credit burn, so streaming CDC into Snowflake actually makes economic sense. Everything runs over the standard Snowflake SDK on a single warehouse session, chunked to stay within Snowflake's MERGE source-size guidance — quiet streams flush promptly, busy streams batch larger.
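A rough sketch of the batching decision described above. The row-count cutoff, the function names, and the idea of collapsing each batch to the last change per key are assumptions for illustration, not Datavor's actual logic. The collapse step is there because a set-based MERGE works best with at most one source row per target key.

```python
SMALL_BATCH_MAX_ROWS = 1_000   # hypothetical cutoff, not Datavor's actual value

def collapse(changes):
    """Keep only the last change per primary key, in order of last arrival."""
    latest = {}
    for ch in changes:               # changes arrive in commit order
        latest.pop(ch["pk"], None)   # re-insert so dict order tracks recency
        latest[ch["pk"]] = ch
    return list(latest.values())

def plan_batch(changes):
    """Split a batch into MERGE rows and DELETE keys, and pick a load path."""
    final = collapse(changes)
    upserts = [c for c in final if c["op"] != "delete"]
    delete_pks = [c["pk"] for c in final if c["op"] == "delete"]
    if len(upserts) <= SMALL_BATCH_MAX_ROWS:
        load_path = "multi_row_insert"   # small: INSERT into a temp table
    else:
        load_path = "put_copy_into"      # large: stage a file, then COPY INTO
    return {"load_path": load_path, "upserts": upserts, "delete_pks": delete_pks}

changes = [
    {"pk": 1, "op": "insert", "row": {"status": "pending"}},
    {"pk": 1, "op": "update", "row": {"status": "paid"}},
    {"pk": 2, "op": "insert", "row": {"status": "pending"}},
    {"pk": 2, "op": "delete", "row": None},
]
plan = plan_batch(changes)
print(plan)
```

Here four row-level changes reduce to one row for the MERGE and one key for the set-based DELETE.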

What your source database needs.

Datavor reads transaction logs that your source database already produces — but most databases ship with logging optimized for crash recovery, not replication. A small config change unlocks CDC. We tell you exactly what.

PostgreSQL 12+ · including Supabase, Neon, RDS
1. Set wal_level = logical
In postgresql.conf. Restart required.
wal_level = logical
max_replication_slots = 4     # room for Datavor + others
max_wal_senders = 4
2. Grant REPLICATION
-- One-off, as superuser
CREATE ROLE datavor_cdc REPLICATION LOGIN PASSWORD '****';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO datavor_cdc;
3. Add to pg_hba.conf
host  replication  datavor_cdc  10.0.0.0/8  md5
That's it. Datavor creates the replication slot automatically when you call start_cdc.
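Before starting a stream, it can help to sanity-check the three settings from step 1. The helper below is a hypothetical pre-flight check, not part of Datavor. It assumes you have already collected the values (for example from `SHOW wal_level;`) into a dict of strings with whatever Postgres client you use.

```python
def check_postgres_settings(settings):
    """Return a list of problems; an empty list means the basics look right."""
    problems = []
    if settings.get("wal_level") != "logical":
        problems.append("wal_level must be 'logical'")
    for name in ("max_replication_slots", "max_wal_senders"):
        if int(settings.get(name, 0)) < 1:
            problems.append(f"{name} must be at least 1")
    return problems

# Values as strings, the way SHOW returns them.
print(check_postgres_settings(
    {"wal_level": "replica", "max_replication_slots": "4", "max_wal_senders": "0"}
))
```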
MySQL 5.7+ · also MariaDB 10.x
1. Enable row-based binlog
In my.cnf under [mysqld]. Restart required.
server-id = 1
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
expire_logs_days = 7          # keep enough for catch-up; on MySQL 8.0+ use binlog_expire_logs_seconds
2. Grant REPLICATION privileges
CREATE USER 'datavor_cdc'@'%' IDENTIFIED BY '****';
GRANT REPLICATION SLAVE, REPLICATION CLIENT, SELECT
  ON *.* TO 'datavor_cdc'@'%';
3. Verify binlog is on
SHOW VARIABLES LIKE 'log_bin';  -- expect: ON
Datavor registers as a replica when start_cdc runs.
SQL Server 2017+ · also Azure SQL MI
1. Enable CDC on the database
SQL Server has native CDC built in. Enable it once per database, as a member of sysadmin.
EXEC sys.sp_cdc_enable_db;
2. Enable CDC per table
EXEC sys.sp_cdc_enable_table
  @source_schema = N'dbo',
  @source_name  = N'orders',
  @role_name    = NULL;
3. Confirm SQL Server Agent is running
CDC capture relies on the Agent's capture job. When start_cdc runs, Datavor polls SQL Server's CDC functions (cdc.fn_cdc_get_all_changes_*) on a configurable interval and resumes from the last LSN.
-- New in v3.1: SQL Server is now a CDC source,
-- not just a sync source/target.
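For a sense of what one polling pass over SQL Server's CDC functions looks like, here is a hedged sketch. The T-SQL uses documented system functions (`sys.fn_cdc_increment_lsn`, `sys.fn_cdc_get_max_lsn`, `cdc.fn_cdc_get_all_changes_<capture_instance>`). The Python helpers, the `@last_lsn` parameter name, and the event shape are hypothetical, and how the parameter is bound depends on your driver. `dbo_orders` is the default capture instance name for the table enabled in step 2.

```python
# __$operation codes returned by cdc.fn_cdc_get_all_changes_<capture_instance>
OPERATIONS = {1: "delete", 2: "insert", 3: "update_before", 4: "update_after"}

def build_poll_query(capture_instance: str) -> str:
    """T-SQL for one pass: everything after @last_lsn up to the current max LSN."""
    if not capture_instance.replace("_", "").isalnum():
        raise ValueError("unexpected capture instance name")
    return (
        "DECLARE @from binary(10) = sys.fn_cdc_increment_lsn(@last_lsn);\n"
        "DECLARE @to binary(10) = sys.fn_cdc_get_max_lsn();\n"
        "IF @from <= @to\n"
        f"    SELECT * FROM cdc.fn_cdc_get_all_changes_{capture_instance}"
        "(@from, @to, N'all');"
    )

def to_event(row: dict) -> dict:
    """Map one CDC result row to an op plus the captured table columns."""
    data = {k: v for k, v in row.items() if not k.startswith("__$")}
    return {
        "op": OPERATIONS[row["__$operation"]],
        "lsn": row["__$start_lsn"],
        "row": data,
    }

query = build_poll_query("dbo_orders")
sample = {"__$start_lsn": b"\x00" * 9 + b"\x2a", "__$operation": 2,
          "id": 7, "status": "pending"}
event = to_event(sample)
print(query)
print(event)
```

The last LSN returned by a pass becomes `@last_lsn` for the next one, which is what makes the poll resumable.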

What you can actually expect.

These are measured numbers from real CDC streams, not synthetic benchmarks. Conditions stated honestly below.

240 ms · p50 end-to-end latency
~800 ms · p99 latency
15k events/sec · throughput, single stream
<1% CPU · source DB overhead

Conditions: Postgres 16 source on AWS db.m6i.large, target on db.m6i.xlarge, same VPC. Workload: ~5k transactions/sec on a 12-column orders table. Datavor v3.1 on a single npx datavor process. p99 spikes correspond to source-side checkpoint flushes. Your numbers will vary with table width, network, and concurrent load.
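If you want to compare your own streams against these figures, p50 and p99 can be computed from per-event lag samples (target apply time minus source commit time). The sketch below uses the nearest-rank method. The sample values are invented for illustration and are not Datavor measurements.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical per-event lags in milliseconds.
lags_ms = [180, 210, 230, 240, 240, 250, 260, 300, 420, 810]

p50 = percentile(lags_ms, 50)
p99 = percentile(lags_ms, 99)
print(p50, p99)
```

With only ten samples, p99 is simply the slowest event. A meaningful p99 needs thousands of samples.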

The 3 MCP tools.

CDC is exposed as three MCP tools. Tell your AI to start a stream and it does — no YAML, no UI clicking. The Web UI shows live status; the tools are the control surface. Full reference in the docs.

Tool | Purpose
start_cdc | Open a CDC stream from a source database's WAL/binlog to a target. Creates the replication slot or registers as a replica; resumes from saved LSN if one exists.
stop_cdc | End a running CDC stream cleanly. Flushes pending events, saves the LSN checkpoint, releases the replication slot.
cdc_status | Inspect a stream: events processed, current source-to-target lag, last event timestamp, apply failures. Same data the Web UI's CDC monitor shows.
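Under the hood, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds one for start_cdc. The envelope follows the MCP specification, but the argument names (`source`, `target`, `tables`) are assumptions for illustration; the real schema is in the docs.

```python
import json

def tool_call(request_id, tool, arguments):
    """Build an MCP `tools/call` request (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

# Argument names are hypothetical; check the tool reference for the real ones.
request = tool_call(
    1,
    "start_cdc",
    {"source": "prod-pg", "target": "analytics-pg", "tables": ["orders"]},
)
print(json.dumps(request, indent=2))
```

In practice your AI tool builds this request for you from the sentence you type.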

The honest stuff: how CDC fails.

CDC adds complexity. Anyone selling you "set and forget" replication is lying. Datavor handles these well — but you should know about them.

Replication slot fills up postgres

If your CDC consumer falls behind, Postgres retains WAL until you catch up. Run out of disk before catching up → outage.

⬡ DATAVOR Monitors slot size, warns at 60% disk. Auto-pauses & alerts at 85% rather than risk source outage. cdc_status always shows slot state.
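A minimal sketch of that policy, assuming the 60% and 85% thresholds apply to disk usage on the source. The SQL is a standard Postgres query for WAL retained per slot. The function name and return values are hypothetical.

```python
# Postgres query for WAL retained by each slot; run it with your own client.
SLOT_LAG_SQL = """
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;
"""

WARN_AT, PAUSE_AT = 0.60, 0.85   # thresholds quoted above

def slot_action(disk_used_bytes, disk_total_bytes):
    """Decide what to do based on how full the source disk is."""
    used = disk_used_bytes / disk_total_bytes
    if used >= PAUSE_AT:
        return "pause_and_alert"
    if used >= WARN_AT:
        return "warn"
    return "ok"

print(slot_action(59, 100), slot_action(60, 100), slot_action(85, 100))
```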

Binlog rotation outpaces consumer mysql

If expire_logs_days rolls binlogs faster than you consume, your stream's checkpoint becomes invalid and the stream can't resume.

⬡ DATAVOR Recommends min retention based on your throughput. Detects rolled-out binlogs at startup, surfaces a clear recovery suggestion via get_suggestions.
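One way to reason about retention is the back-of-the-envelope heuristic below. It is not Datavor's recommendation formula, and the function name and safety factor are assumptions. The oldest binlog a stalled consumer needs is as old as the outage itself, so retention has to cover the longest outage you want to survive, and catching up is only possible if the consumer reads faster than the source writes.

```python
def retention_plan(max_outage_hours, write_rate, consume_rate, safety_factor=2.0):
    """Rates share one unit (e.g. events/sec). Returns hours of binlog to keep
    and how long catch-up takes after the longest tolerated outage."""
    if consume_rate <= write_rate:
        raise ValueError("consumer can never catch up at these rates")
    # Backlog built during the outage drains at (consume_rate - write_rate).
    catch_up_hours = max_outage_hours * write_rate / (consume_rate - write_rate)
    return {
        "min_retention_hours": safety_factor * max_outage_hours,
        "catch_up_hours": catch_up_hours,
    }

# Example: survive an 8-hour outage at 5k events/sec written, 15k consumed.
plan = retention_plan(8, 5_000, 15_000)
print(plan)
```

In this example the plan is 16 hours of retention and a 4-hour catch-up, comfortably inside a 7-day expiry.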

Schema change mid-stream both

Adding a column on the source mid-stream means later events have data the target doesn't have a place to put.

⬡ DATAVOR Detects ALTER TABLE events, pauses the stream, surfaces "mirror this change?" via SuggestionEngine. Accept to apply, dismiss to skip the new column.
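The underlying check is simple to picture: compare the columns in an incoming event with the columns the target has. The helpers below are a hypothetical sketch of the two outcomes (accept or dismiss), not Datavor's SuggestionEngine.

```python
def new_columns(event_row, target_columns):
    """Columns present in a change event but missing from the target table."""
    return sorted(set(event_row) - set(target_columns))

def project(event_row, target_columns):
    """'Dismiss' path: drop columns the target has no place for."""
    return {k: v for k, v in event_row.items() if k in target_columns}

target_columns = ["id", "status"]
event_row = {"id": 1, "status": "paid", "coupon": "SPRING"}

missing = new_columns(event_row, target_columns)
trimmed = project(event_row, target_columns)
print(missing, trimmed)
```

On the accept path, the missing columns would be added to the target before the stream resumes.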

Target falls behind, becomes inconsistent both

If apply fails on some rows (FK violations, type mismatches), naïve CDC either stops or silently drops events. Both are bad.

⬡ DATAVOR Per-event fault tolerance. Failed events go to a quarantine table in the target, the stream keeps running, the ErrorLearner records the pattern for next time.
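A minimal sketch of per-event fault tolerance with a quarantine table, using in-memory SQLite and a foreign-key violation as the failure. The table name `cdc_quarantine`, the schema, and the event shape are assumptions for illustration.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
           "customer_id INTEGER NOT NULL REFERENCES customers(id))")
db.execute("CREATE TABLE cdc_quarantine (lsn INTEGER, payload TEXT, error TEXT)")
db.execute("INSERT INTO customers VALUES (1)")
db.commit()

def apply_with_quarantine(db, events):
    """Apply each event in its own transaction; park failures, keep going."""
    applied = 0
    for ev in events:
        try:
            with db:   # rolls back this event only if it fails
                db.execute("INSERT INTO orders (id, customer_id) "
                           "VALUES (:id, :customer_id)", ev["after"])
            applied += 1
        except sqlite3.Error as exc:
            with db:
                db.execute("INSERT INTO cdc_quarantine VALUES (?, ?, ?)",
                           (ev["lsn"], json.dumps(ev["after"]), str(exc)))
    return applied

events = [
    {"lsn": 1, "after": {"id": 10, "customer_id": 1}},
    {"lsn": 2, "after": {"id": 11, "customer_id": 99}},   # no such customer
    {"lsn": 3, "after": {"id": 12, "customer_id": 1}},
]
applied = apply_with_quarantine(db, events)
quarantined = db.execute("SELECT lsn, payload FROM cdc_quarantine").fetchall()
print(applied, quarantined)
```

The bad event lands in quarantine with its payload and error text, and the events after it still apply.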

Don't want to watch a dashboard? Datavor's External Alerting pushes cdc_error and cdc_stopped events to Slack or any webhook the moment a stream has trouble.

How Datavor's CDC stacks up.

Capability | Datavor | Fivetran | Airbyte | Debezium
Sub-second latency | ~min | ~min
Postgres WAL · MySQL binlog · SQL Server CDC
Set up via natural language
Runs locally, no cloud account needed | OSS
No Kafka required
Schema-change auto-suggestion | partial
Per-event fault tolerance + quarantine
Pricing scales with rows? (it shouldn't) | No | MAR-based | rows | No

Real-time replication in one sentence.

"Stream the orders table from prod-pg to analytics-pg with CDC, starting now." That's the whole setup, once Datavor is installed.