6 MCP tools · per-record fault tolerance · saved recipes

Sync. Transform.
In any direction.

Datavor's ETL core. Six tools: four sync modes (full, partial, incremental, with-transforms) plus two query-level tools for raw SQL and read-only fetches. Together they cover the spectrum from "copy this table" to "sync only yesterday's orders, transform timestamps to UTC, skip test rows."

Pick your sync mode.

Most ETL tools give you one or two ways to move data. Datavor gives you six, each tuned for a specific situation. Your AI tool picks the right one based on what you say — but it helps to know what they do.

⬡ sync_table

Full sync — every row, every time

Full
When to use

Small-to-medium tables where every row matters. Reference data (countries, currencies, products). Initial loads before incremental kicks in. Anything under ~1M rows where snapshot semantics beat tracking complexity.

  • Truncate target, reload from source
  • Atomic — target is consistent at end of sync
  • Idempotent — running twice == running once
In conversation
You: "Copy the products table from prod-pg to analytics-pg, fresh every night."
Claude: Calls scheduler_create_job wrapping sync_table with daily cron.

// Generated tool call:
sync_table(
  source="prod-pg",
  target="analytics-pg",
  table="products",
  mode="full"
)
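
To make the "truncate, reload, atomic, idempotent" behavior concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not Datavor's implementation: the full_sync helper is hypothetical, and SQLite stands in for the real source and target databases.

```python
import sqlite3


def full_sync(source: sqlite3.Connection, target: sqlite3.Connection, table: str) -> int:
    """Replace every row of `table` in the target with the source's current rows.

    `table` is interpolated into SQL, so it must come from a trusted list of
    table names, never from user input.
    """
    rows = source.execute(f"SELECT * FROM {table}").fetchall()
    # One transaction: the target holds either the old snapshot or the new one.
    # If anything raises inside the block, the delete is rolled back.
    with target:
        target.execute(f"DELETE FROM {table}")  # SQLite has no TRUNCATE
        if rows:
            placeholders = ", ".join("?" * len(rows[0]))
            target.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return len(rows)
```

Because the target is rebuilt from scratch each time, a second run against an unchanged source leaves the target exactly as the first run did.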
⬡ sync_table_partial

Filtered sync — only rows matching a WHERE clause

WHERE
When to use

You want a slice, not the whole table. Yesterday's orders, customers in a region, the top 10% most active users. Anything expressible as SQL WHERE.

  • SQL WHERE clause defines the slice
  • Combines with full or upsert modes
  • Useful for dev/staging environment seeding
In conversation
You: "Copy yesterday's orders from prod to staging."
Claude: Recognizes the time predicate, calls sync_table_partial.

// Generated tool call:
sync_table_partial(
  source="prod-pg",
  target="staging-pg",
  table="orders",
  where="created_at >= CURRENT_DATE - INTERVAL '1 day' AND created_at < CURRENT_DATE"
)
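
A "yesterday" slice is usually safest as a half-open range: from the start of yesterday up to, but not including, the start of today. The sketch below builds such a predicate. The yesterday_where helper is hypothetical and only illustrates the shape of the WHERE clause.

```python
from datetime import date, timedelta


def yesterday_where(column: str, today: date) -> tuple[str, tuple[str, str]]:
    """Return a WHERE fragment and its parameters covering all of yesterday.

    The range is half-open (>= start, < end), so rows created today are
    excluded and no row is counted twice across consecutive days.
    """
    start = today - timedelta(days=1)
    clause = f"{column} >= ? AND {column} < ?"
    return clause, (start.isoformat(), today.isoformat())
```

For example, calling yesterday_where("created_at", date(2026, 5, 20)) yields the clause "created_at >= ? AND created_at < ?" with the bounds 2026-05-19 and 2026-05-20.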
⬡ sync_table_incremental

Incremental sync — only new + updated rows

Cursor-based
When to use

Large tables that grow over time. Datavor remembers the last sync's max updated_at (or any timestamp/sequence column you pick) and fetches only what's changed since. Orders of magnitude faster than full sync for log-shaped data.

  • Cursor column auto-detected when possible
  • Overlap window for late-arriving updates
  • Falls back to full sync if cursor is missing
In conversation
You: "Sync the orders table from prod to analytics every 15 minutes."
Claude: Sees high-volume table, picks incremental.

// Generated tool call:
sync_table_incremental(
  source="prod-pg",
  target="analytics-pg",
  table="orders",
  cursor_column="updated_at",
  overlap_minutes=5
)
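
The cursor and overlap window can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Datavor's code: rows are plain dicts, the target is a dict keyed by id, and "no stored cursor yet" is treated as the case that falls back to copying everything.

```python
from datetime import datetime, timedelta


def incremental_sync(source_rows, target, last_cursor,
                     cursor_column="updated_at", overlap_minutes=5):
    """Upsert rows changed since the last run and return the new cursor value."""
    if last_cursor is None:
        # No cursor stored yet: behave like a full sync.
        changed = list(source_rows)
    else:
        # Re-read a small window before the cursor to catch late-arriving updates.
        lower = last_cursor - timedelta(minutes=overlap_minutes)
        changed = [r for r in source_rows if r[cursor_column] >= lower]

    for row in changed:
        target[row["id"]] = dict(row)  # upsert by primary key

    # The cursor never moves backwards, even if only overlap rows were re-read.
    candidates = [r[cursor_column] for r in changed]
    if last_cursor is not None:
        candidates.append(last_cursor)
    return max(candidates) if candidates else None
```

Upserting by key is what makes the overlap safe: rows re-read inside the window overwrite themselves instead of creating duplicates.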
⬡ sync_table_with_transforms

Sync + inline column transforms

Recipes
When to use

The data crosses a boundary where it needs to change shape. Anonymize PII before warehouse load. Normalize phone numbers. Cents-to-dollars. Lower-case emails. Combine first+last into a single name. Anything column-level, while syncing.

  • Inline transforms — no separate stage
  • Apply saved recipes by name
  • Context Engine stores recipes for reuse
In conversation
You: "Sync orders to analytics, but mask the customer_email and convert amount_cents to amount_usd."
Claude: Calls sync_table_with_transforms.

// Generated tool call:
sync_table_with_transforms(
  source="prod-pg",
  target="analytics-pg",
  table="orders",
  transforms=[
    {column: "customer_email", op: "hash_sha256"},
    {column: "amount_cents", op: "divide", by: 100, as: "amount_usd"}
  ]
)
⬡ execute_query

Raw SQL — when only SQL will do

SQL
When to use

One-off transformations, schema migrations, custom joins. When the answer requires more than table-to-table movement. Datavor doesn't try to abstract SQL away — it embraces it for the cases where it's the right tool.

  • SELECT, INSERT, UPDATE, DELETE, DDL
  • Parameterized — no string concatenation injection risk
  • Result limits configurable (default 10k rows)
In conversation
You: "Add a tier column to customers, with values from the new tiers table."
Claude: Issues two execute_query calls.

// 1. Add the column
execute_query(
  connection="prod-pg",
  sql="ALTER TABLE customers ADD COLUMN tier varchar(20)"
)

// 2. Backfill it
execute_query(
  connection="prod-pg",
  sql="UPDATE customers c SET tier = t.tier FROM tiers t WHERE c.id = t.customer_id"
)
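
The "parameterized" point is worth seeing once. The sketch below uses Python's sqlite3 module to show how bound parameters keep values separate from the SQL text. The set_tier helper and the in-memory table are illustrative assumptions; how execute_query accepts parameters is not shown on this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, tier TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "O'Brien")],  # the quote in O'Brien needs no escaping
)
conn.commit()


def set_tier(conn: sqlite3.Connection, customer_id: int, tier: str) -> int:
    """Update one customer's tier and return the number of rows changed."""
    # Values are bound separately from the statement, so quotes or SQL
    # keywords inside `tier` are stored as data and never executed.
    with conn:
        cur = conn.execute(
            "UPDATE customers SET tier = ? WHERE id = ?", (tier, customer_id)
        )
    return cur.rowcount
```

A hostile value such as "gold'; DROP TABLE customers; --" ends up stored verbatim in the tier column, and the table is untouched.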
⬡ get_table_data

Read-only fetch — peek without copying

Read
When to use

Inspecting data before deciding what to do with it. Sample a few rows. Spot-check after a sync. Most-recent log entries. Quick "what does this column actually look like" lookups during conversation.

  • Optional WHERE clause and ORDER BY
  • Default LIMIT 100 to keep tokens manageable
  • Returns rows as JSON — easy for the AI to reason over
In conversation
You: "Show me the latest 5 orders."
Claude: Calls get_table_data, summarizes inline.

// Generated tool call:
get_table_data(
  connection="prod-pg",
  table="orders",
  order_by="created_at DESC",
  limit=5
)

A transform recipe, end-to-end.

One concrete example, top to bottom. Source data is messy — leading whitespace, mixed-case emails, cents-as-integers, raw timestamps. Target needs it clean. Here's the recipe, and what it produces.

① SOURCE orders @ prod-pg
-- 3 rows, raw from production
id: 4821
email: "  Alice@ACME.COM"
amount_cents: 12950
created_at: "2026-05-19 14:23:11"
  -- no TZ specified

id: 4822
email: "BOB@example.org"
amount_cents: 8400
created_at: "2026-05-19 14:23:42"

id: 4823
email: "  carol@test.io  "
amount_cents: 22500
created_at: "2026-05-19 14:24:08"
② RECIPE prod_to_warehouse
// Saved by save_recipe
// Reused via apply_recipe
{
  "name": "prod_to_warehouse",
  "version": 3,
  "transforms": [
    {
      "column": "email",
      "ops": ["trim", "lowercase"]
    },
    {
      "column": "amount_cents",
      "op": "divide",
      "by": 100,
      "rename_to": "amount_usd"
    },
    {
      "column": "created_at",
      "op": "to_utc",
      "source_tz": "America/New_York"
    }
  ]
}
③ TARGET orders @ warehouse
-- After applying recipe
id: 4821
email: "alice@acme.com"
amount_usd: 129.50
created_at: "2026-05-19 18:23:11Z"

id: 4822
email: "bob@example.org"
amount_usd: 84.00
created_at: "2026-05-19 18:23:42Z"

id: 4823
email: "carol@test.io"
amount_usd: 225.00
created_at: "2026-05-19 18:24:08Z"

Use transform_preview to see the target output on sample data before running the sync. No more "let's see what happens" with production data.
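
For readers who want to check the numbers, the recipe above can be replayed in plain Python. This is a sketch of what the four ops appear to do based on the example data, not Datavor's transform engine. The apply_recipe function is hypothetical, and zoneinfo needs a time zone database on the machine (on Windows, the tzdata package).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

RECIPE = {
    "name": "prod_to_warehouse",
    "version": 3,
    "transforms": [
        {"column": "email", "ops": ["trim", "lowercase"]},
        {"column": "amount_cents", "op": "divide", "by": 100, "rename_to": "amount_usd"},
        {"column": "created_at", "op": "to_utc", "source_tz": "America/New_York"},
    ],
}


def apply_recipe(row: dict, recipe: dict) -> dict:
    """Return a transformed copy of `row`; the input row is left unchanged."""
    out = dict(row)
    for t in recipe["transforms"]:
        col = t["column"]
        value = out[col]
        ops = t["ops"] if "ops" in t else [t["op"]]
        for op in ops:
            if op == "trim":
                value = value.strip()
            elif op == "lowercase":
                value = value.lower()
            elif op == "divide":
                value = value / t["by"]
            elif op == "to_utc":
                # Interpret the naive timestamp in the source zone, then convert.
                local = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
                local = local.replace(tzinfo=ZoneInfo(t["source_tz"]))
                value = local.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%SZ")
            else:
                raise ValueError(f"unknown op: {op}")
        if "rename_to" in t:
            del out[col]
            out[t["rename_to"]] = value
        else:
            out[col] = value
    return out
```

New York is on daylight time (UTC-4) in mid-May, which is why 14:23:11 local becomes 18:23:11Z in the target panel.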

Per-record fault tolerance, actually demonstrated.

A 100,000-row sync hits a malformed row at position 47,213. What happens? With most ETL tools, the answer is "everything stops." With Datavor, it's "one row fails, the other 99,999 succeed, you see exactly what broke."

Typical ETL tool FAILS HARD

RESULT Job aborts at row 47,213. Target left in partial state. You re-run, hit the same row, abort again. You manually find and fix the bad row, then re-run the entire 100k-row sync from the beginning.

Datavor FAILS GRACEFULLY

RESULT Bad row gets quarantined with its error. Sync continues to the end. Final report: {success: 99,999, quarantined: 1, error: "varchar overflow on email column"}. ErrorLearner records the pattern. Re-run skips the row until you fix or override.

Quarantined rows write to ~/.datavor/quarantine/<job_id>.jsonl with full row content and error context. Your AI can read them, suggest fixes, and re-attempt — all from inside the same conversation.
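
The pattern behind per-record fault tolerance is a try/except around each row rather than around the whole job. The sketch below shows that idea with a JSONL quarantine stream. The function name, the report keys, and the 20-character email limit are assumptions for illustration, and an in-memory buffer stands in for the quarantine file.

```python
import io
import json


def sync_with_quarantine(rows, write_row, quarantine_file):
    """Write each row; on failure, record the row and error, then keep going."""
    report = {"success": 0, "quarantined": 0}
    for position, row in enumerate(rows, start=1):
        try:
            write_row(row)
            report["success"] += 1
        except Exception as exc:
            # One JSON object per line: full row content plus error context.
            entry = {"position": position, "row": row, "error": str(exc)}
            quarantine_file.write(json.dumps(entry) + "\n")
            report["quarantined"] += 1
    return report


MAX_EMAIL = 20  # hypothetical varchar(20) limit on the target column
target = []


def write_row(row):
    if len(row["email"]) > MAX_EMAIL:
        raise ValueError("varchar overflow on email column")
    target.append(row)


rows = [
    {"id": 1, "email": "alice@acme.com"},
    {"id": 2, "email": "x" * 50 + "@example.org"},  # too long: will be quarantined
    {"id": 3, "email": "carol@test.io"},
]
quarantine = io.StringIO()
report = sync_with_quarantine(rows, write_row, quarantine)
print(report)  # {'success': 2, 'quarantined': 1}
```

Row 3 still lands in the target even though row 2 failed, and the quarantine stream holds enough context to fix and retry the bad row later.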

Schema-aware column mapping.

Source and target tables rarely have identical schemas. Different column names, different types, missing columns. Datavor reconciles automatically — and when it can't, it asks instead of guessing.

SOURCE · prod-pg customers                   TARGET · warehouse dim_customers    MAPPING
id int                                    →  customer_id bigint                  direct
customer_email varchar                    →  email varchar                       direct
first_name varchar + last_name varchar    →  full_name varchar                   derived
total_cents bigint                        →  amount_usd numeric                  transformed
created_at timestamp                      →  created_at_utc timestamptz          transformed
tier varchar                              →  (missing)                           unmapped

⬡ "tier" not in target: SuggestionEngine pings you

Datavor reads both schemas via describe_table, matches by name first, then by name-similarity (customer_email → email), applies type coercion where safe (int → bigint), and prompts only for ambiguous cases. Columns it can't map (like a brand-new tier) get surfaced as a SuggestionEngine recommendation — not silently dropped.
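
The matching order described above (exact name, then name similarity, then safe type coercion) can be sketched as follows. The similarity measure here, token overlap on underscore-separated names, is an assumption chosen for illustration. The page does not say how Datavor scores similarity, and every function name below is hypothetical.

```python
SAFE_WIDENINGS = {("int", "bigint")}  # coercions assumed to be lossless


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of the underscore-separated tokens in two column names."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


def map_columns(source: dict, target: dict, threshold: float = 0.5):
    """Map source columns to target columns; return (mapping, unmapped)."""
    mapping, unmapped = {}, []
    free = dict(target)  # target columns not yet claimed
    # Pass 1: exact name matches.
    for col in source:
        if col in free:
            mapping[col] = col
            del free[col]
    # Pass 2: best similar name, if it clears the threshold.
    for col in source:
        if col in mapping:
            continue
        best = max(free, key=lambda t: similarity(col, t), default=None)
        if best is not None and similarity(col, best) >= threshold:
            mapping[col] = best
            del free[best]
        else:
            unmapped.append(col)  # surfaced to the user, never dropped
    return mapping, unmapped


def classify(src_type: str, tgt_type: str) -> str:
    """Decide whether a mapped pair can be copied as-is, widened, or needs a prompt."""
    if src_type == tgt_type:
        return "direct"
    if (src_type, tgt_type) in SAFE_WIDENINGS:
        return "coerce"
    return "ask"
```

On the schemas from this section, the sketch maps id to customer_id, customer_email to email, and created_at to created_at_utc. It leaves first_name, last_name, total_cents, and tier unmapped, which are exactly the columns that need a transform or a human decision.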

The 6 MCP tools.

The Sync & Transform tools are exposed through MCP. Your AI tool reads the conversation, picks the right one, fills in the parameters. Full reference in the docs.

Tool                          Purpose
sync_table                    Full sync: truncate target, reload from source. Idempotent.
sync_table_partial            Sync only rows matching a SQL WHERE clause.
sync_table_incremental        Sync only new or updated rows, using a cursor column.
sync_table_with_transforms    Sync with inline column-level transforms or a named recipe.
execute_query                 Run raw SQL: SELECT, INSERT, UPDATE, DELETE, DDL.
get_table_data                Fetch rows from a table with optional WHERE / ORDER BY / LIMIT.

Real conversations, real syncs.

Six things people actually say to Datavor every day, and which sync mode each maps to. None of them require knowing the mode names.

Nightly warehouse load
Sync the orders, customers, and products tables from prod to the warehouse every night at 2am. Skip test rows.
uses: sync_table_incremental · scheduler_create_job · add_rule
Staging seed
Copy yesterday's orders to staging, but anonymize the customer emails.
uses: sync_table_partial · sync_table_with_transforms
One-off backfill
Backfill the new tier column on customers from the subscriptions table for everyone signed up before March.
uses: execute_query (UPDATE with JOIN)
Quick lookup mid-conversation
What did the last 10 failed orders look like? Show me their statuses.
uses: get_table_data with WHERE + ORDER BY
Reference data refresh
Update the products table fresh from prod into staging every Monday morning.
uses: sync_table · scheduler_create_job
Cross-cloud migration
Move the events table from our old Cloud SQL to the new Snowflake warehouse, with timestamps converted to UTC.
uses: sync_table_with_transforms · save_recipe

Sync is the easy part.

What's hard is doing it reliably, with transforms applied along the way, and from natural language. Datavor's six modes cover the spectrum without forcing you to learn yet another DSL.