Context Engine — Datavor

⬡ EVERY OTHER ETL TOOL

You explain the same things, every time.

Fivetran, Airbyte, integrate.io — they're all stateless toolchains. The agent or operator comes in fresh each session. Your rules ("never sync test rows"), your conventions ("orders use created_at, not timestamp"), your past mistakes — none of it persists. You re-explain them, every time, to every agent.

⬡ DATAVOR

Datavor remembers. Forever, locally.

A single SQLite file at ~/.datavor/context.db accumulates everything the AI learns. Schemas, rules, recipes, errors and how you fixed them, suggestions you accepted, suggestions you dismissed. Next session — even months later — your agent picks up exactly where the last one left off.

Four engines, one memory.

The Context Engine isn't a single mechanism. It's four cooperating subsystems that share one SQLite file, each accumulating a different kind of knowledge.

RuleStore

business rules / data quality / policy

Persistent rules the AI applies to every sync, every query, every suggestion. "Never sync rows where status = 'test'." "Always exclude PII columns from analytics warehouse." Rules are scoped to tables, databases, or globally.
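As a rough illustration of scoping, the sketch below selects the rules that cover one table and joins their predicates into a filter. The scope strings ('global', 'database:X', 'table:X.Y') follow the simplified schema on this page, read here as database and database.table; the `Rule` class, `applicable_rules` function, and sample predicates are hypothetical, not Datavor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    scope: str       # 'global', 'database:X', or 'table:X.Y'
    predicate: str   # SQL fragment a synced row must satisfy


def applicable_rules(rules, database, table):
    """Return the rules whose scope covers database.table, in stored order."""
    wanted = {"global", f"database:{database}", f"table:{database}.{table}"}
    return [r for r in rules if r.scope in wanted]


rules = [
    Rule("global", "is_deleted = 0"),
    Rule("database:prod", "tenant_id IS NOT NULL"),
    Rule("table:prod.orders", "status != 'test'"),
    Rule("table:prod.customers", "email IS NOT NULL"),
]

# Combine every matching predicate into one WHERE clause for the orders sync.
predicates = [r.predicate for r in applicable_rules(rules, "prod", "orders")]
where_clause = " AND ".join(f"({p})" for p in predicates)
print(where_clause)
```

The customers rule is left out of the orders filter, while the global and database-level rules still apply.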

RecipeManager

named, versioned transforms

Save any transform configuration as a named recipe — "trim_whitespace_lowercase," "decimal_to_cents," "anonymize_emails." Apply by name in future syncs. Recipes version themselves; if you change one, old sync jobs keep using the version they were created with.
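One way to get that behavior is to treat every save as a new row and let each job pin a (name, version) pair. The sketch below assumes that design; the `recipe_versions` table, `save_recipe` and `load_recipe` helpers, and the transform payloads are hypothetical and may not match how Datavor stores recipes.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recipe_versions ("
    " name TEXT, version INTEGER, transforms_json TEXT,"
    " PRIMARY KEY (name, version))"
)


def save_recipe(name, transforms):
    """Insert a new version; earlier versions are never overwritten."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM recipe_versions WHERE name = ?",
        (name,),
    ).fetchone()
    conn.execute(
        "INSERT INTO recipe_versions VALUES (?, ?, ?)",
        (name, latest + 1, json.dumps(transforms)),
    )
    return latest + 1


def load_recipe(name, version):
    """Fetch the transforms for one exact version of a recipe."""
    row = conn.execute(
        "SELECT transforms_json FROM recipe_versions WHERE name = ? AND version = ?",
        (name, version),
    ).fetchone()
    return json.loads(row[0])


v1 = save_recipe("decimal_to_cents", [{"op": "multiply", "by": 100}])
job_pin = ("decimal_to_cents", v1)  # an existing job remembers version 1
v2 = save_recipe("decimal_to_cents", [{"op": "multiply", "by": 100}, {"op": "round"}])
```

Loading `job_pin` after the second save still returns the original single-step transform, which is the property the old job relies on.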

ErrorLearner

past failures + fixes

Every sync error gets logged with its full context — what was attempted, why it failed, what fixed it. Next time a similar error shows up, Datavor surfaces the past fix proactively. Time-zone bugs, encoding mismatches, lock conflicts — all things you fix once.
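To make "similar error" concrete, the sketch below masks the parts of a message that vary between runs and hashes what remains. This is exact matching on a normalized message, a deliberately simple stand-in for whatever similarity matching Datavor actually uses; `pattern_hash`, `remember`, `recall`, and the sample messages are all hypothetical.

```python
import hashlib
import re


def pattern_hash(error_message):
    """Hash an error message after masking the parts that vary between runs."""
    text = error_message.lower()
    text = re.sub(r"'[^']*'|\"[^\"]*\"", "<str>", text)  # quoted literals
    text = re.sub(r"\d+", "<n>", text)                    # ids, counts, durations
    text = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


known_fixes = {}


def remember(error_message, fix):
    """Store the fix under the message's pattern hash."""
    known_fixes[pattern_hash(error_message)] = fix


def recall(error_message):
    """Return a previously stored fix for the same pattern, or None."""
    return known_fixes.get(pattern_hash(error_message))


remember(
    "Lock wait timeout exceeded on table 'orders' after 30s",
    "retry with backoff; move sync off-peak",
)
same_pattern = recall("Lock wait timeout exceeded on table 'customers' after 45s")
other_pattern = recall('invalid byte sequence for encoding "UTF8": 0xff')
```

The second lock timeout differs in table name and duration but maps to the same hash, so the stored fix comes back; the encoding error does not match anything yet.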

SuggestionEngine

proactive what-ifs

Watches your data and pipelines, surfaces ideas. "New column tier appeared in customers — add it to the warehouse sync?" "Orders sync has been failing for 3 days — restart with checkpoint?" Accept, dismiss, or modify. Accepted suggestions become rules.
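A schema-drift suggestion like the first example can be sketched as a fingerprint comparison followed by a column diff. The `fingerprint` and `drift_suggestions` functions and the payload keys are assumptions made for illustration; only the 'schema_change' type and 'pending' status come from the simplified schema on this page.

```python
import hashlib
import json


def fingerprint(columns):
    """Stable hash of a {column_name: type} mapping, independent of key order."""
    canonical = json.dumps(columns, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drift_suggestions(table, old_columns, new_columns):
    """Return one pending suggestion per column that appeared since the last scan."""
    if fingerprint(old_columns) == fingerprint(new_columns):
        return []  # nothing changed, so there is nothing to suggest
    added = sorted(set(new_columns) - set(old_columns))
    return [
        {
            "type": "schema_change",
            "status": "pending",
            "payload": {"table": table, "column": name, "column_type": new_columns[name]},
        }
        for name in added
    ]


before = {"id": "integer", "email": "text"}
after = {"id": "integer", "email": "text", "tier": "varchar(20)"}
suggestions = drift_suggestions("customers", before, after)
```

Comparing fingerprints first keeps the common no-change case cheap; the column diff only runs when the hash differs.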

How it compounds.

Three sessions out of one month of use: same user, same database, same AI. Watch what Datavor learns and how each session starts smarter.

SESSION 01 Day 1

First contact. Datavor learns the schema.

You: "Sync the orders table from production Postgres to my analytics warehouse, daily at 2am."
Claude: Calls describe_table, then asks: "I see orders has created_at and updated_at — which should drive incremental sync?"
You: "Use updated_at. And skip rows where status = 'test' — those are QA fixtures."
Claude: Calls scheduler_create_job to create the nightly job, then add_rule to save the test-row exclusion.

⬡ STORED schema fingerprint · incremental column choice (updated_at) · rule: orders.status != 'test'

SESSION 07 Day 8

You add a new sync. Datavor doesn't have to ask.

You: "Add the customers table to the nightly sync too."
Claude: Reads the context: same source, same target, same nightly schedule pattern, your test-row preference. Suggests: "I'll mirror the orders setup — updated_at for incremental, excluding test rows. Run after the orders sync. OK?"
You: "Yes." — No re-asking. No re-explaining the test-row rule.

⬡ STORED customers schema · sync recipe applied: "production_to_warehouse_nightly" · dependency added: customers → orders

SESSION 23 Day 30

Datavor catches a problem you'd have missed.

Claude: "Two things from overnight:
1. Postgres schema changed — orders now has a tier column (varchar(20)). Want me to mirror it to the warehouse? Looks like enum data — I'd suggest a CHECK constraint.
2. The orders sync failure on Day 12 is your only failure in 30 days. Root cause was a timezone mismatch on updated_at, the incremental column. I've added a rule to coerce timestamps to UTC before incremental comparison, which should prevent recurrence." Both items came from the SuggestionEngine and the ErrorLearner. No prompt needed.

⬡ STORED new column suggestion · timezone fix codified as rule · error pattern: PG-WH-TZ-mismatch
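The timezone fix in this session comes down to comparing timestamps only after both sides are in UTC. The sketch below shows why a naive comparison can skip a changed row; the `to_utc` and `changed_since` helpers, the UTC-5 source offset, and the sample dates are invented for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_utc(ts, assume_tz=timezone.utc):
    """Return ts as an aware UTC datetime; naive values are read as assume_tz."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assume_tz)
    return ts.astimezone(timezone.utc)


def changed_since(row_ts, checkpoint):
    """True if the row was modified after the last sync checkpoint."""
    return to_utc(row_ts) > to_utc(checkpoint)


# The source writes local time at UTC-5; the warehouse checkpoint is in UTC.
source_tz = timezone(timedelta(hours=-5))
row_ts = datetime(2024, 3, 1, 22, 30, tzinfo=source_tz)      # 03:30 UTC on 2 March
checkpoint = datetime(2024, 3, 2, 2, 0, tzinfo=timezone.utc)

# Dropping the offsets compares wall-clock values and wrongly skips the row.
naive_compare = row_ts.replace(tzinfo=None) > checkpoint.replace(tzinfo=None)
utc_compare = changed_since(row_ts, checkpoint)
```

With offsets ignored the row looks older than the checkpoint; after coercion it is correctly picked up as changed.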

What's actually in the context.db.

The Context Engine is a single SQLite file. No proprietary format, no encryption layer, no service to call — just SQL tables you can sqlite3 into and read directly. Here's the schema, simplified.

~/.datavor/context.db (simplified schema, SQLite 3)

-- Schemas Datavor has seen, fingerprinted for change detection
CREATE TABLE schemas (
  id TEXT PRIMARY KEY,           -- connection_id + table
  schema_json JSON,              -- columns, types, FKs
  fingerprint TEXT,              -- hash for diff detection
  first_seen TIMESTAMP,
  last_seen TIMESTAMP
);

-- Business rules — applied automatically by relevant tools
CREATE TABLE rules (
  id TEXT PRIMARY KEY,
  scope TEXT,                    -- 'global', 'database:X', 'table:X.Y'
  predicate TEXT,                -- SQL or DSL
  description TEXT,
  created_at TIMESTAMP,
  source TEXT                    -- 'user' or 'accepted_suggestion'
);

-- Named, versioned transform recipes
CREATE TABLE recipes (
  id TEXT PRIMARY KEY,
  name TEXT,
  version INTEGER,
  transforms_json JSON,
  tags TEXT,
  created_at TIMESTAMP,
  UNIQUE (name, version)         -- one row per recipe version
);

-- Errors with their fixes, for proactive recall
CREATE TABLE errors (
  id TEXT PRIMARY KEY,
  pattern_hash TEXT,             -- for similarity matching
  context_json JSON,             -- what was attempted
  error_message TEXT,
  resolution_json JSON,          -- what fixed it
  occurred_at TIMESTAMP
);

-- Suggestions surfaced to user, with their disposition
CREATE TABLE suggestions (
  id TEXT PRIMARY KEY,
  type TEXT,                     -- 'schema_change', 'sync_recovery', etc.
  payload_json JSON,
  status TEXT,                   -- 'pending', 'accepted', 'dismissed'
  created_at TIMESTAMP,
  resolved_at TIMESTAMP
);

Run sqlite3 ~/.datavor/context.db .schema on any Datavor install to see your actual tables — these and a few internal ones for indexing.
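The same inspection works from a script. The sketch below uses Python's standard sqlite3 module against an in-memory stand-in that contains only the rules table from the simplified schema; the sample row is invented, and the tables on a real install may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for ~/.datavor/context.db
conn.execute(
    "CREATE TABLE rules (id TEXT PRIMARY KEY, scope TEXT, predicate TEXT,"
    " description TEXT, created_at TIMESTAMP, source TEXT)"
)
conn.execute(
    "INSERT INTO rules VALUES (?, ?, ?, ?, CURRENT_TIMESTAMP, ?)",
    ("r1", "table:prod.orders", "status != 'test'", "Skip QA fixtures", "user"),
)

# List the tables, then read back the rules a user defined directly.
tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
user_rules = conn.execute(
    "SELECT scope, predicate FROM rules WHERE source = ?", ("user",)
).fetchall()
print(tables, user_rules)
```

Against the real file, opening it read-only (for example `sqlite3.connect("file:/path/to/context.db?mode=ro", uri=True)`) avoids accidental writes while you look around.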

What the Context Engine stores. And what it doesn't.

The Context Engine is on your machine. It stays on your machine. The Free tier is 100% local: nothing leaves. Pro sends a small daily license-validation heartbeat (aggregate counts, no content). Here is exactly where the line is drawn:

Stored

Database schemas — table names, column names, types, foreign keys
Rules you've defined or accepted (predicates only, not data)
Recipe definitions — transform configurations by name
Error patterns — what failed, why, how it was fixed
Connection metadata — host (hashed), database name, last connected
Job history — what synced, when, how many rows, success/fail
Suggestion log — what was suggested, what you accepted/dismissed

Never stored

Row data. Never. Not in errors, not in logs, not in suggestions.
SQL parameter values. Queries are recorded as templates, not with bound values.
Database passwords. Credentials never touch disk inside Datavor.
PII columns. Even column names matching PII patterns get hashed.
External keys — API keys, OAuth tokens, secrets in env vars.
Personal data of any kind beyond what's needed for the schema fingerprint.
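Two of these guarantees are easy to picture in code: logging a query as a template without its bound values, and storing a host only as a hash. The sketch below is one possible shape under those assumptions; `record_query`, `hash_host`, and the log layout are hypothetical, not Datavor internals.

```python
import hashlib

query_log = []


def record_query(sql, params):
    """Log the SQL template and the parameter count, never the bound values."""
    query_log.append({"template": sql, "param_count": len(params)})


def hash_host(host):
    """One-way SHA-256 digest so hosts can be told apart without being named."""
    return hashlib.sha256(host.encode("utf-8")).hexdigest()


record_query(
    "SELECT * FROM orders WHERE customer_email = ?",
    ("ada@example.com",),
)
entry = query_log[0]
host_id = hash_host("db.internal.example")
```

The email address never reaches the log; only the placeholder does. A plain unsalted hash of a hostname can be guessed by trying likely names, so a keyed hash would be a stronger choice where that matters.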

The 11 MCP tools that talk to the Context Engine.

The Context Engine is exposed entirely through MCP — your AI tool reads and writes it through these 11 tools. Full reference in the docs.

Tool	Purpose	Component
`get_context`	Everything Datavor knows: databases, rules, relationships, recipes, recent suggestions.	All
`add_rule`	Save a business rule with scope and predicate.	Rules
`update_rule`	Modify an existing rule's predicate or scope.	Rules
`remove_rule`	Delete a rule. Past job runs that used it remain unaffected.	Rules
`save_recipe`	Save a transform configuration as a named, versioned recipe.	Recipes
`apply_recipe`	Apply a saved recipe to a new sync configuration by name.	Recipes
`list_recipes`	List saved recipes, optionally filtered by connection, table, or tags.	Recipes
`get_suggestions`	Get pending suggestions for review — schema changes, sync recoveries, optimizations.	Suggest
`accept_suggestion`	Apply a suggestion. May silently create rules, recipes, or schedule jobs.	Suggest
`dismiss_suggestion`	Reject a suggestion. It won't be re-surfaced for the same pattern.	Suggest
`transform_preview`	Preview what transforms will produce on sample data before running.	Recipes
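On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 request whose method is `tools/call`. The sketch below builds such a request for `add_rule`; the argument names (scope, predicate, description) are assumed here to mirror the rules table columns and may not match the tool's real input schema.

```python
import json


def tool_call(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


request = tool_call(
    1,
    "add_rule",
    {
        # Argument names are assumptions that mirror the rules table columns.
        "scope": "table:prod.orders",
        "predicate": "status != 'test'",
        "description": "Skip QA fixture rows",
    },
)
wire = json.dumps(request)
print(wire)
```

In practice the AI client constructs these requests itself; the sketch only shows what one call looks like.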

Why no other tool has this.

Capability	Datavor	Fivetran	Airbyte	integrate.io
Persistent rules that apply across syncs	✓	—	—	—
Named, versioned, reusable transform recipes	✓	partial	—	✓
Error patterns surfaced from past failures	✓	—	—	—
Proactive suggestions based on schema drift	✓	—	—	—
State lives entirely on your machine	✓	—	—	—
Inspectable as a plain SQL file	✓	—	—	—
AI tool can read it without permissions	✓	—	—	—

Competitors that have some of these features keep them in their cloud, behind their UI. None expose them as a flat SQLite file your AI can query as freely as it queries your databases. That gap is the Context Engine.

The Context Engine.
Datavor that gets smarter.

Start with fresh memory.
End the month with hindsight.
