Ingest — CDC patterns
"How do I get my database's changes into SemiLayer?" Five patterns, one API shape. Pick the one that matches the data source you already run.
The common contract: your CDC pipeline collects { id, action } pairs
and POSTs them to /v1/ingest/:lens. Batch them, debounce them, and
retry on 429 — see Webhook ingest for the API.
This page is about the upstream half — getting changes out of your database reliably.
Postgres — logical replication (preferred)
The cleanest path. Create a publication, subscribe with wal2json or
pgoutput, stream changes as JSON.
1. Enable logical replication
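A minimal sketch of the setup, assuming wal2json and a watched table named documents; the publication and slot names are illustrative. On managed Postgres you typically flip a parameter-group flag instead of ALTER SYSTEM (e.g. rds.logical_replication = 1 on RDS).

```sql
-- Requires a restart to take effect.
ALTER SYSTEM SET wal_level = 'logical';

-- Publish only the tables you watch.
CREATE PUBLICATION semilayer_pub FOR TABLE documents;

-- The slot pins WAL until its consumer confirms progress -- monitor it.
SELECT pg_create_logical_replication_slot('semilayer_slot', 'wal2json');
```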
2. Stream the slot and batch
Node.js example using pg-logical-replication:
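A minimal sketch, assuming the wal2json plugin, the slot created in step 1, Node 18+ (for global fetch), and an id primary-key column; SEMILAYER_URL and the docs lens name are illustrative placeholders for your deployment.

```javascript
// Map one wal2json change record to the { id, action } contract.
function toEvent(change) {
  if (change.kind === 'delete') {
    // On deletes the key lives in oldkeys, not columnvalues.
    const i = change.oldkeys.keynames.indexOf('id');
    return { id: String(change.oldkeys.keyvalues[i]), action: 'delete' };
  }
  const i = change.columnnames.indexOf('id');
  return { id: String(change.columnvalues[i]), action: 'upsert' };
}

// Debounced batcher: collect events, flush at most once per delayMs.
function makeBatcher(post, delayMs = 1000) {
  let pending = [];
  let timer = null;
  return (event) => {
    pending.push(event);
    if (!timer) {
      timer = setTimeout(() => {
        const batch = pending;
        pending = [];
        timer = null;
        post(batch);
      }, delayMs);
    }
  };
}

// Wiring: subscribe to the slot and forward batches. Not invoked here.
async function main() {
  const { LogicalReplicationService, Wal2JsonPlugin } = require('pg-logical-replication');
  const service = new LogicalReplicationService({ connectionString: process.env.DATABASE_URL });
  const enqueue = makeBatcher((batch) =>
    fetch(`${process.env.SEMILAYER_URL}/v1/ingest/docs`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(batch),
    })
  );
  service.on('data', (_lsn, log) => (log.change || []).forEach((c) => enqueue(toEvent(c))));
  await service.subscribe(new Wal2JsonPlugin({}), 'semilayer_slot');
}
```

The batcher is what keeps a busy table from turning into one POST per row; tune delayMs against your write rate.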
Why this is the default choice: deletes are explicit, ordering is exact, there's no "missed write" window. Works on every managed Postgres that exposes logical replication (RDS, Aurora, Supabase, Neon, Cloud SQL).
Caveats: requires wal_level=logical, which some heavily-customized
Postgres setups don't allow. The replication slot holds WAL until your
consumer catches up — if you stop your CDC worker, disk usage grows.
Postgres — triggers + tombstone table
If logical replication isn't available, install a trigger on each
watched table that writes to a lightweight semilayer_changes log.
1. Create the log + trigger
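A sketch of the log and trigger, assuming a watched table named documents with a text-castable id primary key; all object names are illustrative.

```sql
CREATE TABLE semilayer_changes (
  seq        bigserial PRIMARY KEY,
  row_id     text NOT NULL,
  action     text NOT NULL,          -- 'upsert' | 'delete'
  changed_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_semilayer_change() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'DELETE' THEN
    INSERT INTO semilayer_changes (row_id, action) VALUES (OLD.id::text, 'delete');
    RETURN OLD;
  ELSE
    INSERT INTO semilayer_changes (row_id, action) VALUES (NEW.id::text, 'upsert');
    RETURN NEW;
  END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER documents_semilayer_cdc
  AFTER INSERT OR UPDATE OR DELETE ON documents
  FOR EACH ROW EXECUTE FUNCTION log_semilayer_change();
```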
2. Drain in a worker
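A sketch of one drain pass, assuming the pg npm package, Node 18+, and the semilayer_changes table from step 1; the lens URL is a placeholder. Rows are deleted only after a successful POST, so a crash mid-batch re-sends rather than drops.

```javascript
const BATCH = 500;

// Collapse the log: if a row was upserted then deleted inside one drain
// window, only the final action matters.
function collapse(rows) {
  const last = new Map();
  for (const r of rows) last.set(r.row_id, r.action);
  return [...last].map(([id, action]) => ({ id, action }));
}

async function drainOnce(pool, lensUrl) {
  const { rows } = await pool.query(
    'SELECT seq, row_id, action FROM semilayer_changes ORDER BY seq LIMIT $1',
    [BATCH]
  );
  if (rows.length === 0) return 0;
  const res = await fetch(lensUrl, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(collapse(rows)),
  });
  if (res.status === 429) return 0; // back off; rows stay queued for next tick
  if (!res.ok) throw new Error(`ingest failed: ${res.status}`);
  // Only delete what we successfully forwarded.
  await pool.query('DELETE FROM semilayer_changes WHERE seq <= $1', [
    rows[rows.length - 1].seq,
  ]);
  return rows.length;
}
```

Run drainOnce on an interval (or in a loop with a short sleep) and alert on the semilayer_changes row count, per the caveat below.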
Why triggers over polling the source table directly: tombstone rows
mean you see deletes. updated_at-based polling can't.
Caveats: the trigger runs inside every write transaction. On a
write-heavy table, this adds a tiny but nonzero latency cost. The
semilayer_changes table grows unless your worker drains it reliably —
monitor its row count.
MySQL — binlog via Debezium
MySQL's binlog is the equivalent of Postgres logical replication. The canonical reader is Debezium, which publishes changes as JSON to Kafka or (with the Debezium Server + HTTP sink) directly to a webhook.
Debezium Server config (application.properties)
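A sketch of the properties file; hostnames, credentials, paths, and the table list are illustrative, and property names follow recent Debezium Server releases (check yours — older versions use database.server.name instead of topic.prefix).

```properties
debezium.sink.type=http
debezium.sink.http.url=https://cdc-adapter.internal/debezium

debezium.source.connector.class=io.debezium.connector.mysql.MySqlConnector
debezium.source.database.hostname=mysql.internal
debezium.source.database.port=3306
debezium.source.database.user=cdc
debezium.source.database.password=${MYSQL_CDC_PASSWORD}
debezium.source.database.server.id=184054
debezium.source.topic.prefix=app
debezium.source.table.include.list=app.documents
debezium.source.offset.storage.file.filename=/var/lib/debezium/offsets.dat
```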
Adapter
Debezium emits one HTTP POST per change. Your adapter batches them:
Why Debezium: battle-tested, handles MySQL's quirks (binlog format, DDL changes, GTID-based restarts). Pluggable into any CDC target.
Caveats: operationally heavier than the Postgres options —
Debezium Server is a separate process you run. For a light setup, a
handwritten mysql-binlog-connector-java wrapper is simpler.
AWS — DMS + Kinesis + Lambda
When your source lives in RDS and your stack is all-AWS, DMS into Kinesis is the shortest production path.
DMS task
- Source endpoint: RDS with CDC enabled (rds.logical_replication = 1 for Postgres).
- Target endpoint: Kinesis Data Stream.
- Task type: "Replicate data changes only" (CDC).
- Table mappings: just the tables you care about.
Lambda handler
Lambda batch size controls the natural batch into SemiLayer — 500–1000 is a good starting point.
Why this pattern: no servers to manage. DMS handles the source-side complexity (WAL retention, reconnects, DDL awareness). Lambda autoscales.
Caveats: DMS has cold-start overhead on CDC resumption after long pauses. For always-on workloads it's fine; for bursty workloads budget a few minutes of catch-up after quiet periods.
DIY — cron + updated_at polling
When CDC infra is overkill or unavailable. Your worker polls the source every N seconds for rows updated since last check and forwards them.
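One poll pass can be sketched as follows, assuming the pg npm package, Node 18+, an updated_at column on the watched table, and a placeholder lens URL.

```javascript
// High-water mark: everything up to this timestamp has been forwarded.
let since = new Date(0);

async function pollOnce(pool, lensUrl) {
  const { rows } = await pool.query(
    'SELECT id, updated_at FROM documents WHERE updated_at > $1 ORDER BY updated_at LIMIT 1000',
    [since]
  );
  if (rows.length === 0) return 0;
  const res = await fetch(lensUrl, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(rows.map((r) => ({ id: String(r.id), action: 'upsert' }))),
  });
  if (!res.ok) return 0; // keep `since` unchanged so the window retries next tick
  since = rows[rows.length - 1].updated_at;
  return rows.length;
}
```

Note the shape of the contract here: polling can only ever emit upsert, which is exactly the delete blind spot the caveats below describe.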
Caveats — important:
- Deletes are invisible. If your app supports deletes, add a deleted_at column and either (a) include it in the lens config as a filter field and soft-delete only, or (b) run a second reconciliation pass that compares source id sets to indexed id sets.
- updated_at discipline required. Every write to every watched row must touch updated_at. Forgetting to bump it on a specific code path silently drops changes.
- Polling burns work when the source is quiet. A 30s cadence × 24h × a query that returns nothing = 2,880 empty round-trips daily. CDC is cheaper for anything above trivial scale.
Good enough for: prototypes, low-write internal tools, append-only data (where deletes don't happen). Not recommended for anything production-shaped.
Choosing
| Your stack | Recommended |
|---|---|
| Postgres (any flavor with wal_level=logical) | Logical replication + your own worker |
| Postgres, can't touch wal_level | Triggers + tombstone table |
| MySQL | Debezium |
| RDS + AWS all the way | DMS → Kinesis → Lambda |
| No CDC, low write rate, append-only | syncInterval on the lens (Keeping data fresh) |
| No CDC, need deletes handled automatically | smartSyncInterval: '24h' — scheduled full scan with tombstone detection, no webhook code to write (Keeping data fresh) |
| No CDC, need deletes and precise timing | Triggers even on MySQL — AFTER DELETE into a changes table |
The webhook endpoint doesn't care where the changes came from. Pick the upstream that matches what you already run; the integration is the same 20 lines of "batch + POST + retry on 429."
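Those shared 20 lines can be sketched as one helper, assuming Node 18+; the retry cap and backoff bounds are illustrative defaults, not documented limits.

```javascript
// POST a batch, honoring 429 with Retry-After (falling back to capped
// exponential backoff). Any worker on this page can call this.
async function postBatch(url, events, { maxRetries = 5 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(events),
    });
    if (res.status !== 429) return res;
    if (attempt >= maxRetries) throw new Error('gave up after repeated 429s');
    const seconds = Math.min(Number(res.headers.get('retry-after')) || 2 ** attempt, 30);
    await new Promise((resolve) => setTimeout(resolve, seconds * 1000));
  }
}
```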
Next: Troubleshooting — what to do when changes aren't landing.