Ingest
Once a lens is pushed, keeping its index aligned with your source is its own engineering problem. SemiLayer gives you four paths — each with a different setup cost, latency profile, and delete-handling story. Pick one (or combine them — smart sync + records webhook is the production sweet spot).
The four paths
| Path | Who triggers | Deletes | Best for |
|---|---|---|---|
| Manual push | You, once | Via `--rebuild` | Initial load, schema changes |
| `syncInterval` | SemiLayer, auto | No | Append-heavy tables with a clean `updated_at`; fast and cheap |
| `smartSyncInterval` | SemiLayer, auto | Yes (tombstone) | Low-write tables where you can't wire webhooks; catches deletes without CDC |
| Records webhook | Your app, push | Yes (explicit) | Production freshness: sub-second, handles deletes precisely |
Most production apps end up on the records webhook for live freshness, optionally paired with smartSyncInterval: '24h' as a nightly safety net. The webhook is what this section focuses on; the two scheduled paths are documented under Keeping data fresh and the Realtime Sync guide.
The canonical call
- `mode`: `'records'` for per-row deltas, `'full'` to re-ingest everything, `'incremental'` to advance the cursor (equivalent to `push --resume-ingest` in the CLI).
- `changes`: required for `records` mode. Up to 10,000 entries per request. Each entry is `{ id, action: 'upsert' | 'delete' }`.
- `ik_keys`: ingest-only, env-scoped API keys. They can trigger ingest but cannot query. Safe to store in your CDC pipeline.
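As a sketch of what a `records`-mode call might look like from application code: the endpoint URL, header names, and helper below are assumptions for illustration, not a confirmed API surface; only `mode`, `changes`, the `{ id, action }` entry shape, and the 10,000-entry limit come from this page.

```typescript
type Change = { id: string; action: 'upsert' | 'delete' };

const MAX_CHANGES = 10_000; // documented per-request limit

// Split a change list into request-sized batches.
function batchChanges(changes: Change[], size = MAX_CHANGES): Change[][] {
  const batches: Change[][] = [];
  for (let i = 0; i < changes.length; i += size) {
    batches.push(changes.slice(i, i + size));
  }
  return batches;
}

// Hypothetical wrapper: URL and auth header shape are assumptions.
async function pushRecords(lens: string, changes: Change[], ikKey: string) {
  for (const batch of batchChanges(changes)) {
    await fetch(`https://api.semilayer.example/lenses/${lens}/ingest`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${ikKey}`, // an ik_key: can ingest, cannot query
      },
      body: JSON.stringify({ mode: 'records', changes: batch }),
    });
  }
}
```

Batching client-side keeps each request under the limit, so a large CDC burst becomes a few sequential calls rather than one rejected one.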
Response:
`status` is `"queued"` on the first request and `"deduplicated"` when an earlier job for the same lens is still processing, so repeated bursts don't create backlogs.
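In caller code, both statuses mean the delta was accepted. A minimal sketch, assuming the response carries only the `status` and `jobId` fields this page describes:

```typescript
// Response shape as described here; any further fields are assumptions.
type IngestResponse = { status: 'queued' | 'deduplicated'; jobId: string };

// "deduplicated" is not an error: the delta was folded into a job that is
// already in flight for this lens, so treat both outcomes as success.
function isAccepted(res: IngestResponse): boolean {
  return res.status === 'queued' || res.status === 'deduplicated';
}
```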
The four pages
- Overview: you're here. The 90-second pitch.
- Webhook ingest: full API, payload shapes, rate limits, what `ik_keys` can and can't do.
- CDC patterns: real wiring for Postgres logical replication, Postgres triggers, MySQL binlog, AWS DMS / Kinesis, and DIY polling. The thoughtful one.
- Troubleshooting: dead-letter queues, `semilayer status`, stuck ingests, delete semantics.
Which mode when
| Your situation | Mode |
|---|---|
| Initial load, schema changed | `full` (equiv. `push --rebuild`) |
| Big backlog catch-up | `incremental` (equiv. `push --resume-ingest`) |
| Production freshness | `records` + your CDC pipeline |
| No CDC pipeline, append-heavy table | Declare `syncInterval` on the lens (Keeping data fresh) |
| No CDC pipeline, need deletes handled | Declare `smartSyncInterval`, a scheduled full sweep with tombstones (Keeping data fresh) |
Most apps run `records` in production, with `full` reserved for backfills. `incremental` is rare once CDC is wired; it's mostly useful for the first ingest catch-up after a quiet period. `smartSyncInterval` is the go-to for teams who can't wire CDC but still need deletes propagated automatically: no webhook code to write, just one config line.
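That one config line might look like the following. Only the `smartSyncInterval` field name and its `'24h'` value come from this page; the surrounding lens shape (`name`, the `ordersLens` variable) is a hypothetical illustration.

```typescript
// Hypothetical lens definition: the shape around smartSyncInterval is assumed.
const ordersLens = {
  name: 'orders',
  // Nightly full sweep with tombstones: catches deletes without any CDC wiring.
  smartSyncInterval: '24h',
};
```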
Why webhook (not polling)
Polling wastes work: every syncInterval tick reads rows the worker
has already indexed. With a large table and a low write rate, most of
that I/O returns nothing new.
Webhooks flip it: the worker only touches rows you tell it touched. A 100-million-row lens with 1,000 writes a day pays for 1,000 embeddings, not 100M scans.
Second: deletes. Polling can't see a deleted row unless your schema includes a tombstone or soft-delete column. Webhooks carry deletes explicitly as `{ action: 'delete' }`, regardless of how the source records removal.
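Translating source-side removal into that explicit action is usually a one-liner in the CDC pipeline. A sketch for a soft-delete schema, where the `deleted_at` column name is an assumption:

```typescript
// A row from a hypothetical soft-delete table.
type Row = { id: string; deleted_at: string | null };
type Change = { id: string; action: 'upsert' | 'delete' };

// Map a changed row to the webhook's explicit action: a set deleted_at
// timestamp becomes a delete, anything else is an upsert.
function toChange(row: Row): Change {
  return { id: row.id, action: row.deleted_at ? 'delete' : 'upsert' };
}
```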
The feedback loop
Every webhook call returns a `jobId`. You can watch it land:
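A minimal polling sketch; the job status values and the lookup call are assumptions (the CLI equivalent is `semilayer status`), so the status fetcher is injected rather than hard-coded:

```typescript
// Assumed terminal/non-terminal status values for an ingest job.
type JobStatus = 'queued' | 'running' | 'done' | 'failed';

// Poll until the job reaches a terminal state. `getStatus` would wrap
// whatever status API call your setup uses.
async function waitForJob(
  jobId: string,
  getStatus: (id: string) => Promise<JobStatus>,
  intervalMs = 1000,
): Promise<JobStatus> {
  for (;;) {
    const status = await getStatus(jobId);
    if (status === 'done' || status === 'failed') return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```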
The Console's ingest-jobs view does the same thing visually, with the most recent webhook calls highlighted. See Troubleshooting for failure modes.