SemiLayerDocs

Ingest

Once a lens is pushed, keeping its index aligned with your source is its own engineering problem. SemiLayer gives you four paths — each with a different setup cost, latency profile, and delete-handling story. Pick one (or combine them — smartSyncInterval + records webhook is the production sweet spot).

The four paths

| Path | Who triggers | Deletes | Best for |
| --- | --- | --- | --- |
| Manual push | You, once | Via --rebuild | Initial load, schema changes (Keeping data fresh) |
| syncInterval | SemiLayer, auto | No | Append-heavy tables with a clean updated_at — fast + cheap |
| smartSyncInterval | SemiLayer, auto | Yes — tombstone | Low-write tables where you can't wire webhooks; catches deletes without CDC |
| Records webhook | Your app, push | Yes — explicit | Production freshness, sub-second, handles deletes precisely |

Most production apps end up on the records webhook for live freshness, optionally paired with smartSyncInterval: '24h' as a nightly safety net. The webhook is what this section focuses on; the two scheduled paths are documented under Keeping data fresh and the Realtime Sync guide.

The canonical call

curl -X POST https://api.semilayer.com/v1/ingest/products \
  -H "Authorization: Bearer ik_production_..." \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "records",
    "changes": [
      { "id": "p_7712", "action": "upsert" },
      { "id": "p_0451", "action": "delete" }
    ]
  }'
  • mode: 'records' for per-row deltas, 'full' to re-ingest everything, 'incremental' to advance the cursor (equivalent to push --resume-ingest in the CLI).
  • changes: required for records mode. Up to 10,000 per request. Each entry is { id, action: 'upsert' | 'delete' }.
  • ik_ keys: ingest-only, env-scoped API keys. They can trigger ingest but cannot query. Safe to store in your CDC pipeline.
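Calling this endpoint from application code is a straightforward translation of the curl call. A minimal sketch, assuming only the endpoint and payload shape shown above — the chunk_changes helper, function names, and environment-variable name are illustrative, not part of the SemiLayer API — that batches changes to stay under the 10,000-per-request limit:

```python
import json
import os
import urllib.request

API_URL = "https://api.semilayer.com/v1/ingest/products"
MAX_CHANGES = 10_000  # documented per-request limit


def chunk_changes(changes, size=MAX_CHANGES):
    """Split a change list into request-sized batches."""
    return [changes[i:i + size] for i in range(0, len(changes), size)]


def push_records(changes, api_key):
    """POST each batch as a mode='records' delta."""
    for batch in chunk_changes(changes):
        body = json.dumps({"mode": "records", "changes": batch}).encode()
        req = urllib.request.Request(
            API_URL,
            data=body,
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            print(json.load(resp))  # {"jobId": ..., "status": ..., ...}


# push_records([{"id": "p_7712", "action": "upsert"}],
#              os.environ["SEMILAYER_IK"])
```

Because ik_ keys are ingest-only, this snippet is safe to run from a pipeline worker without exposing query access.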

Response:

{
  "jobId": "j_abc123",
  "status": "queued",
  "changesBuffered": 2
}

status is "queued" on first request, "deduplicated" when an earlier job for the same lens is still processing — repeated bursts don't create backlogs.
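In a sender, those two statuses can drive what happens next. A minimal sketch of interpreting the response — the function name and the exact messages are assumptions, only the jobId / status / changesBuffered fields come from the docs:

```python
def interpret_ingest_response(resp: dict) -> str:
    """Decide what the caller should do based on the documented statuses.

    "queued": a new job was created; nothing more to do.
    "deduplicated": an earlier job for the same lens is still processing
    and the changes were buffered onto it — resending is wasted work.
    """
    status = resp.get("status")
    if status == "queued":
        n = resp.get("changesBuffered", 0)
        return f"job {resp['jobId']} queued ({n} changes)"
    if status == "deduplicated":
        return "changes buffered onto an in-flight job; do not retry"
    return f"unexpected status {status!r}; inspect with `semilayer status`"
```

The point of the "deduplicated" branch is that your sender never needs its own backpressure logic: fire the webhook on every burst and let the server collapse them.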

The four pages

  • Overview — you're here. The 90-second pitch.
  • Webhook ingest — full API, payload shapes, rate limits, what ik_ keys can and can't do.
  • CDC patterns — real wiring for Postgres logical replication, Postgres triggers, MySQL binlog, AWS DMS / Kinesis, and DIY polling. The thoughtful one.
  • Troubleshooting — dead-letter queues, semilayer status, stuck ingests, delete semantics.

Which mode when

| Your situation | Mode |
| --- | --- |
| Initial load, schema changed | full (equiv. push --rebuild) |
| Big backlog catch-up | incremental (equiv. push --resume-ingest) |
| Production freshness | records + your CDC pipeline |
| No CDC pipeline, append-heavy table | Declare syncInterval on the lens (Keeping data fresh) |
| No CDC pipeline, need deletes handled | Declare smartSyncInterval — scheduled full sweep with tombstones (Keeping data fresh) |

Most apps run records in production, with full reserved for backfills. incremental is rare once CDC is wired — it's mostly useful for the first catch-up after a quiet period. smartSyncInterval is the go-to for teams that can't wire CDC but still need deletes propagated automatically: no webhook code to write, just one config line.

Why webhook (not polling)

Polling wastes work: every syncInterval tick reads rows the worker has already indexed. With a large table and a low write rate, most of that I/O returns nothing new.

Webhooks flip it: the worker only touches rows you tell it touched. A 100-million-row lens with 1,000 writes a day pays for 1,000 embeddings, not 100M scans.

Second: deletes. Polling can't see a deleted row unless your schema includes a tombstone or soft-delete column. Webhooks carry deletes explicitly as { action: 'delete' }, regardless of how the source records removal.
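Concretely, the webhook sender is just a translation layer over your CDC stream. A sketch under assumptions — the event shape below is a hypothetical generic one, not any specific connector's format — showing how inserts, updates, and deletes map onto change entries:

```python
def cdc_event_to_change(event: dict) -> dict:
    """Map a generic CDC event to a SemiLayer change entry.

    Inserts and updates both become upserts; deletes stay deletes,
    even if the source table has no tombstone or soft-delete column.
    """
    op = event["op"]  # hypothetical field: "insert" | "update" | "delete"
    action = "delete" if op == "delete" else "upsert"
    return {"id": event["id"], "action": action}


events = [
    {"op": "update", "id": "p_7712"},
    {"op": "delete", "id": "p_0451"},
]
changes = [cdc_event_to_change(e) for e in events]
# → [{"id": "p_7712", "action": "upsert"},
#    {"id": "p_0451", "action": "delete"}]
```

This is why the webhook path handles deletes precisely: the removal travels as data in the request, independent of how the source records it.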

The feedback loop

Every webhook call returns a jobId. You can watch it land:

semilayer status --lens products
# products  indexing  cursor=2026-04-20T14:02:03Z  pending=2  rate=18 rows/s

The Console's ingest-jobs view does the same thing visually, with the most recent webhook calls highlighted. See Troubleshooting for failure modes.