# Ingest — Webhook

One endpoint, three modes, tier-aware rate limiting. This page is the full
API reference for `POST /v1/ingest/:lens`.
Don't want to wire webhooks? If you can tolerate periodic freshness
and want deletes handled without writing any CDC code, add
`smartSyncInterval: '24h'` to your lens config — a scheduled full scan
with tombstone detection, zero infrastructure. See
Keeping data fresh.
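As a sketch of where that setting lives — only `smartSyncInterval` comes from this page; the surrounding fields are illustrative placeholders, not a documented schema:

```typescript
// Hypothetical lens config sketch. Only `smartSyncInterval` is documented
// above; `name` and `source` are placeholder fields for illustration.
const lensConfig = {
  name: 'articles',
  source: { table: 'articles' },
  smartSyncInterval: '24h', // scheduled full scan with tombstone detection
};
```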
## The endpoint

Response: `202 Accepted`. The worker processes asynchronously —
the response body tells you the job was queued, not that it's done.
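Nothing above fixes a base URL or auth header, so here is a request-builder sketch; the host and the bearer-style `Authorization` header are assumptions, not documented behavior:

```typescript
// Build a fetch-ready request for POST /v1/ingest/:lens.
// The base URL and auth header shape are assumptions, not from this page.
type IngestBody =
  | { mode: 'records'; changes: { id: string; action: 'upsert' | 'delete' }[] }
  | { mode: 'full' }
  | { mode: 'incremental' };

function buildIngestRequest(
  baseUrl: string,
  lens: string,
  key: string,
  body: IngestBody,
) {
  return {
    url: `${baseUrl}/v1/ingest/${encodeURIComponent(lens)}`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${key}`,
      },
      body: JSON.stringify(body),
    },
  };
}
```

Pass the result to `fetch(url, init)` and expect a `202 Accepted` with a JSON body describing the queued job.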
## Mode — records
For per-row deltas. This is the shape you call from a CDC pipeline.
- `id` is the source row id (primary key on the source table, as a string). Must resolve to a row the bridge can look up.
- `action: 'upsert'` tells the worker to fetch the full row via the bridge and re-embed. Works for both new rows and updates.
- `action: 'delete'` removes the vector without a source fetch. Safe to send for rows that may already be absent — the worker is idempotent.
- Batch limit: 10,000 changes per request. Split larger bursts.
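Since the 10,000-change cap is hard, a caller-side splitter is worth having. A minimal sketch (the helper name is ours, not the API's):

```typescript
type Change = { id: string; action: 'upsert' | 'delete' };

// Split a burst of changes into request-sized batches of at most 10,000.
function chunkChanges(changes: Change[], max = 10_000): Change[][] {
  const batches: Change[][] = [];
  for (let i = 0; i < changes.length; i += max) {
    batches.push(changes.slice(i, i + max));
  }
  return batches;
}
```

Each batch then goes out as its own `{ mode: 'records', changes: batch }` request.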
What happens next:
- The service appends every change to the ingest buffer for this lens.
- It enqueues one `ingest.records` job, keyed `rec:<envId>:<lensName>` with a 2-second debounce window — rapid-fire calls to the same lens collapse into one job.
- The worker picks up the job, claims the buffered changes, and processes them in batches: upserts fan out to the bridge for the current row state, deletes go straight to the vector store.

`jobId` is `null` when a previous job for this lens is still pending — your new changes just get added to its buffer. `changesBuffered` tells you the new buffer size.
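Reading that back in code, assuming the two fields above are the whole records-mode response body (a sketch, not a documented schema):

```typescript
type RecordsResponse = { jobId: string | null; changesBuffered: number };

// jobId === null means an earlier job is still pending and absorbed
// our changes; otherwise a fresh job was enqueued.
function describeRecordsResponse(res: RecordsResponse): string {
  return res.jobId === null
    ? `buffered into pending job (${res.changesBuffered} changes queued)`
    : `new job ${res.jobId} enqueued (${res.changesBuffered} changes queued)`;
}
```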
## Mode — full

Equivalent to `semilayer push --rebuild`. Drops every vector for the
lens and re-ingests the whole source table.
- Singleton-keyed `full:<envId>:<lensName>`. Sending it while a full ingest is already running returns `{ status: 'deduplicated', jobId: null }`.
- No `changes` field. The worker walks the source with the configured bridge's cursor.
- Blows away the current index. Queries may return fewer results during the rebuild. If you need a zero-downtime rebuild, promote a new environment instead.
## Mode — incremental

Equivalent to `semilayer push --resume-ingest`. Picks up from the last
ingest cursor (the max `changeTrackingColumn` value the worker saw).
- Debounced `inc:<envId>:<lensName>` with a 5-second window.
- No `changes` field. The worker reads rows where `updated_at > cursor` (or your declared `changeTrackingColumn`).
- Right for catch-up, not for live freshness. Missed deletes won't be seen — `updated_at > cursor` can't detect removed rows.
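To see why deletes slip through, here is a toy sketch of cursor-based catch-up (the row shape and values are illustrative):

```typescript
type Row = { id: string; updated_at: number };

// Simulated incremental pass: only rows with updated_at > cursor come back.
// A row deleted at the source simply isn't in `rows` anymore, so nothing
// signals its removal — the stale vector survives until a full rebuild.
function incrementalPass(rows: Row[], cursor: number): Row[] {
  return rows.filter((r) => r.updated_at > cursor);
}

const source: Row[] = [
  { id: 'a', updated_at: 5 }, // updated after the cursor → re-embedded
  // row 'b' was deleted at the source; there is no tombstone to read
];
```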
## Authentication with `ik_` keys

`ik_` keys are created per-environment and can only trigger ingest.
They can't query, can't read, can't subscribe. Leaking one means a
stranger can force re-ingests on your lens — annoying, not catastrophic.
Create one:
Format: `ik_<envSlug>_<random>` — the env slug is literal in the
prefix, so a leaked key is attributable to an environment at a glance.
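Since the slug is literal in the prefix, attribution is a string split. A sketch (the helper name is ours, and it assumes the slug itself contains no underscore):

```typescript
// Extract the environment slug from an ik_ key: ik_<envSlug>_<random>.
// Assumes the slug contains no underscore; returns null for anything
// that doesn't match the documented shape.
function envSlugFromKey(key: string): string | null {
  const m = /^ik_([^_]+)_.+$/.exec(key);
  return m ? m[1] : null;
}
```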
Store them the same way you store webhook-receiving secrets: in your CDC pipeline's secret manager, rotated on a cadence you own. The Console's API Keys page lists every key with last-used timestamps.
## Rate limiting

SaaS-only. A 60-second sliding window per `(env, lens)` pair, tier-aware:
| Tier | ingestWebhooksPerMinute |
|---|---|
| Free | 10 |
| Pro | 60 |
| Team | 300 |
| Enterprise | Custom |
Exceeded: `429 Too Many Requests` with headers:
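The only response header this page names is `Retry-After` (referenced below), so treat this as a minimal sketch rather than the full header list:

```
HTTP/1.1 429 Too Many Requests
Retry-After: 12
```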
The Beam SDK's CDC helper (when you use one) retries automatically on
429 with exponential backoff. Raw HTTP callers: implement
Retry-After-aware retry.
Enterprise deployments have no rate limit.
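For raw HTTP callers, the delay logic can be as simple as: honor `Retry-After` when present, otherwise back off exponentially. A sketch (the base delay and cap are our choices, not the SDK's):

```typescript
// Milliseconds to wait before retry `attempt` (0-based).
// Prefers the server's Retry-After value (in seconds) when provided.
function retryDelayMs(attempt: number, retryAfterSeconds?: number): number {
  if (retryAfterSeconds !== undefined) return retryAfterSeconds * 1000;
  const base = 500 * 2 ** attempt; // 500ms, 1s, 2s, ...
  return Math.min(base, 30_000); // cap at 30s
}
```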
## Error codes
| Code | Meaning |
|---|---|
| `400 bad_request` | `mode` missing, invalid JSON, `changes` missing for `records` mode, duplicate ids within a single call |
| `400 too_many_changes` | More than 10,000 entries in `changes` |
| `400 invalid_action` | An entry's `action` isn't `'upsert'` or `'delete'` |
| `404 lens_not_found` | Lens doesn't exist in the env (or was deleted) |
| `429 rate_limited` | Tier rate limit hit. Retry per `Retry-After`. |
| `5xx` | Worker unavailable. Retry with backoff — the endpoint is idempotent. |
## Idempotency

The webhook is safe to retry. Duplicate `{ id, action }` entries within
the buffer are processed once. Sending the same deltas twice is a no-op
on the final state.
For at-least-once CDC pipelines (which is most of them), this is the critical property: you can retry on network flap without worrying about double-indexing.
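The same property lets a client coalesce before sending, shrinking retried batches. A sketch keyed on `{ id, action }` (client-side convenience, not required by the API):

```typescript
type Change = { id: string; action: 'upsert' | 'delete' };

// Collapse duplicate { id, action } pairs, preserving first-seen order.
function dedupeChanges(changes: Change[]): Change[] {
  const seen = new Set<string>();
  const out: Change[] = [];
  for (const c of changes) {
    const key = `${c.id}:${c.action}`;
    if (!seen.has(key)) {
      seen.add(key);
      out.push(c);
    }
  }
  return out;
}
```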
## Example — minimal CDC worker

Batch in memory up to 10k changes or for a few seconds (whichever hits
first), then `flushBatch(batch)`. That's the entire integration.
Next: CDC patterns — how to wire this into Postgres logical replication, Postgres triggers, MySQL binlog, and AWS DMS.