Ingest — Webhook

One endpoint, three modes, tier-aware rate limiting. This page is the full API reference for POST /v1/ingest/:lens.

💡 Don't want to wire webhooks? If you can tolerate periodic freshness and want deletes handled without writing any CDC code, add smartSyncInterval: '24h' to your lens config — scheduled full scan with tombstone detection, zero infrastructure. See Keeping data fresh.
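As a rough sketch of that option — assuming a TypeScript lens config file; every field here other than smartSyncInterval is illustrative, not part of this reference:

```typescript
// Hypothetical lens config fragment. The shape around smartSyncInterval
// (source, table) is illustrative; only smartSyncInterval comes from this page.
const productsLens = {
  source: { table: 'products' },   // assumed source shape
  smartSyncInterval: '24h',        // daily full scan with tombstone detection
}
```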

The endpoint

POST /v1/ingest/:lens
Authorization: Bearer ik_<envSlug>_...         (or sk_, but ik_ is preferred)
Content-Type: application/json

Response: 202 Accepted. The worker processes asynchronously — the response body tells you the job was queued, not that it's done.

interface IngestBody {
  mode:     'full' | 'incremental' | 'records'
  changes?: Array<{ id: string; action: 'upsert' | 'delete' }>   // required for records
}

interface IngestResponse {
  jobId:            string | null             // null on dedup
  status:           'queued' | 'deduplicated'
  changesBuffered?: number                    // records mode only
}
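Putting the shapes above together, a minimal request builder might look like this — buildIngestRequest and its return shape are illustrative helpers, not SDK exports:

```typescript
interface IngestBody {
  mode: 'full' | 'incremental' | 'records'
  changes?: Array<{ id: string; action: 'upsert' | 'delete' }>
}

// Assemble the fetch arguments for POST /v1/ingest/:lens.
// buildIngestRequest is a hypothetical helper, not part of any SDK.
function buildIngestRequest(lens: string, key: string, body: IngestBody) {
  return {
    url: `https://api.semilayer.com/v1/ingest/${encodeURIComponent(lens)}`,
    init: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${key}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  }
}
```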

Mode — records

For per-row deltas. This is the shape you call from a CDC pipeline.

{
  "mode": "records",
  "changes": [
    { "id": "p_7712", "action": "upsert" },
    { "id": "p_0451", "action": "delete" },
    { "id": "p_6340", "action": "upsert" }
  ]
}
  • id is the source row id (primary key on the source table, as a string). Must resolve to a row the bridge can look up.
  • action: 'upsert' tells the worker to fetch the full row via the bridge and re-embed. Works for both new rows and updates.
  • action: 'delete' removes the vector without a source fetch. Safe to send for rows that may already be absent — the worker is idempotent.
  • Batch limit: 10,000 changes per request. Split larger bursts.
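The 10,000-change cap means larger bursts need client-side splitting. A minimal sketch — the helper name is ours, not an SDK export:

```typescript
type Change = { id: string; action: 'upsert' | 'delete' }

// Split a burst of changes into request-sized batches of at most
// `limit` entries each (10,000 per the batch limit above).
function chunkChanges(changes: Change[], limit = 10_000): Change[][] {
  const batches: Change[][] = []
  for (let i = 0; i < changes.length; i += limit) {
    batches.push(changes.slice(i, i + limit))
  }
  return batches
}
```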

What happens next:

  1. The service appends every change to the ingest buffer for this lens.
  2. It enqueues one ingest.records job, keyed rec:<envId>:<lensName> with a 2-second debounce window — rapid-fire calls to the same lens collapse into one job.
  3. The worker picks up the job, claims the buffered changes, and processes them in batches: upserts fan out to the bridge for the current row state, deletes go straight to the vector store.
  4. jobId is null when a previous job for this lens is still pending — your new changes just get added to its buffer. changesBuffered tells you the new buffer size.
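In client code, the queued-versus-buffered distinction from steps 2–4 reads like this — describeIngest is an illustrative helper, not an SDK export:

```typescript
interface IngestResponse {
  jobId: string | null
  status: 'queued' | 'deduplicated'
  changesBuffered?: number
}

// A null jobId means the changes were folded into an already-pending job;
// changesBuffered reports the buffer size after the append (records mode).
function describeIngest(res: IngestResponse): string {
  if (res.jobId === null) {
    return `buffered into pending job (${res.changesBuffered ?? 0} changes queued)`
  }
  return `new job ${res.jobId} queued`
}
```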

Mode — full

Equivalent to semilayer push --rebuild. Drops every vector for the lens and re-ingests the whole source table.

{ "mode": "full" }
  • Singleton-keyed full:<envId>:<lensName>. Sending it while a full ingest is already running returns { status: 'deduplicated', jobId: null }.
  • No changes field. The worker walks the source with the configured bridge's cursor.
  • Blows away the current index. Queries may return fewer results during the rebuild. If you need a zero-downtime rebuild, promote a new environment instead.

Mode — incremental

Equivalent to semilayer push --resume-ingest. Picks up from the last ingest cursor (the max changeTrackingColumn value the worker saw).

{ "mode": "incremental" }
  • Debounced inc:<envId>:<lensName> with a 5-second window.
  • No changes field. The worker reads rows where updated_at > cursor (or your declared changeTrackingColumn).
  • Right for catch-up, not for live freshness. Missed deletes won't be seen — updated_at > cursor can't detect removed rows.
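The missed-delete caveat is easy to see with a toy cursor scan — a deleted row simply never appears in the updated_at > cursor result (toy data, not bridge code):

```typescript
// Toy source rows after some churn: p_2 was deleted, p_3 was updated.
const rows = [
  { id: 'p_1', updated_at: 100 },
  { id: 'p_3', updated_at: 250 },   // p_2 is gone — it has no row to scan
]

// What an incremental ingest sees with cursor = 200:
const cursor = 200
const seen = rows.filter((r) => r.updated_at > cursor).map((r) => r.id)
// seen contains only p_3 — the delete of p_2 is invisible to this scan,
// which is why records mode (or smartSyncInterval) is needed for deletes.
```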

Authentication with ik_ keys

ik_ keys are created per-environment and can only trigger ingest. They can't query, can't read, can't subscribe. Leaking one means a stranger can force re-ingest on your lens — annoying, not catastrophic.

Create one:

semilayer keys create --env production --type ingest --name ci-cdc-pipeline
# ik_production_abc123...

Format: ik_<envSlug>_<random> — the env slug is literal in the prefix, so a leaked key is attributable to an environment at a glance.

Store them the same way you store webhook-receiving secrets: in your CDC pipeline's secret manager, rotated on a cadence you own. The Console's API Keys page lists every key with last-used timestamps.
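Since the env slug is literal in the prefix, attributing a leaked ik_ key is one pattern match away — keyEnvironment is a hypothetical helper, assuming only the ik_<envSlug>_<random> format documented above:

```typescript
// Extract the environment slug from an ingest key like
// ik_production_abc123. Returns null for anything that doesn't match.
function keyEnvironment(key: string): string | null {
  const match = /^ik_([A-Za-z0-9-]+)_/.exec(key)
  return match ? match[1] : null
}
```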

Rate limiting

SaaS-only. A 60-second sliding window per (env, lens) pair, tier-aware:

Tier         ingestWebhooksPerMinute
Free         10
Pro          60
Team         300
Enterprise   Custom

Exceeded: 429 Too Many Requests with headers:

Retry-After:         12
X-RateLimit-Limit:   60
X-RateLimit-Remaining: 0

The Beam SDK's CDC helper (when you use one) retries automatically on 429 with exponential backoff. Raw HTTP callers: implement Retry-After-aware retry.

Enterprise deployments have no rate limit.

Error codes

Code                   Meaning
400 bad_request        mode missing, invalid JSON, changes missing for records mode, duplicate ids within a single call
400 too_many_changes   More than 10,000 entries in changes
400 invalid_action     An entry's action isn't 'upsert' or 'delete'
404 lens_not_found     Lens doesn't exist in the env (or deleted)
429 rate_limited       Tier rate limit hit. Retry per Retry-After.
5xx                    Worker unavailable. Retry with backoff — the endpoint is idempotent.
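The 400-class errors can all be caught before the request leaves your process. A client-side pre-check mirroring the documented rules — the function name is ours, not an SDK export:

```typescript
type Change = { id: string; action: string }

// Returns the first problem found, or null if the payload would pass
// the documented 400-level checks for records mode.
function precheckRecords(changes: Change[]): string | null {
  if (changes.length === 0) return 'bad_request: changes missing'
  if (changes.length > 10_000) return 'too_many_changes'
  const seen = new Set<string>()
  for (const c of changes) {
    if (c.action !== 'upsert' && c.action !== 'delete') return 'invalid_action'
    if (seen.has(c.id)) return 'bad_request: duplicate id ' + c.id
    seen.add(c.id)
  }
  return null
}
```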

Idempotency

The webhook is safe to retry. Duplicate { id, action } entries within the buffer are processed once. Sending the same deltas twice is a no-op on the final state.

For at-least-once CDC pipelines (which is most of them), this is the critical property: you can retry on network flap without worrying about double-indexing.
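Because duplicates collapse server-side anyway, you can also compact on your side before sending — keep the last action per id, assuming your changes are in source order so the final action determines the final state (a sketch, not documented SDK behavior):

```typescript
type Change = { id: string; action: 'upsert' | 'delete' }

// Keep only the last action seen for each id. Assumes input is in source
// order; Map preserves first-insertion order of keys on overwrite.
function compact(changes: Change[]): Change[] {
  const last = new Map<string, Change>()
  for (const c of changes) last.set(c.id, c)
  return [...last.values()]
}
```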

Example — minimal CDC worker

// your-cdc-worker.ts
async function flushBatch(changes: Array<{ id: string; action: 'upsert' | 'delete' }>) {
  const response = await fetch(
    `https://api.semilayer.com/v1/ingest/products`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.SEMILAYER_INGEST_KEY}`,
        'Content-Type':  'application/json',
      },
      body: JSON.stringify({ mode: 'records', changes }),
    },
  )

  if (response.status === 429) {
    const retryAfter = Number(response.headers.get('retry-after')) || 30
    await new Promise((r) => setTimeout(r, retryAfter * 1000))
    return flushBatch(changes)      // retry
  }

  if (!response.ok) {
    throw new Error(`ingest failed: ${response.status} ${await response.text()}`)
  }

  return response.json()            // { jobId, status, changesBuffered }
}

Batch in memory up to 10k changes or for a few seconds (whichever hits first), then flushBatch(batch). That's the entire integration.
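That batch-then-flush loop can be sketched as a small accumulator — flush fires at the size cap or after a short delay, whichever comes first; the class name and thresholds are illustrative:

```typescript
type Change = { id: string; action: 'upsert' | 'delete' }

// Accumulates changes and invokes `flush` when the batch reaches
// `maxSize` or `maxWaitMs` elapses — whichever comes first.
class ChangeBatcher {
  private batch: Change[] = []
  private timer: ReturnType<typeof setTimeout> | null = null

  constructor(
    private flush: (changes: Change[]) => void,
    private maxSize = 10_000,   // the documented per-request cap
    private maxWaitMs = 3_000,
  ) {}

  add(change: Change) {
    this.batch.push(change)
    if (this.batch.length >= this.maxSize) return this.drain()
    if (!this.timer) this.timer = setTimeout(() => this.drain(), this.maxWaitMs)
  }

  private drain() {
    if (this.timer) { clearTimeout(this.timer); this.timer = null }
    const out = this.batch
    this.batch = []
    if (out.length) this.flush(out)
  }
}
```

Pair it with flushBatch from the example above: new ChangeBatcher(flushBatch) and call add() per CDC event.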

Next: CDC patterns — how to wire this into Postgres logical replication, Postgres triggers, MySQL binlog, and AWS DMS.