SemiLayerDocs

Ingest — Troubleshooting

When webhook calls succeed but data isn't landing — or lands slowly, partially, or not at all — this page is the checklist.

The happy-path flow

Before anything else, confirm what should be happening:

  1. Your CDC pipeline POSTs /v1/ingest/:lens → service returns 202 with a jobId.
  2. Service writes every change into the ingest buffer for the lens.
  3. Service enqueues an ingest.records job, debounced 2 seconds per lens.
  4. Worker claims the buffered changes, fans out upserts to the bridge, updates vectors, deletes tombstones.
  5. semilayer status --lens <name> shows cursor advancing and pending=0.

If any step breaks, it's in one of these places.
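Step 1's payload can be validated before it ever hits the wire. A minimal client-side sketch, assuming records mode and the documented limits (duplicate ids and the 10k-change cap are the 400 causes listed in step 1; the function name is hypothetical):

```python
def build_ingest_payload(changes: list[dict]) -> dict:
    """Validate a records-mode batch before POSTing to /v1/ingest/:lens.

    Rejects duplicate ids and oversized batches client-side
    instead of burning a webhook call on a guaranteed 400.
    """
    if len(changes) > 10_000:
        raise ValueError("more than 10k changes in one call")
    ids = [c["id"] for c in changes]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate ids in batch")
    for c in changes:
        if c.get("action") not in ("upsert", "delete"):
            raise ValueError(f"bad action for {c.get('id')}: {c.get('action')}")
    return {"mode": "records", "changes": changes}
```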

Step 1 — Did the webhook succeed?

curl -i -X POST https://api.semilayer.com/v1/ingest/products \
  -H "Authorization: Bearer $IK_KEY" \
  -H "Content-Type: application/json" \
  -d '{"mode":"records","changes":[{"id":"p_1","action":"upsert"}]}'

# Expected: 202 Accepted, { jobId, status: 'queued' | 'deduplicated', changesBuffered }
Status  Meaning                                     Fix
401     Key rejected                                Check the key prefix (ik_<envSlug>_). Wrong env? Rotated?
403     Key is valid but wrong type (e.g. a pk_)    Use an ik_ or sk_ key
404     Lens doesn't exist in the env               Did you push? Did you delete it?
400     Payload validation failed                   Check message: missing mode, duplicate ids, > 10k changes
429     Rate limit hit                              Honor Retry-After. Upgrade tier if chronic.
5xx     Service unavailable                         Retry with backoff. Endpoint is idempotent.
202     Queued ✓                                    Move to step 2

Step 2 — Is the worker draining the buffer?

semilayer status --lens products
# products  indexing  pending=487  rate=12 rows/s  last_tick=12s ago
  • pending > 0 and rate > 0 — draining normally. If pending is very high and rising, you're producing faster than the worker can consume (see "The backlog is growing" below).
  • pending > 0 and rate = 0 — worker stuck. Check "The worker is stuck" below.
  • pending = 0 — worker caught up. Either nothing to do or the changes were processed. Move to step 3.
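The three cases above can be checked mechanically from the status output. A sketch that parses the example line shown earlier; treat the regex as an assumption about that line's exact format:

```python
import re

def classify_status(line: str) -> str:
    """Map one `semilayer status` line onto the three cases above."""
    m = re.search(r"pending=(\d+)\s+rate=(\d+)", line)
    if not m:
        raise ValueError(f"unrecognized status line: {line!r}")
    pending, rate = int(m.group(1)), int(m.group(2))
    if pending == 0:
        return "caught up"   # move to step 3
    if rate > 0:
        return "draining"    # watch for a rising backlog
    return "stuck"           # check the dead-letter queue
```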

Admins can inspect the same data on the Console → Ingest Jobs page, which also shows per-lens throughput history.

Step 3 — Are the vectors actually there?

semilayer query products.query --where '{"id":"p_1"}' --limit 1

For records that should have been upserted, the row comes back with its fields populated. For deleted records, the query returns nothing.

If the row is there but not appearing in search results:

  • Check that searchable: true (or { weight: N }) is set on the fields you expect to rank the row. A row with no searchable fields is stored but not embedded — see Search — Fields & weights.
  • Check that the field that was supposed to contain the updated text actually updated in the source. Bridge caching can hide stale reads for the first few seconds; re-run after 10 seconds.
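A quick way to audit which fields can rank a row, assuming a lens field config shaped like `{ name: { searchable: true | { weight: N } } }` (that shape is an illustration, not the documented schema; see Search — Fields & weights for the real one):

```python
def searchable_fields(fields: dict) -> list[str]:
    """Return field names that contribute to the embedding.

    A row for which this list is empty is stored but never embedded,
    so it can't appear in search results.
    """
    out = []
    for name, spec in fields.items():
        s = spec.get("searchable")
        if s is True or (isinstance(s, dict) and s.get("weight", 0) > 0):
            out.append(name)
    return out
```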

Common failure modes

The worker is stuck

Symptom: semilayer status shows pending > 0 and rate = 0 for minutes. The job is probably in the dead-letter queue.

semilayer status --lens products --dead-letter
# ingest.records.dead  3 jobs  oldest=2026-04-20T14:02:03Z

Each ingest mode has its own dead-letter queue:

Mode              Dead-letter queue
records           ingest.records.dead
full              ingest.full.dead
incremental       ingest.incremental.dead
sync (periodic)   ingest.sync.dead

Jobs land in dead-letter after the retry policy exhausts — by default, 4 attempts with exponential backoff. Inspect a dead job:

semilayer ingest inspect --job-id j_abc123
# jobId        j_abc123
# state        failed (dead-letter)
# failed_reason  bridge_unreachable
# last_error   ETIMEDOUT connecting to postgres://... (attempt 4/4)
# payload      { changes: [...] }

Most dead-letter reasons:

  • bridge_unreachable — source DB rejected connections. Fix credentials, network, firewall. Then re-enqueue the job: semilayer ingest retry --job-id j_abc123.
  • source_query_failed — bridge raised a SQL error. Column renamed? Table dropped? Schema drift between config and source. Reconcile, push config, then retry.
  • embedding_failed — embedder refused or rate-limited. Rare if your own embedder is healthy; check its quotas.
  • row_not_found — an upsert change referenced an id the bridge can't find. Usually means the row was inserted and deleted between your CDC emitting the event and the worker processing it. Silently drop these; they're not real errors.
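The four reasons map onto a small set of responses, which a cleanup script might encode like this (the `failed_reason` strings come from the `ingest inspect` output above; the triage buckets are this page's advice, not a CLI feature):

```python
def triage_dead_letter(failed_reason: str) -> str:
    """Decide what to do with a dead-lettered ingest job."""
    if failed_reason == "row_not_found":
        return "drop"                          # insert+delete race; not a real error
    if failed_reason == "bridge_unreachable":
        return "fix-connectivity-then-retry"
    if failed_reason == "source_query_failed":
        return "reconcile-schema-then-retry"
    if failed_reason == "embedding_failed":
        return "check-embedder-quota-then-retry"
    return "escalate"                          # unknown reason: see 'When to escalate'
```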

The backlog is growing

pending climbs without bound. Causes, in order of likelihood:

  1. You're over your tier's ingestWebhooksPerMinute but keep retrying. The webhook returns 202 on every successful call, so your CDC pipeline keeps sending, but the worker can only process at the tier's rate. Check semilayer status --lens X --verbose for throttle signals. Upgrade or slow your emitter.
  2. Embedder is rate-limiting the worker. Every upsert embeds a row; bursts can hit the embedder's RPS cap. The worker retries transparently, but large sustained bursts build backlog. Scale up the embedder's quota or slow your CDC emitter.
  3. Bridge is slow. The worker fetches current row state per upsert. If the source DB is slow under load, each batch takes longer. Check source-side slow query log.
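Whether the backlog will ever drain is arithmetic: the worker must consume faster than you produce. A sketch:

```python
def drain_eta_seconds(pending: int, produce_rate: float, consume_rate: float) -> float:
    """Seconds until pending reaches 0, or inf if the backlog grows without bound."""
    net = consume_rate - produce_rate
    if net <= 0:
        return float("inf")  # one of the three causes above applies
    return pending / net
```

At the numbers from the earlier status example (pending=487, rate=12 rows/s) and a hypothetical emitter producing 2 rows/s, the buffer drains in just under 49 seconds.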

Deletes aren't applying

  • The row still shows in query after you sent { action: 'delete' }. Wait 10s — the worker drains in batches. If it persists, semilayer ingest inspect on the job. Common cause: the lens's config has a field with primaryKey: true set on a column that isn't actually the row id, so the vector store can't match the delete.
  • The row shows in query but not in search. Expected — query reads the source directly via the bridge; search hits the vector index. After a delete, the source row is (usually) gone at the source, and the vector is removed. If query still returns it, your CDC didn't delete the source row; ingest can only remove the vector, not the source.
  • Soft-delete pattern? If your source marks rows deleted with deleted_at IS NOT NULL instead of physically removing them, the upsert path re-indexes them. Add a where filter next to the lens's facets.search.fields selection — or filter on the client.
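If you control the CDC emitter, the soft-delete case can also be handled at the source: translate a deleted_at flip into a delete change instead of an upsert, so the vector is removed rather than re-indexed. A hypothetical emitter-side mapping:

```python
def row_to_change(row: dict) -> dict:
    """Emit a delete for soft-deleted rows, an upsert otherwise."""
    action = "delete" if row.get("deleted_at") is not None else "upsert"
    return {"id": row["id"], "action": action}
```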

Stale data in queries

search returns old content after an update. Possible causes:

  • Change never reached the webhook. Check your CDC pipeline's delivery log. The webhook returning 202 is the point of commit.
  • Worker hasn't processed yet. 2–10 seconds of lag is normal. More than a minute for a single update suggests a dead-lettered job (above).
  • Field that changed isn't searchable. The row is re-indexed on every upsert, but a non-searchable field change doesn't move the vector. search still ranks by the old vector. Fix: mark the field searchable: true or query by a different vector-relevant field.
  • Embedding cache hit. If the embedded text didn't actually change (e.g. a different column updated), the worker skips re-embedding and reuses the old vector. Expected behavior — not a bug.
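The cache behavior in the last bullet can be reproduced with a content hash over only the searchable fields. This is a sketch of the idea, not the service's actual implementation:

```python
import hashlib

def embed_key(row: dict, searchable: list[str]) -> str:
    """Hash only the fields that feed the embedder."""
    text = "\n".join(str(row.get(f, "")) for f in searchable)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def needs_reembed(old: dict, new: dict, searchable: list[str]) -> bool:
    """True only when the embedded text actually changed."""
    return embed_key(old, searchable) != embed_key(new, searchable)
```

A price-only update therefore reuses the old vector, exactly the "expected behavior" described above.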

semilayer ingest reference

semilayer ingest status   --lens products      # pending + rate
semilayer ingest runs     --lens products      # recent ingest runs
semilayer ingest inspect  --job-id j_abc123    # one job's state + last error
semilayer ingest retry    --job-id j_abc123    # re-enqueue a dead-letter job
semilayer ingest flush    --lens products      # force-drain pending buffer now (admin)

The Console's Ingest Jobs page exposes the same surface visually.

Incremental sync missed a hard delete

Your source dropped a row (hard DELETE), but a search still returns it. This is expected for incremental sync — syncInterval's WHERE updated_at > cursor filter cannot see rows that no longer exist. Three fixes, in increasing order of infrastructure:

  1. One-off cleanup — click Sync now in the Console or run semilayer sync --lens <name>. Smart sync's tombstone pass removes rows missing from the source.
  2. Schedule the cleanup — add smartSyncInterval: '24h' to the lens config. The same tick loop that drives syncInterval will run a nightly smart sync and handle deletes automatically, no webhook code required. See Keeping data fresh.
  3. Propagate synchronously — wire the records webhook with { action: 'delete' } entries from your CDC pipeline. Sub-second delete propagation, no polling lag. See CDC patterns.

Option 2 is the right default for teams without CDC infrastructure.
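The tombstone pass in options 1 and 2 is conceptually a set difference: any id present in the index but absent from the source gets a delete. The real smart sync runs server-side; this sketch only illustrates the idea:

```python
def tombstones(source_ids: set[str], indexed_ids: set[str]) -> list[dict]:
    """Delete changes for rows the source no longer has."""
    return [{"id": i, "action": "delete"} for i in sorted(indexed_ids - source_ids)]
```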

When to escalate

If ingest retry on a dead-letter job succeeds but it fails again the same way, and the failure isn't an obvious source issue (down DB, bad credentials), file a support ticket with:

  • Lens name + org/env
  • Job id(s) from ingest inspect
  • Timeframe of the failure
  • A sample changes payload that reproduces the error

Platform admins have direct access to the audit log and can see the exact exception trace.