Keeping data fresh
Your source is the truth. SemiLayer's index needs to stay close behind. You have four ways to keep them aligned:
1. push --resume-ingest / --rebuild
The CLI command you already know. Runs a full or incremental ingest on demand. Right for the initial load and for occasional big shifts (renaming fields, moving a column). Not right for routine drift — running it on a cron is wasteful.
2. syncInterval — periodic polling (incremental)
Declare a cadence on the lens and the worker polls the source for you
using the changeTrackingColumn (defaults to updated_at).
Right for: stores where you can't run your own worker, low-write tables,
clean updated_at discipline.
Not right for: deletes (you need soft-deletes or a tombstone column), write rates the polling cadence can't keep up with, multi-source topologies where timing matters.
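As a sketch, a lens with a polling cadence might be declared like this. The object shape and the `source` key are assumptions, not the real SemiLayer API; only `syncInterval` and `changeTrackingColumn` (and its `updated_at` default) come from this page.

```typescript
// Hypothetical lens declaration. Only `syncInterval` and
// `changeTrackingColumn` (default: updated_at) are documented on this
// page; the object shape and `source` key are illustrative.
const articlesLens = {
  source: "postgres.articles",        // assumed source identifier
  syncInterval: "5m",                 // worker polls the source every 5 minutes
  changeTrackingColumn: "updated_at", // the default, shown explicitly
};
```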
3. smartSyncInterval — scheduled smart sync (with deletes)
The missing middle. Same content-hash + tombstone pipeline as the
Console "Sync now" button, but scheduled — so hard deletes get cleaned
up automatically without webhook infrastructure. Declare it alongside
syncInterval and you get the best of both: fast incremental refresh
plus a periodic full sweep that catches what incremental can't.
Right for: low-write tables where you can't wire CDC, teams that want
deletes handled correctly without running their own cron, anyone whose
source doesn't have a reliable updated_at (smart sync doesn't need
one).
Not right for: sub-minute latency, huge tables under aggressive schedules (every run reads every row — see cost table below).
Tradeoffs:

| Tradeoff | Detail |
|---|---|
| Setup | One config line (or two, paired with syncInterval) |
| Latency | Up to the interval (e.g. 24h for '24h') |
| Deletes | Yes — tombstone detection. Rows absent from the source are purged. |
| Cost | Scales with lens size. Content-hash dedup means unchanged rows are cheap (no re-embed), but every run reads every source row. See your tier's smart sync quota. |
| Tier | Pro and above for scheduled (Free tier: manual button only, 5/month). |
Manual trigger is always available (Console "Sync now" button or
semilayer sync) and uses the same handler. Manual runs count against
the same tier monthly quota as scheduled runs — the button isn't a
bypass.
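Because smart sync compares content hashes rather than timestamps, a source with no reliable updated_at can declare smartSyncInterval on its own. A sketch, with an assumed object shape (only the option name and its behavior come from this page):

```typescript
// Hypothetical lens declaration; only `smartSyncInterval` is documented
// here, the rest of the shape is illustrative.
const catalogLens = {
  source: "postgres.catalog", // assumed source identifier
  // No syncInterval: this source has no reliable updated_at, and smart
  // sync doesn't need one, since it hashes content instead. Rows absent
  // from the source are purged on each run (tombstone detection).
  smartSyncInterval: "24h",
};
```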
4. Webhook ingest — you push us the deltas
The pattern most production apps end up on. Your CDC pipeline (Postgres logical replication, triggers, Debezium, AWS DMS, a cron in your worker) posts a list of changed record ids to our webhook. We buffer, dedup, and re-index just those rows.
Right for: exactly-as-fast-as-your-writes freshness, precise delete handling, large tables where polling would be wasteful.
The full details live in the "Ingest" section: the complete API, the
ik_ key details, and real CDC patterns for Postgres, MySQL, and AWS.
→ Ingest
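The producer side can be sketched as follows: a CDC worker dedupes and batches changed record ids, then posts each batch. The endpoint URL, header names, and payload shape below are assumptions; the real API and ik_ key details are in the Ingest section.

```typescript
// Client side of webhook ingest, sketched. The endpoint URL, payload
// shape, and header names are assumptions; see the Ingest section for
// the real API.

// Dedupe changed record ids and chunk them so each POST stays small.
function batchIds(ids: string[], batchSize = 500): string[][] {
  const unique = [...new Set(ids)];
  const batches: string[][] = [];
  for (let i = 0; i < unique.length; i += batchSize) {
    batches.push(unique.slice(i, i + batchSize));
  }
  return batches;
}

// POST each batch to the (hypothetical) webhook endpoint with an ik_ key.
async function pushDeltas(ids: string[], ingestKey: string): Promise<void> {
  for (const batch of batchIds(ids)) {
    await fetch("https://api.semilayer.example/v1/ingest/webhook", {
      method: "POST",
      headers: {
        "content-type": "application/json",
        authorization: `Bearer ${ingestKey}`,
      },
      body: JSON.stringify({ changed: batch }),
    });
  }
}
```

The service buffers and dedups on its side as well, so posting the same id twice is safe; client-side batching just keeps request sizes bounded.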
Recommended pairings
| Scenario | Config |
|---|---|
| Small-to-medium table, no CDC | syncInterval: '5m' + smartSyncInterval: '24h' — fast incremental + nightly delete sweep |
| High-write production with CDC | Records webhook + optional smartSyncInterval: '24h' as a safety net |
| Append-only / logs | syncInterval: '5m' alone — no deletes to worry about |
| Occasional / manual refreshes | Console "Sync now" button + no interval declared |
The first row is the setup most teams want and often don't realize they can get declaratively: two config lines, zero webhook code, correct deletes.
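That first-row pairing can be sketched as a single lens declaration. The object shape is assumed; the two interval options and their values come from the table above.

```typescript
// Recommended default: fast incremental polling plus a nightly smart
// sweep. Only the two interval options are documented on this page;
// the object shape is illustrative.
const ordersLens = {
  source: "postgres.orders", // assumed source identifier
  syncInterval: "5m",        // fast incremental refresh via updated_at polling
  smartSyncInterval: "24h",  // nightly content-hash sweep; catches deletes
};
```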