Keeping data fresh
Your source is the truth. SemiLayer's index needs to stay close behind. You have four ways to keep them aligned:
1. push --resume-ingest / --rebuild
The CLI command you already know. Runs a full or incremental ingest on demand. Right for the initial load and for occasional big shifts (renaming fields, moving a column). Not right for routine drift — running it on a cron is wasteful.
2. syncInterval — periodic polling (incremental)
Declare a cadence on the lens and the worker polls the source for you
using the changeTrackingColumn (defaults to updated_at).
Right for: stores where you can't run your own worker, low-write tables,
clean updated_at discipline.
Not right for: deletes (you need soft-deletes or a tombstone column), write rates the polling cadence can't keep up with, multi-source topologies where timing matters.
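As a sketch, a lens with a polling cadence might be declared like this. The object shape and the `source` key are assumptions, not the real SemiLayer API; only `syncInterval` and `changeTrackingColumn` (and its `updated_at` default) come from this page.

```typescript
// Hypothetical lens declaration. Only `syncInterval` and
// `changeTrackingColumn` (default: updated_at) are documented on this
// page; the object shape and `source` key are illustrative.
const articlesLens = {
  source: "postgres.articles",        // assumed source identifier
  syncInterval: "5m",                 // worker polls the source every 5 minutes
  changeTrackingColumn: "updated_at", // the default, shown explicitly
};
```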
3. smartSyncInterval — scheduled smart sync (with deletes)
The missing middle. Same content-hash + tombstone pipeline as the
Console "Sync now" button, but scheduled — so hard deletes get cleaned
up automatically without webhook infrastructure. Declare it alongside
syncInterval and you get the best of both: fast incremental refresh
plus a periodic full sweep that catches what incremental can't.
Right for: low-write tables where you can't wire CDC, teams that want
deletes handled correctly without running their own cron, anyone whose
source doesn't have a reliable updated_at (smart sync doesn't need
one).
Not right for: sub-minute latency, huge tables under aggressive schedules (every run reads every row — see cost table below).
Tradeoffs:

| Tradeoff | Detail |
|---|---|
| Setup | One config line (or two, paired with syncInterval) |
| Latency | Up to the interval (e.g. 24h for '24h') |
| Deletes | Yes — tombstone detection. Rows absent from the source are purged. |
| Cost | Scales with lens size. Content-hash dedup means unchanged rows are cheap (no re-embed), but every run reads every source row. See your tier's smart sync quota. |
| Tier | Pro and above for scheduled (Free tier: manual button only, 5/month). |
Manual trigger is always available (Console "Sync now" button or
semilayer sync) and uses the same handler. Manual runs count against
the same tier monthly quota as scheduled runs — the button isn't a
bypass.
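Because smart sync compares content hashes rather than timestamps, a source with no reliable updated_at can declare smartSyncInterval on its own. A sketch, with an assumed object shape (only the option name and its behavior come from this page):

```typescript
// Hypothetical lens declaration; only `smartSyncInterval` is documented
// here, the rest of the shape is illustrative.
const catalogLens = {
  source: "postgres.catalog", // assumed source identifier
  // No syncInterval: this source has no reliable updated_at, and smart
  // sync doesn't need one, since it hashes content instead. Rows absent
  // from the source are purged on each run (tombstone detection).
  smartSyncInterval: "24h",
};
```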
4. Webhook ingest — you push us the deltas
The pattern most production apps end up on. Your CDC pipeline (Postgres logical replication, triggers, Debezium, AWS DMS, a cron in your worker) posts a list of changed record ids to our webhook. We buffer, dedup, and re-index just those rows.
Right for: exactly-as-fast-as-your-writes freshness, precise delete handling, large tables where polling would be wasteful.
The full details live in the "Ingest" section: the complete API, the
ik_ key details, and real CDC patterns for Postgres, MySQL, and AWS.
→ Ingest
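The producer side can be sketched as follows: a CDC worker dedupes and batches changed record ids, then posts each batch. The endpoint URL, header names, and payload shape below are assumptions; the real API and ik_ key details are in the Ingest section.

```typescript
// Client side of webhook ingest, sketched. The endpoint URL, payload
// shape, and header names are assumptions; see the Ingest section for
// the real API.

// Dedupe changed record ids and chunk them so each POST stays small.
function batchIds(ids: string[], batchSize = 500): string[][] {
  const unique = [...new Set(ids)];
  const batches: string[][] = [];
  for (let i = 0; i < unique.length; i += batchSize) {
    batches.push(unique.slice(i, i + batchSize));
  }
  return batches;
}

// POST each batch to the (hypothetical) webhook endpoint with an ik_ key.
async function pushDeltas(ids: string[], ingestKey: string): Promise<void> {
  for (const batch of batchIds(ids)) {
    await fetch("https://api.semilayer.example/v1/ingest/webhook", {
      method: "POST",
      headers: {
        "content-type": "application/json",
        authorization: `Bearer ${ingestKey}`,
      },
      body: JSON.stringify({ changed: batch }),
    });
  }
}
```

The service buffers and dedups on its side as well, so posting the same id twice is safe; client-side batching just keeps request sizes bounded.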
Recommended pairings
| Scenario | Config |
|---|---|
| Small-to-medium table, no CDC | syncInterval: '5m' + smartSyncInterval: '24h' — fast incremental + nightly delete sweep |
| High-write production with CDC | Records webhook + optional smartSyncInterval: '24h' as a safety net |
| Append-only / logs | syncInterval: '5m' alone — no deletes to worry about |
| Occasional / manual refreshes | Console "Sync now" button + no interval declared |
The first row is the setup most teams want and often don't realize they can get declaratively: two config lines, zero webhook code, correct deletes.
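That first-row pairing can be sketched as a single lens declaration. The object shape is assumed; the two interval options and their values come from the table above.

```typescript
// Recommended default: fast incremental polling plus a nightly smart
// sweep. Only the two interval options are documented on this page;
// the object shape is illustrative.
const ordersLens = {
  source: "postgres.orders", // assumed source identifier
  syncInterval: "5m",        // fast incremental refresh via updated_at polling
  smartSyncInterval: "24h",  // nightly content-hash sweep; catches deletes
};
```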