
Keeping data fresh

Your source is the truth. SemiLayer's index needs to stay close behind. You have four ways to keep them aligned:

1. push --resume-ingest / --rebuild

The CLI command you already know. Runs a full or incremental ingest on demand. Right for the initial load and for occasional big shifts (renaming fields, moving a column). Not right for routine drift — running it on a cron is wasteful.

semilayer push --resume-ingest         # incremental from the last cursor
semilayer push --rebuild --skip-confirmation-dangerously   # full re-index

2. syncInterval — periodic polling (incremental)

Declare a cadence on the lens and the worker polls the source for you using the changeTrackingColumn (defaults to updated_at).

products: {
  // ... fields, facets
  syncInterval: '5m',     // '1m' | '5m' | '15m' | '30m' | '1h' | '6h' | '24h'
  changeTrackingColumn: 'updated_at',
}

Right for: stores where you can't run your own worker, low-write tables, clean updated_at discipline.

Not right for: hard deletes (polling only sees rows that still exist, so you need soft-deletes or a tombstone column), write rates the polling cadence can't keep up with, multi-source topologies where timing matters.
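To see why cursor-based polling misses hard deletes, here is a minimal sketch of one incremental poll. It is not SemiLayer's worker code: the `Row` shape, the `pollOnce` helper, and the use of epoch milliseconds for `updated_at` are all assumptions made for illustration.

```typescript
// Hypothetical row shape; only the id and the tracking column matter here.
interface Row {
  id: string;
  updated_at: number; // epoch millis, standing in for a timestamp column
}

interface PollResult {
  changed: Row[];
  nextCursor: number;
}

// One incremental poll: pick up rows whose tracking column moved past the cursor,
// then advance the cursor to the newest value seen.
function pollOnce(source: Row[], cursor: number): PollResult {
  const changed = source.filter((row) => row.updated_at > cursor);
  const nextCursor = changed.reduce(
    (max, row) => Math.max(max, row.updated_at),
    cursor,
  );
  return { changed, nextCursor };
}

// First poll sees both rows.
const first = pollOnce(
  [
    { id: "p_7712", updated_at: 100 },
    { id: "p_0451", updated_at: 120 },
  ],
  0,
);

// Between polls, p_0451 is hard-deleted and p_7712 is edited.
const second = pollOnce([{ id: "p_7712", updated_at: 150 }], first.nextCursor);
// second.changed holds only p_7712; nothing in the result says p_0451 is gone.
```

A deleted row has no `updated_at` to compare, so it never shows up in `changed`. A soft-delete flag or tombstone column turns the delete into an update, which the cursor can see.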

3. smartSyncInterval — scheduled smart sync (with deletes)

The missing middle. Same content-hash + tombstone pipeline as the Console "Sync now" button, but scheduled — so hard deletes get cleaned up automatically without webhook infrastructure. Declare it alongside syncInterval and you get the best of both: fast incremental refresh plus a periodic full sweep that catches what incremental can't.

products: {
  // ... fields, facets
  syncInterval: '5m',           // fast partial refresh
  smartSyncInterval: '24h',     // nightly tombstone sweep
  changeTrackingColumn: 'updated_at',   // only used by incremental
}

Right for: low-write tables where you can't wire CDC, teams that want deletes handled correctly without running their own cron, anyone whose source doesn't have a reliable updated_at (smart sync doesn't need one).

Not right for: sub-minute latency, huge tables under aggressive schedules (every run reads every row — see cost table below).

Tradeoffs:

Setup | One config line (or two, paired with syncInterval)
Latency | Up to one full interval (e.g. up to 24 hours for '24h')
Deletes | Yes, via tombstone detection. Rows absent from the source are purged.
Cost | Scales with lens size. Content-hash dedup means unchanged rows are cheap (no re-embed), but every run reads every source row. See your tier's smart sync quota.
Tier | Pro and above for scheduled runs (Free tier: manual button only, 5/month).

Manual trigger is always available (Console "Sync now" button or semilayer sync) and uses the same handler. Manual runs count against the same tier monthly quota as scheduled runs — the button isn't a bypass.
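The content-hash and tombstone idea can be sketched as a plain diff between the source rows and what the index already holds. This is an illustration under assumptions, not SemiLayer's implementation: `planSmartSync`, `contentHash`, and the row shape are hypothetical names, and the toy hash stands in for whatever hash the real pipeline uses.

```typescript
interface SourceRow {
  id: string;
  content: string;
}

interface SyncPlan {
  toEmbed: string[];   // new or changed rows: need re-embedding
  unchanged: string[]; // hash matches: read but not re-embedded
  toPurge: string[];   // indexed but absent from the source: tombstoned
}

// Toy 32-bit polynomial hash, used only to keep the sketch self-contained.
function contentHash(text: string): string {
  let h = 0;
  for (let i = 0; i < text.length; i++) {
    h = (h * 31 + text.charCodeAt(i)) | 0;
  }
  return (h >>> 0).toString(16);
}

// Compare a full read of the source against the hashes stored in the index.
function planSmartSync(
  source: SourceRow[],
  indexed: Record<string, string>,
): SyncPlan {
  const plan: SyncPlan = { toEmbed: [], unchanged: [], toPurge: [] };
  const seen: Record<string, boolean> = {};
  for (const row of source) {
    seen[row.id] = true;
    if (indexed[row.id] === contentHash(row.content)) {
      plan.unchanged.push(row.id);
    } else {
      plan.toEmbed.push(row.id);
    }
  }
  for (const id of Object.keys(indexed)) {
    if (!seen[id]) plan.toPurge.push(id);
  }
  return plan;
}

const indexedHashes: Record<string, string> = {
  p_1: contentHash("red mug"),
  p_2: contentHash("blue mug"),
  p_3: contentHash("green mug"),
};

const plan = planSmartSync(
  [
    { id: "p_1", content: "red mug" },    // untouched
    { id: "p_2", content: "navy mug" },   // edited
    { id: "p_4", content: "yellow mug" }, // new
  ],
  indexedHashes,
);
// plan.unchanged is ["p_1"], plan.toEmbed is ["p_2", "p_4"], plan.toPurge is ["p_3"]
```

Note that the loop visits every source row even when nothing changed, which is the "every run reads every row" cost described above. No `updated_at` column is consulted at any point.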

4. Webhook ingest — you push us the deltas

The pattern most production apps end up on. Your CDC pipeline (Postgres logical replication, triggers, Debezium, AWS DMS, a cron in your worker) posts a list of changed record ids to our webhook. We buffer, dedup, and re-index just those rows.

curl -X POST https://api.semilayer.com/v1/ingest/products \
  -H "Authorization: Bearer ik_production_..." \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "records",
    "changes": [
      { "id": "p_7712", "action": "upsert" },
      { "id": "p_0451", "action": "delete" }
    ]
  }'

Right for: exactly-as-fast-as-your-writes freshness, precise delete handling, large tables where polling would be wasteful.
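If your CDC consumer is written in TypeScript, the same request can be assembled in code. The sketch below only builds the request; sending it is left to whatever HTTP client you use. `dedupeChanges` and `buildIngestRequest` are hypothetical helpers, the assumption that the lens name is the last path segment is inferred from the curl example, and collapsing repeated ids to their latest action is an optional client-side trim (the webhook is described as deduplicating on its own).

```typescript
type ChangeAction = "upsert" | "delete";

interface RecordChange {
  id: string;
  action: ChangeAction;
}

interface IngestRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Collapse repeated ids so each record appears once, carrying its latest action.
function dedupeChanges(changes: RecordChange[]): RecordChange[] {
  const latest: Record<string, ChangeAction> = {};
  const order: string[] = [];
  for (const change of changes) {
    if (!Object.prototype.hasOwnProperty.call(latest, change.id)) {
      order.push(change.id);
    }
    latest[change.id] = change.action;
  }
  return order.map((id) => ({ id, action: latest[id] }));
}

// Build the request from the curl example. The ingest key is passed in, never hard-coded.
function buildIngestRequest(
  lens: string,
  ingestKey: string,
  changes: RecordChange[],
): IngestRequest {
  return {
    url: "https://api.semilayer.com/v1/ingest/" + encodeURIComponent(lens),
    method: "POST",
    headers: {
      Authorization: "Bearer " + ingestKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ mode: "records", changes: dedupeChanges(changes) }),
  };
}

const ingestRequest = buildIngestRequest("products", "<your ik_ key>", [
  { id: "p_7712", action: "upsert" },
  { id: "p_0451", action: "upsert" },
  { id: "p_0451", action: "delete" }, // later event wins for p_0451
]);
```

Keeping the builder free of network calls makes it easy to unit-test the payload shape before wiring it into a real pipeline.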

The Ingest section covers this in full: the API, the ik_ key details, and real CDC patterns for Postgres, MySQL, and AWS.


Scenario | Config
Small-to-medium table, no CDC | syncInterval: '5m' + smartSyncInterval: '24h' (fast incremental + nightly delete sweep)
High-write production with CDC | Records webhook + optional smartSyncInterval: '24h' as a safety net
Append-only / logs | syncInterval: '5m' alone (no deletes to worry about)
Occasional / manual refreshes | Console "Sync now" button + no interval declared

The first row is the answer most teams want and didn't know they could get declaratively. Two config lines, zero webhook code, correct deletes.
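For teams that generate lens configs programmatically, the table can be read as a small decision function. This is only a sketch: the `Workload` flags, the `recommendFreshness` name, and the order in which the rows are checked are assumptions, and the interval values are the ones from the table rather than tuned recommendations.

```typescript
// Hypothetical description of a table's write pattern.
interface Workload {
  hasCdc: boolean;     // a CDC pipeline can post changed ids to the webhook
  appendOnly: boolean; // rows are never updated in place or deleted
  manualOnly: boolean; // refreshes happen only when someone asks for one
}

interface FreshnessConfig {
  syncInterval?: string;
  smartSyncInterval?: string;
  webhook: boolean;
}

// Map a workload to the matching row of the scenario table.
function recommendFreshness(w: Workload): FreshnessConfig {
  if (w.manualOnly) {
    return { webhook: false }; // "Sync now" button, no interval declared
  }
  if (w.hasCdc) {
    return { webhook: true, smartSyncInterval: "24h" }; // webhook + safety net
  }
  if (w.appendOnly) {
    return { syncInterval: "5m", webhook: false }; // no deletes to sweep
  }
  // Small-to-medium table, no CDC: incremental refresh + nightly delete sweep.
  return { syncInterval: "5m", smartSyncInterval: "24h", webhook: false };
}

const defaultChoice = recommendFreshness({
  hasCdc: false,
  appendOnly: false,
  manualOnly: false,
});
// defaultChoice is { syncInterval: "5m", smartSyncInterval: "24h", webhook: false }
```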