SemiLayer Docs

Feeds — Signals

Feeds rank rows using two kinds of signals:

  • Engagement — rows in another lens you own that reference the candidate (likes, views, clicks).
  • Context — values your app passes in at call time (user preferences, current topic, seed record).

Both funnel into the scorers (engagement, similarity). Both have one non-negotiable property: the data stays on your side. SemiLayer reads your engagement lens through a bridge it already has access to, and accepts context as opaque JSON. We never aggregate engagement across customers.

Engagement — via a sibling lens

The usual shape: your product has a recipes table and a recipe_likes table. Declare both as lenses, declare a relation between them, and point the engagement scorer at the sibling lens.

// Sibling lens — one row per like
recipe_likes: {
  source: 'main',
  table: 'recipe_likes',
  fields: {
    id:         { type: 'number', primaryKey: true },
    recipe_id:  { type: 'number' },
    user_id:    { type: 'number' },
    created_at: { type: 'date' },
  },
  grants: { query: 'staff' },
},

// Feed lens — declares the relation once, uses it in the scorer
recipes: {
  source: 'main',
  table: 'recipes',
  fields: { /* ... */ },
  relations: {
    likes: {
      lens: 'recipe_likes',
      kind: 'hasMany',
      on: { id: 'recipe_id' },
    },
  },
  feeds: {
    discover: {
      candidates: { from: 'embeddings', limit: 500 },
      rank: {
        similarity: { weight: 0.6, against: 'liked_titles' },
        engagement: {
          weight: 0.3,
          lens: 'recipe_likes',
          relation: 'likes',       // ← join derived from relations.likes.on
          aggregate: 'recent_count',
          window: '24h',
          decay: 'log',
        },
        recency: { weight: 0.1, halfLife: '7d' },
      },
    },
  },
  grants: { feed: { discover: 'public' } },
}
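To make the engagement settings above more concrete, here is a minimal sketch of how a windowed count, a log damping step, and the three weights could fit together. The formula is an assumption made for illustration: this page does not document how SemiLayer actually computes or combines scores, and the names `recentCount`, `logDamp`, and `combine` are hypothetical.

```typescript
// Illustrative only: not SemiLayer's documented scoring formula.
interface Like {
  recipe_id: number;
  created_at: Date;
}

const WINDOW_MS = 24 * 60 * 60 * 1000; // mirrors window: '24h'

// aggregate: 'recent_count' -> likes for one recipe that fall inside the window.
function recentCount(likes: Like[], recipeId: number, now: Date): number {
  const cutoff = now.getTime() - WINDOW_MS;
  return likes.filter(
    (like) => like.recipe_id === recipeId && like.created_at.getTime() >= cutoff,
  ).length;
}

// decay: 'log' -> assumed here to mean log damping, so that 1,000 likes
// do not count 1,000 times as much as a single like.
function logDamp(count: number): number {
  return Math.log1p(count);
}

// Weights copied from the discover feed config; a weighted sum is assumed.
const weights = { similarity: 0.6, engagement: 0.3, recency: 0.1 };

function combine(similarity: number, engagement: number, recency: number): number {
  return (
    weights.similarity * similarity +
    weights.engagement * engagement +
    weights.recency * recency
  );
}
```

Under these assumptions, a candidate's engagement input would be `logDamp(recentCount(likes, recipe.id, new Date()))`, normalized however the ranker sees fit before the weights are applied.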

Why through a lens, not a raw table

Because a lens is already governed. It has:

  • Access rules (enforced for pk_ callers)
  • A bridge (airgap-aware, runner-routable)
  • A known schema the config validates against

Nothing escapes the system. There's no "and also give the feed ranker raw table access" path to audit.

Engagement config — explicit vs relation-derived

// Preferred — relation is declared once, used everywhere
engagement: {
  lens: 'recipe_likes',
  relation: 'likes',
  aggregate: 'recent_count',
  window: '24h',
}

// Fallback — explicit join, useful when the relation isn't declared on this lens
engagement: {
  lens: 'recipe_likes',
  join: { local: 'id', foreign: 'recipe_id' },
  aggregate: 'recent_count',
  window: '24h',
}
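The two forms describe the same join. As a rough sketch of the equivalence, a relation's `on` map can be read as a `{ local, foreign }` pair. The helper name `joinFromRelation` is hypothetical and only illustrates the mapping; it is not part of the SemiLayer API, and it assumes a single-column relation.

```typescript
// Shape of relations.<name>.on, e.g. { id: 'recipe_id' }
type RelationOn = Record<string, string>;

// Shape of the explicit engagement.join form
interface Join {
  local: string;
  foreign: string;
}

// Hypothetical helper: turn a single-column relation into an explicit join.
function joinFromRelation(on: RelationOn): Join {
  const entries = Object.entries(on);
  if (entries.length !== 1) {
    throw new Error("this sketch only handles single-column relations");
  }
  const [local, foreign] = entries[0];
  return { local, foreign };
}

// relations.likes.on from the recipes lens
const likesJoin = joinFromRelation({ id: "recipe_id" });
```

Here `likesJoin` carries the same `local` and `foreign` values as the explicit fallback config, which is why declaring the relation once is enough.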

Context — what your app passes in

Context is an opaque Record<string, unknown> passed per-request. Three canonical shapes, all valid for similarity.against:

1. Pre-computed vector

await beam.recipes.feed.discover({
  context: {
    user_profile_vec: [0.12, -0.34, /* ... */],   // computed on your side
  },
})

against: 'user_profile_vec'. Server treats any number[] of the right dimension as a ready-to-compare vector. Zero embedding API call. Best for personalization where you already run your own profile pipeline.

2. Embedded text

await beam.recipes.feed.discover({
  context: {
    liked_titles: ['Tom Yum Goong', 'Laksa', 'Pad Thai'],
  },
})

against: 'liked_titles'. Server joins the array with newlines, embeds it on demand, and caches the result per-pod. While the cache entry lives, the same text on the same pod costs one embedding API call no matter how many requests send it.
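The join-then-cache behavior can be pictured with a short sketch. It follows what this page states (newline join, a key of sha256(text), a lifetime of up to 5 minutes), but the code itself is an assumption about shape, not SemiLayer's implementation. `embedContext`, the `Embed` callback, and the in-memory `Map` are hypothetical stand-ins.

```typescript
import { createHash } from "node:crypto";

// Stand-in for whatever embedding provider call the server makes.
type Embed = (text: string) => Promise<number[]>;

const TTL_MS = 5 * 60 * 1000; // "up to 5 minutes"; configurable per the docs

// Per-process cache, standing in for the per-pod cache.
const cache = new Map<string, { vector: Promise<number[]>; expiresAt: number }>();

function embedContext(
  values: string[],
  embed: Embed,
  now: number = Date.now(),
): Promise<number[]> {
  const text = values.join("\n");
  const key = createHash("sha256").update(text).digest("hex");

  const hit = cache.get(key);
  if (hit && hit.expiresAt > now) return hit.vector; // no API call

  // Cache the promise itself so concurrent requests share one call.
  const vector = embed(text);
  cache.set(key, { vector, expiresAt: now + TTL_MS });
  // Do not keep failed calls around; the caller still sees the rejection.
  vector.catch(() => cache.delete(key));
  return vector;
}
```

Because the key is derived from the joined text, the order of titles matters in this sketch: the same titles in a different order would hash differently and trigger a new embedding call.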

3. Row id → stored vector

await beam.recipes.feed.relatedTo({
  context: {
    seedRecordId: 'r_104',     // a mapped field name or sourceRowId
  },
})

against: { from: 'context.seedRecordId', mode: 'recordVector' }. Server looks up the row's stored vector and uses it directly. Zero embedding API call, ever. Powers "more like this." See Related items.

The 600-likes problem

If a user has liked 600 things and you pass all 600 in liked_titles, the embedding cost balloons. Three scaling patterns:

  1. Compress on your side — pre-compute an average vector from the user's last N likes and pass it in as a number[]. No server embedding cost, no token limits.
  2. Slice to recency — pass only the last 20 liked titles. Recent likes tend to be the better signal anyway; older likes drift away from current taste.
  3. Use recordVector — pick one seed record per request (the "most recently liked") and use mode: 'recordVector'. Zero API cost.

Most production feeds use (1). Clients that can't compute vectors use (2). Only "show me things like this" uses (3).
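Pattern (1) amounts to averaging vectors on your side. A minimal sketch, assuming you already hold one embedding per liked item, ordered oldest to newest; `meanVector` and `profileFromRecentLikes` are hypothetical helper names, and a plain unweighted mean is only one possible way to compress a profile.

```typescript
// Element-wise mean of equally sized vectors.
function meanVector(vectors: number[][]): number[] {
  if (vectors.length === 0) throw new Error("need at least one vector");
  const dim = vectors[0].length;
  const sum = new Array<number>(dim).fill(0);
  for (const v of vectors) {
    if (v.length !== dim) throw new Error("dimension mismatch");
    for (let i = 0; i < dim; i++) sum[i] += v[i];
  }
  return sum.map((x) => x / vectors.length);
}

// Average only the last n likes; likeVectors is assumed oldest-to-newest.
function profileFromRecentLikes(likeVectors: number[][], n: number): number[] {
  if (n <= 0) throw new Error("n must be positive");
  return meanVector(likeVectors.slice(-n));
}
```

The result is a single number[] that can be passed as context (for example as user_profile_vec), so the request size stays constant whether the user has 6 likes or 600.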

What SemiLayer never does

  • Never aggregates engagement across customers. Your recipe_likes lens is yours; the feed ranker reads it through your bridge with your access rules.
  • Never persists your context. Context is per-request. The only thing retained is the embedded form of text context, cached under sha256(text) for up to 5 minutes (configurable) and then dropped.

Your data, your ranking signals. SemiLayer is the coordinator.