Feeds — Signals
Feeds rank rows using two kinds of signals:
- Engagement — rows in another lens you own that reference the candidate (likes, views, clicks).
- Context — values your app passes in at call time (user preferences, current topic, seed record).
Both funnel into the scorers (engagement, similarity). Both have one
non-negotiable property: the data stays on your side. SemiLayer reads
your engagement lens through a bridge it already has access to, and accepts
context as opaque JSON. We never aggregate engagement across customers.
Engagement — via a sibling lens
The usual shape: your product has a recipes table and a recipe_likes
table. Declare both as lenses, declare a relation between them, and point the
engagement scorer at the sibling lens.
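A sketch of that wiring, under the assumption that feed config is declared as a plain object. The field names (`scorers`, `events`, the relation string) are hypothetical; consult the actual config schema for real names:

```typescript
// Hypothetical feed config — shape and field names are assumptions.
const feedConfig = {
  lens: "recipes", // candidate rows come from here
  scorers: {
    engagement: {
      lens: "recipe_likes",                             // sibling lens holding the signal rows
      relation: "recipe_likes.recipe_id -> recipes.id", // the declared relation
      events: ["like"],                                 // which rows count as engagement
    },
  },
};
```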
Why through a lens, not a raw table
Because a lens is already governed. It has:
- Access rules (enforced for pk_callers)
- A bridge (airgap-aware, runner-routable)
- A known schema the config validates against
Nothing escapes the system. There's no "and also give the feed ranker raw table access" path to audit.
Engagement config — explicit vs relation-derived
Context — what your app passes in
Context is an opaque Record<string, unknown> passed per-request. Three
canonical shapes, all valid for similarity.against:
1. Pre-computed vector
against: 'user_profile_vec'. Server treats any number[] of the right
dimension as a ready-to-compare vector. Zero embedding API call. Best for
personalization where you already run your own profile pipeline.
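Shape 1 in context form. The 1536-dimension size below is an assumption for illustration; any `number[]` matching your embedding dimension works:

```typescript
// Shape 1: a pre-computed profile vector, passed as-is.
// Assumption: the embedding space is 1536-dim — use your own dimension.
const context = {
  user_profile_vec: Array.from({ length: 1536 }, () => 0),
};
// The similarity scorer would then point at it:
// { against: 'user_profile_vec' }
```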
2. Embedded text
against: 'liked_titles'. Server joins the array with newlines, embeds it
on demand, and caches the result per-pod. Same pod + same text = one API
call regardless of how many requests.
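Shape 2 in context form, with the join the server performs made explicit. Titles are made up; the newline join is taken from the text above:

```typescript
// Shape 2: raw text the server embeds on demand.
const likedTitles = ["Weeknight pad thai", "One-pot mushroom risotto"];
const context = { liked_titles: likedTitles };

// The server joins with newlines before embedding, so the same titles
// in the same order produce the same cache key on the same pod.
const joined = likedTitles.join("\n");
```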
3. Row id → stored vector
against: { from: 'context.seedRecordId', mode: 'recordVector' }. Server
looks up the row's stored vector and uses it directly. Zero
embedding API call, ever. Powers "more like this." See
Related items.
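Shape 3 in context form, using the `against` object from the text; the record id is made up:

```typescript
// Shape 3: pass a row id; the server reuses that row's stored vector.
const context = { seedRecordId: "recipe_123" };

const similarity = {
  against: { from: "context.seedRecordId", mode: "recordVector" as const },
};
```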
The 600-likes problem
If a user has liked 600 things and you pass all 600 in liked_titles, the
embedding cost balloons. Three scaling patterns:
- Compress on your side — pre-compute an average vector from the user's last N likes and pass it in as a number[]. No server embedding cost, no token limits.
- Slice to recency — pass only the last 20 liked titles. Recent likes tend to be the better signal; older likes drift.
- Use recordVector — pick one seed record per request (e.g. the most recently liked) and use mode: 'recordVector'. Zero API cost.
Most production feeds use (1). Clients that can't compute vectors use (2). Only "show me things like this" uses (3).
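Pattern (1), compressing on your side, is just an element-wise mean over the liked-item vectors. A minimal sketch, using toy 3-dimensional vectors (real ones would match your embedding dimension):

```typescript
// Compress N liked-item vectors into one mean vector, client-side.
// The result is passed to the feed as a plain number[] in context.
function meanVector(vectors: number[][]): number[] {
  if (vectors.length === 0) throw new Error("need at least one vector");
  const dim = vectors[0].length;
  const sum = new Array<number>(dim).fill(0);
  for (const v of vectors) {
    if (v.length !== dim) throw new Error("dimension mismatch");
    for (let i = 0; i < dim; i++) sum[i] += v[i];
  }
  return sum.map((s) => s / vectors.length);
}

// Toy example: two orthogonal 3-dim "like" vectors.
const likedVecs = [
  [1, 0, 0],
  [0, 1, 0],
];
const profileVec = meanVector(likedVecs); // → [0.5, 0.5, 0]
```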
What SemiLayer never does
- Never aggregates engagement across customers. Your recipe_likes lens is yours; the feed ranker reads it through your bridge with your access rules.
- Never persists your context. Context is per-request; we cache the embedded form keyed by sha256(text) for up to 5 minutes (configurable), then drop it.
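The caching behavior described above — key by sha256(text), hold for a TTL, then drop — can be sketched as follows. The Map-based store and helper names are assumptions for illustration; only the sha256 key and 5-minute default come from the text:

```typescript
import { createHash } from "node:crypto";

// Sketch of a per-pod embedding cache keyed by sha256(text).
const TTL_MS = 5 * 60 * 1000; // 5-minute default, per the docs (configurable)
const cache = new Map<string, { vec: number[]; expires: number }>();

function cacheKey(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

function putCached(text: string, vec: number[], now: number): void {
  cache.set(cacheKey(text), { vec, expires: now + TTL_MS });
}

function getCached(text: string, now: number): number[] | undefined {
  const entry = cache.get(cacheKey(text));
  if (!entry || entry.expires <= now) return undefined; // expired or missing
  return entry.vec;
}
```

Same pod, same text, one embedding call within the TTL; nothing survives past it.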
Your data, your ranking signals. SemiLayer is the coordinator.