# Analyze

`search` ranks by meaning; `query` returns exact rows. Analyze is different
again: a declarative aggregation primitive that turns a lens into a typed,
drillable, optionally live-updating dashboard. Each analysis is declared under
`analyses.<name>` and granted under `grants.analyze.<name>`.
One config block becomes:

- A typed Beam method — `beam.<lens>.analyze.<name>(input)` returns tidy buckets ready for any chart library
- A drill-down — `beam.<lens>.analyze.<name>.rows(bucketKey)` returns the contributing rows through the same RBAC + mapping pipeline `query` uses
- A streaming export — NDJSON or RFC-4180 CSV, tier-capped, no in-memory buffering
- A WebSocket subscription — `analyze.snapshot` once, then `analyze.diff` frames as buckets change in place
- A debug endpoint — `.explain(input)` returns plan, bridges, sketches, expected cost (sk-only)
- A React hook — `useAnalyze(handle, { liveUpdates, swr, cache })`
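As a sketch of the config shape, assuming a lens named `products`; every field name beyond `analyses.<name>` and `grants.analyze.<name>` is illustrative, not the exact `AnalyzeFacetConfig` schema:

```ts
// semilayer.config.ts: a sketch only; `kind`, `dim`, `measures`, and the
// grant-rule keys below are assumptions for illustration.
export default {
  lenses: {
    products: {
      analyses: {
        byCategory: {
          kind: 'metric',                   // default engine: dim × measure
          dim: 'category',                  // raw-field dimension
          measures: ['count', { avg: 'price' }],
        },
      },
      grants: {
        analyze: {
          byCategory: { pk: true },         // allow publishable keys
        },
      },
    },
  },
};
```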
See it live at demo.semilayer.com/analyze — four analyses on one lens (by category, top brands, price distribution, inventory by category), rendered through `@semilayer/charts` with click-to-drill, search-inside-bucket, sort, and streaming export.
## Examples
Four named analyses on one lens, four chart shapes — all from the same tidy
`AnalyzeResult` envelope. Every bucket carries a signed `bucketKey`; clicking
any segment drills to the contributing rows through the same RBAC + mapping
pipeline `query` uses.
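A minimal call sketch, assuming a `products` lens with a `byCategory` analysis (both names illustrative), a `buckets` array on the envelope, and an assumed import path:

```ts
import { BeamClient } from '@semilayer/beam'; // package path is an assumption

const beam = new BeamClient({ key: process.env.SEMILAYER_KEY! });

// Run the analysis: tidy buckets, one per dimension value.
const result = await beam.products.analyze.byCategory({});

// Drill into the first bucket: same RBAC + mapping pipeline as `query`.
const rows = await beam.products.analyze.byCategory.rows(
  result.buckets[0].bucketKey,
);
```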
The charts above are rendered with `@semilayer/charts` — a
zero-framework-dep SVG renderer that ships 14 shapes (line, area, stackedArea,
bar, stackedBar, scatter, pie, donut, heatmap, geo, funnel, cohort, treemap,
radar, plus a sortable table). For React apps, `@semilayer/react-charts`
wraps the same engine with an `<AnalyzeChart>` component and a `useAnalyzeChart`
hook that handles live updates, animations, and exports.
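A sketch of the React wiring; `useAnalyze` and its options come from this page, but the import paths, the handle string, and the `<AnalyzeChart>` props are illustrative guesses:

```tsx
import { AnalyzeChart } from '@semilayer/react-charts';
import { useAnalyze } from '@semilayer/react'; // import path is an assumption

export function CategoryDonut() {
  // Live-updating handle: snapshot once, then diff frames over the WS.
  const handle = useAnalyze('products.byCategory', { liveUpdates: true });
  return <AnalyzeChart handle={handle} shape="donut" />;
}
```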
## Three kinds, one envelope
A single lens can host as many named analyses as you want. The `kind`
discriminator picks the engine:

- `metric` — the default. Dim × measure tidy data. Powers every line, bar, pie, treemap, and heatmap chart. Dimensions can be raw fields, bucketed numerics, time buckets (`minute|hour|day|week|month|quarter|year`), geohash / H3 cells, or — uniquely — `semantic` clusters that bucket rows by k-means over their stored embeddings.
- `funnel` — ordered step counts + dropoff. Each step has a name and a predicate; the engine walks rows once, tracks per-entity earliest match per step inside a configurable window, and emits a per-step bucket with `count`, `dropoff`, and `conversionRate` measures.
- `cohort` — entities × intervals matrix. Each entity is assigned a cohort by their first-event time; every interval cell counts how many reappear. Powers retention curves and heatmaps.
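A sketch of how the `kind` discriminator might look in config; the `dim`, `steps`, `window`, and `interval` field names are assumptions, not the published schema:

```ts
// Illustrative only: exact AnalyzeFacetConfig field names may differ.
const analyses = {
  revenueByMonth: {
    kind: 'metric',
    dim: { field: 'createdAt', bucket: 'month' }, // time bucket
    measures: [{ sum: 'amount' }],
  },
  checkoutFunnel: {
    kind: 'funnel',
    window: '7d',                                 // per-entity match window
    steps: [
      { name: 'viewed',    where: { event: 'view' } },
      { name: 'carted',    where: { event: 'add_to_cart' } },
      { name: 'purchased', where: { event: 'purchase' } },
    ],
  },
  weeklyRetention: {
    kind: 'cohort',
    interval: 'week',  // cohort assignment + reappearance grid
  },
};
```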
All three kinds emit the same `AnalyzeResult<D, M>` envelope, so downstream
consumers (charts, drill-down, export) are kind-agnostic. Codegen specializes
the type per analysis.
## The building blocks
Every field is typed in `@semilayer/core`'s `AnalyzeFacetConfig`. The Beam
codegen reads the spec and emits per-analysis methods with precise dim ×
measure types — `count` / `sum` / `avg` come back as `number`, `top_k` as
`Array<{ key: string; count: number }>`, time buckets as `string`.
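As a mental model of the envelope, a hand-written sketch; the generated type is specialized per analysis, and field names beyond `bucketKey` are assumptions:

```ts
// A sketch of the shared envelope, not the generated code.
interface AnalyzeBucket<D, M> {
  dim: D;             // e.g. 'electronics', '2024-06', or a cluster label
  measures: M;        // e.g. { count: number; avg_price: number }
  bucketKey: string;  // signed key; round-trips to .rows(bucketKey)
}

interface AnalyzeResult<D, M> {
  kind: 'metric' | 'funnel' | 'cohort';
  buckets: Array<AnalyzeBucket<D, M>>;
}
```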
## How execution adapts to the bridge
Every published bridge advertises an `aggregate()` method, but not every
bridge has the same engine support. The planner automatically picks one of
three execution strategies per analyze run:
| Strategy | When it runs | What you pay |
|---|---|---|
| `pushdown` | The bridge's engine handles the GROUP BY (Postgres, ClickHouse, MySQL, Mongo, …) | One round-trip; native engine cost |
| `streaming` | The bridge can't aggregate (Redis, DynamoDB, Firestore, …) — the SDK helper or service-side reducer pages rows and reduces in-process | Memory bounded by O(buckets × measure-state); scales to billions of rows on a constant footprint |
| `hybrid` | Two lenses joined through a declared `relations` entry — parent + child both fetched, joined service-side | Bounded by per-tier `parentSetMax` + `candidatesMax` caps |
The customer never sees the strategy unless they call `.explain()` (sk-only);
the result is the same shape either way.
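A sketch of inspecting a plan with a secret key; the docs name plan, bridges, sketches, and expected cost as the contents, but the exact property names here are assumptions:

```ts
// sk_-only: .explain requires a secret key.
const report = await beam.products.analyze.byCategory.explain({});

// Property names are illustrative; the payload covers plan, bridges,
// sketches, and expected cost per the docs above.
console.log(report.plan, report.bridges, report.expectedCost);
```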
## Drill-down, exports, live tail
Three primitives that compose with every analysis:
- Drill-down — every bucket carries a signed `bucketKey` that round-trips to `analyze.<name>.rows(bucketKey)`. Drill rows go through the same RBAC + mapping pipeline as `query`. Inside the bucket, search (auto / simple / semantic / hybrid), sort, and cursor pagination work exactly like `query` — drill is "query, but already filtered to the bucket's predicate."
- Exports — full-set drill via streaming NDJSON or CSV. `BeamClient.exportRows()` returns an `AsyncIterable<RowChunk>`; `useExportRows` in React wraps it with `start` / `cancel` / `progress` / `truncated` state (sketched after this list). Tier-capped (Free 10k → Scale 10M → Enterprise unlimited); the `X-SemiLayer-Export-Truncated` trailer fires when the cap halts the stream before the cursor drains.
- Live tail — `useAnalyze(handle, { liveUpdates: true })` opens the same WS as feeds. The server emits `analyze.snapshot` once, then debounced `analyze.diff` frames carrying `added` / `changed` / `removed` bucket lists and new totals. Dashboards update in place; the engine's recompute-and-diff strategy keeps the contract simple at the cost of one fresh aggregate per debounce window.
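A consumption sketch for the export stream, reusing the `beam` client from the earlier example; the option names and the `RowChunk` field are assumptions:

```ts
// Stream a full-bucket drill as NDJSON without buffering it in memory.
const result = await beam.products.analyze.byCategory({});
const chunks = beam.exportRows('products.byCategory', {
  bucketKey: result.buckets[0].bucketKey,
  format: 'ndjson', // option name is illustrative; CSV also supported
});

for await (const chunk of chunks) {
  process.stdout.write(chunk.data); // RowChunk field name is an assumption
}
// If a tier cap halts the stream early, the server signals it via the
// X-SemiLayer-Export-Truncated trailer.
```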
## Vector-aware aggregation — the unique edge
`candidates.similarTo` narrows the candidate pool by cosine similarity
before the GROUP BY runs. The dimension can be an ordinary field, or — the
move BI tools without vectors can't make — `bucket: { type: 'semantic', clusters: N }`,
which clusters rows by their stored embeddings via in-process k-means.
Drill-down follows: clicking a cluster bucket fetches the rows that fell into
it, narrowed by the same similarity predicate.
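A call sketch combining both pieces, with a hypothetical `bySemanticCluster` analysis; where exactly `candidates.similarTo` sits in the request body is an assumption:

```ts
// Gate the candidate pool by cosine similarity, then cluster the survivors
// by their stored embeddings (dimension declared as a semantic bucket).
const result = await beam.products.analyze.bySemanticCluster({
  candidates: {
    similarTo: 'lightweight waterproof hiking gear',
  },
});

// Drilling a cluster bucket keeps the same similarity predicate applied.
const rows = await beam.products.analyze.bySemanticCluster.rows(
  result.buckets[0].bucketKey,
);
```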
## Access rules
Access is granted per named analysis, with an entry under `grants.analyze`.
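A sketch of the shape; the rule keys (`pk`) are assumptions, not the documented grant syntax:

```ts
// Illustrative only: the real grant-rule syntax may differ.
const grants = {
  analyze: {
    byCategory: { pk: true },   // allow publishable keys
    topBrands:  { pk: false },  // sk_ keys only (matches the default deny)
    // no entry for priceDistribution → denied for pk_, sk_ bypasses
  },
};
```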
Analyses without an explicit grant default to deny for `pk_` keys; `sk_`
keys bypass. Same posture as feeds. Drill-down inherits the parent
analysis's access rule — there is no separate `grants.analyze.<name>.rows`.
## Tier-aware safety rails
Four per-call ceilings keep one analyze from monopolizing the platform:
| Cap | Free | Pro | Team | Scale | Enterprise |
|---|---|---|---|---|---|
| Candidate pool / scan budget | 100k | 1M | 10M | 100M | unlimited |
| Cross-source parent set | 50k | 250k | 1M | 5M | unlimited |
| Exact reducer values (per call) | 1M | 5M | 25M | 100M | unlimited |
| Export rows (per call) | 10k | 100k | 1M | 10M | unlimited |
Every successful response carries `X-SemiLayer-Candidates-Clamped` so callers
can see which ceiling they ran under. Hitting a cap returns `413` with a
structured error pointing at the upgrade path or a tighter
`candidates.where`. See Tier limits — Phase O for the full
matrix once shipped.
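A defensive sketch for cap handling; the error's `status` and `code` fields are assumptions about how the client surfaces the structured 413:

```ts
try {
  const result = await beam.products.analyze.byCategory({
    candidates: { where: { inStock: true } }, // tighter pool, illustrative
  });
  console.log(result);
} catch (err: any) {
  if (err?.status === 413) {
    // Cap hit: the structured error points at the upgrade path
    // or a tighter candidates.where.
    console.error(err.code, err.message);
  } else {
    throw err;
  }
}
```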
## Where to start
- Drill-down — `bucketKey` round-trip, search (auto / simple / semantic / hybrid), sort, cursor pagination.
- Exports — streaming NDJSON / CSV, the truncation trailer, the `useExportRows` hook.
- Cursors & streaming — how SemiLayer paginates across `query`, `analyze.rows`, and the streaming exports, and when to reach for each.
- `semilayer analyze` CLI — list / run / rows / explain / plan from the terminal, NDJSON-tailable.
- REST API → Analyze — every route, every body, every error code.