SemiLayerDocs

Analyze

search ranks by meaning. query returns exact rows. Analyze is different again: a declarative aggregation primitive that turns a lens into a typed, drillable, optionally live-updating dashboard. Each analysis is declared under analyses.<name> and granted under grants.analyze.<name>.

One config block becomes:

  • A typed Beam method — beam.<lens>.analyze.<name>(input) returns tidy buckets ready for any chart library
  • A drill-down — beam.<lens>.analyze.<name>.rows(bucketKey) returns the contributing rows through the same RBAC + mapping pipeline query uses
  • A streaming export — NDJSON or RFC-4180 CSV, tier-capped, no in-memory buffering
  • A WebSocket subscription — analyze.snapshot once, then analyze.diff frames as buckets change in place
  • A debug endpoint — .explain(input) returns plan, bridges, sketches, expected cost (sk-only)
  • A React hook — useAnalyze(handle, { liveUpdates, swr, cache })

analyses: {
  byCategory: {
    candidates: { where: { in_stock: true } },
    dimensions: [{ field: 'category' }],
    measures: {
      total:    { agg: 'count' },
      avgPrice: { agg: 'avg', column: 'price' },
    },
    sort: [{ measure: 'total', dir: 'desc' }],
    limit: 10,
  },
},
grants: { analyze: { byCategory: 'public' } },

const result = await beam.products.analyze.byCategory({})
// result.buckets[0].dims.category   ← string
// result.buckets[0].measures.total  ← number
// result.buckets[0].bucketKey       ← signed token, round-trips to .rows()
// result.meta.strategy              ← 'pushdown' | 'streaming' | 'hybrid'
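Because the envelope is tidy, feeding it to a chart is a one-line map. A minimal sketch, assuming the bucket shape shown in the comments above (this is not the published `AnalyzeResult` type from @semilayer/core):

```typescript
// Assumed bucket shape — inferred from the result comments above, illustrative only.
interface Bucket {
  dims: Record<string, string>;
  measures: Record<string, number>;
  bucketKey: string; // signed token, round-trips to .rows()
}

// Flatten an analysis result into { label, value } points for a bar/pie chart.
function toChartPoints(
  buckets: Bucket[],
  dim: string,
  measure: string,
): Array<{ label: string; value: number }> {
  return buckets.map((b) => ({ label: b.dims[dim], value: b.measures[measure] }));
}

const points = toChartPoints(
  [
    { dims: { category: 'shoes' }, measures: { total: 42, avgPrice: 79.5 }, bucketKey: 'bk1' },
    { dims: { category: 'hats' }, measures: { total: 17, avgPrice: 24.0 }, bucketKey: 'bk2' },
  ],
  'category',
  'total',
);
// points[0] → { label: 'shoes', value: 42 }
```

Keeping `bucketKey` alongside each point is what makes click-to-drill a one-liner later.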
ℹ️ See it live at demo.semilayer.com/analyze — four analyses on one lens (by category, top brands, price distribution, inventory by category), rendered through @semilayer/charts with click-to-drill, search-inside-bucket, sort, and streaming export.

Examples

Four named analyses on one lens, four chart shapes — all from the same tidy AnalyzeResult envelope. Every bucket carries a signed bucketKey; clicking any segment drills to the contributing rows through the same RBAC + mapping pipeline query uses.

The charts above are rendered with @semilayer/charts — a zero-framework-dep SVG renderer that ships 14 shapes (line, area, stackedArea, bar, stackedBar, scatter, pie, donut, heatmap, geo, funnel, cohort, treemap, radar, plus a sortable table). For React apps, @semilayer/react-charts wraps the same engine with a <AnalyzeChart> component and a useAnalyzeChart hook that handles live-updates, animations, and exports.

Three kinds, one envelope

A single lens can host as many named analyses as you want. The kind discriminator picks the engine:

  1. metric — the default. Dim × measure tidy data. Powers every line, bar, pie, treemap, and heatmap chart. Dimensions can be raw fields, bucketed numerics, time buckets (minute|hour|day|week|month|quarter|year), geohash / H3 cells, or — uniquely — semantic clusters that bucket rows by k-means over their stored embeddings.
  2. funnel — ordered step counts + dropoff. Each step has a name and a predicate; the engine walks rows once, tracks per-entity earliest match per step inside a configurable window, and emits a per-step bucket with count, dropoff, conversionRate measures.
  3. cohort — entities × intervals matrix. Each entity is assigned a cohort by their first-event time; every interval cell counts how many reappear. Powers retention curves and heatmaps.

All three kinds emit the same AnalyzeResult<D, M> envelope, so downstream consumers (charts, drill-down, export) are kind-agnostic. Codegen specializes the type per analysis.
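The funnel measures in (2) reduce to simple arithmetic over per-step entity counts. A hedged sketch of that derivation — the field names here are assumptions for illustration, not the engine's schema:

```typescript
// Illustrative only: derives the per-step funnel measures described above
// (count, dropoff, conversionRate) from raw per-step entity counts.
interface FunnelStepBucket {
  step: string;
  count: number;
  dropoff: number;        // entities lost since the previous step
  conversionRate: number; // fraction of the first step's entities remaining
}

function funnelMeasures(steps: Array<{ name: string; count: number }>): FunnelStepBucket[] {
  const entry = steps[0]?.count ?? 0;
  return steps.map((s, i) => ({
    step: s.name,
    count: s.count,
    dropoff: i === 0 ? 0 : steps[i - 1].count - s.count,
    conversionRate: entry === 0 ? 0 : s.count / entry,
  }));
}

const f = funnelMeasures([
  { name: 'viewed', count: 1000 },
  { name: 'carted', count: 400 },
  { name: 'purchased', count: 120 },
]);
// f[1] → { step: 'carted', count: 400, dropoff: 600, conversionRate: 0.4 }
```

The output is already in the shared bucket-list shape, which is why a funnel chart and a bar chart can consume the same envelope.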

The building blocks

analyses: {
  <name>: {
    kind?: 'metric' | 'funnel' | 'cohort',  // default 'metric'
    candidates: {
      where?: Record<string, unknown>,      // pre-filter (full operator grammar)
      similarTo?: { from, mode?, threshold? }, // narrow by vector — the unique edge
      sample?: number,                      // 0 < x ≤ 1, server-side random sample
      relatedThrough?: string,              // pull candidates through a relation
    },
    dimensions: [{ field, bucket?, order?, as? }],   // metric only
    measures: {                                       // metric only
      <alias>: {
        agg: 'count' | 'count_distinct' | 'sum' | 'avg' |
             'min' | 'max' | 'percentile' | 'top_k' |
             'first' | 'last' | 'rate',
        column?, p?, k?, accuracy?: 'fast' | 'exact',
        where?,                            // measure-only filter
      }
    },
    having?: Record<string, unknown>,      // post-aggregate filter
    sort?: [{ measure?, dimension?, dir }],
    limit?: number,
    precompute?: boolean | { onlyAdditive?, refreshInterval? },
    evolve?: {
      onSubscribe?: 'replay' | 'fresh',
      pollOnIngest: boolean,
      minNotifyInterval?: '1s' | '5s' | '30s' | '1m',
    },
  },
}

Every field is typed in @semilayer/core's AnalyzeFacetConfig. The Beam codegen reads the spec and emits per-analysis methods with precise dim × measure types — count / sum / avg come back as number, top_k as Array<{ key: string; count: number }>, time buckets as string.
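To make that concrete, here is a hand-written approximation of what codegen might emit for the byCategory analysis from the top of this page — an assumed shape for illustration, not actual generated output:

```typescript
// Hand-written approximation of a codegen'd bucket type — illustrative only.
type ByCategoryBucket = {
  dims: { category: string };   // raw field dimension → string
  measures: {
    total: number;              // agg: 'count' → number
    avgPrice: number;           // agg: 'avg'   → number
  };
  bucketKey: string;            // signed drill token
};

// A top_k measure would instead come back as Array<{ key: string; count: number }>:
type TopK = Array<{ key: string; count: number }>;

const sample: ByCategoryBucket = {
  dims: { category: 'shoes' },
  measures: { total: 42, avgPrice: 79.5 },
  bucketKey: 'bk_demo',
};
```

The payoff is that a typo like `sample.measures.totla` fails at compile time rather than rendering an empty chart.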

How execution adapts to the bridge

Every published bridge advertises an aggregate() method, but not every bridge has the same engine support. The planner picks one of three execution strategies per analyze run, automatically:

Strategy  | When it runs | What you pay
pushdown  | The bridge's engine handles the GROUP BY (Postgres, ClickHouse, MySQL, Mongo, …) | One round-trip; native engine cost
streaming | The bridge can't aggregate (Redis, DynamoDB, Firestore, …); the SDK helper or service-side reducer pages rows and reduces in-process | Memory bounded by O(buckets × measure-state); scales to billions on a constant footprint
hybrid    | Two lenses joined through a declared relations entry; parent + child both fetched, joined service-side | Bounded by per-tier parentSetMax + candidatesMax caps

The customer never sees the strategy unless they call .explain() (sk-only); the result is the same shape either way.
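The streaming strategy can be pictured as a single pass that keeps only per-bucket aggregate state. A minimal sketch, assuming one count measure and one avg measure — the real reducer lives in the SDK helper / service and is not shown here:

```typescript
// Illustrative single-pass reducer for the streaming strategy: memory is
// O(buckets × measure-state), never O(rows). Names are assumptions.
type Row = Record<string, unknown>;

interface BucketState {
  count: number;
  sumPrice: number; // running state for an avg measure over `price`
}

function streamingReduce(pages: Iterable<Row[]>, dim: string): Map<string, BucketState> {
  const state = new Map<string, BucketState>();
  for (const page of pages) {      // the bridge pages rows out of the store...
    for (const row of page) {      // ...and the reducer folds each row in-process
      const key = String(row[dim]);
      const s = state.get(key) ?? { count: 0, sumPrice: 0 };
      s.count += 1;
      s.sumPrice += Number(row.price ?? 0);
      state.set(key, s);
    }
  }
  return state; // avg is finalized as sumPrice / count when emitting buckets
}

const buckets = streamingReduce(
  [
    [{ category: 'shoes', price: 80 }, { category: 'hats', price: 20 }],
    [{ category: 'shoes', price: 60 }],
  ],
  'category',
);
// buckets.get('shoes') → { count: 2, sumPrice: 140 }
```

Because the state per bucket is a few numbers, the footprint is flat no matter how many pages the bridge yields — which is the property the table above calls "scales to billions on a constant footprint."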

Drill-down, exports, live tail

Three primitives that compose with every analysis:

  • Drill-down — every bucket carries a signed bucketKey that round-trips to analyze.<name>.rows(bucketKey). Drill rows go through the same RBAC + mapping pipeline as query. Inside the bucket, search (auto / simple / semantic / hybrid), sort, and cursor pagination work exactly like query — drill is "query, but already filtered to the bucket's predicate."
  • Exports — full-set drill via streaming NDJSON or CSV. BeamClient.exportRows() returns an AsyncIterable<RowChunk>; useExportRows in React wraps it with start / cancel / progress / truncated state. Tier-capped (Free 10k → Scale 10M → Enterprise unlimited); the X-SemiLayer-Export-Truncated trailer fires when the cap halts the stream before the cursor drains.
  • Live tail — useAnalyze(handle, { liveUpdates: true }) opens the same WS as feeds. The server emits analyze.snapshot once, then debounced analyze.diff frames carrying added / changed / removed bucket lists and new totals. Dashboards update in place; the engine's recompute-and-diff strategy keeps the contract simple at the cost of one fresh aggregate per debounce window.
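Applying a diff frame client-side reduces to a map merge. A hedged sketch of what a hook like useAnalyze presumably does internally — the frame shape is assumed from the bullet above, not taken from the wire protocol:

```typescript
// Assumed analyze.diff frame shape: added / changed / removed bucket lists.
interface BucketLike { bucketKey: string; measures: Record<string, number> }
interface DiffFrame {
  added: BucketLike[];
  changed: BucketLike[];
  removed: string[]; // bucketKeys to drop
}

// Merge one diff frame into the bucket map built from the initial snapshot.
// Returns a new map so UI frameworks can detect the change by reference.
function applyDiff(current: Map<string, BucketLike>, frame: DiffFrame): Map<string, BucketLike> {
  const next = new Map(current);
  for (const b of [...frame.added, ...frame.changed]) next.set(b.bucketKey, b);
  for (const key of frame.removed) next.delete(key);
  return next;
}

const snapshot = new Map<string, BucketLike>([
  ['bk1', { bucketKey: 'bk1', measures: { total: 1 } }],
]);
const updated = applyDiff(snapshot, {
  added: [{ bucketKey: 'bk2', measures: { total: 5 } }],
  changed: [{ bucketKey: 'bk1', measures: { total: 2 } }],
  removed: [],
});
// updated has bk1 (total: 2) and bk2 (total: 5); snapshot is untouched
```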

Vector-aware aggregation — the unique edge

candidates.similarTo narrows the candidate pool by cosine similarity before the GROUP BY runs. The dimension can be an ordinary field, OR — the move BI tools without vectors can't make — bucket: { type: 'semantic', clusters: N }, which clusters rows by their stored embeddings via in-process k-means.

// "How does the average price break down across semantic clusters of products
//  that are similar to what this user has browsed?"
analyses: {
  similarToMyBrowsing: {
    candidates: {
      similarTo: { from: 'context.recentlyViewedTitles' },
    },
    dimensions: [{ field: 'description', bucket: { type: 'semantic', clusters: 5 } }],
    measures: { avgPrice: { agg: 'avg', column: 'price' } },
  },
},

Drill-down follows: clicking a cluster bucket fetches the rows that fell into it, narrowed by the same similarity predicate.
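Semantic bucketing can be pictured as plain k-means over the stored embedding vectors. A toy in-process version under stated assumptions — centroids initialized from the first k rows for determinism; this is an illustration of the idea, not the engine's implementation:

```typescript
// Toy k-means over embedding vectors — illustrative of semantic bucketing only.
// Returns a cluster index per row; that index becomes the bucket dimension.
function kmeans(vectors: number[][], k: number, iters = 10): number[] {
  let centroids = vectors.slice(0, k).map((v) => [...v]); // fixed init: first k rows
  let assign = new Array<number>(vectors.length).fill(0);
  const dist = (a: number[], b: number[]) =>
    a.reduce((s, ai, i) => s + (ai - b[i]) ** 2, 0); // squared Euclidean

  for (let it = 0; it < iters; it++) {
    // Assignment step: each row joins its nearest centroid.
    assign = vectors.map((v) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (dist(v, centroids[c]) < dist(v, centroids[best])) best = c;
      }
      return best;
    });
    // Update step: each centroid moves to the mean of its members.
    centroids = centroids.map((c, ci) => {
      const members = vectors.filter((_, i) => assign[i] === ci);
      if (members.length === 0) return c; // keep empty clusters in place
      return c.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  return assign;
}

const clusters = kmeans(
  [[0, 0], [0.1, 0], [5, 5], [5.1, 4.9]],
  2,
);
// rows 0–1 land in one cluster, rows 2–3 in the other
```

Real embeddings have hundreds of dimensions rather than two, but the bucketing logic is the same: the cluster index plays the role a raw field value plays in an ordinary dimension.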

Access rules

Per-named-analysis. Add to grants.analyze:

grants: {
  analyze: {
    byCategory:    'public',
    revenueByDay:  'authenticated',
    funnelByCohort: 'staff',   // claim-check rule
  },
}

Analyses without an explicit grant default to deny for pk_ keys. sk_ keys bypass. Same posture as feeds. Drill-down inherits the parent analysis's access rule — no separate grants.analyze.<name>.rows.
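That default-deny posture is easy to state as code. A hedged sketch of the decision — the rule names come from this page, but the function itself is an assumption, not the service's actual access check:

```typescript
// Illustrative access decision for an analyze call, mirroring the posture
// described above. Claim-check rules need claims this sketch doesn't model.
type Grant = 'public' | 'authenticated' | string | undefined; // other strings = claim-check rules
type KeyKind = 'pk' | 'sk';

function mayAnalyze(grant: Grant, key: KeyKind, authenticated: boolean): boolean {
  if (key === 'sk') return true;         // sk_ keys bypass grants entirely
  if (grant === undefined) return false; // no explicit grant → deny for pk_
  if (grant === 'public') return true;
  if (grant === 'authenticated') return authenticated;
  return false;                          // claim-check rule: deny in this sketch
}

// mayAnalyze(undefined, 'pk', true) → false   (default deny)
// mayAnalyze(undefined, 'sk', false) → true   (sk bypass)
```

Because drill-down inherits the parent grant, the same decision gates both the buckets and their rows.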

Tier-aware safety rails

Three per-call ceilings keep one analyze from monopolizing the platform:

Cap                             | Free | Pro  | Team | Scale | Enterprise
Candidate pool / scan budget    | 100k | 1M   | 10M  | 100M  | unlimited
Cross-source parent set         | 50k  | 250k | 1M   | 5M    | unlimited
Exact reducer values (per call) | 1M   | 5M   | 25M  | 100M  | unlimited
Export rows (per call)          | 10k  | 100k | 1M   | 10M   | unlimited

Every successful response carries X-SemiLayer-Candidates-Clamped so callers can see which ceiling they ran under. Hitting a cap returns 413 with a structured error pointing at the upgrade path or a tighter candidates.where. See Tier limits — Phase O for the full matrix once shipped.
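The candidate-pool ceiling itself is just a min() against the tier's budget. A minimal sketch using the figures from the table above — the header name is from this page, but the helper is illustrative, not the platform's code:

```typescript
// Illustrative clamp of a requested candidate scan against a tier ceiling,
// mirroring the X-SemiLayer-Candidates-Clamped behavior described above.
const CANDIDATE_CAPS: Record<string, number> = {
  free: 100_000,
  pro: 1_000_000,
  team: 10_000_000,
  scale: 100_000_000,
  enterprise: Number.POSITIVE_INFINITY,
};

function clampCandidates(tier: string, requested: number): { scan: number; clamped: boolean } {
  const cap = CANDIDATE_CAPS[tier] ?? 0; // unknown tier → scan nothing
  return { scan: Math.min(requested, cap), clamped: requested > cap };
}

// clampCandidates('free', 250_000) → { scan: 100_000, clamped: true }
```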

Where to start

  • Drill-down — bucketKey round-trip, search (auto / simple / semantic / hybrid), sort, cursor pagination.
  • Exports — streaming NDJSON / CSV, the truncation trailer, the useExportRows hook.
  • Cursors & streaming — how SemiLayer paginates across query, analyze.rows, and the streaming exports — when to reach for each.
  • semilayer analyze CLI — list / run / rows / explain / plan from the terminal, NDJSON-tailable.
  • REST API → Analyze — every route, every body, every error code.