
Cursors & streaming

Four pagination shapes show up across SemiLayer; picking the right one is a ~5-minute decision that pays off forever. This page maps the four read surfaces — query, drill-down, WebSocket streams, and exports — onto the right shape, and explains the cursor model.

The four shapes

Shape            | Endpoint(s)                        | What you get back                                     | Right for
offset + limit   | query                              | One page; you advance offset yourself                 | Admin tables with page links, ≤ ~5 pages deep
cursor           | query, analyze.rows                | One page + an opaque cursor for the next              | Infinite scroll, deep pagination, drill-down
WebSocket batch  | stream.query, stream.search        | Per-batch frames terminated by done                   | Full-table walks, exports without writing to disk
Streaming export | analyze.rows.export, query.export  | NDJSON or CSV body, chunked, with truncation trailer  | "Download all matching rows as a file"

The shorthand:

  • Read a few pages → cursor
  • Read every page → streaming export
  • React to new rows over time → WS subscribe (different primitive — see Realtime)

offset vs cursor

The obvious difference: offset pagination restarts the count-and-discard on each call, while an opaque cursor encodes the position. The non-obvious difference: offset pagination races with concurrent writes — rows shift under you, so some appear twice or get skipped — while cursors stay coherent.

// offset — fine for admin pickers, breaks at scale
const page = await beam.orders.query({
  where:   { status: 'shipped' },
  orderBy: { field: 'placed_at', dir: 'desc' },
  limit:   20,
  offset:  pageNumber * 20,
})

// cursor — the deep-scroll-safe shape
const first = await beam.orders.query({
  where:   { status: 'shipped' },
  orderBy: { field: 'placed_at', dir: 'desc' },
  limit:   20,
})
const next  = await beam.orders.query({
  where:   { status: 'shipped' },
  orderBy: { field: 'placed_at', dir: 'desc' },
  limit:   20,
  cursor:  first.meta.nextCursor,
})
// Terminate when meta.nextCursor is undefined.

Always set orderBy when you paginate. The cursor encodes the current sort position; if you change the sort between calls, the cursor is meaningless and the next page will look random. The platform appends the lens's primary key as a server-side tiebreaker, so even a single-field orderBy paginates stably.
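
A minimal full-walk sketch built on that contract, assuming the query()/meta.nextCursor shape shown above; the helper name and the page.rows field are illustrative, not part of the documented API:

async function* walkShippedOrders(pageSize = 100) {
  let cursor: string | undefined        // undefined on the first call = start from the top (assumption)
  do {
    const page = await beam.orders.query({
      where:   { status: 'shipped' },
      orderBy: { field: 'placed_at', dir: 'desc' },  // keep the sort identical on every call
      limit:   pageSize,
      cursor,
    })
    yield* page.rows                    // assumption: the page's rows live on page.rows
    cursor = page.meta.nextCursor
  } while (cursor)                      // undefined nextCursor means the walk is done
}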

Drill-down cursors

analyze.<name>.rows() paginates with the same cursor shape query uses. The server generates the cursor itself — an offset under the hood — and the rows respect the resolved orderBy (your override + the PK tiebreaker). Drill on a static bucket is read-only, so write-races aren't a concern at this surface; the offset scheme is exactly the right grain.

const first = await beam.products.analyze.byCategory.rows({
  bucketKey,
  limit: 25,
})
const next  = await beam.products.analyze.byCategory.rows({
  bucketKey,
  cursor: first.cursor,
  limit:  25,
})

If you've already drained the cursor and want a CSV instead of a JSON page loop, swap to exports — same predicate, one body, no page-management code.
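
For instance, a sketch using the exportRows call from the cheat sheet at the bottom of this page; treating its return value as an async iterable of chunks, and the 'csv' format literal, are assumptions here:

import { createWriteStream } from 'node:fs'

// Sketch only: exportRows signature from the cheat sheet below; the chunk
// shape is not a documented contract.
const out = createWriteStream('category.csv')
const chunks = beam.products.analyze.byCategory.exportRows({
  bucketKey,
  format: 'csv',
})
for await (const chunk of chunks) {
  out.write(chunk)
}
out.end()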

When to reach for WS streaming

stream.query opens a WebSocket and yields rows per batch:

for await (const row of beam.orders.stream.query({
  where:   { status: 'shipped' },
  orderBy: { field: 'placed_at', dir: 'asc' },
  limit:   50000,
})) {
  process(row)
}

The shape is similar to the streaming export, but the use case differs:

  • WS streaming stays open as a long-lived socket; it suits programmatic fan-out where the consumer wants each row as a typed object, no parsing.
  • Streaming export speaks plain HTTP chunked encoding; suits "download to disk" / "pipe through jq / wc -l" / "share the body with a non-SemiLayer consumer."

Pick WS when the consumer is your TypeScript app. Pick the export when the consumer is a file, a shell pipeline, or a third party.

When to reach for exports

Streaming exports are the right answer when:

  • You want all matching rows, not just the next page.
  • You want the body on disk or piped through a shell tool.
  • You don't want to manage pagination state in your code.

The cap is per-call, tier-aware (Free 10k → Scale 10M → Enterprise unlimited), and the response sets X-SemiLayer-Export-Truncated if you hit it. Beam wraps the trailer as a final { kind: 'truncated' } chunk so you don't need to read it from raw HTTP.
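
A sketch of consuming that chunk stream via queryExport from the cheat sheet below; calling it on the beam instance, and the data-chunk field names other than kind, are assumptions beyond the { kind: 'truncated' } trailer described above:

let truncated = false
for await (const chunk of beam.queryExport({
  lens:   'orders',
  where:  { status: 'shipped' },
  format: 'ndjson',
})) {
  if (chunk.kind === 'truncated') {     // final trailer chunk per the note above
    truncated = true
    continue
  }
  process.stdout.write(chunk.body)      // assumption: data chunks carry their bytes on .body
}
if (truncated) console.warn('export hit the per-call cap; narrow the where or move up a tier')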

Cursor stability under writes

Surface             | Stable under writes?
query cursor        | Stable. The cursor encodes the sort-key position, so concurrent inserts/updates only show up on the next nextCursor walk. Deletes mid-walk are safely skipped.
analyze.rows cursor | Drill is on a static bucket — the bucketKey snapshot of the predicate. Concurrent writes to the source DB don't shift the bucket's row set within the 24h bucketKey TTL.
offset              | Not stable. Concurrent writes shift rows under your offset; expect duplicates and skips. Use cursor instead for any deep walk.
WS streaming        | Bridge-dependent. Postgres holds a server-side cursor; Mongo holds a snapshot. The bridge cleans up if you abort.
Streaming exports   | Same posture as the underlying cursor (bridge-snapshotted where the bridge supports it).

For the rare "I need a write-stable cursor over query() directly" use case, use the export endpoint instead — bridges that support snapshot isolation apply it on the streaming export path.

Aborting cleanly

Every streaming shape — WS streams, drill-down loops, exports — closes the underlying bridge cursor when the consumer aborts. No leaked transactions.

const controller = new AbortController()

setTimeout(() => controller.abort(), 60_000)  // cancel after a minute

for await (const row of beam.orders.stream.query({
  where: { status: 'shipped' },
  signal: controller.signal,   // cooperative cancellation
})) {
  process(row)
}

For exports, React's useExportRows({ ... }).cancel() and Beam's AsyncIterable.return() both propagate the abort to the server.
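
As a concrete example of the AsyncIterable.return() path: breaking out of a for await loop calls the iterator's return() automatically, which per the note above propagates the abort to the server. The row count here is illustrative:

let seen = 0
for await (const row of beam.orders.stream.query({ where: { status: 'shipped' } })) {
  process(row)
  if (++seen >= 1_000) break   // break triggers AsyncIterable.return(), which closes the bridge cursor
}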

Choosing — the cheat sheet

Need                                 | Reach for
Admin UI with page picker            | query({ offset, limit })
Infinite scroll                      | query({ cursor, limit })
Bucket-scoped drill                  | analyze.<name>.rows({ bucketKey, cursor })
Search inside a bucket               | analyze.<name>.rows({ bucketKey, search, cursor })
Download all rows of a bucket        | analyze.<name>.exportRows({ bucketKey, format })
Download all rows of a where         | BeamClient.queryExport({ lens, where, format }) (or semilayer query --export)
Walk a whole table programmatically  | stream.query({ where })
React to new rows over time          | stream.subscribe({ filter }) — different primitive
Live-updating dashboard              | useAnalyze({ liveUpdates: true })

When in doubt, start with cursor pagination. It's the right answer for ~80% of UIs, and you can always upgrade to the streaming export when the user reaches for "download."