SemiLayerDocs

Data Mapping — Recipes

Eight patterns pulled from production configs. Copy the block that matches, adapt the field names, and move on.

Full name from first + last

The canonical merge case. Use concat with a space.

fullName: {
  type:      'text',
  from:      ['first_name', 'last_name'],
  merge:     'concat',
  separator: ' ',
  transform: { type: 'trim' },   // handles rows with only one of the two
  nullAs:    'Anonymous',        // for rows with neither
}

Display name with fallbacks

Prefer nickname, fall back to preferred name, fall back to legal first name. coalesce picks the first non-null.

displayName: {
  type:  'text',
  from:  ['nickname', 'preferred_name', 'first_name'],
  merge: 'coalesce',
}

Cents → dollars, rounded

Integer cents is a common source shape. Round to 2 decimals for display.

priceUsd: {
  type: 'number',
  from: 'price_cents',
  transform: [
    { type: 'custom', body: 'return typeof value === "number" ? value / 100 : null' },
    { type: 'round', decimals: 2 },
  ],
}

custom here because there's no built-in "divide by N." Could also be done with toNumber + a scale factor in your source view.

Normalize a tag list

Source has a comma-separated string. Output an array with trimmed, lowercased values.

tags: {
  type: 'text',
  from: 'tags_csv',
  transform: [
    { type: 'default', value: '' },           // null-safe
    { type: 'lowercase' },
    { type: 'split', separator: ',' },
    { type: 'custom', body: 'return value.map(x => x.trim()).filter(Boolean)' },
  ],
}

After this: 'Thai, Vegan, '['thai', 'vegan'].

URL canonicalization

Strip trailing slashes, upgrade http:// to https://, lowercase.

canonicalUrl: {
  type: 'text',
  from: 'url',
  transform: [
    { type: 'default', value: '' },
    { type: 'lowercase' },
    { type: 'replace', pattern: '^http://', replacement: 'https://' },
    { type: 'replace', pattern: '/+$',      replacement: '' },
  ],
}

Each replace is a global regex. Chain them for multi-step cleanup.

Truncate a long description for display

Keep a separate full-body field (for embedding), truncate for UI.

// Full body — feeds the vector
description: {
  type: 'text',
  searchable: true,
  from: 'body',
}

// Truncated preview — shown in list cards
descriptionPreview: {
  type: 'text',
  from: 'body',
  transform: { type: 'truncate', length: 200 },
}

Both point at the same source column. The index carries both: one for search ranking, one for query() list views.

Derived slug

Compose a URL-safe slug from the title. Uses custom because the logic is too specific for built-ins.

slug: {
  type: 'text',
  transform: {
    type: 'custom',
    body: `
      const source = row.title || 'untitled'
      return source
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, '-')
        .replace(/^-+|-+$/g, '')
        .slice(0, 80)
    `,
  },
}

No from needed — custom reads from row directly. The output field is purely derived.

Coerce + check with a companion field

Transforms don't throw. For data hygiene, pair a coerced field with a validity flag so bad data is queryable rather than invisible.

priceCents: {
  type: 'number',
  from: 'price_raw',
  transform: { type: 'toNumber' },
}

priceValid: {
  type: 'boolean',
  transform: {
    type: 'custom',
    body: `
      const n = Number(row.price_raw)
      return Number.isFinite(n) && n >= 0
    `,
  },
}

Now beam.products.query({ where: { priceValid: false } }) surfaces the rows with bad source data — rather than leaving them lurking in the index with NaN prices.

Null sentinels at the lens level

If your source uses 'N/A', '', or '-' to mean "no value," declare them at the lens level via nullValues. Every field then treats those values as null before mapping runs.

users: {
  source:     'main-db',
  table:      'users',
  nullValues: ['N/A', '', '-'],    // ← lens-wide sentinel
  fields: {
    email: { type: 'text', nullAs: 'no-email' },
    // ... 'N/A' in source → null in mapping → 'no-email' after nullAs
  },
}

Saves you from writing the same custom check on every field.

Composition — more than two chained

A chain can be as long as it needs to be. This turns a messy multi-source price string into a canonical number:

priceUsd: {
  type: 'number',
  from: ['price_display', 'price_raw'],
  merge: 'coalesce',                                // prefer display, fall back to raw
  transform: [
    { type: 'toString' },                           // anything → string
    { type: 'trim' },                               // kill whitespace
    { type: 'replace', pattern: '^\\$', replacement: '' },   // leading $
    { type: 'replace', pattern: ',', replacement: '' },       // thousand-sep commas
    { type: 'toNumber' },                           // now parses cleanly
    { type: 'round', decimals: 2 },                 // final shape
  ],
  undefinedAs: null,
  nullAs:      0,                                   // missing → 0 in the index
}

Seven operations, one declarative block, zero ETL infrastructure.

When to reach for a view instead

Mapping is powerful but doesn't replace a source-side view. Reach for a view when:

  • You need aggregation (sum, avg, count across rows).
  • You need filters that remove rows (e.g. WHERE deleted_at IS NULL). Mapping can't skip rows; use table: 'active_users_view' or similar.
  • You need joins at ingest time (not via SemiLayer's include). Materialize the join in a view; point the lens at it.

Mapping is for reshaping row-by-row. Anything that crosses rows or creates/destroys rows belongs upstream.