SemiLayer Docs

Data Mapping

Your source schema is one thing. What your app actually queries is another. The mapping layer on every FieldConfig bridges the two in config — no ETL, no view, no extra moving part.

The five knobs

Every field on a lens can use any combination of these:

Field                  Purpose
from                   Source column name(s). Omit for identity (output name = source name).
merge                  When from is an array: how to combine. 'concat' or 'coalesce'.
separator              Required string when merge: 'concat'.
transform              Single transform or array of transforms, applied left-to-right after resolution.
nullAs / undefinedAs   What to replace nulls / missing values with.
users: {
  source: 'main-db',
  table: 'public.users',
  fields: {
    id: { type: 'number', primaryKey: true },

    // Compose a human-readable name from two source columns
    displayName: {
      type:      'text',
      from:      ['first_name', 'last_name'],
      merge:     'concat',
      separator: ' ',
      transform: { type: 'trim' },
      nullAs:    'Anonymous',
    },

    // Rename + coerce cents to dollars with rounding
    priceUsd: {
      type:      'number',
      from:      'price_cents',
      transform: [
        { type: 'toNumber' },
        { type: 'round', decimals: 2 },
      ],
    },

    // Identity mapping — 'role' in source, 'role' on the lens
    role: { type: 'enum', values: ['owner', 'admin', 'member', 'guest'] },
  },
  grants: { query: 'staff' },
}
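
The example above only exercises merge: 'concat'. For contrast, here is a minimal sketch of a field using merge: 'coalesce'. The column names work_email and personal_email are hypothetical, and the reading of coalesce as "take the first part that has a value" is an assumption borrowed from SQL's COALESCE, not something this page states.

```typescript
// Hypothetical field config: falls back from one source column to another.
// Assumption: 'coalesce' picks the first part that is not null/undefined.
const contactEmail = {
  type:      'text',
  from:      ['work_email', 'personal_email'], // hypothetical column names
  merge:     'coalesce',                       // no separator needed: it is only required for 'concat'
  transform: { type: 'trim' },                 // runs on the resolved value, after the merge
  nullAs:    'unknown',                        // used when the resolved value is null
};
```

Under that assumption, separator can be left out here because the validation rules only require it for 'concat'.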

How it runs

Mapping runs per row during ingest, between the bridge fetch and embedding. The order is fixed:

1. Null-sentinel replacement        ('N/A', '', whatever you declared)
2. For each field:
   a. Resolve from:
        - omitted        → source[fieldName]
        - string         → source[from]
        - string[]       → merge(parts, strategy, separator)
   b. Apply undefinedAs if result is undefined
   c. Apply nullAs if result is null
   d. Run transform chain left-to-right
3. Content-hash dedup (skip unchanged rows before embedding)
4. Embed the mapped row, write vectors

The rule that falls out of this: transforms run on the already-resolved value, not on raw source. So a transform in a multi-source field operates on the merged string, not each part.
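
To make the ordering concrete, here is a minimal sketch of step 2 for a single field. It is not SemiLayer's implementation: the FieldMapping type, the resolve and mapField functions, and the way concat and coalesce treat null parts are all assumptions made for illustration. Null-sentinel replacement, dedup, and embedding are left out.

```typescript
// Hypothetical types for illustration only; not SemiLayer's real API.
type Row = Record<string, unknown>;
type Transform = (value: unknown) => unknown;

interface FieldMapping {
  from?: string | string[];
  merge?: 'concat' | 'coalesce';
  separator?: string;
  transforms?: Transform[];
  nullAs?: unknown;
  undefinedAs?: unknown;
}

const isPresent = (v: unknown): boolean => v !== null && v !== undefined;

// Step 2a: resolve `from` (omitted, string, or string[]).
function resolve(name: string, f: FieldMapping, source: Row): unknown {
  if (f.from === undefined) return source[name];
  if (typeof f.from === 'string') return source[f.from];
  const parts = f.from.map((col) => source[col]);
  if (f.merge === 'coalesce') {
    // Assumption: the first present part wins; null if none is present.
    const hit = parts.find(isPresent);
    return hit === undefined ? null : hit;
  }
  // Assumption: concat skips absent parts and yields null if all are absent.
  const present = parts.filter(isPresent);
  return present.length === 0 ? null : present.map(String).join(f.separator ?? '');
}

// Steps 2a to 2d, in the fixed order described above.
function mapField(name: string, f: FieldMapping, source: Row): unknown {
  let value = resolve(name, f, source);                                   // 2a
  if (value === undefined && f.undefinedAs !== undefined) value = f.undefinedAs; // 2b
  if (value === null && f.nullAs !== undefined) value = f.nullAs;         // 2c
  for (const t of f.transforms ?? []) value = t(value);                   // 2d
  return value;
}

// A stand-in for the 'trim' transform.
const trim: Transform = (v) => (typeof v === 'string' ? v.trim() : v);

const displayName: FieldMapping = {
  from: ['first_name', 'last_name'],
  merge: 'concat',
  separator: ' ',
  transforms: [trim],
  nullAs: 'Anonymous',
};

// The merged string is ' Ada Lovelace ' and trim runs once on that whole string.
const mapped = mapField('displayName', displayName, {
  first_name: ' Ada',
  last_name: 'Lovelace ',
});
```

Because trim sees only the merged string, it strips the outer whitespace but would not touch stray spaces between the two parts.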

What mapping is for

  • Renaming columns: source has name_en, your app wants name.
  • Merging columns: first_name + last_name → fullName.
  • Type coercion: cents (integer) → dollars (float rounded to 2 decimals); 'true' string → boolean.
  • Normalization: trim whitespace, lowercase, regex-replace.
  • Default values: source column sometimes NULL — render as 'Unknown' in the index.
  • Derived fields: a small JS expression that reads two source columns and computes a third.

What mapping is not for

  • Aggregation across rows. Mapping is per-row. Sum/avg/count live elsewhere — usually in a view on the source, or a separate lens that the first one joins to.
  • Filtering rows out of ingest. Mapping can transform a row's values but can't skip the row. For row filtering, use the bridge's where (e.g. table: 'products WHERE deleted_at IS NULL') or a view.
  • Enriching from external APIs. Transforms can't make network calls. For enrichment, pre-populate the data in your source (a view, a materialized column, a cron job).
  • Anything with a schema the source can't produce. Mapping reshapes what the bridge returns. If the bridge can't return a column at all, mapping can't create it.

Validation

semilayer push validates the mapping block:

  • from shape: omit / string / string-array. Arrays must have ≥ 2 entries.
  • merge presence: required when from is an array.
  • separator presence: required when merge: 'concat'.
  • Transform params: required fields per transform type (e.g. split requires separator; truncate requires length > 0; replace requires pattern + replacement).

Not validated at config time: whether from: 'some_column' actually exists on the source. That check runs at bridge introspection time. A mapping pointing at a nonexistent column doesn't fail the push — it just produces undefined on every row until you fix it.
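
The shape rules above can be sketched as a small checker. This is an illustration of the listed rules only, not the code semilayer push runs: the validateMapping function and its error messages are hypothetical, and transform-parameter checks are omitted.

```typescript
// Hypothetical validator mirroring the from / merge / separator rules.
interface MappingShape {
  from?: unknown;
  merge?: unknown;
  separator?: unknown;
}

function validateMapping(name: string, f: MappingShape): string[] {
  const errors: string[] = [];
  const { from, merge, separator } = f;

  if (Array.isArray(from)) {
    // Arrays must have at least 2 entries, all of them column names.
    if (from.length < 2 || !from.every((c) => typeof c === 'string')) {
      errors.push(`${name}: from array needs at least 2 string entries`);
    }
    // merge is required when from is an array.
    if (merge !== 'concat' && merge !== 'coalesce') {
      errors.push(`${name}: merge must be 'concat' or 'coalesce' when from is an array`);
    }
  } else if (from !== undefined && typeof from !== 'string') {
    errors.push(`${name}: from must be omitted, a string, or a string array`);
  }

  // separator is required when merge is 'concat'.
  if (merge === 'concat' && typeof separator !== 'string') {
    errors.push(`${name}: separator is required when merge is 'concat'`);
  }
  return errors;
}

// Valid: passes all three shape rules.
const ok = validateMapping('displayName', {
  from: ['first_name', 'last_name'],
  merge: 'concat',
  separator: ' ',
});

// Invalid: concat without a separator.
const missingSeparator = validateMapping('displayName', {
  from: ['first_name', 'last_name'],
  merge: 'concat',
});
```

Note that nothing here looks at the source: a from that names a nonexistent column still passes, which matches the behavior described above.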

Drift detection

Each lens has a configHash: a canonical SHA-256 of the full config, including the mapping block. When you run push, the CLI compares your local hash against the server's. If the hashes differ and the server copy was last edited via the Console (not the CLI), it flags the drift:

! drift detected on 3 lenses
  users        last modified by console at 2026-04-18T09:12:00Z
  products     last modified by console at 2026-04-19T14:02:00Z
  ...
  continue? [y/N]

This catches the common "someone tweaked the mapping in Console and forgot to pull" scenario before you overwrite their edit.
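
One way to picture the check is the sketch below. The canonicalization scheme (JSON with sorted object keys), the ServerLens shape, and the function names are assumptions for illustration. This page only says the hash is a canonical SHA-256 of the full config, so the real serialization may differ.

```typescript
import { createHash } from 'node:crypto';

// Assumed canonical form: JSON with object keys sorted, so key order
// in the config file does not change the hash.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonicalize(v)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .filter((k) => obj[k] !== undefined) // drop undefined, as JSON does
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`)
      .join(',');
    return `{${body}}`;
  }
  return JSON.stringify(value) ?? 'null';
}

function configHash(config: unknown): string {
  return createHash('sha256').update(canonicalize(config)).digest('hex');
}

// Hypothetical shape of what the server reports for a lens.
interface ServerLens {
  configHash: string;
  lastModifiedBy: 'console' | 'cli';
}

// Drift is flagged only when the hashes differ AND the last edit came from the Console.
function hasDrift(localConfig: unknown, server: ServerLens): boolean {
  return server.lastModifiedBy === 'console' && configHash(localConfig) !== server.configHash;
}

const local = { table: 'public.users', source: 'main-db' };
const sameButReordered = { source: 'main-db', table: 'public.users' };
const editedInConsole = { source: 'main-db', table: 'public.accounts' };
```

Sorting keys before hashing is what makes the comparison robust to cosmetic reordering, so only a real change to the config counts as a difference.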

Where to go next

  • Source resolution: from, merge, separator in depth
  • Transforms: the 14 built-ins, chaining, custom JS
  • Recipes: full-name, cents→dollars, default values, JSON extraction, etc.