
Playbook: New Client Onboarding

Purpose: Step-by-step guide to onboard a new client into the XO-Data platform. Follow these steps in order. Each phase has a checkpoint before proceeding.

Estimated effort: 2–3 days for a standard Gladly-based client.


Prerequisites

Before starting, confirm with the team (a quick access check follows the list):

- [ ] Client name and short code confirmed (e.g., MMN for MassMutual)
- [ ] Source system confirmed (Gladly / Google Sheets / Sprout / other)
- [ ] Snowflake environment access confirmed (role: XO_DATA_ENGINEER_DEV)
- [ ] S3 buckets accessible (ingest-bucket, stage-bucket)
- [ ] Gladly API credentials available (if applicable)
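A minimal smoke test for the access items, assuming the AWS CLI and SnowSQL are installed and configured (bucket names are the placeholders above):

# Confirm the S3 buckets are reachable
aws s3 ls s3://ingest-bucket/ | head -n 3
aws s3 ls s3://stage-bucket/ | head -n 3

# Confirm the Snowflake role can be assumed
snowsql -q "USE ROLE XO_DATA_ENGINEER_DEV; SELECT CURRENT_ROLE();"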


Phase 1 — Snowflake Database Setup

Goal: Create {CLIENT}_DB with BRONZE, SILVER, GOLD, and SANDBOX schemas.

Step 1.1 — Create the schemachange migration

Create: apps/snowflake-bronze/{client_lower}/V1.0.0__create_schemas.sql

Base this on apps/snowflake-bronze/_template/V1.0.0__create_schemas.sql.template — it already includes BRONZE, SILVER, GOLD, and SANDBOX. Substitute {CLIENT_CODE}, {CLIENT_NAME}, and {DATA_SOURCES} in the template.

The SANDBOX schema is created automatically by the template migration and is available to analysts from day one. No separate migration is needed. See ADR 015 for SANDBOX usage rules.

-- V1.0.0__create_schemas.sql
-- Create {CLIENT}_DB with all medallion layers + SANDBOX

USE ROLE XO_DATA_ENGINEER_DEV;

CREATE DATABASE IF NOT EXISTS {CLIENT}_DB;

CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.BRONZE;
CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.SILVER;
CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.GOLD;
CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.SANDBOX;

Step 1.2 — Deploy schemas

# Review the migration file, then deploy
# (Follow schemachange-deployment.md for full deploy steps)

See: schemachange-deployment.md
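For orientation, a minimal sketch of the deploy command; the connection values in angle brackets are placeholders, and schemachange-deployment.md remains the source of truth for the real invocation:

# Sketch only; values in <angle brackets> are placeholders
schemachange deploy \
  -f apps/snowflake-bronze/{client_lower} \
  --snowflake-account <account> \
  --snowflake-user <user> \
  --snowflake-role XO_DATA_ENGINEER_DEV \
  --snowflake-warehouse <warehouse> \
  --snowflake-database {CLIENT}_DB \
  -c {CLIENT}_DB.SCHEMACHANGE.CHANGE_HISTORY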

Step 1.3 — Create Bronze tables migration

Create: apps/snowflake-bronze/{client_lower}/V1.1.0__create_bronze_tables.sql

All Bronze tables must:

- Use VARCHAR for every column (no type conversion)
- Include metadata columns: RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE, SOURCE_FILE, BATCH_ID, PIPELINE_RUN_ID
- Use batch_replace loading strategy (new pipelines) — see ADR 011

CREATE TABLE IF NOT EXISTS {CLIENT}_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS (
    -- Source columns (all VARCHAR)
    CONTACT_ID        VARCHAR,
    CREATED_AT        VARCHAR,
    UPDATED_AT        VARCHAR,
    -- ... source-specific columns

    -- Metadata columns (required for all Bronze tables)
    RECORD_KEY        VARCHAR,
    RECORD_HASH       VARCHAR,
    DATE_TO_WAREHOUSE TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    SOURCE_FILE       VARCHAR,
    BATCH_ID          VARCHAR,
    PIPELINE_RUN_ID   VARCHAR
);
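RECORD_KEY and RECORD_HASH are populated by the loader at load time, not by this DDL. Purely as an illustration of the idea (the key and hash inputs here are assumptions; the real logic lives in the pipeline):

-- Illustration only: deriving the metadata values for one sample row,
-- using Snowflake's CONCAT_WS and MD5 functions
SELECT
    '12345'                                                   AS RECORD_KEY,
    MD5(CONCAT_WS('|', '12345', '2024-01-01', '2024-01-02'))  AS RECORD_HASH;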

Checkpoint: Both migrations deployed and schemas/tables visible in Snowflake ✅


Phase 2 — Airflow Pipeline

Goal: Create a working DAG that extracts data from the source and loads it to Bronze.

Step 2.1 — Create the YAML config

Create: apps/airflow/xo-pipelines/dags/configs/{client_lower}-gladly-daily.yaml

Use an existing config as a reference: warbyparker-gladly-daily.yaml

dag:
  dag_id: {client_lower}_gladly_daily
  schedule: "0 5 * * 1-6"    # 5 AM ET, Mon–Sat
  catchup: false
  tags: ["{client_lower}", "gladly", "daily"]
  time_window:
    refresh_type: daily
    timezone: America/New_York
    daily_lag_minutes: 1440

sources:
  contact_timestamps:
    extractor: gladly_contacts
    load_strategy: batch_replace    # ADR 011: new pipelines use batch_replace
    snowflake_table: GLADLY_CONTACT_TIMESTAMPS
    snowflake_database: "{CLIENT}_DB"
    snowflake_schema: BRONZE

Step 2.2 — Validate the config

uv run xo-foundry validate-config \
  --config apps/airflow/xo-pipelines/dags/configs/{client_lower}-gladly-daily.yaml

Fix any validation errors before proceeding.

Step 2.3 — Generate the DAG

uv run xo-foundry generate-dag \
  --config apps/airflow/xo-pipelines/dags/configs/{client_lower}-gladly-daily.yaml \
  --output apps/airflow/xo-pipelines/dags/

Review the generated DAG file for correctness.
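To catch import errors before spinning up the full Airflow stack, the generated file can be parsed directly (a sketch; the filename is assumed to follow the dag_id convention):

# Parsing the DAG file surfaces import and syntax errors immediately
uv run python apps/airflow/xo-pipelines/dags/{client_lower}_gladly_daily.py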

Step 2.4 — Test locally

cd apps/airflow/xo-pipelines
astro dev start
# Open Airflow UI at localhost:8080
# Trigger the new DAG manually
# Verify data appears in Snowflake Bronze table
astro dev stop
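
To verify the load landed, a quick check against the Bronze table from Step 1.3 (grouping by BATCH_ID is just one way to slice it):

-- Confirm rows arrived and when the latest batch loaded
SELECT BATCH_ID, COUNT(*) AS ROW_COUNT, MAX(DATE_TO_WAREHOUSE) AS LOADED_AT
FROM {CLIENT}_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS
GROUP BY BATCH_ID
ORDER BY LOADED_AT DESC;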

Step 2.5 — Configure Astronomer connections

Ensure these Airflow connections exist in Astronomer (one way to define a missing connection is sketched after the list):

- snowflake_loader_{client_lower} — Snowflake connection for loading
- aws_default — S3 access (usually shared)
- {source}_api_{client_lower} — Source API credentials
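Airflow also resolves connections from AIRFLOW_CONN_<CONN_ID> environment variables, which Astronomer deployments can carry. A sketch, with every URI field a placeholder:

# Conn ID is uppercased in the variable name; all values are placeholders
AIRFLOW_CONN_SNOWFLAKE_LOADER_{CLIENT}='snowflake://<user>:<password>@/?account=<account>&warehouse=<warehouse>&database={CLIENT}_DB&role=XO_DATA_ENGINEER_DEV'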

Step 2.6 — Deploy

cd apps/airflow/xo-pipelines
astro deploy <deployment-id>

Checkpoint: DAG running in Astronomer and Bronze tables populating daily ✅


Phase 3 — dbt Silver Layer

Goal: Create Silver models that preserve historical data from Bronze.

Step 3.1 — Create source definition

Add to apps/dbt/xo_medallion/models/sources.yml:

- name: {client_lower}_bronze
  database: "{{ env_var('ENVIRONMENT', 'dev') == 'prod' and '{CLIENT}_DB' or '{CLIENT}_DB_DEV' }}"
  schema: BRONZE
  tables:
    - name: gladly_contact_timestamps
    - name: gladly_conversation_timestamps
    # ... other Bronze tables

Step 3.2 — Create Silver models

Directory: apps/dbt/xo_medallion/models/silver/{client_lower}/

For each Bronze table, create a Silver model. Example:

File: contact_timestamps.sql

{{ config(
    database=env_var('ENVIRONMENT', 'dev') == 'prod' and '{CLIENT}_DB' or '{CLIENT}_DB_DEV',
    schema='SILVER',
    materialized='table'
) }}

SELECT
    CONTACT_ID::VARCHAR          AS CONTACT_ID,
    CREATED_AT::TIMESTAMP_NTZ    AS CREATED_AT,
    UPDATED_AT::TIMESTAMP_NTZ    AS UPDATED_AT,
    -- ... all source columns with explicit casting
    RECORD_KEY,
    RECORD_HASH,
    DATE_TO_WAREHOUSE,
    SOURCE_FILE
FROM {{ source('{client_lower}_bronze', 'gladly_contact_timestamps') }}
QUALIFY ROW_NUMBER() OVER (PARTITION BY RECORD_KEY ORDER BY DATE_TO_WAREHOUSE DESC) = 1

Step 3.3 — Add schema.yml tests

models:
  - name: contact_timestamps
    description: "Historical contact events from Gladly. Grain: one row per contact."
    columns:
      - name: contact_id
        tests: [unique, not_null]
      - name: created_at
        tests: [not_null]

Step 3.4 — Test Silver locally

dbt compile --select silver.{client_lower}.*
dbt run --select silver.{client_lower}.*     # build the models before testing them
dbt test --select silver.{client_lower}.*
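
As a final check before the checkpoint, compare deduplicated Bronze against Silver (a sketch, assuming the dev database naming from Step 3.1):

-- Distinct Bronze RECORD_KEYs should equal the Silver row count
SELECT
    (SELECT COUNT(DISTINCT RECORD_KEY)
       FROM {CLIENT}_DB_DEV.BRONZE.GLADLY_CONTACT_TIMESTAMPS) AS BRONZE_KEYS,
    (SELECT COUNT(*)
       FROM {CLIENT}_DB_DEV.SILVER.CONTACT_TIMESTAMPS)        AS SILVER_ROWS;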

Checkpoint: Silver models compile, tests pass, and Silver row counts match deduplicated Bronze ✅


Phase 4 — dbt Gold Layer

Goal: Create Gold facts, aggregates, and report views for analyst consumption.

See the standalone playbook: gold-layer-expansion.md

Checkpoint: Gold models deployed to dev, analyst queries validated ✅


Phase 5 — Production Cutover

Goal: Promote everything from _DEV databases to production databases.

Step 5.1 — Verify Astronomer Variables

Ensure ENVIRONMENT=prod is set for production Airflow deployment.

Step 5.2 — Verify Snowflake permissions

Analysts need SELECT on {CLIENT}_DB.GOLD.*. Coordinate with the Snowflake admin.
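
A hedged sketch of the grants involved (ANALYST_ROLE is a placeholder; the actual role names and grant strategy are the admin's call):

-- Sketch only; ANALYST_ROLE is a placeholder
GRANT USAGE ON DATABASE {CLIENT}_DB TO ROLE ANALYST_ROLE;
GRANT USAGE ON SCHEMA {CLIENT}_DB.GOLD TO ROLE ANALYST_ROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA {CLIENT}_DB.GOLD TO ROLE ANALYST_ROLE;
GRANT SELECT ON ALL VIEWS IN SCHEMA {CLIENT}_DB.GOLD TO ROLE ANALYST_ROLE;
GRANT SELECT ON FUTURE TABLES IN SCHEMA {CLIENT}_DB.GOLD TO ROLE ANALYST_ROLE;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA {CLIENT}_DB.GOLD TO ROLE ANALYST_ROLE;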

Step 5.3 — Flip dbt to prod

With ENVIRONMENT=prod in place (Step 5.1), trigger a dbt run and verify that Gold tables populate in the production database.

Step 5.4 — Tableau connection

Update or create Tableau data source pointing to {CLIENT}_DB.GOLD production views.

Step 5.5 — Create project archive

Once live:

mkdir -p .claude/ongoing/archived/$(date +%Y-%m-%d)-{client_lower}-onboarding/
mv .claude/ongoing/projects/{client_lower}-onboarding.md .claude/ongoing/archived/$(date +%Y-%m-%d)-{client_lower}-onboarding/

Create README.md in the archive folder summarizing what was built.
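
A minimal skeleton for that README (headings and contents are suggestions, not a required format):

# {client_lower} onboarding (archived YYYY-MM-DD)

## What was built
- Snowflake: {CLIENT}_DB with BRONZE / SILVER / GOLD / SANDBOX
- Airflow: {client_lower}_gladly_daily DAG in Astronomer
- dbt: Silver models under models/silver/{client_lower}/ plus the Gold layer

## Known follow-ups
- ...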

Checkpoint: Production data flowing, Tableau connected, team notified ✅


Reference