
Playbook: New Client Onboarding

Purpose: Step-by-step guide to onboard a new client into the XO-Data platform. Follow these steps in order. Each phase has a checkpoint before proceeding.

Estimated effort: 2–3 days for a standard Gladly-based client.


Prerequisites

Before starting, confirm with the team:

- [ ] Client name and short code confirmed (e.g., MMN for MassMutual)
- [ ] Source system confirmed (Gladly / Google Sheets / Sprout / other)
- [ ] Snowflake environment access confirmed (role: XO_DATA_ENGINEER_DEV)
- [ ] S3 buckets accessible (ingest-bucket, stage-bucket)
- [ ] Gladly API credentials available (if applicable)
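Every path and object name in this playbook substitutes two forms of the client short code: {CLIENT} (uppercase, for Snowflake objects) and {client_lower} (lowercase, for file and DAG names). A minimal sketch of that convention (the helper name is illustrative, not part of the codebase):

```python
# Illustrative only: derive the two placeholder forms used throughout
# this playbook ({CLIENT} and {client_lower}) from the confirmed short code.
def client_placeholders(short_code: str) -> dict:
    """Return the uppercase and lowercase forms of a client short code."""
    code = short_code.strip()
    if not code.isalnum():
        raise ValueError(f"short code must be alphanumeric: {code!r}")
    return {"CLIENT": code.upper(), "client_lower": code.lower()}

print(client_placeholders("MMN"))
# {'CLIENT': 'MMN', 'client_lower': 'mmn'}
```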


Phase 1 — Snowflake Database Setup

Goal: Create {CLIENT}_DB with BRONZE, SILVER, GOLD schemas.

Step 1.1 — Create the schemachange migration

Create: apps/snowflake-bronze/{client_lower}/V1.0.0__create_schemas.sql

-- V1.0.0__create_schemas.sql
-- Create {CLIENT}_DB with all three medallion layers

USE ROLE XO_DATA_ENGINEER_DEV;

CREATE DATABASE IF NOT EXISTS {CLIENT}_DB;

CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.BRONZE;
CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.SILVER;
CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.GOLD;

Step 1.2 — Deploy schemas

# Review the migration file, then deploy
# (Follow schemachange-deployment.md for full deploy steps)

See: schemachange-deployment.md

Step 1.3 — Create Bronze tables migration

Create: apps/snowflake-bronze/{client_lower}/V1.1.0__create_bronze_tables.sql

All Bronze tables must:

- Use VARCHAR for every column (no type conversion)
- Include metadata columns: RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE, SOURCE_FILE, BATCH_ID, PIPELINE_RUN_ID
- Use the batch_replace loading strategy (new pipelines) — see ADR 011

CREATE TABLE IF NOT EXISTS {CLIENT}_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS (
    -- Source columns (all VARCHAR)
    CONTACT_ID        VARCHAR,
    CREATED_AT        VARCHAR,
    UPDATED_AT        VARCHAR,
    -- ... source-specific columns

    -- Metadata columns (required for all Bronze tables)
    RECORD_KEY        VARCHAR,
    RECORD_HASH       VARCHAR,
    DATE_TO_WAREHOUSE TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    SOURCE_FILE       VARCHAR,
    BATCH_ID          VARCHAR,
    PIPELINE_RUN_ID   VARCHAR
);
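The exact derivation of RECORD_KEY and RECORD_HASH is owned by the pipeline framework; as a rough sketch only, assuming RECORD_KEY is the source primary key and RECORD_HASH is an MD5 over the ordered source column values:

```python
import hashlib

# Sketch, not the real implementation: the actual RECORD_KEY/RECORD_HASH
# logic lives in the pipeline framework. Here we assume RECORD_KEY is the
# source primary key and RECORD_HASH is an MD5 of the pipe-joined values
# of the source columns, in sorted column order for determinism.
def record_metadata(row: dict, key_column: str) -> dict:
    payload = "|".join(str(row[c]) for c in sorted(row))
    return {
        "RECORD_KEY": row[key_column],
        "RECORD_HASH": hashlib.md5(payload.encode("utf-8")).hexdigest(),
    }

row = {
    "CONTACT_ID": "c-123",
    "CREATED_AT": "2024-01-01T00:00:00Z",
    "UPDATED_AT": "2024-01-02T00:00:00Z",
}
meta = record_metadata(row, "CONTACT_ID")
# meta["RECORD_KEY"] == "c-123"; meta["RECORD_HASH"] is a 32-char hex digest
```

A stable hash like this is what makes change detection and the QUALIFY-based dedup in Silver possible.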

Checkpoint: Both migrations deployed and schemas/tables visible in Snowflake ✅


Phase 2 — Airflow Pipeline

Goal: Create a working DAG that extracts data from the source and loads it to Bronze.

Step 2.1 — Create the YAML config

Create: apps/airflow/xo-pipelines/dags/configs/{client_lower}-gladly-daily.yaml

Use an existing config as a reference: warbyparker-gladly-daily.yaml

dag:
  dag_id: {client_lower}_gladly_daily
  schedule: "0 5 * * 1-6"    # 5 AM ET, Mon–Sat
  catchup: false
  tags: ["{client_lower}", "gladly", "daily"]
  time_window:
    refresh_type: daily
    timezone: America/New_York
    daily_lag_minutes: 1440

sources:
  contact_timestamps:
    extractor: gladly_contacts
    load_strategy: full_refresh
    snowflake_table: GLADLY_CONTACT_TIMESTAMPS
    snowflake_database: "{CLIENT}_DB"
    snowflake_schema: BRONZE
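With daily_lag_minutes: 1440, the run extracts data that is at least a day old. A sketch of how such a window might resolve, assuming (this is an interpretation, the framework defines the real semantics) that the window ends daily_lag_minutes before the logical run time and covers the preceding day:

```python
from datetime import datetime, timedelta

# Assumed semantics for illustration only: the extraction window ends
# `daily_lag_minutes` before the logical run time and spans one day.
def daily_window(run_time: datetime, daily_lag_minutes: int = 1440):
    window_end = run_time - timedelta(minutes=daily_lag_minutes)
    window_start = window_end - timedelta(days=1)
    return window_start, window_end

start, end = daily_window(datetime(2024, 6, 3, 5, 0))
# start=2024-06-01 05:00, end=2024-06-02 05:00
```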

Step 2.2 — Validate the config

uv run xo-foundry validate-config \
  --config apps/airflow/xo-pipelines/dags/configs/{client_lower}-gladly-daily.yaml

Fix any validation errors before proceeding.

Step 2.3 — Generate the DAG

uv run xo-foundry generate-dag \
  --config apps/airflow/xo-pipelines/dags/configs/{client_lower}-gladly-daily.yaml \
  --output apps/airflow/xo-pipelines/dags/

Review the generated DAG file for correctness.

Step 2.4 — Test locally

cd apps/airflow/xo-pipelines
astro dev start
# Open Airflow UI at localhost:8080
# Trigger the new DAG manually
# Verify data appears in Snowflake Bronze table
astro dev stop

Step 2.5 — Configure Astronomer connections

Ensure these Airflow connections exist in Astronomer:

- snowflake_loader_{client_lower} — Snowflake connection for loading
- aws_default — S3 access (usually shared)
- {source}_api_{client_lower} — Source API credentials

Step 2.6 — Deploy

cd apps/airflow/xo-pipelines
astro deploy <deployment-id>

Checkpoint: DAG running in Astronomer and Bronze tables populating daily ✅


Phase 3 — dbt Silver Layer

Goal: Create Silver models that preserve historical data from Bronze.

Step 3.1 — Create source definition

Add to apps/dbt/dbt_xo_models/models/sources.yml:

- name: {client_lower}_bronze
  database: "{{ env_var('ENVIRONMENT', 'dev') == 'prod' and '{CLIENT}_DB' or '{CLIENT}_DB_DEV' }}"
  schema: BRONZE
  tables:
    - name: gladly_contact_timestamps
    - name: gladly_conversation_timestamps
    # ... other Bronze tables
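The database line uses the Jinja `cond and A or B` idiom as a ternary. It only behaves like one because the `and` branch is a non-empty (truthy) string; the same evaluation in plain Python, with MMN standing in for the client code:

```python
# The `cond and A or B` idiom from sources.yml, evaluated in Python.
# It works as a ternary here because "MMN_DB" is truthy; if the middle
# value could be falsy (empty string, 0), use a real if/else instead.
environment = "dev"  # e.g. env_var('ENVIRONMENT', 'dev')
database = environment == "prod" and "MMN_DB" or "MMN_DB_DEV"
assert database == "MMN_DB_DEV"

environment = "prod"
database = environment == "prod" and "MMN_DB" or "MMN_DB_DEV"
assert database == "MMN_DB"
```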

Step 3.2 — Create Silver models

Directory: apps/dbt/dbt_xo_models/models/silver/{client_lower}/

For each Bronze table, create a Silver model. Example:

File: contact_timestamps.sql

{{ config(
    database=env_var('ENVIRONMENT', 'dev') == 'prod' and '{CLIENT}_DB' or '{CLIENT}_DB_DEV',
    schema='SILVER',
    materialized='dynamic_table',
    target_lag='1 hour',
    snowflake_warehouse='XO_TRANSFORM_WH'
) }}

SELECT
    CONTACT_ID::VARCHAR          AS CONTACT_ID,
    CREATED_AT::TIMESTAMP_NTZ    AS CREATED_AT,
    UPDATED_AT::TIMESTAMP_NTZ    AS UPDATED_AT,
    -- ... all source columns with explicit casting
    RECORD_KEY,
    RECORD_HASH,
    DATE_TO_WAREHOUSE,
    SOURCE_FILE
FROM {{ source('{client_lower}_bronze', 'gladly_contact_timestamps') }}
QUALIFY ROW_NUMBER() OVER (PARTITION BY RECORD_KEY ORDER BY DATE_TO_WAREHOUSE DESC) = 1
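The QUALIFY clause keeps only the most recently loaded row per RECORD_KEY. The same logic, sketched in Python on toy rows to show what survives:

```python
# Python equivalent of the QUALIFY clause above: keep only the row with
# the latest DATE_TO_WAREHOUSE for each RECORD_KEY.
def latest_per_key(rows):
    best = {}
    for row in rows:
        key = row["RECORD_KEY"]
        if key not in best or row["DATE_TO_WAREHOUSE"] > best[key]["DATE_TO_WAREHOUSE"]:
            best[key] = row
    return list(best.values())

rows = [
    {"RECORD_KEY": "a", "DATE_TO_WAREHOUSE": "2024-01-01", "CONTACT_ID": "a"},
    {"RECORD_KEY": "a", "DATE_TO_WAREHOUSE": "2024-01-02", "CONTACT_ID": "a"},
    {"RECORD_KEY": "b", "DATE_TO_WAREHOUSE": "2024-01-01", "CONTACT_ID": "b"},
]
deduped = latest_per_key(rows)
# two rows remain; key "a" keeps the 2024-01-02 version
```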

Step 3.3 — Add schema.yml tests

models:
  - name: contact_timestamps
    description: "Historical contact events from Gladly. Grain: one row per contact."
    columns:
      - name: contact_id
        tests: [unique, not_null]
      - name: created_at
        tests: [not_null]

Step 3.4 — Test Silver locally

dbt compile --select silver.{client_lower}.*
dbt test --select silver.{client_lower}.*

Checkpoint: Silver models compile, tests pass, data matches Bronze row counts ✅


Phase 4 — dbt Gold Layer

Goal: Create Gold facts, aggregates, and report views for analyst consumption.

See the standalone playbook: gold-layer-expansion.md

Checkpoint: Gold models deployed to dev, analyst queries validated ✅


Phase 5 — Production Cutover

Goal: Promote everything from _DEV databases to production databases.

Step 5.1 — Verify Astronomer Variables

Ensure ENVIRONMENT=prod is set for production Airflow deployment.

Step 5.2 — Verify Snowflake permissions

Analysts need SELECT on {CLIENT}_DB.GOLD.*. Coordinate with Snowflake admin.

Step 5.3 — Flip dbt to prod

Set ENVIRONMENT=prod in the Airflow Variable and trigger a dbt run. Verify that the Gold tables are populated in the production database.

Step 5.4 — Tableau connection

Update or create Tableau data source pointing to {CLIENT}_DB.GOLD production views.

Step 5.5 — Create project archive

Once live:

mkdir .claude/ongoing/archived/$(date +%Y-%m-%d)-{client_lower}-onboarding/
mv .claude/ongoing/projects/{client_lower}-onboarding.md .claude/ongoing/archived/$(date +%Y-%m-%d)-{client_lower}-onboarding/

Create README.md in the archive folder summarizing what was built.

Checkpoint: Production data flowing, Tableau connected, team notified ✅


Reference