Playbook: New Client Onboarding¶
Purpose: Step-by-step guide to onboarding a new client into the XO-Data platform. Follow the phases in order; each one ends with a checkpoint that must pass before you move on.
Estimated effort: 2–3 days for a standard Gladly-based client.
Prerequisites¶
Before starting, confirm with the team:
- [ ] Client name and short code confirmed (e.g., MMN for MassMutual)
- [ ] Source system confirmed (Gladly / Google Sheets / Sprout / other)
- [ ] Snowflake environment access confirmed (role: XO_DATA_ENGINEER_DEV)
- [ ] S3 buckets accessible (ingest-bucket, stage-bucket)
- [ ] Gladly API credentials available (if applicable)
Phase 1 — Snowflake Database Setup¶
Goal: Create {CLIENT}_DB with BRONZE, SILVER, GOLD, and SANDBOX schemas.
Step 1.1 — Create the schemachange migration¶
Create: apps/snowflake-bronze/{client_lower}/V1.0.0__create_schemas.sql
Base this on apps/snowflake-bronze/_template/V1.0.0__create_schemas.sql.template — it already
includes BRONZE, SILVER, GOLD, and SANDBOX. Substitute {CLIENT_CODE}, {CLIENT_NAME}, and
{DATA_SOURCES} in the template.
The SANDBOX schema is created automatically by the template migration and is available to analysts from day one. No separate migration is needed. See ADR 015 for SANDBOX usage rules.
-- V1.0.0__create_schemas.sql
-- Create {CLIENT}_DB with all medallion layers + SANDBOX
USE ROLE XO_DATA_ENGINEER_DEV;
CREATE DATABASE IF NOT EXISTS {CLIENT}_DB;
CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.BRONZE;
CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.SILVER;
CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.GOLD;
CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.SANDBOX;
Step 1.2 — Deploy schemas¶
Review the migration file, then deploy with schemachange. See schemachange-deployment.md for the full deploy procedure.
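If the deploy is run directly with the schemachange CLI, it looks roughly like the sketch below. The account, user, warehouse, and change-history table are placeholders; take the real values from schemachange-deployment.md.

# Sketch only; schemachange reads the password from the
# SNOWFLAKE_PASSWORD environment variable
schemachange deploy \
  -f apps/snowflake-bronze/{client_lower} \
  --snowflake-account <account> \
  --snowflake-user <deploy_user> \
  --snowflake-role XO_DATA_ENGINEER_DEV \
  --snowflake-warehouse <warehouse> \
  --change-history-table <change_history_table>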
Step 1.3 — Create Bronze tables migration¶
Create: apps/snowflake-bronze/{client_lower}/V1.1.0__create_bronze_tables.sql
All Bronze tables must:
- Use VARCHAR for every column (no type conversion)
- Include metadata columns: RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE, SOURCE_FILE, BATCH_ID, PIPELINE_RUN_ID
- Use batch_replace loading strategy (new pipelines) — see ADR 011
CREATE TABLE IF NOT EXISTS {CLIENT}_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS (
    -- Source columns (all VARCHAR)
    CONTACT_ID VARCHAR,
    CREATED_AT VARCHAR,
    UPDATED_AT VARCHAR,
    -- ... source-specific columns
    -- Metadata columns (required for all Bronze tables)
    RECORD_KEY VARCHAR,
    RECORD_HASH VARCHAR,
    DATE_TO_WAREHOUSE TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    SOURCE_FILE VARCHAR,
    BATCH_ID VARCHAR,
    PIPELINE_RUN_ID VARCHAR
);
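Before ticking the checkpoint, a quick sanity check in a Snowflake worksheet confirms the objects exist:

-- Verify the migrations landed
SHOW SCHEMAS IN DATABASE {CLIENT}_DB;
SHOW TABLES IN SCHEMA {CLIENT}_DB.BRONZE;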
Checkpoint: Both migrations deployed and schemas/tables visible in Snowflake ✅
Phase 2 — Airflow Pipeline¶
Goal: Create a working DAG that extracts data from the source and loads it to Bronze.
Step 2.1 — Create the YAML config¶
Create: apps/airflow/xo-pipelines/dags/configs/{client_lower}-gladly-daily.yaml
Use an existing config as a reference: warbyparker-gladly-daily.yaml
dag:
  dag_id: {client_lower}_gladly_daily
  schedule: "0 5 * * 1-6"  # 5 AM ET, Mon–Sat
  catchup: false
  tags: ["{client_lower}", "gladly", "daily"]
time_window:
  refresh_type: daily
  timezone: America/New_York
  daily_lag_minutes: 1440
sources:
  contact_timestamps:
    extractor: gladly_contacts
    load_strategy: full_refresh
    snowflake_table: GLADLY_CONTACT_TIMESTAMPS
    snowflake_database: "{CLIENT}_DB"
    snowflake_schema: BRONZE
Step 2.2 — Validate the config¶
uv run xo-foundry validate-config \
--config apps/airflow/xo-pipelines/dags/configs/{client_lower}-gladly-daily.yaml
Fix any validation errors before proceeding.
Step 2.3 — Generate the DAG¶
uv run xo-foundry generate-dag \
--config apps/airflow/xo-pipelines/dags/configs/{client_lower}-gladly-daily.yaml \
--output apps/airflow/xo-pipelines/dags/
Review the generated DAG file for correctness.
Step 2.4 — Test locally¶
cd apps/airflow/xo-pipelines
astro dev start
# Open Airflow UI at localhost:8080
# Trigger the new DAG manually
# Verify data appears in Snowflake Bronze table
astro dev stop
Step 2.5 — Configure Astronomer connections¶
Ensure these Airflow connections exist in Astronomer:
- snowflake_loader_{client_lower} — Snowflake connection for loading
- aws_default — S3 access (usually shared)
- {source}_api_{client_lower} — Source API credentials
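For local testing with astro dev start, the same connections can be supplied as AIRFLOW_CONN_* environment variables in the project's .env file. The JSON connection format shown below is standard Airflow (2.3+); every value is an illustrative placeholder, not a real credential.

# .env for local astro dev only; placeholder values
AIRFLOW_CONN_SNOWFLAKE_LOADER_{CLIENT}='{"conn_type": "snowflake", "login": "<user>", "password": "<password>", "extra": {"account": "<account>", "database": "{CLIENT}_DB", "warehouse": "<warehouse>", "role": "XO_DATA_ENGINEER_DEV"}}'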
Step 2.6 — Deploy¶
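Deploy the project with the Astro CLI (the same tool the local workflow above uses); the deployment ID is a placeholder:

cd apps/airflow/xo-pipelines
# Push the DAG, config, and dependencies to Astronomer
astro deploy <deployment-id>

Trigger the DAG once in the Astronomer UI and watch the first run complete before signing off the checkpoint.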
Checkpoint: DAG running in Astronomer and Bronze tables populating daily ✅
Phase 3 — dbt Silver Layer¶
Goal: Create Silver models that preserve historical data from Bronze.
Step 3.1 — Create source definition¶
Add to apps/dbt/xo_medallion/models/sources.yml:
  - name: {client_lower}_bronze
    database: "{{ env_var('ENVIRONMENT', 'dev') == 'prod' and '{CLIENT}_DB' or '{CLIENT}_DB_DEV' }}"
    schema: BRONZE
    tables:
      - name: gladly_contact_timestamps
      - name: gladly_conversation_timestamps
      # ... other Bronze tables
Step 3.2 — Create Silver models¶
Directory: apps/dbt/xo_medallion/models/silver/{client_lower}/
For each Bronze table, create a Silver model. Example:
File: contact_timestamps.sql
{{ config(
    database=env_var('ENVIRONMENT', 'dev') == 'prod' and '{CLIENT}_DB' or '{CLIENT}_DB_DEV',
    schema='SILVER',
    materialized='table'
) }}

SELECT
    CONTACT_ID::VARCHAR AS CONTACT_ID,
    CREATED_AT::TIMESTAMP_NTZ AS CREATED_AT,
    UPDATED_AT::TIMESTAMP_NTZ AS UPDATED_AT,
    -- ... all source columns with explicit casting
    RECORD_KEY,
    RECORD_HASH,
    DATE_TO_WAREHOUSE,
    SOURCE_FILE
FROM {{ source('{client_lower}_bronze', 'gladly_contact_timestamps') }}
QUALIFY ROW_NUMBER() OVER (PARTITION BY RECORD_KEY ORDER BY DATE_TO_WAREHOUSE DESC) = 1
Step 3.3 — Add schema.yml tests¶
models:
  - name: contact_timestamps
    description: "Historical contact events from Gladly. Grain: one row per contact."
    columns:
      - name: contact_id
        tests: [unique, not_null]
      - name: created_at
        tests: [not_null]
Step 3.4 — Test Silver locally¶
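A typical local run, assuming dbt is invoked from the project directory and the Silver models live under models/silver/{client_lower}:

cd apps/dbt/xo_medallion
# Build and test only this client's Silver models
dbt build --select models/silver/{client_lower}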
Checkpoint: Silver models compile, tests pass, and Silver row counts match the distinct RECORD_KEY counts in Bronze (the QUALIFY dedupe keeps only the latest version of each record) ✅
Phase 4 — dbt Gold Layer¶
Goal: Create Gold facts, aggregates, and report views for analyst consumption.
See the standalone playbook: gold-layer-expansion.md
Checkpoint: Gold models deployed to dev, analyst queries validated ✅
Phase 5 — Production Cutover¶
Goal: Promote everything from _DEV databases to production databases.
Step 5.1 — Verify Astronomer Variables¶
Ensure ENVIRONMENT=prod is set for production Airflow deployment.
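One way to check, assuming the Astro CLI is authenticated against the workspace (the deployment ID is a placeholder):

# List Airflow variables on the production deployment
astro deployment variable list --deployment-id <prod-deployment-id>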
Step 5.2 — Verify Snowflake permissions¶
Analysts need SELECT on everything in {CLIENT}_DB.GOLD. Coordinate the grants with the Snowflake admin; they typically take the shape shown below.
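The XO_ANALYST role name in this sketch is hypothetical; substitute the real analyst role.

-- XO_ANALYST is a hypothetical role name; use the real analyst role
GRANT USAGE ON DATABASE {CLIENT}_DB TO ROLE XO_ANALYST;
GRANT USAGE ON SCHEMA {CLIENT}_DB.GOLD TO ROLE XO_ANALYST;
GRANT SELECT ON ALL TABLES IN SCHEMA {CLIENT}_DB.GOLD TO ROLE XO_ANALYST;
GRANT SELECT ON ALL VIEWS IN SCHEMA {CLIENT}_DB.GOLD TO ROLE XO_ANALYST;
-- FUTURE grants cover objects created by later dbt runs
GRANT SELECT ON FUTURE TABLES IN SCHEMA {CLIENT}_DB.GOLD TO ROLE XO_ANALYST;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA {CLIENT}_DB.GOLD TO ROLE XO_ANALYST;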
Step 5.3 — Flip dbt to prod¶
Set ENVIRONMENT=prod in the Airflow Variable and trigger a dbt run. Verify that the Gold tables are populated in the production database.
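A quick way to confirm, run against the production database:

-- Confirm Gold objects exist and were touched by the prod run
SELECT TABLE_NAME, ROW_COUNT, LAST_ALTERED
FROM {CLIENT}_DB.INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'GOLD'
ORDER BY LAST_ALTERED DESC;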
Step 5.4 — Tableau connection¶
Update or create Tableau data source pointing to {CLIENT}_DB.GOLD production views.
Step 5.5 — Create project archive¶
Once live:
mkdir .claude/ongoing/archived/$(date +%Y-%m-%d)-{client_lower}-onboarding/
mv .claude/ongoing/projects/{client_lower}-onboarding.md .claude/ongoing/archived/$(date +%Y-%m-%d)-{client_lower}-onboarding/
Create README.md in the archive folder summarizing what was built.
Checkpoint: Production data flowing, Tableau connected, team notified ✅
Reference¶
- architecture-summary.md — Database layout
- snowflake-object-inventory.md — Existing objects
- dbt-model-registry.md — Existing dbt models
- ADR 008 — Gold layer design
- ADR 009 — Naming conventions
- ADR 011 — Bronze loading strategy