Playbook: New Client Onboarding¶
Purpose: A step-by-step guide for onboarding a new client onto the XO-Data platform. Follow the phases in order; each one ends with a checkpoint that must pass before you continue.
Estimated effort: 2–3 days for a standard Gladly-based client.
Prerequisites¶
Before starting, confirm with the team:
- [ ] Client name and short code confirmed (e.g., MMN for MassMutual)
- [ ] Source system confirmed (Gladly / Google Sheets / Sprout / other)
- [ ] Snowflake environment access confirmed (role: XO_DATA_ENGINEER_DEV)
- [ ] S3 buckets accessible (ingest-bucket, stage-bucket)
- [ ] Gladly API credentials available (if applicable)
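Before moving on, it can help to sanity-check the Snowflake access in a worksheet. This is an illustrative check only:

```sql
-- Sanity check: confirm the engineering role resolves for your user
USE ROLE XO_DATA_ENGINEER_DEV;
SELECT CURRENT_ROLE();
```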
Phase 1 — Snowflake Database Setup¶
Goal: Create {CLIENT}_DB with BRONZE, SILVER, GOLD schemas.
Step 1.1 — Create the schemachange migration¶
Create: apps/snowflake-bronze/{client_lower}/V1.0.0__create_schemas.sql
-- V1.0.0__create_schemas.sql
-- Create {CLIENT}_DB with all three medallion layers
USE ROLE XO_DATA_ENGINEER_DEV;
CREATE DATABASE IF NOT EXISTS {CLIENT}_DB;
CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.BRONZE;
CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.SILVER;
CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.GOLD;
Step 1.2 — Deploy schemas¶
# Review the migration file, then deploy
# (Follow schemachange-deployment.md for full deploy steps)
See: schemachange-deployment.md
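For orientation, a typical schemachange invocation looks like the sketch below. The account, user, warehouse, and change-history table are placeholders, and schemachange-deployment.md remains the source of truth for the real parameters:

```shell
# Illustrative sketch only; see schemachange-deployment.md for the real parameters
schemachange deploy \
  -f apps/snowflake-bronze/{client_lower} \
  -a <account> -u <user> -r XO_DATA_ENGINEER_DEV \
  -w <warehouse> -d {CLIENT}_DB \
  -c <change_history_table>
```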
Step 1.3 — Create Bronze tables migration¶
Create: apps/snowflake-bronze/{client_lower}/V1.1.0__create_bronze_tables.sql
All Bronze tables must:
- Use VARCHAR for every source column (no type conversion at load; casting happens in Silver)
- Include the metadata columns: RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE, SOURCE_FILE, BATCH_ID, PIPELINE_RUN_ID
- Use the batch_replace loading strategy for new pipelines (see ADR 011)
CREATE TABLE IF NOT EXISTS {CLIENT}_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS (
    -- Source columns (all VARCHAR)
    CONTACT_ID VARCHAR,
    CREATED_AT VARCHAR,
    UPDATED_AT VARCHAR,
    -- ... source-specific columns
    -- Metadata columns (required for all Bronze tables)
    RECORD_KEY VARCHAR,
    RECORD_HASH VARCHAR,
    DATE_TO_WAREHOUSE TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    SOURCE_FILE VARCHAR,
    BATCH_ID VARCHAR,
    PIPELINE_RUN_ID VARCHAR
);
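Note that apart from DATE_TO_WAREHOUSE, the metadata columns are populated by the loader, not by SQL defaults. The sketch below is a hypothetical illustration of how RECORD_KEY and RECORD_HASH might be derived (key from the business-key columns, hash over the full payload for change detection); the actual derivation lives in the pipeline code and may differ.

```python
import hashlib

def bronze_metadata(row: dict, key_columns: list) -> dict:
    """Illustrative only: derive RECORD_KEY from the business-key columns
    and RECORD_HASH from the full payload. The real pipeline may differ."""
    record_key = "|".join(str(row[c]) for c in key_columns)
    # Sort keys so the hash is stable regardless of column order
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    record_hash = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return {"RECORD_KEY": record_key, "RECORD_HASH": record_hash}

meta = bronze_metadata(
    {"CONTACT_ID": "c-123", "CREATED_AT": "2024-01-01T00:00:00Z"},
    key_columns=["CONTACT_ID"],
)
print(meta["RECORD_KEY"])  # c-123
```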
Checkpoint: Both migrations deployed and schemas/tables visible in Snowflake ✅
Phase 2 — Airflow Pipeline¶
Goal: Create a working DAG that extracts data from the source and loads it to Bronze.
Step 2.1 — Create the YAML config¶
Create: apps/airflow/xo-pipelines/dags/configs/{client_lower}-gladly-daily.yaml
Use an existing config as a reference: warbyparker-gladly-daily.yaml
dag:
  dag_id: {client_lower}_gladly_daily
  schedule: "0 5 * * 1-6"  # 5 AM ET, Mon–Sat
  catchup: false
  tags: ["{client_lower}", "gladly", "daily"]

time_window:
  refresh_type: daily
  timezone: America/New_York
  daily_lag_minutes: 1440

sources:
  contact_timestamps:
    extractor: gladly_contacts
    load_strategy: batch_replace
    snowflake_table: GLADLY_CONTACT_TIMESTAMPS
    snowflake_database: "{CLIENT}_DB"
    snowflake_schema: BRONZE
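One way to read the time_window block: a daily_lag_minutes of 1440 trails the run time by a full day, so the 5 AM run covers the previous complete day. The authoritative logic lives in xo-foundry; this sketch only illustrates that interpretation.

```python
from datetime import datetime, timedelta

def daily_window(run_time: datetime, daily_lag_minutes: int = 1440):
    """Illustrative reading of the config: the extraction window ends
    daily_lag_minutes before the run and spans the preceding day."""
    end = run_time - timedelta(minutes=daily_lag_minutes)
    start = end - timedelta(days=1)
    return start, end

start, end = daily_window(datetime(2024, 6, 10, 5, 0))
print(start, "->", end)  # 2024-06-08 05:00:00 -> 2024-06-09 05:00:00
```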
Step 2.2 — Validate the config¶
uv run xo-foundry validate-config \
--config apps/airflow/xo-pipelines/dags/configs/{client_lower}-gladly-daily.yaml
Fix any validation errors before proceeding.
Step 2.3 — Generate the DAG¶
uv run xo-foundry generate-dag \
--config apps/airflow/xo-pipelines/dags/configs/{client_lower}-gladly-daily.yaml \
--output apps/airflow/xo-pipelines/dags/
Review the generated DAG file for correctness.
Step 2.4 — Test locally¶
cd apps/airflow/xo-pipelines
astro dev start
# Open Airflow UI at localhost:8080
# Trigger the new DAG manually
# Verify data appears in Snowflake Bronze table
astro dev stop
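To verify the local run landed data, a spot-check query like this (adjust the table name per source) is usually enough:

```sql
-- Spot-check the local run: row count and latest load timestamp
SELECT COUNT(*) AS ROWS_LOADED,
       MAX(DATE_TO_WAREHOUSE) AS LAST_LOADED_AT
FROM {CLIENT}_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS;
```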
Step 2.5 — Configure Astronomer connections¶
Ensure these Airflow connections exist in Astronomer:
- snowflake_loader_{client_lower} — Snowflake connection for loading
- aws_default — S3 access (usually shared)
- {source}_api_{client_lower} — Source API credentials
Step 2.6 — Deploy¶
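Deployment is typically a single Astronomer CLI command from the project directory; confirm the target deployment with the team first:

```shell
cd apps/airflow/xo-pipelines
# Builds and pushes the project image to Astronomer; select the correct
# deployment when prompted (or pass its ID as an argument)
astro deploy
```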
Checkpoint: DAG running in Astronomer and Bronze tables populating daily ✅
Phase 3 — dbt Silver Layer¶
Goal: Create Silver models that preserve historical data from Bronze.
Step 3.1 — Create source definition¶
Add to apps/dbt/dbt_xo_models/models/sources.yml:
  - name: {client_lower}_bronze
    database: "{{ env_var('ENVIRONMENT', 'dev') == 'prod' and '{CLIENT}_DB' or '{CLIENT}_DB_DEV' }}"
    schema: BRONZE
    tables:
      - name: gladly_contact_timestamps
      - name: gladly_conversation_timestamps
      # ... other Bronze tables
Step 3.2 — Create Silver models¶
Directory: apps/dbt/dbt_xo_models/models/silver/{client_lower}/
For each Bronze table, create a Silver model. Example:
File: contact_timestamps.sql
{{ config(
    database=env_var('ENVIRONMENT', 'dev') == 'prod' and '{CLIENT}_DB' or '{CLIENT}_DB_DEV',
    schema='SILVER',
    materialized='dynamic_table',
    target_lag='1 hour',
    snowflake_warehouse='XO_TRANSFORM_WH'
) }}

SELECT
    CONTACT_ID::VARCHAR AS CONTACT_ID,
    CREATED_AT::TIMESTAMP_NTZ AS CREATED_AT,
    UPDATED_AT::TIMESTAMP_NTZ AS UPDATED_AT,
    -- ... all source columns with explicit casting
    RECORD_KEY,
    RECORD_HASH,
    DATE_TO_WAREHOUSE,
    SOURCE_FILE
FROM {{ source('{client_lower}_bronze', 'gladly_contact_timestamps') }}
QUALIFY ROW_NUMBER() OVER (PARTITION BY RECORD_KEY ORDER BY DATE_TO_WAREHOUSE DESC) = 1
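The QUALIFY clause is what deduplicates Bronze: for each RECORD_KEY, only the most recently warehoused row survives into Silver. The same logic, written out in plain Python for clarity:

```python
def latest_per_key(rows: list) -> list:
    """Keep, for each RECORD_KEY, only the row with the greatest
    DATE_TO_WAREHOUSE (the effect of the QUALIFY clause above)."""
    best = {}
    for r in rows:
        k = r["RECORD_KEY"]
        if k not in best or r["DATE_TO_WAREHOUSE"] > best[k]["DATE_TO_WAREHOUSE"]:
            best[k] = r
    return list(best.values())

rows = [
    {"RECORD_KEY": "a", "DATE_TO_WAREHOUSE": "2024-01-01", "VAL": 1},
    {"RECORD_KEY": "a", "DATE_TO_WAREHOUSE": "2024-01-02", "VAL": 2},
    {"RECORD_KEY": "b", "DATE_TO_WAREHOUSE": "2024-01-01", "VAL": 3},
]
print(latest_per_key(rows))  # one row per key: "a" keeps VAL 2
```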
Step 3.3 — Add schema.yml tests¶
models:
  - name: contact_timestamps
    description: "Historical contact events from Gladly. Grain: one row per contact."
    columns:
      - name: contact_id
        tests: [unique, not_null]
      - name: created_at
        tests: [not_null]
Step 3.4 — Test Silver locally¶
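Locally this usually amounts to building and testing just this client's Silver models against dev. The path selector below is an assumption based on the directory layout in Step 3.2:

```shell
cd apps/dbt/dbt_xo_models
# Build (run + test) only this client's Silver models
dbt build --select models/silver/{client_lower}
```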
Checkpoint: Silver models compile, tests pass, data matches Bronze row counts ✅
Phase 4 — dbt Gold Layer¶
Goal: Create Gold facts, aggregates, and report views for analyst consumption.
See the standalone playbook: gold-layer-expansion.md
Checkpoint: Gold models deployed to dev, analyst queries validated ✅
Phase 5 — Production Cutover¶
Goal: Promote everything from _DEV databases to production databases.
Step 5.1 — Verify Astronomer Variables¶
Ensure ENVIRONMENT=prod is set for production Airflow deployment.
Step 5.2 — Verify Snowflake permissions¶
Analysts need SELECT on {CLIENT}_DB.GOLD.*. Coordinate with Snowflake admin.
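An illustrative grant script is below; ANALYST_ROLE is a placeholder, so confirm the real analyst role (and whether Gold also contains tables needing the same grants) with the Snowflake admin:

```sql
-- Placeholder role name; run with a role that has grant privileges
GRANT USAGE ON DATABASE {CLIENT}_DB TO ROLE ANALYST_ROLE;
GRANT USAGE ON SCHEMA {CLIENT}_DB.GOLD TO ROLE ANALYST_ROLE;
GRANT SELECT ON ALL VIEWS IN SCHEMA {CLIENT}_DB.GOLD TO ROLE ANALYST_ROLE;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA {CLIENT}_DB.GOLD TO ROLE ANALYST_ROLE;
```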
Step 5.3 — Flip dbt to prod¶
Set ENVIRONMENT=prod in the Airflow Variables and trigger a dbt run. Verify that the Gold tables are populated in the production database.
Step 5.4 — Tableau connection¶
Update or create Tableau data source pointing to {CLIENT}_DB.GOLD production views.
Step 5.5 — Create project archive¶
Once live:
mkdir .claude/ongoing/archived/$(date +%Y-%m-%d)-{client_lower}-onboarding/
mv .claude/ongoing/projects/{client_lower}-onboarding.md .claude/ongoing/archived/$(date +%Y-%m-%d)-{client_lower}-onboarding/
Create a README.md in the archive folder summarizing what was built.
Checkpoint: Production data flowing, Tableau connected, team notified ✅
Reference¶
- architecture-summary.md — Database layout
- snowflake-object-inventory.md — Existing objects
- dbt-model-registry.md — Existing dbt models
- ADR 008 — Gold layer design
- ADR 009 — Naming conventions
- ADR 011 — Bronze loading strategy