Playbook: New Client Onboarding¶
Purpose: Step-by-step guide to onboarding a new client into the XO-Data platform. Follow the phases in order; each one ends with a checkpoint that must pass before you move on.
Estimated effort: 2–3 days for a standard Gladly-based client.
Prerequisites¶
Before starting, confirm with the team:
- [ ] Client name and short code confirmed (e.g., MMN for MassMutual)
- [ ] Source system confirmed (Gladly / Google Sheets / Sprout / other)
- [ ] Snowflake environment access confirmed (role: XO_DATA_ENGINEER_DEV)
- [ ] S3 buckets accessible (ingest-bucket, stage-bucket)
- [ ] Gladly API credentials available (if applicable)
Phase 1 — Snowflake Database Setup¶
Goal: Create {CLIENT}_DB with BRONZE, SILVER, GOLD, and SANDBOX schemas.
Step 1.1 — Create the schemachange migration¶
Create: apps/snowflake-bronze/{client_lower}/V1.0.0__create_schemas.sql
Base this on apps/snowflake-bronze/_template/V1.0.0__create_schemas.sql.template — it already
includes BRONZE, SILVER, GOLD, and SANDBOX. Substitute {CLIENT_CODE}, {CLIENT_NAME}, and
{DATA_SOURCES} in the template.
The SANDBOX schema is created automatically by the template migration and is available to analysts from day one. No separate migration is needed. See ADR 015 for SANDBOX usage rules.
-- V1.0.0__create_schemas.sql
-- Create {CLIENT}_DB with all medallion layers + SANDBOX
USE ROLE XO_DATA_ENGINEER_DEV;
CREATE DATABASE IF NOT EXISTS {CLIENT}_DB;
CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.BRONZE;
CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.SILVER;
CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.GOLD;
CREATE SCHEMA IF NOT EXISTS {CLIENT}_DB.SANDBOX;
Step 1.2 — Deploy schemas¶
Review the migration file, then deploy with schemachange. See schemachange-deployment.md for the full deploy procedure.
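If the deploy is run directly with the schemachange CLI, it looks roughly like the sketch below. The account, user, warehouse, and change-history table are placeholders; take the real values from schemachange-deployment.md.

# Sketch only; schemachange reads the password from the
# SNOWFLAKE_PASSWORD environment variable
schemachange deploy \
  -f apps/snowflake-bronze/{client_lower} \
  --snowflake-account <account> \
  --snowflake-user <deploy_user> \
  --snowflake-role XO_DATA_ENGINEER_DEV \
  --snowflake-warehouse <warehouse> \
  --change-history-table <change_history_table>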
Step 1.3 — Create Bronze tables migration¶
Create: apps/snowflake-bronze/{client_lower}/V1.1.0__create_bronze_tables.sql
All Bronze tables must:
- Use VARCHAR for every column (no type conversion)
- Include metadata columns: RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE, SOURCE_FILE, BATCH_ID, PIPELINE_RUN_ID
- Use batch_replace loading strategy (new pipelines) — see ADR 011
CREATE TABLE IF NOT EXISTS {CLIENT}_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS (
    -- Source columns (all VARCHAR)
    CONTACT_ID VARCHAR,
    CREATED_AT VARCHAR,
    UPDATED_AT VARCHAR,
    -- ... source-specific columns
    -- Metadata columns (required for all Bronze tables)
    RECORD_KEY VARCHAR,
    RECORD_HASH VARCHAR,
    DATE_TO_WAREHOUSE TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    SOURCE_FILE VARCHAR,
    BATCH_ID VARCHAR,
    PIPELINE_RUN_ID VARCHAR
);
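Before ticking the checkpoint, a quick sanity check in a Snowflake worksheet confirms the objects exist:

-- Verify the migrations landed
SHOW SCHEMAS IN DATABASE {CLIENT}_DB;
SHOW TABLES IN SCHEMA {CLIENT}_DB.BRONZE;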
Checkpoint: Both migrations deployed and schemas/tables visible in Snowflake ✅
Phase 2 — Airflow Pipeline¶
Goal: Create a working DAG that extracts data from the source and loads it to Bronze.
Step 2.1 — Create the YAML config¶
Create: apps/airflow/xo-pipelines/dags/configs/{client_lower}-gladly-daily.yaml
Use an existing config as a reference: warbyparker-gladly-daily.yaml
dag:
  dag_id: {client_lower}_gladly_daily
  schedule: "0 5 * * 1-6"  # 5 AM ET, Mon–Sat
  catchup: false
  tags: ["{client_lower}", "gladly", "daily"]
time_window:
  refresh_type: daily
  timezone: America/New_York
  daily_lag_minutes: 1440
sources:
  contact_timestamps:
    extractor: gladly_contacts
    load_strategy: full_refresh
    snowflake_table: GLADLY_CONTACT_TIMESTAMPS
    snowflake_database: "{CLIENT}_DB"
    snowflake_schema: BRONZE
Step 2.2 — Validate the config¶
uv run xo-foundry validate-config \
--config apps/airflow/xo-pipelines/dags/configs/{client_lower}-gladly-daily.yaml
Fix any validation errors before proceeding.
Step 2.3 — Generate the DAG¶
uv run xo-foundry generate-dag \
--config apps/airflow/xo-pipelines/dags/configs/{client_lower}-gladly-daily.yaml \
--output apps/airflow/xo-pipelines/dags/
Review the generated DAG file for correctness.
Step 2.4 — Test locally¶
cd apps/airflow/xo-pipelines
astro dev start
# Open Airflow UI at localhost:8080
# Trigger the new DAG manually
# Verify data appears in Snowflake Bronze table
astro dev stop
Step 2.5 — Configure Astronomer connections¶
Ensure these Airflow connections exist in Astronomer:
- snowflake_loader_{client_lower} — Snowflake connection for loading
- aws_default — S3 access (usually shared)
- {source}_api_{client_lower} — Source API credentials
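For local testing with astro dev start, the same connections can be supplied as AIRFLOW_CONN_* environment variables in the project's .env file. The JSON connection format shown below is standard Airflow (2.3+); every value is an illustrative placeholder, not a real credential.

# .env for local astro dev only; placeholder values
AIRFLOW_CONN_SNOWFLAKE_LOADER_{CLIENT}='{"conn_type": "snowflake", "login": "<user>", "password": "<password>", "extra": {"account": "<account>", "database": "{CLIENT}_DB", "warehouse": "<warehouse>", "role": "XO_DATA_ENGINEER_DEV"}}'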
Step 2.6 — Deploy¶
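Deploy the project with the Astro CLI (the same tool the local workflow above uses); the deployment ID is a placeholder:

cd apps/airflow/xo-pipelines
# Push the DAG, config, and dependencies to Astronomer
astro deploy <deployment-id>

Trigger the DAG once in the Astronomer UI and watch the first run complete before signing off the checkpoint.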
Checkpoint: DAG running in Astronomer and Bronze tables populating daily ✅
Phase 3 — dbt Silver Layer¶
Goal: Create Silver models that preserve historical data from Bronze.
Step 3.1 — Create source definition¶
Add to apps/dbt/xo_medallion/models/sources.yml:
  - name: {client_lower}_bronze
    database: "{{ env_var('ENVIRONMENT', 'dev') == 'prod' and '{CLIENT}_DB' or '{CLIENT}_DB_DEV' }}"
    schema: BRONZE
    tables:
      - name: gladly_contact_timestamps
      - name: gladly_conversation_timestamps
      # ... other Bronze tables
Step 3.2 — Create Silver models¶
Directory: apps/dbt/xo_medallion/models/silver/{client_lower}/
For each Bronze table, create a Silver model. Example:
File: contact_timestamps.sql
{{ config(
    database=env_var('ENVIRONMENT', 'dev') == 'prod' and '{CLIENT}_DB' or '{CLIENT}_DB_DEV',
    schema='SILVER',
    materialized='table'
) }}

SELECT
    CONTACT_ID::VARCHAR AS CONTACT_ID,
    CREATED_AT::TIMESTAMP_NTZ AS CREATED_AT,
    UPDATED_AT::TIMESTAMP_NTZ AS UPDATED_AT,
    -- ... all source columns with explicit casting
    RECORD_KEY,
    RECORD_HASH,
    DATE_TO_WAREHOUSE,
    SOURCE_FILE
FROM {{ source('{client_lower}_bronze', 'gladly_contact_timestamps') }}
QUALIFY ROW_NUMBER() OVER (PARTITION BY RECORD_KEY ORDER BY DATE_TO_WAREHOUSE DESC) = 1
Step 3.3 — Add schema.yml tests¶
models:
  - name: contact_timestamps
    description: "Historical contact events from Gladly. Grain: one row per contact."
    columns:
      - name: contact_id
        tests: [unique, not_null]
      - name: created_at
        tests: [not_null]
Step 3.4 — Test Silver locally¶
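A typical local run, assuming dbt is invoked from the project directory and the Silver models live under models/silver/{client_lower}:

cd apps/dbt/xo_medallion
# Build and test only this client's Silver models
dbt build --select models/silver/{client_lower}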
Checkpoint: Silver models compile, tests pass, and Silver row counts match the distinct RECORD_KEY counts in Bronze (the QUALIFY dedupe keeps only the latest version of each record) ✅
Phase 4 — dbt Gold Layer¶
Goal: Create Gold facts, aggregates, and report views for analyst consumption.
See the standalone playbook: gold-layer-expansion.md
Checkpoint: Gold models deployed to dev, analyst queries validated ✅
Phase 5 — Production Cutover¶
Goal: Promote everything from _DEV databases to production databases.
Step 5.1 — Verify Astronomer Variables¶
Ensure ENVIRONMENT=prod is set for production Airflow deployment.
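One way to check, assuming the Astro CLI is authenticated against the workspace (the deployment ID is a placeholder):

# List Airflow variables on the production deployment
astro deployment variable list --deployment-id <prod-deployment-id>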
Step 5.2 — Verify Snowflake permissions¶
Analysts need SELECT on everything in {CLIENT}_DB.GOLD. Coordinate the grants with the Snowflake admin; they typically take the shape shown below.
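The XO_ANALYST role name in this sketch is hypothetical; substitute the real analyst role.

-- XO_ANALYST is a hypothetical role name; use the real analyst role
GRANT USAGE ON DATABASE {CLIENT}_DB TO ROLE XO_ANALYST;
GRANT USAGE ON SCHEMA {CLIENT}_DB.GOLD TO ROLE XO_ANALYST;
GRANT SELECT ON ALL TABLES IN SCHEMA {CLIENT}_DB.GOLD TO ROLE XO_ANALYST;
GRANT SELECT ON ALL VIEWS IN SCHEMA {CLIENT}_DB.GOLD TO ROLE XO_ANALYST;
-- FUTURE grants cover objects created by later dbt runs
GRANT SELECT ON FUTURE TABLES IN SCHEMA {CLIENT}_DB.GOLD TO ROLE XO_ANALYST;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA {CLIENT}_DB.GOLD TO ROLE XO_ANALYST;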
Step 5.3 — Flip dbt to prod¶
Set ENVIRONMENT=prod in the Airflow Variable and trigger a dbt run. Verify that the Gold tables are populated in the production database.
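A quick way to confirm, run against the production database:

-- Confirm Gold objects exist and were touched by the prod run
SELECT TABLE_NAME, ROW_COUNT, LAST_ALTERED
FROM {CLIENT}_DB.INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'GOLD'
ORDER BY LAST_ALTERED DESC;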
Step 5.4 — Tableau connection¶
Update or create Tableau data source pointing to {CLIENT}_DB.GOLD production views.
Step 5.5 — Create project archive¶
Once live:
mkdir .claude/ongoing/archived/$(date +%Y-%m-%d)-{client_lower}-onboarding/
mv .claude/ongoing/projects/{client_lower}-onboarding.md .claude/ongoing/archived/$(date +%Y-%m-%d)-{client_lower}-onboarding/
Create README.md in the archive folder summarizing what was built.
Checkpoint: Production data flowing, Tableau connected, team notified ✅
Reference¶
- architecture-summary.md — Database layout
- snowflake-object-inventory.md — Existing objects
- dbt-model-registry.md — Existing dbt models
- ADR 008 — Gold layer design
- ADR 009 — Naming conventions
- ADR 011 — Bronze loading strategy