BI Roster: Adding a New Client Tab¶
Purpose¶
Add a new client to the Standard_Roster Google Sheet pipeline so their employee data flows from the master Google Sheet into CORE_DB.BRONZE, then through Silver and Gold via dbt. This is the process for every new client tab onboarded into the BI roster system.
Prerequisites¶
- The client's tab exists in the master Standard_Roster Google Sheet (
1TDfcwDHQFM3fmAJhQOzoWjDbTikJCN0wvQ5RYh2xjNY) - The tab has a header row with
EIDas the natural key column - You know the exact
sheet_name(as it appears in the Google Sheet tab label) - You know the client identifier to use as the
{CLIENT}suffix (UPPER_SNAKE_CASE, e.g.NEWCLIENT) - Access to deploy via schemachange and merge a dbt PR
Overview¶
Google Sheet tab
│
▼ (xo-foundry, daily 6 AM EST)
CORE_DB.BRONZE.GSHEET_BI_ROSTER_{CLIENT} ← batch_replace (daily snapshots)
│
▼ (dbt Silver, triggered after Bronze load)
CORE_DB.SILVER.BI_ROSTER_{CLIENT} ← table (latest batch only)
│
▼ (dbt Gold, same DAG run)
CORE_DB.GOLD.DIM_EMPLOYEE_{CLIENT} ← table (SCD Type 1, optional)
The Bronze table accumulates one snapshot per day (identified by BATCH_ID = execution date). The Silver model rebuilds completely each run, keeping only the latest BATCH_ID. The Gold dimension provides a canonical employee reference for downstream client facts.
Step 1: Bronze DDL (schemachange migration)¶
Create a new migration file in apps/snowflake-bronze/core/. Check the last version number in that directory and increment by 1.
File: apps/snowflake-bronze/core/V{next}.0__create_gsheet_bi_roster_{client}.sql
Use the Warby Parker migration as the template (V1.4.0__create_gsheet_bi_roster_warbyparker.sql):
-- V{next}.0: Create BI Roster {CLIENT} Bronze Table
-- Source: Google Sheet 1TDfcwDHQFM3fmAJhQOzoWjDbTikJCN0wvQ5RYh2xjNY ({TAB_NAME} tab)
-- Strategy: batch_replace (appends daily snapshots, preserves history per ADR 011)
--
-- Column names are the standardized forms of the tab row-1 headers after
-- the xo-foundry sanitization + standardize_headers() pipeline:
-- 1. extract_gsheet_data(): strip \n (first line only), strip whitespace, drop empty
-- 2. standardize_headers(): UPPER_SNAKE_CASE, collapse underscores, SQL keyword handling
--
-- IMPORTANT: This runs as XO_SCHEMACHANGE_DEPLOYER role (set in connection)
USE DATABASE CORE_DB_{{ DB_SUFFIX }};
USE SCHEMA BRONZE;
CREATE TABLE IF NOT EXISTS GSHEET_BI_ROSTER_{CLIENT} (
-- ======================================================================
-- Source columns (all VARCHAR — scaffold after first pipeline run)
-- Run: uv run xo-foundry scaffold-bronze --config ... to generate column list
-- ======================================================================
EID VARCHAR, -- Natural key: Employee ID
LAST_NAME VARCHAR,
FIRST_NAME VARCHAR,
GLADLY_NAME VARCHAR,
EMAIL VARCHAR,
JOB_TITLE VARCHAR,
DEPARTMENT VARCHAR,
TEAM VARCHAR,
TEAM_LEADER VARCHAR,
MANAGER VARCHAR,
START_DATE VARCHAR,
END_DATE VARCHAR,
STATUS VARCHAR,
LOCATION VARCHAR,
ACCOUNT VARCHAR,
LINE_OF_BUSINESS VARCHAR,
WORK_TYPE VARCHAR,
IS_XO_AGENT VARCHAR,
-- ======================================================================
-- Metadata columns (required by xo-foundry copy_to_snowflake)
-- ======================================================================
DATE_TO_WAREHOUSE TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
UPDATE_DATE TIMESTAMP_NTZ,
RECORD_KEY VARCHAR,
RECORD_HASH VARCHAR,
SOURCE_FILE VARCHAR,
BATCH_ID VARCHAR,
PIPELINE_RUN_ID VARCHAR
)
COMMENT = '{CLIENT} BI employee roster — daily snapshots from Google Sheets {TAB_NAME} tab. batch_replace preserves one row per employee per BATCH_ID (execution date). Natural key: EID.';
{% if DB_SUFFIX == 'DEV' %}
GRANT INSERT ON TABLE GSHEET_BI_ROSTER_{CLIENT} TO ROLE XO_AIRFLOW_LOADER_DEV;
GRANT SELECT ON TABLE GSHEET_BI_ROSTER_{CLIENT} TO ROLE XO_AIRFLOW_LOADER_DEV;
GRANT SELECT ON TABLE GSHEET_BI_ROSTER_{CLIENT} TO ROLE XO_DBT_TRANSFORMER_DEV;
{% endif %}
{% if DB_SUFFIX == 'PROD' %}
GRANT INSERT ON TABLE GSHEET_BI_ROSTER_{CLIENT} TO ROLE XO_AIRFLOW_LOADER_PROD;
GRANT SELECT ON TABLE GSHEET_BI_ROSTER_{CLIENT} TO ROLE XO_AIRFLOW_LOADER_PROD;
GRANT SELECT ON TABLE GSHEET_BI_ROSTER_{CLIENT} TO ROLE XO_DBT_TRANSFORMER_PROD;
{% endif %}
GRANT SELECT ON TABLE GSHEET_BI_ROSTER_{CLIENT} TO ROLE XO_DATA_ENGINEER_DEV;
{% if DB_SUFFIX == 'PROD' %}
GRANT SELECT ON TABLE GSHEET_BI_ROSTER_{CLIENT} TO ROLE XO_DATA_ENGINEER_PROD;
{% endif %}
Column note: The column list above is the standard set. If the client's tab has additional columns, add them as
VARCHARbefore the metadata block. Run the pipeline once in dev first and usescaffold-bronzeto confirm the exact header names afterstandardize_headers()processes them.
Deploy with schemachange — see Schemachange Deployment for the full deployment procedure.
Step 2: DAG Config (add source to YAML)¶
Add a new source block to apps/airflow/xo-pipelines/dags/configs/core-gsheets-bi-rosters-daily.yaml under the sources: key:
bi_roster_{client}:
source_type: gsheet
load_strategy: incremental
extractor:
credentials_var: DATA_ENGINEERING_SERVICE_ACCOUNT
spreadsheet_id: "1TDfcwDHQFM3fmAJhQOzoWjDbTikJCN0wvQ5RYh2xjNY"
sheet_name: {TAB_NAME} # exact tab label in the Google Sheet
paths:
report_name: bi_roster_{client}
filename_pattern: "bi_roster_{client}_{date}.csv"
snowflake:
target_table: GSHEET_BI_ROSTER_{CLIENT}
load_strategy: batch_replace
deduplication:
strategy: single_field
unique_columns:
- EID
use_hash: false
Replace {client} (lowercase) and {CLIENT} (uppercase) and {TAB_NAME} (exact Google Sheet tab label) with the real values. Validate the config before opening a PR:
uv run xo-foundry validate-config \
--config apps/airflow/xo-pipelines/dags/configs/core-gsheets-bi-rosters-daily.yaml
Step 3: Silver dbt Model¶
Create apps/dbt-models/xo_medallion/models/silver/core/bi_roster_{client}.sql.
Use the Warby Parker model as the template (silver/core/bi_roster_warbyparker.sql):
{{
config(
materialized='table',
tags=['silver', 'core', 'roster', 'core_bi_roster_{client}', 'core_bi_roster'],
schema='SILVER',
)
}}
/*
SILVER Layer: {CLIENT} BI Roster — Current State Table
Purpose: Current-state view of the {CLIENT} BI employee roster (latest daily snapshot only).
Strategy: table materialization — rebuilds completely each run from the latest BATCH_ID.
Source: BRONZE.GSHEET_BI_ROSTER_{CLIENT} (batch_replace, historical snapshots retained)
Target: SILVER.BI_ROSTER_{CLIENT} (current state — one row per employee in latest batch)
Natural key: EID (single field, not hashed)
*/
WITH latest_batch AS (
SELECT MAX(BATCH_ID) AS max_batch_id
FROM {{ source('core_bronze', 'GSHEET_BI_ROSTER_{CLIENT}') }}
)
SELECT
EID,
-- ... source columns (add all columns from the Bronze table)
-- Use TRY_TO_DATE() for date columns, leave text columns as VARCHAR
TRY_TO_DATE(START_DATE) AS START_DATE,
TRY_TO_DATE(END_DATE) AS END_DATE,
-- ... remaining columns ...
-- Metadata
DATE_TO_WAREHOUSE,
UPDATE_DATE,
RECORD_KEY,
RECORD_HASH,
SOURCE_FILE,
BATCH_ID,
PIPELINE_RUN_ID
FROM {{ source('core_bronze', 'GSHEET_BI_ROSTER_{CLIENT}') }} AS bronze
INNER JOIN latest_batch
ON bronze.BATCH_ID = latest_batch.max_batch_id
Type casting rule: All Bronze columns are
VARCHAR. In Silver, cast date columns withTRY_TO_DATE(), timestamps withTRY_TO_TIMESTAMP_NTZ(), and numbers withTRY_TO_NUMBER(). LeaveEID, names, and flag columns asVARCHAR. See the dbt rules for the full casting guide.
Step 4: schema.yml Entry¶
Add the model to apps/dbt-models/xo_medallion/models/silver/core/schema.yml:
- name: bi_roster_{client}
description: >
Current {CLIENT} BI employee roster. Latest daily snapshot from the Standard_Roster
Google Sheet ({TAB_NAME} tab). Rebuilt each run from CORE_DB.BRONZE.GSHEET_BI_ROSTER_{CLIENT}
(latest BATCH_ID). One row per employee.
meta:
owner: data-engineering
source: BRONZE.GSHEET_BI_ROSTER_{CLIENT}
load_strategy: table
materialization: table
columns:
- name: EID
description: "Employee ID — natural key from the Google Sheet"
tests:
- unique
- not_null
- name: RECORD_KEY
description: "Deduplication key from xo-foundry (EID as string)"
- name: BATCH_ID
description: "Airflow execution date (YYYY-MM-DD) — max value is the active snapshot"
Step 5 (Optional): Gold Dimension¶
If the client's employee data is needed by downstream Gold facts (e.g., fct_contacts for a new client), create a Gold dimension model.
File: apps/dbt-models/xo_medallion/models/gold/core/dim_employee_{client}.sql
Use gold/core/dim_employee_warbyparker.sql as the template. The model reads from the Silver roster via ref():
{{
config(
materialized='table',
tags=['gold', 'core', 'roster', 'core_bi_roster_{client}', 'core_bi_roster'],
schema='GOLD',
)
}}
SELECT
EID,
-- ... select and rename columns as needed ...
DATE_TO_WAREHOUSE,
BATCH_ID
FROM {{ ref('bi_roster_{client}') }}
Per ADR 010, all employee dimensions live in CORE_DB.GOLD, not in the client database.
Step 6: Update Reference Docs¶
After deploying:
- Add the Bronze table to
docs/reference/snowflake-object-inventory.mdunderCORE_DB > BRONZE Schema - Add the Silver model to
docs/reference/snowflake-object-inventory.mdunderCORE_DB > SILVER Schema - Add the Silver model to
docs/reference/dbt-model-registry.mdunderCORE_DB Silver Models - If Gold dim was created: add it to the inventory under
CORE_DB > GOLD Schemaand to the registry underCORE_DB Gold Models
Verification¶
# 1. Validate DAG config
uv run xo-foundry validate-config \
--config apps/airflow/xo-pipelines/dags/configs/core-gsheets-bi-rosters-daily.yaml
# 2. Compile dbt models (from apps/dbt-models/xo_medallion/)
dbt compile --select bi_roster_{client}
dbt compile --select dim_employee_{client} # if Gold was added
# 3. After Bronze pipeline runs (trigger manually in Airflow dev):
# Check that rows landed in Bronze
SELECT COUNT(*), MAX(BATCH_ID)
FROM CORE_DB_DEV.BRONZE.GSHEET_BI_ROSTER_{CLIENT};
# 4. Run Silver model
dbt run --select bi_roster_{client}
# 5. Verify Silver row count matches Bronze latest batch
SELECT COUNT(*) FROM CORE_DB_DEV.SILVER.BI_ROSTER_{CLIENT};
# 6. Run dbt tests
dbt test --select bi_roster_{client}
# 7. Run Gold dim (if added)
dbt run --select dim_employee_{client}
dbt test --select dim_employee_{client}
Troubleshooting¶
Column name mismatch: If the Silver model references a column that doesn't exist in Bronze, the xo-foundry standardize_headers() function may have transformed the Google Sheet header differently than expected. Run:
SELECT COLUMN_NAME
FROM CORE_DB_DEV.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'GSHEET_BI_ROSTER_{CLIENT}'
ORDER BY ORDINAL_POSITION;
Then align the Silver model column list to match.
EID uniqueness test fails: The unique test on EID will fail if the Google Sheet has duplicate EID values. Check the source tab for duplicates and coordinate with the roster owner to clean the data before re-running.
Missing tab in Google Sheet: If the extractor fails with a "sheet not found" error, verify the sheet_name in the YAML exactly matches the tab label (case-sensitive) in the Google Sheet.
See also: xo-foundry DAG Guide | Schemachange Deployment | dbt Development Workflow | ADR 010 | ADR 011