BI Roster: Adding a New Client Tab

Purpose

Add a new client to the Standard_Roster Google Sheet pipeline so their employee data flows from the master Google Sheet into CORE_DB.BRONZE, then through Silver and Gold via dbt. This is the process for every new client tab onboarded into the BI roster system.


Prerequisites

  • The client's tab exists in the master Standard_Roster Google Sheet (1TDfcwDHQFM3fmAJhQOzoWjDbTikJCN0wvQ5RYh2xjNY)
  • The tab has a header row with EID as the natural key column
  • You know the exact sheet_name (as it appears in the Google Sheet tab label)
  • You know the client identifier to use as the {CLIENT} suffix (UPPER_SNAKE_CASE, e.g. NEWCLIENT)
  • Access to deploy via schemachange and merge a dbt PR

Overview

Google Sheet tab
      ▼  (xo-foundry, daily 6 AM EST)
CORE_DB.BRONZE.GSHEET_BI_ROSTER_{CLIENT}    ← batch_replace (daily snapshots)
      ▼  (dbt Silver, triggered after Bronze load)
CORE_DB.SILVER.BI_ROSTER_{CLIENT}           ← table (latest batch only)
      ▼  (dbt Gold, same DAG run)
CORE_DB.GOLD.DIM_EMPLOYEE_{CLIENT}          ← table (SCD Type 1, optional)

The Bronze table accumulates one snapshot per day (identified by BATCH_ID = execution date). The Silver model rebuilds completely each run, keeping only the latest BATCH_ID. The Gold dimension provides a canonical employee reference for downstream client facts.
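The snapshot-accumulation and latest-batch behavior can be sketched in plain Python (illustrative only; the real logic is the SQL in Step 3, and the sample rows here are hypothetical):

```python
# Bronze accumulates one snapshot per BATCH_ID (batch_replace);
# Silver keeps only the rows from the latest batch.
bronze_rows = [
    {"EID": "E001", "STATUS": "Active",     "BATCH_ID": "2024-01-01"},
    {"EID": "E002", "STATUS": "Active",     "BATCH_ID": "2024-01-01"},
    {"EID": "E001", "STATUS": "Active",     "BATCH_ID": "2024-01-02"},
    {"EID": "E002", "STATUS": "Terminated", "BATCH_ID": "2024-01-02"},
]

# Mirrors the MAX(BATCH_ID) CTE + join in the Silver model
max_batch = max(row["BATCH_ID"] for row in bronze_rows)
silver_rows = [row for row in bronze_rows if row["BATCH_ID"] == max_batch]

assert max_batch == "2024-01-02"
assert len(silver_rows) == 2  # one row per employee in the latest snapshot
```

Note that Bronze keeps all four rows; only Silver collapses to the current state.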


Step 1: Bronze DDL (schemachange migration)

Create a new migration file in apps/snowflake-bronze/core/. Check the highest version number in that directory and increment it (e.g. V1.4.0 → V1.5.0).
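Picking the next version can be scripted; a minimal sketch, assuming the V{major}.{minor}.{patch}__ naming seen in files like V1.4.0__create_gsheet_bi_roster_warbyparker.sql and a minor-version bump (the helper name is hypothetical):

```python
import re

def next_migration_version(filenames):
    """Return the next minor version given existing schemachange filenames."""
    versions = []
    for name in filenames:
        m = re.match(r"V(\d+)\.(\d+)\.(\d+)__", name)
        if m:
            versions.append(tuple(int(g) for g in m.groups()))
    major, minor, _ = max(versions)  # highest existing version
    return f"V{major}.{minor + 1}.0"

existing = [
    "V1.3.0__something.sql",
    "V1.4.0__create_gsheet_bi_roster_warbyparker.sql",
]
print(next_migration_version(existing))  # V1.5.0
```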

File: apps/snowflake-bronze/core/V{next}.0__create_gsheet_bi_roster_{client}.sql

Use the Warby Parker migration as the template (V1.4.0__create_gsheet_bi_roster_warbyparker.sql):

-- V{next}.0: Create BI Roster {CLIENT} Bronze Table
-- Source: Google Sheet 1TDfcwDHQFM3fmAJhQOzoWjDbTikJCN0wvQ5RYh2xjNY ({TAB_NAME} tab)
-- Strategy: batch_replace (appends daily snapshots, preserves history per ADR 011)
--
-- Column names are the standardized forms of the tab row-1 headers after
-- the xo-foundry sanitization + standardize_headers() pipeline:
--   1. extract_gsheet_data(): strip \n (first line only), strip whitespace, drop empty
--   2. standardize_headers(): UPPER_SNAKE_CASE, collapse underscores, SQL keyword handling
--
-- IMPORTANT: This runs as XO_SCHEMACHANGE_DEPLOYER role (set in connection)

USE DATABASE CORE_DB_{{ DB_SUFFIX }};
USE SCHEMA BRONZE;

CREATE TABLE IF NOT EXISTS GSHEET_BI_ROSTER_{CLIENT} (
    -- ======================================================================
    -- Source columns (all VARCHAR — scaffold after first pipeline run)
    -- Run: uv run xo-foundry scaffold-bronze --config ... to generate column list
    -- ======================================================================
    EID                 VARCHAR,   -- Natural key: Employee ID
    LAST_NAME           VARCHAR,
    FIRST_NAME          VARCHAR,
    GLADLY_NAME         VARCHAR,
    EMAIL               VARCHAR,
    JOB_TITLE           VARCHAR,
    DEPARTMENT          VARCHAR,
    TEAM                VARCHAR,
    TEAM_LEADER         VARCHAR,
    MANAGER             VARCHAR,
    START_DATE          VARCHAR,
    END_DATE            VARCHAR,
    STATUS              VARCHAR,
    LOCATION            VARCHAR,
    ACCOUNT             VARCHAR,
    LINE_OF_BUSINESS    VARCHAR,
    WORK_TYPE           VARCHAR,
    IS_XO_AGENT         VARCHAR,

    -- ======================================================================
    -- Metadata columns (required by xo-foundry copy_to_snowflake)
    -- ======================================================================
    DATE_TO_WAREHOUSE   TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    UPDATE_DATE         TIMESTAMP_NTZ,
    RECORD_KEY          VARCHAR,
    RECORD_HASH         VARCHAR,
    SOURCE_FILE         VARCHAR,
    BATCH_ID            VARCHAR,
    PIPELINE_RUN_ID     VARCHAR
)
COMMENT = '{CLIENT} BI employee roster — daily snapshots from Google Sheets {TAB_NAME} tab. batch_replace preserves one row per employee per BATCH_ID (execution date). Natural key: EID.';

{% if DB_SUFFIX == 'DEV' %}
GRANT INSERT ON TABLE GSHEET_BI_ROSTER_{CLIENT} TO ROLE XO_AIRFLOW_LOADER_DEV;
GRANT SELECT ON TABLE GSHEET_BI_ROSTER_{CLIENT} TO ROLE XO_AIRFLOW_LOADER_DEV;
GRANT SELECT ON TABLE GSHEET_BI_ROSTER_{CLIENT} TO ROLE XO_DBT_TRANSFORMER_DEV;
{% endif %}

{% if DB_SUFFIX == 'PROD' %}
GRANT INSERT ON TABLE GSHEET_BI_ROSTER_{CLIENT} TO ROLE XO_AIRFLOW_LOADER_PROD;
GRANT SELECT ON TABLE GSHEET_BI_ROSTER_{CLIENT} TO ROLE XO_AIRFLOW_LOADER_PROD;
GRANT SELECT ON TABLE GSHEET_BI_ROSTER_{CLIENT} TO ROLE XO_DBT_TRANSFORMER_PROD;
{% endif %}

GRANT SELECT ON TABLE GSHEET_BI_ROSTER_{CLIENT} TO ROLE XO_DATA_ENGINEER_DEV;

{% if DB_SUFFIX == 'PROD' %}
GRANT SELECT ON TABLE GSHEET_BI_ROSTER_{CLIENT} TO ROLE XO_DATA_ENGINEER_PROD;
{% endif %}

Column note: The column list above is the standard set. If the client's tab has additional columns, add them as VARCHAR before the metadata block. Run the pipeline once in dev first and use scaffold-bronze to confirm the exact header names after standardize_headers() processes them.
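The authoritative transform is xo-foundry's standardize_headers(); as a rough illustrative approximation only (the real function may differ, e.g. in SQL keyword handling), the header pipeline behaves roughly like this:

```python
import re

def standardize_header_approx(raw: str) -> str:
    """Rough approximation of xo-foundry header standardization:
    first line only, trimmed, UPPER_SNAKE_CASE, underscores collapsed.
    NOT the real standardize_headers() -- use scaffold-bronze to confirm."""
    h = raw.split("\n", 1)[0].strip()        # keep first line, trim whitespace
    h = re.sub(r"[^0-9A-Za-z]+", "_", h)     # non-alphanumerics -> underscore
    h = re.sub(r"_+", "_", h).strip("_")     # collapse and trim underscores
    return h.upper()

print(standardize_header_approx("Line of Business"))      # LINE_OF_BUSINESS
print(standardize_header_approx(" Team Leader \nnotes"))  # TEAM_LEADER
```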

Deploy with schemachange — see Schemachange Deployment for the full deployment procedure.


Step 2: DAG Config (add source to YAML)

Add a new source block to apps/airflow/xo-pipelines/dags/configs/core-gsheets-bi-rosters-daily.yaml under the sources: key:

  bi_roster_{client}:
    source_type: gsheet
    load_strategy: incremental

    extractor:
      credentials_var: DATA_ENGINEERING_SERVICE_ACCOUNT
      spreadsheet_id: "1TDfcwDHQFM3fmAJhQOzoWjDbTikJCN0wvQ5RYh2xjNY"
      sheet_name: "{TAB_NAME}"   # exact tab label in the Google Sheet

    paths:
      report_name: bi_roster_{client}
      filename_pattern: "bi_roster_{client}_{date}.csv"

    snowflake:
      target_table: GSHEET_BI_ROSTER_{CLIENT}
      load_strategy: batch_replace
      deduplication:
        strategy: single_field
        unique_columns:
          - EID
        use_hash: false

Replace {client} (lowercase), {CLIENT} (uppercase), and {TAB_NAME} (the exact Google Sheet tab label) with the real values. Validate the config before opening a PR:

uv run xo-foundry validate-config \
  --config apps/airflow/xo-pipelines/dags/configs/core-gsheets-bi-rosters-daily.yaml
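Because the three placeholder forms must stay consistent across the YAML, DDL, and dbt files, a tiny substitution helper can reduce copy-paste slips. This is a hypothetical convenience sketch, not part of xo-foundry:

```python
def fill_placeholders(template: str, client: str, tab_name: str) -> str:
    """Hypothetical helper: substitute the three placeholder forms
    used throughout this runbook. Not part of xo-foundry."""
    return (
        template.replace("{CLIENT}", client.upper())
                .replace("{client}", client.lower())
                .replace("{TAB_NAME}", tab_name)
    )

snippet = "target_table: GSHEET_BI_ROSTER_{CLIENT}  # from bi_roster_{client}"
print(fill_placeholders(snippet, "NewClient", "New Client"))
# target_table: GSHEET_BI_ROSTER_NEWCLIENT  # from bi_roster_newclient
```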

Step 3: Silver dbt Model

Create apps/dbt-models/xo_medallion/models/silver/core/bi_roster_{client}.sql.

Use the Warby Parker model as the template (silver/core/bi_roster_warbyparker.sql):

{{
    config(
        materialized='table',
        tags=['silver', 'core', 'roster', 'core_bi_roster_{client}', 'core_bi_roster'],
        schema='SILVER',
    )
}}

/*
SILVER Layer: {CLIENT} BI Roster — Current State Table

Purpose: Current-state view of the {CLIENT} BI employee roster (latest daily snapshot only).
Strategy: table materialization — rebuilds completely each run from the latest BATCH_ID.
Source: BRONZE.GSHEET_BI_ROSTER_{CLIENT} (batch_replace, historical snapshots retained)
Target: SILVER.BI_ROSTER_{CLIENT} (current state — one row per employee in latest batch)

Natural key: EID (single field, not hashed)
*/

WITH latest_batch AS (
    SELECT MAX(BATCH_ID) AS max_batch_id
    FROM {{ source('core_bronze', 'GSHEET_BI_ROSTER_{CLIENT}') }}
)

SELECT
    EID,
    -- ... source columns (add all columns from the Bronze table)
    -- Use TRY_TO_DATE() for date columns, leave text columns as VARCHAR
    TRY_TO_DATE(START_DATE)  AS START_DATE,
    TRY_TO_DATE(END_DATE)    AS END_DATE,
    -- ... remaining columns ...
    -- Metadata
    DATE_TO_WAREHOUSE,
    UPDATE_DATE,
    RECORD_KEY,
    RECORD_HASH,
    SOURCE_FILE,
    BATCH_ID,
    PIPELINE_RUN_ID

FROM {{ source('core_bronze', 'GSHEET_BI_ROSTER_{CLIENT}') }} AS bronze
INNER JOIN latest_batch
    ON bronze.BATCH_ID = latest_batch.max_batch_id

Type casting rule: All Bronze columns are VARCHAR. In Silver, cast date columns with TRY_TO_DATE(), timestamps with TRY_TO_TIMESTAMP_NTZ(), and numbers with TRY_TO_NUMBER(). Leave EID, names, and flag columns as VARCHAR. See the dbt rules for the full casting guide.
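The point of the TRY_* family is that bad input yields NULL instead of failing the whole build. The same "try cast, else NULL" semantics expressed in Python terms (a sketch assuming the default YYYY-MM-DD format; Snowflake's TRY_TO_DATE also honors session date formats):

```python
from datetime import datetime, date

def try_to_date(value):
    """Mimic Snowflake TRY_TO_DATE for YYYY-MM-DD input:
    a date on success, None (SQL NULL) on bad or missing input."""
    if value is None:
        return None
    try:
        return datetime.strptime(value.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None

print(try_to_date("2024-03-15"))  # 2024-03-15
print(try_to_date("TBD"))         # None -- bad input becomes NULL, row survives
```

This is why a stray "TBD" in an END_DATE cell does not break the Silver run; it simply lands as NULL.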


Step 4: schema.yml Entry

Add the model to apps/dbt-models/xo_medallion/models/silver/core/schema.yml:

  - name: bi_roster_{client}
    description: >
      Current {CLIENT} BI employee roster. Latest daily snapshot from the Standard_Roster
      Google Sheet ({TAB_NAME} tab). Rebuilt each run from CORE_DB.BRONZE.GSHEET_BI_ROSTER_{CLIENT}
      (latest BATCH_ID). One row per employee.

    meta:
      owner: data-engineering
      source: BRONZE.GSHEET_BI_ROSTER_{CLIENT}
      load_strategy: table
      materialization: table

    columns:
      - name: EID
        description: "Employee ID  natural key from the Google Sheet"
        tests:
          - unique
          - not_null

      - name: RECORD_KEY
        description: "Deduplication key from xo-foundry (EID as string)"

      - name: BATCH_ID
        description: "Airflow execution date (YYYY-MM-DD)  max value is the active snapshot"

Step 5 (Optional): Gold Dimension

If the client's employee data is needed by downstream Gold facts (e.g., fct_contacts for a new client), create a Gold dimension model.

File: apps/dbt-models/xo_medallion/models/gold/core/dim_employee_{client}.sql

Use gold/core/dim_employee_warbyparker.sql as the template. The model reads from the Silver roster via ref():

{{
    config(
        materialized='table',
        tags=['gold', 'core', 'roster', 'core_bi_roster_{client}', 'core_bi_roster'],
        schema='GOLD',
    )
}}

SELECT
    EID,
    -- ... select and rename columns as needed ...
    DATE_TO_WAREHOUSE,
    BATCH_ID
FROM {{ ref('bi_roster_{client}') }}

Per ADR 010, all employee dimensions live in CORE_DB.GOLD, not in the client database.


Step 6: Update Reference Docs

After deploying:

  • Add the Bronze table to docs/reference/snowflake-object-inventory.md under CORE_DB > BRONZE Schema
  • Add the Silver model to docs/reference/snowflake-object-inventory.md under CORE_DB > SILVER Schema
  • Add the Silver model to docs/reference/dbt-model-registry.md under CORE_DB Silver Models
  • If Gold dim was created: add it to the inventory under CORE_DB > GOLD Schema and to the registry under CORE_DB Gold Models

Verification

# 1. Validate DAG config
uv run xo-foundry validate-config \
  --config apps/airflow/xo-pipelines/dags/configs/core-gsheets-bi-rosters-daily.yaml

# 2. Compile dbt models (from apps/dbt-models/xo_medallion/)
dbt compile --select bi_roster_{client}
dbt compile --select dim_employee_{client}   # if Gold was added

# 3. After Bronze pipeline runs (trigger manually in Airflow dev):
#    Check that rows landed in Bronze
SELECT COUNT(*), MAX(BATCH_ID)
FROM CORE_DB_DEV.BRONZE.GSHEET_BI_ROSTER_{CLIENT};

# 4. Run Silver model
dbt run --select bi_roster_{client}

# 5. Verify Silver row count matches the Bronze latest batch
SELECT COUNT(*) FROM CORE_DB_DEV.SILVER.BI_ROSTER_{CLIENT};
SELECT COUNT(*) FROM CORE_DB_DEV.BRONZE.GSHEET_BI_ROSTER_{CLIENT}
WHERE BATCH_ID = (SELECT MAX(BATCH_ID) FROM CORE_DB_DEV.BRONZE.GSHEET_BI_ROSTER_{CLIENT});

# 6. Run dbt tests
dbt test --select bi_roster_{client}

# 7. Run Gold dim (if added)
dbt run --select dim_employee_{client}
dbt test --select dim_employee_{client}

Troubleshooting

Column name mismatch: If the Silver model references a column that doesn't exist in Bronze, the xo-foundry standardize_headers() function may have transformed the Google Sheet header differently than expected. Run:

SELECT COLUMN_NAME
FROM CORE_DB_DEV.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'GSHEET_BI_ROSTER_{CLIENT}'
ORDER BY ORDINAL_POSITION;

Then align the Silver model column list to match.

EID uniqueness test fails: The unique test on EID will fail if the Google Sheet has duplicate EID values. Check the source tab for duplicates and coordinate with the roster owner to clean the data before re-running.
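To surface the offending values before contacting the roster owner, a quick duplicate count works; the EID sample below is hypothetical (in practice pull the column from the sheet export or Bronze):

```python
from collections import Counter

# Hypothetical EID column pulled from the sheet tab or Bronze latest batch
eids = ["E001", "E002", "E003", "E002", "E004", "E002"]

duplicates = {eid: n for eid, n in Counter(eids).items() if n > 1}
print(duplicates)  # {'E002': 3} -- these values will fail the unique test
```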

Missing tab in Google Sheet: If the extractor fails with a "sheet not found" error, verify the sheet_name in the YAML exactly matches the tab label (case-sensitive) in the Google Sheet.
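Mismatches are often invisible whitespace in the tab label. Printing repr() of both strings makes the difference obvious (the labels below are hypothetical):

```python
configured = "NewClient"    # sheet_name in the YAML
actual_tab = "NewClient "   # tab label with a trailing space in the Sheet

print(configured == actual_tab)            # False
print(repr(configured), repr(actual_tab))  # 'NewClient' 'NewClient '
```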


See also: xo-foundry DAG Guide | Schemachange Deployment | dbt Development Workflow | ADR 010 | ADR 011