
Naming Conventions

This document defines the naming standards used throughout the XO-Data platform.

General Principles

  • Consistency: Use the same pattern everywhere
  • Readability: Names should be self-documenting
  • UPPERCASE: Snowflake objects use UPPERCASE
  • Underscores: Separate words with _ (no spaces, hyphens, or camelCase)
  • No special characters: Only alphanumeric and underscores

Snowflake Naming

Database Names

Pattern: {CLIENT_CODE}_DB

| Pattern | Client          | Description           |
|---------|-----------------|-----------------------|
| WBP_DB  | Warby Parker    | Client database       |
| CND_DB  | Conde Nast      | Client database       |
| MMN_DB  | Morgan & Morgan | Client database       |
| CORE_DB | Core            | Shared infrastructure |

Client Codes: See Client Registry for the complete list of client codes and domains. CORE is reserved for shared/cross-client infrastructure.

Environment suffixes

The canonical database name is {CLIENT_CODE}_DB. Development environments use the _DEV suffix (e.g., WBP_DB_DEV) configured via YAML environments config, not hardcoded into names.
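As a sketch, the suffix resolution can be expressed in a few lines of Python (the function name and `env` values here are illustrative, not the actual xo-data config API):

```python
def resolve_database(client_code: str, env: str = "prod") -> str:
    """Return the canonical database name, adding _DEV outside prod.

    Hypothetical helper: in practice the environment comes from the
    YAML environments config, not a hardcoded argument.
    """
    name = f"{client_code.upper()}_DB"
    if env == "dev":
        name += "_DEV"
    return name

print(resolve_database("WBP"))         # WBP_DB
print(resolve_database("wbp", "dev"))  # WBP_DB_DEV
```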

Schema Names

Pattern: Layer-based (Medallion Architecture)

| Schema | Purpose               | Example        |
|--------|-----------------------|----------------|
| BRONZE | Raw data from sources | WBP_DB.BRONZE  |
| SILVER | Cleaned, typed data   | WBP_DB.SILVER  |
| GOLD   | Analytics metrics     | WBP_DB.GOLD    |

Table Names

BRONZE Layer Tables

Pattern: {SOURCE_SYSTEM}_{OBJECT}

| Table                          | Source Report            | Description               |
|--------------------------------|--------------------------|---------------------------|
| GLADLY_CONTACT_TIMESTAMPS      | Gladly contact timestamps| Raw Gladly report data    |
| GLADLY_CONVERSATION_TIMESTAMPS | Gladly conversations     | Raw Gladly conversations  |
| BAMBOOHR_EMPLOYEES             | BambooHR employees       | Raw employee roster       |
| SPROUT_MESSAGES                | Sprout Social messages   | Raw Sprout Social data    |
| GSHEETS_INBOX_GLOSSARY         | Google Sheets glossary   | Raw Google Sheets data    |

Source System Codes:

  • GLADLY_* -- Gladly customer service platform
  • SPROUT_* -- Sprout Social social media platform
  • BAMBOOHR_* -- BambooHR HR system
  • MEDALLIA_* -- Medallia survey platform
  • GMAIL_* -- Gmail attachments
  • GSHEETS_* -- Google Sheets data
  • GSPREAD_* -- Google Sheets via gspread library
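A minimal sketch of composing a Bronze table name from these codes (the helper and the `KNOWN_SOURCES` set are illustrative, not part of xo-core):

```python
# Source system codes from the list above; extend as new sources are added.
KNOWN_SOURCES = {"GLADLY", "SPROUT", "BAMBOOHR", "MEDALLIA",
                 "GMAIL", "GSHEETS", "GSPREAD"}

def bronze_table_name(source_system: str, obj: str) -> str:
    """Build a {SOURCE_SYSTEM}_{OBJECT} Bronze table name in UPPERCASE."""
    source = source_system.upper()
    if source not in KNOWN_SOURCES:
        raise ValueError(f"Unknown source system: {source}")
    return f"{source}_{obj.upper()}"

print(bronze_table_name("gladly", "contact_timestamps"))
# GLADLY_CONTACT_TIMESTAMPS
```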

SILVER Layer Tables

Pattern: {OBJECT} (no prefix -- the schema name SILVER already indicates the layer)

| Example                 | Description                            |
|-------------------------|----------------------------------------|
| CONTACT_TIMESTAMPS      | Cleaned contact timestamp data         |
| CONVERSATION_TIMESTAMPS | Cleaned conversation data              |
| MESSAGES                | Cleaned message data                   |
| INBOX_GLOSSARY          | Client inbox glossary (reference data) |

No type prefixes in Silver

Silver tables use plain object names: CONTACT_TIMESTAMPS, not WH_CONTACT_TIMESTAMPS or FCT_CONTACT_TIMESTAMPS. The schema name indicates the layer; the table name describes what the data is. See ADR 009.

GOLD Layer Tables

Pattern: Four-type system per ADR 008 and ADR 009

| Prefix | Type      | Purpose                                               | Example                              |
|--------|-----------|-------------------------------------------------------|--------------------------------------|
| fct_   | Fact      | Enriched granular data (same grain as Silver + JOINs) | fct_contacts, fct_conversations      |
| dim_   | Dimension | Current-state entities (SCD Type 1/2)                 | dim_agents, dim_customers            |
| agg_   | Aggregate | Pre-aggregated metrics (building blocks)              | agg_agent_daily, agg_channel_weekly  |
| rpt_   | Report    | End-user consumption views (zero storage)             | rpt_agent_dashboard, rpt_email_daily |

Gold naming rules

  • Facts: fct_{object} -- enriched with roster/glossary JOINs, filtered to XO agents
  • Dimensions: dim_{entity} -- reference data for downstream joins
  • Aggregates: agg_{subject}_{grain} -- reusable building blocks
  • Reports: rpt_{purpose} -- Tableau-ready views, zero storage cost
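The four-prefix rule is easy to check mechanically. A sketch of such a validator (the regex and function name are illustrative, not part of xo-core):

```python
import re

# Gold model names: one of the four typed prefixes, then a lowercase
# snake_case object name.
GOLD_NAME = re.compile(r"^(fct|dim|agg|rpt)_[a-z][a-z0-9_]*$")

def is_valid_gold_name(name: str) -> bool:
    """Return True if a Gold model name follows the four-type system."""
    return GOLD_NAME.fullmatch(name) is not None

print(is_valid_gold_name("fct_contacts"))      # True
print(is_valid_gold_name("wh_conversations"))  # False
```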

Column Names

Pattern: {OBJECT}_{ATTRIBUTE} or {ATTRIBUTE}

Rules:

  • UPPERCASE only
  • Underscores to separate words
  • No special characters (except _)

Standard Column Patterns

Primary keys:

{OBJECT}_ID                 -- CONVERSATION_ID, AGENT_ID, CUSTOMER_ID

Timestamps:

CREATED_AT                  -- When record was created
UPDATED_AT                  -- When record was last modified
DELETED_AT                  -- Soft delete timestamp
{EVENT}_AT                  -- Specific event time: CLOSED_AT, ASSIGNED_AT

Booleans — always prefix with IS_ or HAS_:

IS_{CONDITION}              -- IS_ACTIVE, IS_DELETED, IS_RESOLVED, IS_XO_AGENT
HAS_{ATTRIBUTE}             -- HAS_ATTACHMENTS, HAS_NOTES

Metrics — include the unit or aggregation type:

TOTAL_{METRIC}              -- TOTAL_CONVERSATIONS, TOTAL_CONTACTS
AVG_{METRIC}                -- AVG_HANDLE_TIME, AVG_CSAT_SCORE
COUNT_{OBJECT}              -- COUNT_AGENTS, COUNT_MESSAGES
{METRIC}_{UNIT}             -- HANDLE_TIME_MINUTES, HANDLE_TIME_SECONDS, WEIGHT_KG

Dates:

{EVENT}_DATE                -- CREATED_DATE, CLOSED_DATE, REPORT_DATE

Examples:

-- Good: clear, descriptive, includes unit
CONVERSATION_ID
CREATED_AT
HANDLE_TIME_MINUTES
IS_RESOLVED
IS_XO_AGENT
TOTAL_CONVERSATIONS
AVG_CSAT_SCORE

-- Avoid: vague, missing context, wrong case
time                        -- Too vague, missing unit
active                      -- Missing boolean prefix (use IS_ACTIVE)
ConversationID              -- Mixed case
created                     -- Missing _AT suffix
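The "Good"/"Avoid" rules above can be sketched as a small column-name linter (hypothetical helper; the vague-name set is illustrative):

```python
import re

# UPPERCASE alphanumeric with underscores, per the rules above.
COLUMN_NAME = re.compile(r"^[A-Z][A-Z0-9_]*$")
VAGUE_NAMES = {"time", "date", "value", "data", "active", "created"}

def lint_column(name: str) -> list:
    """Return a list of naming problems (empty list means the name is OK)."""
    problems = []
    if not COLUMN_NAME.fullmatch(name):
        problems.append("must be UPPERCASE with underscores only")
    if name.lower() in VAGUE_NAMES:
        problems.append("too vague; add object, unit, or prefix context")
    return problems

print(lint_column("HANDLE_TIME_MINUTES"))  # []
print(lint_column("ConversationID"))       # mixed case flagged
```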

Metadata Columns (BRONZE)

All Bronze tables include these standard metadata columns:

| Column            | Type          | Purpose                             |
|-------------------|---------------|-------------------------------------|
| RECORD_KEY        | VARCHAR       | Deduplication key (raw or hashed)   |
| RECORD_HASH       | VARCHAR       | NULL-safe hash of ALL data columns  |
| DATE_TO_WAREHOUSE | TIMESTAMP_NTZ | When data was loaded to Bronze      |
| SOURCE_FILE       | VARCHAR       | Source file name                    |
| BATCH_ID          | VARCHAR       | Batch identifier                    |
| PIPELINE_RUN_ID   | VARCHAR       | Airflow run identifier              |
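A minimal sketch of attaching these metadata columns to one record (the function, its signature, and the NULL sentinel are illustrative, not the actual xo-core implementation):

```python
import hashlib
from datetime import datetime, timezone

def add_bronze_metadata(row: dict, source_file: str, batch_id: str,
                        run_id: str, key_cols: list) -> dict:
    """Attach the standard Bronze metadata columns to a single record.

    RECORD_HASH is NULL-safe: None values are replaced with an empty
    string before hashing all data columns.
    """
    values = ["" if v is None else str(v) for v in row.values()]
    out = dict(row)
    out["RECORD_KEY"] = "|".join(str(row[c]) for c in key_cols)
    out["RECORD_HASH"] = hashlib.sha256("|".join(values).encode()).hexdigest()
    out["DATE_TO_WAREHOUSE"] = datetime.now(timezone.utc).isoformat()
    out["SOURCE_FILE"] = source_file
    out["BATCH_ID"] = batch_id
    out["PIPELINE_RUN_ID"] = run_id
    return out
```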

S3 Naming

Bucket Names

| Bucket     | Purpose                              |
|------------|--------------------------------------|
| xo-ingest  | Raw data ingestion from sources      |
| xo-stage   | Staged data with standardized columns|
| xo-archive | Historical data backups              |

S3 Path Structure

Ingest Bucket

Pattern: {domain}/{report}/{date}/{filename}

s3://xo-ingest/warbyparker/contact_timestamps/2026-01-15/contact_timestamps_2026-01-15.csv
s3://xo-ingest/condenast/conversations/2026-01-15/conversations_2026-01-15.csv

Stage Bucket

Pattern: {domain}/{report}/{load_strategy}/{date_or_timestamp}/{filename}

s3://xo-stage/warbyparker/contact_timestamps/full_refresh/2026-01-15/contact_timestamps_2026-01-15.csv
s3://xo-stage/warbyparker/work_sessions/full_refresh/2026-01-15/work_sessions_2026-01-15.csv
s3://xo-stage/condenast/conversations/incremental/2026-01-15T06:50:00/conversations.csv

Load strategy in S3 path

The stage bucket path includes a load_strategy segment (full_refresh, incremental, or historical) to organize files by their loading pattern. See ADR 001.
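A sketch of building a stage-bucket object key with load-strategy validation (the helper name and signature are illustrative, not part of xo-core):

```python
# Allowed load strategies per the stage bucket layout above.
LOAD_STRATEGIES = {"full_refresh", "incremental", "historical"}

def stage_key(domain: str, report: str, load_strategy: str,
              partition: str, filename: str) -> str:
    """Build a {domain}/{report}/{load_strategy}/{date_or_timestamp}/{filename} key."""
    if load_strategy not in LOAD_STRATEGIES:
        raise ValueError(f"load_strategy must be one of {sorted(LOAD_STRATEGIES)}")
    return f"{domain}/{report}/{load_strategy}/{partition}/{filename}"

print(stage_key("warbyparker", "contact_timestamps", "full_refresh",
                "2026-01-15", "contact_timestamps_2026-01-15.csv"))
```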

Airflow Naming

DAG Names

Pattern: {domain}_{pipeline_name}_dag

| DAG                                | Description                                |
|------------------------------------|--------------------------------------------|
| warbyparker_gladly_daily_dag       | Warby Parker daily Gladly extraction       |
| condenast_gladly_conversations_dag | Conde Nast Gladly conversations extraction |
| core_bamboohr_roster_dag           | Core BambooHR roster sync                  |
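A sketch of composing and validating a DAG id against the {domain}_{pipeline_name}_dag pattern (the helper and regex are illustrative, not part of xo-foundry):

```python
import re

# Lowercase snake_case, ending in _dag.
DAG_ID = re.compile(r"^[a-z][a-z0-9_]*_dag$")

def dag_id(domain: str, pipeline: str) -> str:
    """Build a {domain}_{pipeline_name}_dag id, rejecting invalid names."""
    name = f"{domain}_{pipeline}_dag".lower()
    if not DAG_ID.fullmatch(name):
        raise ValueError(f"Invalid DAG id: {name}")
    return name

print(dag_id("warbyparker", "gladly_daily"))
# warbyparker_gladly_daily_dag
```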

Task Names

Pattern: {action}_{object}

# Extraction tasks
extract_gladly_data
extract_sprout_data
extract_gsheets_data

# Staging tasks
copy_and_standardize

# Loading tasks
copy_to_snowflake

# Merge tasks
merge_to_snowflake

Task Group Names

Pattern: {group_type}_{description}

# Task groups
extract_all_reports
load_multiple_tables

Python Naming

Package Names

Pattern: xo-{package_name} (kebab-case for PyPI)

xo-core
xo-foundry
xo-lens
xo-bosun

Module Names

Pattern: {module_name}.py (lowercase, underscores)

# xo-core modules
snowflake_manager.py
s3_manager.py
df_utils.py
file_utils.py

# xo-foundry modules
extract_tasks.py
stage_tasks.py
snowflake_tasks.py
merge_tasks.py
task_utils.py

Class Names

Pattern: {Name}Manager or {Name}Extractor (PascalCase)

class SnowflakeManager:
    pass

class GladlyExtractor:
    pass

class SproutExtractor:
    pass

Function Names

Pattern: {verb}_{noun} (snake_case)

def extract_gladly_data():
    pass

def clean_column_names():
    pass

def copy_to_snowflake():
    pass

Variable Names

Pattern: {description} (snake_case)

# Good
s3_bucket_name = "xo-ingest"
created_at_column = "CREATED_AT"
target_table = "GLADLY_CONTACT_TIMESTAMPS"

# Avoid
bucket = "xo-ingest"  # Missing context
df = extract_data()    # Too generic

File Naming

Python Files

Pattern: {module_name}.py (lowercase, underscores)

df_utils.py
snowflake_manager.py
gladly_extractor.py
sprout_extractor.py

Configuration Files

Pattern: {client}-{source}-{pipeline}.yaml (lowercase, hyphens)

# YAML configs (production)
warbyparker-gladly-daily.yaml
condenast-gladly-conversations.yaml

# Environment files
.env
.env.local

# Python configs
pyproject.toml

SQL Files

Pattern: Depends on tool

schemachange (Snowflake BRONZE)

Pattern: V{major}.{minor}.{patch}__{description}.sql

-- Version migrations
V1.0.0__create_databases.sql
V1.1.0__create_schemas.sql
V1.2.0__create_gladly_contact_timestamps.sql

-- Repeatable migrations
R__create_materialized_view.sql
R__update_permissions.sql
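Both migration shapes can be checked with one pattern. A sketch (the regex is an illustrative approximation of schemachange's naming rules, not its exact implementation):

```python
import re

# Versioned (V{major}.{minor}.{patch}__) or repeatable (R__) migrations,
# with a lowercase snake_case description.
MIGRATION = re.compile(r"^(V\d+\.\d+\.\d+|R)__[a-z0-9_]+\.sql$")

def is_valid_migration(filename: str) -> bool:
    """Return True if a schemachange script filename follows the convention."""
    return MIGRATION.fullmatch(filename) is not None

print(is_valid_migration("V1.2.0__create_gladly_contact_timestamps.sql"))  # True
print(is_valid_migration("create_table.sql"))                              # False
```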

dbt (Snowflake SILVER/GOLD)

Pattern: {object}.sql or {prefix}_{object}.sql (lowercase, underscores)

-- Silver models (no prefix)
contact_timestamps.sql
messages.sql
inbox_glossary.sql

-- Gold models (typed prefix)
fct_contacts.sql
dim_agents.sql
agg_agent_daily.sql
rpt_email_daily.sql

Documentation Files

Pattern: {TITLE}.md (UPPERCASE or Title Case)

README.md
CLAUDE.md
CHANGELOG.md

Git Branch Names

Pattern: {type}/{description} (lowercase, hyphens)

# Feature branches
feat/add-sprout-extractor
feat/dag-factory

# Bug fixes
fix/snowflake-connection-timeout
fix/deduplication-logic

# Documentation
docs/update-naming-conventions

# Releases
release/v1.0.0

Environment Variables

Pattern: {SERVICE}_{ATTRIBUTE} (UPPERCASE, underscores)

# Snowflake
SNOWFLAKE_ACCOUNT=abc123
SNOWFLAKE_USER=dataeng@company.com
SNOWFLAKE_PASSWORD=secret
SNOWFLAKE_WAREHOUSE=XO_WH

# AWS
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=secret
AWS_DEFAULT_REGION=us-east-1

# Gladly
GLADLY_API_USER=api@company.com
GLADLY_API_TOKEN=secret
GLADLY_ORG=company

Anti-Patterns (Avoid These)

Mixed Case in Snowflake

-- Bad
CREATE TABLE MyDatabase.MySchema.MyTable (
    customerId INT,
    customerName VARCHAR
);

-- Good
CREATE TABLE MY_DATABASE.MY_SCHEMA.MY_TABLE (
    CUSTOMER_ID INT,
    CUSTOMER_NAME VARCHAR
);

Redundant Layer Prefixes

-- Bad: prefix repeats schema name
SILVER.WH_CONVERSATIONS    -- "WH" adds nothing
SILVER.FCT_INTERACTIONS     -- "FCT" belongs in Gold
GOLD.DM_SCORECARD_GPP       -- "DM" is vague

-- Good: schema conveys layer, name conveys content
SILVER.CONVERSATIONS
SILVER.CONTACT_TIMESTAMPS
GOLD.fct_contacts
GOLD.agg_agent_daily
GOLD.rpt_email_daily

Special Characters

-- Bad
CREATE TABLE "Customer-Data" (
    "Customer#ID" INT,
    "Customer Name" VARCHAR
);

-- Good
CREATE TABLE CUSTOMER_DATA (
    CUSTOMER_ID INT,
    CUSTOMER_NAME VARCHAR
);

Summary Table

| Component        | Pattern                                 | Example                                   |
|------------------|-----------------------------------------|-------------------------------------------|
| Databases        | {CLIENT}_DB                             | WBP_DB                                    |
| Schemas          | {LAYER}                                 | BRONZE, SILVER, GOLD                      |
| BRONZE Tables    | {SOURCE}_{OBJECT}                       | GLADLY_CONTACT_TIMESTAMPS                 |
| SILVER Tables    | {OBJECT}                                | CONTACT_TIMESTAMPS                        |
| GOLD Facts       | fct_{object}                            | fct_contacts                              |
| GOLD Dimensions  | dim_{entity}                            | dim_agents                                |
| GOLD Aggregates  | agg_{subject}_{grain}                   | agg_agent_daily                           |
| GOLD Reports     | rpt_{purpose}                           | rpt_email_daily                           |
| Metadata Columns | See table above                         | RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE|
| S3 Paths         | {domain}/{report}/{load_strategy}/{date}| warbyparker/contacts/full_refresh/2026-01-15 |
| DAGs             | {domain}_{pipeline}_dag                 | warbyparker_gladly_daily_dag              |
| Python Classes   | {Name}Manager                           | SnowflakeManager                          |
| Python Functions | {verb}_{noun}                           | extract_gladly_data                       |
| SQL Migrations   | V{x}.{y}.{z}__{desc}.sql                | V1.2.0__create_table.sql                  |
