
Naming Conventions

This document defines the naming standards used throughout the XO-Data platform.

General Principles

  • Consistency: Use the same pattern everywhere
  • Readability: Names should be self-documenting
  • UPPERCASE: Snowflake objects use UPPERCASE
  • Underscores: Separate words with _ (no spaces, hyphens, or camelCase)
  • No special characters: Only alphanumeric and underscores

Snowflake Naming

Database Names

Pattern: {CLIENT_CODE}_DB

| Pattern | Client          | Description           |
|---------|-----------------|-----------------------|
| WBP_DB  | Warby Parker    | Client database       |
| CND_DB  | Conde Nast      | Client database       |
| MMN_DB  | Morgan & Morgan | Client database       |
| CORE_DB | Core            | Shared infrastructure |

Client Codes: See Client Registry for the complete list of client codes and domains. CORE is reserved for shared/cross-client infrastructure.

Environment suffixes

The canonical database name is {CLIENT_CODE}_DB. Development environments use the _DEV suffix (e.g., WBP_DB_DEV) configured via YAML environments config, not hardcoded into names.
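As a sketch, the suffix resolution can be expressed in a few lines of Python (the function name and `env` values here are illustrative, not the actual xo-data config API):

```python
def resolve_database(client_code: str, env: str = "prod") -> str:
    """Return the canonical database name, adding _DEV outside prod.

    Hypothetical helper: in practice the environment comes from the
    YAML environments config, not a hardcoded argument.
    """
    name = f"{client_code.upper()}_DB"
    if env == "dev":
        name += "_DEV"
    return name

print(resolve_database("WBP"))         # WBP_DB
print(resolve_database("wbp", "dev"))  # WBP_DB_DEV
```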

Schema Names

Pattern: Layer-based (Medallion Architecture)

| Schema | Purpose               | Example        |
|--------|-----------------------|----------------|
| BRONZE | Raw data from sources | WBP_DB.BRONZE  |
| SILVER | Cleaned, typed data   | WBP_DB.SILVER  |
| GOLD   | Analytics metrics     | WBP_DB.GOLD    |

Table Names

BRONZE Layer Tables

Pattern: {SOURCE_SYSTEM}_{OBJECT}

| Table                          | Source Report            | Description               |
|--------------------------------|--------------------------|---------------------------|
| GLADLY_CONTACT_TIMESTAMPS      | Gladly contact timestamps| Raw Gladly report data    |
| GLADLY_CONVERSATION_TIMESTAMPS | Gladly conversations     | Raw Gladly conversations  |
| BAMBOOHR_EMPLOYEES             | BambooHR employees       | Raw employee roster       |
| SPROUT_MESSAGES                | Sprout Social messages   | Raw Sprout Social data    |
| GSHEETS_INBOX_GLOSSARY         | Google Sheets glossary   | Raw Google Sheets data    |

Source System Codes:

  • GLADLY_* -- Gladly customer service platform
  • SPROUT_* -- Sprout Social social media platform
  • BAMBOOHR_* -- BambooHR HR system
  • MEDALLIA_* -- Medallia survey platform
  • GMAIL_* -- Gmail attachments
  • GSHEETS_* -- Google Sheets data
  • GSPREAD_* -- Google Sheets via gspread library
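A minimal sketch of composing a Bronze table name from these codes (the helper and the `KNOWN_SOURCES` set are illustrative, not part of xo-core):

```python
# Source system codes from the list above; extend as new sources are added.
KNOWN_SOURCES = {"GLADLY", "SPROUT", "BAMBOOHR", "MEDALLIA",
                 "GMAIL", "GSHEETS", "GSPREAD"}

def bronze_table_name(source_system: str, obj: str) -> str:
    """Build a {SOURCE_SYSTEM}_{OBJECT} Bronze table name in UPPERCASE."""
    source = source_system.upper()
    if source not in KNOWN_SOURCES:
        raise ValueError(f"Unknown source system: {source}")
    return f"{source}_{obj.upper()}"

print(bronze_table_name("gladly", "contact_timestamps"))
# GLADLY_CONTACT_TIMESTAMPS
```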

SILVER Layer Tables

Pattern: {OBJECT} (no prefix -- the schema name SILVER already indicates the layer)

| Example                 | Description                            |
|-------------------------|----------------------------------------|
| CONTACT_TIMESTAMPS      | Cleaned contact timestamp data         |
| CONVERSATION_TIMESTAMPS | Cleaned conversation data              |
| MESSAGES                | Cleaned message data                   |
| INBOX_GLOSSARY          | Client inbox glossary (reference data) |

No type prefixes in Silver

Silver tables use plain object names: CONTACT_TIMESTAMPS, not WH_CONTACT_TIMESTAMPS or FCT_CONTACT_TIMESTAMPS. The schema name indicates the layer; the table name describes what the data is. See ADR 009.

GOLD Layer Tables

Pattern: Four-type system per ADR 008 and ADR 009

| Prefix | Type      | Purpose                                               | Example                              |
|--------|-----------|-------------------------------------------------------|--------------------------------------|
| fct_   | Fact      | Enriched granular data (same grain as Silver + JOINs) | fct_contacts, fct_conversations      |
| dim_   | Dimension | Current-state entities (SCD Type 1/2)                 | dim_agents, dim_customers            |
| agg_   | Aggregate | Pre-aggregated metrics (building blocks)              | agg_agent_daily, agg_channel_weekly  |
| rpt_   | Report    | End-user consumption views (zero storage)             | rpt_agent_dashboard, rpt_email_daily |

Gold naming rules

  • Facts: fct_{object} -- enriched with roster/glossary JOINs, filtered to XO agents
  • Dimensions: dim_{entity} -- reference data for downstream joins
  • Aggregates: agg_{subject}_{grain} -- reusable building blocks
  • Reports: rpt_{purpose} -- Tableau-ready views, zero storage cost
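The four-prefix rule is easy to check mechanically. A sketch of such a validator (the regex and function name are illustrative, not part of xo-core):

```python
import re

# Gold model names: one of the four typed prefixes, then a lowercase
# snake_case object name.
GOLD_NAME = re.compile(r"^(fct|dim|agg|rpt)_[a-z][a-z0-9_]*$")

def is_valid_gold_name(name: str) -> bool:
    """Return True if a Gold model name follows the four-type system."""
    return GOLD_NAME.fullmatch(name) is not None

print(is_valid_gold_name("fct_contacts"))      # True
print(is_valid_gold_name("wh_conversations"))  # False
```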

Column Names

Pattern: {OBJECT}_{ATTRIBUTE} or {ATTRIBUTE}

Rules:

  • UPPERCASE only
  • Underscores to separate words
  • No special characters (except _)

Standard Column Patterns

Primary keys:

{OBJECT}_ID                 -- CONVERSATION_ID, AGENT_ID, CUSTOMER_ID

Timestamps:

CREATED_AT                  -- When record was created
UPDATED_AT                  -- When record was last modified
DELETED_AT                  -- Soft delete timestamp
{EVENT}_AT                  -- Specific event time: CLOSED_AT, ASSIGNED_AT

Booleans — always prefix with IS_ or HAS_:

IS_{CONDITION}              -- IS_ACTIVE, IS_DELETED, IS_RESOLVED, IS_XO_AGENT
HAS_{ATTRIBUTE}             -- HAS_ATTACHMENTS, HAS_NOTES

Metrics — include the unit or aggregation type:

TOTAL_{METRIC}              -- TOTAL_CONVERSATIONS, TOTAL_CONTACTS
AVG_{METRIC}                -- AVG_HANDLE_TIME, AVG_CSAT_SCORE
COUNT_{OBJECT}              -- COUNT_AGENTS, COUNT_MESSAGES
{METRIC}_{UNIT}             -- HANDLE_TIME_MINUTES, HANDLE_TIME_SECONDS, WEIGHT_KG

Dates:

{EVENT}_DATE                -- CREATED_DATE, CLOSED_DATE, REPORT_DATE

Examples:

-- Good: clear, descriptive, includes unit
CONVERSATION_ID
CREATED_AT
HANDLE_TIME_MINUTES
IS_RESOLVED
IS_XO_AGENT
TOTAL_CONVERSATIONS
AVG_CSAT_SCORE

-- Avoid: vague, missing context, wrong case
time                        -- Too vague, missing unit
active                      -- Missing boolean prefix (use IS_ACTIVE)
ConversationID              -- Mixed case
created                     -- Missing _AT suffix
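The "Good"/"Avoid" rules above can be sketched as a small column-name linter (hypothetical helper; the vague-name set is illustrative):

```python
import re

# UPPERCASE alphanumeric with underscores, per the rules above.
COLUMN_NAME = re.compile(r"^[A-Z][A-Z0-9_]*$")
VAGUE_NAMES = {"time", "date", "value", "data", "active", "created"}

def lint_column(name: str) -> list:
    """Return a list of naming problems (empty list means the name is OK)."""
    problems = []
    if not COLUMN_NAME.fullmatch(name):
        problems.append("must be UPPERCASE with underscores only")
    if name.lower() in VAGUE_NAMES:
        problems.append("too vague; add object, unit, or prefix context")
    return problems

print(lint_column("HANDLE_TIME_MINUTES"))  # []
print(lint_column("ConversationID"))       # mixed case flagged
```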

Metadata Columns (BRONZE)

All Bronze tables include these standard metadata columns:

| Column            | Type          | Purpose                             |
|-------------------|---------------|-------------------------------------|
| RECORD_KEY        | VARCHAR       | Deduplication key (raw or hashed)   |
| RECORD_HASH       | VARCHAR       | NULL-safe hash of ALL data columns  |
| DATE_TO_WAREHOUSE | TIMESTAMP_NTZ | When data was loaded to Bronze      |
| SOURCE_FILE       | VARCHAR       | Source file name                    |
| BATCH_ID          | VARCHAR       | Batch identifier                    |
| PIPELINE_RUN_ID   | VARCHAR       | Airflow run identifier              |
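A minimal sketch of attaching these metadata columns to one record (the function, its signature, and the NULL sentinel are illustrative, not the actual xo-core implementation):

```python
import hashlib
from datetime import datetime, timezone

def add_bronze_metadata(row: dict, source_file: str, batch_id: str,
                        run_id: str, key_cols: list) -> dict:
    """Attach the standard Bronze metadata columns to a single record.

    RECORD_HASH is NULL-safe: None values are replaced with an empty
    string before hashing all data columns.
    """
    values = ["" if v is None else str(v) for v in row.values()]
    out = dict(row)
    out["RECORD_KEY"] = "|".join(str(row[c]) for c in key_cols)
    out["RECORD_HASH"] = hashlib.sha256("|".join(values).encode()).hexdigest()
    out["DATE_TO_WAREHOUSE"] = datetime.now(timezone.utc).isoformat()
    out["SOURCE_FILE"] = source_file
    out["BATCH_ID"] = batch_id
    out["PIPELINE_RUN_ID"] = run_id
    return out
```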

S3 Naming

Bucket Names

| Bucket     | Purpose                              |
|------------|--------------------------------------|
| xo-ingest  | Raw data ingestion from sources      |
| xo-stage   | Staged data with standardized columns|
| xo-archive | Historical data backups              |

S3 Path Structure

Ingest Bucket

Pattern: {domain}/{report}/{date}/{filename}

s3://xo-ingest/warbyparker/contact_timestamps/2026-01-15/contact_timestamps_2026-01-15.csv
s3://xo-ingest/condenast/conversations/2026-01-15/conversations_2026-01-15.csv

Stage Bucket

Pattern: {domain}/{report}/{load_strategy}/{date_or_timestamp}/{filename}

s3://xo-stage/warbyparker/contact_timestamps/full_refresh/2026-01-15/contact_timestamps_2026-01-15.csv
s3://xo-stage/warbyparker/work_sessions/full_refresh/2026-01-15/work_sessions_2026-01-15.csv
s3://xo-stage/condenast/conversations/incremental/2026-01-15T06:50:00/conversations.csv

Load strategy in S3 path

The stage bucket path includes a load_strategy segment (full_refresh, incremental, or historical) to organize files by their loading pattern. See ADR 001.
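A sketch of building a stage-bucket object key with load-strategy validation (the helper name and signature are illustrative, not part of xo-core):

```python
# Allowed load strategies per the stage bucket layout above.
LOAD_STRATEGIES = {"full_refresh", "incremental", "historical"}

def stage_key(domain: str, report: str, load_strategy: str,
              partition: str, filename: str) -> str:
    """Build a {domain}/{report}/{load_strategy}/{date_or_timestamp}/{filename} key."""
    if load_strategy not in LOAD_STRATEGIES:
        raise ValueError(f"load_strategy must be one of {sorted(LOAD_STRATEGIES)}")
    return f"{domain}/{report}/{load_strategy}/{partition}/{filename}"

print(stage_key("warbyparker", "contact_timestamps", "full_refresh",
                "2026-01-15", "contact_timestamps_2026-01-15.csv"))
```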

Airflow Naming

DAG Names

Pattern: {domain}_{pipeline_name}_dag

| DAG                                | Description                                |
|------------------------------------|--------------------------------------------|
| warbyparker_gladly_daily_dag       | Warby Parker daily Gladly extraction       |
| condenast_gladly_conversations_dag | Conde Nast Gladly conversations extraction |
| core_bamboohr_roster_dag           | Core BambooHR roster sync                  |
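A sketch of composing and validating a DAG id against the {domain}_{pipeline_name}_dag pattern (the helper and regex are illustrative, not part of xo-foundry):

```python
import re

# Lowercase snake_case, ending in _dag.
DAG_ID = re.compile(r"^[a-z][a-z0-9_]*_dag$")

def dag_id(domain: str, pipeline: str) -> str:
    """Build a {domain}_{pipeline_name}_dag id, rejecting invalid names."""
    name = f"{domain}_{pipeline}_dag".lower()
    if not DAG_ID.fullmatch(name):
        raise ValueError(f"Invalid DAG id: {name}")
    return name

print(dag_id("warbyparker", "gladly_daily"))
# warbyparker_gladly_daily_dag
```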

Task Names

Pattern: {action}_{object}

# Extraction tasks
extract_gladly_data
extract_sprout_data
extract_gsheets_data

# Staging tasks
copy_and_standardize

# Loading tasks
copy_to_snowflake

# Merge tasks
merge_to_snowflake

Task Group Names

Pattern: {group_type}_{description}

# Task groups
extract_all_reports
load_multiple_tables

Python Naming

Package Names

Pattern: xo-{package_name} (kebab-case for PyPI)

xo-core
xo-foundry
xo-lens
xo-bosun

Module Names

Pattern: {module_name}.py (lowercase, underscores)

# xo-core modules
snowflake_manager.py
s3_manager.py
df_utils.py
file_utils.py

# xo-foundry modules
extract_tasks.py
stage_tasks.py
snowflake_tasks.py
merge_tasks.py
task_utils.py

Class Names

Pattern: {Name}Manager or {Name}Extractor (PascalCase)

class SnowflakeManager:
    pass

class GladlyExtractor:
    pass

class SproutExtractor:
    pass

Function Names

Pattern: {verb}_{noun} (snake_case)

def extract_gladly_data():
    pass

def clean_column_names():
    pass

def copy_to_snowflake():
    pass

Variable Names

Pattern: {description} (snake_case)

# Good
s3_bucket_name = "xo-ingest"
created_at_column = "CREATED_AT"
target_table = "GLADLY_CONTACT_TIMESTAMPS"

# Avoid
bucket = "xo-ingest"  # Missing context
df = extract_data()    # Too generic

File Naming

Python Files

Pattern: {module_name}.py (lowercase, underscores)

df_utils.py
snowflake_manager.py
gladly_extractor.py
sprout_extractor.py

Configuration Files

Pattern: {client}-{source}-{pipeline}.yaml (lowercase, hyphens)

# YAML configs (production)
warbyparker-gladly-daily.yaml
condenast-gladly-conversations.yaml

# Environment files
.env
.env.local

# Python configs
pyproject.toml

SQL Files

Pattern: Depends on tool

schemachange (Snowflake BRONZE)

Pattern: V{major}.{minor}.{patch}__{description}.sql

-- Version migrations
V1.0.0__create_databases.sql
V1.1.0__create_schemas.sql
V1.2.0__create_gladly_contact_timestamps.sql

-- Repeatable migrations
R__create_materialized_view.sql
R__update_permissions.sql
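Both migration shapes can be checked with one pattern. A sketch (the regex is an illustrative approximation of schemachange's naming rules, not its exact implementation):

```python
import re

# Versioned (V{major}.{minor}.{patch}__) or repeatable (R__) migrations,
# with a lowercase snake_case description.
MIGRATION = re.compile(r"^(V\d+\.\d+\.\d+|R)__[a-z0-9_]+\.sql$")

def is_valid_migration(filename: str) -> bool:
    """Return True if a schemachange script filename follows the convention."""
    return MIGRATION.fullmatch(filename) is not None

print(is_valid_migration("V1.2.0__create_gladly_contact_timestamps.sql"))  # True
print(is_valid_migration("create_table.sql"))                              # False
```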

dbt (Snowflake SILVER/GOLD)

Pattern: {object}.sql or {prefix}_{object}.sql (lowercase, underscores)

-- Silver models (no prefix)
contact_timestamps.sql
messages.sql
inbox_glossary.sql

-- Gold models (typed prefix)
fct_contacts.sql
dim_agents.sql
agg_agent_daily.sql
rpt_email_daily.sql

Documentation Files

Pattern: {TITLE}.md (UPPERCASE or Title Case)

README.md
CLAUDE.md
CHANGELOG.md

Git Branch Names

Pattern: {type}/{description} (lowercase, hyphens)

# Feature branches
feat/add-sprout-extractor
feat/dag-factory

# Bug fixes
fix/snowflake-connection-timeout
fix/deduplication-logic

# Documentation
docs/update-naming-conventions

# Releases
release/v1.0.0

Environment Variables

Pattern: {SERVICE}_{ATTRIBUTE} (UPPERCASE, underscores)

# Snowflake
SNOWFLAKE_ACCOUNT=abc123
SNOWFLAKE_USER=dataeng@company.com
SNOWFLAKE_PASSWORD=secret
SNOWFLAKE_WAREHOUSE=XO_WH

# AWS
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=secret
AWS_DEFAULT_REGION=us-east-1

# Gladly
GLADLY_API_USER=api@company.com
GLADLY_API_TOKEN=secret
GLADLY_ORG=company

Anti-Patterns (Avoid These)

Mixed Case in Snowflake

-- Bad
CREATE TABLE MyDatabase.MySchema.MyTable (
    customerId INT,
    customerName VARCHAR
);

-- Good
CREATE TABLE MY_DATABASE.MY_SCHEMA.MY_TABLE (
    CUSTOMER_ID INT,
    CUSTOMER_NAME VARCHAR
);

Redundant Layer Prefixes

-- Bad: prefix repeats schema name
SILVER.WH_CONVERSATIONS    -- "WH" adds nothing
SILVER.FCT_INTERACTIONS     -- "FCT" belongs in Gold
GOLD.DM_SCORECARD_GPP       -- "DM" is vague

-- Good: schema conveys layer, name conveys content
SILVER.CONVERSATIONS
SILVER.CONTACT_TIMESTAMPS
GOLD.fct_contacts
GOLD.agg_agent_daily
GOLD.rpt_email_daily

Special Characters

-- Bad
CREATE TABLE "Customer-Data" (
    "Customer#ID" INT,
    "Customer Name" VARCHAR
);

-- Good
CREATE TABLE CUSTOMER_DATA (
    CUSTOMER_ID INT,
    CUSTOMER_NAME VARCHAR
);

Summary Table

| Component        | Pattern                                 | Example                                   |
|------------------|-----------------------------------------|-------------------------------------------|
| Databases        | {CLIENT}_DB                             | WBP_DB                                    |
| Schemas          | {LAYER}                                 | BRONZE, SILVER, GOLD                      |
| BRONZE Tables    | {SOURCE}_{OBJECT}                       | GLADLY_CONTACT_TIMESTAMPS                 |
| SILVER Tables    | {OBJECT}                                | CONTACT_TIMESTAMPS                        |
| GOLD Facts       | fct_{object}                            | fct_contacts                              |
| GOLD Dimensions  | dim_{entity}                            | dim_agents                                |
| GOLD Aggregates  | agg_{subject}_{grain}                   | agg_agent_daily                           |
| GOLD Reports     | rpt_{purpose}                           | rpt_email_daily                           |
| Metadata Columns | See table above                         | RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE|
| S3 Paths         | {domain}/{report}/{load_strategy}/{date}| warbyparker/contacts/full_refresh/2026-01-15 |
| DAGs             | {domain}_{pipeline}_dag                 | warbyparker_gladly_daily_dag              |
| Python Classes   | {Name}Manager                           | SnowflakeManager                          |
| Python Functions | {verb}_{noun}                           | extract_gladly_data                       |
| SQL Migrations   | V{x}.{y}.{z}__{desc}.sql                | V1.2.0__create_table.sql                  |
