Naming Conventions¶
This document defines the naming standards used throughout the XO-Data platform.
General Principles¶
- Consistency: Use the same pattern everywhere
- Readability: Names should be self-documenting
- UPPERCASE: Snowflake objects use UPPERCASE
- Underscores: Separate words with
_(no spaces, hyphens, or camelCase) - No special characters: Only alphanumeric and underscores
Snowflake Naming¶
Database Names¶
Pattern: {CLIENT_CODE}_DB
| Pattern | Client | Description |
|---|---|---|
WBP_DB |
Warby Parker | Client database |
CND_DB |
Conde Nast | Client database |
MMN_DB |
Morgan & Morgan | Client database |
CORE_DB |
Core | Shared infrastructure |
Client Codes: See Client Registry for the complete list of client codes and domains. CORE is reserved for shared/cross-client infrastructure.
Environment suffixes
The canonical database name is {CLIENT_CODE}_DB. Development environments use the _DEV suffix (e.g., WBP_DB_DEV) configured via YAML environments config, not hardcoded into names.
Schema Names¶
Pattern: Layer-based (Medallion Architecture)
| Schema | Purpose | Example |
|---|---|---|
BRONZE |
Raw data from sources | WBP_DB.BRONZE |
SILVER |
Cleaned, typed data | WBP_DB.SILVER |
GOLD |
Analytics metrics | WBP_DB.GOLD |
Table Names¶
BRONZE Layer Tables¶
Pattern: {SOURCE_SYSTEM}_{OBJECT}
| Pattern | Example | Description |
|---|---|---|
GLADLY_CONTACT_TIMESTAMPS |
Gladly contact timestamps | Raw Gladly report data |
GLADLY_CONVERSATION_TIMESTAMPS |
Gladly conversations | Raw Gladly conversations |
BAMBOOHR_EMPLOYEES |
BambooHR employees | Raw employee roster |
SPROUT_MESSAGES |
Sprout Social messages | Raw Sprout Social data |
GSHEETS_INBOX_GLOSSARY |
Google Sheets glossary | Raw Google Sheets data |
Source System Codes:
GLADLY_*-- Gladly customer service platformSPROUT_*-- Sprout Social social media platformBAMBOOHR_*-- BambooHR HR systemMEDALLIA_*-- Medallia survey platformGMAIL_*-- Gmail attachmentsGSHEETS_*-- Google Sheets dataGSPREAD_*-- Google Sheets via gspread library
SILVER Layer Tables¶
Pattern: {OBJECT} (no prefix -- the schema name SILVER already indicates the layer)
| Example | Description |
|---|---|
CONTACT_TIMESTAMPS |
Cleaned contact timestamp data |
CONVERSATION_TIMESTAMPS |
Cleaned conversation data |
MESSAGES |
Cleaned message data |
INBOX_GLOSSARY |
Client inbox glossary (reference data) |
No type prefixes in Silver
Silver tables use plain object names: CONTACT_TIMESTAMPS, not WH_CONTACT_TIMESTAMPS or FCT_CONTACT_TIMESTAMPS. The schema name indicates the layer; the table name describes what the data is. See ADR 009.
GOLD Layer Tables¶
Pattern: Four-type system per ADR 008 and ADR 009
| Prefix | Type | Purpose | Example |
|---|---|---|---|
fct_ |
Fact | Enriched granular data (same grain as Silver + JOINs) | fct_contacts, fct_conversations |
dim_ |
Dimension | Current state entities (SCD Type ½) | dim_agents, dim_customers |
agg_ |
Aggregate | Pre-aggregated metrics (building blocks) | agg_agent_daily, agg_channel_weekly |
rpt_ |
Report | End-user consumption views (zero storage) | rpt_agent_dashboard, rpt_email_daily |
Gold naming rules
- Facts:
fct_{object}-- enriched with roster/glossary JOINs, filtered to XO agents - Dimensions:
dim_{entity}-- reference data for downstream joins - Aggregates:
agg_{subject}_{grain}-- reusable building blocks - Reports:
rpt_{purpose}-- Tableau-ready views, zero storage cost
Column Names¶
Pattern: {OBJECT}_{ATTRIBUTE} or {ATTRIBUTE}
Rules:
- UPPERCASE only
- Underscores to separate words
- No special characters (except
_)
Standard Column Patterns¶
Primary keys:
Timestamps:
CREATED_AT -- When record was created
UPDATED_AT -- When record was last modified
DELETED_AT -- Soft delete timestamp
{EVENT}_AT -- Specific event time: CLOSED_AT, ASSIGNED_AT
Booleans — always prefix with IS_ or HAS_:
IS_{CONDITION} -- IS_ACTIVE, IS_DELETED, IS_RESOLVED, IS_XO_AGENT
HAS_{ATTRIBUTE} -- HAS_ATTACHMENTS, HAS_NOTES
Metrics — include the unit or aggregation type:
TOTAL_{METRIC} -- TOTAL_CONVERSATIONS, TOTAL_CONTACTS
AVG_{METRIC} -- AVG_HANDLE_TIME, AVG_CSAT_SCORE
COUNT_{OBJECT} -- COUNT_AGENTS, COUNT_MESSAGES
{METRIC}_{UNIT} -- HANDLE_TIME_MINUTES, HANDLE_TIME_SECONDS, WEIGHT_KG
Dates:
Examples:
-- Good: clear, descriptive, includes unit
CONVERSATION_ID
CREATED_AT
HANDLE_TIME_MINUTES
IS_RESOLVED
IS_XO_AGENT
TOTAL_CONVERSATIONS
AVG_CSAT_SCORE
-- Avoid: vague, missing context, wrong case
time -- Too vague, missing unit
active -- Missing boolean prefix (use IS_ACTIVE)
ConversationID -- Mixed case
created -- Missing _AT suffix
Metadata Columns (BRONZE)¶
All Bronze tables include these standard metadata columns:
| Column | Type | Purpose |
|---|---|---|
RECORD_KEY |
VARCHAR | Deduplication key (raw or hashed) |
RECORD_HASH |
VARCHAR | NULL-safe hash of ALL data columns |
DATE_TO_WAREHOUSE |
TIMESTAMP_NTZ | When data was loaded to Bronze |
SOURCE_FILE |
VARCHAR | Source file name |
BATCH_ID |
VARCHAR | Batch identifier |
PIPELINE_RUN_ID |
VARCHAR | Airflow run identifier |
S3 Naming¶
Bucket Names¶
| Bucket | Purpose |
|---|---|
xo-ingest |
Raw data ingestion from sources |
xo-stage |
Staged data with standardized columns |
xo-archive |
Historical data backups |
S3 Path Structure¶
Ingest Bucket¶
Pattern: {domain}/{report}/{date}/{filename}
s3://xo-ingest/warbyparker/contact_timestamps/2026-01-15/contact_timestamps_2026-01-15.csv
s3://xo-ingest/condenast/conversations/2026-01-15/conversations_2026-01-15.csv
Stage Bucket¶
Pattern: {domain}/{report}/{load_strategy}/{date_or_timestamp}/{filename}
s3://xo-stage/warbyparker/contact_timestamps/full_refresh/2026-01-15/contact_timestamps_2026-01-15.csv
s3://xo-stage/warbyparker/work_sessions/full_refresh/2026-01-15/work_sessions_2026-01-15.csv
s3://xo-stage/condenast/conversations/incremental/2026-01-15T06:50:00/conversations.csv
Load strategy in S3 path
The stage bucket path includes a load_strategy segment (full_refresh, incremental, or historical) to organize files by their loading pattern. See ADR 001.
Airflow Naming¶
DAG Names¶
Pattern: {domain}_{pipeline_name}_dag
| Pattern | Example | Description |
|---|---|---|
warbyparker_gladly_daily_dag |
Warby Parker Gladly | Daily Gladly extraction |
condenast_gladly_conversations_dag |
Conde Nast Gladly | Conversations extraction |
core_bamboohr_roster_dag |
Core | BambooHR roster sync |
Task Names¶
Pattern: {action}_{object}
# Extraction tasks
extract_gladly_data
extract_sprout_data
extract_gsheets_data
# Staging tasks
copy_and_standardize
# Loading tasks
copy_to_snowflake
# Merge tasks
merge_to_snowflake
Task Group Names¶
Pattern: {group_type}_{description}
Python Naming¶
Package Names¶
Pattern: xo-{package_name} (kebab-case for PyPI)
Module Names¶
Pattern: {module_name}.py (lowercase, underscores)
# xo-core modules
snowflake_manager.py
s3_manager.py
df_utils.py
file_utils.py
# xo-foundry modules
extract_tasks.py
stage_tasks.py
snowflake_tasks.py
merge_tasks.py
task_utils.py
Class Names¶
Pattern: {Name}Manager or {Name}Extractor (PascalCase)
Function Names¶
Pattern: {verb}_{noun} (snake_case)
Variable Names¶
Pattern: {description} (snake_case)
# Good
s3_bucket_name = "xo-ingest"
created_at_column = "CREATED_AT"
target_table = "GLADLY_CONTACT_TIMESTAMPS"
# Avoid
bucket = "xo-ingest" # Missing context
df = extract_data() # Too generic
File Naming¶
Python Files¶
Pattern: {module_name}.py (lowercase, underscores)
Configuration Files¶
Pattern: {client}-{source}-{pipeline}.yaml (lowercase, hyphens)
# YAML configs (production)
warbyparker-gladly-daily.yaml
condenast-gladly-conversations.yaml
# Environment files
.env
.env.local
# Python configs
pyproject.toml
SQL Files¶
Pattern: Depends on tool
schemachange (Snowflake BRONZE)¶
Pattern: V{major}.{minor}.{patch}__{description}.sql
-- Version migrations
V1.0.0__create_databases.sql
V1.1.0__create_schemas.sql
V1.2.0__create_gladly_contact_timestamps.sql
-- Repeatable migrations
R__create_materialized_view.sql
R__update_permissions.sql
dbt (Snowflake SILVER/GOLD)¶
Pattern: {object}.sql or {prefix}_{object}.sql (lowercase, underscores)
-- Silver models (no prefix)
contact_timestamps.sql
messages.sql
inbox_glossary.sql
-- Gold models (typed prefix)
fct_contacts.sql
dim_agents.sql
agg_agent_daily.sql
rpt_email_daily.sql
Documentation Files¶
Pattern: {TITLE}.md (UPPERCASE or Title Case)
Git Branch Names¶
Pattern: {type}/{description} (lowercase, hyphens)
# Feature branches
feat/add-sprout-extractor
feat/dag-factory
# Bug fixes
fix/snowflake-connection-timeout
fix/deduplication-logic
# Documentation
docs/update-naming-conventions
# Releases
release/v1.0.0
Environment Variables¶
Pattern: {SERVICE}_{ATTRIBUTE} (UPPERCASE, underscores)
# Snowflake
SNOWFLAKE_ACCOUNT=abc123
SNOWFLAKE_USER=dataeng@company.com
SNOWFLAKE_PASSWORD=secret
SNOWFLAKE_WAREHOUSE=XO_WH
# AWS
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=secret
AWS_DEFAULT_REGION=us-east-1
# Gladly
GLADLY_API_USER=api@company.com
GLADLY_API_TOKEN=secret
GLADLY_ORG=company
Anti-Patterns (Avoid These)¶
Mixed Case in Snowflake¶
-- Bad
CREATE TABLE MyDatabase.MySchema.MyTable (
customerId INT,
customerName VARCHAR
);
-- Good
CREATE TABLE MY_DATABASE.MY_SCHEMA.MY_TABLE (
CUSTOMER_ID INT,
CUSTOMER_NAME VARCHAR
);
Redundant Layer Prefixes¶
-- Bad: prefix repeats schema name
SILVER.WH_CONVERSATIONS -- "WH" adds nothing
SILVER.FCT_INTERACTIONS -- "FCT" belongs in Gold
GOLD.DM_SCORECARD_GPP -- "DM" is vague
-- Good: schema conveys layer, name conveys content
SILVER.CONVERSATIONS
SILVER.CONTACT_TIMESTAMPS
GOLD.fct_contacts
GOLD.agg_agent_daily
GOLD.rpt_email_daily
Special Characters¶
-- Bad
CREATE TABLE "Customer-Data" (
"Customer#ID" INT,
"Customer Name" VARCHAR
);
-- Good
CREATE TABLE CUSTOMER_DATA (
CUSTOMER_ID INT,
CUSTOMER_NAME VARCHAR
);
Summary Table¶
| Component | Pattern | Example |
|---|---|---|
| Databases | {CLIENT}_DB |
WBP_DB |
| Schemas | {LAYER} |
BRONZE, SILVER, GOLD |
| BRONZE Tables | {SOURCE}_{OBJECT} |
GLADLY_CONTACT_TIMESTAMPS |
| SILVER Tables | {OBJECT} |
CONTACT_TIMESTAMPS |
| GOLD Facts | fct_{object} |
fct_contacts |
| GOLD Dimensions | dim_{entity} |
dim_agents |
| GOLD Aggregates | agg_{subject}_{grain} |
agg_agent_daily |
| GOLD Reports | rpt_{purpose} |
rpt_email_daily |
| Metadata Columns | See table above | RECORD_KEY, RECORD_HASH, DATE_TO_WAREHOUSE |
| S3 Paths | {domain}/{report}/{load_strategy}/{date} |
warbyparker/contacts/full_refresh/2026-01-15 |
| DAGs | {domain}_{pipeline}_dag |
warbyparker_gladly_daily_dag |
| Python Classes | {Name}Manager |
SnowflakeManager |
| Python Functions | {verb}_{noun} |
extract_gladly_data |
| SQL Migrations | V{x}.{y}.{z}__{desc}.sql |
V1.2.0__create_table.sql |
Next Steps¶
- Architecture Overview -- Understand the system design
- Snowflake Medallion Layers -- Layer naming in practice
- ELT Layer Architecture -- Layer responsibilities
- Architecture Decisions -- ADRs that shaped these conventions
Related Documentation: