Skip to content

Snowflake S3 Stage Setup Guide

Overview

The S3 external stage is shared infrastructure located in CORE_DB and used by all client databases (WBP_DB, CND_DB, etc.) for loading data from S3 to Snowflake BRONZE tables.

Architecture

CORE_DB_DEV.BRONZE.XO_S3_STAGE
    Points to S3 bucket (via storage integration)
Used by:
- WBP_DB_DEV.BRONZE.GLADLY_CONTACT_TIMESTAMPS
- WBP_DB_DEV.BRONZE.GLADLY_CONVERSATION_TIMESTAMPS
- CND_DB_DEV.BRONZE.* (future tables)
- etc.

Why BRONZE schema? - Semantic clarity: Stage is specifically for loading BRONZE data - Organized structure: BRONZE infrastructure lives in BRONZE schema - Clean separation: PUBLIC reserved for cross-layer shared resources

Prerequisites

1. S3 Storage Integration

The storage integration must be created by ACCOUNTADMIN before creating the stage:

USE ROLE ACCOUNTADMIN;

CREATE STORAGE INTEGRATION IF NOT EXISTS XO_S3_INTEGRATION
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::YOUR_ACCOUNT_ID:role/snowflake-s3-access-role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://your-bucket-name/');

-- Grant usage to SYSADMIN
GRANT USAGE ON INTEGRATION XO_S3_INTEGRATION TO ROLE SYSADMIN;

Note: The AWS IAM role must trust Snowflake. Run this to get the trust policy:

DESC STORAGE INTEGRATION XO_S3_INTEGRATION;
-- Copy STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID
-- Update AWS IAM role trust relationship

2. S3 Bucket Structure

s3://your-bucket-name/
├── ingest-bucket/           # Raw files from source APIs
│   ├── warbyparker/
│   │   ├── contact_timestamps/
│   │   │   └── 2025-12-01/
│   │   │       └── contact_timestamps_2025-12-01.csv
│   │   └── conversation_timestamps/
│   └── condenast/
│       └── ...
└── stage-bucket/            # Standardized files (column names cleaned)
    ├── warbyparker/
    │   ├── contact_timestamps/
    │   │   └── 2025-12-01/
    │   │       └── contact_timestamps_2025-12-01.csv
    │   └── conversation_timestamps/
    └── condenast/
        └── ...

Deployment

Deploy to DEV

# Deploy CORE_DB_DEV schema and stage
schemachange deploy \
  -f apps/snowflake-bronze/core \
  --connections-file-path .config/connections.toml \
  -c CORE_DB_DEV.SCHEMACHANGE.CHANGE_HISTORY \
  --create-change-history-table \
  --vars '{"DB_SUFFIX": "DEV", "SF_ROLE": "SYSADMIN"}'

Deploy to PROD

# Deploy CORE_DB_PROD schema and stage
schemachange deploy \
  -f apps/snowflake-bronze/core \
  --connections-file-path .config/connections.toml \
  -c CORE_DB_PROD.SCHEMACHANGE.CHANGE_HISTORY \
  --create-change-history-table \
  --vars '{"DB_SUFFIX": "PROD", "SF_ROLE": "SYSADMIN"}'

Verification

Check Stage Exists

-- DEV
SHOW STAGES IN CORE_DB_DEV.BRONZE;
DESC STAGE CORE_DB_DEV.BRONZE.XO_S3_STAGE;

-- PROD
SHOW STAGES IN CORE_DB_PROD.BRONZE;
DESC STAGE CORE_DB_PROD.BRONZE.XO_S3_STAGE;

List Files in Stage

-- List all files
LIST @CORE_DB_DEV.BRONZE.XO_S3_STAGE;

-- List specific path
LIST @CORE_DB_DEV.BRONZE.XO_S3_STAGE/stage-bucket/warbyparker/contact_timestamps/;

Test Cross-Database Access

-- From WBP_DB_DEV context
USE DATABASE WBP_DB_DEV;
USE SCHEMA BRONZE;

-- Should work - cross-database stage reference
SELECT $1, $2, $3
FROM @CORE_DB_DEV.BRONZE.XO_S3_STAGE/stage-bucket/warbyparker/contact_timestamps/
    (FILE_FORMAT => 'CSV_FORMAT')
LIMIT 10;

Usage in xo-foundry

The xo-foundry tasks automatically detect the environment and use the correct stage:

# In snowflake_tasks.py
# Automatically resolves to:
# - CORE_DB_DEV.BRONZE.XO_S3_STAGE (for WBP_DB_DEV, CND_DB_DEV, etc.)
# - CORE_DB_PROD.BRONZE.XO_S3_STAGE (for WBP_DB_PROD, CND_DB_PROD, etc.)

stage_name = "CORE_DB_DEV.BRONZE.XO_S3_STAGE"  # Auto-detected from database config

Airflow Variable Override (Optional)

If you need to override the default stage path:

# In Airflow UI: Admin > Variables
# Key: SNOWFLAKE_STAGE_NAME
# Value: CORE_DB_DEV.BRONZE.XO_S3_STAGE

Permissions

The stage is created in CORE_DB.BRONZE and requires grants for cross-database access.

Grant permissions to Airflow role:

-- Grant USAGE on CORE_DB to role
GRANT USAGE ON DATABASE CORE_DB_DEV TO ROLE XO_AIRFLOW_DEV;
GRANT USAGE ON SCHEMA CORE_DB_DEV.BRONZE TO ROLE XO_AIRFLOW_DEV;
GRANT USAGE ON STAGE CORE_DB_DEV.BRONZE.XO_S3_STAGE TO ROLE XO_AIRFLOW_DEV;

Troubleshooting

Error: "Stage does not exist"

Symptom: 002003 (02000): SQL compilation error: Stage 'XO_S3_STAGE' does not exist

Solution: 1. Verify stage exists: SHOW STAGES IN CORE_DB_DEV.PUBLIC; 2. Check Airflow is using fully qualified name: CORE_DB_DEV.PUBLIC.XO_S3_STAGE 3. Deploy stage migration if missing

Error: "Integration does not exist"

Symptom: 090232 (0A000): Storage integration 'XO_S3_INTEGRATION' does not exist

Solution: 1. Have ACCOUNTADMIN create the storage integration (see Prerequisites) 2. Grant USAGE to SYSADMIN 3. Recreate the stage

Error: "Access Denied" from S3

Symptom: 000001: Failed to download file from S3

Solution: 1. Check IAM role trust policy includes Snowflake 2. Verify IAM role has S3 read permissions 3. Check STORAGE_ALLOWED_LOCATIONS includes your bucket

Migration Files

  • V1.0.1__create_s3_stage.sql: Creates XO_S3_STAGE in CORE_DB_DEV/PROD
  • Located in: apps/snowflake-bronze/core/