Snowflake S3 Stage Setup Guide¶
Overview¶
The S3 external stage is shared infrastructure that lives in CORE_DB (CORE_DB_DEV / CORE_DB_PROD) and is used by all client databases (WBP_DB, CND_DB, etc.) for loading data from S3 into Snowflake BRONZE tables.
Architecture¶
CORE_DB_DEV.BRONZE.XO_S3_STAGE
↓
Points to S3 bucket (via storage integration)
↓
Used by:
- WBP_DB_DEV.BRONZE.GLADLY_CONTACT_TIMESTAMPS
- WBP_DB_DEV.BRONZE.GLADLY_CONVERSATION_TIMESTAMPS
- CND_DB_DEV.BRONZE.* (future tables)
- etc.
Why BRONZE schema?
- Semantic clarity: the stage exists specifically for loading BRONZE data
- Organized structure: BRONZE infrastructure lives in the BRONZE schema
- Clean separation: PUBLIC is reserved for cross-layer shared resources
Prerequisites¶
1. S3 Storage Integration¶
The storage integration must be created by ACCOUNTADMIN before creating the stage:
USE ROLE ACCOUNTADMIN;
CREATE STORAGE INTEGRATION IF NOT EXISTS XO_S3_INTEGRATION
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::YOUR_ACCOUNT_ID:role/snowflake-s3-access-role'
STORAGE_ALLOWED_LOCATIONS = ('s3://your-bucket-name/');
-- Grant usage to SYSADMIN
GRANT USAGE ON INTEGRATION XO_S3_INTEGRATION TO ROLE SYSADMIN;
Note: The AWS IAM role must trust Snowflake. Run this to get the values needed for the trust policy:
DESC STORAGE INTEGRATION XO_S3_INTEGRATION;
-- Copy STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID
-- Update AWS IAM role trust relationship
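The trust relationship on the AWS side typically follows the pattern below. This is a sketch, not your exact policy: the principal ARN and external ID are placeholders that must be replaced with the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID values returned by DESC STORAGE INTEGRATION.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<STORAGE_AWS_IAM_USER_ARN from DESC output>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID from DESC output>"
        }
      }
    }
  ]
}
```

Attach this as the trust policy of the `snowflake-s3-access-role` IAM role referenced in STORAGE_AWS_ROLE_ARN.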
2. S3 Bucket Structure¶
s3://your-bucket-name/
├── ingest-bucket/ # Raw files from source APIs
│ ├── warbyparker/
│ │ ├── contact_timestamps/
│ │ │ └── 2025-12-01/
│ │ │ └── contact_timestamps_2025-12-01.csv
│ │ └── conversation_timestamps/
│ └── condenast/
│ └── ...
└── stage-bucket/ # Standardized files (column names cleaned)
├── warbyparker/
│ ├── contact_timestamps/
│ │ └── 2025-12-01/
│ │ └── contact_timestamps_2025-12-01.csv
│ └── conversation_timestamps/
└── condenast/
└── ...
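The stage itself is created by the schemachange migration described under Deployment. For reference, a minimal sketch of the DDL looks like the following; the migration file (V1.0.1__create_s3_stage.sql) is authoritative, and the bucket URL is a placeholder.

```sql
USE ROLE SYSADMIN;

-- Stage lives in the BRONZE schema and points at S3 via the integration
CREATE STAGE IF NOT EXISTS CORE_DB_DEV.BRONZE.XO_S3_STAGE
  URL = 's3://your-bucket-name/'
  STORAGE_INTEGRATION = XO_S3_INTEGRATION;
```

The PROD variant is identical with CORE_DB_PROD substituted for CORE_DB_DEV.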
Deployment¶
Deploy to DEV¶
# Deploy CORE_DB_DEV schema and stage
schemachange deploy \
-f apps/snowflake-bronze/core \
--connections-file-path .config/connections.toml \
-c CORE_DB_DEV.SCHEMACHANGE.CHANGE_HISTORY \
--create-change-history-table \
--vars '{"DB_SUFFIX": "DEV", "SF_ROLE": "SYSADMIN"}'
Deploy to PROD¶
# Deploy CORE_DB_PROD schema and stage
schemachange deploy \
-f apps/snowflake-bronze/core \
--connections-file-path .config/connections.toml \
-c CORE_DB_PROD.SCHEMACHANGE.CHANGE_HISTORY \
--create-change-history-table \
--vars '{"DB_SUFFIX": "PROD", "SF_ROLE": "SYSADMIN"}'
Verification¶
Check Stage Exists¶
-- DEV
SHOW STAGES IN CORE_DB_DEV.BRONZE;
DESC STAGE CORE_DB_DEV.BRONZE.XO_S3_STAGE;
-- PROD
SHOW STAGES IN CORE_DB_PROD.BRONZE;
DESC STAGE CORE_DB_PROD.BRONZE.XO_S3_STAGE;
List Files in Stage¶
-- List all files
LIST @CORE_DB_DEV.BRONZE.XO_S3_STAGE;
-- List specific path
LIST @CORE_DB_DEV.BRONZE.XO_S3_STAGE/stage-bucket/warbyparker/contact_timestamps/;
Test Cross-Database Access¶
-- From WBP_DB_DEV context
USE DATABASE WBP_DB_DEV;
USE SCHEMA BRONZE;
-- Should work - cross-database stage reference
SELECT $1, $2, $3
FROM @CORE_DB_DEV.BRONZE.XO_S3_STAGE/stage-bucket/warbyparker/contact_timestamps/
(FILE_FORMAT => 'CSV_FORMAT')
LIMIT 10;
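The test query above assumes a named file format called CSV_FORMAT is resolvable from the current schema. If it does not exist yet, a typical definition looks like the sketch below; the options shown are illustrative and should be matched to your actual files.

```sql
-- Illustrative CSV file format; adjust options to match your source files
CREATE FILE FORMAT IF NOT EXISTS CSV_FORMAT
  TYPE = 'CSV'
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  SKIP_HEADER = 1;
```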
Usage in xo-foundry¶
The xo-foundry tasks automatically detect the environment and use the correct stage:
# In snowflake_tasks.py
# Automatically resolves to:
# - CORE_DB_DEV.BRONZE.XO_S3_STAGE (for WBP_DB_DEV, CND_DB_DEV, etc.)
# - CORE_DB_PROD.BRONZE.XO_S3_STAGE (for WBP_DB_PROD, CND_DB_PROD, etc.)
stage_name = "CORE_DB_DEV.BRONZE.XO_S3_STAGE" # Auto-detected from database config
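A minimal sketch of how that resolution could work is shown below. Note that `resolve_stage_name` is a hypothetical helper for illustration, not the actual xo-foundry implementation.

```python
def resolve_stage_name(database: str) -> str:
    """Map a client database (e.g. WBP_DB_DEV) to the shared CORE_DB stage.

    Hypothetical helper: the real xo-foundry logic may differ.
    """
    # The environment suffix is the last underscore-separated token: DEV or PROD
    suffix = database.rsplit("_", 1)[-1].upper()
    if suffix not in ("DEV", "PROD"):
        raise ValueError(f"Unexpected environment suffix in {database!r}")
    return f"CORE_DB_{suffix}.BRONZE.XO_S3_STAGE"

print(resolve_stage_name("WBP_DB_DEV"))   # CORE_DB_DEV.BRONZE.XO_S3_STAGE
print(resolve_stage_name("CND_DB_PROD"))  # CORE_DB_PROD.BRONZE.XO_S3_STAGE
```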
Airflow Variable Override (Optional)¶
If you need to override the default stage path:
# In Airflow UI: Admin > Variables
# Key: SNOWFLAKE_STAGE_NAME
# Value: CORE_DB_DEV.BRONZE.XO_S3_STAGE
Permissions¶
The stage is created in CORE_DB.BRONZE and requires grants for cross-database access.
Grant permissions to Airflow role:
-- Grant USAGE on CORE_DB to role
GRANT USAGE ON DATABASE CORE_DB_DEV TO ROLE XO_AIRFLOW_DEV;
GRANT USAGE ON SCHEMA CORE_DB_DEV.BRONZE TO ROLE XO_AIRFLOW_DEV;
GRANT USAGE ON STAGE CORE_DB_DEV.BRONZE.XO_S3_STAGE TO ROLE XO_AIRFLOW_DEV;
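To confirm the grants landed, inspect them from either direction:

```sql
-- What has been granted on the stage
SHOW GRANTS ON STAGE CORE_DB_DEV.BRONZE.XO_S3_STAGE;

-- What the Airflow role can see
SHOW GRANTS TO ROLE XO_AIRFLOW_DEV;
```

For PROD, repeat the grants and checks against CORE_DB_PROD and the corresponding PROD Airflow role.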
Troubleshooting¶
Error: "Stage does not exist"¶
Symptom: 002003 (02000): SQL compilation error: Stage 'XO_S3_STAGE' does not exist
Solution:
1. Verify the stage exists: SHOW STAGES IN CORE_DB_DEV.BRONZE;
2. Check Airflow is using the fully qualified name: CORE_DB_DEV.BRONZE.XO_S3_STAGE
3. Deploy stage migration if missing
Error: "Integration does not exist"¶
Symptom: 090232 (0A000): Storage integration 'XO_S3_INTEGRATION' does not exist
Solution:
1. Have ACCOUNTADMIN create the storage integration (see Prerequisites)
2. Grant USAGE to SYSADMIN
3. Recreate the stage
Error: "Access Denied" from S3¶
Symptom: 000001: Failed to download file from S3
Solution:
1. Check the IAM role trust policy includes Snowflake
2. Verify the IAM role has S3 read permissions
3. Check STORAGE_ALLOWED_LOCATIONS includes your bucket
Migration Files¶
- V1.0.1__create_s3_stage.sql: Creates XO_S3_STAGE in CORE_DB_DEV/PROD
- Located in: apps/snowflake-bronze/core/