# xo-core Package
xo-core v1.5.0 -- The foundation package of the XO-Data platform. Provides core utilities, data extractors, and managers for common data engineering operations.
## Purpose
xo-core serves as the shared utility layer for all XO-Data packages and applications. It contains:
- Extractors: Data source connectors (Gladly API, Sprout Social, Gmail, Google Sheets, S3)
- Managers: Service wrappers (Snowflake, S3, Google Services)
- Utilities: DataFrame operations, file handling, logging
## Installation
```bash
# Install the xo-core package
uv sync --package xo-core

# Add xo-core as a dependency of another workspace package
uv add --package xo-foundry xo-core
```
## Package Structure
```text
packages/xo-core/src/xo_core/
├── extractors/              # Data source connectors
│   ├── gladly_extractor.py  # Gladly API reports
│   ├── sprout_extractor.py  # Sprout Social API
│   ├── gmail_extractor.py   # Gmail attachments
│   ├── gsheets_extractor.py # Google Sheets
│   └── s3_extractor.py      # S3 files
│
├── google_services/         # Google API integrations
│   ├── auth.py              # OAuth2 authentication
│   ├── gmail_manager.py     # Gmail operations
│   ├── gdrive_manager.py    # Google Drive operations
│   └── gsheets_manager.py   # Google Sheets operations (gspread)
│
├── managers/                # Service wrappers
│   ├── snowflake_manager.py # Snowflake operations
│   └── s3_manager.py        # S3 operations
│
├── utils/                   # Shared utilities
│   ├── df_utils.py          # DataFrame operations
│   ├── file_utils.py        # File handling
│   └── logger.py            # Logging configuration
│
└── __init__.py
```
## Key Components
### Extractors
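Data source connectors that extract data from external sources and upload it to S3.

> **No pandas in extraction**
>
> The xo-foundry extraction tasks that wrap these extractors write CSV files with Python's native `csv.DictWriter`, never pandas. pandas' automatic type inference can silently alter values: an integer ID column containing any missing values is read as floats (`12345` becomes `12345.0`), and leading zeros are stripped from codes such as `00123`. Let Snowflake handle type conversion.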
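The following sketch shows why `csv.DictWriter` is safe here. It writes each value exactly as the string it receives, so an ID such as `00123` survives a write/read round trip unchanged. The helper `rows_to_csv` is hypothetical and uses only the standard library. It is not the actual xo-foundry task code.

```python
import csv
import io


def rows_to_csv(rows: list[dict[str, str]], fieldnames: list[str]) -> str:
    """Serialize records to CSV text with no type inference (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"CONTACT_ID": "00123", "AGENT_ID": "987654321012345678"},
    {"CONTACT_ID": "00456", "AGENT_ID": ""},
]
csv_text = rows_to_csv(rows, fieldnames=["CONTACT_ID", "AGENT_ID"])

# Reading back with csv.DictReader returns the original strings unchanged:
# leading zeros are kept and the empty value stays "", not NaN.
round_trip = list(csv.DictReader(io.StringIO(csv_text)))
print(round_trip[0]["CONTACT_ID"])  # 00123
```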
#### GladlyExtractor
Extracts reports from the Gladly API.
```python
from xo_core.extractors.gladly_extractor import GladlyExtractor

extractor = GladlyExtractor(
    api_user="user@example.com",
    api_token="your-token",
    org="your-org",
)

# Extract the ContactTimestampsReport for a single day
report_data = extractor.extract_report(
    report_type="ContactTimestampsReport",
    start_date="2026-01-15",
    end_date="2026-01-15",
)
```
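The call above requests a single day, because `start_date` equals `end_date`. One way to backfill a longer range is to issue one request per day. The sketch below assumes a hypothetical helper `daily_windows`, which is not part of xo-core. It yields per-day `(start_date, end_date)` pairs in the same `YYYY-MM-DD` format, and each pair could then be passed to `extract_report` inside a loop.

```python
from collections.abc import Iterator
from datetime import date, timedelta


def daily_windows(start: str, end: str) -> Iterator[tuple[str, str]]:
    """Yield one (start_date, end_date) pair per day, inclusive of both ends."""
    current = date.fromisoformat(start)
    last = date.fromisoformat(end)
    if current > last:
        raise ValueError(f"start {start} is after end {end}")
    while current <= last:
        day = current.isoformat()
        yield day, day
        current += timedelta(days=1)


windows = list(daily_windows("2026-01-30", "2026-02-02"))
print(windows)
# [('2026-01-30', '2026-01-30'), ('2026-01-31', '2026-01-31'),
#  ('2026-02-01', '2026-02-01'), ('2026-02-02', '2026-02-02')]
```

Per-day requests keep each extract small and make a failed day easy to retry on its own. The trade-off is more API calls than a single multi-day request.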
#### SproutExtractor
Extracts data from the Sprout Social API.
```python
from xo_core.extractors.sprout_extractor import SproutExtractor

extractor = SproutExtractor(
    api_token="your-token",
    customer_id="your-customer-id",
)

# Extract messages
messages = extractor.extract_messages(
    start_date="2026-01-15",
    end_date="2026-01-15",
)
```
#### GmailExtractor
Extracts email attachments from Gmail.
```python
from xo_core.extractors.gmail_extractor import GmailExtractor

extractor = GmailExtractor(credentials_path="credentials.json")

# Extract attachments from messages under a label
attachments = extractor.extract_attachments(
    label="Reports/Daily",
    start_date="2026-01-15",
)
```
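Gmail's search syntax expects dates in `YYYY/MM/DD` form, as in `after:2026/01/15`. The examples on this page use ISO `YYYY-MM-DD` strings instead. This page does not document whether `GmailExtractor` converts between the two formats internally. The hypothetical helper below sketches how an ISO date could be turned into a search query for messages with attachments, using Gmail's standard `after:` and `has:attachment` operators.

```python
from datetime import date


def gmail_attachment_query(after_date: str) -> str:
    """Build a Gmail search query for messages with attachments after an ISO date.

    Hypothetical helper: converts "YYYY-MM-DD" into Gmail's "YYYY/MM/DD" format.
    """
    day = date.fromisoformat(after_date)  # also validates the input date
    return f"has:attachment after:{day.strftime('%Y/%m/%d')}"


print(gmail_attachment_query("2026-01-15"))  # has:attachment after:2026/01/15
```

The Gmail API's `users.messages.list` method accepts a query string like this in its `q` parameter.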
#### GSheetsExtractor
Extracts data from Google Sheets.
```python
from xo_core.extractors.gsheets_extractor import GSheetsExtractor

extractor = GSheetsExtractor(credentials_path="credentials.json")

# Extract sheet data
df = extractor.extract_sheet(
    spreadsheet_id="abc123",
    sheet_name="Data",
    range="A1:Z1000",
)
```
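The `range` argument uses Google Sheets A1 notation. If the number of columns is only known at runtime, the column letters can be computed with bijective base-26 numbering, where 1 maps to `A`, 26 to `Z`, and 27 to `AA`. The helpers `column_letter` and `a1_range` below are a hypothetical sketch and are not part of `GSheetsExtractor`.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column index to its A1 letter(s): 1 -> A, 27 -> AA."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        # Shift to 0-based before dividing: there is no "zero" letter in A1 notation
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def a1_range(n_cols: int, n_rows: int) -> str:
    """Build a range starting at A1 that spans n_cols columns and n_rows rows."""
    return f"A1:{column_letter(n_cols)}{n_rows}"


print(a1_range(26, 1000))  # A1:Z1000
```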
#### S3Extractor
Extracts files from S3 buckets.
```python
from xo_core.extractors.s3_extractor import S3Extractor

extractor = S3Extractor()

# Download a CSV object and read it into a DataFrame
df = extractor.extract_csv(
    bucket="xo-ingest",
    key="client/report/data.csv",
)
```
### Managers
Service wrappers that provide high-level operations for external services.
#### SnowflakeManager
Snowflake connector wrapper with DataFrame operations and deduplication.
```python
from xo_core.managers.snowflake_manager import SnowflakeManager

manager = SnowflakeManager()

# Prepare the DataFrame for the target table; filter_existing=True
# drops rows that already exist there (deduplication)
prepped_df = manager.prep_dataframe_for_table(
    df=raw_df,
    table_name="GLADLY_CONTACT_TIMESTAMPS",
    filter_existing=True,
)

# Upload the prepared DataFrame
manager.upload_dataframe(
    prepped_df,
    table_name="GLADLY_CONTACT_TIMESTAMPS",
    database="WBP_DB",
    schema="BRONZE",
)

# Execute a query
results = manager.execute_query(
    "SELECT COUNT(*) FROM WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS"
)

# Download a table into a DataFrame
df = manager.download_table("DATABASE.SCHEMA.TABLE")
```
Key Features:
- Automatic deduplication (single-field or multi-field)
- Type-safe DataFrame uploads
- Connection pooling
- Prepared statement support
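The deduplication feature, enabled by `filter_existing=True` in the example above, can be thought of as a set-membership filter on a key built from one or more fields. The function `filter_new_rows` below is a hypothetical pure-Python sketch of that idea and is not the `SnowflakeManager` implementation. In practice the existing keys would come from a query against the target table.

```python
from collections.abc import Iterable, Sequence


def filter_new_rows(
    rows: Iterable[dict[str, str]],
    existing_keys: set[tuple[str, ...]],
    key_fields: Sequence[str],
) -> list[dict[str, str]]:
    """Keep only rows whose key is not already present.

    A single-field key is a 1-tuple; a multi-field key combines several columns.
    Duplicates within the incoming batch are also dropped (first one wins).
    """
    seen = set(existing_keys)
    new_rows = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            new_rows.append(row)
    return new_rows


incoming = [
    {"CONTACT_ID": "c1", "EVENT": "created"},
    {"CONTACT_ID": "c1", "EVENT": "closed"},
    {"CONTACT_ID": "c2", "EVENT": "created"},
]
existing = {("c1", "created")}

# Multi-field key: (CONTACT_ID, EVENT)
print(filter_new_rows(incoming, existing, ["CONTACT_ID", "EVENT"]))
# [{'CONTACT_ID': 'c1', 'EVENT': 'closed'}, {'CONTACT_ID': 'c2', 'EVENT': 'created'}]
```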
#### S3Manager
S3 operations with DataFrame support.
```python
from xo_core.managers.s3_manager import S3Manager

manager = S3Manager()

# Upload a DataFrame
manager.upload_dataframe(
    df,
    bucket="xo-ingest",
    key="client/report/2026-01-15/data.csv",
)

# Download an object into a DataFrame
df = manager.download_to_dataframe(
    bucket="xo-ingest",
    key="client/report/2026-01-15/data.csv",
)

# List files under a prefix
files = manager.list_files(bucket="xo-ingest", prefix="client/")
```
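The keys in these examples follow a `client/report/YYYY-MM-DD/filename` layout. The examples imply this convention but do not state it. Assuming it holds, the hypothetical helper `build_s3_key` below builds and validates such keys so that date partitions stay consistent. As in the examples, `client` and `report` are placeholders.

```python
from datetime import date


def build_s3_key(
    client: str, report: str, report_date: str, filename: str = "data.csv"
) -> str:
    """Build an S3 key of the form client/report/YYYY-MM-DD/filename (hypothetical)."""
    day = date.fromisoformat(report_date)  # rejects malformed dates early
    for part in (client, report, filename):
        # A "/" inside a component would silently add an extra path level
        if not part or "/" in part:
            raise ValueError(f"invalid key component: {part!r}")
    return f"{client}/{report}/{day.isoformat()}/{filename}"


key = build_s3_key("client", "report", "2026-01-15")
print(key)  # client/report/2026-01-15/data.csv
```

Under this layout, the prefix passed to `list_files` is the key truncated at a segment boundary, for example `client/` or `client/report/`.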
### Google Services
#### GSheetsManager
Google Sheets operations via the gspread library.
```python
from xo_core.google_services.gsheets_manager import GSheetsManager

gsheets = GSheetsManager(credentials_path="credentials.json")

# Read spreadsheet
data = gsheets.read_sheet(spreadsheet_id="abc123", sheet_name="Data")
```
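Sheet data usually arrives as rows of cell values. gspread's `Worksheet.get_all_values()`, for example, returns a list of lists of strings. Whether `read_sheet` returns the same shape is an assumption here. The hypothetical helper `rows_to_records` below turns such rows into header-keyed dictionaries and keeps every cell as a string, in the same spirit as the no-pandas rule for extraction.

```python
def rows_to_records(values: list[list[str]]) -> list[dict[str, str]]:
    """Convert [header, row, row, ...] into a list of dicts keyed by header.

    Short rows are padded with empty strings so every record has every column;
    cells beyond the header width are dropped by zip().
    """
    if not values:
        return []
    header, *rows = values
    records = []
    for row in rows:
        padded = row + [""] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


values = [["ID", "NAME"], ["007", "Ada"], ["042"]]
print(rows_to_records(values))
# [{'ID': '007', 'NAME': 'Ada'}, {'ID': '042', 'NAME': ''}]
```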
#### GmailManager
Gmail API operations.
```python
from xo_core.google_services.gmail_manager import GmailManager

gmail = GmailManager(credentials_path="credentials.json")

# List messages
messages = gmail.list_messages(label="Reports", after_date="2026-01-15")

# Get attachments
attachments = gmail.get_attachments(message_id="abc123")
```
### Utilities
#### df_utils
DataFrame cleaning and standardization utilities.
```python
from xo_core.utils.df_utils import (
    clean_column_names,
    clean_dataframe,
    datetime_columns_handler,
    convert_to_nullable_int_columns,
    generate_record_key,
)

# Clean column names (UPPERCASE, special characters replaced)
df.columns = clean_column_names(df.columns)

# Comprehensive DataFrame cleaning
df = clean_dataframe(
    df,
    uppercase_columns=True,
    datetime_columns=["CREATED_AT"],
    auto_detect_int_columns=True,
    replace_empty_strings=True,
)
```
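`clean_column_names()` is described as uppercasing names and replacing special characters. The sketch below shows that kind of normalization. The exact rules xo-core applies, such as how it handles leading digits, are not documented here, so treat this only as an illustration. The function name `normalize_column_name` is hypothetical.

```python
import re


def normalize_column_name(name: str) -> str:
    """Uppercase a name and replace runs of non-alphanumeric characters with '_'."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    return cleaned.strip("_").upper()


columns = ["Contact ID", "created-at", "  Agent Name (Full) ", "SLA %"]
print([normalize_column_name(c) for c in columns])
# ['CONTACT_ID', 'CREATED_AT', 'AGENT_NAME_FULL', 'SLA']
```

Uppercase, underscore-separated names match Snowflake's default handling of unquoted identifiers, so the cleaned columns can be referenced without quoting.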
Key Functions:

| Function | Purpose |
|---|---|
| `clean_column_names()` | Standardize column names (UPPERCASE, `_` separators) |
| `clean_dataframe()` | All-in-one DataFrame cleaning |
| `datetime_columns_handler()` | Convert columns to datetime with timezone handling |
| `convert_to_nullable_int_columns()` | Convert columns to nullable `Int64` (allows nulls) |
| `replace_empty_strings_with_na()` | Replace `""` with NA |
| `drop_empty_columns()` | Remove columns whose values are all null |
| `generate_record_key()` | Generate unique keys for deduplication |
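`generate_record_key()` produces keys used for deduplication. A common way to build such a key is to hash the values of the identifying fields in a fixed order. The sketch below does this with SHA-256 from `hashlib`. It is a hypothetical illustration: this page does not document the fields, separator, or hash function that xo-core actually uses.

```python
import hashlib
from collections.abc import Mapping, Sequence


def record_key(record: Mapping[str, object], fields: Sequence[str], sep: str = "|") -> str:
    """Return a deterministic hex key from the given fields, in the given order.

    None (or a missing field) is encoded as an empty string. The separator keeps
    ("ab", "c") and ("a", "bc") distinct, as long as values never contain it.
    """
    parts = ["" if record.get(f) is None else str(record.get(f)) for f in fields]
    return hashlib.sha256(sep.join(parts).encode("utf-8")).hexdigest()


row = {"CONTACT_ID": "c1", "EVENT": "created", "TS": "2026-01-15T10:00:00Z"}
key = record_key(row, ["CONTACT_ID", "EVENT", "TS"])
print(len(key))  # 64
```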
#### logger
Consistent logging configuration.
```python
from xo_core.utils.logger import get_module_logger

logger = get_module_logger(__name__)

logger.info("Processing started")
logger.warning("Missing optional field")
# Call from inside an except block so exc_info=True can attach the traceback
logger.error("Failed to connect", exc_info=True)
```
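This page does not show how `get_module_logger` is implemented. A factory like it typically wraps the standard `logging` module and avoids attaching duplicate handlers when it is called more than once for the same name. The function `get_logger` below is a hypothetical sketch under those assumptions, and its format string is illustrative only.

```python
import logging
import sys

_FORMAT = "%(asctime)s %(levelname)s %(name)s: %(message)s"


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a named logger with a single stderr handler (hypothetical factory)."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter(_FORMAT))
        logger.addHandler(handler)
        logger.setLevel(level)
        logger.propagate = False  # prevent duplicate output via the root logger
    return logger


log = get_logger("xo_core.example")
log.info("Processing started")
```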
## Type Safety
xo-core is fully typed and must pass `ty` (Astral's Python type checker) with zero errors. Every function signature is annotated, for example:

```python
import pandas as pd


def process_data(items: list[dict[str, str]]) -> pd.DataFrame:
    """Process data with proper typing."""
    return pd.DataFrame(items)
```
## Testing
```bash
# Run xo-core tests
uv run pytest packages/xo-core/tests/

# Run a specific test file
uv run pytest packages/xo-core/tests/test_df_utils.py

# Run with coverage (requires the pytest-cov plugin)
uv run pytest --cov=xo_core packages/xo-core/tests/
```
## Dependencies
Key dependencies:

- `pandas>=2.0` -- DataFrame operations
- `snowflake-connector-python` -- Snowflake connectivity
- `boto3` -- AWS S3 operations
- `google-auth` -- Google API authentication
- `gspread` -- Google Sheets API
- `requests` -- HTTP client for APIs
- `tqdm` -- Progress bars
## Next Steps
- xo-foundry Package -- Orchestration layer
- ELT Pipeline Flow -- Pipeline architecture
- Naming Conventions -- Standards
Package Location: packages/xo-core/
Version: 1.5.0
Dependencies: See packages/xo-core/pyproject.toml