ADR-007: Staging Environment Architecture with Docker Compose

Date: July 20, 2025
Status: Accepted
Deciders: Claude Code (Architecture Assistant), Development Team

Context

PM-038 required a production-grade staging environment to validate MCP integration performance improvements and deployment procedures. The existing development setup lacked the infrastructure complexity and monitoring capabilities needed to validate production readiness.

Pre-existing Development Environment

Production Readiness Requirements

Decision

Implement a production-grade staging environment using Docker Compose, with comprehensive monitoring, automated deployment, and rollback capabilities.

Architecture Overview

Multi-Service Docker Compose Architecture

# 7 Core Services + 2 Monitoring Services (9 total)
services:
  - postgres-staging      # Database persistence
  - redis-staging         # Cache and session storage
  - chromadb-staging      # Vector database
  - temporal-staging      # Workflow orchestration
  - api-staging           # Main application API
  - web-staging           # User interface
  - nginx-staging         # Load balancer/reverse proxy
  - prometheus-staging    # Metrics collection
  - grafana-staging       # Monitoring dashboards

Network Architecture

Data Persistence Strategy

Implementation Details

Service Configuration

1. Database Services

postgres-staging:
  image: postgres:15
  environment:
    POSTGRES_INITDB_ARGS: "--auth-host=scram-sha-256"
  ports: ["5434:5432"]  # Isolated from development (5433)
  volumes:
    - piper_postgres_staging_data:/var/lib/postgresql/data
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
  deploy:
    resources:
      limits: {memory: 1G, cpus: '1.0'}
      reservations: {memory: 512M, cpus: '0.5'}

2. Application Services

api-staging:
  build:
    context: .
    dockerfile: Dockerfile.staging
  environment:
    # MCP Production Configuration
    - ENABLE_MCP_FILE_SEARCH=true
    - USE_MCP_POOL=true
    - MCP_POOL_MAX_CONNECTIONS=10
    - MCP_CIRCUIT_BREAKER_ENABLED=true
  depends_on:
    postgres-staging: {condition: service_healthy}
    redis-staging: {condition: service_healthy}
    chromadb-staging: {condition: service_healthy}

3. Monitoring Stack

prometheus-staging:
  image: prom/prometheus:latest
  command:
    - '--storage.tsdb.retention.time=30d'
    - '--web.enable-lifecycle'
  volumes:
    - ./config/staging/prometheus.yml:/etc/prometheus/prometheus.yml:ro

grafana-staging:
  image: grafana/grafana:latest
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=staging_grafana_admin_2025
    - GF_USERS_ALLOW_SIGN_UP=false
  volumes:
    - ./config/staging/grafana/dashboards:/etc/grafana/provisioning/dashboards:ro

Environment Configuration

1. Staging-Specific Environment Variables

# Application Environment
APP_ENV=staging
APP_DEBUG=false
LOG_LEVEL=INFO

# MCP Integration (Production-ready)
ENABLE_MCP_FILE_SEARCH=true
USE_MCP_POOL=true
MCP_POOL_MAX_CONNECTIONS=10
MCP_CIRCUIT_BREAKER_ENABLED=true
MCP_CONTENT_SCORING_ENABLED=true

# Infrastructure
POSTGRES_HOST=localhost
POSTGRES_PORT=5434
POSTGRES_DB=piper_morgan_staging
REDIS_HOST=localhost
REDIS_PORT=6380

# Security
POSTGRES_PASSWORD=staging_secure_password_2025
REDIS_PASSWORD=staging_redis_secure_2025
SECRET_KEY=staging_secret_key_2025
JWT_SECRET_KEY=staging_jwt_secret_2025

2. Feature Flag Configuration

# Production Features Enabled for Staging
ENABLE_CLARIFYING_QUESTIONS=true
ENABLE_MULTI_REPO=true
ENABLE_RATE_LIMITING=true

# Development Features Disabled
ENABLE_LEARNING=false  # Keep disabled for staging
ENABLE_DEBUG_ENDPOINTS=false
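
The flag values above lend themselves to an automated check against the environment file. The following is a minimal sketch, assuming the flags live in a KEY=VALUE file such as .env.staging; the helper names env_value and check_flags are hypothetical and not part of the project's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the project): compare expected flag values
# with the contents of a KEY=VALUE env file.

# Print the value assigned to a key. Trailing "# comments" and whitespace are
# dropped, and the last assignment wins. Values that contain "#" would be
# truncated, so this is only suitable for simple flags.
env_value() {
  local file=$1 key=$2
  sed -n "s/^${key}=\([^#]*\).*/\1/p" "$file" | tail -n 1 | sed 's/[[:space:]]*$//'
}

# Return non-zero if any KEY=VALUE argument does not match the file.
check_flags() {
  local file=$1 pair key want got status=0
  shift
  for pair in "$@"; do
    key=${pair%%=*}
    want=${pair#*=}
    got=$(env_value "$file" "$key")
    if [ "$got" != "$want" ]; then
      echo "MISMATCH ${key}: expected '${want}', found '${got}'" >&2
      status=1
    fi
  done
  return "$status"
}

# Example:
#   check_flags .env.staging ENABLE_RATE_LIMITING=true ENABLE_LEARNING=false
```

Reporting every mismatch before returning, rather than stopping at the first one, keeps a single run useful when several flags have drifted.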

Deployment Automation

1. Automated Deployment Script

#!/bin/bash
# scripts/deploy_staging.sh
set -euo pipefail  # Abort on the first failed phase instead of continuing

# Phase 1: Infrastructure Services (30s)
docker-compose -f docker-compose.staging.yml up -d \
  postgres-staging redis-staging chromadb-staging
sleep 30

# Phase 2: Application Services (45s)
docker-compose -f docker-compose.staging.yml up -d \
  api-staging web-staging
sleep 45

# Phase 3: Monitoring and Proxy (15s)
docker-compose -f docker-compose.staging.yml up -d \
  nginx-staging prometheus-staging grafana-staging
sleep 15

# Phase 4: Verification
./scripts/verify_staging_deployment.sh
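
The fixed sleep intervals in the script could be replaced by polling until a readiness command succeeds. This is a sketch under stated assumptions rather than the project's actual script: wait_for is a hypothetical helper, and the example URL reuses the API health endpoint shown elsewhere in this ADR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or a timeout expires.
# Usage: wait_for <timeout_seconds> <interval_seconds> <command> [args...]
wait_for() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( SECONDS + timeout ))
  # Run the command quietly; stop as soon as it exits with status 0.
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Example (assumes curl is installed on the host):
#   wait_for 60 2 curl -fsS http://localhost:8001/health
```

Polling lets a phase finish as soon as its services respond and fails loudly when they never do, where a fixed sleep would either wait too long or not long enough.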

2. Comprehensive Verification System

# 14 verification categories
test_categories=(
  "basic_connectivity"
  "health_endpoints"
  "mcp_integration"
  "api_functionality"
  "performance"
  "database_connectivity"
  "redis_connectivity"
  "chromadb_connectivity"
  "container_health"
  "monitoring_stack"
  "security_headers"
  "environment_variables"
  "log_collection"
  "data_persistence"
)
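
One way to drive the category list is a small runner that calls one function per category and tallies the results. This is a minimal sketch: the check_<category> naming convention and the run_categories function are assumptions for illustration, not the contents of verify_staging_deployment.sh.

```shell
#!/usr/bin/env bash
# Hypothetical runner: calls check_<category> for each name and tallies results.
# A category without a matching function is reported as a failure.
run_categories() {
  local passed=0 failed=0 category
  for category in "$@"; do
    if declare -F "check_${category}" >/dev/null && "check_${category}"; then
      echo "PASS ${category}"
      passed=$(( passed + 1 ))
    else
      echo "FAIL ${category}"
      failed=$(( failed + 1 ))
    fi
  done
  echo "passed=${passed} failed=${failed}"
  # Exit status is 0 only when every category passed.
  (( failed == 0 ))
}

# Example with a stand-in check (real checks would probe the services):
#   check_basic_connectivity() { curl -fsS http://localhost:8001/health >/dev/null; }
#   run_categories "${test_categories[@]}"
```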

Health Monitoring Integration

1. Kubernetes-Style Health Probes

# All services include comprehensive health checks
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8001/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 60s

2. Service-Specific Health Endpoints

Resource Management

1. Container Resource Limits

# Production-grade resource allocation
api-staging:
  deploy:
    resources:
      limits: {memory: 2G, cpus: '2.0'}
      reservations: {memory: 1G, cpus: '1.0'}

postgres-staging:
  deploy:
    resources:
      limits: {memory: 1G, cpus: '1.0'}
      reservations: {memory: 512M, cpus: '0.5'}

2. Logging Configuration

# All services use structured logging
logging:
  driver: "json-file"
  options:
    max-size: "50m"
    max-file: "3"
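
The rotation settings put an upper bound on log disk usage, since the json-file driver rotates each container's logs independently. The arithmetic can be sketched as follows; max_log_usage is a hypothetical helper, and the container count of nine is taken from the service list in the architecture overview.

```shell
#!/usr/bin/env bash
# Upper bound on json-file log usage: max-size x max-file x container count.
# The result is in the same unit as max-size (the "m" suffix in the config).
max_log_usage() {
  local max_size=$1 max_file=$2 containers=$3
  echo $(( max_size * max_file * containers ))
}

max_log_usage 50 3 1   # one container: prints 150
max_log_usage 50 3 9   # nine containers: prints 1350
```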

Rollback Strategy

Automated Rollback System

1. Emergency Rollback (30 seconds)

# Complete environment shutdown
docker-compose -f docker-compose.staging.yml down

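# Re-pull and restart the images referenced by the compose file
# (restores the previous version only if the file pins the previous image tags)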
docker-compose -f docker-compose.staging.yml pull
docker-compose -f docker-compose.staging.yml up -d

2. Safe Rollback with Data Preservation

# Automated rollback script with data backup
./scripts/rollback_staging.sh --preserve-data --version=previous

3. Rollback Decision Matrix

| Issue Type | Severity | Time Limit | Action |
|------------|----------|------------|--------|
| Health check failures | High | 5 minutes | Application rollback |
| MCP performance issues | Medium | 10 minutes | Feature disable |
| Database corruption | Critical | 2 minutes | Full rollback |
| Security breach | Critical | 30 seconds | Infrastructure shutdown |
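
The decision matrix above can be encoded as a lookup so that alerting or rollback tooling applies it consistently. This is a sketch only: the function name rollback_decision and the snake_case issue identifiers are hypothetical, while the time limits (converted to seconds) and actions come from the matrix.

```shell
#!/usr/bin/env bash
# Hypothetical lookup: maps an issue type to "<time limit in seconds> <action>".
# The keys are illustrative identifiers; limits and actions mirror the matrix.
rollback_decision() {
  case "$1" in
    health_check_failures)  echo "300 application_rollback" ;;
    mcp_performance_issues) echo "600 feature_disable" ;;
    database_corruption)    echo "120 full_rollback" ;;
    security_breach)        echo "30 infrastructure_shutdown" ;;
    *) echo "unknown issue type: $1" >&2; return 1 ;;
  esac
}

# Example: split the result into its two fields.
#   read -r limit action <<< "$(rollback_decision database_corruption)"
```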

Data Preservation Procedures

1. Automated Backup Before Rollback

# Database backup (-T disables TTY allocation so the redirected dump is not altered)
docker-compose -f docker-compose.staging.yml exec -T postgres-staging \
  pg_dump -U piper piper_morgan_staging > \
  backups/pre_rollback_$(date +%Y%m%d_%H%M%S).sql

# Configuration backup
cp .env.staging backups/env_staging_$(date +%Y%m%d).backup

2. Volume Snapshot Strategy

# Docker volume backup
docker run --rm -v piper_postgres_staging_data:/data \
  -v "$(pwd)/backups:/backup" ubuntu \
  tar czf /backup/postgres_data_$(date +%Y%m%d).tar.gz /data
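
A volume archive is only useful for rollback if it can be read back, so it may be worth checking each archive right after it is written. The following is a minimal sketch; verify_backup is a hypothetical helper, and it confirms only that the archive is non-empty and listable, not that the database files inside are consistent.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeeds only if the archive exists, is non-empty,
# and can be listed by tar (gzip stream and tar headers are readable).
verify_backup() {
  local archive=$1
  [ -s "$archive" ] || { echo "missing or empty: $archive" >&2; return 1; }
  tar -tzf "$archive" >/dev/null 2>&1 || { echo "unreadable: $archive" >&2; return 1; }
}

# Example:
#   verify_backup "backups/postgres_data_$(date +%Y%m%d).tar.gz"
```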

Monitoring and Observability

Prometheus Metrics Collection

1. Application Metrics

# prometheus.yml configuration
scrape_configs:
  - job_name: 'piper-api-staging'
    static_configs:
      - targets: ['api-staging:8001']
    metrics_path: '/health/metrics'
    scrape_interval: 15s

2. Infrastructure Metrics

Grafana Dashboard Architecture

1. System Overview Dashboard

2. MCP Performance Dashboard

3. Application Performance Dashboard

Security Configuration

Network Security

1. Service Isolation

networks:
  piper-staging-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16

2. External Access Control

Authentication and Authorization

1. Service Authentication

2. Secret Management

# Environment-specific secrets
POSTGRES_PASSWORD=staging_secure_password_2025
REDIS_PASSWORD=staging_redis_secure_2025
GRAFANA_ADMIN_PASSWORD=staging_grafana_admin_2025

Consequences

Positive

Negative

Neutral

Success Metrics

Deployment Success Criteria

Operational Success Criteria

Business Success Criteria

Integration with Development Workflow

Development Environment Separation

Port Allocation Strategy

# Development Environment
API: 8001, Database: 5433, Redis: 6379, ChromaDB: 8000

# Staging Environment
API: 8001, Database: 5434, Redis: 6380, ChromaDB: 8001
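
Because both environments publish ports on the same host, a port allocation can be checked mechanically for collisions. This is a sketch under the assumption that allocations are passed as NAME:PORT pairs; find_port_conflicts is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical check: prints each port that is claimed more than once.
# Arguments are NAME:PORT pairs; exit status is 1 when any duplicate exists.
find_port_conflicts() {
  local dupes
  dupes=$(printf '%s\n' "$@" \
    | awk -F: '{ count[$2]++ } END { for (p in count) if (count[p] > 1) print p }' \
    | sort -n)
  if [ -n "$dupes" ]; then
    printf '%s\n' "$dupes"
    return 1
  fi
}

# Example: pass the pairs for every environment that shares the host.
#   find_port_conflicts API:8001 Database:5433 Redis:6379 ChromaDB:8000
```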

CI/CD Integration Points

1. Automated Testing Pipeline

2. Environment Promotion Strategy

# Development → Staging → Production
git tag staging-YYYYMMDD
./scripts/deploy_staging.sh
./scripts/verify_staging_deployment.sh
# Manual approval for production
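
The staging-YYYYMMDD placeholder in the promotion steps can be generated and validated rather than typed by hand. This is a minimal sketch; staging_tag and is_staging_tag are hypothetical helpers, and the use of UTC for the date is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the staging-YYYYMMDD tag convention.

# Print today's tag name, using the UTC date (an assumption).
staging_tag() {
  date -u +staging-%Y%m%d
}

# Succeed only if the argument has the staging-YYYYMMDD shape.
is_staging_tag() {
  [[ $1 =~ ^staging-[0-9]{8}$ ]]
}

# Example:
#   git tag "$(staging_tag)"
```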

Lessons Learned

Infrastructure Design Insights

Operational Insights

Development Process Insights


Implementation Date: July 20, 2025
Staging Environment URL: http://localhost:8001 (API), http://localhost:8081 (Web)
Risk Level: Low (well-tested patterns, comprehensive monitoring)
Business Impact: High (enables production-ready deployments)