**Date**: August 11, 2025
**Status**: Production Ready
**Environment**: Staging with Docker Compose
**Rollback Strategy**: Automated + Manual procedures
This guide provides comprehensive rollback procedures for the Piper Morgan staging environment, ensuring safe deployment practices and quick recovery from deployment issues.
# Verify rollback tools availability
ls -la scripts/rollback_*.sh
ls -la scripts/verify_staging_deployment.sh
ls -la scripts/backup_staging.sh
The staging environment automatically triggers rollback when:
# Health check monitoring (runs every 30 seconds)
while true; do
    if ! ./scripts/health_check.sh; then
        echo "Health check failed, triggering automatic rollback"
        ./scripts/rollback_staging.sh
        break
    fi
    sleep 30
done
# Health check thresholds
health_checks:
  response_time:
    threshold: 500ms
    consecutive_failures: 3
    action: "rollback"
  error_rate:
    threshold: 5%
    consecutive_failures: 2
    action: "rollback"
  service_health:
    threshold: 100%
    consecutive_failures: 1
    action: "rollback"
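The thresholds above count consecutive failures, while the monitoring loop reacts to the first failed check. A minimal sketch of a consecutive-failure counter is shown here; `monitor_consecutive` and its `max_runs` argument are hypothetical names introduced for illustration and are not part of the existing scripts.

```shell
#!/usr/bin/env bash
# Sketch: act only after THRESHOLD consecutive failures of a check.
# monitor_consecutive is a hypothetical helper, not an existing script.
# Usage: monitor_consecutive CHECK_CMD THRESHOLD INTERVAL_SECONDS MAX_RUNS
# Returns 1 as soon as CHECK_CMD has failed THRESHOLD times in a row,
# or 0 if MAX_RUNS checks complete without reaching the threshold.
monitor_consecutive() {
    local check_cmd="$1" threshold="$2" interval="$3" max_runs="$4"
    local failures=0 runs=0
    while [ "$runs" -lt "$max_runs" ]; do
        runs=$((runs + 1))
        if "$check_cmd"; then
            failures=0                      # any success resets the streak
        else
            failures=$((failures + 1))
            echo "Check failed (${failures}/${threshold})"
            if [ "$failures" -ge "$threshold" ]; then
                return 1
            fi
        fi
        sleep "$interval"
    done
    return 0
}

# Example (commented out so the sketch runs without the staging scripts):
# monitor_consecutive ./scripts/health_check.sh 3 30 1000 || ./scripts/rollback_staging.sh
```

Resetting the counter on every success means a single slow response does not trigger a rollback, which matches the `consecutive_failures` wording in the configuration.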
# 1. Verify current deployment health
./scripts/verify_staging_deployment.sh
# 2. Create backup before rollback
./scripts/backup_staging.sh
# 3. Execute rollback
./scripts/rollback_staging.sh
# 4. Verify rollback success
./scripts/verify_staging_deployment.sh
# Force immediate rollback (bypasses health checks)
./scripts/emergency_rollback.sh
# Verify critical services are running
./scripts/verify_critical_services.sh
# Rollback application to previous version
git checkout HEAD~1
docker-compose build app
docker-compose up -d app
# Verify application health
curl http://localhost:8001/health
# Restore database from an existing backup file
./scripts/restore_staging.sh backup_latest.sql
# Verify database integrity
docker-compose exec app python -c "
from services.database import get_db
db = get_db()
result = db.execute('SELECT COUNT(*) FROM information_schema.tables')
print(f'Database tables: {result.fetchone()[0]}')
"
# Restore previous configuration
git checkout HEAD~1 -- .env
git checkout HEAD~1 -- docker-compose.yml
# Restart services with old config
docker-compose down
docker-compose up -d
# Rollback monitoring services (stop and remove only these two containers)
docker-compose rm -s -f prometheus grafana
docker-compose up -d prometheus grafana
# Verify monitoring is working
curl http://localhost:9090/-/healthy
curl http://localhost:3001/api/health
#!/bin/bash
# scripts/rollback_staging.sh
set -e
echo "🚨 Initiating staging rollback..."
# 1. Stop current deployment
echo "Stopping current deployment..."
docker-compose down
# 2. Checkout previous version
echo "Checking out previous version..."
git checkout HEAD~1
# 3. Restore from backup if available
if [ -f "backup_latest.sql" ]; then
    echo "Restoring database from backup..."
    ./scripts/restore_staging.sh backup_latest.sql
fi
# 4. Rebuild and restart
echo "Rebuilding and restarting services..."
docker-compose build
docker-compose up -d
# 5. Wait for services to be ready
echo "Waiting for services to be ready..."
sleep 30
# 6. Verify rollback success
echo "Verifying rollback success..."
./scripts/verify_staging_deployment.sh
echo "✅ Rollback completed successfully"
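The fixed `sleep 30` in step 5 either waits longer than necessary or not long enough. One alternative is to poll a readiness check until it passes or a timeout expires. The sketch below assumes bash (it relies on the `SECONDS` variable); `wait_until` is a hypothetical helper, not one of the existing scripts.

```shell
#!/usr/bin/env bash
# Sketch: poll a readiness check instead of sleeping for a fixed interval.
# wait_until is a hypothetical helper name.
# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if the timeout expires first.
wait_until() {
    local timeout="$1" interval="$2"
    shift 2
    local deadline=$(( SECONDS + timeout ))
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$SECONDS" -ge "$deadline" ]; then
            return 1
        fi
        sleep "$interval"
    done
}

# Example (commented out so the sketch runs without a live stack):
# wait_until 60 2 curl -fs -o /dev/null http://localhost:8001/health
```

The command is always tried at least once, so a timeout of 0 still performs a single check.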
#!/bin/bash
# scripts/emergency_rollback.sh
set -e
echo "🚨 EMERGENCY ROLLBACK - Bypassing health checks"
# 1. Force stop all services
docker-compose down --remove-orphans
# 2. Checkout last known good version
git checkout HEAD~5 # Go back 5 commits for safety
# 3. Restart with minimal verification
docker-compose up -d
# 4. Basic health check only
curl -f http://localhost:8001/health || echo "Health check failed but services are running"
echo "⚠️ Emergency rollback completed - manual verification required"
#!/bin/bash
# scripts/verify_staging_deployment.sh
set -e
echo "🔍 Verifying staging deployment..."
# Test 1: Application health
echo "Test 1: Application health..."
curl -f http://localhost:8001/health
# Test 2: Database connection
echo "Test 2: Database connection..."
docker-compose exec -T app python -c "from services.database import get_db; get_db().execute('SELECT 1'); print('Database OK')"
# Test 3: Redis connection
echo "Test 3: Redis connection..."
docker-compose exec -T app python -c "import redis; r = redis.Redis(host='redis'); r.ping(); print('Redis OK')"
# Test 4: MCP health
echo "Test 4: MCP health..."
curl -f http://localhost:8001/health/mcp
# Test 5: Performance test
echo "Test 5: Performance test..."
start_time=$(date +%s%N)
curl -s http://localhost:8001/health > /dev/null
end_time=$(date +%s%N)
response_time=$(( (end_time - start_time) / 1000000 ))
if [ "$response_time" -lt 500 ]; then
    echo "✅ Performance OK: ${response_time}ms"
else
    echo "⚠️ Performance degraded: ${response_time}ms"
fi
echo "✅ All verification tests passed"
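The performance test relies on `date +%s%N`, which is a GNU extension and is not available in every `date` implementation. A possible alternative is to let curl report its own timing through `--write-out '%{time_total}'`. The sketch below covers only the conversion and comparison; `seconds_to_ms` and `latency_ok` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: latency check based on curl's own timer instead of date +%s%N.
# seconds_to_ms and latency_ok are hypothetical helper names.

# Convert a duration in seconds (e.g. 0.123456) to whole milliseconds,
# truncating any fractional millisecond.
seconds_to_ms() {
    awk -v s="$1" 'BEGIN { printf "%d\n", s * 1000 }'
}

# Succeed when the duration (in seconds) is below the threshold (in ms).
latency_ok() {
    local ms
    ms=$(seconds_to_ms "$1")
    [ "$ms" -lt "$2" ]
}

# Example (commented out so the sketch runs without a live stack):
# total=$(curl -s -o /dev/null -w '%{time_total}' http://localhost:8001/health)
# if latency_ok "$total" 500; then echo "Performance OK"; else echo "Performance degraded"; fi
```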
| Issue Type | Severity | Rollback Action | Verification Required |
|---|---|---|---|
| Critical Service Down | High | Immediate | Full health check |
| Performance Degradation | Medium | After 3 failures | Performance metrics |
| Configuration Error | Medium | Manual | Configuration validation |
| Database Corruption | High | Immediate | Data integrity check |
| Security Vulnerability | High | Immediate | Security scan |
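The decision matrix above can be encoded as a lookup so that monitoring scripts apply it consistently. This is only a sketch: the function name `rollback_action` and the issue keys are hypothetical labels derived from the table rows, not identifiers used by the existing scripts.

```shell
#!/usr/bin/env bash
# Sketch: map an issue type from the decision matrix to its rollback action.
# rollback_action and the issue keys are hypothetical names.
rollback_action() {
    case "$1" in
        critical_service_down|database_corruption|security_vulnerability)
            echo "immediate" ;;
        performance_degradation)
            echo "after_3_failures" ;;
        configuration_error)
            echo "manual" ;;
        *)
            echo "unknown"
            return 1 ;;
    esac
}

# Example:
# action=$(rollback_action performance_degradation)
```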
| Rollback Type | Trigger Time | Recovery Time | Verification Time |
|---|---|---|---|
| Automatic | <30 seconds | 2-5 minutes | 1-2 minutes |
| Manual | Developer initiated | 5-10 minutes | 2-3 minutes |
| Emergency | <10 seconds | 1-3 minutes | Manual verification |
# Monitor rollback progress
watch -n 5 'docker-compose ps && echo "---" && curl -s http://localhost:8001/health | jq'
# Check service logs during rollback
docker-compose logs -f app postgres redis
# Create rollback status dashboard
cat > rollback_status.md << EOF
# Rollback Status: $(date)
## Services Status
$(docker-compose ps)
## Health Checks
$(curl -s http://localhost:8001/health | jq)
## Performance Metrics
$(curl -s http://localhost:8001/metrics | grep -E "(response_time|error_rate)")
EOF
# 1. Create backup
./scripts/backup_staging.sh
# 2. Tag current version
git tag "pre-deploy-$(date +%Y%m%d-%H%M%S)"
# 3. Verify rollback tools
./scripts/test_rollback_tools.sh
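The tag created in step 2 gives a rollback a fixed target, which is less fragile than a relative reference such as `HEAD~1`. A minimal sketch for selecting the most recent such tag follows. It assumes tags use the `pre-deploy-YYYYMMDD-HHMMSS` format shown above, so that lexical order equals chronological order; `latest_predeploy_tag` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: choose the newest pre-deploy tag from a list of tag names.
# latest_predeploy_tag is a hypothetical helper; it reads one tag per line
# on stdin and prints the newest tag matching pre-deploy-YYYYMMDD-HHMMSS.
# Returns 1 when no tag matches.
latest_predeploy_tag() {
    local tag
    tag=$(grep -E '^pre-deploy-[0-9]{8}-[0-9]{6}$' | LC_ALL=C sort | tail -n 1) || true
    [ -n "$tag" ] && printf '%s\n' "$tag"
}

# Example (commented out so the sketch runs outside a git repository):
# target=$(git tag --list 'pre-deploy-*' | latest_predeploy_tag) && git checkout "$target"
```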
# 1. Monitor health checks
./scripts/monitor_deployment.sh
# 2. Watch for warning signs:
#    - Response time increases
#    - Error rate spikes
#    - Service health degradation
# 1. Verify system stability
./scripts/verify_staging_deployment.sh
# 2. Document rollback details
./scripts/document_rollback.sh
# 3. Plan next deployment
./scripts/plan_next_deployment.sh
# Check script permissions
chmod +x scripts/rollback_staging.sh
# Verify script dependencies
./scripts/check_rollback_dependencies.sh
# Run with debug output
bash -x scripts/rollback_staging.sh
# Check Docker disk usage
docker system df
# Reclaim space (removes stopped containers, unused networks, dangling images, and build cache)
docker system prune -f
# Verify port availability
netstat -tulpn | grep -E "(8001|8081|3001|9090)"
# Check service logs
docker-compose logs [service_name]
# Verify backup file integrity
./scripts/verify_backup.sh backup_file.sql
# Check database permissions
docker-compose exec postgres psql -U piper_user -l
# Manual database restore
docker-compose exec -T postgres psql -U piper_user -d piper_morgan_staging < backup_file.sql
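When several backups exist, choosing the file to restore can be scripted. The sketch below assumes backups are named like `backup_2025-08-11_10-30-00.sql`, so that lexical order equals chronological order; that naming scheme and the helper `latest_backup` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: pick the newest timestamped backup in a directory.
# Assumes names like backup_2025-08-11_10-30-00.sql (an assumption);
# latest_backup is a hypothetical helper name.
# Prints the path of the newest match, or returns 1 when there is none.
latest_backup() {
    local dir="${1:-.}" newest="" f
    for f in "$dir"/backup_[0-9]*.sql; do
        [ -e "$f" ] || continue   # the pattern stays literal when nothing matches
        newest="$f"               # pathname expansion is sorted, so the last match wins
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Example (commented out so the sketch runs without the staging scripts):
# ./scripts/restore_staging.sh "$(latest_backup .)"
```

The `[0-9]` in the pattern keeps non-timestamped files such as `backup_latest.sql` out of the selection.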
# Track rollback success rate
./scripts/rollback_metrics.sh
# Expected metrics:
# - Rollback success rate: >95%
# - Average rollback time: <5 minutes
# - Recovery time: <10 minutes
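As an illustration of how the expected metrics could be derived, the sketch below summarizes a rollback log. The log format (one line per rollback, `STATUS DURATION_SECONDS`) and the function name `rollback_summary` are assumptions; the actual behavior of `rollback_metrics.sh` is not described here.

```shell
#!/usr/bin/env bash
# Sketch: compute success rate and average duration from a rollback log.
# Assumed input (hypothetical format), one rollback per line on stdin:
#   success 120
#   failure 300
# rollback_summary is a hypothetical helper name.
rollback_summary() {
    awk '
        NF >= 2 { total++; sum += $2; if ($1 == "success") ok++ }
        END {
            if (total == 0) { print "no rollbacks recorded"; exit 1 }
            printf "success_rate=%.1f%% avg_seconds=%.0f\n", 100 * ok / total, sum / total
        }'
}

# Example:
# rollback_summary < rollback_history.log
```

An average below 300 seconds would correspond to the "<5 minutes" target listed above.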
# Rollback Report
**Date**: [Date]
**Trigger**: [Automatic/Manual/Emergency]
**Reason**: [Description of issue]
**Rollback Time**: [Duration]
**Recovery Time**: [Duration]
**Services Affected**: [List]
**Root Cause**: [Analysis]
**Prevention Measures**: [Actions taken]
**Lessons Learned**: [Key insights]
**Status**: Production Ready ✅
**Rollback Strategy**: Multi-layer approach ✅
**Automation**: Health check triggers ✅
**Recovery Time**: <5 minutes target ✅
**Documentation**: Comprehensive procedures ✅