ADR-040: Local Database Per Environment Architecture

Status: Accepted Date: 2025-11-01 Context: Issue #280 CORE-ALPHA-DATA-LEAK Deciders: Christian Crumlish (PM), Code Agent, Chief Architect (Claude Sonnet)


Context and Problem Statement

During implementation of Issue #280 (data leak remediation), we discovered the need to make an explicit architectural decision about database management across development, alpha testing, and production environments.

Questions That Arose

  1. Where do alpha_users records live?
    • In the dev laptop database?
    • In each tester’s local database?
    • In a shared database?
  2. Do databases merge across git branches?
    • Does main branch have different database than production branch?
    • How do we maintain consistency?
  3. How does user data move between environments?
    • From development to alpha testing?
    • From alpha to production?
  4. What security model applies?
    • How is data isolated between alpha testers?
    • What happens when code is shared but data shouldn’t be?

The Confusion

The root confusion stemmed from treating databases like git branches - assuming they would merge or be shared. This is a category error: code lives in git, data lives in PostgreSQL, and they follow different rules.


Decision

We will use separate local databases for each environment, with no automatic synchronization between environments.

Architecture Overview

┌─────────────────────────────────────────────────────┐
│ GIT REPOSITORY (Code - Shared via git)              │
├─────────────────────────────────────────────────────┤
│ main branch       → Active development (PM)         │
│ production branch → Stable releases (alpha testers) │
└─────────────────────────────────────────────────────┘
                    ↓ git pull/clone
                    ↓ (code only)
┌─────────────────────────────────────────────────────┐
│ LOCAL ENVIRONMENTS (Code + Data - Never Merge)      │
├─────────────────────────────────────────────────────┤
│                                                      │
│ Dev Laptop:                                          │
│   Branch: main                                       │
│   PostgreSQL DB #1: {xian: Christian's data}        │
│                                                      │
│ Alpha Laptop:                                        │
│   Branch: production                                 │
│   PostgreSQL DB #2: {alfy: generic/test data}       │
│                                                      │
│ External Tester:                                     │
│   Branch: production                                 │
│   PostgreSQL DB #3: {tester123: their data}         │
│                                                      │
└─────────────────────────────────────────────────────┘

DB #1 ≠ DB #2 ≠ DB #3 (Never merge)

Key Principles

  1. Code ≠ Data
    • Code lives in git (shared)
    • Data lives in PostgreSQL (isolated per environment)
  2. Databases Never Merge
    • Each environment has its own local PostgreSQL instance
    • User data stays in local database
    • No automatic sync between environments
  3. User Data Storage
    • Personal data: alpha_users.preferences JSONB field
    • Generic config: config/PIPER.md (in git, shared)
    • User-specific overrides generic
  4. Environment Separation
    • Development: PM’s local database with real data
    • Alpha: Each tester’s local database with their data
    • Production: Hosted database (future) with all real users
  5. Migration is Manual
    • No automatic data migration between environments
    • Alpha → Production requires explicit export/import per user
    • Gives users control over what data moves to production

Rationale

Why Local Databases?

Security and Isolation:

Simplicity:

Development Velocity:

Why Not Shared Database?

We considered and rejected a shared development database because:

  1. Security Risk: PM’s personal data exposed to all testers
  2. Contamination Risk: One tester’s changes affect others
  3. Branch Conflicts: main changes could break production users
  4. Cost: Unnecessary infrastructure for 5-10 alpha testers
  5. Complexity: Need multi-tenant isolation, backups, monitoring

Why Manual Migration to Production?

We explicitly chose manual migration because:

  1. User Consent: Testers decide if/when to migrate data
  2. Quality Control: Review data before production import
  3. Selective Migration: Not all alpha data needs production migration
  4. Simplicity: No complex automated sync logic
  5. Safety: Can’t accidentally expose test data in production

Implementation Details

File Structure

In Git (Shared):

config/
  PIPER.md                          # Generic system config
  PIPER.md.backup-YYYYMMDD          # Historical backups

scripts/
  migrate_personal_data.py          # Flexible migration tool
  create_test_alpha_user.py         # User creation utility
  test_user_data_isolation.py       # Isolation verification

docs/
  ALPHA_DATABASE_ARCHITECTURE.md    # Operational guide
  adrs/adr-040-local-database.md    # This decision record

services/database/
  models.py                          # Table schemas
  connection.py                      # Database connection

Not in Git (.gitignore):

postgres_data/         # PostgreSQL data directory
*.db                   # SQLite files
.env                   # Environment variables
uploads/               # User-uploaded files
__pycache__/          # Python cache

Migration Script Usage

# Migrate PM's data to xian account (dev laptop)
python scripts/migrate_personal_data.py --username xian

# Verify alfy user exists, no migration (alpha laptop)
python scripts/migrate_personal_data.py --username alfy --skip-migration

# Migrate custom data from JSON file (future testers)
python scripts/migrate_personal_data.py \
  --username tester123 \
  --data-file tester_data.json

Branch Strategy

main Branch:

production Branch:

Merge Flow:

PM develops on main
  ↓
Test locally with xian account
  ↓
Merge main → production when stable
  ↓
Alpha testers pull production branch
  ↓
Each tester runs on their local database

Consequences

Positive Consequences

Security: Strong data isolation, no cross-contamination risk ✅ Privacy: PM’s personal data never exposed to testers ✅ Simplicity: No complex multi-tenant database infrastructure ✅ Control: Each user controls their own data ✅ Rollback: Easy to restore (just local database backup) ✅ Cost: No hosting costs for alpha phase ✅ Development Speed: PM can experiment without breaking testers

Negative Consequences

⚠️ Manual Migration: Requires explicit work to move alpha → production ⚠️ Documentation: Needs clear explanation (non-obvious to new developers) ⚠️ Setup Complexity: Each tester must set up local PostgreSQL ⚠️ Backup Responsibility: Users responsible for backing up local data ⚠️ No Real-Time Sync: Can’t automatically propagate improvements

Neutral Consequences

📋 Requires Clear Documentation: ALPHA_DATABASE_ARCHITECTURE.md created 📋 Migration Scripts Must Be Flexible: Added –username parameter 📋 Setup Wizard Needed: For alpha tester onboarding 📋 Production Migration Plan: Documented in architecture guide


Alternatives Considered

Alternative 1: Shared Development Database

Architecture:

Rejected Because:

When It Makes Sense:

Alternative 2: Database Per Branch

Architecture:

Rejected Because:

Why This Doesn’t Work:

Alternative 3: Cloud-Hosted Alpha Database

Architecture:

Deferred to MVP Because:

When We’ll Implement This:

Alternative 4: No Database, Files Only

Architecture:

Rejected Because:

Why We Need PostgreSQL:


Security Considerations

Alpha Phase (Current)

Security Model:

Acceptable for Alpha Because:

MVP/Production Phase (Future)

Required Security:

Email Backend Security (from email-service-research-mvp.md):


Migration Path: Alpha → Production

When Production Ready

  1. Set Up Production Database:
    -- Hosted on Render/Railway/Supabase
    CREATE TABLE users (
      id UUID PRIMARY KEY,
      username TEXT UNIQUE,
      email TEXT UNIQUE,
      password_hash TEXT,
      preferences JSONB,
      created_at TIMESTAMP DEFAULT NOW()
    );
    
  2. Export Alpha User Data:
    # On alpha tester's local machine
    python scripts/export_user_data.py --username alfy > alfy_data.json
    
  3. Import to Production:
    # On production server
    python scripts/import_user_data.py \
      --username alfy \
      --email alfy@example.com \
      --data alfy_data.json
    
  4. User Setup:
    • User receives email invitation
    • User sets production password
    • User logs into production system
    • User’s preferences migrated automatically

What Gets Migrated

User Preferences (from alpha_users.preferences) ✅ User Profile (username, email) ✅ Configuration Overrides (personalization)

NOT Migrated:

Manual Process

Why Manual:

Estimated Time:


Compliance and Best Practices

Data Handling

Alpha Phase:

Production Phase:

Documentation Requirements

For Alpha Testers:

For Developers:


Testing and Verification

Automated Tests

# Data isolation tests
python -m pytest tests/config/test_data_isolation.py

# UserContextService tests
python -m pytest tests/services/test_user_context_service.py

# Migration script tests
python -m pytest tests/scripts/test_migration.py

Manual Verification

# Verify PIPER.md has no personal data
grep -i "christian\|xian\|VA\|DRAGONS" config/PIPER.md
# Expected: No matches

# Test migration script flexibility
python scripts/migrate_personal_data.py --username test-user

# Verify data isolation
python scripts/test_user_data_isolation.py
# Expected: Each user sees only their data

Future Considerations

When to Revisit This Decision

Triggers to Reconsider:

  1. Scale: More than 50 alpha users (local databases become burden)
  2. Collaboration: Need real-time data sharing between users
  3. Features: Implementing features that require centralized data
  4. Cost: Local setup complexity exceeds hosting cost
  5. Security: Need centralized security controls

Expected Timeline:

Production Architecture (Future)

┌─────────────────────────────────────────────┐
│ Production Environment (Hosted)             │
├─────────────────────────────────────────────┤
│                                             │
│ Web Application (Render/Railway)            │
│   ├── FastAPI backend                       │
│   └── React frontend                        │
│                                             │
│ PostgreSQL Database (Hosted)                │
│   ├── users table (all production users)   │
│   ├── Row-level security enabled            │
│   └── Encrypted connections (SSL/TLS)       │
│                                             │
│ Email Service (SendGrid)                    │
│   ├── Verification emails                   │
│   ├── Password reset                        │
│   └── Rate limited                          │
│                                             │
└─────────────────────────────────────────────┘

This production architecture will be documented in a separate ADR when we reach that phase.



Decision Log

Date Event Decision
2025-11-01 08:10 Discovered confusion about database location Clarified code vs data separation
2025-11-01 08:12 Discussed branch strategy with PM Confirmed separate local databases
2025-11-01 08:21 Migration script made flexible Added –username parameter
2025-11-01 08:26 ADR drafted and accepted Local database per environment

Approval

Approved by: Christian Crumlish (PM) Date: 2025-11-01 Status: Accepted and Implemented

Implementation Evidence:


Last Updated: 2025-11-01 08:26 AM PT Next Review: Before MVP deployment (when hosting production database)