Pattern-045: Green Tests, Red User
Status: Established
Date: December 25, 2025
Category: Testing Anti-Pattern (Development & Process)
Related Issues: #479 (CRUD failures), #485 (FK violations), #487 (Intent classification)
Overview
An anti-pattern where unit tests pass with mocked dependencies but real user testing against actual infrastructure reveals systematic failures. Named for the testing dashboard showing “green” while users experience “red” errors.
Part of the Completion Discipline Triad: Patterns 045, 046, and 047 form a reinforcing system:
- Pattern-045 reveals the gap (tests pass, users fail)
- Pattern-046 prevents premature closure (Beads discipline)
- Pattern-047 enables pause when uncertain (Time Lord Alert)
Problem Statement
Systems can achieve high test coverage and passing test suites while being fundamentally broken for real users due to:
- Mocked dependencies hiding integration issues
- Schema/model drift not caught by unit tests
- Type mismatches only enforced at database level
- Temporal bugs (operations before entities exist)
- Configuration differences between test and production
Pattern Manifestations
Case 1: UUID Type Mismatch (Dec 7, 2025)
- Tests: 705 unit tests passing
- Reality: All CRUD operations failing
- Root Cause: Schema defined owner_id as uuid; the models declared it as String
- Why Escaped: PostgreSQL type checking bypassed by mocks
- Debug Time: 24 hours across 6 layers
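A minimal sketch of the drift described above, assuming SQLAlchemy models over a PostgreSQL uuid column (the model and column definitions are illustrative, not the project's actual code):

```python
import uuid

from sqlalchemy import Column, String
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Todo(Base):  # illustrative model, not the project's actual code
    __tablename__ = "todos"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    # Drift: the migration created owner_id as uuid, but the model says String.
    # Mocked tests never reach PostgreSQL, so the mismatch only fails at runtime.
    owner_id = Column(String, nullable=False)
```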
Case 2: FK Violations (Dec 17-18, 2025)
- Tests: All tests passing
- Reality: Setup wizard crashes with FK violations
- Root Causes:
  - store_user_key() commits before the user row exists
  - learned_patterns uses a hardcoded user_id
- Why Escaped: Test fixtures pre-create users
- Debug Time: Multiple sessions
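A self-contained illustration of the ordering bug, using SQLite foreign keys rather than the project's PostgreSQL setup (the function name follows the case description; the tables and data are assumed):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.execute("CREATE TABLE users (id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE user_keys (user_id TEXT REFERENCES users(id), key TEXT)")

def store_user_key(user_id: str, api_key: str) -> None:
    """Commits a child row; on a fresh install the parent user doesn't exist yet."""
    db.execute("INSERT INTO user_keys VALUES (?, ?)", (user_id, api_key))
    db.commit()

# Fresh-install order: the key is committed before the user row exists.
# Raises sqlite3.IntegrityError -- a fixture that pre-creates users never sees this.
store_user_key("u-1", "sk-test")
db.execute("INSERT INTO users VALUES ('u-1')")  # parent created too late
```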
Case 3: Intent Classification (Dec 20, 2025)
- Tests: Intent tests passing
- Reality: “Menu of services” returns a generic response
- Root Cause: Missing patterns, over-greedy matching
- Why Escaped: No discovery scenario tests
- Debug Time: 12+ hour overnight session
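A hypothetical sketch of how over-greedy matching produces this: a catch-all pattern fires before any discovery pattern is consulted (the patterns and intent names are invented for illustration):

```python
import re

# First match wins; the catch-all shadows everything registered after it.
PATTERNS = [
    (re.compile(r"\b(todo|task)\b", re.I), "todo_intent"),
    (re.compile(r".*"), "generic_response"),  # over-greedy catch-all
    (re.compile(r"\b(menu|services|help)\b", re.I), "discovery_intent"),  # unreachable
]

def classify(query: str) -> str:
    for pattern, intent in PATTERNS:
        if pattern.search(query):
            return intent
    return "generic_response"

assert classify("menu of services") == "generic_response"  # the "red" the user saw
```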
Prevention Strategies
1. Integration Testing Requirements
```python
# Bad: unit test with mocks -- passes even when the schema is wrong
from unittest.mock import Mock

import pytest

def test_create_todo_mocked():
    repo = Mock()
    repo.create.return_value = Todo(name="Test")
    assert repo.create(name="Test")  # asserts against the mock, never the DB

# Good: integration test with a real DB (needs pytest-asyncio or anyio)
@pytest.mark.integration
async def test_create_todo_real():
    async with real_database() as db:
        todo = await TodoRepository(db).create(name="Test")
        assert todo.id is not None  # the real uuid column is exercised
```
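The `real_database()` helper above is assumed rather than shown; one minimal way to provide it, using SQLAlchemy's async engine against a throwaway test database (the URL is an assumption):

```python
from contextlib import asynccontextmanager

from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine

TEST_DB_URL = "postgresql+asyncpg://test:test@localhost/app_test"  # assumed

@asynccontextmanager
async def real_database():
    """Yield a session bound to a real connection, rolled back afterwards."""
    engine = create_async_engine(TEST_DB_URL)
    try:
        async with engine.connect() as conn:
            transaction = await conn.begin()
            try:
                yield AsyncSession(bind=conn)
            finally:
                await transaction.rollback()  # leave the test database clean
    finally:
        await engine.dispose()
```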
2. Schema Validation on Startup
```python
from sqlalchemy import inspect

class SchemaValidator:
    def validate_on_startup(self, engine) -> None:
        """Compare the live DB schema with the SQLAlchemy models."""
        inspector = inspect(engine)
        for table in Base.metadata.tables.values():
            db_cols = {c["name"]: c for c in inspector.get_columns(table.name)}
            for column in table.columns:
                db_col = db_cols.get(column.name)
                if db_col is None or not types_compatible(column.type, db_col["type"]):
                    raise SchemaValidationError(f"{table.name}.{column.name}")
```
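A sketch of the wiring, with assumed names (`DATABASE_URL`, `start_app`): run the validator before the process starts serving, so drift fails at deploy time instead of during user requests.

```python
from sqlalchemy import create_engine

def main() -> None:
    engine = create_engine(DATABASE_URL)  # assumed config value
    # Raises SchemaValidationError on drift -- the app never starts "green".
    SchemaValidator().validate_on_startup(engine)
    start_app(engine)  # assumed entry point
```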
3. Fresh Install Testing
```python
import pytest
from sqlalchemy import create_engine

@pytest.fixture
def fresh_database():
    """Database with no pre-existing data: no users, no setup flags, virgin state."""
    engine = create_engine(TEST_DB_URL)  # assumed test-database URL
    Base.metadata.create_all(engine)     # empty schema only
    yield engine
    Base.metadata.drop_all(engine)

def test_setup_wizard_fresh_install(fresh_database):
    """Test the actual first-time user experience."""
    # Must succeed without any pre-existing entities (contrast Case 2 above).
    result = run_setup_wizard(fresh_database)  # assumed project entry point
    assert result.succeeded
```
4. E2E Scenario Testing
Test actual user workflows, not just components:
- Setup → Login → Create Todo → View Todo
- Fresh Install → API Key Validation → First Query
- Discovery queries without prior context
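A sketch of the first journey as a single scenario test; the HTTP client fixture and endpoints are assumptions, not the project's actual API:

```python
import pytest

@pytest.mark.e2e
def test_setup_login_create_view(client):  # hypothetical HTTP client fixture
    # Setup -> Login -> Create Todo -> View Todo, as one uninterrupted journey.
    assert client.post("/setup", json={"api_key": "sk-test"}).status_code == 200
    token = client.post("/login", json={"user": "u-1"}).json()["token"]
    headers = {"Authorization": f"Bearer {token}"}
    created = client.post("/todos", json={"name": "Test"}, headers=headers).json()
    viewed = client.get(f"/todos/{created['id']}", headers=headers)
    assert viewed.status_code == 200  # the whole chain works, not just one layer
```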
5. Critical Path Smoke Tests
Before marking issues “done”:
- Test in browser, not just API
- Test with fresh database
- Test complete user journey
- Verify against production-like environment
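For the browser check, a minimal smoke script is enough; this sketch uses Playwright with an invented URL and selectors:

```python
from playwright.sync_api import sync_playwright

def smoke_first_run(base_url: str = "http://localhost:8000") -> None:
    """Drive the real UI once before marking the issue done."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(base_url)
        page.fill("#api-key", "sk-test")  # hypothetical selector
        page.click("text=Save")
        assert page.locator(".todo-list").is_visible()  # hypothetical selector
        browser.close()
```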
Acceptance Criteria Updates
All issues should include user-verification acceptance criteria: the critical-path smoke tests above must pass before the issue is closed.
Detection Signals
Watch for these warning signs:
- High unit test coverage with user complaints
- “Works on my machine” syndrome
- Tests that only pass with fixtures
- No integration test markers
- Missing fresh install scenarios
- Hardcoded test data (user IDs, etc.)
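The "no integration test markers" signal can be automated; a hypothetical CI guard, relying on pytest's exit code 5 ("no tests collected"):

```python
import subprocess
import sys

# Fail the build when nothing in the suite carries the "integration" marker.
result = subprocess.run(["pytest", "--collect-only", "-q", "-m", "integration"])
if result.returncode == 5:  # pytest exit code 5: no tests collected
    sys.exit("Green Tests, Red User risk: no integration tests found")
```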
Cultural Practices
“Done” Means User-Verified
- Code complete ≠ Done
- Tests passing ≠ Done
- User can accomplish task = Done
The Five Whys Protocol
When user failures occur:
- Why did it fail for the user?
- Why didn’t tests catch it?
- Why were tests inadequate?
- Why wasn’t integration tested?
- Why was this gap acceptable?
Verification-First Development
From Pattern-006: Write verification before implementation
- Define how you’ll know it works for users
- Create integration test scenarios first
- Mock as little as possible
Implementation Checklist
When implementing new features, apply these related patterns:
- Pattern-006: Verification-First Development
- Pattern-010: Cross-Validation Protocol
- Pattern-042: Investigation-Only Protocol
- Pattern-043: Defense-in-Depth Prevention
- Pattern-046: Beads Completion Discipline (prevents declaring done prematurely)
- Pattern-047: Time Lord Alert (enables pause when uncertain)
If you discover Green Tests, Red User:
- Stop and create an integration test that reproduces the failure
- Fix the integration issue
- Add schema validation if applicable
- Add fresh install test
- Document in Pattern-045 instances
- Update acceptance criteria going forward
Historical Impact
- Dec 7: 24-hour debugging marathon
- Dec 17-18: Multiple FK violation sessions
- Dec 20: 12+ hour overnight fix
Cumulative Time Lost: 40+ hours of debugging that integration tests would have prevented
Quotes
“The discipline is to mark it ‘done’ when a user can use it.” - Lead Developer, Dec 3
“Schema defined owner_id as uuid, models as String. PostgreSQL rejected operations with type mismatch error. Root cause why ALL CRUD failed.” - Dec 7 Omnibus
“Unit tests with mocks passed; real database revealed the truth.” - Dec 22 Memo
Last Updated: December 27, 2025
Instances: 3 major (UUID mismatch, FK violations, Intent classification)
Prevention: Integration tests, schema validation, fresh install scenarios