Pattern-045: Green Tests, Red User

Status: Established
Date: December 25, 2025
Category: Testing Anti-Pattern (Development & Process)
Related Issues: #479 (CRUD failures), #485 (FK violations), #487 (Intent classification)

Overview

An anti-pattern where unit tests pass with mocked dependencies but real user testing against actual infrastructure reveals systematic failures. Named for the testing dashboard showing “green” while users experience “red” errors.

Part of the Completion Discipline Triad: Patterns 045, 046, and 047 form a reinforcing system:

Problem Statement

Systems can achieve high test coverage and passing test suites while being fundamentally broken for real users due to:

Pattern Manifestations

Case 1: UUID Type Mismatch (Dec 7, 2025)

Case 2: FK Violations (Dec 17-18, 2025)

Case 3: Intent Classification (Dec 20, 2025)

Prevention Strategies

1. Integration Testing Requirements

from unittest.mock import Mock

import pytest

# Bad: unit test with mocks; passes even when the real schema is broken
def test_create_todo_mocked():
    repo = Mock()
    repo.create.return_value = Todo(...)
    assert repo.create(name="Test")  # the mock always "succeeds"

# Good: integration test against a real database (async tests need pytest-asyncio or anyio)
@pytest.mark.integration
async def test_create_todo_real():
    async with real_database() as db:
        todo = await TodoRepository(db).create(name="Test")
        assert todo.id is not None  # row actually persisted with a generated id
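
To keep the two tiers separate, the `integration` marker can be registered in `conftest.py` and selected with pytest's `-m` option; a minimal sketch, assuming pytest as the test runner:

# conftest.py: register the custom marker so `pytest -m integration`
# (or `pytest -m "not integration"`) selects or excludes the real-database tier
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "integration: tests that hit real infrastructure (database, queues)"
    )

Unit tests then stay fast in every CI run, while a dedicated job runs `pytest -m integration` against a provisioned database.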

2. Schema Validation on Startup

import sqlalchemy

class SchemaValidator:
    def validate_on_startup(self, engine):
        """Compare the live database schema against the SQLAlchemy models."""
        inspector = sqlalchemy.inspect(engine)
        for table in Base.metadata.tables.values():
            # get_columns() returns a list of dicts; index them by column name
            db_columns = {c["name"]: c for c in inspector.get_columns(table.name)}
            for column in table.columns:
                db_column = db_columns.get(column.name)
                if db_column is None or not types_compatible(column.type, db_column["type"]):
                    raise SchemaValidationError(...)
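
A minimal sketch of wiring the check into application startup, assuming a SQLAlchemy engine; `DATABASE_URL` and the `start_app` entry point are illustrative, not the project's actual bootstrap code:

from sqlalchemy import create_engine

def start_app():
    engine = create_engine(DATABASE_URL)
    # Fail fast: refuse to serve requests if models and the live schema disagree
    SchemaValidator().validate_on_startup(engine)
    ...  # continue normal startup only after the schema check passes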

3. Fresh Install Testing

@pytest.fixture
def fresh_database():
    """Database with no pre-existing data: no users, no setup flags, virgin state."""
    yield empty_db  # project helper assumed: a schema-only database with zero rows

def test_setup_wizard_fresh_install(fresh_database):
    """Test the actual first-time user experience."""
    # Must succeed without any pre-existing entities
    ...

4. E2E Scenario Testing

Test actual user workflows, not just components:
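
For example, a first-run workflow can be exercised through the HTTP layer end to end; a minimal sketch, assuming an ASGI app exposed as `app`, with `/setup` and `/todos` as hypothetical endpoint names rather than the project's actual routes:

import httpx
import pytest

@pytest.mark.e2e
async def test_first_run_create_and_list_todo():
    """Walk the workflow a real user follows, end to end."""
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
        assert (await client.post("/setup", json={"user": "alice"})).status_code == 201
        created = await client.post("/todos", json={"name": "Test"})
        assert created.status_code == 201
        listed = await client.get("/todos")
        assert created.json()["id"] in [t["id"] for t in listed.json()]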

5. Critical Path Smoke Tests

Before marking issues “done”:
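
One illustrative shape for such a check, as a sketch: a small script that hits the critical paths of a running deployment before sign-off (the paths and base URL are assumptions, not the project's actual routes):

import httpx

CRITICAL_PATHS = ["/health", "/todos"]

def smoke_test(base_url: str) -> None:
    """Hit the critical user paths against the real deployment before marking done."""
    with httpx.Client(base_url=base_url, timeout=10) as client:
        for path in CRITICAL_PATHS:
            response = client.get(path)
            response.raise_for_status()  # any 4xx/5xx means the issue is not done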

Acceptance Criteria Updates

All issues should include:

Detection Signals

Watch for these warning signs:

Cultural Practices

“Done” Means User-Verified

The Five Whys Protocol

When user failures occur:

  1. Why did it fail for the user?
  2. Why didn’t tests catch it?
  3. Why were tests inadequate?
  4. Why wasn’t integration tested?
  5. Why was this gap acceptable?

Verification-First Development

From Pattern-006: Write verification before implementation
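
A minimal sketch of the idea, reusing the repository layer from the examples above (the `get()` accessor is assumed for illustration): the verifying test is written first, observed to fail for the right reason, and only then is the feature implemented.

import pytest

# Written before TodoRepository.create() exists; it must fail first,
# then pass unchanged once the implementation lands, against a real database
@pytest.mark.integration
async def test_user_can_create_and_fetch_todo():
    async with real_database() as db:
        repo = TodoRepository(db)
        created = await repo.create(name="Buy milk")
        fetched = await repo.get(created.id)
        assert fetched.name == "Buy milk"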

Implementation Checklist

When implementing new features:

Anti-Pattern Remediation

If you discover Green Tests, Red User:

  1. Stop and create an integration test that fails (see the sketch after this list)
  2. Fix the integration issue
  3. Add schema validation if applicable
  4. Add fresh install test
  5. Document in Pattern-045 instances
  6. Update acceptance criteria going forward
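
For step 1, the failing test can be as narrow as the mismatch itself. A sketch modelled on the Dec 7 UUID incident, reusing the helpers from the schema-validation example above; the `todos` table name and the `engine` fixture are assumptions:

import sqlalchemy
import pytest

@pytest.mark.integration
def test_owner_id_column_type_matches_model(engine):
    """Reproduces the uuid-vs-String mismatch; must fail until the model is fixed."""
    inspector = sqlalchemy.inspect(engine)
    db_type = {c["name"]: c["type"] for c in inspector.get_columns("todos")}["owner_id"]
    model_type = Base.metadata.tables["todos"].columns["owner_id"].type
    assert types_compatible(model_type, db_type)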

Historical Impact

Cumulative Time Lost: more than 40 hours of debugging that integration tests would have prevented

Quotes

“The discipline is to mark it ‘done’ when a user can use it.” - Lead Developer, Dec 3

“Schema defined owner_id as uuid, models as String. PostgreSQL rejected operations with type mismatch error. Root cause why ALL CRUD failed.” - Dec 7 Omnibus

“Unit tests with mocks passed; real database revealed the truth.” - Dec 22 Memo


Last Updated: December 27, 2025
Instances: 3 major (UUID mismatch, FK violations, Intent classification)
Prevention: Integration tests, schema validation, fresh install scenarios