Status: Established Date: December 25, 2025 Category: Testing Anti-Pattern (Development & Process) Related Issues: #479 (CRUD failures), #485 (FK violations), #487 (Intent classification)
An anti-pattern where unit tests pass with mocked dependencies but real user testing against actual infrastructure reveals systematic failures. Named for the testing dashboard showing “green” while users experience “red” errors.
Part of the Completion Discipline Triad: Patterns 045, 046, and 047 form a reinforcing system:
Systems can achieve high test coverage and passing test suites while being fundamentally broken for real users due to:
owner_id as uuid, models as Stringstore_user_key() commits before user existslearned_patterns uses hardcoded user_idA subtler manifestation: the completion bias dressed up as honest deferral.
[x] anyway with self-justifying parentheticals like “deferred to PM UAT — unit tests cover the shape” or “deferred — handler/adapter docstrings carry the discipline notes inline”Audit findings (May 24, 2026) — past-week closure review found 4 issues with this pattern:
[x] despite no live run[x] despite no live run; only the runner was shipped[x] with self-justifying notes; infrastructure was shipped but the live smoke wasn’t run[x] despite no live smokeWhy Escaped: The pattern is invisible if you only read the issue body — the checkboxes are [x] and the closing comment is substantive (real work shipped, real tests passing). The unmet AC is buried in a parenthetical that doesn’t change the visual checkbox state. Multi-issue accumulation compounds: each individual deferral seems reasonable, but the aggregate state is “M2 looks closer to done than it is.”
Debug Time: ~30 min to audit 20 past-week closures + 4 reopens + memory pin + this catalog entry.
Specific signal: any AC marked [x] whose accompanying note contains “deferred”, “agent cannot drive”, “unit tests cover the shape”, or “manual UAT” is lying about completion. The honest state is [ ] or [⏸] (per the #1050 UI-deferred convention).
Prevention: see [[feedback_deferred_ac_self_justification_is_premature_closure]] memory pin + the new “Acceptance Criteria Updates” guidance in this catalog (use [⏸] for deferred verification ACs, not [x]).
# Bad: Unit test with mocks
def test_create_todo_mocked():
repo = Mock()
repo.create.return_value = Todo(...)
assert repo.create(name="Test")
# Good: Integration test with real DB
@pytest.mark.integration
async def test_create_todo_real():
async with real_database() as db:
todo = await TodoRepository(db).create(name="Test")
assert todo.id is not None
class SchemaValidator:
def validate_on_startup(self):
"""Compare DB schema with SQLAlchemy models"""
for model in Base.metadata.tables:
db_columns = inspector.get_columns(model)
for column in model.columns:
if not types_compatible(column.type, db_columns[column.name]):
raise SchemaValidationError(...)
@pytest.fixture
def fresh_database():
"""Database with no pre-existing data"""
# No users, no setup flags, virgin state
yield empty_db
def test_setup_wizard_fresh_install(fresh_database):
"""Test the actual first-time user experience"""
# Should work without any pre-existing entities
Test actual user workflows, not just components:
Before marking issues “done”:
All issues should include:
Watch for these warning signs:
When user failures occur:
From Pattern-006: Write verification before implementation
When implementing new features:
If you discover Green Tests, Red User:
Cumulative Time Lost: ~40+ hours of debugging that integration tests would have prevented
“The discipline is to mark it ‘done’ when a user can use it.” - Lead Developer, Dec 3
“Schema defined owner_id as uuid, models as String. PostgreSQL rejected operations with type mismatch error. Root cause why ALL CRUD failed.” - Dec 7 Omnibus
“Unit tests with mocks passed; real database revealed the truth.” - Dec 22 Memo
Last Updated: December 27, 2025 Instances: 3 major (UUID mismatch, FK violations, Intent classification) Prevention: Integration tests, schema validation, fresh install scenarios