ADR-011: Test Infrastructure Hanging Fixes

Status

Accepted - 2025-07-30

Context

During Phase 4 TDD implementation, tests hung during teardown because of MCP connection pool issues. The hangs were caused by:

  1. Circular import errors during MCP module teardown
  2. Logging errors on closed file handles during shutdown
  3. Timeout during pool shutdown without proper cleanup
  4. Connection pool configuration issues in the test environment

Decision

Implement aggressive timeout handling, defensive imports, logging-level management, and explicit database engine disposal to prevent tests from hanging:

1. MCP Connection Pool Timeout Handling

# In conftest.py, mcp_infrastructure_reset fixture
import asyncio

try:
    # Give the pool 100 ms to shut down cleanly
    await asyncio.wait_for(pool.shutdown(), timeout=0.1)
except asyncio.TimeoutError:
    # Timed out: mark the pool as shut down and drop its internal state directly
    pool._is_shutdown = True
    pool._all_connections.clear()
    pool._available_connections.clear()
    pool._pool_lock = None
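The timeout-then-force pattern can be exercised in isolation. This is a minimal sketch, assuming a hypothetical `HypotheticalPool` stand-in whose `shutdown()` never completes; the private attribute names mirror the fixture snippet and are not a verified `MCPConnectionPool` API. `shutdown_with_deadline` and `main` are also illustrative names.

```python
import asyncio


class HypotheticalPool:
    """Stand-in for MCPConnectionPool (hypothetical); shutdown() never finishes."""

    def __init__(self) -> None:
        self._is_shutdown = False
        self._all_connections = {"conn-1", "conn-2"}
        self._available_connections = ["conn-1"]
        self._pool_lock = asyncio.Lock()

    async def shutdown(self) -> None:
        # Simulate the teardown hang: wait on an event nobody ever sets.
        await asyncio.Event().wait()


async def shutdown_with_deadline(pool: HypotheticalPool, timeout: float = 0.1) -> bool:
    """Return True if shutdown() finished in time, False if state was force-cleared."""
    try:
        await asyncio.wait_for(pool.shutdown(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # wait_for has already cancelled the pending shutdown() call.
        pool._is_shutdown = True
        pool._all_connections.clear()
        pool._available_connections.clear()
        pool._pool_lock = None
        return False


async def main():
    pool = HypotheticalPool()  # created inside the running event loop
    completed = await shutdown_with_deadline(pool)
    return completed, pool


if __name__ == "__main__":
    completed, pool = asyncio.run(main())
    print(completed, pool._is_shutdown, len(pool._all_connections))  # prints: False True 0
```

Forcing state this way does not close the underlying connections; it only stops the fixture from waiting on them, which is the trade-off this ADR accepts.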

2. Defensive Import Strategies

# Only try to reset MCP if the module exists and can be imported
try:
    from services.infrastructure.mcp.connection_pool import MCPConnectionPool
except ImportError:
    # MCP modules not available, skip cleanup
    pass
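The defensive import can be factored into a reusable helper. This is a minimal sketch; `optional_import` and `optional_attr` are hypothetical helpers, not part of the existing conftest. `ModuleNotFoundError` is a subclass of `ImportError`, so a missing package is covered, and a module still mid-initialisation during a circular import may lack the requested attribute, which `getattr`'s default covers.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Import a module if possible; return None instead of raising ImportError."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def optional_attr(module_name: str, attr: str) -> Optional[Any]:
    """Fetch one attribute (e.g. a class) from an optional module, or None."""
    module = optional_import(module_name)
    if module is None:
        return None
    # A partially initialised module may not define `attr` yet.
    return getattr(module, attr, None)
```

In the fixture, `MCPConnectionPool = optional_attr("services.infrastructure.mcp.connection_pool", "MCPConnectionPool")` followed by `if MCPConnectionPool is not None:` would replace the bare try/except.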

3. Logging Level Management

import logging

# Raise the root logger to ERROR during shutdown so lower-severity records
# are not written to handlers whose streams are already closed
original_level = logging.getLogger().level
logging.getLogger().setLevel(logging.ERROR)

try:
    # MCP cleanup operations
    pass
finally:
    # Restore logging level
    logging.getLogger().setLevel(original_level)
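The save/raise/restore sequence fits naturally into a context manager. This is a minimal sketch; `quiet_root_logger` is a hypothetical name. It only affects loggers that inherit their level from the root: a logger with its own explicit level (for example `setLevel(logging.DEBUG)`) still emits records, and those records still reach the root handlers through propagation.

```python
import contextlib
import logging
from typing import Iterator


@contextlib.contextmanager
def quiet_root_logger(level: int = logging.ERROR) -> Iterator[None]:
    """Temporarily raise the root logger's threshold, restoring it afterwards."""
    root = logging.getLogger()
    original_level = root.level
    root.setLevel(level)  # records below `level` are dropped before any handler runs
    try:
        yield
    finally:
        root.setLevel(original_level)
```

A teardown would wrap the MCP cleanup in `with quiet_root_logger():`. Setting `logging.raiseExceptions = False` is a broader alternative: errors raised while handling a record are then silently ignored instead of printed.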

4. Database Engine Disposal

# In cleanup_sessions fixture
try:
    from services.database.connection import db
    if db.engine is not None:
        await db.engine.dispose()
except Exception:
    pass  # Ignore import and disposal errors during cleanup
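Engine disposal can be time-bounded the same way as the MCP pool shutdown. This is a minimal sketch, assuming an engine object with an awaitable `dispose()` (SQLAlchemy's `AsyncEngine` provides one); `HypotheticalEngine`, `FailingEngine`, `dispose_quietly`, and the 1-second default timeout are illustrative and not part of the original fixtures.

```python
import asyncio
import contextlib


class HypotheticalEngine:
    """Stand-in for an async engine exposing an awaitable dispose()."""

    def __init__(self) -> None:
        self.disposed = False

    async def dispose(self) -> None:
        self.disposed = True


class FailingEngine:
    """Stand-in whose dispose() fails, as during a messy teardown."""

    async def dispose(self) -> None:
        raise RuntimeError("connection already closed")


async def dispose_quietly(engine, timeout: float = 1.0) -> None:
    """Dispose `engine` if present; never raise and never wait past `timeout`."""
    if engine is None:  # explicit check instead of relying on truthiness
        return
    # suppress(Exception) covers disposal errors and asyncio.TimeoutError alike,
    # matching the original `except Exception: pass`.
    with contextlib.suppress(Exception):
        await asyncio.wait_for(engine.dispose(), timeout=timeout)
```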

Consequences

Positive

Negative

Risks

Implementation

Files Modified

Configuration

# Database connection pool settings
pool_size=5,          # Connections kept open for concurrent tests
max_overflow=10,      # Up to 10 extra connections during bursts (15 total)
pool_recycle=3600,    # Replace connections older than 1 hour (3600 s) on checkout
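These settings imply a hard ceiling on concurrent connections. This is a minimal sketch, assuming the fragment is passed as keyword arguments to SQLAlchemy's `create_async_engine` (the ADR does not show the call); `POOL_KWARGS` and `max_open_connections` are hypothetical names. Under QueuePool semantics the ceiling is `pool_size + max_overflow`.

```python
# Hypothetical: the fragment above as keyword arguments, e.g. for
# sqlalchemy.ext.asyncio.create_async_engine(url, **POOL_KWARGS).
POOL_KWARGS = {
    "pool_size": 5,        # connections kept open in the pool
    "max_overflow": 10,    # extra connections allowed beyond pool_size
    "pool_recycle": 3600,  # seconds; older connections are replaced on checkout
}


def max_open_connections(pool_size: int, max_overflow: int) -> int:
    """Upper bound on simultaneous connections for a QueuePool-style pool."""
    return pool_size + max_overflow


if __name__ == "__main__":
    print(max_open_connections(POOL_KWARGS["pool_size"], POOL_KWARGS["max_overflow"]))  # prints 15
```

A test run that opens more than 15 sessions at once would block waiting for a free connection, which is worth ruling out when diagnosing further hangs.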

Monitoring

Notes

This ADR addresses immediate infrastructure stability issues. Future work should investigate the root causes of the MCP circular imports and implement more robust connection pool management.

The aggressive timeout approach is a pragmatic solution that enables continued development while the underlying architectural issues are addressed.