Domain: Document Analysis & Summarization
Issue: LLM generates non-standard markdown (• - bullets, formatting issues)
Wrong Fix: Frontend preprocessing (violates layered architecture)
Correct Fix: Domain layer formatting rules and post-processing
`services/prompts/summarization.py`

IMPORTANT FORMATTING RULES:

Your response should include:

Format your response using clean, standard markdown that will render properly in any markdown parser.

Document content: {content}
"""
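As a rough illustration of how a template like this might be filled in, here is a minimal sketch. The names `SUMMARY_PROMPT` and `build_summary_prompt` and the wording of the two rules are hypothetical (the rules simply mirror the bullet and header rules enforced later in this plan); only the `{content}` placeholder and the 3000-character cut-off come from this document.

```python
# Hypothetical template: the rule wording is illustrative, not the project's actual prompt.
SUMMARY_PROMPT = """Summarize the document below.

IMPORTANT FORMATTING RULES:
- Use "- " for bullet points, never the bullet character.
- Put a space after "#" in headers.

Format your response using clean, standard markdown that will render properly in any markdown parser.

Document content: {content}
"""


def build_summary_prompt(text: str, max_chars: int = 3000) -> str:
    """Fill the template with at most max_chars characters of the document."""
    # str.format only interprets braces in the template itself, so braces
    # inside the document text are inserted verbatim.
    return SUMMARY_PROMPT.format(content=text[:max_chars])
```

One thing to watch: any literal `{` or `}` added to the template text itself (for example in a formatting example) would need to be doubled, otherwise `str.format` treats it as a placeholder.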
#### Task 2: Create Domain Service for Markdown Validation
- [ ] **File**: `services/utils/markdown_formatter.py` (create if it doesn't exist)
- [ ] **Action**: Domain service to ensure LLM output follows standards
- [ ] **Implementation**:
```python
"""
Domain service for markdown formatting validation and correction
"""
import re


class MarkdownFormatter:
    """Domain service responsible for ensuring markdown output follows standards"""

    @staticmethod
    def ensure_standard_format(markdown_text: str) -> str:
        """
        Ensure LLM-generated markdown follows CommonMark standards.

        This is a domain service that enforces business rules about
        how markdown should be formatted in our system.
        """
        if not markdown_text:
            return ""
        # Domain rule: Use standard bullet syntax
        cleaned = re.sub(r'^• - ', '- ', markdown_text, flags=re.MULTILINE)
        # Domain rule: Ensure proper header spacing
        cleaned = re.sub(r'^(#{1,6})([^\s#])', r'\1 \2', cleaned, flags=re.MULTILINE)
        # Domain rule: Fix broken bold formatting (a stray single '*' inside '**...**')
        cleaned = re.sub(r'\*\*([^*]+)\*([^*]*)\*\*', r'**\1\2**', cleaned)
        return cleaned

    @staticmethod
    def validate_markdown_syntax(markdown_text: str) -> list[str]:
        """
        Validate markdown syntax and return list of issues found.

        Used for monitoring LLM output quality.
        """
        issues = []
        # Empty or missing output has nothing to validate
        if not markdown_text:
            return issues
        # Check for non-standard bullet syntax
        if re.search(r'^• - ', markdown_text, re.MULTILINE):
            issues.append("Non-standard bullet syntax: '• -' found")
        # Check for malformed headers
        if re.search(r'^#{1,6}[^\s#]', markdown_text, re.MULTILINE):
            issues.append("Malformed headers: missing space after #")
        return issues
```

Integration in `services/analysis/text_analyzer.py`, in the `analyze_document` method after LLM completion:

```python
# Raw LLM output
summary_raw = await self.llm_client.complete(
    task_type=TaskType.SUMMARIZE.value,
    prompt=summary_prompt.format(content=text[:3000]),
)
# Apply domain formatting rules
summary = MarkdownFormatter.ensure_standard_format(summary_raw)
# Validate the raw output so formatting problems show up in the logs
issues = MarkdownFormatter.validate_markdown_syntax(summary_raw)
if issues:
    logger.warning(f"Markdown formatting issues detected: {issues}")
```
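One caveat with the header-spacing rule: applied to the whole text, it would also rewrite lines such as `#comment` inside fenced code blocks in the LLM output. The sketch below shows one possible way to apply the rule only outside fences. The function name `fix_headers_outside_fences` is hypothetical and not part of the plan above, and the fence detection is deliberately simplified (any fence line toggles the state).

```python
import re

# Opening/closing fence: up to three spaces of indent, then ``` or ~~~
_FENCE = re.compile(r'^ {0,3}(`{3,}|~{3,})')
# Header markers immediately followed by a non-space, non-# character
_HEADER = re.compile(r'^(#{1,6})([^\s#])')


def fix_headers_outside_fences(markdown_text: str) -> str:
    """Add the missing space after '#' markers, leaving fenced code untouched."""
    out = []
    in_fence = False
    for line in markdown_text.split("\n"):
        if _FENCE.match(line):
            # Simplification: every fence line flips the state, regardless
            # of fence length or character.
            in_fence = not in_fence
            out.append(line)
            continue
        out.append(line if in_fence else _HEADER.sub(r'\1 \2', line))
    return "\n".join(out)
```

Whether this is worth the extra complexity depends on how often summaries contain code blocks; for plain prose summaries the single regex is likely enough.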
#### Task 4: Update Document Analyzer
- [ ] **File**: `services/analysis/document_analyzer.py`
- [ ] **Action**: Same integration as text analyzer
- [ ] **Implementation**: Apply `MarkdownFormatter.ensure_standard_format()` after LLM completion
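Since both analyzers apply the same two steps, the integration could be factored into a small shared helper. This is only a sketch: the name `postprocess_llm_markdown` and its signature are assumptions, not something defined elsewhere in this plan. The two formatter methods are passed in as callables so the helper stays independent of where `MarkdownFormatter` lives.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


def postprocess_llm_markdown(
    raw: str,
    fix: Callable[[str], str],
    validate: Callable[[str], List[str]],
    log: logging.Logger = logger,
) -> str:
    """Log any issues found in the raw LLM output, then return the cleaned text."""
    # Validate the raw text so the log reflects what the LLM actually produced
    issues = validate(raw)
    if issues:
        log.warning("Markdown formatting issues detected: %s", issues)
    return fix(raw)
```

In either analyzer the call would then look roughly like `summary = postprocess_llm_markdown(summary_raw, MarkdownFormatter.ensure_standard_format, MarkdownFormatter.validate_markdown_syntax)`.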
### Phase 3: Testing & Validation
#### Task 5: Create Domain Service Tests
- [ ] **File**: `tests/services/utils/test_markdown_formatter.py` (create)
- [ ] **Action**: Test domain formatting rules
- [ ] **Implementation**:
```python
import pytest
from services.utils.markdown_formatter import MarkdownFormatter


class TestMarkdownFormatter:
    def test_fixes_non_standard_bullets(self):
        input_text = "• - Item 1\n• - Item 2"
        expected = "- Item 1\n- Item 2"
        result = MarkdownFormatter.ensure_standard_format(input_text)
        assert result == expected

    def test_fixes_malformed_headers(self):
        input_text = "##Header\n###Another"
        expected = "## Header\n### Another"
        result = MarkdownFormatter.ensure_standard_format(input_text)
        assert result == expected

    def test_validates_syntax_issues(self):
        problematic_text = "• - Bad bullet\n##Bad header"
        issues = MarkdownFormatter.validate_markdown_syntax(problematic_text)
        assert len(issues) == 2
        assert any("bullet syntax" in issue for issue in issues)
        assert any("headers" in issue for issue in issues)
```

Integration test in `tests/services/analysis/test_text_analyzer.py`:

```python
import re


async def test_analyze_document_formats_markdown_properly(self):
    # Test that domain formatting rules are applied
    result = await self.analyzer.analyze_document(...)
    # Should not contain non-standard syntax
    assert "• -" not in result.summary
    # No header line may lack the space after its '#' markers.
    # (A plain startswith("##") check would also reject a valid "## Title".)
    assert not re.search(r'^#{1,6}[^\s#]', result.summary, re.MULTILINE)
```

Monitoring in `services/analysis/text_analyzer.py`, after validation:

```python
issues = MarkdownFormatter.validate_markdown_syntax(summary_raw)
if issues:
    # Could add to metrics/monitoring system
    logger.info(f"LLM markdown quality metrics: {len(issues)} issues found")
```
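If the issue counts are meant to feed a metrics or monitoring system, one simple starting point is an in-memory tally. The class below is a hypothetical stand-in, with its name and methods invented for illustration, for whatever metrics backend the project actually uses.

```python
from collections import Counter
from typing import List


class MarkdownIssueStats:
    """In-memory tally of validation issues (stand-in for a real metrics backend)."""

    def __init__(self) -> None:
        self.responses = 0
        self.responses_with_issues = 0
        self.by_issue: Counter = Counter()

    def record(self, issues: List[str]) -> None:
        """Record the validation result for one LLM response."""
        self.responses += 1
        if issues:
            self.responses_with_issues += 1
            self.by_issue.update(issues)

    def issue_rate(self) -> float:
        """Fraction of responses that had at least one issue."""
        if not self.responses:
            return 0.0
        return self.responses_with_issues / self.responses
```

Calling `stats.record(issues)` right after `validate_markdown_syntax` would make it possible to track whether prompt changes actually reduce the share of malformed responses over time.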
Total: ~70 minutes for proper DDD solution vs. 5 minutes for quick fix
This solution: