LLM Feedback Loop
Iteratively refine LLM responses through multiple critique-and-improvement cycles. Generate an answer, critique it, improve it, and repeat until quality converges or reaches your target threshold.
What is LLM Feedback Loop?
LLM Feedback Loop is an iterative self-improvement system that refines AI responses through structured critique and improvement cycles. Instead of accepting the first answer, it systematically improves quality through multiple refinement iterations.
- Iterative Refinement: Generate → Critique → Improve → Repeat
- Structured Critique: Detailed feedback with strengths, weaknesses, and actionable suggestions
- Quality Scoring: Track improvement with 1-10 scores across iterations
- Smart Stopping: Automatic convergence detection and quality thresholds
- Complete History: Full iteration history with improvement trajectory
- Flexible Architecture: Single provider, dual provider, or multi-provider rotation
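Conceptually, every cycle applies the same three steps to the previous answer. The snippet below is a minimal, illustrative sketch of that control flow only; the generate, critique, and improve functions and the scoring dict are stand-ins, not the library's internals.
# Illustrative sketch of the core loop (not the library's internal implementation)
def run_feedback_loop(generate, critique, improve, prompt, max_iterations=3, quality_threshold=9.0):
    answer = generate(prompt)                     # initial answer
    history = []
    for _ in range(max_iterations):
        review = critique(prompt, answer)         # e.g. {"score": 7.5, "suggestions": [...]}
        history.append((answer, review))
        if review["score"] >= quality_threshold:  # good enough, stop early
            break
        answer = improve(prompt, answer, review)  # rewrite the answer using the critique
    return answer, history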
Three Architectural Patterns
Choose the architecture that fits your quality and budget needs:
🔄 Single Provider
Same LLM generates, critiques, and improves its own answers
Best for: Quick iterations, cost-effective
Pattern: Generate → Self-Critique → Improve → Repeat
⚡ Dual Provider
One LLM generates, another critiques and provides feedback
Best for: Critical tasks, balanced approach
Pattern: Generator creates → Critic evaluates → Generator improves
🌐 Multi-Provider
Multiple providers rotate through generation and critique roles
Best for: Maximum quality, diversity
Pattern: Provider A generates → B critiques → B improves → C critiques
Quick Start
from SimplerLLM.language import LLM, LLMProvider
from SimplerLLM.language.llm_feedback import LLMFeedbackLoop
# Create LLM instance
llm = LLM.create(
provider=LLMProvider.OPENAI,
model_name="gpt-4o-mini"
)
# Create feedback loop
feedback = LLMFeedbackLoop(
llm=llm,
max_iterations=3,
check_convergence=True
)
# Improve an answer
result = feedback.improve(
"Explain machine learning in simple terms"
)
# See the improvement
print(f"Improvement: {result.initial_score:.1f} → {result.final_score:.1f}")
print(f"Final answer: {result.final_answer}")
Architectural Patterns in Detail
Single Provider Self-Critique
llm = LLM.create(LLMProvider.OPENAI, model_name="gpt-4")
feedback = LLMFeedbackLoop(llm=llm, max_iterations=3)
result = feedback.improve("Explain quantum computing")
Dual Provider (Generator + Critic)
generator = LLM.create(LLMProvider.OPENAI, model_name="gpt-4")
critic = LLM.create(LLMProvider.ANTHROPIC, model_name="claude-sonnet-4")
feedback = LLMFeedbackLoop(
generator_llm=generator,
critic_llm=critic,
max_iterations=3
)
result = feedback.improve("Write a professional email")
Multi-Provider Rotation
providers = [
LLM.create(LLMProvider.OPENAI, model_name="gpt-4"),
LLM.create(LLMProvider.ANTHROPIC, model_name="claude-sonnet-4"),
LLM.create(LLMProvider.GEMINI, model_name="gemini-pro"),
]
feedback = LLMFeedbackLoop(providers=providers, max_iterations=3)
result = feedback.improve("Explain neural networks")
Stopping Criteria
Control when the feedback loop stops with three criteria (can be combined):
1. Max Iterations (Required)
feedback = LLMFeedbackLoop(
llm=llm,
max_iterations=5
)
# Stops after 5 iterations
2. Quality Threshold
feedback = LLMFeedbackLoop(
llm=llm,
max_iterations=10,
quality_threshold=9.0
)
# Stops when score >= 9.0
3. Convergence Detection
feedback = LLMFeedbackLoop(
llm=llm,
max_iterations=10,
check_convergence=True,
convergence_threshold=0.1
)
# Stops if improvement < 10%
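The exact convergence rule is internal to the library, but the idea is to compare successive quality scores: when the relative gain between iterations drops below convergence_threshold, extra iterations are unlikely to pay off. A hypothetical illustration of such a check:
# Hypothetical relative-improvement check (not the library's exact formula)
def has_converged(previous_score, current_score, convergence_threshold=0.1):
    if previous_score <= 0:
        return False
    relative_gain = (current_score - previous_score) / previous_score
    return relative_gain < convergence_threshold  # e.g. 7.8 -> 8.0 is ~2.6%, under 10%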
Temperature Scheduling
Control creativity across iterations for better convergence:
Fixed (Default)
feedback = LLMFeedbackLoop(
llm=llm,
temperature=0.7
)
Same temperature for all iterations
Decreasing
feedback = LLMFeedbackLoop(
llm=llm,
temperature=0.9,
temperature_schedule="decreasing"
)
0.9 → 0.63 → 0.44 → 0.31
Custom
feedback = LLMFeedbackLoop(
llm=llm,
temperature_schedule=[0.9, 0.7, 0.5, 0.3]
)
Precise control per iteration
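The decreasing schedule shown above (0.9 → 0.63 → 0.44 → 0.31) corresponds to a geometric decay of roughly 0.7 per iteration; that factor is inferred from the documented values rather than guaranteed. For exact control you can precompute the list yourself and pass it in, reusing the llm instance from the examples above:
# Build a decreasing schedule explicitly (decay ~0.7 inferred from the example values)
base, decay, max_iterations = 0.9, 0.7, 4
schedule = [round(base * decay ** i, 2) for i in range(max_iterations)]
print(schedule)  # [0.9, 0.63, 0.44, 0.31]
feedback = LLMFeedbackLoop(
    llm=llm,
    max_iterations=max_iterations,
    temperature_schedule=schedule  # equivalent to passing the list literal directly
)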
Configuration Options
LLMFeedbackLoop(
    llm=llm,                        # Single provider
    # OR generator_llm=gen, critic_llm=critic    # Dual provider
    # OR providers=[llm1, llm2, llm3]            # Multi-provider
    max_iterations=3,               # Max improvement cycles
    convergence_threshold=0.1,      # Stop if improvement < 10%
    quality_threshold=None,         # Stop if score >= threshold
    check_convergence=True,         # Enable convergence detection
    default_criteria=["accuracy", "clarity", "completeness"],
    temperature=0.7,                # Base temperature
    temperature_schedule=None,      # "fixed", "decreasing", or list
    verbose=False                   # Detailed logging
)
Working with Results
result = feedback.improve("Explain AI")
# Final output
print(result.final_answer)
print(f"Score: {result.final_score}/10")
print(f"Improved from: {result.initial_score}/10")
# Improvement trajectory
print(f"Scores: {result.improvement_trajectory}")
# Example: [6.5, 7.8, 8.9, 9.2]
# Access iteration history
for iteration in result.all_iterations:
    print(f"\nIteration {iteration.iteration_number}")
    print(f"  Score: {iteration.critique.quality_score}")
    print(f"  Strengths: {iteration.critique.strengths}")
    print(f"  Weaknesses: {iteration.critique.weaknesses}")
    print(f"  Suggestions: {iteration.critique.improvement_suggestions}")
    print(f"  Answer: {iteration.answer[:100]}...")
# Metadata
print(f"Stopped because: {result.stopped_reason}")
print(f"Converged: {result.convergence_detected}")
print(f"Total time: {result.total_execution_time:.2f}s")
Result Structure
- final_answer: The refined final answer
- all_iterations: Complete history of all iterations
- initial_score / final_score: Quality scores (1-10)
- improvement_trajectory: Score progression list
- stopped_reason: "max_iterations", "converged", or "threshold_met"
- convergence_detected: Whether convergence was detected
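Because these fields are plain data, a run is easy to persist for later analysis. The helper below touches only the attributes documented above; the helper name and filename are just examples.
import json
# Persist the documented result fields for later analysis
def log_result(result, path="feedback_run.json"):
    record = {
        "final_answer": result.final_answer,
        "initial_score": result.initial_score,
        "final_score": result.final_score,
        "improvement_trajectory": result.improvement_trajectory,
        "stopped_reason": result.stopped_reason,
        "convergence_detected": result.convergence_detected,
    }
    with open(path, "w") as f:
        json.dump(record, f, indent=2)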
Advanced Features
Focus on Specific Criteria
result = feedback.improve(
prompt="Explain blockchain",
focus_on=["simplicity", "clarity", "conciseness"]
)
# Critique will emphasize these specific aspects
Start with Existing Content
draft_email = "Hey, can we meet tomorrow?"
result = feedback.improve(
prompt="Make this email more professional",
initial_answer=draft_email,
focus_on=["professionalism", "clarity"]
)
print(result.final_answer)
# "Dear [Name], I would like to schedule a meeting..."
Custom Prompt Templates
custom_critique = """
Evaluate this answer strictly on {criteria}.
Question: {original_prompt}
Answer: {current_answer}
Provide harsh, specific critique.
"""
result = feedback.improve(
prompt="Explain AI",
critique_prompt_template=custom_critique
)
Real-World Example: Content Refinement
from SimplerLLM.language import LLM, LLMProvider
from SimplerLLM.language.llm_feedback import LLMFeedbackLoop
# Use a cost-effective model with multiple iterations
llm = LLM.create(LLMProvider.OPENAI, model_name="gpt-4o-mini")
feedback = LLMFeedbackLoop(
llm=llm,
max_iterations=5,
quality_threshold=9.0,
temperature_schedule="decreasing",
verbose=True
)
# Refine a blog post introduction
blog_intro = """
Machine learning is cool. It's when computers learn things
by looking at data. It's used everywhere now.
"""
result = feedback.improve(
prompt="Refine this blog post introduction to be professional and engaging",
initial_answer=blog_intro,
focus_on=["professionalism", "engagement", "clarity", "technical_accuracy"]
)
print(f"Improved from score {result.initial_score} to {result.final_score}")
print(f"Iterations: {result.total_iterations}")
print(f"Stopped because: {result.stopped_reason}")
print(f"\nFinal version:\n{result.final_answer}")
Use Cases
- ✍️ Content Refinement: Improve blog posts, documentation, marketing copy
- 📧 Automated Editing: Polish emails, reports, proposals
- 🎯 Maximum Quality: Mission-critical content requiring highest quality
- 💰 Cost-Effective Quality: Use cheap model + iterations vs expensive model once
- 🔬 Research & Analysis: Track improvement process with full iteration history
- 🤝 Multi-Model Collaboration: Leverage strengths of different models
- 💻 Code Review: Iteratively improve code based on structured critique
Comparison: LLM Feedback Loop vs LLM Judge
Both features improve answer quality, but use different strategies:
| Feature | LLM Judge | LLM Feedback Loop |
|---|---|---|
| Purpose | Compare multiple providers | Iteratively improve single answer |
| Input | Multiple providers answer once | One answer improved multiple times |
| Output | Best/synthesized from multiple | Refined single answer |
| Iterations | 1 round | Multiple rounds (configurable) |
| Use Case | Get best answer fast | Maximize answer quality |
| Cost | N providers × 1 call each | ~2N API calls (N iterations × 2) |
Use LLM Judge when: You want to compare different models and pick the best answer quickly
Use LLM Feedback Loop when: You want to maximize quality through iterative refinement
Best Practices
⚙️ Configuration
- Start with 3 iterations, increase if needed
- Enable convergence detection to avoid waste
- Set quality thresholds for target quality
- Use focus_on to guide improvements
💰 Cost Optimization
- Each iteration = 2 API calls (critique + improve)
- Cheap model + iterations may match expensive model quality
- Convergence typically happens in 2-4 iterations
- Monitor costs: N iterations ≈ 2N API calls
🎯 Quality Maximization
- Use dual provider for critical tasks
- Set decreasing temperature schedule
- Use specific focus_on criteria
- Monitor improvement trajectory
🐛 Debugging
- Use verbose=True during development
- Check stopped_reason in results
- Review critique details for insights
- Analyze improvement trajectory
Performance Considerations
- Iterations: More iterations = better quality but higher cost and latency
- Convergence: Usually converges in 2-4 iterations for most tasks
- Processing: Currently sequential (critique → improve → critique...)
- API Calls: N iterations ≈ 2N API calls (1 critique + 1 improvement per iteration)
- Time: Expect 2-15 seconds per iteration depending on model and complexity
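To check these numbers on your own workload, you can derive a per-iteration latency from the result metadata shown earlier (total_execution_time and total_iterations); the prompt here is just an example.
# Rough per-iteration latency from the result metadata
result = feedback.improve("Explain vector databases")
per_iteration = result.total_execution_time / max(result.total_iterations, 1)
print(f"{result.total_iterations} iterations, ~{per_iteration:.1f}s per iteration")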
Async Support
import asyncio
async def improve_async():
    llm = LLM.create(LLMProvider.OPENAI, model_name="gpt-4o-mini")
    feedback = LLMFeedbackLoop(llm=llm, max_iterations=3)
    result = await feedback.improve_async(
        "Explain cloud computing"
    )
    return result
# Run async
result = asyncio.run(improve_async())
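Because improve_async is awaitable, several prompts can be refined concurrently with asyncio.gather, which is where the async API pays off. A short sketch, assuming the feedback loop instance is safe to share across concurrent calls (otherwise create one per task):
import asyncio
from SimplerLLM.language import LLM, LLMProvider
from SimplerLLM.language.llm_feedback import LLMFeedbackLoop

async def improve_many(prompts):
    llm = LLM.create(LLMProvider.OPENAI, model_name="gpt-4o-mini")
    feedback = LLMFeedbackLoop(llm=llm, max_iterations=3)
    # Run the feedback loops concurrently instead of one after another
    return await asyncio.gather(*(feedback.improve_async(p) for p in prompts))

results = asyncio.run(improve_many([
    "Explain cloud computing",
    "Explain edge computing",
]))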