LLM Feedback Loop

Iteratively refine LLM responses through multiple critique-and-improvement cycles. Generate an answer, critique it, improve it, and repeat until the answer converges or meets your quality threshold.

What is LLM Feedback Loop?

LLM Feedback Loop is an iterative self-improvement system that refines AI responses through structured critique and improvement cycles. Instead of accepting the first answer, it systematically improves quality through multiple refinement iterations.

  • Iterative Refinement: Generate → Critique → Improve → Repeat
  • Structured Critique: Detailed feedback with strengths, weaknesses, and actionable suggestions
  • Quality Scoring: Track improvement with 1-10 scores across iterations
  • Smart Stopping: Automatic convergence detection and quality thresholds
  • Complete History: Full iteration history with improvement trajectory
  • Flexible Architecture: Single provider, dual provider, or multi-provider rotation
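
Conceptually, each cycle applies the Generate → Critique → Improve pattern listed above. The sketch below illustrates the idea only; it assumes the generic generate_response(prompt=...) call and is not the library's internal implementation. In practice, LLMFeedbackLoop orchestrates these steps for you.

from SimplerLLM.language import LLM, LLMProvider

# Conceptual sketch of one feedback loop, not the library's internal code.
# Assumes the LLM instance exposes generate_response(prompt=...).
llm = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4o-mini")

question = "Explain machine learning in simple terms"
answer = llm.generate_response(prompt=question)

for _ in range(3):
    critique = llm.generate_response(
        prompt=f"Critique this answer to '{question}'. "
               f"List strengths, weaknesses, and a 1-10 quality score:\n\n{answer}"
    )
    answer = llm.generate_response(
        prompt=f"Improve the answer to '{question}' using this critique.\n\n"
               f"Critique:\n{critique}\n\nCurrent answer:\n{answer}"
    )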

Three Architectural Patterns

Choose the architecture that fits your quality and budget needs:

🔄 Single Provider

Same LLM generates, critiques, and improves its own answers

Best for: Quick iterations, low cost

Pattern: Generate → Self-Critique → Improve → Repeat

⚡ Dual Provider

One LLM generates, another critiques and provides feedback

Best for: Critical tasks, balanced approach

Pattern: Generator creates → Critic evaluates → Generator improves

🌐 Multi-Provider

Multiple providers rotate through generation and critique roles

Best for: Maximum quality, diversity

Pattern: Provider A generates → B critiques → B improves → C critiques

Quick Start

from SimplerLLM.language import LLM, LLMProvider
from SimplerLLM.language.llm_feedback import LLMFeedbackLoop

# Create LLM instance
llm = LLM.create(
    provider=LLMProvider.OPENAI,
    model_name="gpt-4o-mini"
)

# Create feedback loop
feedback = LLMFeedbackLoop(
    llm=llm,
    max_iterations=3,
    check_convergence=True
)

# Improve an answer
result = feedback.improve(
    "Explain machine learning in simple terms"
)

# See the improvement
print(f"Improvement: {result.initial_score:.1f} → {result.final_score:.1f}")
print(f"Final answer: {result.final_answer}")

Architectural Patterns in Detail

Single Provider Self-Critique

llm = LLM.create(LLMProvider.OPENAI, model_name="gpt-4")
feedback = LLMFeedbackLoop(llm=llm, max_iterations=3)
result = feedback.improve("Explain quantum computing")

Dual Provider (Generator + Critic)

generator = LLM.create(LLMProvider.OPENAI, model_name="gpt-4")
critic = LLM.create(LLMProvider.ANTHROPIC, model_name="claude-sonnet-4")

feedback = LLMFeedbackLoop(
    generator_llm=generator,
    critic_llm=critic,
    max_iterations=3
)
result = feedback.improve("Write a professional email")

Multi-Provider Rotation

providers = [
    LLM.create(LLMProvider.OPENAI, model_name="gpt-4"),
    LLM.create(LLMProvider.ANTHROPIC, model_name="claude-sonnet-4"),
    LLM.create(LLMProvider.GEMINI, model_name="gemini-pro"),
]

feedback = LLMFeedbackLoop(providers=providers, max_iterations=3)
result = feedback.improve("Explain neural networks")

Stopping Criteria

Control when the feedback loop stops with three criteria (can be combined):

1. Max Iterations (Required)

feedback = LLMFeedbackLoop(
    llm=llm,
    max_iterations=5
)
# Stops after 5 iterations

2. Quality Threshold

feedback = LLMFeedbackLoop(
    llm=llm,
    max_iterations=10,
    quality_threshold=9.0
)
# Stops when score >= 9.0

3. Convergence Detection

feedback = LLMFeedbackLoop(
    llm=llm,
    max_iterations=10,
    check_convergence=True,
    convergence_threshold=0.1
)
# Stops if improvement < 10%
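
These criteria can be combined. For example, the configuration below (using only the parameters documented here) runs at most 10 iterations but stops early once the score reaches 9.0 or the improvement between iterations drops below 10%:

feedback = LLMFeedbackLoop(
    llm=llm,
    max_iterations=10,        # hard cap on cycles
    quality_threshold=9.0,    # stop early once the score is high enough
    check_convergence=True,   # stop early if improvement stalls
    convergence_threshold=0.1
)
# Whichever criterion fires first is reported in result.stopped_reason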

Temperature Scheduling

Control creativity across iterations for better convergence:

Fixed (Default)

feedback = LLMFeedbackLoop(
    llm=llm,
    temperature=0.7
)

Same temperature for every iteration

Decreasing

feedback = LLMFeedbackLoop(
    llm=llm,
    temperature=0.9,
    temperature_schedule="decreasing"
)

0.9 → 0.63 → 0.44 → 0.31

Custom

feedback = LLMFeedbackLoop(
    llm=llm,
    temperature_schedule=[0.9, 0.7, 0.5, 0.3]
)

Precise control per iteration
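
If you want the behavior of the decreasing schedule with explicit control, you can compute the list yourself. The trajectory shown above (0.9 → 0.63 → 0.44 → 0.31) matches roughly a 0.7× decay per iteration; that factor is an assumption used for illustration, not a documented constant.

# Build an explicit schedule with geometric decay (0.7x is an assumed factor
# chosen to reproduce the decreasing example above).
start, decay, steps = 0.9, 0.7, 4
schedule = [round(start * decay**i, 2) for i in range(steps)]
# schedule == [0.9, 0.63, 0.44, 0.31]

feedback = LLMFeedbackLoop(
    llm=llm,
    max_iterations=len(schedule),
    temperature_schedule=schedule
)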

Configuration Options

LLMFeedbackLoop(
    llm=llm,                                    # Single provider
    # OR generator_llm=gen, critic_llm=critic  # Dual provider
    # OR providers=[llm1, llm2, llm3]          # Multi-provider

    max_iterations=3,                          # Max improvement cycles
    convergence_threshold=0.1,                 # Stop if improvement < 10%
    quality_threshold=None,                    # Stop if score >= threshold
    check_convergence=True,                    # Enable convergence detection
    default_criteria=["accuracy", "clarity", "completeness"],
    temperature=0.7,                           # Base temperature
    temperature_schedule=None,                 # "fixed", "decreasing", or list
    verbose=False                              # Detailed logging
)

Working with Results

result = feedback.improve("Explain AI")

# Final output
print(result.final_answer)
print(f"Score: {result.final_score}/10")
print(f"Improved from: {result.initial_score}/10")

# Improvement trajectory
print(f"Scores: {result.improvement_trajectory}")
# Example: [6.5, 7.8, 8.9, 9.2]

# Access iteration history
for iteration in result.all_iterations:
    print(f"\nIteration {iteration.iteration_number}")
    print(f"  Score: {iteration.critique.quality_score}")
    print(f"  Strengths: {iteration.critique.strengths}")
    print(f"  Weaknesses: {iteration.critique.weaknesses}")
    print(f"  Suggestions: {iteration.critique.improvement_suggestions}")
    print(f"  Answer: {iteration.answer[:100]}...")

# Metadata
print(f"Stopped because: {result.stopped_reason}")
print(f"Converged: {result.convergence_detected}")
print(f"Total time: {result.total_execution_time:.2f}s")

Result Structure

  • final_answer - The refined final answer
  • all_iterations - Complete history of all iterations
  • initial_score / final_score - Quality scores (1-10)
  • improvement_trajectory - Score progression list
  • total_iterations - Number of improvement cycles executed
  • total_execution_time - Total wall-clock time in seconds
  • stopped_reason - "max_iterations", "converged", or "threshold_met"
  • convergence_detected - Whether convergence was detected
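
Taken together with the fields used in the examples above, the result behaves roughly like the sketch below. The class names are placeholders and the library's actual definitions may differ:

from dataclasses import dataclass
from typing import List

# Approximate shape only: field names come from the examples above,
# class names are placeholders, not the library's actual definitions.
@dataclass
class Critique:
    quality_score: float
    strengths: List[str]
    weaknesses: List[str]
    improvement_suggestions: List[str]

@dataclass
class Iteration:
    iteration_number: int
    answer: str
    critique: Critique

@dataclass
class FeedbackResult:
    final_answer: str
    all_iterations: List[Iteration]
    initial_score: float
    final_score: float
    improvement_trajectory: List[float]
    stopped_reason: str             # "max_iterations", "converged", or "threshold_met"
    convergence_detected: bool
    total_iterations: int
    total_execution_time: float     # seconds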

Advanced Features

Focus on Specific Criteria

result = feedback.improve(
    prompt="Explain blockchain",
    focus_on=["simplicity", "clarity", "conciseness"]
)
# Critique will emphasize these specific aspects

Start with Existing Content

draft_email = "Hey, can we meet tomorrow?"

result = feedback.improve(
    prompt="Make this email more professional",
    initial_answer=draft_email,
    focus_on=["professionalism", "clarity"]
)
print(result.final_answer)
# "Dear [Name], I would like to schedule a meeting..."

Custom Prompt Templates

custom_critique = """
Evaluate this answer strictly on {criteria}.
Question: {original_prompt}
Answer: {current_answer}
Provide harsh, specific critique.
"""

result = feedback.improve(
    prompt="Explain AI",
    critique_prompt_template=custom_critique
)

Real-World Example: Content Refinement

from SimplerLLM.language import LLM, LLMProvider
from SimplerLLM.language.llm_feedback import LLMFeedbackLoop

# Use a cost-effective model with multiple iterations
llm = LLM.create(LLMProvider.OPENAI, model_name="gpt-4o-mini")

feedback = LLMFeedbackLoop(
    llm=llm,
    max_iterations=5,
    quality_threshold=9.0,
    temperature_schedule="decreasing",
    verbose=True
)

# Refine a blog post introduction
blog_intro = """
Machine learning is cool. It's when computers learn things
by looking at data. It's used everywhere now.
"""

result = feedback.improve(
    prompt="Refine this blog post introduction to be professional and engaging",
    initial_answer=blog_intro,
    focus_on=["professionalism", "engagement", "clarity", "technical_accuracy"]
)

print(f"Improved from score {result.initial_score} to {result.final_score}")
print(f"Iterations: {result.total_iterations}")
print(f"Stopped because: {result.stopped_reason}")
print(f"\nFinal version:\n{result.final_answer}")

Use Cases

  • ✍️ Content Refinement: Improve blog posts, documentation, marketing copy
  • 📧 Automated Editing: Polish emails, reports, proposals
  • 🎯 Maximum Quality: Mission-critical content requiring highest quality
  • 💰 Cost-Effective Quality: Use cheap model + iterations vs expensive model once
  • 🔬 Research & Analysis: Track improvement process with full iteration history
  • 🤝 Multi-Model Collaboration: Leverage strengths of different models
  • 💻 Code Review: Iteratively improve code based on structured critique
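
For the code review use case, the same improve() call works on code: pass the snippet as initial_answer and steer the critique with focus_on. A short sketch (the criteria strings are arbitrary labels chosen for illustration, and feedback is configured as in the earlier examples):

draft_function = """
def avg(nums):
    return sum(nums) / len(nums)  # crashes on an empty list
"""

result = feedback.improve(
    prompt="Review and improve this Python function",
    initial_answer=draft_function,
    focus_on=["correctness", "edge_cases", "readability"]
)
print(result.final_answer)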

Comparison: LLM Feedback Loop vs LLM Judge

Both features improve answer quality, but use different strategies:

Feature      | LLM Judge                       | LLM Feedback Loop
Purpose      | Compare multiple providers      | Iteratively improve single answer
Input        | Multiple providers answer once  | One answer improved multiple times
Output       | Best/synthesized from multiple  | Refined single answer
Iterations   | 1 round                         | Multiple rounds (configurable)
Use Case     | Get best answer fast            | Maximize answer quality
Cost         | N providers × 1 call each       | ~2N API calls (N iterations × 2)

Use LLM Judge when: You want to compare different models and pick the best answer quickly
Use LLM Feedback Loop when: You want to maximize quality through iterative refinement

Best Practices

⚙️ Configuration

  • Start with 3 iterations, increase if needed
  • Enable convergence detection to avoid waste
  • Set quality thresholds for target quality
  • Use focus_on to guide improvements

💰 Cost Optimization

  • Each iteration = 2 API calls (critique + improve)
  • Cheap model + iterations may match expensive model quality
  • Convergence typically happens in 2-4 iterations
  • Monitor costs: N iterations ≈ 2N API calls
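
Because each cycle adds one critique call and one improvement call (plus the initial generation), a rough budget check is simple arithmetic. The helper below is a hypothetical sketch, not part of the library, and the per-call price is an assumed example value:

# Hypothetical cost estimate; not a SimplerLLM API.
def estimate_feedback_cost(max_iterations, avg_cost_per_call):
    calls = 1 + 2 * max_iterations   # initial answer + (critique + improve) per cycle
    return calls, calls * avg_cost_per_call

calls, cost = estimate_feedback_cost(max_iterations=3, avg_cost_per_call=0.002)
print(f"~{calls} API calls, ~${cost:.3f}")   # ~7 API calls, ~$0.014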

🎯 Quality Maximization

  • Use dual provider for critical tasks
  • Set decreasing temperature schedule
  • Use specific focus_on criteria
  • Monitor improvement trajectory

🐛 Debugging

  • Use verbose=True during development
  • Check stopped_reason in results
  • Review critique details for insights
  • Analyze improvement trajectory

Performance Considerations

  • Iterations: More iterations = better quality but higher cost and latency
  • Convergence: Usually converges in 2-4 iterations for most tasks
  • Processing: Currently sequential (critique → improve → critique...)
  • API Calls: N iterations ≈ 2N API calls (1 critique + 1 improvement per iteration)
  • Time: Expect 2-15 seconds per iteration depending on model and complexity

Async Support

import asyncio

async def improve_async():
    llm = LLM.create(LLMProvider.OPENAI, model_name="gpt-4o-mini")
    feedback = LLMFeedbackLoop(llm=llm, max_iterations=3)

    result = await feedback.improve_async(
        "Explain cloud computing"
    )

    return result

# Run async
result = asyncio.run(improve_async())
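
Because improve_async is a coroutine, several prompts can be refined concurrently with standard asyncio tooling. The sketch below creates one loop per prompt to keep state isolated; whether a single LLMFeedbackLoop instance can be shared across concurrent calls is not documented here, so this errs on the safe side:

import asyncio

from SimplerLLM.language import LLM, LLMProvider
from SimplerLLM.language.llm_feedback import LLMFeedbackLoop

async def improve_one(prompt):
    llm = LLM.create(LLMProvider.OPENAI, model_name="gpt-4o-mini")
    feedback = LLMFeedbackLoop(llm=llm, max_iterations=3)
    return await feedback.improve_async(prompt)

async def improve_many(prompts):
    # Run the independent feedback loops concurrently
    return await asyncio.gather(*(improve_one(p) for p in prompts))

results = asyncio.run(improve_many([
    "Explain cloud computing",
    "Explain edge computing",
]))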

Next Steps