# Structured Output
Generate validated, type-safe JSON responses from LLMs using Pydantic models, with automatic retry logic on validation failures.
## Why Structured Output?
LLMs naturally generate unstructured text, but many applications need structured data like JSON. SimplerLLM's structured output feature uses Pydantic models to ensure LLM responses conform to your exact specifications with automatic validation and type checking.
### Key Benefits
- **Type Safety**: automatic validation and type checking with Pydantic
- **Retry Logic**: automatically retries if validation fails
- **Developer Experience**: IntelliSense and autocomplete for response fields
- **Data Validation**: ensures responses match your schema exactly
- **Consistency**: same structure across all LLM providers
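Pydantic's validation behavior can be seen on its own, independent of any LLM call. This minimal sketch shows the coercion and rejection that happen when parsed JSON is checked against a model:

```python
from pydantic import BaseModel, Field, ValidationError

class Movie(BaseModel):
    title: str = Field(description="Movie title")
    year: int = Field(description="Release year")

# Compatible values are coerced: the string "2024" becomes the int 2024
movie = Movie.model_validate({"title": "Dune: Part Two", "year": "2024"})
print(movie.year)  # 2024

# Incompatible values raise ValidationError, which is what triggers a retry
try:
    Movie.model_validate({"title": "Dune: Part Two", "year": "soon"})
except ValidationError:
    print("validation failed")
```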
## Basic Usage
Here's a simple example of generating structured JSON output:
```python
from pydantic import BaseModel, Field
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model

# Define your data model
class MovieRecommendation(BaseModel):
    title: str = Field(description="Movie title")
    genre: str = Field(description="Movie genre")
    year: int = Field(description="Release year")
    rating: float = Field(description="IMDb rating out of 10")
    reason: str = Field(description="Why this movie is recommended")

# Create LLM instance
llm = LLM.create(
    provider=LLMProvider.OPENAI,
    model_name="gpt-4o"
)

# Generate structured response
prompt = "Recommend a great science fiction movie from the 2020s"
recommendation = generate_pydantic_json_model(
    llm_instance=llm,
    prompt=prompt,
    model_class=MovieRecommendation
)

# Access validated fields
print(f"Title: {recommendation.title}")
print(f"Genre: {recommendation.genre}")
print(f"Year: {recommendation.year}")
print(f"Rating: {recommendation.rating}")
print(f"Reason: {recommendation.reason}")
```
### Expected Output

```
Title: Dune: Part Two
Genre: Science Fiction
Year: 2024
Rating: 8.6
Reason: A visually stunning epic that combines political intrigue with spectacular action sequences...
```
## Defining Pydantic Models
Pydantic models define the structure and validation rules for your data:
### Basic Model

```python
from pydantic import BaseModel, Field

class Person(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years", ge=0, le=150)
    email: str = Field(description="Email address")
```
### Model with Nested Objects

```python
from typing import List

class Address(BaseModel):
    street: str = Field(description="Street address")
    city: str = Field(description="City name")
    country: str = Field(description="Country name")

class Company(BaseModel):
    name: str = Field(description="Company name")
    founded: int = Field(description="Year founded")
    address: Address = Field(description="Company headquarters")
    employees: List[str] = Field(description="List of employee names")
```
### Model with Optional Fields

```python
from typing import Optional

class Product(BaseModel):
    name: str = Field(description="Product name")
    price: float = Field(description="Price in USD", gt=0)
    description: Optional[str] = Field(description="Product description", default=None)
    in_stock: bool = Field(description="Availability status", default=True)
```
### Model with Enums

```python
from enum import Enum

class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    URGENT = "urgent"

class Task(BaseModel):
    title: str = Field(description="Task title")
    priority: Priority = Field(description="Task priority level")
    completed: bool = Field(description="Completion status", default=False)
```
> **Pro Tip:** Use detailed `Field(description=...)` values. The LLM uses these descriptions to understand what data to generate, making your results more accurate.
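Field descriptions are not just documentation: Pydantic embeds them in the JSON schema it generates for the model, which is typically what gets sent to the LLM. You can inspect them with `model_json_schema()`:

```python
from pydantic import BaseModel, Field

class Person(BaseModel):
    name: str = Field(description="Person's full name including first and last name")

# Field descriptions are embedded in the generated JSON schema
schema = Person.model_json_schema()
print(schema["properties"]["name"]["description"])
# Person's full name including first and last name
```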
## Configuration Options
Customize the structured output generation behavior:
```python
result = generate_pydantic_json_model(
    llm_instance=llm,
    prompt=prompt,
    model_class=YourModel,
    max_retries=3,  # Number of retry attempts on validation failure
    verbose=True    # Print detailed logs
)
```
### Parameter Reference

- `llm_instance` (required): the LLM instance to use for generation
- `prompt` (required): the prompt describing what data to generate
- `model_class` (required): the Pydantic model class defining the output structure
- `max_retries` (optional, default: `3`): maximum retry attempts if validation fails
- `verbose` (optional, default: `False`): enable detailed logging of generation and validation
## Using with ReliableLLM
Combine structured output with automatic failover for maximum reliability:
```python
from pydantic import BaseModel, Field
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm.reliable import ReliableLLM
from SimplerLLM.language.llm_addons import generate_pydantic_json_model_reliable

# Define your model
class ArticleSummary(BaseModel):
    title: str = Field(description="Article title")
    summary: str = Field(description="Brief summary in 2-3 sentences")
    key_points: list[str] = Field(description="List of 3-5 key points")
    sentiment: str = Field(description="Overall sentiment: positive, negative, or neutral")

# Create ReliableLLM with a primary and a fallback provider
primary_llm = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4o")
secondary_llm = LLM.create(provider=LLMProvider.ANTHROPIC, model_name="claude-3-5-sonnet-20241022")
reliable_llm = ReliableLLM(primary_llm, secondary_llm)

# Generate with automatic failover
prompt = """
Analyze this article about renewable energy:
[Your article text here...]
"""
summary, provider, model_name = generate_pydantic_json_model_reliable(
    reliable_llm=reliable_llm,
    prompt=prompt,
    model_class=ArticleSummary
)

print(f"Generated by: {provider.name} using {model_name}")
print(f"Title: {summary.title}")
print(f"Summary: {summary.summary}")
print(f"Key Points: {', '.join(summary.key_points)}")
print(f"Sentiment: {summary.sentiment}")
```
### Why Use ReliableLLM?

If the primary provider fails or returns invalid JSON, ReliableLLM automatically retries with the secondary provider, making your structured output generation far more robust.
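The failover pattern itself is simple. A minimal sketch (with stand-in callables, not the actual SimplerLLM internals) looks like this:

```python
def with_failover(primary, secondary, prompt):
    """Try the primary provider first; fall back to the secondary on any error."""
    try:
        return primary(prompt)
    except Exception:
        return secondary(prompt)

# Stand-in providers for illustration
def flaky_primary(prompt):
    raise RuntimeError("provider unavailable")

def stable_secondary(prompt):
    return f"answer to: {prompt}"

print(with_failover(flaky_primary, stable_secondary, "hello"))
# answer to: hello
```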
## Real-World Examples
### Example 1: Extract Product Information
```python
from typing import List
from pydantic import BaseModel, Field
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model

class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    category: str = Field(description="Product category")
    price: float = Field(description="Price in USD")
    features: List[str] = Field(description="List of key features")
    pros: List[str] = Field(description="Advantages")
    cons: List[str] = Field(description="Disadvantages")

llm = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4o")
prompt = """
Extract product information from this description:
"The iPhone 15 Pro is a premium smartphone featuring a titanium design,
A17 Pro chip, advanced camera system with 5x optical zoom, and USB-C charging.
Priced at $999, it offers excellent performance and camera quality but has
limited battery life and high cost."
"""

product = generate_pydantic_json_model(
    llm_instance=llm,
    prompt=prompt,
    model_class=ProductInfo
)

print(f"Product: {product.name}")
print(f"Category: {product.category}")
print(f"Price: ${product.price}")
print(f"Features: {', '.join(product.features)}")
print(f"Pros: {', '.join(product.pros)}")
print(f"Cons: {', '.join(product.cons)}")
```
### Example 2: Meeting Notes Extraction
```python
from typing import List
from pydantic import BaseModel, Field
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model

class ActionItem(BaseModel):
    task: str = Field(description="Task description")
    assignee: str = Field(description="Person responsible")
    deadline: str = Field(description="Deadline date")

class MeetingNotes(BaseModel):
    meeting_title: str = Field(description="Meeting title")
    date: str = Field(description="Meeting date")
    attendees: List[str] = Field(description="List of attendees")
    summary: str = Field(description="Brief meeting summary")
    decisions: List[str] = Field(description="Key decisions made")
    action_items: List[ActionItem] = Field(description="Action items from meeting")

llm = LLM.create(provider=LLMProvider.ANTHROPIC, model_name="claude-3-5-sonnet-20241022")
prompt = """
Extract structured information from this meeting transcript:
[Meeting Title: Q4 Product Planning]
[Date: January 15, 2024]
[Attendees: Sarah (Product Manager), John (Engineer), Mike (Designer)]
We discussed the Q4 roadmap. Sarah decided to prioritize the mobile app redesign.
John will implement the new API by March 1st. Mike agreed to provide initial
mockups by February 15th. We also decided to delay the analytics feature to Q1 2025.
"""

notes = generate_pydantic_json_model(
    llm_instance=llm,
    prompt=prompt,
    model_class=MeetingNotes
)

print(f"Meeting: {notes.meeting_title}")
print(f"Date: {notes.date}")
print(f"Attendees: {', '.join(notes.attendees)}")
print(f"\nSummary: {notes.summary}")
print("\nDecisions:")
for decision in notes.decisions:
    print(f" - {decision}")
print("\nAction Items:")
for item in notes.action_items:
    print(f" - {item.task} ({item.assignee} by {item.deadline})")
```
### Example 3: Content Classification
```python
from enum import Enum
from typing import List
from pydantic import BaseModel, Field
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model

class Category(str, Enum):
    TECHNOLOGY = "technology"
    BUSINESS = "business"
    SCIENCE = "science"
    HEALTH = "health"
    ENTERTAINMENT = "entertainment"

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class ContentAnalysis(BaseModel):
    category: Category = Field(description="Primary content category")
    sentiment: Sentiment = Field(description="Overall sentiment")
    topics: List[str] = Field(description="Main topics discussed (3-5)")
    summary: str = Field(description="One sentence summary")
    confidence: float = Field(description="Classification confidence 0-1", ge=0, le=1)

llm = LLM.create(provider=LLMProvider.GEMINI, model_name="gemini-1.5-pro")
prompt = """
Analyze this article:
"Researchers at MIT have developed a new AI model that can predict protein structures
with 95% accuracy. This breakthrough could accelerate drug discovery and our understanding
of diseases. The team is optimistic about clinical applications within the next five years."
"""

analysis = generate_pydantic_json_model(
    llm_instance=llm,
    prompt=prompt,
    model_class=ContentAnalysis
)

print(f"Category: {analysis.category.value}")
print(f"Sentiment: {analysis.sentiment.value}")
print(f"Topics: {', '.join(analysis.topics)}")
print(f"Summary: {analysis.summary}")
print(f"Confidence: {analysis.confidence:.2%}")
```
## Error Handling
Handle validation errors and edge cases properly:
```python
from pydantic import BaseModel, Field, ValidationError
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model

class DataModel(BaseModel):
    field1: str = Field(description="Description")
    field2: int = Field(description="Number")

llm = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4o")

try:
    result = generate_pydantic_json_model(
        llm_instance=llm,
        prompt="Your prompt",
        model_class=DataModel,
        max_retries=3
    )
    # Check whether the result is the model or an error string
    if isinstance(result, DataModel):
        print(f"Success: {result.field1}, {result.field2}")
    else:
        # Result is an error message string
        print(f"Failed after retries: {result}")
except ValidationError as e:
    print(f"Validation error: {e}")
except Exception as e:
    print(f"Error: {e}")
```
### Common Error Scenarios

- **Invalid JSON**: the LLM returns malformed JSON; automatically retried
- **Validation Failure**: the data doesn't match the schema; automatically retried with error feedback
- **Type Mismatch**: wrong data types; Pydantic attempts coercion or fails validation
- **Missing Required Fields**: the LLM omits required fields; retried with specific feedback
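The retry-with-feedback loop described above can be sketched in isolation. Here `fake replies` stand in for a real LLM call, and `retry_parse` is an illustrative helper, not part of SimplerLLM:

```python
from pydantic import BaseModel, ValidationError

class Item(BaseModel):
    name: str
    qty: int

def retry_parse(generate, model_class, max_retries=3):
    """Validate LLM output, feeding validation errors back on each retry."""
    prompt = "Return JSON matching the schema."
    for _ in range(max_retries):
        raw = generate(prompt)
        try:
            return model_class.model_validate_json(raw)
        except ValidationError as exc:
            # Append the error so the next attempt can correct it
            prompt += f"\nPrevious output was invalid: {exc}"
    return f"Failed after {max_retries} retries"

# Stand-in: first reply is missing a required field, second is valid
replies = iter(['{"name": "bolt"}', '{"name": "bolt", "qty": 5}'])
result = retry_parse(lambda p: next(replies), Item)
print(result.qty)  # 5
```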
## Best Practices
### 1. Write Descriptive Field Descriptions

The LLM uses field descriptions to understand what to generate:

```python
# Good
name: str = Field(description="Person's full name including first and last name")

# Bad
name: str = Field(description="Name")
```
### 2. Use Validation Constraints

Add validation rules to ensure data quality:

```python
age: int = Field(description="Age in years", ge=0, le=150)
price: float = Field(description="Price in USD", gt=0)
email: str = Field(description="Email", pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$")
```
### 3. Start Simple, Then Add Complexity

Test with simple models first, then add nested objects and lists.

### 4. Use Enums for Fixed Choices

Enums ensure LLMs select from predefined options, improving accuracy.

### 5. Provide Clear Prompts

Include context and examples in your prompts for better results.

### 6. Test with Multiple Providers

Different LLMs have varying levels of JSON generation capability.
## What's Next?

- **Async Support**: use async/await for better performance
- **Reliable LLM**: add automatic failover between providers
- **LLM Interface**: learn about the unified LLM interface
- **Quick Start Guide**: step-by-step tutorial for beginners
## Need More Help?
Check out our full documentation, join the Discord community, or browse example code on GitHub.