Structured Output

Generate validated, type-safe JSON responses from LLMs using Pydantic models, with automatic retry logic on validation failures.

Why Structured Output?

LLMs naturally generate unstructured text, but many applications need structured data like JSON. SimplerLLM's structured output feature uses Pydantic models to ensure LLM responses conform to your exact specifications with automatic validation and type checking.

Key Benefits

  • Type Safety: Automatic validation and type checking with Pydantic
  • Retry Logic: Automatically retries if validation fails
  • Developer Experience: IntelliSense and autocomplete for response fields
  • Data Validation: Ensures responses match your schema exactly
  • Consistency: Same structure across all LLM providers

Basic Usage

Here's a simple example of generating structured JSON output:

from pydantic import BaseModel, Field
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model

# Define your data model
class MovieRecommendation(BaseModel):
    title: str = Field(description="Movie title")
    genre: str = Field(description="Movie genre")
    year: int = Field(description="Release year")
    rating: float = Field(description="IMDb rating out of 10")
    reason: str = Field(description="Why this movie is recommended")

# Create LLM instance
llm = LLM.create(
    provider=LLMProvider.OPENAI,
    model_name="gpt-4o"
)

# Generate structured response
prompt = "Recommend a great science fiction movie from the 2020s"
recommendation = generate_pydantic_json_model(
    llm_instance=llm,
    prompt=prompt,
    model_class=MovieRecommendation
)

# Access validated fields
print(f"Title: {recommendation.title}")
print(f"Genre: {recommendation.genre}")
print(f"Year: {recommendation.year}")
print(f"Rating: {recommendation.rating}")
print(f"Reason: {recommendation.reason}")

Expected Output

Title: Dune: Part Two
Genre: Science Fiction
Year: 2024
Rating: 8.6
Reason: A visually stunning epic that combines political intrigue with spectacular action sequences...

Defining Pydantic Models

Pydantic models define the structure and validation rules for your data:

Basic Model

from pydantic import BaseModel, Field

class Person(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years", ge=0, le=150)
    email: str = Field(description="Email address")
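
Constraints like ge and le are enforced the moment data is parsed. A minimal sketch using plain Pydantic (v2's model_validate assumed, no LLM call) of how validation and type coercion behave:

```python
from pydantic import BaseModel, Field, ValidationError

class Person(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years", ge=0, le=150)
    email: str = Field(description="Email address")

# Valid input parses; lax mode coerces "36" (str) to 36 (int)
person = Person.model_validate({"name": "Ada Lovelace", "age": "36", "email": "ada@example.com"})
print(person.age)  # 36

# Out-of-range values raise ValidationError instead of slipping through
try:
    Person.model_validate({"name": "Bob", "age": 200, "email": "bob@example.com"})
    raised = False
except ValidationError:
    raised = True
print(raised)  # True
```

This is the same validation step that runs on every LLM response, which is why out-of-range values trigger a retry rather than reaching your code.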

Model with Nested Objects

from typing import List

class Address(BaseModel):
    street: str = Field(description="Street address")
    city: str = Field(description="City name")
    country: str = Field(description="Country name")

class Company(BaseModel):
    name: str = Field(description="Company name")
    founded: int = Field(description="Year founded")
    address: Address = Field(description="Company headquarters")
    employees: List[str] = Field(description="List of employee names")

Model with Optional Fields

from typing import Optional

class Product(BaseModel):
    name: str = Field(description="Product name")
    price: float = Field(description="Price in USD", gt=0)
    description: Optional[str] = Field(description="Product description", default=None)
    in_stock: bool = Field(description="Availability status", default=True)
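
Optional fields with defaults mean a response that omits them still validates. A quick check with plain Pydantic (v2 assumed):

```python
from typing import Optional
from pydantic import BaseModel, Field

class Product(BaseModel):
    name: str = Field(description="Product name")
    price: float = Field(description="Price in USD", gt=0)
    description: Optional[str] = Field(description="Product description", default=None)
    in_stock: bool = Field(description="Availability status", default=True)

# Fields the LLM omits simply take their declared defaults
product = Product.model_validate({"name": "Widget", "price": 9.99})
print(product.description)  # None
print(product.in_stock)     # True
```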

Model with Enums

from enum import Enum

class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    URGENT = "urgent"

class Task(BaseModel):
    title: str = Field(description="Task title")
    priority: Priority = Field(description="Task priority level")
    completed: bool = Field(description="Completion status", default=False)
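
Your field descriptions and enum values end up in the model's JSON schema, which is the machine-readable form a tool like this can derive the expected structure from. A sketch using Pydantic v2's model_json_schema (how SimplerLLM builds its prompt internally is an assumption here, but the schema below is exactly what Pydantic exposes):

```python
from enum import Enum
from pydantic import BaseModel, Field

class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    URGENT = "urgent"

class Task(BaseModel):
    title: str = Field(description="Task title")
    priority: Priority = Field(description="Task priority level")
    completed: bool = Field(description="Completion status", default=False)

# Descriptions and the allowed enum values are embedded in the schema
schema = Task.model_json_schema()
print(schema["properties"]["title"]["description"])  # Task title
print(schema["$defs"]["Priority"]["enum"])           # ['low', 'medium', 'high', 'urgent']
```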

Pro Tip

Use detailed Field(description=...) values. The LLM uses these descriptions to understand what data to generate, making your results more accurate.

Configuration Options

Customize the structured output generation behavior:

result = generate_pydantic_json_model(
    llm_instance=llm,
    prompt=prompt,
    model_class=YourModel,
    max_retries=3,              # Number of retry attempts on validation failure
    verbose=True                # Print detailed logs
)

Parameter Reference

llm_instance (required)

The LLM instance to use for generation

prompt (required)

The prompt describing what data to generate

model_class (required)

The Pydantic model class defining the output structure

max_retries (optional, default: 3)

Maximum retry attempts if validation fails

verbose (optional, default: False)

Enable detailed logging of generation and validation

Using with ReliableLLM

Combine structured output with automatic failover for maximum reliability:

from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm.reliable import ReliableLLM
from SimplerLLM.language.llm_addons import generate_pydantic_json_model_reliable
from pydantic import BaseModel, Field

# Define your model
class ArticleSummary(BaseModel):
    title: str = Field(description="Article title")
    summary: str = Field(description="Brief summary in 2-3 sentences")
    key_points: list[str] = Field(description="List of 3-5 key points")
    sentiment: str = Field(description="Overall sentiment: positive, negative, or neutral")

# Create ReliableLLM
primary_llm = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4o")
secondary_llm = LLM.create(provider=LLMProvider.ANTHROPIC, model_name="claude-3-5-sonnet-20241022")
reliable_llm = ReliableLLM(primary_llm, secondary_llm)

# Generate with automatic failover
prompt = """
Analyze this article about renewable energy:
[Your article text here...]
"""

summary, provider, model_name = generate_pydantic_json_model_reliable(
    reliable_llm=reliable_llm,
    prompt=prompt,
    model_class=ArticleSummary
)

print(f"Generated by: {provider.name} using {model_name}")
print(f"Title: {summary.title}")
print(f"Summary: {summary.summary}")
print(f"Key Points: {', '.join(summary.key_points)}")
print(f"Sentiment: {summary.sentiment}")

Why Use ReliableLLM?

If the primary provider fails or returns JSON that cannot be validated, ReliableLLM automatically retries the request with the secondary provider, so structured output generation keeps working even when a single provider is down or misbehaving.

Real-World Examples

Example 1: Extract Product Information

from pydantic import BaseModel, Field
from typing import List
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model

class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    category: str = Field(description="Product category")
    price: float = Field(description="Price in USD")
    features: List[str] = Field(description="List of key features")
    pros: List[str] = Field(description="Advantages")
    cons: List[str] = Field(description="Disadvantages")

llm = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4o")

prompt = """
Extract product information from this description:
"The iPhone 15 Pro is a premium smartphone featuring a titanium design,
A17 Pro chip, advanced camera system with 5x optical zoom, and USB-C charging.
Priced at $999, it offers excellent performance and camera quality but has
limited battery life and high cost."
"""

product = generate_pydantic_json_model(
    llm_instance=llm,
    prompt=prompt,
    model_class=ProductInfo
)

print(f"Product: {product.name}")
print(f"Category: {product.category}")
print(f"Price: ${product.price}")
print(f"Features: {', '.join(product.features)}")
print(f"Pros: {', '.join(product.pros)}")
print(f"Cons: {', '.join(product.cons)}")

Example 2: Meeting Notes Extraction

from pydantic import BaseModel, Field
from typing import List
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model

class ActionItem(BaseModel):
    task: str = Field(description="Task description")
    assignee: str = Field(description="Person responsible")
    deadline: str = Field(description="Deadline date")

class MeetingNotes(BaseModel):
    meeting_title: str = Field(description="Meeting title")
    date: str = Field(description="Meeting date")
    attendees: List[str] = Field(description="List of attendees")
    summary: str = Field(description="Brief meeting summary")
    decisions: List[str] = Field(description="Key decisions made")
    action_items: List[ActionItem] = Field(description="Action items from meeting")

llm = LLM.create(provider=LLMProvider.ANTHROPIC, model_name="claude-3-5-sonnet-20241022")

prompt = """
Extract structured information from this meeting transcript:
[Meeting Title: Q4 Product Planning]
[Date: January 15, 2024]
[Attendees: Sarah (Product Manager), John (Engineer), Mike (Designer)]

We discussed the Q4 roadmap. Sarah decided to prioritize the mobile app redesign.
John will implement the new API by March 1st. Mike agreed to provide initial
mockups by February 15th. We also decided to delay the analytics feature to Q1 2025.
"""

notes = generate_pydantic_json_model(
    llm_instance=llm,
    prompt=prompt,
    model_class=MeetingNotes
)

print(f"Meeting: {notes.meeting_title}")
print(f"Date: {notes.date}")
print(f"Attendees: {', '.join(notes.attendees)}")
print(f"\nSummary: {notes.summary}")
print(f"\nDecisions:")
for decision in notes.decisions:
    print(f"  - {decision}")
print(f"\nAction Items:")
for item in notes.action_items:
    print(f"  - {item.task} ({item.assignee} by {item.deadline})")

Example 3: Content Classification

from enum import Enum
from typing import List
from pydantic import BaseModel, Field
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model

class Category(str, Enum):
    TECHNOLOGY = "technology"
    BUSINESS = "business"
    SCIENCE = "science"
    HEALTH = "health"
    ENTERTAINMENT = "entertainment"

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class ContentAnalysis(BaseModel):
    category: Category = Field(description="Primary content category")
    sentiment: Sentiment = Field(description="Overall sentiment")
    topics: List[str] = Field(description="Main topics discussed (3-5)")
    summary: str = Field(description="One sentence summary")
    confidence: float = Field(description="Classification confidence 0-1", ge=0, le=1)

llm = LLM.create(provider=LLMProvider.GEMINI, model_name="gemini-1.5-pro")

prompt = """
Analyze this article:
"Researchers at MIT have developed a new AI model that can predict protein structures
with 95% accuracy. This breakthrough could accelerate drug discovery and our understanding
of diseases. The team is optimistic about clinical applications within the next five years."
"""

analysis = generate_pydantic_json_model(
    llm_instance=llm,
    prompt=prompt,
    model_class=ContentAnalysis
)

print(f"Category: {analysis.category.value}")
print(f"Sentiment: {analysis.sentiment.value}")
print(f"Topics: {', '.join(analysis.topics)}")
print(f"Summary: {analysis.summary}")
print(f"Confidence: {analysis.confidence:.2%}")

Error Handling

Handle validation errors and edge cases properly:

from pydantic import BaseModel, Field, ValidationError
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model

class DataModel(BaseModel):
    field1: str = Field(description="Description")
    field2: int = Field(description="Number")

llm = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4o")

try:
    result = generate_pydantic_json_model(
        llm_instance=llm,
        prompt="Your prompt",
        model_class=DataModel,
        max_retries=3
    )

    # Check if result is the model or an error string
    if isinstance(result, DataModel):
        print(f"Success: {result.field1}, {result.field2}")
    else:
        # Result is an error message string
        print(f"Failed after retries: {result}")

except ValidationError as e:
    print(f"Validation error: {e}")

except Exception as e:
    print(f"Error: {e}")

Common Error Scenarios

  • Invalid JSON: the LLM returns malformed JSON; the call is retried automatically
  • Validation Failure: the data doesn't match the schema; retried with the validation error fed back to the LLM
  • Type Mismatch: wrong data types; Pydantic attempts coercion and fails validation only if coercion is impossible
  • Missing Required Fields: the LLM omits required fields; retried with specific feedback
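
Conceptually, the retry-with-feedback loop behind these scenarios looks roughly like the sketch below. This is a hypothetical illustration, not SimplerLLM's actual implementation; generate_with_feedback and the fake call_llm are made-up names for the sketch:

```python
import json
from pydantic import BaseModel, Field, ValidationError

class DataModel(BaseModel):
    field1: str = Field(description="Description")
    field2: int = Field(description="Number")

def generate_with_feedback(call_llm, model_class, prompt, max_retries=3):
    """Hypothetical sketch: parse, validate, and retry with error feedback."""
    last_error = None
    for _ in range(max_retries):
        full_prompt = prompt if last_error is None else (
            f"{prompt}\nYour previous reply was invalid: {last_error}\n"
            "Return corrected JSON only."
        )
        raw = call_llm(full_prompt)
        try:
            return model_class.model_validate(json.loads(raw))
        except (json.JSONDecodeError, ValidationError) as exc:
            last_error = str(exc)  # fed back so the model can self-correct
    return f"Failed after {max_retries} attempts: {last_error}"

# Fake LLM that returns malformed JSON once, then a valid reply
replies = iter(['{"field1": "ok"', '{"field1": "ok", "field2": 7}'])
result = generate_with_feedback(lambda p: next(replies), DataModel, "demo")
print(result.field2)  # 7
```

Note the fallback return value is a string, which matches the isinstance check shown in the error handling example above.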

Best Practices

1. Write Descriptive Field Descriptions

The LLM uses field descriptions to understand what to generate:

# Good
name: str = Field(description="Person's full name including first and last name")

# Bad
name: str = Field(description="Name")

2. Use Validation Constraints

Add validation rules to ensure data quality:

age: int = Field(description="Age in years", ge=0, le=150)
price: float = Field(description="Price in USD", gt=0)
email: str = Field(description="Email", pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$")
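
A quick way to confirm constraints behave as intended is to feed them deliberately bad data. Signup here is a hypothetical model combining the constraints above:

```python
from pydantic import BaseModel, Field, ValidationError

class Signup(BaseModel):
    age: int = Field(description="Age in years", ge=0, le=150)
    price: float = Field(description="Price in USD", gt=0)
    email: str = Field(description="Email", pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$")

try:
    Signup(age=-1, price=0, email="not-an-email")
    error_count = 0
except ValidationError as exc:
    error_count = len(exc.errors())  # Pydantic reports every violation at once
print(error_count)  # 3
```

When a retry happens, this full list of violations is what gets fed back to the LLM, so stricter constraints also mean more precise correction feedback.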

3. Start Simple, Then Add Complexity

Test with simple models first, then add nested objects and lists

4. Use Enums for Fixed Choices

Enums ensure LLMs select from predefined options, improving accuracy

5. Provide Clear Prompts

Include context and examples in your prompts for better results

6. Test with Multiple Providers

Different LLMs have varying levels of JSON generation capability

Need More Help?

Check out our full documentation, join the Discord community, or browse example code on GitHub.