Async Support
SimplerLLM provides full asynchronous support for non-blocking LLM operations, enabling concurrent requests and improved performance in async applications.
Why Use Async?
Asynchronous programming allows your application to handle multiple LLM requests concurrently without blocking. This is essential for building responsive, high-performance applications that interact with LLMs.
Key Benefits
- Concurrent Requests: Process multiple LLM requests simultaneously
- Better Performance: Reduce total wait time when making multiple requests
- Scalability: Handle more requests with the same resources
- Responsive Applications: Don't block while waiting for LLM responses
- Resource Efficiency: Better CPU and network utilization
Performance Comparison
See the dramatic performance difference between synchronous and asynchronous requests:
- Synchronous (sequential): 5 requests × 2 seconds each = 10 seconds total
- Asynchronous (concurrent): 5 requests in parallel = ~2 seconds total
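You can reproduce this difference without calling an API at all. The sketch below simulates a 2-second LLM call with asyncio.sleep and times both approaches (the fake_llm_call helper is just for illustration):

import asyncio
import time

async def fake_llm_call(prompt: str) -> str:
    # Simulate an LLM request that takes roughly 2 seconds
    await asyncio.sleep(2)
    return f"Response to: {prompt}"

async def main():
    prompts = [f"Prompt {i}" for i in range(5)]

    # Sequential: each await must finish before the next starts (~10 seconds)
    start = time.perf_counter()
    for prompt in prompts:
        await fake_llm_call(prompt)
    print(f"Sequential: {time.perf_counter() - start:.1f}s")

    # Concurrent: all five coroutines run at the same time (~2 seconds)
    start = time.perf_counter()
    await asyncio.gather(*(fake_llm_call(p) for p in prompts))
    print(f"Concurrent: {time.perf_counter() - start:.1f}s")

asyncio.run(main())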
Basic Async Usage
SimplerLLM provides async versions of all major functions:
import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def main():
    # Create LLM instance
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    # Generate response asynchronously
    response = await llm.generate_response_async(
        prompt="Explain quantum computing in simple terms"
    )

    print(response)

# Run the async function
asyncio.run(main())
Key Differences
- Use `async def` to define async functions
- Use `await` before async function calls
- Use `asyncio.run()` to run async functions
- Async methods end with the `_async` suffix
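For comparison, here is a minimal sketch of the same call in both styles. It assumes the synchronous counterpart is named generate_response (the non-_async method covered on the LLM Interface page):

from SimplerLLM.language.llm import LLM, LLMProvider

llm = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4o")

# Synchronous call: blocks the current thread until the response arrives
response = llm.generate_response(prompt="Explain AI")

# Asynchronous call: must be awaited from inside an async function
async def ask():
    return await llm.generate_response_async(prompt="Explain AI")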
Concurrent Requests
Process multiple LLM requests concurrently for dramatic performance improvements:
Using asyncio.gather
import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def main():
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    # Define multiple prompts
    prompts = [
        "Explain machine learning",
        "What is cloud computing?",
        "Describe blockchain technology",
        "What are microservices?",
        "Explain neural networks"
    ]

    # Create async tasks for all prompts
    tasks = [
        llm.generate_response_async(prompt=prompt)
        for prompt in prompts
    ]

    # Execute all tasks concurrently
    responses = await asyncio.gather(*tasks)

    # Display results
    for prompt, response in zip(prompts, responses):
        print(f"\nPrompt: {prompt}")
        print(f"Response: {response[:100]}...")

asyncio.run(main())
Performance Gain
If each request takes 2 seconds, 5 synchronous requests take 10 seconds total. With async, all 5 requests complete in approximately 2 seconds - a 5x speedup!
Using asyncio.create_task
import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def generate_response(llm, prompt, name):
    """Generate response and print with identifier"""
    print(f"Starting: {name}")
    response = await llm.generate_response_async(prompt=prompt)
    print(f"Completed: {name}")
    return response

async def main():
    llm = LLM.create(
        provider=LLMProvider.ANTHROPIC,
        model_name="claude-3-5-sonnet-20241022"
    )

    # Create tasks
    task1 = asyncio.create_task(
        generate_response(llm, "Explain AI", "Task 1")
    )
    task2 = asyncio.create_task(
        generate_response(llm, "Explain ML", "Task 2")
    )
    task3 = asyncio.create_task(
        generate_response(llm, "Explain DL", "Task 3")
    )

    # Wait for all tasks to complete
    responses = await asyncio.gather(task1, task2, task3)

    for i, response in enumerate(responses, 1):
        print(f"\nTask {i} response: {response[:100]}...")

asyncio.run(main())
Async Structured Output
Generate validated JSON responses asynchronously:
import asyncio
from pydantic import BaseModel, Field
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model_async

class MovieRecommendation(BaseModel):
    title: str = Field(description="Movie title")
    genre: str = Field(description="Movie genre")
    year: int = Field(description="Release year")
    rating: float = Field(description="Rating out of 10")
    reason: str = Field(description="Why recommended")

async def main():
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    # Generate structured output asynchronously
    recommendation = await generate_pydantic_json_model_async(
        llm_instance=llm,
        prompt="Recommend a great sci-fi movie from the 2020s",
        model_class=MovieRecommendation
    )

    print(f"Title: {recommendation.title}")
    print(f"Genre: {recommendation.genre}")
    print(f"Year: {recommendation.year}")
    print(f"Rating: {recommendation.rating}")
    print(f"Reason: {recommendation.reason}")

asyncio.run(main())
Concurrent Structured Output
import asyncio
from pydantic import BaseModel, Field
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model_async

class ProductAnalysis(BaseModel):
    product: str = Field(description="Product name")
    category: str = Field(description="Product category")
    sentiment: str = Field(description="Sentiment: positive/negative/neutral")
    score: float = Field(description="Score 0-10")

async def analyze_review(llm, review_text):
    """Analyze a single review"""
    return await generate_pydantic_json_model_async(
        llm_instance=llm,
        prompt=f"Analyze this product review: {review_text}",
        model_class=ProductAnalysis
    )

async def main():
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    reviews = [
        "iPhone 15 Pro is amazing! Best phone I've ever owned.",
        "Samsung Galaxy S24 has great camera but battery life is poor.",
        "MacBook Pro M3 is incredibly fast for video editing."
    ]

    # Analyze all reviews concurrently
    tasks = [analyze_review(llm, review) for review in reviews]
    analyses = await asyncio.gather(*tasks)

    # Display results
    for analysis in analyses:
        print(f"\nProduct: {analysis.product}")
        print(f"Category: {analysis.category}")
        print(f"Sentiment: {analysis.sentiment}")
        print(f"Score: {analysis.score}/10")

asyncio.run(main())
Error Handling in Async
Properly handle errors in async operations:
Basic Error Handling
import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def safe_generate(llm, prompt):
    """Generate response with error handling"""
    try:
        response = await llm.generate_response_async(prompt=prompt)
        return {"success": True, "response": response}
    except Exception as e:
        return {"success": False, "error": str(e)}

async def main():
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    prompts = ["Explain AI", "Explain ML", "Explain DL"]
    tasks = [safe_generate(llm, prompt) for prompt in prompts]
    results = await asyncio.gather(*tasks)

    for i, result in enumerate(results):
        if result["success"]:
            print(f"Prompt {i+1} succeeded: {result['response'][:50]}...")
        else:
            print(f"Prompt {i+1} failed: {result['error']}")

asyncio.run(main())
Handling Partial Failures
import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def main():
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    prompts = ["Explain AI", "Explain ML", "Explain DL"]
    tasks = [
        llm.generate_response_async(prompt=prompt)
        for prompt in prompts
    ]

    # return_exceptions=True prevents one failure from stopping the others
    results = await asyncio.gather(*tasks, return_exceptions=True)

    for i, result in enumerate(results):
        if isinstance(result, Exception):
            print(f"Prompt {i+1} failed: {result}")
        else:
            print(f"Prompt {i+1} succeeded: {result[:50]}...")

asyncio.run(main())
Rate Limiting and Throttling
Control concurrency to avoid rate limits:
import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def process_with_semaphore(llm, prompt, semaphore):
    """Process request with concurrency limit"""
    async with semaphore:
        return await llm.generate_response_async(prompt=prompt)

async def main():
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    # Limit to 5 concurrent requests
    semaphore = asyncio.Semaphore(5)

    prompts = [f"Explain topic {i}" for i in range(20)]
    tasks = [
        process_with_semaphore(llm, prompt, semaphore)
        for prompt in prompts
    ]

    # At most 5 requests run at any one time
    responses = await asyncio.gather(*tasks)

    print(f"Processed {len(responses)} prompts with max 5 concurrent requests")

asyncio.run(main())
Pro Tip
Use `asyncio.Semaphore` to control concurrency and avoid hitting API rate limits. Adjust the limit based on your API tier and provider limits.
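If you still hit rate limits occasionally, a common pattern is to pair the semaphore with a retry-and-backoff loop. The helper below is only an illustrative sketch, not part of the SimplerLLM API; it catches the generic Exception because the exact rate-limit error type depends on the provider SDK:

import asyncio
import random

async def call_with_retry(llm, prompt, semaphore, max_retries=3):
    """Limit concurrency and back off before retrying failed requests."""
    async with semaphore:
        for attempt in range(max_retries):
            try:
                return await llm.generate_response_async(prompt=prompt)
            except Exception:
                if attempt == max_retries - 1:
                    raise
                # Wait 1s, 2s, 4s, ... plus jitter before the next attempt
                await asyncio.sleep(2 ** attempt + random.random())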
Real-World Examples
Example 1: Batch Content Generation
import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def generate_blog_post(llm, topic):
    """Generate a blog post for a topic"""
    prompt = f"Write a 200-word blog post about {topic}"
    return await llm.generate_response_async(prompt=prompt)

async def main():
    llm = LLM.create(
        provider=LLMProvider.ANTHROPIC,
        model_name="claude-3-5-sonnet-20241022"
    )

    topics = [
        "Machine Learning Basics",
        "Cloud Computing Trends",
        "Cybersecurity Best Practices",
        "AI in Healthcare",
        "Future of Remote Work"
    ]

    print("Generating blog posts...")
    tasks = [generate_blog_post(llm, topic) for topic in topics]
    posts = await asyncio.gather(*tasks)

    for topic, post in zip(topics, posts):
        print(f"\n{'='*60}")
        print(f"Topic: {topic}")
        print(f"{'='*60}")
        print(post)

asyncio.run(main())
Example 2: Multi-Language Translation
import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def translate_text(llm, text, target_language):
    """Translate text to target language"""
    prompt = f"Translate this to {target_language}: '{text}'"
    return await llm.generate_response_async(prompt=prompt)

async def main():
    llm = LLM.create(
        provider=LLMProvider.GEMINI,
        model_name="gemini-1.5-pro"
    )

    text = "Hello, how are you today?"
    languages = ["Spanish", "French", "German", "Italian", "Portuguese"]

    print(f"Original: {text}\n")

    # Translate to all languages concurrently
    tasks = [translate_text(llm, text, lang) for lang in languages]
    translations = await asyncio.gather(*tasks)

    for language, translation in zip(languages, translations):
        print(f"{language}: {translation}")

asyncio.run(main())
Example 3: Data Processing Pipeline
import asyncio
from pydantic import BaseModel, Field
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model_async

class CustomerFeedback(BaseModel):
    customer_name: str = Field(description="Customer name")
    sentiment: str = Field(description="positive/negative/neutral")
    category: str = Field(description="feedback category")
    priority: str = Field(description="low/medium/high")
    summary: str = Field(description="brief summary")

async def process_feedback(llm, feedback_text):
    """Process customer feedback"""
    return await generate_pydantic_json_model_async(
        llm_instance=llm,
        prompt=f"Analyze this customer feedback: {feedback_text}",
        model_class=CustomerFeedback
    )

async def main():
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    feedback_list = [
        "John Smith: Your product is amazing! Best purchase this year.",
        "Jane Doe: Delivery was late and item was damaged. Very disappointed.",
        "Bob Wilson: Good quality but price is a bit high for what you get.",
        "Alice Brown: Customer service was excellent and resolved my issue quickly."
    ]

    print("Processing customer feedback concurrently...\n")

    # Process all feedback concurrently
    tasks = [process_feedback(llm, feedback) for feedback in feedback_list]
    results = await asyncio.gather(*tasks)

    # Display results
    for result in results:
        print(f"Customer: {result.customer_name}")
        print(f"Sentiment: {result.sentiment}")
        print(f"Category: {result.category}")
        print(f"Priority: {result.priority}")
        print(f"Summary: {result.summary}")
        print("-" * 60)

asyncio.run(main())
Integration with Web Frameworks
Use async SimplerLLM with async web frameworks:
FastAPI Example
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from SimplerLLM.language.llm import LLM, LLMProvider

app = FastAPI()

# Initialize LLM (do this once at startup)
llm = LLM.create(
    provider=LLMProvider.OPENAI,
    model_name="gpt-4o"
)

class PromptRequest(BaseModel):
    prompt: str

class PromptResponse(BaseModel):
    response: str

@app.post("/generate", response_model=PromptResponse)
async def generate_response(request: PromptRequest):
    """Generate LLM response endpoint"""
    try:
        response = await llm.generate_response_async(
            prompt=request.prompt
        )
        return PromptResponse(response=response)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# Run with: uvicorn app:app --reload
Django Async View
from django.http import JsonResponse
from SimplerLLM.language.llm import LLM, LLMProvider
import json

# Initialize LLM
llm = LLM.create(
    provider=LLMProvider.OPENAI,
    model_name="gpt-4o"
)

async def generate_view(request):
    """Async Django view"""
    if request.method == 'POST':
        try:
            data = json.loads(request.body)
            prompt = data.get('prompt')

            response = await llm.generate_response_async(
                prompt=prompt
            )

            return JsonResponse({
                'success': True,
                'response': response
            })
        except Exception as e:
            return JsonResponse({
                'success': False,
                'error': str(e)
            }, status=500)

    return JsonResponse({'error': 'Method not allowed'}, status=405)
Best Practices
1. Use Semaphores for Rate Limiting
Prevent hitting API rate limits by controlling concurrent requests
2. Handle Exceptions Gracefully
Use `return_exceptions=True` in `asyncio.gather` to handle partial failures
3. Monitor Performance
Track response times and adjust concurrency levels accordingly
4. Use Connection Pooling
Reuse LLM instances across requests to reduce overhead
5. Test with Small Batches First
Start with small batch sizes and scale up gradually
6. Implement Timeout Mechanisms
Use `asyncio.wait_for` to set timeouts for async operations, as shown in the sketch below
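For example, a minimal timeout wrapper (the 30-second limit and the helper name are illustrative, not part of SimplerLLM):

import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def generate_with_timeout(llm, prompt, timeout_seconds=30):
    """Cancel the request if it takes longer than timeout_seconds."""
    try:
        return await asyncio.wait_for(
            llm.generate_response_async(prompt=prompt),
            timeout=timeout_seconds
        )
    except asyncio.TimeoutError:
        return None  # or log and retry, depending on your needs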
What's Next?
- LLM Interface: Learn about the unified LLM interface
- Reliable LLM: Add automatic failover between providers
- Structured Output: Generate validated JSON responses
- Quick Start Guide: Step-by-step tutorial for beginners
Need More Help?
Check out our full documentation, join the Discord community, or browse example code on GitHub.