Async Support

SimplerLLM provides full asynchronous support for non-blocking LLM operations, enabling concurrent requests and improved performance in async applications.

Why Use Async?

Asynchronous programming allows your application to handle multiple LLM requests concurrently without blocking. This is essential for building responsive, high-performance applications that interact with LLMs.

Key Benefits

  • Concurrent Requests: Process multiple LLM requests simultaneously
  • Better Performance: Reduce total wait time when making multiple requests
  • Scalability: Handle more requests with the same resources
  • Responsive Applications: Don't block while waiting for LLM responses
  • Resource Efficiency: Better CPU and network utilization

Performance Comparison

See the dramatic performance difference between synchronous and asynchronous requests:

Synchronous (Sequential)

5 requests × 2 seconds each = 10 seconds total

Asynchronous (Concurrent)

5 concurrent requests = ~2 seconds total
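
This timing difference can be reproduced with a minimal sketch that uses asyncio.sleep to simulate 2-second requests (no API calls are made; fake_request below is a hypothetical stand-in for a real LLM call):

import asyncio
import time

async def fake_request(prompt):
    """Simulate an LLM call that takes about 2 seconds."""
    await asyncio.sleep(2)
    return f"Response to: {prompt}"

async def main():
    prompts = [f"Prompt {i}" for i in range(5)]

    # Sequential: each request waits for the previous one (~10 seconds)
    start = time.perf_counter()
    for prompt in prompts:
        await fake_request(prompt)
    print(f"Sequential: {time.perf_counter() - start:.1f}s")

    # Concurrent: all requests run at once (~2 seconds)
    start = time.perf_counter()
    await asyncio.gather(*(fake_request(p) for p in prompts))
    print(f"Concurrent: {time.perf_counter() - start:.1f}s")

asyncio.run(main())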

Basic Async Usage

SimplerLLM provides async versions of all major functions:

import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def main():
    # Create LLM instance
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    # Generate response asynchronously
    response = await llm.generate_response_async(
        prompt="Explain quantum computing in simple terms"
    )

    print(response)

# Run the async function
asyncio.run(main())

Key Differences

  • Use async def to define async functions
  • Use await before async function calls
  • Use asyncio.run() to run async functions
  • Async methods end with the _async suffix

Concurrent Requests

Process multiple LLM requests concurrently for dramatic performance improvements:

Using asyncio.gather

import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def main():
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    # Define multiple prompts
    prompts = [
        "Explain machine learning",
        "What is cloud computing?",
        "Describe blockchain technology",
        "What are microservices?",
        "Explain neural networks"
    ]

    # Create async tasks for all prompts
    tasks = [
        llm.generate_response_async(prompt=prompt)
        for prompt in prompts
    ]

    # Execute all tasks concurrently
    responses = await asyncio.gather(*tasks)

    # Display results
    for prompt, response in zip(prompts, responses):
        print(f"\nPrompt: {prompt}")
        print(f"Response: {response[:100]}...")

asyncio.run(main())

Performance Gain

If each request takes 2 seconds, 5 synchronous requests take 10 seconds in total. With async, all 5 requests complete in approximately 2 seconds, roughly a 5x speedup.

Using asyncio.create_task

import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def generate_response(llm, prompt, name):
    """Generate response and print with identifier"""
    print(f"Starting: {name}")
    response = await llm.generate_response_async(prompt=prompt)
    print(f"Completed: {name}")
    return response

async def main():
    llm = LLM.create(
        provider=LLMProvider.ANTHROPIC,
        model_name="claude-3-5-sonnet-20241022"
    )

    # Create tasks
    task1 = asyncio.create_task(
        generate_response(llm, "Explain AI", "Task 1")
    )
    task2 = asyncio.create_task(
        generate_response(llm, "Explain ML", "Task 2")
    )
    task3 = asyncio.create_task(
        generate_response(llm, "Explain DL", "Task 3")
    )

    # Wait for all tasks to complete
    responses = await asyncio.gather(task1, task2, task3)

    for i, response in enumerate(responses, 1):
        print(f"\nTask {i} response: {response[:100]}...")

asyncio.run(main())

Async Structured Output

Generate validated JSON responses asynchronously:

import asyncio
from pydantic import BaseModel, Field
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model_async

class MovieRecommendation(BaseModel):
    title: str = Field(description="Movie title")
    genre: str = Field(description="Movie genre")
    year: int = Field(description="Release year")
    rating: float = Field(description="Rating out of 10")
    reason: str = Field(description="Why recommended")

async def main():
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    # Generate structured output asynchronously
    recommendation = await generate_pydantic_json_model_async(
        llm_instance=llm,
        prompt="Recommend a great sci-fi movie from 2020s",
        model_class=MovieRecommendation
    )

    print(f"Title: {recommendation.title}")
    print(f"Genre: {recommendation.genre}")
    print(f"Year: {recommendation.year}")
    print(f"Rating: {recommendation.rating}")
    print(f"Reason: {recommendation.reason}")

asyncio.run(main())

Concurrent Structured Output

import asyncio
from pydantic import BaseModel, Field
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model_async

class ProductAnalysis(BaseModel):
    product: str = Field(description="Product name")
    category: str = Field(description="Product category")
    sentiment: str = Field(description="Sentiment: positive/negative/neutral")
    score: float = Field(description="Score 0-10")

async def analyze_review(llm, review_text):
    """Analyze a single review"""
    return await generate_pydantic_json_model_async(
        llm_instance=llm,
        prompt=f"Analyze this product review: {review_text}",
        model_class=ProductAnalysis
    )

async def main():
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    reviews = [
        "iPhone 15 Pro is amazing! Best phone I've ever owned.",
        "Samsung Galaxy S24 has great camera but battery life is poor.",
        "MacBook Pro M3 is incredibly fast for video editing."
    ]

    # Analyze all reviews concurrently
    tasks = [analyze_review(llm, review) for review in reviews]
    analyses = await asyncio.gather(*tasks)

    # Display results
    for analysis in analyses:
        print(f"\nProduct: {analysis.product}")
        print(f"Category: {analysis.category}")
        print(f"Sentiment: {analysis.sentiment}")
        print(f"Score: {analysis.score}/10")

asyncio.run(main())

Error Handling in Async

Properly handle errors in async operations:

Basic Error Handling

import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def safe_generate(llm, prompt):
    """Generate response with error handling"""
    try:
        response = await llm.generate_response_async(prompt=prompt)
        return {"success": True, "response": response}
    except Exception as e:
        return {"success": False, "error": str(e)}

async def main():
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    prompts = ["Explain AI", "Explain ML", "Explain DL"]

    tasks = [safe_generate(llm, prompt) for prompt in prompts]
    results = await asyncio.gather(*tasks)

    for i, result in enumerate(results):
        if result["success"]:
            print(f"Prompt {i+1} succeeded: {result['response'][:50]}...")
        else:
            print(f"Prompt {i+1} failed: {result['error']}")

asyncio.run(main())

Handling Partial Failures

import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def main():
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    prompts = ["Explain AI", "Explain ML", "Explain DL"]

    tasks = [
        llm.generate_response_async(prompt=prompt)
        for prompt in prompts
    ]

    # return_exceptions=True prevents one failure from stopping others
    results = await asyncio.gather(*tasks, return_exceptions=True)

    for i, result in enumerate(results):
        if isinstance(result, Exception):
            print(f"Prompt {i+1} failed: {result}")
        else:
            print(f"Prompt {i+1} succeeded: {result[:50]}...")

asyncio.run(main())

Rate Limiting and Throttling

Control concurrency to avoid rate limits:

import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def process_with_semaphore(llm, prompt, semaphore):
    """Process request with concurrency limit"""
    async with semaphore:
        return await llm.generate_response_async(prompt=prompt)

async def main():
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    # Limit to 5 concurrent requests
    semaphore = asyncio.Semaphore(5)

    prompts = [f"Explain topic {i}" for i in range(20)]

    tasks = [
        process_with_semaphore(llm, prompt, semaphore)
        for prompt in prompts
    ]

    # At most 5 requests run at any given time
    responses = await asyncio.gather(*tasks)

    print(f"Processed {len(responses)} prompts with max 5 concurrent requests")

asyncio.run(main())

Pro Tip

Use asyncio.Semaphore to control concurrency and avoid hitting API rate limits. Adjust the limit based on your API tier and provider limits.

Real-World Examples

Example 1: Batch Content Generation

import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def generate_blog_post(llm, topic):
    """Generate a blog post for a topic"""
    prompt = f"Write a 200-word blog post about {topic}"
    return await llm.generate_response_async(prompt=prompt)

async def main():
    llm = LLM.create(
        provider=LLMProvider.ANTHROPIC,
        model_name="claude-3-5-sonnet-20241022"
    )

    topics = [
        "Machine Learning Basics",
        "Cloud Computing Trends",
        "Cybersecurity Best Practices",
        "AI in Healthcare",
        "Future of Remote Work"
    ]

    print("Generating blog posts...")
    tasks = [generate_blog_post(llm, topic) for topic in topics]
    posts = await asyncio.gather(*tasks)

    for topic, post in zip(topics, posts):
        print(f"\n{'='*60}")
        print(f"Topic: {topic}")
        print(f"{'='*60}")
        print(post)

asyncio.run(main())

Example 2: Multi-Language Translation

import asyncio
from SimplerLLM.language.llm import LLM, LLMProvider

async def translate_text(llm, text, target_language):
    """Translate text to target language"""
    prompt = f"Translate this to {target_language}: '{text}'"
    return await llm.generate_response_async(prompt=prompt)

async def main():
    llm = LLM.create(
        provider=LLMProvider.GEMINI,
        model_name="gemini-1.5-pro"
    )

    text = "Hello, how are you today?"
    languages = ["Spanish", "French", "German", "Italian", "Portuguese"]

    print(f"Original: {text}\n")

    # Translate to all languages concurrently
    tasks = [translate_text(llm, text, lang) for lang in languages]
    translations = await asyncio.gather(*tasks)

    for language, translation in zip(languages, translations):
        print(f"{language}: {translation}")

asyncio.run(main())

Example 3: Data Processing Pipeline

import asyncio
from pydantic import BaseModel, Field
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model_async

class CustomerFeedback(BaseModel):
    customer_name: str = Field(description="Customer name")
    sentiment: str = Field(description="positive/negative/neutral")
    category: str = Field(description="feedback category")
    priority: str = Field(description="low/medium/high")
    summary: str = Field(description="brief summary")

async def process_feedback(llm, feedback_text):
    """Process customer feedback"""
    return await generate_pydantic_json_model_async(
        llm_instance=llm,
        prompt=f"Analyze this customer feedback: {feedback_text}",
        model_class=CustomerFeedback
    )

async def main():
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    feedback_list = [
        "John Smith: Your product is amazing! Best purchase this year.",
        "Jane Doe: Delivery was late and item was damaged. Very disappointed.",
        "Bob Wilson: Good quality but price is a bit high for what you get.",
        "Alice Brown: Customer service was excellent and resolved my issue quickly."
    ]

    print("Processing customer feedback concurrently...\n")

    # Process all feedback concurrently
    tasks = [process_feedback(llm, feedback) for feedback in feedback_list]
    results = await asyncio.gather(*tasks)

    # Display results
    for result in results:
        print(f"Customer: {result.customer_name}")
        print(f"Sentiment: {result.sentiment}")
        print(f"Category: {result.category}")
        print(f"Priority: {result.priority}")
        print(f"Summary: {result.summary}")
        print("-" * 60)

asyncio.run(main())

Integration with Web Frameworks

Use async SimplerLLM with async web frameworks:

FastAPI Example

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from SimplerLLM.language.llm import LLM, LLMProvider

app = FastAPI()

# Initialize LLM (do this once at startup)
llm = LLM.create(
    provider=LLMProvider.OPENAI,
    model_name="gpt-4o"
)

class PromptRequest(BaseModel):
    prompt: str

class PromptResponse(BaseModel):
    response: str

@app.post("/generate", response_model=PromptResponse)
async def generate_response(request: PromptRequest):
    """Generate LLM response endpoint"""
    try:
        response = await llm.generate_response_async(
            prompt=request.prompt
        )
        return PromptResponse(response=response)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# Run with: uvicorn app:app --reload

Django Async View

from django.http import JsonResponse
from SimplerLLM.language.llm import LLM, LLMProvider
import json

# Initialize LLM
llm = LLM.create(
    provider=LLMProvider.OPENAI,
    model_name="gpt-4o"
)

async def generate_view(request):
    """Async Django view"""
    if request.method == 'POST':
        try:
            data = json.loads(request.body)
            prompt = data.get('prompt')

            response = await llm.generate_response_async(
                prompt=prompt
            )

            return JsonResponse({
                'success': True,
                'response': response
            })

        except Exception as e:
            return JsonResponse({
                'success': False,
                'error': str(e)
            }, status=500)

    return JsonResponse({'error': 'Method not allowed'}, status=405)

Best Practices

1. Use Semaphores for Rate Limiting

Prevent hitting API rate limits by controlling concurrent requests

2. Handle Exceptions Gracefully

Use return_exceptions=True in asyncio.gather to handle partial failures

3. Monitor Performance

Track response times and adjust concurrency levels accordingly

4. Use Connection Pooling

Reuse LLM instances across requests to reduce overhead

5. Test with Small Batches First

Start with small batch sizes and scale up gradually

6. Implement Timeout Mechanisms

Use asyncio.wait_for to set timeouts for async operations
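
A minimal sketch of this practice, assuming the same generate_response_async method shown earlier; the 30-second limit is an arbitrary example value:

import asyncio
import time
from SimplerLLM.language.llm import LLM, LLMProvider

async def generate_with_timeout(llm, prompt, timeout=30.0):
    """Generate a response, giving up after `timeout` seconds."""
    start = time.perf_counter()
    try:
        response = await asyncio.wait_for(
            llm.generate_response_async(prompt=prompt),
            timeout=timeout
        )
        # Track elapsed time to help tune concurrency and timeout settings
        print(f"Completed in {time.perf_counter() - start:.1f}s")
        return response
    except asyncio.TimeoutError:
        print(f"Timed out after {timeout}s")
        return None

async def main():
    llm = LLM.create(
        provider=LLMProvider.OPENAI,
        model_name="gpt-4o"
    )

    response = await generate_with_timeout(llm, "Explain AI", timeout=30.0)
    if response:
        print(response[:100])

asyncio.run(main())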

What's Next?

Need More Help?

Check out our full documentation, join the Discord community, or browse example code on GitHub.