Ollama Provider

Run LLM models locally on your own hardware using Ollama with SimplerLLM's unified interface.

Overview

Ollama lets you run large language models locally on your own machine. SimplerLLM provides seamless integration with Ollama, using the same unified interface as cloud providers but without API costs or internet dependency.

Benefits of Local Models

  • Privacy: Your data never leaves your machine
  • No API Costs: Run models for free after initial setup
  • Offline Usage: No internet connection required
  • Full Control: Choose exact model versions and configurations

Setup and Installation

1. Install Ollama

Visit ollama.com and download the installer for your operating system:

  • macOS: Download .dmg installer
  • Linux: Run curl -fsSL https://ollama.com/install.sh | sh
  • Windows: Download Windows installer
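
After installation, you can confirm the CLI is on your PATH by checking its version before pulling any models:

# Verify the installation
ollama --version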

2. Pull a Model

Download a model to your local machine:

# Pull Llama 2
ollama pull llama2

# Pull Mistral
ollama pull mistral

# Pull CodeLlama
ollama pull codellama

# View available models at https://ollama.com/library

3. Start Ollama Server

Ollama runs as a local server. Start it with:

ollama serve

On macOS and Windows, Ollama starts automatically after installation. On Linux, you may need to start it manually.
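
Before pointing SimplerLLM at the server, you can confirm it is reachable with a quick standalone check. This sketch simply requests the default local endpoint, which responds with a short status message; it does not use SimplerLLM at all:

# check_ollama.py - standalone connectivity check (not part of SimplerLLM)
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama address

try:
    with urllib.request.urlopen(OLLAMA_URL, timeout=5) as resp:
        # The root endpoint normally replies with "Ollama is running"
        print(resp.read().decode())
except OSError as exc:
    print(f"Could not reach Ollama at {OLLAMA_URL}: {exc}")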

Using Ollama Models

Model Examples

# Llama 2 (example)
model_name="llama2"

# Mistral (example)
model_name="mistral"

# CodeLlama (example)
model_name="codellama"

# Phi (example)
model_name="phi"

# Any other model from the Ollama library works the same way

Finding Available Models

Browse all available models at Ollama's model library. Models include Llama, Mistral, Phi, Gemma, and many more.

Basic Usage

from SimplerLLM.language.llm import LLM, LLMProvider

# Create Ollama LLM instance
# No API key needed - connects to local Ollama server
llm = LLM.create(
    provider=LLMProvider.OLLAMA,
    model_name="llama2"
)

# Generate a response
response = llm.generate_response(
    prompt="Explain machine learning in simple terms"
)

print(response)

With Custom Parameters

llm = LLM.create(
    provider=LLMProvider.OLLAMA,
    model_name="mistral",  # Use any Ollama model name
    temperature=0.7,
    max_tokens=1000
)

response = llm.generate_response(
    prompt="Write a Python function to sort a list"
)

print(response)

Custom Ollama Host

If Ollama is running on a different host or port:

llm = LLM.create(
    provider=LLMProvider.OLLAMA,
    model_name="llama2",
    base_url="http://localhost:11434"  # Default Ollama URL
)

# Or connect to remote Ollama instance
llm = LLM.create(
    provider=LLMProvider.OLLAMA,
    model_name="llama2",
    base_url="http://192.168.1.100:11434"
)
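
To avoid hardcoding addresses, one option is to read the URL from an environment variable and pass it through the same base_url parameter shown above. This is a minimal sketch; the OLLAMA_HOST variable name is only a convention used in this example, and the value is passed to base_url explicitly rather than relying on any automatic detection:

import os

from SimplerLLM.language.llm import LLM, LLMProvider

# Fall back to the local default when the variable is not set
ollama_url = os.getenv("OLLAMA_HOST", "http://localhost:11434")

llm = LLM.create(
    provider=LLMProvider.OLLAMA,
    model_name="llama2",
    base_url=ollama_url
)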

Advanced Features

Structured JSON Output

from pydantic import BaseModel, Field
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model

class CodeReview(BaseModel):
    language: str = Field(description="Programming language")
    issues: list[str] = Field(description="Code issues found")
    suggestions: list[str] = Field(description="Improvement suggestions")

llm = LLM.create(provider=LLMProvider.OLLAMA, model_name="codellama")

result = generate_pydantic_json_model(
    llm_instance=llm,
    prompt="Review this code: def add(a, b): return a + b",
    model_class=CodeReview
)

print(f"Language: {result.language}")

Model Management

Common Ollama Commands

# List downloaded models
ollama list

# Pull/download a model
ollama pull llama2

# Remove a model
ollama rm llama2

# Show model information
ollama show llama2

# Run Ollama interactively (for testing)
ollama run llama2
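
The same information is available programmatically through Ollama's local REST API, which is handy for checking that a model is installed before creating an LLM instance. This sketch queries the /api/tags endpoint directly and does not go through SimplerLLM:

# List locally installed models via Ollama's REST API
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=5) as resp:
    data = json.load(resp)

for model in data.get("models", []):
    print(model["name"])  # e.g. "llama2:latest"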

Performance Considerations

Hardware Requirements

  • 8GB+ RAM recommended for 7B models
  • 16GB+ RAM for 13B models
  • 32GB+ RAM for 30B+ models
  • GPU support (CUDA/Metal) significantly improves speed

Model Sizes

Smaller models (7B parameters) run faster but may be less capable. Larger models (30B+) are more powerful but require more resources. Choose based on your hardware and use case.
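
Many models in the Ollama library are published with size-specific tags, so you can pull the variant that fits your hardware and pass the tagged name to SimplerLLM unchanged (the tags below are examples; check each model's library page for the tags it actually offers):

# Pull a specific size variant by tag (examples)
ollama pull llama2:7b
ollama pull llama2:13b

# Then use the tagged name as the model_name, e.g. model_name="llama2:13b"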

Best Practices

1. Choose Appropriate Model Size

Match model size to your hardware capabilities

2. Keep Ollama Updated

Update Ollama regularly for performance improvements and new models

3. Test with Smaller Models First

Start with 7B models before trying larger ones

4. Monitor Resource Usage

Keep an eye on RAM and GPU usage when running models

5. Use for Development and Testing

Great for prototyping without API costs
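
Because every provider shares the same interface, moving from a local model during development to a cloud model in production is a small change. The sketch below assumes the OpenAI provider is configured separately (API key set as described in its own documentation), and the cloud model name is only an example:

from SimplerLLM.language.llm import LLM, LLMProvider

USE_LOCAL = True  # flip to False to switch to a cloud provider

if USE_LOCAL:
    # Free local prototyping via Ollama
    llm = LLM.create(provider=LLMProvider.OLLAMA, model_name="llama2")
else:
    # Cloud provider for production (model name is an example)
    llm = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4o-mini")

response = llm.generate_response(prompt="Summarize the benefits of local LLMs")
print(response)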

Troubleshooting

Common Issues

Connection refused

Ensure the Ollama server is running: ollama serve

Model not found

Pull the model first: ollama pull model-name

Out of memory

Try a smaller model or close other applications

Slow performance

Consider using a smaller model or upgrading hardware

What's Next?