Ollama Provider
Run large language models locally on your own hardware with Ollama, using SimplerLLM's unified interface.
Overview
Ollama lets you run large language models locally on your own machine. SimplerLLM provides seamless integration with Ollama, using the same unified interface as cloud providers but without API costs or internet dependency.
Benefits of Local Models
- Privacy: Your data never leaves your machine
- No API Costs: Run models for free after initial setup
- Offline Usage: No internet connection required
- Full Control: Choose exact model versions and configurations
Setup and Installation
1. Install Ollama
Visit ollama.com and download the installer for your operating system:
- macOS: Download the .dmg installer
- Linux: Run the install script:
curl -fsSL https://ollama.com/install.sh | sh
- Windows: Download the Windows installer
2. Pull a Model
Download a model to your local machine:
# Pull Llama 2
ollama pull llama2
# Pull Mistral
ollama pull mistral
# Pull CodeLlama
ollama pull codellama
# View available models at https://ollama.com/library
3. Start Ollama Server
Ollama runs as a local server. Start it with:
ollama serve
On macOS and Windows, Ollama starts automatically after installation. On Linux, you may need to start it manually.
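Before wiring Ollama into SimplerLLM, you can confirm the server is reachable on its default port (11434). A minimal check in Python, assuming a default local install:
import urllib.request

# The root endpoint of a running Ollama server answers with a short status message
try:
    with urllib.request.urlopen("http://localhost:11434", timeout=5) as resp:
        print(resp.read().decode())  # typically "Ollama is running"
except OSError as err:
    print(f"Could not reach Ollama: {err}")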
Using Ollama Models
Model Examples
# Llama 2 (example)
model_name="llama2"
# Mistral (example)
model_name="mistral"
# CodeLlama (example)
model_name="codellama"
# Phi (example)
model_name="phi"
# Any other model from the Ollama library can be used by name
Finding Available Models
Browse all available models at Ollama's model library (https://ollama.com/library). Models include Llama, Mistral, Phi, Gemma, and many more.
Basic Usage
from SimplerLLM.language.llm import LLM, LLMProvider
# Create Ollama LLM instance
# No API key needed - connects to local Ollama server
llm = LLM.create(
provider=LLMProvider.OLLAMA,
model_name="llama2"
)
# Generate a response
response = llm.generate_response(
prompt="Explain machine learning in simple terms"
)
print(response)
With Custom Parameters
llm = LLM.create(
provider=LLMProvider.OLLAMA,
model_name="mistral", # Use any Ollama model name
temperature=0.7,
max_tokens=1000
)
response = llm.generate_response(
prompt="Write a Python function to sort a list"
)
print(response)
Custom Ollama Host
If Ollama is running on a different host or port:
llm = LLM.create(
provider=LLMProvider.OLLAMA,
model_name="llama2",
base_url="http://localhost:11434" # Default Ollama URL
)
# Or connect to remote Ollama instance
llm = LLM.create(
provider=LLMProvider.OLLAMA,
model_name="llama2",
base_url="http://192.168.1.100:11434"
)
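If the server address differs between environments (for example, local development versus a shared GPU machine), one option is to read the URL from an environment variable instead of hardcoding it. A small sketch; OLLAMA_URL is our own variable name here, not something SimplerLLM reads automatically:
import os
from SimplerLLM.language.llm import LLM, LLMProvider

# Fall back to the default local server when OLLAMA_URL is not set
ollama_url = os.getenv("OLLAMA_URL", "http://localhost:11434")

llm = LLM.create(
    provider=LLMProvider.OLLAMA,
    model_name="llama2",
    base_url=ollama_url
)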
Advanced Features
Structured JSON Output
from pydantic import BaseModel, Field
from SimplerLLM.language.llm import LLM, LLMProvider
from SimplerLLM.language.llm_addons import generate_pydantic_json_model
class CodeReview(BaseModel):
language: str = Field(description="Programming language")
issues: list[str] = Field(description="Code issues found")
suggestions: list[str] = Field(description="Improvement suggestions")
llm = LLM.create(provider=LLMProvider.OLLAMA, model_name="codellama")
result = generate_pydantic_json_model(
llm_instance=llm,
prompt="Review this code: def add(a, b): return a + b",
model_class=CodeReview
)
print(f"Language: {result.language}")
Model Management
Common Ollama Commands
# List downloaded models
ollama list
# Pull/download a model
ollama pull llama2
# Remove a model
ollama rm llama2
# Show model information
ollama show llama2
# Run Ollama interactively (for testing)
ollama run llama2
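The same information is also exposed over HTTP by the local server, which is handy when you want to check from code which models are installed. A sketch against the /api/tags endpoint of a default local install:
import json
import urllib.request

# Equivalent to `ollama list`: ask the server which models have been downloaded
with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=5) as resp:
    data = json.load(resp)

for model in data.get("models", []):
    print(model["name"])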
Performance Considerations
Hardware Requirements
- 8GB+ RAM recommended for 7B models
- 16GB+ RAM for 13B models
- 32GB+ RAM for 30B+ models
- GPU support (CUDA/Metal) significantly improves speed
Model Sizes
Smaller models (7B parameters) run faster but may be less capable. Larger models (30B+) are more powerful but require more resources. Choose based on your hardware and use case.
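Many entries in the Ollama library publish size variants as tags, and a tagged name can be passed straight through as the model name. A brief sketch, assuming the 13B variant of Llama 2 has already been pulled (ollama pull llama2:13b):
from SimplerLLM.language.llm import LLM, LLMProvider

# "llama2:13b" selects the 13B-parameter variant; the untagged name resolves to
# the model's default tag (the 7B build, in Llama 2's case)
llm = LLM.create(
    provider=LLMProvider.OLLAMA,
    model_name="llama2:13b"
)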
Best Practices
1. Choose Appropriate Model Size
Match model size to your hardware capabilities
2. Keep Ollama Updated
Update Ollama regularly for performance improvements and new models
3. Test with Smaller Models First
Start with 7B models before trying larger ones
4. Monitor Resource Usage
Keep an eye on RAM and GPU usage when running models
5. Use for Development and Testing
Great for prototyping without API costs
Troubleshooting
Common Issues
- Connection refused: Make sure the Ollama server is running (ollama serve); a quick pre-flight check is sketched below.
- Model not found: Pull the model first (ollama pull model-name).
- Out of memory: Try a smaller model or close other applications.
- Slow performance: Consider a smaller model or upgrading your hardware.
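Since the first two issues account for most failures, a quick pre-flight check can save time. A rough sketch that verifies the server is up and the model has been pulled before you call SimplerLLM; the URL and model name are assumptions for a default local setup:
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"
MODEL_NAME = "llama2"

try:
    # Is the server reachable at all?
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=5) as resp:
        installed = [m["name"] for m in json.load(resp).get("models", [])]
except OSError as err:
    raise SystemExit(f"Ollama server not reachable at {OLLAMA_URL}: {err}")

# Is the model pulled? Installed names carry tags, so "llama2:latest" should match "llama2".
if not any(name.split(":")[0] == MODEL_NAME for name in installed):
    raise SystemExit(f"Model '{MODEL_NAME}' not found locally. Run: ollama pull {MODEL_NAME}")

print("Ollama is up and the model is available.")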
What's Next?
- OpenRouter: Learn about OpenRouter integration
- Reliable LLM: Add automatic failover between providers