Overview
When you specify `engine` in `agent.yaml`, Orpheus:
- Starts the model server if not running
- Monitors health continuously
- Restarts on failures with backoff
- Injects the `MODEL_URL` environment variable
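As a sketch, the engine declaration might look like this (only the `engine` key is taken from this page; the nested field names are assumptions, so check the configuration reference for the exact schema):

```yaml
# agent.yaml -- illustrative sketch only; field names under `engine`
# are assumptions, not the documented schema
engine:
  type: ollama     # which model server Orpheus should supervise
  model: mistral   # which model that server loads
```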
Configuration
Ollama Setup
macOS (Metal)
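On macOS, one common way to install Ollama is via Homebrew; Metal acceleration is used automatically on Apple silicon. The `mistral` model matches the examples on this page:

```shell
# Install Ollama and pull the model used in the examples.
# Orpheus can start the server itself, so `ollama serve` is optional here.
brew install ollama
ollama pull mistral
```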
Linux
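On Linux, Ollama's official install script is the usual route:

```shell
# Official install script from ollama.com, then pull the example model
curl -fsSL https://ollama.com/install.sh | sh
ollama pull mistral
```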
Agent Configuration
Orpheus injects the `MODEL_URL` environment variable into your agent's environment:
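For example, an agent process can read the injected URL like this; the fallback to Ollama's default port is a convenience for running the agent standalone, not something Orpheus provides:

```python
import os

# Orpheus injects MODEL_URL when it supervises the model server;
# fall back to Ollama's default address for standalone local runs.
model_url = os.environ.get("MODEL_URL", "http://localhost:11434")
print(model_url)
```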
vLLM Setup
vLLM requires Linux and an NVIDIA GPU with CUDA.
Requirements
- Ubuntu 22.04+
- NVIDIA GPU (8GB+ VRAM)
- CUDA 12.0+
- Python 3.10+
Installation
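A typical installation, assuming the requirements above are met (the model name is only an example; Orpheus can also start and supervise this server for you):

```shell
# Install vLLM into a virtual environment
python3 -m venv .venv && source .venv/bin/activate
pip install vllm

# Start an OpenAI-compatible server manually for a smoke test
python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mistral-7B-Instruct-v0.2 \
  --port 8000
```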
Agent Configuration
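As with Ollama, a sketch of the agent configuration (only the `engine` key is from this page; the nested field names are assumptions):

```yaml
# agent.yaml -- illustrative sketch only
engine:
  type: vllm
  model: mistralai/Mistral-7B-Instruct-v0.2
  # Orpheus injects MODEL_URL pointing at the vLLM server it starts
```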
Supervision Policy
Orpheus supervises model servers with production-grade policies.
Circuit Breaker
Prevents restart storms:
- 5 restarts max per 5 minutes
- Opens circuit if threshold exceeded
- Resets after cool-down period
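A minimal sketch of this policy, using the thresholds listed above (the class and method names are illustrative, not Orpheus's actual API):

```python
import time

class CircuitBreaker:
    """Open the circuit after too many restarts in a rolling window (sketch)."""

    def __init__(self, max_restarts=5, window=300.0, cooldown=300.0):
        self.max_restarts = max_restarts  # 5 restarts max ...
        self.window = window              # ... per 5 minutes (300s)
        self.cooldown = cooldown          # reset after the cool-down period
        self.restarts = []                # timestamps of recent restarts
        self.opened_at = None             # when the circuit opened, if open

    def allow_restart(self, now=None):
        now = time.monotonic() if now is None else now
        # Close the circuit again once the cool-down period has elapsed.
        if self.opened_at is not None:
            if now - self.opened_at < self.cooldown:
                return False
            self.opened_at = None
            self.restarts = []
        # Keep only restarts inside the rolling window.
        self.restarts = [t for t in self.restarts if now - t < self.window]
        if len(self.restarts) >= self.max_restarts:
            self.opened_at = now  # threshold exceeded: open the circuit
            return False
        self.restarts.append(now)
        return True
```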
Exponential Backoff
Delays between restart attempts grow exponentially with consecutive failures.
OOM Handling
When the model server exits with code 137 (OOM), Orpheus:
- Triggers a 60-second minimum backoff
- Logs a warning prompting memory investigation
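The backoff and OOM rules above might combine as in this sketch; the 60-second OOM floor comes from the policy above, while the base delay and cap are assumptions:

```python
def restart_delay(attempt, exit_code=None, base=1.0, cap=300.0):
    """Exponential backoff between restart attempts (sketch).

    attempt: 0-based count of consecutive failed restarts.
    exit_code 137 (OOM kill) enforces a 60-second minimum backoff.
    """
    delay = min(cap, base * (2 ** attempt))  # 1s, 2s, 4s, 8s, ... up to cap
    if exit_code == 137:                     # OOM: never retry too soon
        delay = max(delay, 60.0)
    return delay
```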
Monitoring
Check the model server status:
Troubleshooting
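A useful first step is confirming the model server is reachable at all, assuming Ollama on its default port 11434 (adjust the URL for vLLM or a non-default port):

```shell
# Ollama answers on its root endpoint when healthy
curl -s http://localhost:11434/
# List the models the server has available
curl -s http://localhost:11434/api/tags
```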
Model Server Won’t Start
Check the logs. Common causes:
- Model not pulled (`ollama pull mistral`)
- Port already in use
- Insufficient VRAM (for vLLM)
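To rule out a port conflict, check what is bound to the server's port (11434 is Ollama's default; substitute your configured vLLM port):

```shell
# Show any process already bound to the model server's port
lsof -i :11434
```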
Slow Model Loading
The first request may be slow while the model loads into memory; subsequent requests are fast. Tip: set `min_workers: 1` to keep a worker warm with the model already loaded.
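The tip above as a configuration sketch; only `min_workers` is taken from this page, and its placement in the file is an assumption:

```yaml
# agent.yaml -- keep one worker warm so the model stays loaded
workers:
  min_workers: 1
```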
Observability
Monitor model server health →

