Model Servers

Orpheus can manage local model servers (Ollama, vLLM) with automatic lifecycle management and supervision.

Overview

When you specify an engine in agent.yaml, Orpheus:
  1. Starts the model server if not running
  2. Monitors health continuously
  3. Restarts on failures with backoff
  4. Injects MODEL_URL environment variable
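At a high level, the loop behind these steps looks something like this sketch (function and parameter names are illustrative, not Orpheus internals):

```python
import subprocess
import time

def supervise(start_cmd, is_healthy, base_delay=2.0, max_delay=60.0,
              max_restarts=None, poll_interval=0.5):
    """Conceptual supervision loop: start the server, watch its health,
    and restart with exponential backoff when it dies."""
    delay, restarts = base_delay, 0
    while max_restarts is None or restarts < max_restarts:
        proc = subprocess.Popen(start_cmd)
        while proc.poll() is None:          # still running: monitor health
            if is_healthy():
                delay = base_delay          # healthy again: reset backoff
            time.sleep(poll_interval)
        restarts += 1                       # process died
        time.sleep(delay)                   # wait before restarting
        delay = min(delay * 2, max_delay)   # exponential backoff
    return restarts
```

In production Orpheus layers a circuit breaker and jitter on top of this basic loop, as described under Supervision Policy below.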

Configuration

name: my-agent
runtime: python3
module: agent.py
entrypoint: handler

# Model server configuration
engine: ollama           # ollama or vllm
model: mistral           # Model name

Ollama Setup

macOS (Metal)

# Install Ollama
brew install ollama

# Pull a model
ollama pull mistral

# Orpheus will start it automatically

Linux

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull mistral

Agent Configuration

name: my-agent
runtime: python3
module: agent.py
entrypoint: handler

engine: ollama
model: mistral
Your agent receives the MODEL_URL environment variable:
import os
import requests

MODEL_URL = os.environ.get("MODEL_URL", "http://localhost:11434")

def handler(input_data):
    # Call Ollama's generate endpoint; "stream": False returns one JSON body
    # instead of a stream of newline-delimited chunks
    response = requests.post(
        f"{MODEL_URL}/api/generate",
        json={"model": "mistral", "prompt": input_data["query"], "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    return {"response": response.json()["response"]}

vLLM Setup

vLLM requires a Linux host with an NVIDIA GPU and CUDA.

Requirements

  • Ubuntu 22.04+
  • NVIDIA GPU (8GB+ VRAM)
  • CUDA 12.0+
  • Python 3.10+

Installation

pip install vllm

Agent Configuration

name: my-agent
runtime: python3
module: agent.py
entrypoint: handler

engine: vllm
model: mistralai/Mistral-7B-Instruct-v0.2
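vLLM exposes an OpenAI-compatible HTTP API, so a handler for this configuration might look like the following sketch (the /v1/completions path, port, and max_tokens value are assumptions about the served endpoint):

```python
import os
import requests

MODEL_URL = os.environ.get("MODEL_URL", "http://localhost:8000")

def handler(input_data):
    # vLLM's server speaks the OpenAI completions protocol
    response = requests.post(
        f"{MODEL_URL}/v1/completions",
        json={
            "model": "mistralai/Mistral-7B-Instruct-v0.2",
            "prompt": input_data["query"],
            "max_tokens": 256,
        },
        timeout=120,
    )
    response.raise_for_status()
    return {"response": response.json()["choices"][0]["text"]}
```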

Supervision Policy

Orpheus supervises model servers with production-grade policies:

Circuit Breaker

Prevents restart storms:
  • At most 5 restarts per 5-minute window
  • Opens the circuit if the threshold is exceeded
  • Resets after a cool-down period
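The policy above amounts to a rolling-window circuit breaker; a minimal sketch (class and method names are illustrative, with a 5-minute window and cool-down):

```python
import time

class CircuitBreaker:
    """Refuses restarts after too many within a rolling window."""

    def __init__(self, max_restarts=5, window=300.0, cooldown=300.0):
        self.max_restarts = max_restarts
        self.window = window          # rolling window in seconds
        self.cooldown = cooldown
        self.restarts = []            # timestamps of recent restarts
        self.opened_at = None

    def allow_restart(self, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is not None:
            if now - self.opened_at < self.cooldown:
                return False          # circuit open: refuse restart
            self.opened_at = None     # cool-down elapsed: reset
            self.restarts = []
        # drop restarts that fell out of the rolling window
        self.restarts = [t for t in self.restarts if now - t < self.window]
        if len(self.restarts) >= self.max_restarts:
            self.opened_at = now      # threshold exceeded: open circuit
            return False
        self.restarts.append(now)
        return True
```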

Exponential Backoff

Delays between restart attempts:
2s → 4s → 8s → 16s → 32s → 60s (max)
With ±20% jitter to prevent thundering herd.
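The delay schedule with jitter can be computed as in this sketch (not Orpheus's actual code):

```python
import random

def backoff_delay(attempt, base=2.0, cap=60.0, jitter=0.2):
    """2s, 4s, 8s, ... doubling per attempt, capped at 60s,
    with +/-20% random jitter applied to the capped value."""
    delay = min(base * (2 ** attempt), cap)
    return delay * random.uniform(1 - jitter, 1 + jitter)
```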

OOM Handling

When the model server exits with code 137 (128 + SIGKILL, typically an OOM kill):
  • Triggers 60-second minimum backoff
  • Logs warning for memory investigation
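Conceptually (illustrative names, not Orpheus source):

```python
OOM_EXIT_CODE = 137       # 128 + signal 9 (SIGKILL), typical of the OOM killer
OOM_MIN_BACKOFF = 60.0    # seconds

def restart_delay(exit_code, backoff_delay):
    """Enforce a 60-second floor after an OOM kill; otherwise
    use the normal exponential-backoff delay."""
    if exit_code == OOM_EXIT_CODE:
        print("warning: model server OOM-killed; investigate memory usage")
        return max(backoff_delay, OOM_MIN_BACKOFF)
    return backoff_delay
```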

Monitoring

Check model server status:
# View agent stats (includes service status)
orpheus stats my-agent
Prometheus metrics are also available:
orpheus_service_up{agent="my-agent"} 1
orpheus_service_uptime_seconds{agent="my-agent"} 3600
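These metrics can drive alerting; for example, an illustrative Prometheus alerting rule (not shipped with Orpheus) that fires when the model server stays down:

```yaml
groups:
  - name: orpheus
    rules:
      - alert: ModelServerDown
        expr: orpheus_service_up{agent="my-agent"} == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Model server for {{ $labels.agent }} is down"
```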

Troubleshooting

Model Server Won’t Start

Check logs:
orpheus logs my-agent
Common causes:
  • Model not pulled (ollama pull mistral)
  • Port already in use
  • Insufficient VRAM (for vLLM)

Slow Model Loading

The first request may be slow while the model loads into memory; subsequent requests are fast. Tip: set min_workers: 1 to keep a warm worker with the model already loaded.
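For example, extending the agent.yaml from above (assuming min_workers is a top-level key):

```yaml
name: my-agent
runtime: python3
module: agent.py
entrypoint: handler

engine: ollama
model: mistral
min_workers: 1     # keep one warm worker with the model loaded
```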

Observability

Monitor model server health →