Scale Under Load

Orpheus scales workers based on queue depth, not CPU.

Default Scaling

scaling:
  min_workers: 1      # Always keep 1 ready
  max_workers: 10     # Scale up to 10

With defaults, Orpheus:

Adds workers when the queue backs up
Removes idle workers after 5 minutes
Always keeps at least min_workers warm

How Scaling Works

Every 5 seconds, Orpheus checks:

utilization = (queued + processing) / current_workers

If utilization > 3.0 → add a worker
If utilization < 0.5 → remove a worker

Example:

12 requests queued or processing, 3 workers
Utilization = 12 / 3 = 4.0
4.0 > 3.0 → scale up
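
The check described above can be sketched in a few lines of Python. This is an illustrative model, not Orpheus's actual implementation: the function name `scaling_decision`, the clamping to `min_workers` and `max_workers`, and the handling of zero workers are assumptions. Only the utilization formula and the 3.0 and 0.5 thresholds come from this page, and the 5-minute idle timeout is not modeled.

```python
SCALE_UP_THRESHOLD = 3.0    # from this page: utilization > 3.0 adds a worker
SCALE_DOWN_THRESHOLD = 0.5  # from this page: utilization < 0.5 removes a worker


def scaling_decision(queued, processing, current_workers,
                     min_workers=1, max_workers=10):
    """Return +1 (add a worker), -1 (remove one), or 0 (no change).

    Hypothetical helper that models a single periodic check.
    """
    in_flight = queued + processing
    if current_workers == 0:
        # Assumption: with no workers, any pending work triggers a scale-up.
        return 1 if in_flight > 0 and max_workers > 0 else 0

    utilization = in_flight / current_workers
    if utilization > SCALE_UP_THRESHOLD and current_workers < max_workers:
        return 1
    if utilization < SCALE_DOWN_THRESHOLD and current_workers > min_workers:
        return -1
    return 0


# The example above: 12 requests in flight across 3 workers gives 4.0.
print(scaling_decision(queued=12, processing=0, current_workers=3))  # 1
```

Keeping the decision in a pure function like this makes it easy to try different queue depths against a planned `min_workers` and `max_workers` before deploying.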

Monitor Scaling

orpheus stats my-agent

Output:

{
  "workers": {
    "current": 5,
    "healthy": 5,
    "min": 1,
    "max": 10
  },
  "queue": {
    "depth": 3,
    "processing": 5
  }
}
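
To relate this output to the scaling rule, the utilization can be recomputed from the stats fields. The sketch below parses the sample output shown above. It assumes that `queue.depth` counts requests still waiting (not those already processing), which this page does not state explicitly, and `utilization_from_stats` is a hypothetical helper name.

```python
import json

# Sample output copied from `orpheus stats my-agent` above.
stats_json = """
{
  "workers": {"current": 5, "healthy": 5, "min": 1, "max": 10},
  "queue": {"depth": 3, "processing": 5}
}
"""


def utilization_from_stats(stats):
    """Compute (queued + processing) / current_workers from a stats dict."""
    workers = stats["workers"]["current"]
    in_flight = stats["queue"]["depth"] + stats["queue"]["processing"]
    if workers == 0:
        return None  # Utilization is undefined with zero workers.
    return in_flight / workers


stats = json.loads(stats_json)
print(utilization_from_stats(stats))  # 1.6
```

Under that assumption, 1.6 sits between 0.5 and 3.0, so this snapshot would trigger neither a scale-up nor a scale-down.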

Tuning for Your Workload

Fast Tasks (< 1 second)

scaling:
  min_workers: 2      # More warm workers
  max_workers: 20     # Higher ceiling

Slow Tasks (> 30 seconds)

scaling:
  min_workers: 1
  max_workers: 5      # Fewer workers (each busy longer)

Bursty Traffic

scaling:
  min_workers: 3      # Handle burst immediately
  max_workers: 15
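
A rough way to see why a higher `min_workers` helps with bursts is to count how many 5-second checks it takes to scale up. The sketch below is a simplified model, not a description of Orpheus internals: it assumes one worker is added per check, that the backlog stays constant while scaling, and that new workers are ready immediately. `seconds_to_absorb_burst` is a hypothetical helper.

```python
CHECK_INTERVAL_SECONDS = 5   # from this page: one check every 5 seconds
SCALE_UP_THRESHOLD = 3.0     # from this page: utilization > 3.0 adds a worker


def seconds_to_absorb_burst(in_flight, min_workers, max_workers):
    """Return (seconds, final_workers) until scale-up stops.

    Assumptions: constant backlog, one worker added per check,
    and workers become ready instantly.
    """
    if min_workers < 1:
        raise ValueError("this sketch does not model scale-from-zero")
    workers = min_workers
    checks = 0
    while workers < max_workers and in_flight / workers > SCALE_UP_THRESHOLD:
        workers += 1
        checks += 1
    return checks * CHECK_INTERVAL_SECONDS, workers


# A burst of 20 in-flight requests with the defaults vs. the bursty profile.
print(seconds_to_absorb_burst(20, min_workers=1, max_workers=10))  # (30, 7)
print(seconds_to_absorb_burst(20, min_workers=3, max_workers=15))  # (20, 7)
```

In this model both configurations settle at 7 workers, but starting from 3 warm workers gets there about 10 seconds sooner. Real timings depend on worker start-up time and how fast requests complete.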

Cost-Sensitive

scaling:
  min_workers: 0      # Scale to zero when idle
  max_workers: 5

With min_workers: 0, the first request after an idle period incurs a cold start and will be slower.

Zero Cold Starts

For instant responses:

scaling:
  min_workers: 1      # Always one worker warm

This keeps one worker running even with no traffic.

Test Scaling

Send concurrent requests:

# Send 20 requests in parallel
for i in {1..20}; do
  orpheus run my-agent '{"id": '$i'}' &
done
wait

# Check how many workers scaled up
orpheus stats my-agent

Scaling Limits

Setting	Effect
`min_workers`	Minimum warm workers (even when idle)
`max_workers`	Hard ceiling (Orpheus never exceeds it)

If max_workers is reached and the queue is still growing, new requests wait in the queue.

Troubleshooting

Fix common issues →
