AgentOps Forensics · Cost anatomy

Where does a self-hosted Hermes agent's per-run cost go?

Cost per useful run is token cost per run, plus monthly infrastructure divided by runs per month, plus retry overhead. This page gives that model and a worked example you can apply to your own logs. The largest input is usually fixed overhead, paid on every run before the task starts; a public field report on Hermes v0.6.0 put it near 73 percent of per-run tokens, with tool definitions the largest single share (issue #4379).

Run the free self-check · The full cost guide

✓ A model, not a sales number ✓ Every figure sourced ✓ Read from your own logs

The model

How much does one Hermes Agent run cost?

The number that matters is cost per useful run, not your raw monthly spend. A run that takes three attempts costs three runs; a run that fails still costs money and earns nothing. The working figure is:

cost per useful run = token cost per run + (monthly infrastructure / runs per month) + retry overhead.

Illustrative inputs only, not a measured benchmark and not your numbers: at 20,000 runs a month, $1,800 of infrastructure, and $0.32 of model tokens per run, infrastructure adds $0.09 per run ($1,800 / 20,000), so with no retry overhead the formula returns about $0.41 per run. If a managed plan quoted $0.13 per run, the self-hosted example would cost roughly 3x as much. The real comparison is your own figure against a price you are actually quoted. The decision is not the multiple; it is whether you can carry the operational work for the gap.
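The arithmetic above can be sketched in a few lines. This is a minimal illustration of the page's formula, not a benchmark: the function name is made up here, the inputs are the illustrative ones from the text, and retry overhead is assumed to be zero because the $0.41 figure only adds tokens and infrastructure.

```python
def cost_per_useful_run(token_cost_per_run, monthly_infrastructure,
                        runs_per_month, retry_overhead=0.0):
    """Token cost + (monthly infrastructure / runs per month) + retry overhead."""
    if runs_per_month <= 0:
        raise ValueError("runs_per_month must be positive")
    return token_cost_per_run + monthly_infrastructure / runs_per_month + retry_overhead


# Illustrative inputs from the text, not measurements.
example = cost_per_useful_run(
    token_cost_per_run=0.32,
    monthly_infrastructure=1800.0,
    runs_per_month=20_000,
)

quoted_managed_price = 0.13  # hypothetical quote used in the text
multiple = example / quoted_managed_price

print(f"self-hosted: ${example:.2f} per run, {multiple:.1f}x the quoted price")
# prints: self-hosted: $0.41 per run, 3.2x the quoted price
```

Swap in your own three inputs and a non-zero retry overhead to get a figure you can set against a real quote.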

To compute yours: take one representative window of your own usage logs, sum the token cost across every run in it (retries and failures included), and divide by the number of runs that produced a useful result. This page gives a model and a public field report, not a measurement of your deployment. The same model drives the free self-check.
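As a rough sketch of that calculation, assume each log line can be reduced to a record like the one below. The `RunRecord` fields are hypothetical, not a Hermes log format, and dividing the window's infrastructure cost by useful runs (rather than by all attempts) is a choice made for this example.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    # Hypothetical fields; map them from whatever your own logs record.
    run_id: str
    token_cost: float  # dollars of model tokens for this attempt
    useful: bool       # did this attempt produce a useful result?


def cost_per_useful_run_from_logs(records, window_infrastructure):
    """Total spend in the window divided by the runs that were useful."""
    useful_runs = sum(1 for record in records if record.useful)
    if useful_runs == 0:
        raise ValueError("no useful runs in this window")
    # Failed attempts and retries stay in the numerator: they were paid for.
    token_total = sum(record.token_cost for record in records)
    return (token_total + window_infrastructure) / useful_runs


# Made-up window: three useful runs, one failed attempt that was retried.
window = [
    RunRecord("a", 0.30, True),
    RunRecord("b", 0.30, False),
    RunRecord("b-retry", 0.34, True),
    RunRecord("c", 0.30, True),
]
figure = cost_per_useful_run_from_logs(window, window_infrastructure=0.27)
print(f"${figure:.2f} per useful run")
```

Because the failed attempt is counted in the spend but not in the divisor, the retry overhead is folded into the result instead of being added as a separate term.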

What this page claims, and does not: it reports one public field report (issue #4379) and a transparent model. It does not report measurements of your system, and the per-run dollar figures above are illustrative, not observed.

The anatomy

Why is so much of a Hermes Agent run fixed overhead?

Because tool definitions, the system prompt, and context are re-sent on every call, before any task-specific work. The field report on Hermes v0.6.0 itemised it at roughly 73 percent fixed (issue #4379):

Per-run token split (#4379)

The task is the minority

About three quarters of every run (roughly 73 percent in the field report) is paid before the agent does the thing you asked. Trim the part that repeats and you trim most of the bill.

Read issue #4379


Per-run token split, Hermes v0.6.0 public field report (issue #4379). One deployment, one version; measure your own.
Component	Share of per-run tokens	Paid when
Tool definitions	about 46 percent	re-sent on every call, whether the run uses the tool or not
System prompt	about 27 percent	before any task work, on every call
Task-specific work	about 27 percent	the only part that scales with what you asked

The mechanism is vendor-documented. Anthropic's tool-use pricing documentation states that input-token pricing includes the tools parameter: the names, descriptions, and schemas of every tool included in the request. In traces from this site's own self-hosted Hermes runs, Hermes sent the full schema of every registered tool on every call, consistent with the field report's split. That is a qualitative read, not a second measurement. The exact split is still yours to measure.
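To measure the split on your own traces, a small helper along these lines may be enough. It assumes you can already count tokens per component for a run; the function name and the token counts are hypothetical, with the counts chosen only so the output matches the field report's percentages.

```python
def token_split(tool_definition_tokens, system_prompt_tokens, task_tokens):
    """Return each component's share of per-run tokens, plus the fixed share."""
    total = tool_definition_tokens + system_prompt_tokens + task_tokens
    if total <= 0:
        raise ValueError("token counts must sum to a positive number")
    shares = {
        "tool_definitions": tool_definition_tokens / total,
        "system_prompt": system_prompt_tokens / total,
        "task_specific": task_tokens / total,
    }
    # Fixed overhead is everything re-sent before the task-specific work.
    shares["fixed_overhead"] = shares["tool_definitions"] + shares["system_prompt"]
    return shares


# Hypothetical counts that reproduce the 46 / 27 / 27 split in the table.
split = token_split(
    tool_definition_tokens=4600,
    system_prompt_tokens=2700,
    task_tokens=2700,
)
for component, share in split.items():
    print(f"{component}: {share:.0%}")
```

Run it per call rather than per month if you can: a single average can hide a few runs that carry an unusually large tool catalogue.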

For each component, what inflates it and the figure to pull from your own logs, see the per-component fixed-overhead breakdown.

Bill drivers

What makes a Hermes Agent bill creep up without warning?

The expensive failures look like normal runs. Each one re-pays the fixed overhead, so a reliability problem shows up first as a cost problem.

Cold restarts

State resets on restart, so a run starts over and re-pays the full fixed overhead.

Read from: restart timestamps against run-id continuity.

Retry storms

A flaky tool or a drifting retrieval step triggers retries, each one a full-priced run.

Read from: retry count per successful task, before and after an upgrade (issue #34084).


Tool bloat

Every tool you register is re-sent on every run, so unused tools tax every call.

Read from: tool-definition tokens divided by tools the run actually used.

Idle context

A long system prompt or growing context that no longer earns its tokens.

Read from: system-prompt tokens as a share of the run.
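Three of those readings (retries, tool bloat, idle context) reduce to simple ratios once each attempt is a record. The sketch below assumes a hypothetical `Attempt` shape that you would fill from your own traces; it leaves out cold restarts, which need process restart timestamps that this example does not model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Attempt:
    # Hypothetical per-attempt fields, not a Hermes log schema.
    task_id: str
    succeeded: bool
    tool_definition_tokens: int
    system_prompt_tokens: int
    total_tokens: int
    tools_used: int


def driver_readings(attempts):
    """Compute three bill-creep readings; None where a divisor is zero."""
    attempts_per_task = Counter(a.task_id for a in attempts)
    successful_tasks = {a.task_id for a in attempts if a.succeeded}
    # Every attempt after the first on the same task counts as a retry.
    retries = sum(count - 1 for count in attempts_per_task.values())
    tool_tokens = sum(a.tool_definition_tokens for a in attempts)
    tools_used = sum(a.tools_used for a in attempts)
    prompt_tokens = sum(a.system_prompt_tokens for a in attempts)
    total_tokens = sum(a.total_tokens for a in attempts)
    return {
        "retries_per_successful_task":
            retries / len(successful_tasks) if successful_tasks else None,
        "tool_tokens_per_tool_used":
            tool_tokens / tools_used if tools_used else None,
        "system_prompt_share":
            prompt_tokens / total_tokens if total_tokens else None,
    }


# Made-up attempts: task t1 needed one retry, task t2 succeeded first time.
sample = [
    Attempt("t1", False, 4600, 2700, 10000, 1),
    Attempt("t1", True, 4600, 2700, 10000, 2),
    Attempt("t2", True, 4600, 2700, 10000, 1),
]
readings = driver_readings(sample)
print(readings)
```

Computing the same readings on a window before and after an upgrade gives the comparison the retry card asks for.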

The full six-check taxonomy (cost, plus state, migration, data exposure, tool safety, and failure trail) is on the guide home; the cost angle is the four bill-creep drivers.

The levers

How do I cut Hermes Agent token costs?

Three levers can materially reduce the fixed overhead, because they attack the part that repeats on every call:

Load fewer tools per run. Tool definitions were the largest overhead share (#4379). Register only what a given run can use, not the whole catalogue.

Trim the system prompt. Every token of instruction is paid on every call. Cut what the model already knows.

Cap retries. A retry is a full-priced run. Bound them, and fix the flaky step rather than paying for it repeatedly.
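Two of these levers are easy to sketch in code. The helpers below are illustrative only: the tool catalogue is a plain list of dicts with a `name` key, which is an assumption and not Hermes's registration API, and `run_with_retry_cap` is a hypothetical wrapper around whatever callable performs one attempt.

```python
def select_tools(catalogue, needed_names):
    """Keep only the tool definitions this run can actually use."""
    return [tool for tool in catalogue if tool["name"] in needed_names]


def run_with_retry_cap(step, max_attempts=2):
    """Call step() until it succeeds or the cap is hit.

    Returns (result, attempts_used); raises RuntimeError once the cap is reached.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return step(), attempt
        except Exception as error:  # broad on purpose in this sketch
            last_error = error
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


# Hypothetical catalogue: only "search" is sent with this run.
catalogue = [{"name": "search"}, {"name": "calendar"}, {"name": "email"}]
tools = select_tools(catalogue, {"search"})

# A made-up flaky step that fails once, then succeeds.
calls = {"count": 0}


def flaky_step():
    calls["count"] += 1
    if calls["count"] < 2:
        raise TimeoutError("tool timed out")
    return "ok"


result, attempts = run_with_retry_cap(flaky_step, max_attempts=2)
print(tools, result, attempts)
```

The cap bounds what a flaky step can cost; it does not fix the step, so the attempt count is worth logging alongside the result.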

Then compare your cost per useful run to a quoted managed price and make the keep, migrate, or retire call. If you want that call made independently on your own numbers, a written Verdict does it; the framework on this page is free.

Questions

Hermes Agent cost: frequently asked

How much does one self-hosted Hermes Agent run cost?

It depends on your model, your token volume, and your infrastructure, so the honest answer is a model, not a single number. Cost per useful run is roughly: token cost per run, plus monthly infrastructure divided by runs per month, plus the cost of retries. The number that matters is cost per successful run, not your raw monthly bill.

Why is most of a Hermes Agent run fixed overhead?

Because tool definitions, the system prompt, and context are re-sent on every call, before any task-specific work. A public field report on Hermes v0.6.0 put that fixed overhead near 73 percent of per-run tokens, with tool definitions the largest single share (issue #4379). The task you actually asked for is often the minority of the spend.

How do I calculate my cost per useful run?

Take one representative window of usage logs. Sum the token cost across runs and divide by the number of runs that produced a useful result, not the total runs. Add infrastructure divided by runs, and count retries as full runs. Compare that figure to a managed price per run, not to your monthly invoice.

What is the 73 percent overhead figure based on?

A public field report on Hermes v0.6.0 in the project issue tracker (issue #4379), which itemised per-run tokens and found that roughly 73 percent went to fixed setup and context overhead, with tool definitions the largest part. Your number will differ; the point is to measure it from your own logs.

How do I cut Hermes Agent token costs fast?

Load fewer tools per run, trim the system prompt, and cap retries. Those three can materially reduce the fixed overhead because they attack the part that repeats on every call. Then compare cost per useful run to a managed alternative before deciding to keep self-hosting, migrate, or stop.

Is self-hosting Hermes Agent cheaper than managed for my workload?

Only if your cost per useful run, computed from your own logs, is below a managed price you are actually quoted, and you can carry the operational work. The raw monthly bill is the wrong comparison; cost per useful run against a quoted per-run price is the right one. High, steady volume with a stable set of tools is the case most likely to favor self-hosting.

Who writes this

By Jed, building and operating production software since 2007 and running self-hosted Hermes workflows since February 2026. The cost model and the framework are given away here; a paid Verdict applies them to your specific numbers, read from your logs.

Figures sourced from the public Hermes issue tracker (#4379, #34084). Illustrative numbers are labelled; your Verdict is computed from your own data.

Run the free self-check: ~5 min · no signup · in your browser