AgentOps Forensics · Cost anatomy
Most of an agent run's cost is fixed overhead, paid on every run before the task starts. A public field report on Hermes v0.6.0 put that overhead near 73 percent of per-run tokens, with tool definitions the largest single share (issue #4379). Here is the anatomy of one run, a model to estimate your own, and the failures that quietly inflate the bill.
The model
The number that matters is cost per useful run, not your raw monthly spend. A task that takes three attempts costs three runs; a run that fails costs money and earns nothing. The working figure is:
cost per useful run = token cost per run + (monthly infrastructure / runs per month) + retry overhead.
Illustrative inputs only, not a measured benchmark and not your numbers: at 20,000 runs a month, $1,800 of infrastructure, and $0.32 of model tokens per run, with no retry overhead, the formula returns about $0.41 per run ($0.32 + $1,800 / 20,000). If a managed plan quoted $0.13 per run, the example figure would be roughly 3x that price. The real comparison is your own figure against a price you are actually quoted; the decision is not the multiple, it is whether you can carry the operational work for the gap.
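The formula and the illustrative inputs above can be checked in a few lines of Python. This is a minimal sketch: the function and parameter names are assumptions made for the example, and retry overhead is set to zero because the illustration lists none.

```python
def cost_per_useful_run(token_cost_per_run, monthly_infrastructure,
                        runs_per_month, retry_overhead=0.0):
    """Cost per useful run, following the working figure on this page.

    All names here are illustrative; amounts are in dollars.
    """
    if runs_per_month <= 0:
        raise ValueError("runs_per_month must be positive")
    infrastructure_per_run = monthly_infrastructure / runs_per_month
    return token_cost_per_run + infrastructure_per_run + retry_overhead


# Illustrative inputs from the text, not measurements.
example = cost_per_useful_run(
    token_cost_per_run=0.32,
    monthly_infrastructure=1800,
    runs_per_month=20_000,
)
multiple = example / 0.13  # against the hypothetical managed quote

print(f"{example:.2f}")   # 0.41
print(f"{multiple:.1f}")  # 3.2
```

Swap in your own three inputs and a price you are actually quoted; the multiple only means something once both sides are real.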
To compute yours: take one representative window of your own usage logs, itemise token cost per run, and divide by the runs that produced a useful result. This page gives a model and a public field report, not a measurement of your deployment. The same model drives the free self-check.
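As a rough sketch of the token part of that calculation, assuming your logs can be reduced to one record per run with a dollar token cost and a flag for whether the result was useful (the `RunRecord` type and its fields are hypothetical, not a real log format):

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    # Hypothetical shape; map your own log fields onto it.
    run_id: str
    token_cost: float  # dollars spent on tokens for this run
    useful: bool       # did the run produce a useful result?


def token_cost_per_useful_run(records):
    """Total token spend in the window divided by useful runs only."""
    useful_runs = sum(1 for r in records if r.useful)
    if useful_runs == 0:
        raise ValueError("no useful runs in this window")
    total_spend = sum(r.token_cost for r in records)
    return total_spend / useful_runs


# Made-up window: three runs paid for, two useful.
window = [
    RunRecord("a", 0.25, True),
    RunRecord("b", 0.25, False),
    RunRecord("c", 0.25, True),
]
print(token_cost_per_useful_run(window))  # 0.375
```

Failed and retried runs stay in the numerator because they were paid for; only the denominator shrinks, which is why the figure rises as reliability falls.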
What this page claims, and does not: it relays one public field report (issue #4379) and a transparent model. It does not report measurements of your system, and the per-run dollar figures above are illustrative, not observed.
The anatomy
Fixed overhead dominates because tool definitions, the system prompt, and context are re-sent on every call, before any task-specific work. The field report on Hermes v0.6.0 itemised it at roughly 73 percent fixed (issue #4379).
In that report, nearly three quarters of every run was paid before the agent did the thing you asked. Trim the part that repeats and you trim most of the bill.
Read issue #4379. For each component, what inflates it and the figure to pull from your own logs, see the per-component fixed-overhead breakdown.
Bill drivers
The expensive failures look like normal runs. Each one re-pays the fixed overhead, so a reliability problem shows up first as a cost problem.
State resets on restart, so a run starts over and re-pays the full fixed overhead.
Read from: restart timestamps against run-id continuity.
A flaky tool or a drifting retrieval step triggers retries, each one a full-priced run.
Read from: retry count per successful task, before and after an upgrade (issue #34084).
Every tool you register is re-sent on every run, so unused tools tax every call.
Read from: tool-definition tokens divided by tools the run actually used.
A long system prompt or growing context that no longer earns its tokens.
Read from: system-prompt tokens as a share of the run.
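Three of those figures are simple ratios, so they can be sketched directly. The example below assumes each run can be summarised with the token and attempt counts shown; `RunStats` and its field names are hypothetical and stand in for whatever your own logging records.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RunStats:
    # Hypothetical per-run summary; adapt to your own log schema.
    total_tokens: int
    tool_definition_tokens: int
    system_prompt_tokens: int
    tools_used: int    # tools the run actually called
    attempts: int      # 1 means the task succeeded or failed with no retry
    succeeded: bool


def tool_tokens_per_used_tool(run: RunStats) -> Optional[float]:
    """Tool-definition tokens divided by tools actually used."""
    if run.tools_used == 0:
        return None  # every tool token was overhead; no ratio to report
    return run.tool_definition_tokens / run.tools_used


def system_prompt_share(run: RunStats) -> float:
    """System-prompt tokens as a share of the run's total tokens."""
    return run.system_prompt_tokens / run.total_tokens


def retries_per_successful_task(runs: Sequence[RunStats]) -> float:
    """Retries across the window divided by tasks that succeeded."""
    successes = sum(1 for r in runs if r.succeeded)
    if successes == 0:
        raise ValueError("no successful tasks in this window")
    retries = sum(r.attempts - 1 for r in runs)
    return retries / successes


# Made-up numbers, for shape only.
sample = [
    RunStats(10_000, 4_000, 2_000, tools_used=2, attempts=1, succeeded=True),
    RunStats(10_000, 4_000, 2_000, tools_used=0, attempts=3, succeeded=True),
]
print(tool_tokens_per_used_tool(sample[0]))  # 2000.0
print(system_prompt_share(sample[0]))        # 0.2
print(retries_per_successful_task(sample))   # 1.0
```

Compute the retry figure separately for the windows before and after an upgrade; the comparison, not the absolute value, is what flags a regression.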
The full six-check taxonomy (state, migration, data exposure, tool safety, failure trail) is on the guide home; the cost angle is the four drivers above.
The levers
Three levers can materially reduce the fixed overhead, because they attack the part that repeats on every call:
Load fewer tools per run. Tool definitions were the largest overhead share in the field report (issue #4379). Register only what a given run can use, not the whole catalogue.
Trim the system prompt. Every token of instruction is paid on every call. Cut what the model already knows.
Cap retries. A retry is a full-priced run. Bound them, and fix the flaky step rather than paying for it repeatedly.
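Two of these levers, loading fewer tools and capping retries, can be sketched in a framework-agnostic way. The helper names, the dictionary-shaped tool catalogue, and the default cap of two attempts are all assumptions for illustration; they are not the API of Hermes or any other agent framework.

```python
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def select_tools(catalogue: Dict[str, dict], needed: Iterable[str]) -> Dict[str, dict]:
    """Return only the tool definitions this run can use."""
    wanted = set(needed)
    return {name: spec for name, spec in catalogue.items() if name in wanted}


def run_with_retry_cap(task: Callable[[], T], max_attempts: int = 2) -> T:
    """Call task until it succeeds or max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return task()
        except Exception as exc:  # each failed attempt is a full-priced run
            last_error = exc
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


# Hypothetical catalogue: three registered tools, one needed for this run.
catalogue = {
    "search": {"description": "web search"},
    "calendar": {"description": "read calendar"},
    "email": {"description": "send email"},
}
tools_for_run = select_tools(catalogue, ["search"])
print(sorted(tools_for_run))  # ['search']
```

The cap bounds the bill but does not fix anything; the final `RuntimeError` is the signal to go and repair the flaky step.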
Then compare your cost per useful run to a quoted managed price and make the keep, migrate, or retire call. If you want that call made independently on your own numbers, a written Verdict does it; the framework on this page is free.
Questions
What an agent run costs depends on your model, your token volume, and your infrastructure, so the honest answer is a model, not a single number. Cost per useful run is roughly: token cost per run, plus monthly infrastructure divided by runs per month, plus the cost of retries. The number that matters is cost per useful run, not your raw monthly bill.
Most of the cost is fixed overhead because tool definitions, the system prompt, and context are re-sent on every call, before any task-specific work. A public field report on Hermes v0.6.0 put that fixed overhead near 73 percent of per-run tokens, with tool definitions the largest single share (issue #4379). The task you actually asked for is often the minority of the spend.
Take one representative window of usage logs. Sum the token cost across runs and divide by the number of runs that produced a useful result, not the total runs. Add infrastructure divided by runs, and count retries as full runs. Compare that figure to a managed price per run, not to your monthly invoice.
The 73 percent figure comes from a public field report on Hermes v0.6.0 in the project issue tracker (issue #4379), which itemised per-run tokens and found that roughly 73 percent went to fixed setup and context overhead, with tool definitions the largest part. Your number will differ; the point is to measure it from your own logs.
Load fewer tools per run, trim the system prompt, and cap retries. Those three can materially reduce the fixed overhead because they attack the part that repeats on every call. Then compare cost per useful run to a managed alternative before deciding to keep self-hosting, migrate, or stop.
Self-hosting is worth it only if your cost per useful run, computed from your own logs, is below a managed price you are actually quoted, and you can carry the operational work. The raw monthly bill is the wrong comparison; cost per useful run against a quoted per-run price is the right one. High, steady volume with a stable set of tools is the case most likely to favor self-hosting.