AgentOps Forensics · Cost reference
On a public field report for Hermes v0.6.0, about 73 percent of per-run tokens were fixed overhead, paid on every call before the task: roughly 46% tool definitions, 27% system prompt, 27% task work (issue #4379). This is the component view: what each part is, what inflates it, and how to measure your own.
Definition
Scope: this is the token-component reference only, no dollar figures. For per-run cost in dollars, the bill-creep drivers, and the keep, migrate, or stop decision, see the Hermes Agent cost breakdown.
Fixed overhead is everything the model is sent before it does the task you asked for, on every call: the tool definitions and the system prompt, plus any standing context. It is fixed in the sense that it repeats run after run, largely independent of what each run actually does.
The opposite is task-specific work: the input and the reasoning for the one thing you asked. On the public field report, the fixed part was roughly two thirds of the run, which is why a reliability problem that re-runs the agent shows up first as a cost problem.
What this page reports, and does not: the 46 / 27 / 27 split is a single field report (n=1) on one Hermes version (issue #4379), not a universal constant. Your split will differ by model, tool count, and prompt. Treat it as the shape to expect, then measure your own.
The split
Three buckets, one of them dominant. Tool definitions alone were nearly half of every run (issue #4379):
The schemas of registered tools are the single largest line, ahead of the system prompt and the task itself. Attack that line and you attack most of the overhead.
Read issue #4379By component
For each part: what it is, what inflates it, the figure to pull from your own logs, and the one lever that moves it.
The JSON schema of every tool you register: names, descriptions, and parameters. It is re-sent on every call, whether or not that run uses the tool.
Grows withMore tools registered, and more verbose schemas (long descriptions, many parameters).
MeasureTool-definition tokens divided by the number of tools the run actually called.
LeverRegister only the tools a given run can use, not the whole catalogue; trim verbose schemas.
Standing instructions: role, formatting rules, and any worked examples. Paid in full before the task starts.
Grows withAccumulated "just in case" instructions, embedded few-shot examples, long formatting rules.
MeasureSystem-prompt tokens as a share of the whole run.
LeverCut what the model already knows; move rare instructions to the runs that need them.
The actual input and the work you asked for. The only part that scales with the task, not strictly overhead.
Grows withLarger inputs and longer task context. This is the part you are paying to get.
MeasureWhat is left after tool definitions and the system prompt are accounted for.
LeverNone worth pulling: shrinking this shrinks the work itself. Optimise the two above instead.
This is the component reference. For the full per-run cost model, the bill-creep drivers, and the keep-migrate-stop decision, see the Hermes Agent cost breakdown.
Questions
Tool definitions. On a public field report for Hermes v0.6.0, the JSON schemas of registered tools were the largest single share of per-run tokens, near 46 percent of the run (issue #4379). They are re-sent on every call regardless of whether the run uses each tool, so they tax every run.
Yes. Every tool you register has its schema re-sent on every call, used or not. That is why tool definitions dominate the fixed overhead: a large catalogue is paid for on every run, even when a run touches only one tool.
Take one representative window of usage logs and bucket the tokens into three: tool definitions, the system prompt, and the task input. The first two are your fixed overhead, paid before any task work. The field report figures (roughly 46 / 27 / 27) are one deployment on one version; measure yours rather than assuming them.
No. Task-specific tokens are the only part that scales with what you asked, so they are not overhead in the strict sense. They are shown alongside tool definitions and the system prompt only to give the proportion: on the field report, the work you actually asked for was the minority of the run.