AgentOps Forensics · Field Report 01
The demo shows whether Hermes can work. Production shows what each useful run costs, where failures hide, and which checks to run before you decide what to do next.
Self-hosted Hermes agent cost is not just server spend. In production, the useful number is cost per successful run: model tokens, fixed prompt and tool overhead, retries, hosting, and the time spent finding failures. AgentOps Forensics is the six-check view of a deployment: useful-run cost, state integrity, migration safety, data and privacy exposure, tool safety, and failure trail. The checks and the fixes below are free. Run the self-check when you want to inspect your own setup.
The method
Read from evidence, not from whoever has a stake in the answer.
The framework
Useful-run cost
Cost per successful run, not raw monthly spend. Fixed overhead repeats on every call.
State integrity
Checkpoints that survive a restart. Context that does not silently reset mid-session.
Migration safety
Dry-run and diff before you cut over, so a migration can't drop data without telling you.
Data and privacy exposure
Prompts, tool inputs, and logs scrubbed of PII and live keys, not piped raw into a place nobody audits.
Tool safety
Per-tool limits, allow/deny lists, and a kill switch. Permissive defaults widen the blast radius.
Failure trail
Log the decision chain, not just the call, so you can reconstruct why an agent chose a tool.
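As an illustration of the tool-safety check, here is a minimal Python sketch of a gate with an allow list, per-tool call limits, and a kill switch. `ToolGate`, `ToolBlocked`, and the tool names are hypothetical and not part of Hermes; how you wire a gate like this in front of real tool calls depends on your setup.

```python
from dataclasses import dataclass, field


class ToolBlocked(Exception):
    """Raised when the gate refuses a tool call."""


@dataclass
class ToolGate:
    allowed: set[str]                                         # allow list: anything else is denied
    max_calls: dict[str, int] = field(default_factory=dict)   # per-tool limit for one run
    killed: bool = False                                      # kill switch: blocks every call when True
    _counts: dict[str, int] = field(default_factory=dict)     # calls made so far, per tool

    def check(self, tool: str) -> None:
        """Raise ToolBlocked if the call is not permitted, else count it."""
        if self.killed:
            raise ToolBlocked("kill switch is on")
        if tool not in self.allowed:
            raise ToolBlocked(f"{tool} is not on the allow list")
        used = self._counts.get(tool, 0)
        limit = self.max_calls.get(tool)
        if limit is not None and used >= limit:
            raise ToolBlocked(f"{tool} hit its limit of {limit} calls")
        self._counts[tool] = used + 1


# Hypothetical tool names, for illustration only.
gate = ToolGate(allowed={"read_file", "web_search"}, max_calls={"web_search": 5})
gate.check("read_file")       # permitted and counted
```

The gate denies by default: a tool missing from `allowed` is refused, so adding a new tool is an explicit decision rather than a side effect.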
Section 01 · Cost
There is no single number. Start with the formula: cost per useful run = (hosting + input tokens + output tokens + retries + failed runs + operator time) / successful runs, with every term measured over the same period.
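One way to turn the formula above into a number is to total each term over a period and divide by the runs that succeeded. The Python sketch below does that. The field names and the example figures are made up for illustration, and it assumes every term is already expressed in the same currency.

```python
from dataclasses import dataclass


@dataclass
class PeriodCosts:
    """Spend over one measurement period, all in the same currency."""
    hosting: float         # servers, storage, bandwidth
    input_tokens: float    # model input spend
    output_tokens: float   # model output spend
    retries: float         # spend on retried calls
    failed_runs: float     # spend on runs that never produced a usable result
    operator_time: float   # time spent finding failures, priced at your own rate


def cost_per_useful_run(costs: PeriodCosts, successful_runs: int) -> float:
    """Total spend for the period divided by the runs that succeeded."""
    if successful_runs <= 0:
        raise ValueError("no successful runs in this period; the figure is undefined")
    total = (costs.hosting + costs.input_tokens + costs.output_tokens
             + costs.retries + costs.failed_runs + costs.operator_time)
    return total / successful_runs


# Made-up numbers for illustration only.
example = PeriodCosts(hosting=40.0, input_tokens=120.0, output_tokens=30.0,
                      retries=15.0, failed_runs=20.0, operator_time=75.0)
print(cost_per_useful_run(example, successful_runs=600))  # 0.5
```

Dividing by successful runs rather than all runs is the point: failed runs and retries stay in the numerator, so they raise the figure instead of diluting it.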
The surprise is usually fixed overhead, the tokens you pay on every run regardless of the task. In one public field report on Hermes v0.6.0, an operator profiled 6 request dumps and about 207 live API calls and found roughly 73 percent of each call was fixed overhead: about 13.9K tokens, mostly tool definitions (about 46 percent, across 31 tools) and the system prompt (about 27 percent). Treat that as a warning sign to measure your own setup, not a benchmark.
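To measure your own setup, a rough Python sketch like the one below can compute the fixed-overhead share per call. It assumes you have already split each request dump into token counts per component; the component names and the numbers are hypothetical, not taken from the field report.

```python
from statistics import mean

# Components treated as fixed overhead: paid on every call regardless of the task.
FIXED_COMPONENTS = ("tool_definitions", "system_prompt")

# Hypothetical per-call token counts, e.g. parsed from your own request dumps.
calls = [
    {"tool_definitions": 4000, "system_prompt": 2000, "history": 3000, "user_message": 1000},
    {"tool_definitions": 4000, "system_prompt": 2000, "history": 1000, "user_message": 1000},
]


def overhead_share(call: dict[str, int]) -> float:
    """Fraction of one call's tokens that is fixed overhead."""
    total = sum(call.values())
    if total == 0:
        return 0.0
    fixed = sum(call.get(name, 0) for name in FIXED_COMPONENTS)
    return fixed / total


shares = [overhead_share(call) for call in calls]
print(f"mean fixed overhead: {mean(shares):.1%}")  # mean fixed overhead: 67.5%
```

The share moves with the variable part of each call: the same 6,000 fixed tokens are 60 percent of the first call and 75 percent of the second, which is why a single dump is not enough.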
Go deeper: the Hermes Agent cost breakdown has the full per-run anatomy and a cost model you can apply to your own logs; the per-component overhead breakdown shows where the fixed tokens go; and self-host vs managed walks through the keep, migrate, or retire decision.
Tool definitions dominate. Work the largest, most-repeated chunk first. It compounds across every call you make.
issue #4379
Section 02 · Failure modes
Most production Hermes failures are not clean crashes. Some throw errors. The expensive ones often look like normal runs until you compare state, outputs, and spend. One public issue, for example, reports holographic memory quietly dropping to keyword search when a dependency is missing, with no alert, so retrieval quality falls while everything looks fine. ↗ issue #34084
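A silent fallback like the one described above can be made visible by logging it at warning level when it happens. The Python sketch below shows the general pattern only. `pick_retrieval_mode` and the mode names are hypothetical, and this is not how Hermes itself selects a retrieval backend.

```python
import importlib.util
import logging

logger = logging.getLogger("retrieval")


def pick_retrieval_mode(dependency: str) -> str:
    """Return 'full' if the optional dependency is importable, else 'keyword'.

    The fallback still happens, but it is logged at WARNING level so it can
    be alerted on instead of passing unnoticed.
    """
    if importlib.util.find_spec(dependency) is not None:
        return "full"
    logger.warning(
        "retrieval degraded: %s is missing, falling back to keyword search",
        dependency,
    )
    return "keyword"


# "json" ships with Python, so this call takes the non-degraded path.
mode = pick_retrieval_mode("json")
```

Returning the mode as a value also lets you record it with each run, so a drop in retrieval quality can be traced to the moment the fallback started.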
The same control gaps have been linked to public agent incidents, for example Replit's Day-9 data loss and the disputed Amazon Kiro report. Read them as controls whose absence has been linked to incidents, not as universal benchmarks. The free self-check scores five of the six checks (cost, state, data and privacy, tool safety, failure trail); migration safety has its own preflight.
Section 03 · Remedy
Work the fixed overhead first, because it repeats on every run. In the field report above, tool definitions were the largest chunk (about 46 percent, across 31 tools). Loading only the tools a run needs, or using tool search instead of inlining every definition where your setup supports it, takes the biggest bite. Trimming the system prompt (about 27 percent there) and the carried context comes next. Then cap retries with a budget, so a loop cannot quietly amplify the bill.
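Two of these remedies are easy to sketch in Python: loading only the tools a run needs, and capping retries. The registry, tags, and function names below are hypothetical, and the budget is counted in attempts as a simplifying assumption; a token or cost budget would follow the same shape.

```python
from typing import Callable

# Hypothetical registry: tool name -> tags describing the tasks it serves.
TOOL_REGISTRY = {
    "web_search": {"research"},
    "run_shell": {"ops"},
    "read_file": {"ops", "research"},
    "send_email": {"comms"},
}


def tools_for(task_tags: set[str]) -> list[str]:
    """Keep only the tools whose tags overlap the task, instead of inlining all of them."""
    return sorted(name for name, tags in TOOL_REGISTRY.items() if tags & task_tags)


def run_with_retry_budget(attempt: Callable[[], str], max_attempts: int) -> str:
    """Call attempt() until it succeeds or the budget is spent, then raise."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return attempt()
        except Exception as exc:   # broad on purpose: any failure spends budget
            last_error = exc
    raise RuntimeError(f"retry budget of {max_attempts} attempts spent") from last_error


print(tools_for({"research"}))  # ['read_file', 'web_search']
```

Raising once the budget is spent turns a quiet loop into a visible failure, which is the behavior you want on the bill.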
Section 04 · Decision
Keep
Your useful-run cost is controlled and the six checks pass. You can carry the operational work.
Migrate
The operational work is larger than the savings. Compare cost per useful run to the managed price, not your raw monthly bill.
Retire
The workflow is not earning its run cost. Say so early, while you can still stop paying for it.
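The three outcomes can be written down as a small decision rule. The Python sketch below is one simplified reading of them, not a definitive test: `value_per_run` is a hypothetical input you would have to estimate yourself, and all amounts are assumed to be per useful run, in the same currency, with operator time already included in the self-hosted figure.

```python
def decide(self_hosted_cost: float, managed_price: float,
           value_per_run: float, checks_pass: bool) -> str:
    """Return 'keep', 'migrate', or 'retire' for one workflow."""
    if checks_pass and self_hosted_cost <= managed_price and value_per_run >= self_hosted_cost:
        return "keep"      # cost is controlled and the checks pass
    if value_per_run >= managed_price:
        return "migrate"   # managed is the better way to carry the operational work
    return "retire"        # the workflow is not earning its run cost


# Made-up figures for illustration only.
print(decide(self_hosted_cost=0.50, managed_price=0.80,
             value_per_run=2.00, checks_pass=True))  # keep
```

A workflow that fails the checks and cannot pay the managed price comes out as retire here; in practice that case may instead mean fixing the failing checks first.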
The self-check and the checklist above show you where you stand. If you want a written outside call on keep, move, or stop, there is a paid written Verdict.
Free · runs in your browser
Five scenarios across cost, state, data and privacy, tool safety, and failure trail, plus a migration check. Together they name the gap most likely to bite first, what it breaks, and a fix you can run this week. Your answers never leave your browser; the result is computed on your device.
Reference
Usually fixed overhead, the tokens spent on every run regardless of the task. One public field report on Hermes v0.6.0 put it near 73 percent, with tool definitions the largest part (issue #4379). Loading fewer tools per run, trimming the system prompt, and capping retries remove most of it.
The expensive ones look like normal runs: state that resets, outputs that drift after an update, retrieval that quietly degrades (issue #34084). They show up as a surprise bill or a regression, not an alert.
Only if your cost per useful run is controlled and you can carry the operational work. Compare cost per useful run to the managed price, not your raw monthly bill.
Yes. The free self-check walks through the readiness categories and gives a readout. The framework on this page is the same one a paid Verdict uses.
A six-check way to judge a self-hosted agent: useful-run cost, state integrity, migration safety, data and privacy exposure, tool safety, and failure trail, read from evidence rather than from whoever has a stake in the answer.
By Jed, who has been building and operating production software since 2007 and running self-hosted Hermes workflows since February 2026. The framework is given away here; a paid Verdict applies it to your specific numbers.