AgentOps Forensics · The decision
First, both options have to clear your security, compliance, and reliability requirements. After that it is economics: compare your cost per useful run, computed from your own logs, against a managed price you were actually quoted. If self-hosting keeps a meaningful margin after retries and your time, keep it; if not, managed is usually the safer default. Monthly totals alone are not enough.
Before the math
Cost only decides between options that already work. Before comparing a single number, check that each option clears your hard requirements. If one fails, it is out regardless of price.
Where the data and the model may run, what has to be auditable, and which certifications are non-negotiable.
The response time and uptime your product needs, and whether each option holds them under load.
Whether both options run the model, tools, and context you actually depend on, not a near-equivalent.
Who can change the system, how fast, and whether you can carry the operational work either way.
This page is step three of the decision. First calculate your run economics, then map the fixed components, then make the keep, migrate, or retire call below.
When it wins
When your cost per useful run sits below a quoted managed price per run, and stays there after you count retries and your own time. That is the comparison that decides among options that passed the gates above, not monthly bill against monthly bill.
A useful run is one that produced an acceptable result, not one that errored or had to be redone. Measure cost per useful run over a trailing window that includes your spikes, not a quiet week.
Self-hosting tends to win at high, steady volume with a stable set of tools: you amortise the fixed infrastructure across many runs and you control the token overhead directly.
Managed tends to win at low or spiky volume, for small teams without spare operations capacity, or early on when the workload is still changing shape and pinning down infrastructure is premature.
The per-run figure comes from your logs, not a vendor table. The free self-check walks the same model.
The decision
Three outcomes, each with the condition that points to it, what it costs you, and the signal in your own numbers.
Your cost per useful run beats a quoted managed price by a margin wide enough to pay for the operational work, and reliability is under control.
The tradeYou carry infrastructure, reliability work, and on-call, but the per-run economics and the control are worth it.
The signalHigh, steady volume with a stable set of tools, and a cost-per-useful-run gap that holds after you count retries and your time.
The cost gap is thin or negative once you count the hidden costs, or the operational burden is pulling you off the product.
The tradeA higher visible per-run price, traded for less infrastructure, no on-call, and time back.
The signalLow or spiky volume, a small team without spare operations capacity, or a workload still changing shape week to week.
Neither self-hosting nor your current managed plan fits any more, usually because the workload changed underneath the architecture.
The tradeThe cost of a rebuild now, against the compounding cost of paying for a setup that no longer matches the work.
The signalCost per useful run drifting up for reasons no lever fixes, or an architecture bent around a workload that no longer exists.
This framework is fully usable for free. If you want a second opinion, a written Verdict applies the same rubric to your logs and your quote and returns one call: keep, migrate, or retire.
Questions
Both options have to clear your hard requirements first: security and compliance, data residency, the latency and reliability your product needs, and parity on the model, tools, and context you depend on. If one option fails a hard requirement, cost per run is irrelevant, that option is out regardless of price. Run the cost comparison only on the options that pass.
Long enough to include your spikes, not a quiet week. A trailing window of roughly 30 to 60 days is usually enough, counting only useful runs (runs that produced an acceptable result, not ones that errored or had to be redone). A calm window flatters self-hosting; a spike flatters managed, so measure across both.
When the cost-per-useful-run gap is thin or negative after you count the hidden costs, or when the operational burden is pulling you off your product. Spiky or low volume and small teams without spare operations capacity usually point to managed.
A three-way call read from your numbers, after both options clear your hard requirements: keep self-hosting if the per-run economics and control justify the operational work; migrate to managed if the gap is thin or operations are a drag; retire the current setup if the workload changed and neither option fits, which means rebuild or drop it. A written Verdict applies the same rubric to your logs.