Cloud LLMs price by the token; local LLMs price by the kilowatt-hour. The honest comparison is a rent-versus-buy decision: cloud wins at low, spiky usage with zero capital; ownership wins at sustained volume, where capital amortises and the marginal cost of a question falls to the electricity it took to answer. Everything else is detail: important detail, set out below.
The cost structures, side by side
| Component | Cloud API | Owned appliance |
|---|---|---|
| Capital | £0 | Hardware + integration, one-off |
| Marginal cost | Per token, forever | Electricity, approximately |
| Scaling | Linear with usage | Flat to the hardware's ceiling |
| Price control | Vendor repricing risk | Your energy tariff |
| End of life | Nothing owned | Residual hardware + weights you keep |
A worked example (illustrative)
Numbers below are deliberately round, for the shape of the maths rather than a quote — plug in your own volumes and tariffs.
Take a 40-person professional team making steady internal use of an assistant: drafting, summarisation, retrieval over firm documents. Suppose that's ~500 million tokens a month of combined input and output. At a blended API rate of, say, £3 per million tokens, that's £1,500/month — £18,000 a year, every year, rising with adoption.
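The cloud side of the arithmetic can be sketched in a few lines of Python. The function name `cloud_cost_gbp` is ours, purely for illustration, and the inputs are the round figures from the example above rather than any vendor's actual price list.

```python
def cloud_cost_gbp(tokens_per_month: float, rate_per_million_gbp: float) -> tuple[float, float]:
    """Return (monthly, annual) API spend in GBP for a blended per-token rate.

    Assumes a single blended rate for input and output tokens, as the
    worked example does; real price lists usually charge them differently.
    """
    monthly = tokens_per_month / 1_000_000 * rate_per_million_gbp
    return monthly, monthly * 12


# Illustrative figures from the worked example: 500M tokens/month at GBP 3 per million.
monthly, annual = cloud_cost_gbp(500_000_000, 3.0)
print(f"GBP {monthly:,.0f}/month, GBP {annual:,.0f}/year")
```

Swap in your own token volume and blended rate; the result scales linearly with both.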
The local alternative: a workstation-class appliance, call it £15,000–£25,000 installed, including integration and retrieval over the firm's corpus. Running at a 350 W average for roughly 2,000 working hours a year, it draws about 700 kWh: £200–£300 of electricity at UK commercial rates, plus a maintenance contract.
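The energy figure is simple enough to check in code. This is a minimal sketch: the 2,000 working hours a year is the figure implied by 350 W and 700 kWh, and the two tariffs (GBP 0.30 and GBP 0.40 per kWh) are assumed values chosen to bracket the £200–£300 range, not quoted rates.

```python
def annual_energy_kwh(avg_watts: float, hours_per_year: float) -> float:
    """Energy drawn per year in kWh, from average power and hours of use."""
    return avg_watts * hours_per_year / 1000


def annual_electricity_cost_gbp(kwh: float, tariff_gbp_per_kwh: float) -> float:
    """Annual electricity cost in GBP at a flat per-kWh tariff."""
    return kwh * tariff_gbp_per_kwh


# 350 W average over an assumed 2,000 working hours a year.
kwh = annual_energy_kwh(350, 2000)
print(f"{kwh:.0f} kWh/year")

# Assumed tariffs (hypothetical, for illustration only).
for tariff in (0.30, 0.40):
    cost = annual_electricity_cost_gbp(kwh, tariff)
    print(f"at GBP {tariff:.2f}/kWh: GBP {cost:.0f}/year")
```

A machine left running around the clock would draw roughly four times as much, so use your own duty cycle rather than this one.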
Appliance cost ÷ annual cloud spend ≈ payback period. A £20k machine against £18k/year pays back in roughly 13–14 months, a little longer once electricity and the maintenance contract are netted off. After that, inference is effectively free at the margin, and heavier usage makes the case better, not worse. Cloud spend scales with success; an owned machine's cost doesn't, up to the hardware's ceiling.
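As a sketch, payback is the appliance cost divided by the annual saving, where the saving is the avoided cloud spend minus the appliance's running costs. The function name `payback_months` and the £250 running-cost figure (the midpoint of the electricity range, with no maintenance contract included) are illustrative assumptions.

```python
def payback_months(appliance_cost_gbp: float,
                   annual_cloud_spend_gbp: float,
                   annual_running_cost_gbp: float = 0.0) -> float:
    """Months until avoided cloud spend covers the appliance's capital cost."""
    net_annual_saving = annual_cloud_spend_gbp - annual_running_cost_gbp
    if net_annual_saving <= 0:
        # Running the appliance costs as much as the cloud bill: it never pays back.
        raise ValueError("no annual saving, so the appliance never pays back")
    return appliance_cost_gbp / net_annual_saving * 12


# Headline figure from the example: GBP 20k machine against GBP 18k/year of API spend.
print(f"{payback_months(20_000, 18_000):.1f} months")

# Same case net of an assumed GBP 250/year of electricity.
print(f"{payback_months(20_000, 18_000, 250):.1f} months")

# Low and high ends of the quoted hardware range.
for cost in (15_000, 25_000):
    print(f"GBP {cost:,}: {payback_months(cost, 18_000):.1f} months")
```

Adding a maintenance contract to `annual_running_cost_gbp` lengthens the payback; higher usage, and so higher avoided cloud spend, shortens it.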
When cloud genuinely wins
Honesty cuts both ways. The API is the right answer when usage is low or experimental (the capital can't amortise), when you need frontier-model capability for open-ended work that a scoped local model can't match, or when workloads are extremely bursty and elasticity matters more than unit cost. Many of our clients run both: cloud for general low-sensitivity work, an appliance for the data that must not travel.
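One way to put a number on "usage is low" is to find the monthly token volume at which the API bill equals the appliance's amortised cost. The sketch below assumes a three-year straight-line amortisation and £250 a year of running costs; both are hypothetical choices for illustration, as is the function name, and the £20k and £3-per-million figures are the round numbers from the worked example.

```python
def breakeven_tokens_per_month(appliance_cost_gbp: float,
                               amortisation_years: float,
                               annual_running_cost_gbp: float,
                               rate_per_million_gbp: float) -> float:
    """Monthly token volume at which cloud spend equals amortised ownership cost."""
    annual_ownership_cost = appliance_cost_gbp / amortisation_years + annual_running_cost_gbp
    monthly_ownership_cost = annual_ownership_cost / 12
    return monthly_ownership_cost / rate_per_million_gbp * 1_000_000


# Assumed: GBP 20k appliance written off over 3 years, GBP 250/year to run,
# against a blended API rate of GBP 3 per million tokens.
tokens = breakeven_tokens_per_month(20_000, 3, 250, 3.0)
print(f"break-even at about {tokens / 1e6:.0f}M tokens/month")
```

Under these assumptions, a team well below that volume is better served by the API on cost alone; the 500-million-token team in the example is well above it. A shorter amortisation horizon or a cheaper API rate pushes the break-even volume up.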
The costs that don't appear on either invoice
- Data exposure — every cloud token is a disclosure to a third party. For privileged, clinical or classified material that's not a cost line; it's a prohibition. See air-gapped AI, explained.
- Continuity — cloud inference fails with the link, the vendor, or the subscription. An appliance fails only with the power. For resilience deployments this is the entire point.
- Repricing and retirement risk — API prices, rate limits and model availability are someone else's decisions. Owned weights can't be deprecated out from under you.
- Capability ceiling — the cloud's genuine advantage: frontier models, continuously improved. An appliance narrows the gap with retrieval over a curated corpus, and closes it on the questions the corpus was built for.
Rent intelligence where the data is cheap and the stakes are low. Own it where the data is priceless or the connection is mortal.
If you want the maths run against your actual volumes, power costs and hardware class, that's a one-email exercise — see below.