Azure OpenAI Pricing Guide UK 2026

Azure OpenAI has two pricing models that work completely differently from each other and from every other Azure service. Standard (PAYG) charges per token — you pay only for what you use, it scales to zero, and the cost is entirely predictable per request. Provisioned Throughput Units (PTUs) charge per hour regardless of actual usage, like reserving a slot at a restaurant whether or not you show up. PTUs save money only if your application sustains load for roughly 7 or more days per month. Below that threshold, PAYG is cheaper. Microsoft published a dedicated blog post titled “Do PTUs Save You Money?” specifically because this calculation confuses engineers — that is a signal worth paying attention to.

Prices last verified: April 2026

Token-based pricing: what a token is

Everything you send to and receive from Azure OpenAI is billed in tokens. A token is roughly four characters of English text, or about three-quarters of a word. The phrase “Azure OpenAI is expensive” contains four words and 25 characters, which the four-characters-per-token rule puts at roughly six tokens; the exact count depends on the model's tokeniser. Longer contexts, such as sending previous conversation history as part of a prompt, rapidly increase your input token count and therefore your costs.
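The two rules of thumb above can be turned into a quick estimator for back-of-envelope sizing. This is a minimal sketch using only the heuristics stated in this guide, not a real tokeniser. For exact counts, a tokeniser library such as `tiktoken` (GPT-4o uses its `o200k_base` encoding) is the better tool.

```python
# Heuristics from this guide: ~4 characters of English per token,
# or ~0.75 words per token. Neither replaces the model's real tokeniser.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count."""
    return round(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate from word count."""
    return round(word_count / WORDS_PER_TOKEN)


if __name__ == "__main__":
    phrase = "Azure OpenAI is expensive"
    print(estimate_tokens(phrase))                      # character heuristic
    print(estimate_tokens_from_words(len(phrase.split())))  # word heuristic
```

The two heuristics disagree slightly for short phrases. This is one reason to measure real prompts before committing to a budget.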

Input tokens and output tokens are priced separately, and output tokens consistently cost more, typically 3–4× the input rate. This reflects the computational cost of generation versus reading. For GPT-4o, input costs £0.0019 per 1,000 tokens and output costs £0.0075 per 1,000 tokens (April 2026). A chat application that sends 1 million input tokens and generates 200,000 output tokens per day costs £1.90 + £1.50 = £3.40/day, or approximately £102/month over 30 days.
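The arithmetic behind that estimate is short enough to script. The following sketch assumes the April 2026 GPT-4o rates quoted above and a flat 30-day month with identical traffic every day. Substitute your own rates and volumes.

```python
# GPT-4o Standard (PAYG) rates quoted in this guide, GBP per 1,000 tokens (April 2026).
GPT4O_INPUT_PER_1K = 0.0019
GPT4O_OUTPUT_PER_1K = 0.0075


def payg_daily_cost(input_tokens: int, output_tokens: int,
                    input_rate: float = GPT4O_INPUT_PER_1K,
                    output_rate: float = GPT4O_OUTPUT_PER_1K) -> float:
    """GBP cost of one day's traffic; input and output are priced separately."""
    return input_tokens / 1000 * input_rate + output_tokens / 1000 * output_rate


def payg_monthly_cost(input_per_day: int, output_per_day: int, days: int = 30) -> float:
    """Monthly GBP cost, assuming the same volume on every one of `days` days."""
    return payg_daily_cost(input_per_day, output_per_day) * days


if __name__ == "__main__":
    daily = payg_daily_cost(1_000_000, 200_000)
    monthly = payg_monthly_cost(1_000_000, 200_000)
    print(f"£{daily:.2f}/day, £{monthly:.2f}/month")  # £3.40/day, £102.00/month
```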

Unlike compute-based services where you pay for time, token billing is purely consumption-based. An idle deployment costs nothing. A spike in usage costs exactly proportionally more. This is the primary appeal of Standard PAYG for most production workloads.

Model selection: GPT-4o vs GPT-4o mini vs o3-mini

The choice of model is the single biggest cost lever you have. GPT-4o mini costs approximately 5% of GPT-4o's input rate (£0.0001 vs £0.0019 per 1K tokens). For high-volume classification, summarisation, or routing tasks that do not require frontier capability, GPT-4o mini is almost always the right choice.

GPT-4o is appropriate when you need the full capability — complex reasoning, nuanced language, or tasks where accuracy directly affects business outcomes. The quality gap between GPT-4o and GPT-4o mini is significant for complex tasks but negligible for straightforward extraction or classification.

o3-mini is a reasoning model in the o-series family. It thinks before responding, which improves accuracy on mathematical, scientific, and multi-step logical tasks. The trade-off is higher latency and a higher per-token cost than GPT-4o mini, though still below GPT-4o at £0.0008 input / £0.0033 output per 1K tokens. Its internal reasoning tokens are billed as output tokens even though they are not returned in the response, so a request can cost more than its visible output suggests. Use it for tasks that specifically benefit from explicit chain-of-thought reasoning.

GPT-4 (8K context, legacy) costs £0.0226/£0.0451 per 1K tokens. That is roughly 12× GPT-4o's input rate and 6× its output rate, for equivalent quality. New deployments should use GPT-4o. GPT-4's higher price reflects older infrastructure costs, and the model will eventually be retired.
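To see how much the model choice matters for one workload, you can price the same traffic against each model's rates. The following sketch uses the per-1K rates quoted in this section. GPT-4o mini is left out because only its input rate is given here. The `reasoning_per_day` parameter is a hypothetical input, because reasoning-token volume depends entirely on your prompts. It is charged at the output rate, matching how o-series reasoning tokens are billed.

```python
# GBP per 1,000 tokens (input, output), as quoted in this guide (April 2026).
RATES = {
    "gpt-4o": (0.0019, 0.0075),
    "o3-mini": (0.0008, 0.0033),
    "gpt-4 (legacy 8K)": (0.0226, 0.0451),
}


def monthly_cost(model: str, input_per_day: int, output_per_day: int,
                 reasoning_per_day: int = 0, days: int = 30) -> float:
    """Monthly GBP cost; hidden reasoning tokens are charged at the output rate."""
    input_rate, output_rate = RATES[model]
    daily = (input_per_day / 1000 * input_rate
             + (output_per_day + reasoning_per_day) / 1000 * output_rate)
    return daily * days


if __name__ == "__main__":
    for name in RATES:
        cost = monthly_cost(name, 1_000_000, 200_000)
        print(f"{name:>18}: £{cost:,.2f}/month")
```

For 1 million input and 200,000 output tokens per day, the script prints roughly £102 for GPT-4o, £44 for o3-mini (before any reasoning tokens) and £949 for legacy GPT-4.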

Standard PAYG: when it is the right choice

Standard PAYG is the correct choice for the majority of Azure OpenAI workloads. It has no minimum commitment, scales to zero during idle periods, and is completely predictable — every request costs a known amount based on its token count. For a new deployment, PAYG eliminates the risk of committing to capacity before you understand your actual usage patterns.

PAYG becomes less attractive when your workload is predictable and continuous. If your application handles roughly constant load throughout the working day and you are approaching the break-even point for PTU reservations, the capacity guarantees of PTU start to matter alongside the cost argument.

The practical threshold: if your deployment is likely to be active on fewer than about 7 days a month (for example, a batch processing job that runs every other week), Standard PAYG is almost certainly cheaper. Use the calculator to model your specific token volumes against PTU rates.

PTU reservations: the 7-day break-even rule

A Provisioned Throughput Unit (PTU) reserves dedicated inference capacity for your deployment. You are charged per PTU per hour regardless of whether you use that capacity. For GPT-4o, 1 PTU provides approximately 6 tokens per second of sustained throughput.
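Because PTU capacity is expressed as sustained throughput, sizing starts from your peak tokens per second rather than your monthly total. The following sketch uses the approximate 6 tokens/second per GPT-4o PTU figure given above. Treat that figure as an assumption to check against current Azure documentation. The sketch also ignores the minimum deployment sizes and increments that Azure enforces.

```python
import math

# Approximate GPT-4o throughput per PTU quoted in this guide (assumption).
TOKENS_PER_SECOND_PER_PTU = 6


def ptus_needed(peak_tokens_per_second: float,
                per_ptu: float = TOKENS_PER_SECOND_PER_PTU) -> int:
    """Smallest whole number of PTUs whose combined throughput covers the peak."""
    return math.ceil(peak_tokens_per_second / per_ptu)


if __name__ == "__main__":
    print(ptus_needed(100))  # peak of 100 tokens/second
```

Sizing to the peak rather than the average is what makes idle hours expensive: the reserved capacity is billed whether or not traffic reaches it.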

The break-even analysis is straightforward. PTU cost is fixed: at the global rate of £0.7524 per PTU per hour, each PTU costs £0.7524 × 730 hours = £549.25/month, so 100 PTUs cost £54,925/month. Standard PAYG monthly cost varies with usage. At 1 million input tokens and 200,000 output tokens per day, Standard costs approximately £102/month, less than a fifth of the monthly cost of a single PTU. At 10 million input tokens and 2 million output tokens per day, Standard costs approximately £1,020/month. PTU becomes the cheaper option only once your Standard bill exceeds the fixed monthly cost of all the PTUs needed to serve that load.
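The comparison reduces to a fixed PTU bill against a usage-driven PAYG bill. The following sketch uses the £0.7524/hour global rate, the 730-hour month and the GPT-4o PAYG rates quoted in this guide. The `cheaper_option` helper is illustrative only: it compares prices and does not check whether the PTUs you chose can actually carry the load.

```python
HOURS_PER_MONTH = 730        # billing convention used in this guide
PTU_RATE_PER_HOUR = 0.7524   # GBP per PTU per hour (global rate quoted above)
INPUT_PER_1K = 0.0019        # GPT-4o PAYG input, GBP per 1K tokens
OUTPUT_PER_1K = 0.0075       # GPT-4o PAYG output, GBP per 1K tokens


def ptu_monthly_cost(ptus: int, rate: float = PTU_RATE_PER_HOUR) -> float:
    """Fixed monthly charge: every reserved PTU is billed for every hour."""
    return ptus * rate * HOURS_PER_MONTH


def payg_monthly_cost(input_per_day: int, output_per_day: int, days: int = 30) -> float:
    """Usage-driven monthly charge for Standard PAYG."""
    daily = input_per_day / 1000 * INPUT_PER_1K + output_per_day / 1000 * OUTPUT_PER_1K
    return daily * days


def cheaper_option(ptus: int, input_per_day: int, output_per_day: int) -> str:
    """Return 'PTU' or 'PAYG', whichever monthly bill is lower (price only)."""
    ptu = ptu_monthly_cost(ptus)
    payg = payg_monthly_cost(input_per_day, output_per_day)
    return "PTU" if ptu < payg else "PAYG"


if __name__ == "__main__":
    print(f"1 PTU:    £{ptu_monthly_cost(1):,.2f}/month")
    print(f"100 PTUs: £{ptu_monthly_cost(100):,.2f}/month")
    print(cheaper_option(100, 10_000_000, 2_000_000))
```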

The “7 days” rule of thumb reflects the observation that most workloads do not sustain uniform load across all 30 days. Weekends, holidays, maintenance windows, and traffic variability mean that a deployment running at full capacity for 22 working days still has 8 idle days where PTU capacity is wasted. The break-even point — where PTU monthly cost equals the Standard PAYG cost of your actual usage — often falls around 7–10 active days for typical production workloads.
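The break-even definition above comes down to a single division: the PTU monthly cost divided by the Standard PAYG cost of one active day. The following sketch shows that calculation. The £1,000 PTU bill and £125-per-active-day figures in the usage lines are hypothetical placeholders, not values derived from this guide. In practice the PTU bill must cover enough PTUs to serve your active-day load.

```python
def break_even_active_days(ptu_monthly_cost: float,
                           payg_cost_per_active_day: float) -> float:
    """Active days per month at which Standard PAYG spend equals the fixed PTU bill."""
    if payg_cost_per_active_day <= 0:
        raise ValueError("payg_cost_per_active_day must be positive")
    return ptu_monthly_cost / payg_cost_per_active_day


def ptu_is_cheaper(active_days: int, ptu_monthly_cost: float,
                   payg_cost_per_active_day: float) -> bool:
    """True when the PAYG bill for the active days would exceed the PTU bill."""
    return active_days * payg_cost_per_active_day > ptu_monthly_cost


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    days = break_even_active_days(1_000.0, 125.0)
    print(f"Break-even at {days:.1f} active days")
    print(ptu_is_cheaper(22, 1_000.0, 125.0))  # 22 working days
    print(ptu_is_cheaper(5, 1_000.0, 125.0))   # 5 active days
```

Idle days lower the PAYG total but not the PTU bill, so every idle day pushes the break-even further out of reach.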

PTU commitment terms: hourly (no discount), 1-month (~5% saving), 1-year (~33% saving). Annual PTU commitments make most sense when you have confirmed that PTU is already cost-effective on an hourly basis.
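A commitment discount simply scales the fixed PTU bill, which lowers the break-even point by the same proportion. The following sketch uses the approximate savings listed above (0%, ~5%, ~33%). Actual discounts vary by model, region and agreement, so treat the table as an assumption.

```python
# Approximate savings quoted in this guide for each PTU commitment term.
COMMITMENT_DISCOUNT = {"hourly": 0.0, "1-month": 0.05, "1-year": 0.33}


def committed_monthly_cost(hourly_monthly_cost: float, term: str) -> float:
    """Monthly PTU bill after applying the commitment discount for `term`."""
    return hourly_monthly_cost * (1 - COMMITMENT_DISCOUNT[term])


if __name__ == "__main__":
    one_ptu_hourly = 0.7524 * 730  # £549.25/month at the hourly rate
    for term in COMMITMENT_DISCOUNT:
        print(f"{term:>8}: £{committed_monthly_cost(one_ptu_hourly, term):,.2f}/month")
```

Because the break-even active days equal the PTU bill divided by the daily PAYG cost, a ~33% discount cuts the break-even by about a third. It does not turn an unsuitable, spiky workload into a good PTU candidate.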

Image and vision tokens

Vision-capable models (GPT-4o, GPT-4o mini) can process images as inputs. Images are not counted character by character like text. Instead, each image is converted into a number of input tokens that depends on its resolution and the requested detail level, and those tokens are billed at the model's input rate. In high-detail mode the image is scaled down and split into 512×512 pixel tiles. Each tile adds a fixed block of tokens on top of a base charge, so higher-resolution images cost more. Low-detail mode charges only the base amount.

This often surprises engineers who assume an image costs about the same as the short text prompt that accompanies it. A pipeline that processes 100 product images per day pays for every image's tile tokens on top of the text tokens in the prompt. Factor this separately when estimating costs for document processing or visual search applications.
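To estimate image costs before deploying, you can model the tile calculation directly. This sketch assumes the high-detail formula that OpenAI documents for GPT-4o, with the following steps:

- Scale the image to fit within 2048×2048.
- Scale it down again so the shortest side is at most 768 px.
- Charge 85 base tokens plus 170 tokens per 512×512 tile.

GPT-4o mini uses different per-tile token counts. This sketch only ever scales down. Treat all constants as assumptions to verify against current Azure OpenAI documentation.

```python
import math

# Assumed constants (OpenAI's published GPT-4o high-detail formula).
BASE_TOKENS = 85
TOKENS_PER_TILE = 170
TILE_PX = 512
INPUT_PER_1K = 0.0019  # GPT-4o input rate quoted in this guide, GBP


def image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate input tokens for one image under the assumed tile formula."""
    if detail == "low":
        return BASE_TOKENS
    # 1. Scale down to fit inside a 2048 x 2048 square, keeping aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # 2. Scale down again so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # 3. Count the 512 x 512 tiles needed to cover the scaled image.
    tiles = math.ceil(w / TILE_PX) * math.ceil(h / TILE_PX)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles


def image_cost_gbp(width: int, height: int, detail: str = "high",
                   input_rate_per_1k: float = INPUT_PER_1K) -> float:
    """GBP input cost of one image at the given per-1K input rate."""
    return image_tokens(width, height, detail) / 1000 * input_rate_per_1k


if __name__ == "__main__":
    per_image = image_cost_gbp(1920, 1080)
    print(f"{image_tokens(1920, 1080)} tokens, £{per_image * 100:.4f} per 100 images")
```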

Practical estimation before deploying

The hardest part of estimating Azure OpenAI costs is predicting token volumes before you have production data. A practical approach:

Benchmark a representative sample: Take 50–100 representative requests and measure actual token counts using the model tokeniser. Multiply by your expected request volume.
Account for context window growth: Conversational applications resend the accumulated history with each request. A 10-turn conversation includes all previous messages, so the 10th request might carry 10× the tokens of the first. The total input billed across the conversation grows roughly with the square of the number of turns.
Separate input from output: Output volume is often unpredictable. If you allow open-ended generation (no max_tokens limit), actual output can vary by 5–10× across requests. Set a max_tokens ceiling for cost predictability.
Use Standard PAYG for the first month: Do not commit to PTU until you have at least 30 days of production data. The insights from actual usage almost always change your estimate.
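Two of the steps above, history growth and the max_tokens ceiling, are easy to quantify before launch. The following sketch assumes that every request resends the full history, and the per-turn token sizes in the usage lines are hypothetical. `worst_case_output_cost` gives an upper bound on output spend when every reply hits the cap. The default rate is the GPT-4o output rate quoted in this guide.

```python
def conversation_input_tokens(turns: int, system: int, user: int,
                              assistant: int) -> list[int]:
    """Input tokens billed for each request when the full history is resent.

    Request k carries the system prompt, the k-1 earlier user/assistant
    exchanges, and the new user message.
    """
    return [system + k * user + (k - 1) * assistant for k in range(1, turns + 1)]


def worst_case_output_cost(requests_per_day: int, max_tokens: int,
                           output_rate_per_1k: float = 0.0075,
                           days: int = 30) -> float:
    """Upper bound on monthly GBP output spend if every reply hits max_tokens."""
    return requests_per_day * max_tokens / 1000 * output_rate_per_1k * days


if __name__ == "__main__":
    # Hypothetical sizes: 200-token system prompt, 50-token questions,
    # 150-token answers, 10 turns.
    per_request = conversation_input_tokens(10, 200, 50, 150)
    print(per_request[0], per_request[-1], sum(per_request))
    print(f"£{worst_case_output_cost(5_000, 500):,.2f}/month worst-case output")
```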

Azure OpenAI Calculator

Use the Azure OpenAI Calculator to estimate your costs. Its break-even analysis shows the number of active days per month at which PTU becomes cheaper than PAYG at your usage level.

Built and verified by an independent Azure engineer.
