Azure Data Factory Cost Calculator

UK South pricing · Copy Activity (DIU-hours) · Data Flow · Self-hosted IR · Pipeline orchestration

Copy Activity

Azure-hosted IR · charged per DIU-hour

Pipeline runs / month

runs

Avg. duration / run

min

DIU count

Cost tip: The "Auto" DIU setting defaults to 4 DIUs minimum. For dev/test pipelines, manually setting 2 DIUs halves your Copy Activity cost. Production workloads benefit from higher DIU counts for parallelism.

Data Flow

Compute-intensive transformations · charged per vCore-hour

Self-hosted IR

On-premises data movement · charged per hour

Pipeline Orchestration

Activity runs · first 1,000/month free (Azure-hosted IR)

Activity runs / month

runs

✓ Within free 1,000 runs/month — no orchestration charge

Monthly estimate (PAYG)

...

Cost estimates provided by this tool are approximations only and are not intended as actual price quotes. Prices are sourced from the Microsoft Azure Retail Prices API and are updated nightly. Actual Azure costs may vary based on the terms of your Microsoft agreement, reserved instance pricing, enterprise discounts, commitment plans, currency exchange rates, and regional availability changes. This tool is not affiliated with, endorsed by, or sponsored by Microsoft Corporation. Always verify pricing through the official Azure Pricing Calculator at azure.microsoft.com/pricing/calculator before making purchasing decisions.

Verify on official Azure pricing page

Pricing Guide

Data Factory billing is driven by activity complexity, not just throughput

Azure Data Factory is deceptively priced because the cost depends on _what_ you're doing, not just how often. A simple copy of 1 GB from Blob Storage to SQL Database might cost £0.005, but an equivalent copy routed through a Data Flow (transformation) can cost £0.50+ because it spins up compute. Teams often miscalculate because they focus on ingestion volume and ignore pipeline orchestration, data flow vCore hours, and self-hosted integration runtime fees. The result: invoices that look reasonable but carry hidden architecture costs baked into the transformation logic.

This page keeps the calculator as the primary action, then explains the cost drivers and optimization paths that distinguish a cheap extraction workflow from an expensive replatforming pipeline.

Cost drivers: Copy, Data Flow, and Orchestration

Data Factory bills for three primary activity types:

Copy Activity: Moves data without transformation. Cost scales with DIU-hours (Data Integration Units × hours consumed). DIUs are compute slots assigned to the copy task; more DIUs mean higher parallelism but higher cost. A simple copy to a managed connector (e.g., SQL Database, Blob Storage) is cheaper than copying to self-managed sources or using custom logic.
Data Flow (Mapping Data Flow): Performs transformations (filtering, joining, aggregating, writing). Spins up Apache Spark clusters billed per vCore-hour. Data flows are the heaviest cost component and are often overlooked during design because engineers focus on logical correctness, not cost per GB processed.
Orchestration: Each pipeline run and each activity execution is billed separately. A pipeline that triggers 1,000 times per month with 5 activities generates 5,000 activity runs. Each one is usually cheap (£0.0001–0.001 per activity run), but the total accumulates.
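
The orchestration arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 1,000-run free allowance is the one shown in the calculator on this page (Azure-hosted IR), and the per-run rates are the two ends of the range quoted above rather than official prices.

```python
# Assumption: free allowance as shown in the calculator on this page.
FREE_ACTIVITY_RUNS = 1_000


def billable_activity_runs(pipeline_runs, activities_per_run,
                           free_runs=FREE_ACTIVITY_RUNS):
    """Activity runs left to bill after the monthly free allowance."""
    total = pipeline_runs * activities_per_run
    return max(0, total - free_runs)


def orchestration_cost(pipeline_runs, activities_per_run, rate_per_run):
    """Monthly orchestration charge in GBP for a given per-run rate."""
    return billable_activity_runs(pipeline_runs, activities_per_run) * rate_per_run


if __name__ == "__main__":
    runs = billable_activity_runs(1_000, 5)        # 5,000 total, 4,000 billable
    low = orchestration_cost(1_000, 5, 0.0001)     # low end of the quoted range
    high = orchestration_cost(1_000, 5, 0.001)     # high end of the quoted range
    print(f"{runs} billable runs: GBP {low:.2f} to GBP {high:.2f} per month")
```

Under these assumptions, the same pipeline costs ten times more at the top of the rate range than at the bottom, so it is worth checking which rate applies to your runtime.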

Self-hosted integration runtimes add another layer: a standing monthly charge plus per-execution overhead. Using a self-hosted IR for a heavy workload can triple the cost compared to managed runtime.

DIU-hours: where Copy Activity costs accumulate

A DIU (Data Integration Unit) is a measure of compute that Data Factory assigns to a Copy Activity. You select a "number of DIUs" (1, 2, 4, up to 256 depending on the runtime), and Data Factory consumes DIU-hours at a rate of ~£0.0069 per DIU-hour in UK South. The formula is simple:

Cost = DIUs × Hours × £0.0069/DIU-hour

The trap is that teams often don't optimize the DIU count. Assigning 4 DIUs to copy 100 MB (which could run on 1 DIU in 5 seconds) wastes compute. Conversely, assigning 1 DIU to copy 10 GB from a slow on-premises source may take 30 minutes instead of 5, which can consume more total DIU-hours than a parallel run: 1 DIU × 0.5 hours is 0.5 DIU-hours, whereas (for example) 4 DIUs × 5 minutes is about 0.33 DIU-hours.
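
A minimal Python sketch of the formula makes the trade-off easy to test against your own run times. The function name is hypothetical, the rate is the approximate UK South figure quoted above, and the 4-DIU, 5-minute case is an assumed run time, not a measurement. Any minimum billing duration or rounding that Azure applies is ignored here.

```python
DIU_HOUR_RATE_GBP = 0.0069  # approximate UK South rate quoted in this guide


def copy_cost(dius, minutes, rate=DIU_HOUR_RATE_GBP):
    """Cost = DIUs x hours x rate per DIU-hour (no billing minimums applied)."""
    return dius * (minutes / 60) * rate


if __name__ == "__main__":
    slow = copy_cost(dius=1, minutes=30)  # 0.50 DIU-hours
    fast = copy_cost(dius=4, minutes=5)   # about 0.33 DIU-hours (assumed run time)
    print(f"1 DIU for 30 min: GBP {slow:.5f}")
    print(f"4 DIUs for 5 min: GBP {fast:.5f}")
```

The comparison only favours more DIUs when the run time really does shrink faster than the DIU count grows, which is why measuring actual durations matters.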

The per-GB cost depends on network topology, source/sink type, and complexity. A cloud-to-cloud copy (e.g., Blob to SQL Database) is cheap; a copy from a legacy on-premises ERP via a self-hosted IR while applying transformations is much more expensive.

Data Flow (most expensive): vCore-hours for transformations

A Mapping Data Flow executes transformations using Spark clusters. Unlike Copy Activity (which is dumb, just moves bytes), Data Flows must parse, transform, and serialize, which requires compute. Cost is:

Cost = Compute vCores × Runtime hours × £0.26–0.40/vCore-hour

A 4-core Data Flow running for 10 minutes = 4 × 0.1667 hours × £0.30 ≈ £0.20 per run. Multiply by daily triggers, and a "simple" aggregation can cost £6/month. Scale to a weekly full rebuild of a data warehouse with join-heavy logic, and a single Data Flow run can cost £5–20+.
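
The per-run and monthly figures above can be reproduced with a short Python sketch. The function name is hypothetical, the £0.30 rate is the mid-range value used in the example, and a 30-day month with one trigger per day is assumed.

```python
def data_flow_cost(vcores, minutes, rate_per_vcore_hour):
    """Cost = vCores x runtime hours x rate per vCore-hour."""
    return vcores * (minutes / 60) * rate_per_vcore_hour


if __name__ == "__main__":
    per_run = data_flow_cost(vcores=4, minutes=10, rate_per_vcore_hour=0.30)
    monthly = per_run * 30  # assumption: one run per day, 30-day month
    print(f"Per run: GBP {per_run:.2f}, monthly: GBP {monthly:.2f}")

    # The same run priced at both ends of the quoted rate range.
    for rate in (0.26, 0.40):
        print(f"At GBP {rate:.2f}/vCore-hour: GBP {data_flow_cost(4, 10, rate):.3f}")
```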

The counterintuitive optimization: sometimes it's cheaper to export raw data from Data Factory to Synapse, Databricks, or SQL and run transformations there, where you can cache, re-use compute, or pause between jobs. A naive "put all the logic in Data Flow" architecture can easily cost 2–3× as much as a deliberate multi-service design.

On-premises coupling: self-hosted integration runtime costs

A self-hosted integration runtime (IR) runs in your own environment (on-premises or on a VM you manage) and connects Azure Data Factory to local data sources. Cost includes:

Infrastructure: You provide or pay for the VM/hardware yourself (not ADF's charge, but your cost).
ADF licensing: ~£0.80 per node per hour. A node running 24/7 (about 730 hours a month) costs ~£584/month, so a 2-node cluster costs ~£1,170/month.
Activity execution: Per-activity cost applied on top of the IR licensing.

Teams often underbudget this because the IR licensing is conflated with infrastructure costs. A persistent self-hosted IR costs £500–600/month per node _before_ DIU or Data Flow costs are added.
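
As a rough check on the licensing figure, the sketch below multiplies the ~£0.80 per node-hour rate quoted above by an always-on month. The function name is hypothetical, and 730 hours per month is an assumption (the usual average-month approximation). Infrastructure, DIU, and Data Flow charges are not included.

```python
HOURS_PER_MONTH = 730  # assumption: average month with 24/7 uptime


def ir_monthly_licensing(nodes, rate_per_node_hour=0.80, hours=HOURS_PER_MONTH):
    """Self-hosted IR licensing in GBP for always-on nodes."""
    return nodes * rate_per_node_hour * hours


if __name__ == "__main__":
    for nodes in (1, 2):
        print(f"{nodes} node(s): GBP {ir_monthly_licensing(nodes):,.0f} per month")
```

Passing a smaller `hours` value shows how much a schedule-based shutdown would save if the IR only needs to run during load windows.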

Optimization: Limit self-hosted IRs to on-premises sources that absolutely cannot use managed connectors. Consider replicating data to Azure first, then processing, rather than pulling through an IR gate.

Worked example: migrating data warehouse with hidden Data Factory costs

A company migrates a 100 GB data warehouse from on-premises to Azure quarterly:

1. Extract raw data via self-hosted IR to Blob Storage (Copy Activity, 4 DIUs, 2 hours)
2. Transform with Mapping Data Flow (join, aggregate, deduplicate, 4 vCores, 1 hour)
3. Load to SQL Synapse (Copy Activity, 2 DIUs, 30 min)
4. Orchestration: 1 pipeline run + 3 activities = 3 orchestration events

Cost per migration run (estimated):

Self-hosted IR licensing: Sunk cost (£500–600/month for the node itself; let's exclude)
Copy Activity (DIU): (4 DIUs × 2 hours + 2 DIUs × 0.5 hours) × £0.0069 = 9 DIU-hours × £0.0069 ≈ £0.06 + activity execution overhead
Data Flow (vCore-hours): 4 vCores × 1 hour × £0.35 = £1.40 + activity execution overhead
Orchestration: 3 events × £0.0001 ≈ £0.0003
Per-run total: ~£1.50–2.00
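
The per-run estimate can be recomputed from the four pipeline steps with a small Python sketch. The function name and dictionary keys are hypothetical, and the rates are the illustrative figures used in this example. Self-hosted IR licensing and activity execution overhead are left out, so the result sits at the low end of the quoted range.

```python
DIU_RATE = 0.0069       # GBP per DIU-hour (approximate, from this guide)
VCORE_RATE = 0.35       # GBP per vCore-hour (value used in this example)
ACTIVITY_RATE = 0.0001  # GBP per activity run (low end of the quoted range)


def migration_run_cost():
    """Itemised cost of one migration run, excluding IR licensing and overhead."""
    return {
        "extract_copy": 4 * 2.0 * DIU_RATE,       # 4 DIUs for 2 hours
        "transform_flow": 4 * 1.0 * VCORE_RATE,   # 4 vCores for 1 hour
        "load_copy": 2 * 0.5 * DIU_RATE,          # 2 DIUs for 30 minutes
        "orchestration": 3 * ACTIVITY_RATE,       # 3 activity runs
    }


if __name__ == "__main__":
    items = migration_run_cost()
    for name, cost in items.items():
        print(f"{name:>15}: GBP {cost:.4f}")
    print(f"{'total':>15}: GBP {sum(items.values()):.4f}")
```

The Data Flow step accounts for nearly all of the total, which is the point of the example: the copy and orchestration charges are rounding errors by comparison.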

One quarterly run = £2. But in reality, teams do weekly test migrations, monthly incremental loads, and ad-hoc re-runs to verify logic. End result: £8–15/month for a process that "should" cost £0.50 because the engineering team didn't optimize DIU counts or question whether Data Flow was the right tool.

Cost optimization strategies

Right-size DIU and vCore allocation. Start small (1–2 DIUs), measure runtime, then scale up only if bottlenecked by network/serialization, not raw throughput.

Prefer Copy Activity over Data Flow for simple moves. If you're not transforming (just filtering or moving), a Copy Activity is 10–20× cheaper.

Push logic to the target (e.g., SQL, Synapse, Databricks). Land raw data with a cheap Copy Activity, then transform in-place. This avoids Data Flow vCore charges and lets you batch, cache, and optimize at scale.

Minimize self-hosted IR usage. Only use for connectivity that cannot reach Azure from managed runtimes. If you migrate on-prem data, load it to Azure first, then process from there.

Schedule triggers carefully. Each pipeline execution is billable. Batch multiple jobs into one run instead of triggering atomically every few minutes.

Common questions

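Is there a free tier for Data Factory? Data Factory has no general free tier. The calculator above treats the first 1,000 activity runs per month on the Azure-hosted IR as free, and orchestration and activity execution charges beyond that are small. The bulk of the cost comes from DIU-hours (Copy) and vCore-hours (Data Flow).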

Can I reduce costs by switching to Azure Synapse pipelines? Synapse has similar activity costs but integrates native SQL and Spark compute differently. If you're already running Synapse, embedding pipelines can be cheaper than separate Data Factory orchestration, but the decision depends on workload.

Is Data Flow cheaper in Azure Databricks? Databricks has its own pricing but offers more flexibility for optimization and cost-sharing with other tenants. Direct substitution is not straightforward; cost depends on cluster size and utilization.

What if I use a scheduled Copy Activity vs a Mapping Data Flow for the same task? Copy Activity is almost always cheaper for data movement alone. Use Data Flow only when you need transformations that SQL in the target cannot handle efficiently.

Methodology

Rates are sourced from the Azure Retail Prices API for UK South in GBP. The calculator includes DIU-hours for Copy Activity, vCore-hours for Data Flow, self-hosted IR licensing, and orchestration overhead. Actual costs depend on data complexity, transformation logic, and integration runtime topology.

How prices are calculated →