I needed VPN access to Azure resources across multiple subscriptions. The requirement was simple: secure access without managing additional credentials, no PSKs floating around, and everything infrastructure-as-code. Here’s how I built it.

The Problem

The Azure environment spans many subscriptions with overlapping IP ranges - a legacy of growth. We had:

  • SQL Managed Instances requiring private connectivity
  • Public SQL databases needing IP whitelisting
  • Development teams needing ad-hoc access
  • Zero appetite for managing VPN credentials separately from Azure AD

Traditional VPN solutions would require:

  • Separate credential management
  • Manual user provisioning
  • Shared secrets or certificate distribution
  • Additional MFA implementation

We needed something better.

The Solution: Azure VPN Gateway + Terraform + OIDC

The architecture leverages three key components:

  1. Azure VPN Gateway with Azure AD authentication
  2. Terraform for complete infrastructure-as-code
  3. GitHub Actions OIDC for zero-credential deployments

Why Azure VPN Gateway Over Alternatives

We evaluated WireGuard on a VM ($10/month) versus Azure VPN Gateway ($140/month). The cost difference is real, but Azure VPN Gateway won because:

  • Native Azure AD authentication (users authenticate with existing credentials)
  • Built-in MFA enforcement via Conditional Access policies
  • No certificate or PSK management
  • Enterprise support and SLAs
  • Automatic failover and redundancy

The “WireGuard is cheaper” argument falls apart when you factor in:

  • Time spent managing user certificates
  • Building Azure AD integration yourself
  • Maintaining the VM and WireGuard updates
  • On-call burden when it breaks at 2 AM

For a smaller engineering team, the labor cost of managing WireGuard exceeds the Azure service cost within the first month.

Implementation: Terraform Configuration

Network Design

We created a transit VNet in non-overlapping space:

locals {
  transit_vnet_cidr = "10.99.40.0/24"
  gateway_subnet    = "10.99.40.0/27"  # /27 is the recommended minimum for the GatewaySubnet
  vpn_client_pool   = "10.99.50.0/26"  # 62 client IPs
  location          = "centralus"
}

Key decision: VPN client pool must be completely outside all Azure VNet ranges. Azure won’t let you use overlapping space for client addressing.
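
The transit resource group and VNet referenced by the gateway configuration below aren't shown elsewhere in this post; a minimal sketch (the name strings are illustrative, the resource addresses match the references later on) looks like this:

resource "azurerm_resource_group" "transit" {
  name     = "rg-transit-vpn"   # illustrative name
  location = local.location
}

resource "azurerm_virtual_network" "transit" {
  name                = "transit-vnet"   # illustrative name
  location            = azurerm_resource_group.transit.location
  resource_group_name = azurerm_resource_group.transit.name
  address_space       = [local.transit_vnet_cidr]
}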

Azure AD Application Setup

The VPN Gateway authenticates users via an Azure AD application:

resource "azuread_application" "vpn" {
  display_name    = "Azure VPN"
  identifier_uris = ["api://<Client_ID>"]

  api {
    oauth2_permission_scope {
      admin_consent_description  = "Allow the application to access Azure VPN on behalf of the signed-in user."
      admin_consent_display_name = "Access Azure VPN"
      id                         = "00000000-0000-0000-0000-000000000001"
      enabled                    = true
      type                       = "User"
      value                      = "user_impersonation"
    }
  }

  web {
    redirect_uris = ["https://portscan.atollo.net/"]
  }
}

The redirect URI https://portscan.atollo.net/ is Microsoft’s hardcoded callback for the Azure VPN Client. Yes, it looks weird. No, you can’t change it.
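
One thing the registration above doesn't create on its own is the enterprise application (service principal) in the tenant, which is what users actually authenticate against; depending on your tenant's consent settings you'll also want admin consent for the user_impersonation scope. A minimal sketch with the azuread provider (the argument name differs on older provider versions):

resource "azuread_service_principal" "vpn" {
  client_id = azuread_application.vpn.client_id  # older azuread providers call this "application_id"
}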

VPN Gateway Configuration

resource "azurerm_virtual_network_gateway" "vpn" {
  name                = "eus-vng-clientvpn"
  location            = azurerm_resource_group.transit.location
  resource_group_name = azurerm_resource_group.transit.name

  type     = "Vpn"
  vpn_type = "RouteBased"
  sku      = "VpnGw1"

  ip_configuration {
    name                          = "vnetGatewayConfig"
    public_ip_address_id          = azurerm_public_ip.vpn_gateway.id
    private_ip_address_allocation = "Dynamic"
    subnet_id                     = azurerm_subnet.gateway.id
  }

  vpn_client_configuration {
    address_space        = [local.vpn_client_pool]
    vpn_client_protocols = ["OpenVPN"]

    aad_tenant   = "https://login.microsoftonline.com/${data.azurerm_client_config.current.tenant_id}/"
    aad_audience = azuread_application.vpn.client_id
    aad_issuer   = "https://sts.windows.net/${data.azurerm_client_config.current.tenant_id}/"
  }
}

Provisioning takes 30-45 minutes. Plan accordingly.
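
The gateway also references a subnet and a public IP that aren't shown above. The subnet must be named exactly GatewaySubnet, and newer deployments typically want a Standard-SKU static public IP; a sketch of both, under those assumptions:

resource "azurerm_subnet" "gateway" {
  name                 = "GatewaySubnet"  # Azure requires this exact name
  resource_group_name  = azurerm_resource_group.transit.name
  virtual_network_name = azurerm_virtual_network.transit.name
  address_prefixes     = [local.gateway_subnet]
}

resource "azurerm_public_ip" "vpn_gateway" {
  name                = "pip-vpn-gateway"   # illustrative name
  location            = azurerm_resource_group.transit.location
  resource_group_name = azurerm_resource_group.transit.name
  allocation_method   = "Static"
  sku                 = "Standard"
}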

Handling Multi-Subscription Complexity

Our infrastructure spans many Azure subscriptions. Terraform needs to peer VNets across all of them to the transit VNet.

Provider Aliases for Cross-Subscription Resources

provider "azurerm" {
  features {}
}

provider "azurerm" {
  alias           = "data_prod"
  subscription_id = "<SUBSCRIPTION_ID_1>"
  features {}
}

provider "azurerm" {
  alias           = "infra_prod"
  subscription_id = "<SUBSCRIPTION_ID_2>"
  features {}
}

Each data source and peering resource targeting a different subscription needs the provider alias:

data "azurerm_virtual_network" "prod2" {
  provider            = azurerm.infra_prod
  name                = "crds_prod2-vnet"
  resource_group_name = "crds_prod2"
}

resource "azurerm_virtual_network_peering" "prod2_to_transit" {
  provider                     = azurerm.infra_prod
  name                         = "peer-prod2-to-transit"
  resource_group_name          = data.azurerm_virtual_network.prod2.resource_group_name
  virtual_network_name         = data.azurerm_virtual_network.prod2.name
  remote_virtual_network_id    = azurerm_virtual_network.transit.id
  allow_forwarded_traffic      = true
  use_remote_gateways          = true
  allow_virtual_network_access = true
  depends_on                   = [azurerm_virtual_network_gateway.vpn]
}
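
Note that use_remote_gateways on the spoke side only works if the matching transit-to-spoke peering enables gateway transit. A sketch of that reverse peering (created with the default provider, since the transit VNet lives in the primary subscription):

resource "azurerm_virtual_network_peering" "transit_to_prod2" {
  name                         = "peer-transit-to-prod2"
  resource_group_name          = azurerm_resource_group.transit.name
  virtual_network_name         = azurerm_virtual_network.transit.name
  remote_virtual_network_id    = data.azurerm_virtual_network.prod2.id
  allow_forwarded_traffic      = true
  allow_gateway_transit        = true   # lets the spoke use the VPN gateway
  allow_virtual_network_access = true
  depends_on                   = [azurerm_virtual_network_gateway.vpn]
}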

GitHub Actions: Zero-Credential Deployment

OIDC Configuration

Traditional approaches store Azure service principal secrets in GitHub. We use OpenID Connect instead - GitHub generates short-lived tokens that Azure validates.

Azure Configuration:

APP_NAME="github-actions-oidc"
REPO="org/iac-infrastructure"
SUBSCRIPTION_IDS=("<SUBSCRIPTION_ID_1>" "<SUBSCRIPTION_ID_2>")  # subscriptions the pipeline will manage

# Create app registration
az ad app create --display-name "$APP_NAME"
APP_ID=$(az ad app list --display-name "$APP_NAME" --query "[0].appId" -o tsv)
OBJECT_ID=$(az ad app list --display-name "$APP_NAME" --query "[0].id" -o tsv)

# Create service principal
az ad sp create --id $APP_ID

# Grant Owner on all subscriptions
for SUB in "${SUBSCRIPTION_IDS[@]}"; do
  az role assignment create \
    --assignee $APP_ID \
    --role Owner \
    --scope /subscriptions/$SUB
done

# Configure OIDC trust
az ad app federated-credential create \
  --id $OBJECT_ID \
  --parameters "{
    \"name\": \"github-main\",
    \"issuer\": \"https://token.actions.githubusercontent.com\",
    \"subject\": \"repo:$REPO:ref:refs/heads/main\",
    \"audiences\": [\"api://AzureADTokenExchange\"]
  }"

az ad app federated-credential create \
  --id $OBJECT_ID \
  --parameters "{
    \"name\": \"github-pr\",
    \"issuer\": \"https://token.actions.githubusercontent.com\",
    \"subject\": \"repo:$REPO:pull_request\",
    \"audiences\": [\"api://AzureADTokenExchange\"]
  }"

The subject claim enforces which repo and branch can authenticate. A forged or leaked token doesn't get an attacker far because:

  • GitHub signs the JWT with its private key
  • Azure validates the signature against GitHub’s public key
  • The subject claim must match exactly
  • Tokens expire in 15 minutes
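
If you'd rather keep this bootstrap in Terraform as well, the azuread provider exposes the same federated credential. A sketch (the application resource name is hypothetical, and the argument for the parent app differs between provider versions):

resource "azuread_application_federated_identity_credential" "github_main" {
  application_object_id = azuread_application.github_actions.object_id  # "application_id" on azuread v3+
  display_name          = "github-main"
  audiences             = ["api://AzureADTokenExchange"]
  issuer                = "https://token.actions.githubusercontent.com"
  subject               = "repo:org/iac-infrastructure:ref:refs/heads/main"
}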

Workflow Configuration

name: "Terraform CI/CD"
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  id-token: write
  contents: read
  pull-requests: write

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Azure Login (OIDC)
        uses: azure/login@v1
        with:
          client-id: ${{ vars.AZURE_CLIENT_ID }}
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
      
      - name: Terraform Init
        run: terraform init
      
      - name: Terraform Plan
        run: terraform plan -out=tfplan
      
      - name: Terraform Apply
        if: github.ref == 'refs/heads/main'
        run: terraform apply -auto-approve tfplan

Key points:

  • permissions: id-token: write enables OIDC token generation
  • No secrets in the workflow - everything is variables
  • PRs get plans, main branch gets applies
  • State stored in Azure Storage (separate configuration)

State File Management

We initially used separate state files for PRs (dev.tfstate) and main (prod.tfstate). This caused chaos - PRs couldn’t see what was actually deployed.

Fix: Single state file for all branches:

- name: Create Backend Config
  run: |
    cat > backend.tf << EOF
    terraform {
      backend "azurerm" {
        resource_group_name  = "${{ vars.TERRAFORM_STATE_RG }}"
        storage_account_name = "${{ vars.TERRAFORM_STATE_STORAGE }}"
        container_name      = "tfstate"
        key                 = "${{ github.event.repository.name }}/${{ github.event.repository.name }}.tfstate"
      }
    }
    EOF    

Now PRs plan against actual deployed infrastructure. Much better.
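
The storage account and container behind that backend are a one-time bootstrap created outside this pipeline; a minimal sketch, with placeholder names:

resource "azurerm_resource_group" "tfstate" {
  name     = "rg-terraform-state"   # placeholder
  location = "centralus"
}

resource "azurerm_storage_account" "tfstate" {
  name                     = "orgtfstate001"   # placeholder, must be globally unique
  resource_group_name      = azurerm_resource_group.tfstate.name
  location                 = azurerm_resource_group.tfstate.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_storage_container" "tfstate" {
  name                  = "tfstate"
  storage_account_name  = azurerm_storage_account.tfstate.name  # newer azurerm versions prefer storage_account_id
  container_access_type = "private"
}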

User Experience

Client Setup

  1. Download the Azure VPN Client (free; Microsoft Store on Windows, Mac App Store on macOS)
  2. Admin downloads VPN profile from Azure Portal
  3. Admin distributes single config file to all users
  4. Users import config, click Connect
  5. Azure AD authentication flow (with MFA)
  6. Connected

The config file contains zero secrets - just the gateway address and tenant info. Distribute it freely.

Access Control

Conditional Access policies control who can connect:

  • Require MFA
  • Require compliant device
  • Restrict by group membership
  • Restrict by location
  • Require managed device

All enforced at authentication time. No VPN-specific access control needed.
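
These policies live in Entra ID rather than in the VPN configuration, and they can be managed in Terraform too. A sketch of an MFA-plus-group policy scoped to the VPN application (the group object ID is a placeholder):

resource "azuread_conditional_access_policy" "vpn_mfa" {
  display_name = "VPN - require MFA"
  state        = "enabled"

  conditions {
    client_app_types = ["all"]

    applications {
      included_applications = [azuread_application.vpn.client_id]
    }

    users {
      included_groups = ["<VPN_USERS_GROUP_OBJECT_ID>"]  # placeholder group object ID
    }
  }

  grant_controls {
    operator          = "OR"
    built_in_controls = ["mfa"]
  }
}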

What I Learned

1. Azure VPN Gateway Can’t Route Public IPs

I initially wanted to route all traffic through the VPN, including public internet. Azure VPN Gateway doesn't do this out of the box - by default it only advertises routes for:

  • The transit VNet CIDR
  • Peered VNet CIDRs

For public endpoint access (like SQL databases), traffic goes directly from the user’s internet connection. The VPN gateway’s public IP gets whitelisted on those resources.

This is actually fine. Users don’t need VPN for browsing the web.

2. Split Tunneling Is Sufficient

We don’t force all traffic through the VPN. Only traffic destined for Azure private networks routes through the tunnel. Everything else uses the local internet connection.

Benefits:

  • Better performance for general internet use
  • Lower bandwidth costs
  • Simpler troubleshooting

3. Provider Aliases Are Required for Multi-Subscription

You cannot pass subscription_id directly to azurerm resources or data sources. Provider aliases are the only way to manage resources across multiple subscriptions in a single Terraform configuration.

This is verbose but explicit. The alternative is separate Terraform configurations per subscription, which is worse.
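
The same aliases carry into modules via an explicit providers map, which keeps the per-subscription wiring visible at the call site. A sketch with a hypothetical peering module:

module "prod2_peering" {
  source = "./modules/vnet-peering"   # hypothetical module

  providers = {
    azurerm = azurerm.infra_prod
  }

  spoke_vnet_name = "prod2-vnet"
  transit_vnet_id = azurerm_virtual_network.transit.id
}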

4. Plan for 45-Minute VPN Gateway Deploys

VPN Gateway provisioning is slow. Plan accordingly:

  • Initial creation: 30-45 minutes
  • Configuration changes: 15-30 minutes
  • Don’t iterate rapidly on gateway config

Test everything in a separate subscription first.

Cost Analysis

Monthly costs:

  • VPN Gateway (VpnGw1): ~$140
  • Public IP: ~$3
  • Bandwidth: ~$0.087/GB outbound (first 100GB free)
  • Transit VNet: Free

Total: ~$145/month base + bandwidth

Compare to alternatives:

  • WireGuard on VM: ~$10/month + labor cost + on-call burden
  • Third-party VPN service: $6-18/user/month = $300-900/month for 50 users
  • Azure Virtual WAN + Firewall: ~$1,500+/month

For this use case (a small engineering team, enterprise support requirements), Azure VPN Gateway hits the sweet spot.

Conclusion

I built a production-grade VPN solution with:

  • Zero stored credentials (OIDC everywhere)
  • Azure AD authentication with MFA
  • Complete infrastructure-as-code
  • Self-service access via Conditional Access policies
  • Automatic failover and enterprise SLAs

The GitOps workflow means network changes go through PR review, get tested in CI, and deploy automatically on merge. Adding a new VNet peering is a 5-line code change.

Total implementation time: ~2 days (including all the mistakes documented here).

Would we do it differently? Maybe skip the overlapping IP peering attempts and go straight to the “some things stay on public IPs” design. Otherwise, this architecture is solid.

The code is production-ready. The workflow is reliable. The users are happy. Ship it.