Ensure zone resilient outbound connectivity with NAT gateway

29th September 2022 Anthony Mashford

Our customers, across all industries, have a critical need for highly available and resilient cloud frameworks to ensure business continuity and to adapt to ever-growing workloads. One way that customers can achieve resilient and reliable outbound connectivity in Microsoft Azure is by setting up their deployments across availability zones in a region.

When customers need to connect outbound to the internet from their Azure infrastructures, Network Address Translation (NAT) gateway is the best way to do so. NAT gateway is a zonal resource that is attached to subnets within a single virtual network, which means that it can be deployed to an individual zone to provide outbound connectivity. Subnets and virtual networks, on the other hand, are regional constructs that are not restricted to individual zones. A subnet can contain virtual machine instances or scale sets that span multiple availability zones.

Even without being able to traverse multiple availability zones, NAT gateway still provides a highly resilient and reliable way to connect outbound to the internet. This is because it does not rely on any single compute instance like a virtual machine. Instead, NAT gateway leverages software-defined networking to operate as a fully managed and distributed service with built-in redundancy. This built-in redundancy means that customers are unlikely to experience individual NAT gateway resource outages or downtime in their Azure infrastructures.

To ensure that you have the optimal outbound configuration to meet your availability and security needs while also safeguarding against zonal outages, let’s look at how to create zone resilient setups in Azure with NAT gateway.

Zone resilient outbound connectivity scenarios with NAT gateway

Customer setup

Let's say you are a retailer who is preparing for an upcoming Black Friday event. You anticipate that traffic to your retail website will increase significantly on the day of the sale. You decide to deploy a virtual machine scale set (VMSS) so that your compute resources can automatically scale out to meet the increased traffic demands. Scalability is not your only requirement for this event; you also need resiliency and security. To safeguard against potential zonal outages that could impact traffic flow, you decide to deploy the VMSS across multiple availability zones. In addition, you plan to use NAT gateway to handle all outbound traffic flow in a scalable, secure, and reliable manner.

How should you set up your NAT gateway with your VMSS across multiple availability zones? Let’s take a look at a few different configurations along with which setups will and won’t work.

Scenario 1: Set up a single zonal NAT gateway with your zone-spanning VMSS

First, you decide to deploy a single NAT gateway resource to availability zone 1 and your VMSS across all three availability zones within the same subnet. You then attach your NAT gateway to this single subnet and assign it a /28 public IP prefix, which provides a contiguous set of 16 public IP addresses for connecting outbound. Does this setup safeguard you against potential zone outages? No.
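The prefix arithmetic can be checked with a short sketch using Python's standard `ipaddress` module. The prefix `203.0.113.0/28` is an assumption taken from the TEST-NET-3 documentation range; the actual addresses in a public IP prefix are assigned by Azure.

```python
import ipaddress

# Hypothetical /28 prefix from the TEST-NET-3 documentation range;
# a real Azure public IP prefix would contain Azure-assigned addresses.
prefix = ipaddress.ip_network("203.0.113.0/28")

# A /28 leaves 32 - 28 = 4 free bits, so 2**4 = 16 contiguous addresses.
address_count = prefix.num_addresses
first_address = prefix[0]
last_address = prefix[-1]

print(address_count)                      # 16
print(first_address, "-", last_address)   # 203.0.113.0 - 203.0.113.15
```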

Figure 1: A single zonal NAT gateway configured to a zone-spanning set of virtual machines does not provide optimal zone resiliency. NAT gateway is deployed out of zone 1 and configured to a subnet that contains a VMSS that spans across all three availability zones of the Azure region. If availability zone 1 goes down, outbound connectivity across all three zones will also go down.

Here’s why:

- If the zone that goes down is also the zone in which the NAT gateway has been deployed, then all outgoing traffic from virtual machines across all zones will be blocked.
- If the zone that goes down is different from the zone in which the NAT gateway has been deployed, then outgoing traffic from the other zones will continue to flow and only virtual machines in the zone that has gone down will be impacted.
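The two cases above can be expressed as a small toy model. This is only a sketch of the reasoning, not a call into any Azure API; the `VM` class, the `vms_with_outbound` function, and the VM names are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    """Hypothetical stand-in for one VMSS instance."""
    name: str
    zone: int


def vms_with_outbound(vms, nat_zone, failed_zone):
    """Return the names of VMs that keep outbound connectivity
    when failed_zone goes down and a single zonal NAT gateway
    lives in nat_zone (the Scenario 1 layout)."""
    if failed_zone == nat_zone:
        # The only NAT gateway is gone, so every VM loses outbound.
        return []
    # The NAT gateway survives; only VMs in the failed zone are lost.
    return [vm.name for vm in vms if vm.zone != failed_zone]


vms = [VM("vm-z1", 1), VM("vm-z2", 2), VM("vm-z3", 3)]
for failed in (1, 2, 3):
    print(f"zone {failed} down ->", vms_with_outbound(vms, nat_zone=1, failed_zone=failed))
```

With the NAT gateway in zone 1, a zone 1 outage leaves no VM with outbound connectivity, while an outage in zone 2 or zone 3 only removes the VMs in that zone.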

Scenario 2: Attach multiple NAT gateways to a single subnet

Since the previous configuration will not provide the highest degree of resiliency, you decide instead to deploy three NAT gateway resources, one in each availability zone, and attach them all to the subnet that contains the VMSS. Will this setup work? Unfortunately, no.

Figure 2: Multiple NAT gateways cannot be attached to a single subnet by design.

Here’s why:

A subnet cannot have more than one NAT gateway attached to it. When NAT gateway is attached to a subnet, it becomes the default next hop for the subnet's internet-bound traffic. Consequently, virtual machines in the subnet will source NAT to the public IP address(es) of the NAT gateway before egressing to the internet. If more than one NAT gateway were attached to the same subnet, the subnet would not know which NAT gateway to use to send outbound traffic.
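One way to picture this constraint is a subnet that holds a single next-hop slot for NAT. The sketch below is a toy model under that assumption; the `Subnet` class and the resource names are hypothetical, and how the Azure control plane actually reports such a request is not covered here.

```python
class Subnet:
    """Toy model: a subnet has exactly one slot for a NAT gateway."""

    def __init__(self, name):
        self.name = name
        self.nat_gateway = None  # single default next hop for outbound

    def attach_nat_gateway(self, nat_name):
        # A second gateway would make the outbound next hop ambiguous,
        # so this model rejects it.
        if self.nat_gateway is not None:
            raise ValueError(
                f"{self.name} already uses {self.nat_gateway}; "
                f"cannot also attach {nat_name}"
            )
        self.nat_gateway = nat_name


subnet = Subnet("subnet-vmss")
subnet.attach_nat_gateway("natgw-zone1")

try:
    subnet.attach_nat_gateway("natgw-zone2")
    second_attach_rejected = False
except ValueError as err:
    second_attach_rejected = True
    print(err)
```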

Scenario 3: Deploy zonal NAT gateways with zonally configured VMSS for optimal zone resiliency

What, then, is the optimal solution for creating a secure, resilient, and scalable outbound setup? The solution is to deploy a VMSS in each availability zone, place each VMSS in its own subnet, and then attach each subnet to a zonal NAT gateway resource in the same zone.

Figure 3: Zonal NAT gateways configured to individual subnets for zonal VMSS provide optimal zone resiliency for outbound connectivity.

Deploying zonal NAT gateways to match the zones of the VMSS provides the greatest protection against zonal outages. Should one of the availability zones go down, the other two zones will still be able to send outbound traffic through their own zonal NAT gateway resources.
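The zonal layout can be sketched as a simple plan with one subnet, one VMSS, and one NAT gateway per zone. This is an illustrative model only; the function names and the `subnet-z1`, `vmss-z1`, `natgw-z1` naming scheme are assumptions, not Azure resource names.

```python
def plan_zonal_layout(zones):
    """Build one (subnet, VMSS, NAT gateway) stack per availability zone."""
    return [
        {
            "zone": zone,
            "subnet": f"subnet-z{zone}",
            "vmss": f"vmss-z{zone}",
            "nat_gateway": f"natgw-z{zone}",
        }
        for zone in zones
    ]


def zones_with_outbound(layout, failed_zone):
    """Each zone egresses through its own NAT gateway, so a zone
    keeps outbound connectivity unless it is the zone that failed."""
    return [entry["zone"] for entry in layout if entry["zone"] != failed_zone]


layout = plan_zonal_layout([1, 2, 3])
for failed in (1, 2, 3):
    print(f"zone {failed} down -> zones still egressing:", zones_with_outbound(layout, failed))
```

Unlike the single-gateway layout in Scenario 1, no single zone failure removes outbound connectivity for the surviving zones, because no zone depends on a NAT gateway outside itself.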

Summary of zone resilient scenarios with NAT gateway

Scenario	Description	Rating
Scenario 1	Set up a single zonal NAT gateway with your VMSS that spans across multiple availability zones but confined to a single subnet.	Not recommended: if the zone that NAT gateway is located in goes down then outbound connectivity for all VMs in the scale set goes down.
Scenario 2	Attach multiple zonal NAT gateways to a subnet that contains zone-spanning virtual machines.	Not possible: multiple NAT gateways cannot be associated to a single subnet by design.
Scenario 3	Deploy zonal NAT gateways to separate subnets with zonally configured VMSS.	Optimal configuration to provide zone resiliency and protect against outages.

FAQ on NAT gateway and availability zones

What does it mean to have a "no zone" NAT gateway?
- "No zone" is the default availability zone selected when you deploy a NAT gateway resource. No zone means that Azure places the NAT gateway resource into a zone for you, but you do not have visibility into which specific zone it is placed in. It is recommended that you deploy your NAT gateway to a specific zone so that you know in which zone your NAT gateway resource resides. Once a NAT gateway is deployed, its availability zone designation cannot be changed.
If I have Load Balancer or instance-level public IPs (IL PIPs) on virtual machines alongside a NAT gateway in the same virtual network, and the NAT gateway or an availability zone goes down, will Azure fall back to using Load Balancer or IL PIPs for all outbound traffic?
- Azure will not fail over to using Load Balancer or IL PIPs for handling outbound traffic when NAT gateway is attached to a subnet. After NAT gateway has been attached to a subnet, the user-defined route (UDR) at the source virtual machine will always direct packets initiated by the virtual machine to the NAT gateway, even if the NAT gateway goes down.
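The no-fallback behavior described in the last answer can be summarized in a few lines. This is a simplified sketch that assumes the virtual machine also has a Load Balancer or IL PIP available; the `outbound_path` function and its return labels are hypothetical.

```python
def outbound_path(nat_attached, nat_available):
    """Pick the outbound path for a VM that also has a Load Balancer
    or IL PIP (assumed), following the no-fallback rule above."""
    if nat_attached:
        # Traffic is always directed to the NAT gateway; if it is
        # down, there is no failover to the other outbound methods.
        return "nat-gateway" if nat_available else "no-outbound"
    # Without a NAT gateway on the subnet, the other methods are used.
    return "load-balancer-or-il-pip"


print(outbound_path(nat_attached=True, nat_available=True))
print(outbound_path(nat_attached=True, nat_available=False))
print(outbound_path(nat_attached=False, nat_available=False))
```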

Source: Azure Blog Feed
