Cut EKS & NAT Gateway Costs in 2026: An Advanced Guide

In this article, we cover advanced strategies to significantly reduce your EKS and NAT Gateway costs in 2026. You will learn how to implement VPC Endpoints to bypass NAT, leverage shared NAT Gateways across multiple EKS clusters, and optimize EKS node provisioning with Karpenter. We also discuss monitoring and production readiness for these cost-saving measures.

Ahmet Çelik


Most teams deploy an EKS cluster with a dedicated VPC and NAT Gateways in each public subnet for outbound internet access. But this default pattern leads to substantial, often overlooked, data processing and hourly charges at scale, particularly when services only need to reach other AWS endpoints.


TL;DR


  • Identify and eliminate unnecessary NAT Gateway traffic by routing internal AWS service communication through VPC Endpoints.

  • Consolidate multiple NAT Gateways into a single, shared NAT Gateway VPC, connecting cluster VPCs through AWS Transit Gateway for significant cost savings.

  • Optimize EKS compute costs and reduce overall resource footprint using Karpenter for intelligent, right-sized node provisioning.

  • Implement robust monitoring and alerting for NAT Gateway data transfer and VPC Endpoint usage to maintain cost efficiency.

  • Strategically review network architecture and service dependencies to avoid common pitfalls in cost optimization efforts.


The Cost Burden of EKS and NAT Gateways in Production


Operating Amazon EKS at scale involves managing compute, storage, and networking costs. While EC2 instances and EBS volumes often capture immediate attention, NAT Gateway charges frequently become a silent budget drain. Each NAT Gateway incurs an hourly charge ($0.045/hour in `us-east-1` as of 2026) and a data processing charge ($0.045/GB). For highly active EKS clusters, especially those with numerous microservices interacting with AWS services like ECR, S3, or DynamoDB, these data processing charges accumulate rapidly.
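To make the magnitude concrete, here is a back-of-the-envelope model, written as Terraform locals for consistency with the examples later in this article; the 10 TiB/month traffic figure is purely an illustrative assumption:

```hcl
# Rough monthly NAT Gateway cost model (us-east-1 prices as quoted above).
# The traffic volume is an assumed example, not a measurement.
locals {
  nat_count        = 3      # one NAT Gateway per AZ
  hours_per_month  = 730
  nat_hourly_usd   = 0.045
  data_gb_month    = 10240  # assumed ~10 TiB/month through the NAT Gateways
  data_usd_per_gb  = 0.045

  nat_monthly_usd  = local.nat_count * local.hours_per_month * local.nat_hourly_usd # ~$98.55
  data_monthly_usd = local.data_gb_month * local.data_usd_per_gb                    # ~$460.80
  total_usd        = local.nat_monthly_usd + local.data_monthly_usd                 # ~$559/month
}
```

Note that in this model the data processing charge dwarfs the hourly charge, which is why redirecting traffic away from the NAT Gateways pays off faster than consolidating them.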


Consider a production EKS environment handling significant traffic. Microservices frequently pull container images from ECR, store and retrieve objects from S3, or send metrics to CloudWatch. If your EKS worker nodes reside in private subnets and these AWS service API calls egress through a NAT Gateway, every gigabyte transferred contributes to your bill. Teams commonly report 10-20% of their overall EKS infrastructure costs attributed to NAT Gateway charges when default networking configurations are in place. This scenario becomes particularly expensive if you operate multiple EKS clusters, each with its own set of NAT Gateways, leading to redundant hourly and data processing expenses.


How to Reduce EKS and NAT Gateway Costs in 2026


Optimizing EKS and NAT Gateway costs requires a multi-pronged approach that addresses both network architecture and compute resource management.


Eliminating NAT Gateway Traffic with VPC Endpoints


Many AWS services offer VPC Endpoints, allowing private connectivity directly from your VPC without traversing the internet or a NAT Gateway. For EKS clusters, routing traffic to services like ECR, S3, SQS, STS, and CloudWatch through VPC Endpoints directly eliminates the associated NAT Gateway data processing charges. This strategy improves security by keeping traffic within the AWS network and often enhances performance by reducing latency. Interface Endpoints (powered by AWS PrivateLink) cover most services; they carry their own hourly and per-GB charges, but these are typically far lower than NAT Gateway data processing for the same traffic. S3 and DynamoDB additionally offer Gateway Endpoints, which are free of charge.


Your EKS worker nodes, running in private subnets, resolve AWS service domain names to the private IP addresses of the VPC Endpoints. This ensures all communication remains within your VPC and AWS network backbone.


# Deploys VPC Interface Endpoints for common AWS services to avoid NAT Gateway egress in 2026
# Ensure your VPC and private subnets are already defined in your Terraform configuration.

resource "aws_security_group" "vpc_endpoint" {
  name        = "vpc-endpoint-sg-2026"
  description = "Security group for VPC Endpoints"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.main.cidr_block] # Allow access from within the VPC
    description = "Allow HTTPS from VPC"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "Allow all outbound traffic" # Can be restricted further
  }
  tags = {
    Name = "vpc-endpoint-sg-2026"
  }
}

resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  security_group_ids  = [aws_security_group.vpc_endpoint.id]
  subnet_ids          = aws_subnet.private.*.id # Use your private subnet IDs
  private_dns_enabled = true
  tags = {
    Name = "eks-ecr-dkr-endpoint-2026"
  }
}

resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.ecr.api"
  vpc_endpoint_type   = "Interface"
  security_group_ids  = [aws_security_group.vpc_endpoint.id]
  subnet_ids          = aws_subnet.private.*.id # Use your private subnet IDs
  private_dns_enabled = true
  tags = {
    Name = "eks-ecr-api-endpoint-2026"
  }
}

resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${data.aws_region.current.name}.s3"
  vpc_endpoint_type = "Gateway" # S3 supports a Gateway endpoint, which is free of charge
  route_table_ids   = aws_route_table.private.*.id # Use route tables associated with your private subnets
  tags = {
    Name = "eks-s3-gateway-endpoint-2026"
  }
}

# Example for STS (Security Token Service)
resource "aws_vpc_endpoint" "sts" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.sts"
  vpc_endpoint_type   = "Interface"
  security_group_ids  = [aws_security_group.vpc_endpoint.id]
  subnet_ids          = aws_subnet.private.*.id
  private_dns_enabled = true
  tags = {
    Name = "eks-sts-endpoint-2026"
  }
}

This configuration ensures that traffic from your EKS pods to ECR, S3, and STS no longer flows through the NAT Gateway. This change drastically reduces data processing charges for these specific services. Remember to include all relevant services that your EKS workloads interact with.
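The same pattern extends to any other Interface-Endpoint-capable service your workloads touch. One way to avoid repeating near-identical resource blocks is a `for_each` loop; the service list below is illustrative, and the block reuses the security group and subnets defined above:

```hcl
# Sketch: declare several Interface Endpoints in one loop.
# Assumes aws_vpc.main, aws_security_group.vpc_endpoint, and
# aws_subnet.private from the preceding example.
locals {
  interface_services = ["sqs", "logs", "monitoring", "ec2"] # adjust to your workloads
}

resource "aws_vpc_endpoint" "interface" {
  for_each            = toset(local.interface_services)
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.${each.key}"
  vpc_endpoint_type   = "Interface"
  security_group_ids  = [aws_security_group.vpc_endpoint.id]
  subnet_ids          = aws_subnet.private.*.id
  private_dns_enabled = true
  tags = {
    Name = "eks-${each.key}-endpoint-2026"
  }
}
```

Because each Interface Endpoint has its own hourly charge per AZ, only add endpoints for services with meaningful traffic volumes; for rarely used services, the NAT Gateway path can actually be cheaper.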


Optimizing AWS Network Costs: Consolidating NAT Gateways


Many organizations operate multiple EKS clusters, often in separate VPCs for isolation or multi-tenant architectures. By default, each EKS VPC typically has its own set of NAT Gateways (one per AZ for high availability). This leads to duplicated hourly charges and potentially redundant data processing fees across clusters. A more cost-effective strategy involves creating a dedicated "shared services" VPC where NAT Gateways are deployed, and then connecting the other EKS VPCs to it through AWS Transit Gateway. (Plain VPC peering cannot be used for this: AWS does not allow instances in one VPC to use a NAT Gateway in a peered VPC.) This centralizes internet egress and reduces the number of NAT Gateways required.


This approach introduces a single point of failure at the shared NAT Gateway level, which you mitigate by deploying NAT Gateways in multiple Availability Zones within the shared VPC. It also adds complexity to routing: EKS VPCs must have route table entries directing `0.0.0.0/0` traffic towards their Transit Gateway attachment, and the Transit Gateway route tables must forward that traffic to the shared services VPC, which then routes it to the shared NAT. Careful planning of network ACLs and security groups is crucial.


# Deploys a highly available NAT Gateway in a dedicated shared services VPC in 2026
# This example creates a shared VPC and one NAT Gateway in AZ1.
# For high availability, you would repeat the EIP, subnet, and NAT Gateway for AZ2, AZ3, etc.

data "aws_region" "current" {}

resource "aws_vpc" "shared_services" {
  cidr_block = "10.100.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = {
    Name = "shared-services-vpc-2026"
  }
}

resource "aws_internet_gateway" "shared_igw" {
  vpc_id = aws_vpc.shared_services.id
  tags = {
    Name = "shared-igw-2026"
  }
}

# Public subnet for NAT Gateway in AZ1
resource "aws_subnet" "shared_public_az1" {
  vpc_id            = aws_vpc.shared_services.id
  cidr_block        = "10.100.1.0/24"
  availability_zone = "${data.aws_region.current.name}a" # Adjust for your region's AZs
  map_public_ip_on_launch = true
  tags = {
    Name = "shared-public-az1-2026"
  }
}

resource "aws_eip" "nat_gateway_eip_az1" {
  domain     = "vpc" # replaces the deprecated "vpc = true" argument in AWS provider v5+
  depends_on = [aws_internet_gateway.shared_igw]
  tags = {
    Name = "shared-nat-eip-az1-2026"
  }
}

resource "aws_nat_gateway" "shared_nat_az1" {
  allocation_id = aws_eip.nat_gateway_eip_az1.id
  subnet_id     = aws_subnet.shared_public_az1.id
  tags = {
    Name = "shared-nat-gateway-az1-2026"
  }
}

resource "aws_route_table" "shared_public" {
  vpc_id = aws_vpc.shared_services.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.shared_igw.id
  }
  tags = {
    Name = "shared-public-rt-2026"
  }
}

resource "aws_route_table_association" "shared_public_az1_assoc" {
  subnet_id      = aws_subnet.shared_public_az1.id
  route_table_id = aws_route_table.shared_public.id
}

# To utilize this:
# 1. Create a Transit Gateway and attach both your EKS VPCs and aws_vpc.shared_services to it.
# 2. Update route tables in your EKS VPCs' private subnets to send 0.0.0.0/0 traffic
#    to the Transit Gateway, and add return routes in the shared VPC for each EKS VPC CIDR.
# 3. Ensure appropriate security group and network ACL rules are in place.

This pattern provides a centralized point of egress, leading to fewer NAT Gateway instances and consolidated data processing. It requires careful routing configuration for each EKS VPC that attaches to the shared services VPC.
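The attachment side can be sketched in Terraform as follows. A Transit Gateway is used because a NAT Gateway in another VPC is unreachable over plain VPC peering; `aws_vpc.eks`, its private subnets, and its route table are assumptions standing in for your existing EKS VPC resources:

```hcl
# Sketch: connect an EKS VPC to the shared egress VPC via Transit Gateway.
resource "aws_ec2_transit_gateway" "shared" {
  description = "Centralized egress for EKS VPCs"
  tags        = { Name = "shared-egress-tgw-2026" }
}

resource "aws_ec2_transit_gateway_vpc_attachment" "shared_services" {
  transit_gateway_id = aws_ec2_transit_gateway.shared.id
  vpc_id             = aws_vpc.shared_services.id
  subnet_ids         = [aws_subnet.shared_public_az1.id] # in practice, use dedicated private TGW subnets
}

resource "aws_ec2_transit_gateway_vpc_attachment" "eks" {
  transit_gateway_id = aws_ec2_transit_gateway.shared.id
  vpc_id             = aws_vpc.eks.id              # assumed EKS VPC
  subnet_ids         = aws_subnet.eks_private.*.id # assumed EKS private subnets
}

# EKS private subnets send internet-bound traffic to the Transit Gateway ...
resource "aws_route" "eks_private_default" {
  route_table_id         = aws_route_table.eks_private.id # assumed EKS private route table
  destination_cidr_block = "0.0.0.0/0"
  transit_gateway_id     = aws_ec2_transit_gateway.shared.id
}

# ... and the shared VPC needs a return route to the EKS VPC CIDR.
resource "aws_route" "shared_return_to_eks" {
  route_table_id         = aws_route_table.shared_public.id
  destination_cidr_block = aws_vpc.eks.cidr_block
  transit_gateway_id     = aws_ec2_transit_gateway.shared.id
}
```

In a full design, the shared-VPC attachment sits in dedicated private subnets whose route table sends `0.0.0.0/0` to the NAT Gateway, and the Transit Gateway's route table carries a static `0.0.0.0/0` route pointing at the shared services attachment.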


EKS Compute Cost Reduction with Karpenter


While not directly reducing NAT Gateway costs, an optimized EKS compute footprint indirectly lessens network traffic and overall infrastructure spend. Karpenter, an open-source, high-performance Kubernetes node provisioner, focuses on rightsizing and optimizing node utilization. Unlike the Cluster Autoscaler, which reacts to pending pods and provisions nodes based on a pre-defined group, Karpenter watches for unschedulable pods and directly provisions optimal EC2 instances for them. It considers factors like pod requirements, available Spot instance capacity, instance types, and pricing. This ensures your EKS cluster runs on the most cost-effective and appropriate compute resources.


By reducing the number of underutilized nodes and leveraging cheaper Spot instances, Karpenter lowers your overall EC2 bill. Fewer, better-packed nodes also mean fewer per-node agents pulling container images and shipping logs and metrics, which modestly reduces the aggregate volume flowing through your NAT Gateways.


# A Karpenter NodePool for an EKS cluster, leveraging Spot instances in 2026
# Karpenter v1 replaced the older Provisioner API with NodePool; kubelet and
# AMI settings now live in the referenced EC2NodeClass.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      # Provision nodes only in specific Availability Zones
      requirements:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b"] # Replace with your AZs
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"] # Compute, general purpose, and memory optimized
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"] # Prefer newer generations for better price/performance
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"] # Prioritize Spot instances for cost savings

      # Reference to the EC2NodeClass that defines AMI, instance profile,
      # subnet/security group selectors, and kubelet settings such as maxPods.
      # This assumes an EC2NodeClass named 'default' is configured elsewhere.
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default

      # Expire nodes after 7 days to encourage newer instance types/AMIs
      expireAfter: 168h

  # Consolidation removes empty or underutilized nodes and replaces them
  # with fewer, cheaper ones
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m

Karpenter works by directly communicating with the EC2 API to launch instances. For existing EKS clusters, carefully plan the migration from Cluster Autoscaler to Karpenter. Karpenter typically provides better consolidation and faster scaling than Cluster Autoscaler, especially for heterogeneous workloads. Reported savings vary by workload, but compute cost reductions of up to 70% have been cited for burstable workloads that combine Spot instances with aggressive bin-packing.
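If the cluster is already managed with Terraform, one way to install Karpenter itself is through the `helm` provider, sketched below; the namespace, the IRSA role name, and the unpinned chart version are assumptions to adapt to your setup:

```hcl
# Sketch: install the Karpenter controller from its public OCI chart.
# aws_eks_cluster.main and aws_iam_role.karpenter_controller are assumed
# to be defined elsewhere in your configuration.
resource "helm_release" "karpenter" {
  name             = "karpenter"
  namespace        = "karpenter"
  create_namespace = true
  repository       = "oci://public.ecr.aws/karpenter"
  chart            = "karpenter"
  # version        = "..." # pin a chart version you have validated

  set {
    name  = "settings.clusterName"
    value = aws_eks_cluster.main.name
  }

  # IRSA role for the controller (assumed to exist)
  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = aws_iam_role.karpenter_controller.arn
  }
}
```

The NodePool and EC2NodeClass objects are then applied as Kubernetes manifests once the controller is running.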


Step-by-Step Implementation: Deploying VPC Endpoints


Implementing VPC Endpoints is one of the most direct methods to reduce NAT Gateway costs. Here's a practical guide using the `terraform` code provided earlier.


Step 1: Identify Your Current Outbound Traffic Patterns


Before deploying VPC Endpoints, understand which AWS services your EKS pods are frequently interacting with.

  • Method A: CloudWatch Metrics: Monitor the `BytesOutToDestination` metric (in the `AWS/NATGateway` namespace) for your existing NAT Gateways. Look for spikes or consistent high usage.

  • Method B: VPC Flow Logs: Analyze VPC Flow Logs (e.g., in CloudWatch Logs Insights or S3) to identify destination IP addresses and ports that correspond to AWS service endpoints. Filter by your EKS worker node private IP ranges.

  • Method C: EKS Pod Activity: Inspect application logs or use tools like `kubectl top pod` and `kubectl exec` to see what external services your applications are calling.


Example CloudWatch Logs Insights query to find top external destinations from EKS subnets:

fields @timestamp, srcAddr, dstAddr, dstPort, bytes
| filter srcAddr like /^10\.0\./ # Replace with your EKS private subnet prefix
| filter dstPort = 443           # AWS service APIs are reached over HTTPS
| stats sum(bytes) as total_bytes by dstAddr
| sort total_bytes desc
| limit 20

This query surfaces the highest-volume HTTPS destinations leaving your EKS subnets. Checking the top `dstAddr` values against the published AWS IP ranges tells you which services are the best candidates for VPC Endpoints.


Step 2: Deploy VPC Endpoints Using Terraform


Ensure you have a `main.tf` file (or similar) with your VPC and private subnet definitions. Add the `aws_vpc_endpoint` resources from the "Eliminating NAT Gateway Traffic" section.


# Initialize Terraform in your project directory
$ terraform init

# Review the planned changes before applying
$ terraform plan

# Apply the Terraform configuration to deploy the VPC Endpoints
$ terraform apply --auto-approve


Expected Output (Terraform Apply):

...
aws_security_group.vpc_endpoint: Creating...
aws_security_group.vpc_endpoint: Creation complete after 1s [id=sg-0abcdef1234567890]
aws_vpc_endpoint.ecr_dkr: Creating...
aws_vpc_endpoint.ecr_api: Creating...
aws_vpc_endpoint.s3: Creating...
aws_vpc_endpoint.sts: Creating...
aws_vpc_endpoint.ecr_dkr: Creation complete after 1m15s [id=vpce-0123456789abcdef0]
aws_vpc_endpoint.ecr_api: Creation complete after 1m10s [id=vpce-0abcdef1234567890]
aws_vpc_endpoint.sts: Creation complete after 1m12s [id=vpce-0fedcba9876543210]
aws_vpc_endpoint.s3: Creation complete after 30s [id=vpce-01a2b3c4d5e6f7a8b]

Apply complete! Resources: 5 added, 0 changed, 0 destroyed.


Step 3: Verify Traffic Redirection


After deploying the endpoints, confirm that your EKS traffic is now using them instead of the NAT Gateway.


  1. Check Private DNS Resolution:

* Find an active pod in your EKS cluster:

```bash
$ kubectl get pods -n <namespace>
```

* Exec into the pod and try resolving an AWS service endpoint:

```bash
$ kubectl exec -it <pod-name> -n <namespace> -- nslookup api.ecr.us-east-1.amazonaws.com
```

Expected Output: You should see resolution to a private IP address (e.g., `10.0.x.x`), indicating the VPC Endpoint's private DNS. Without the endpoint, the name would resolve to a public AWS IP.


  2. Monitor NAT Gateway Metrics:

* Go to AWS CloudWatch console.

* Navigate to Metrics > All metrics > NATGateway.

* Select `BytesOutToDestination` and `BytesInFromSource` for your NAT Gateways (the per-gateway traffic metrics in the `AWS/NATGateway` namespace).

* Observe a significant drop in traffic for the services you configured with VPC Endpoints. This reduction confirms the traffic is no longer flowing through the NAT Gateway.


Common mistake: Incorrect security group rules on the VPC Endpoints or the EKS worker nodes. The endpoint security group must allow inbound traffic on port 443 from the EKS private subnets or the EKS worker node security group. Similarly, the EKS worker node security group must allow outbound 443 traffic to the VPC Endpoint's security group. Without these, connectivity will fail, and traffic may still attempt to egress via the NAT Gateway or simply fail.
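As a tighter alternative to the CIDR-wide ingress rule shown in the earlier endpoint example, the endpoint security group can reference the EKS cluster security group directly. A sketch, assuming the cluster resource is named `main`:

```hcl
# Sketch: allow HTTPS to the VPC Endpoints only from the EKS-managed
# cluster security group (shared by managed node groups), instead of
# from the whole VPC CIDR.
resource "aws_security_group_rule" "endpoint_from_nodes" {
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.vpc_endpoint.id
  source_security_group_id = aws_eks_cluster.main.vpc_config[0].cluster_security_group_id
}
```

If you use self-managed node groups with their own security groups, add a rule per node security group instead.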


Production Readiness


Deploying cost optimization measures requires ensuring stability, observability, and security.


Monitoring and Alerting


  • NAT Gateway Data Processed: Monitor `BytesOutToDestination` and `BytesInFromSource` in CloudWatch (namespace `AWS/NATGateway`). Set up alarms for unexpected spikes after implementing VPC Endpoints or for abnormally high traffic on shared NAT Gateways.

  • VPC Endpoint Usage: While not directly exposed as "usage" metrics, monitor network interface metrics (`NetworkIn`, `NetworkOut`, `PacketsIn`, `PacketsOut`) for the ENIs created by Interface Endpoints. This helps confirm traffic is flowing through them.

  • Karpenter Node Provisioning: Monitor Karpenter's logs (in CloudWatch Logs or your chosen logging solution) for provisioning failures, pod unschedulability, or unexpected node terminations. Use `kubectl get events` for Karpenter-related events.

  • Cost Explorer: Regularly review your AWS Cost Explorer to track actual spend. Create filters for NAT Gateway and EC2 costs to directly attribute savings.
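The NAT Gateway alarm from the first bullet above can be sketched in Terraform as follows; the threshold and the SNS topic are assumptions to tune against your own baseline:

```hcl
# Sketch: alarm when a NAT Gateway processes unusually large outbound
# volume in an hour. aws_sns_topic.cost_alerts is an assumed topic.
resource "aws_cloudwatch_metric_alarm" "nat_bytes_out" {
  alarm_name          = "nat-bytes-out-spike-2026"
  namespace           = "AWS/NATGateway"
  metric_name         = "BytesOutToDestination"
  dimensions          = { NatGatewayId = aws_nat_gateway.shared_nat_az1.id }
  statistic           = "Sum"
  period              = 3600
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 50 * 1024 * 1024 * 1024 # ~50 GiB/hour; adjust to your baseline
  alarm_description   = "NAT Gateway outbound volume spike - check for traffic bypassing VPC Endpoints"
  alarm_actions       = [aws_sns_topic.cost_alerts.arn]
}
```

An hourly `Sum` with a volume threshold tends to catch regressions (e.g., a new service pulling large images through the NAT) faster than a daily cost report does.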


Security and Edge Cases


  • VPC Endpoint Policies: For Interface Endpoints, attach an endpoint policy to control which IAM principals and resources can access the AWS service through the endpoint. Restrict access to only your EKS cluster's IAM roles. For S3 Gateway Endpoints, policy statements can filter access based on source VPC, source IP, or specific S3 buckets.

  • Shared NAT Gateway Blast Radius: In a shared NAT Gateway setup, a misconfigured firewall rule or an application generating excessive traffic could impact every EKS cluster routing through it. Implement robust network ACLs (NACLs) and security groups within the shared services VPC and EKS VPCs to segment traffic. Consider using a network firewall appliance in the shared VPC for advanced traffic inspection and egress filtering.

  • Private DNS Resolution: Ensure that DNS resolution within your VPC (via Route 53 Resolver or EC2 DNS) correctly directs traffic to VPC Endpoints. If `private_dns_enabled` is true on Interface Endpoints, this is typically handled automatically, but confirm it in case of custom DNS configurations.

  • Service-Linked Roles: Some AWS services require specific permissions or configurations (e.g., service-linked roles) to interact correctly with EKS or other services. Verify these remain intact after network changes.

  • IPv6: If your EKS cluster uses IPv6, egress follows a separate path: IPv6 traffic leaves through an egress-only internet gateway rather than a NAT Gateway, and it carries no hourly or data processing charge. Confirm which address family your traffic actually uses before attributing costs, and verify your VPC Endpoints and security rules also cover IPv6 where workloads rely on it.
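The endpoint policy mentioned in the first bullet above might look like the following sketch; the account ID, role name, and allowed action are placeholders to replace with your cluster's actual IAM roles:

```hcl
# Sketch: restrict the STS Interface Endpoint so only a specific EKS
# node role can call STS through it. The ARN and action are examples.
resource "aws_vpc_endpoint_policy" "sts" {
  vpc_endpoint_id = aws_vpc_endpoint.sts.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "AllowEksNodeRoleOnly"
      Effect    = "Allow"
      Principal = { AWS = "arn:aws:iam::123456789012:role/eks-node-role" }
      Action    = "sts:*"
      Resource  = "*"
    }]
  })
}
```

Start permissive and tighten iteratively: an over-restrictive endpoint policy fails in the same confusing way as a missing security group rule, with requests timing out rather than being cleanly denied.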


Summary & Key Takeaways


The default EKS networking configuration, while functional, often leads to significant, unoptimized costs for NAT Gateway data processing and hourly charges. Proactive architectural adjustments are essential for long-term budget health.


  • Implement VPC Endpoints broadly: Prioritize AWS services like ECR, S3, STS, SQS, and CloudWatch that your EKS workloads frequently interact with. This directly eliminates data processing charges for traffic staying within AWS.

  • Consolidate NAT Gateways: For multi-EKS cluster environments, centralize internet egress through a shared NAT Gateway VPC. Carefully plan Transit Gateway attachments, routing, and security policies to balance cost savings with operational complexity.

  • Optimize EKS Compute with Karpenter: Leverage Karpenter to dynamically provision the most appropriate and cost-effective EC2 instances for your EKS pods. This reduces overall compute costs, which indirectly contributes to lower network traffic.

  • Monitor and Iterate: Continuously monitor NAT Gateway data transfer, VPC Endpoint usage, and your AWS bill. Treat cost optimization as an ongoing process, adapting your strategies as your EKS workloads evolve.

  • Avoid Unchecked Growth: Do not allow new EKS clusters or services to default to unoptimized network patterns. Establish architectural guidelines that mandate VPC Endpoint usage and shared network services where appropriate from the outset.

WRITTEN BY

Ahmet Çelik

Former AWS Solutions Architect, 8 years in cloud and infrastructure. Computer Engineering graduate, Bilkent University. Lead writer for AWS, Terraform and Kubernetes content.
