TL;DR
- VPC Peering offers direct, low-latency connectivity between two VPCs, ideal for simple, point-to-point communication but rapidly becomes unmanageable as the number of VPCs grows.
- AWS Transit Gateway establishes a centralized hub for routing traffic between tens or hundreds of VPCs, multiple AWS accounts, and on-premises networks, simplifying network architecture and management.
- AWS PrivateLink provides secure, private consumption of services across VPCs and accounts without requiring peering connections, route tables, or IP address overlap management.
- Select the appropriate solution based on your network's scale, security isolation requirements, and the desired level of operational overhead for inter-VPC communication.
- Misapplying these AWS networking constructs can lead to significant management complexity, security vulnerabilities, and unforeseen operational costs as your infrastructure evolves.
Most engineering teams initially address inter-VPC communication needs with VPC Peering. This approach often seems straightforward for connecting a few isolated environments, perhaps a shared services VPC to an application VPC. However, relying solely on VPC Peering quickly leads to an unmanageable mesh of connections, making network topology opaque, difficult to troubleshoot, and a significant security burden as an organization scales its AWS footprint.
The Escalating Complexity of Cloud Networking
As organizations mature in AWS, they typically adopt a multi-account, multi-VPC strategy. This often involves dedicated VPCs for different environments (development, staging, production), various application teams, and shared services (like logging, monitoring, or identity). The challenge then shifts from simply deploying resources to enabling secure and efficient communication between these isolated network segments.
Consider a scenario where an enterprise has 10 distinct VPCs across different accounts, each requiring communication with several others. If every VPC needs to talk to every other VPC directly, you end up with N * (N-1) / 2 peering connections. For 10 VPCs, this translates to 45 separate peering connections, each requiring manual route table updates and security group configuration in both directions. For 20 VPCs, this jumps to 190 connections. This "spaghetti network" becomes a significant operational burden, is prone to misconfiguration, and is incredibly difficult to audit for security compliance. Teams that reach this point commonly report sharply increased network troubleshooting time and a corresponding rise in security incidents driven by complex, distributed access controls.
This escalating complexity drives the need for a more architectural approach to inter-VPC connectivity. Engineers require solutions that scale gracefully, enforce consistent security policies, and reduce the operational overhead inherent in managing distributed network topologies.
How It Works: Deconstructing Inter-VPC Connectivity
Understanding the fundamental mechanisms of VPC Peering, Transit Gateway, and PrivateLink is crucial for making informed architectural decisions. Each serves a distinct purpose and comes with its own set of trade-offs.
VPC Peering: The Direct Link
VPC Peering creates a direct network connection between two VPCs. This connection allows instances in either VPC to communicate with each other as if they are within the same network, using private IP addresses. Crucially, VPC Peering is non-transitive. If VPC A is peered with VPC B, and VPC B is peered with VPC C, VPC A cannot directly communicate with VPC C through VPC B. This fundamental limitation drives much of its scaling challenge.
Benefits:
Simplicity: For connecting two VPCs, setup is straightforward.
Low Latency: Traffic flows directly between the peered VPCs.
Cost-Effective: Primarily incurs data transfer costs across Availability Zones or regions, with no hourly connection charges.
Drawbacks:
Non-Transitive: No hub-and-spoke or mesh connectivity without creating a direct connection between every pair of VPCs.
Scaling Nightmare: As discussed, the N(N-1)/2 problem makes it impractical for more than a handful of VPCs.
IP Overlap: Peering connections cannot be established between VPCs with overlapping CIDR blocks, a common issue in large organizations.
Management Overhead: Requires managing route tables and security groups in every peered VPC.
Let's illustrate setting up a VPC peering connection using Terraform for two hypothetical VPCs: a `provider-vpc` and a `consumer-vpc`.
```hcl
# Define the provider VPC
resource "aws_vpc" "provider_vpc" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "backendstack-provider-vpc-2026"
  }
}

# Define a subnet for the provider VPC
resource "aws_subnet" "provider_subnet" {
  vpc_id            = aws_vpc.provider_vpc.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"

  tags = {
    Name = "backendstack-provider-subnet-2026"
  }
}

# Define the consumer VPC
resource "aws_vpc" "consumer_vpc" {
  cidr_block           = "10.1.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "backendstack-consumer-vpc-2026"
  }
}

# Define a subnet for the consumer VPC
resource "aws_subnet" "consumer_subnet" {
  vpc_id            = aws_vpc.consumer_vpc.id
  cidr_block        = "10.1.1.0/24"
  availability_zone = "us-east-1a"

  tags = {
    Name = "backendstack-consumer-subnet-2026"
  }
}

# Request a VPC peering connection from provider to consumer
resource "aws_vpc_peering_connection" "example_peering" {
  peer_vpc_id = aws_vpc.consumer_vpc.id
  vpc_id      = aws_vpc.provider_vpc.id
  auto_accept = false # Set to true if both VPCs are in the same account and permissions allow

  tags = {
    Name = "backendstack-provider-consumer-peering-2026"
  }
}

# Accept the VPC peering connection in the consumer account (or VPC, if in the same account)
resource "aws_vpc_peering_connection_accepter" "example_peering_accepter" {
  vpc_peering_connection_id = aws_vpc_peering_connection.example_peering.id
  auto_accept               = true # Automatically accept if permissions are granted

  tags = {
    Name = "backendstack-consumer-provider-peering-accepter-2026"
  }
}

# Look up the route table associated with the provider subnet
data "aws_route_table" "provider_route_table" {
  subnet_id = aws_subnet.provider_subnet.id
}

# Route traffic destined for the consumer VPC's CIDR block through the peering connection
resource "aws_route" "provider_vpc_to_consumer" {
  route_table_id            = data.aws_route_table.provider_route_table.id
  destination_cidr_block    = aws_vpc.consumer_vpc.cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection.example_peering.id
}

# Look up the route table associated with the consumer subnet
data "aws_route_table" "consumer_route_table" {
  subnet_id = aws_subnet.consumer_subnet.id
}

# Route traffic destined for the provider VPC's CIDR block through the peering connection
resource "aws_route" "consumer_vpc_to_provider" {
  route_table_id            = data.aws_route_table.consumer_route_table.id
  destination_cidr_block    = aws_vpc.provider_vpc.cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection.example_peering.id
}
```
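In practice, the requester and accepter often live in different AWS accounts. The sketch below shows the hedged cross-account variant: the provider alias, CLI profile name, peer account ID, and peer VPC ID are all placeholder assumptions, not values from the example above.

```hcl
# Hypothetical cross-account variant: the accepter runs under a second
# provider configuration scoped to the consumer account.
provider "aws" {
  alias   = "consumer"
  region  = "us-east-1"
  profile = "consumer-account" # assumed CLI profile for the consumer account
}

# Request peering toward a VPC owned by another account
resource "aws_vpc_peering_connection" "cross_account_peering" {
  vpc_id        = aws_vpc.provider_vpc.id
  peer_vpc_id   = "vpc-0123456789abcdef0" # placeholder consumer VPC ID
  peer_owner_id = "111122223333"          # placeholder consumer account ID
  auto_accept   = false                   # cross-account peering cannot auto-accept
}

# Accept the request from within the consumer account
resource "aws_vpc_peering_connection_accepter" "cross_account_accepter" {
  provider                  = aws.consumer
  vpc_peering_connection_id = aws_vpc_peering_connection.cross_account_peering.id
  auto_accept               = true
}
```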
Transit Gateway: The Centralized Hub
AWS Transit Gateway (TGW) acts as a highly scalable network transit hub that connects VPCs and on-premises networks through a central gateway. It simplifies your network architecture by eliminating complex peering relationships and providing a single point of control for routing. TGW supports transitive routing, meaning VPCs attached to the TGW can communicate with each other, and with any other network connected to the TGW (e.g., Direct Connect, VPN).
Benefits:
Scalability: Connects thousands of VPCs and on-premises networks without the N(N-1)/2 issue.
Simplified Routing: Centralized route tables manage traffic flow across all attached networks.
Transitive Routing: Enables any attached network to communicate with any other attached network.
Cross-Account Support: Easily shares a TGW across multiple AWS accounts, fostering a true hub-and-spoke model.
Inter-Region Peering: Allows TGWs in different regions to connect, extending the global network.
Multicast Support: Unique capability among AWS networking services for certain applications.
Drawbacks:
Cost: Higher hourly charges for the TGW itself, plus data processing charges. Can be more expensive than peering for very small, simple setups.
Centralized Point: While highly available, misconfigurations on the TGW can impact a wide range of connected networks.
Traffic Inspection: Requires additional services (like Gateway Load Balancer or AWS Network Firewall) for centralized traffic inspection between attached VPCs.
Here's how to set up a Transit Gateway connection between our two VPCs using Terraform, replacing the peering setup.
```hcl
# Define the provider VPC (same as before)
resource "aws_vpc" "provider_vpc" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "backendstack-provider-vpc-2026"
  }
}

# Define a subnet for the provider VPC (same as before)
resource "aws_subnet" "provider_subnet" {
  vpc_id            = aws_vpc.provider_vpc.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"

  tags = {
    Name = "backendstack-provider-subnet-2026"
  }
}

# Define the consumer VPC (same as before)
resource "aws_vpc" "consumer_vpc" {
  cidr_block           = "10.1.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "backendstack-consumer-vpc-2026"
  }
}

# Define a subnet for the consumer VPC (same as before)
resource "aws_subnet" "consumer_subnet" {
  vpc_id            = aws_vpc.consumer_vpc.id
  cidr_block        = "10.1.1.0/24"
  availability_zone = "us-east-1a"

  tags = {
    Name = "backendstack-consumer-subnet-2026"
  }
}

# Create a Transit Gateway
resource "aws_ec2_transit_gateway" "example_tgw" {
  description                     = "BackendStack TGW 2026"
  amazon_side_asn                 = 64512 # Example private ASN
  auto_accept_shared_attachments  = "enable"
  default_route_table_association = "enable"
  default_route_table_propagation = "enable"

  tags = {
    Name = "backendstack-example-tgw-2026"
  }
}

# Attach the provider VPC to the Transit Gateway
resource "aws_ec2_transit_gateway_vpc_attachment" "provider_vpc_attachment" {
  vpc_id                                          = aws_vpc.provider_vpc.id
  transit_gateway_id                              = aws_ec2_transit_gateway.example_tgw.id
  subnet_ids                                      = [aws_subnet.provider_subnet.id]
  transit_gateway_default_route_table_association = false # We'll manage routes explicitly
  transit_gateway_default_route_table_propagation = false # We'll manage routes explicitly

  tags = {
    Name = "backendstack-provider-vpc-tgw-attachment-2026"
  }
}

# Attach the consumer VPC to the Transit Gateway
resource "aws_ec2_transit_gateway_vpc_attachment" "consumer_vpc_attachment" {
  vpc_id                                          = aws_vpc.consumer_vpc.id
  transit_gateway_id                              = aws_ec2_transit_gateway.example_tgw.id
  subnet_ids                                      = [aws_subnet.consumer_subnet.id]
  transit_gateway_default_route_table_association = false
  transit_gateway_default_route_table_propagation = false

  tags = {
    Name = "backendstack-consumer-vpc-tgw-attachment-2026"
  }
}

# Create a Transit Gateway route table
resource "aws_ec2_transit_gateway_route_table" "main_tgw_rt" {
  transit_gateway_id = aws_ec2_transit_gateway.example_tgw.id

  tags = {
    Name = "backendstack-main-tgw-rt-2026"
  }
}

# Associate the VPC attachments with the TGW route table
resource "aws_ec2_transit_gateway_route_table_association" "provider_vpc_association" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.provider_vpc_attachment.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.main_tgw_rt.id
}

resource "aws_ec2_transit_gateway_route_table_association" "consumer_vpc_association" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.consumer_vpc_attachment.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.main_tgw_rt.id
}

# Propagate routes from the VPC attachments into the TGW route table
resource "aws_ec2_transit_gateway_route_table_propagation" "provider_vpc_propagation" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.provider_vpc_attachment.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.main_tgw_rt.id
}

resource "aws_ec2_transit_gateway_route_table_propagation" "consumer_vpc_propagation" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.consumer_vpc_attachment.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.main_tgw_rt.id
}

# Look up the route table associated with the provider subnet
data "aws_route_table" "provider_route_table" {
  subnet_id = aws_subnet.provider_subnet.id
}

# Update the provider VPC route table to send traffic to the TGW
resource "aws_route" "provider_vpc_to_tgw" {
  route_table_id         = data.aws_route_table.provider_route_table.id
  destination_cidr_block = "0.0.0.0/0" # Or specific CIDRs of other VPCs
  transit_gateway_id     = aws_ec2_transit_gateway.example_tgw.id
}

# Look up the route table associated with the consumer subnet
data "aws_route_table" "consumer_route_table" {
  subnet_id = aws_subnet.consumer_subnet.id
}

# Update the consumer VPC route table to send traffic to the TGW
resource "aws_route" "consumer_vpc_to_tgw" {
  route_table_id         = data.aws_route_table.consumer_route_table.id
  destination_cidr_block = "0.0.0.0/0" # Or specific CIDRs of other VPCs
  transit_gateway_id     = aws_ec2_transit_gateway.example_tgw.id
}
```
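The cross-account sharing mentioned under Benefits is typically done through AWS Resource Access Manager (RAM). The following is a hedged sketch; the consumer account ID is a placeholder assumption.

```hcl
# Hypothetical sketch: share the Transit Gateway with a consumer account via RAM
resource "aws_ram_resource_share" "tgw_share" {
  name                      = "backendstack-tgw-share-2026"
  allow_external_principals = true # required for accounts outside your AWS Organization
}

# Put the TGW into the share
resource "aws_ram_resource_association" "tgw" {
  resource_arn       = aws_ec2_transit_gateway.example_tgw.arn
  resource_share_arn = aws_ram_resource_share.tgw_share.arn
}

# Invite the consumer account to the share
resource "aws_ram_principal_association" "consumer_account" {
  principal          = "111122223333" # placeholder consumer account ID
  resource_share_arn = aws_ram_resource_share.tgw_share.arn
}
```

Once shared, the consumer account can create its own `aws_ec2_transit_gateway_vpc_attachment` against the shared TGW ID.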
PrivateLink: Private Service Consumption
AWS PrivateLink is distinct from VPC Peering and Transit Gateway as it focuses on service consumption rather than general network routing. PrivateLink allows you to expose a service (backed by an AWS Network Load Balancer) in your VPC (the service provider) to consumers in other VPCs, whether they are in the same account, different accounts, or even different AWS organizations. It creates an interface endpoint (powered by an Elastic Network Interface or ENI) in the consumer's VPC, making the service appear as if it's hosted natively within their network, without requiring any complex routing, peering connections, or exposing traffic to the public internet.
Benefits:
Enhanced Security: Traffic stays entirely within the AWS network and doesn't traverse the public internet, even for cross-account or cross-region access.
Simplified Network Architecture: No VPC peering, VPN connections, or Transit Gateway required between provider and consumer VPCs. No route table updates needed in the consumer VPC for the service endpoint.
No IP Overlap Issues: Since PrivateLink uses ENIs with private IP addresses in the consumer VPC, IP address range overlaps between provider and consumer VPCs are not a concern.
Scalability for Services: Consumers only need to create an endpoint, abstracting away the service provider's network complexity.
Granular Control: Service providers can approve or reject connections from specific AWS accounts.
Drawbacks:
Service-Oriented: Designed for exposing specific services via NLBs, not for general inter-VPC communication or routing of arbitrary network traffic.
Cost: Each interface endpoint and data processing incurs charges, which can accumulate if many consumers access many different services via PrivateLink.
NLB Requirement: The service must be exposed via a Network Load Balancer (NLB) in the provider VPC.
DNS Management: While AWS provides private DNS for endpoints, custom DNS solutions might be needed for seamless integration.
Consider exposing a private API service from the `provider-vpc` to the `consumer-vpc` using PrivateLink.
```hcl
# Define the provider VPC (same as before)
resource "aws_vpc" "provider_vpc" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "backendstack-provider-vpc-2026"
  }
}

# Define a subnet for the provider VPC (same as before)
resource "aws_subnet" "provider_subnet" {
  vpc_id            = aws_vpc.provider_vpc.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"

  tags = {
    Name = "backendstack-provider-subnet-2026"
  }
}

# Define the consumer VPC (same as before)
resource "aws_vpc" "consumer_vpc" {
  cidr_block           = "10.1.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "backendstack-consumer-vpc-2026"
  }
}

# Define a subnet for the consumer VPC (same as before)
resource "aws_subnet" "consumer_subnet" {
  vpc_id            = aws_vpc.consumer_vpc.id
  cidr_block        = "10.1.1.0/24"
  availability_zone = "us-east-1a"

  tags = {
    Name = "backendstack-consumer-subnet-2026"
  }
}

# Create a security group for the NLB in the provider VPC
resource "aws_security_group" "nlb_sg" {
  vpc_id      = aws_vpc.provider_vpc.id
  name        = "backendstack-nlb-sg-2026"
  description = "Allow private traffic to NLB"

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # Restrict this to the endpoint ENI CIDRs in practice
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "backendstack-nlb-sg-2026"
  }
}

# Create a Network Load Balancer (NLB) in the provider VPC
resource "aws_lb" "example_nlb" {
  name               = "backendstack-private-api-nlb-2026"
  internal           = true
  load_balancer_type = "network"
  subnets            = [aws_subnet.provider_subnet.id]
  security_groups    = [aws_security_group.nlb_sg.id]

  tags = {
    Name = "backendstack-private-api-nlb-2026"
  }
}

# Create a target group for the NLB
resource "aws_lb_target_group" "example_target_group" {
  name     = "backendstack-private-api-tg-2026"
  port     = 80
  protocol = "TCP"
  vpc_id   = aws_vpc.provider_vpc.id

  health_check {
    protocol            = "HTTP"
    path                = "/health"
    matcher             = "200"
    interval            = 30
    timeout             = 5
    healthy_threshold   = 3
    unhealthy_threshold = 3
  }

  tags = {
    Name = "backendstack-private-api-tg-2026"
  }
}

# Create a listener for the NLB
resource "aws_lb_listener" "example_nlb_listener" {
  load_balancer_arn = aws_lb.example_nlb.arn
  port              = 80
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.example_target_group.arn
  }
}

# Create a VPC Endpoint Service for the NLB (provider side)
resource "aws_vpc_endpoint_service" "example_service" {
  acceptance_required        = true # Requires explicit acceptance by the service provider
  network_load_balancer_arns = [aws_lb.example_nlb.arn]

  tags = {
    Name = "backendstack-private-api-endpoint-service-2026"
  }
}

# Create a security group for the VPC endpoint in the consumer VPC
resource "aws_security_group" "endpoint_sg" {
  vpc_id      = aws_vpc.consumer_vpc.id
  name        = "backendstack-endpoint-sg-2026"
  description = "Allow outbound traffic from endpoint"

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "backendstack-endpoint-sg-2026"
  }
}

# Create a VPC endpoint (consumer side)
resource "aws_vpc_endpoint" "example_endpoint" {
  vpc_id              = aws_vpc.consumer_vpc.id
  service_name        = aws_vpc_endpoint_service.example_service.service_name
  vpc_endpoint_type   = "Interface"
  subnet_ids          = [aws_subnet.consumer_subnet.id]
  security_group_ids  = [aws_security_group.endpoint_sg.id]
  private_dns_enabled = false # Can only be true once the endpoint service has a verified private DNS name

  tags = {
    Name = "backendstack-private-api-endpoint-2026"
  }
}

# Allow the consumer account to connect to the endpoint service (required before acceptance)
resource "aws_vpc_endpoint_service_allowed_principal" "example_allowed_principal" {
  vpc_endpoint_service_id = aws_vpc_endpoint_service.example_service.id
  principal_arn           = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root" # Assumes the consumer is the same account, for simplicity
}

# Data source for the current account ID
data "aws_caller_identity" "current" {}

# Output the DNS name of the VPC endpoint for consumption
output "private_api_endpoint_dns_name" {
  description = "The DNS name of the PrivateLink endpoint for the API service"
  value       = aws_vpc_endpoint.example_endpoint.dns_entry[0]["dns_name"]
}
```
Step-by-Step Implementation: Choosing Your Connectivity Path
We'll demonstrate the key steps for each solution. For brevity, we'll focus on the networking components.
Scenario: Provider VPC (API Service) and Consumer VPC (Client Application)
Imagine you have an API service running in `provider-vpc` (CIDR: `10.0.0.0/16`) that needs to be consumed by an application in `consumer-vpc` (CIDR: `10.1.0.0/16`). Both VPCs are in `us-east-1`.
Step 1: Setup Provider and Consumer VPCs
First, ensure you have your basic VPC infrastructure in place. The Terraform code shown in the "How It Works" section for defining `provider-vpc` and `consumer-vpc` and their respective subnets will create this foundation.
Command:

```shell
$ terraform init
$ terraform plan -out vpc-setup.tfplan
$ terraform apply vpc-setup.tfplan
```

Expected Output (truncated):

```text
...
aws_vpc.consumer_vpc: Creation complete after 1s [id=vpc-0abcdef1234567890]
aws_subnet.consumer_subnet: Creation complete after 1s [id=subnet-0fedcba9876543210]
aws_vpc.provider_vpc: Creation complete after 1s [id=vpc-0123456789abcdef0]
aws_subnet.provider_subnet: Creation complete after 1s [id=subnet-0abcdef1234567890]
...
Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
```
Step 2: Implement VPC Peering (if chosen)
After `terraform apply` for the VPC setup, you would apply the VPC peering specific resources.
Terraform Code (from "How It Works" - VPC Peering section):

```hcl
# ... (VPC and subnet definitions go here) ...

# Request a VPC peering connection
resource "aws_vpc_peering_connection" "example_peering" { ... }

# Accept the VPC peering connection
resource "aws_vpc_peering_connection_accepter" "example_peering_accepter" { ... }

# Update route tables in both VPCs
resource "aws_route" "provider_vpc_to_consumer" { ... }
resource "aws_route" "consumer_vpc_to_provider" { ... }
```

Command:

```shell
$ terraform apply -auto-approve
```

Expected Output (truncated):

```text
...
aws_vpc_peering_connection.example_peering: Creation complete after 3s [id=pcx-0abcdef1234567890]
aws_vpc_peering_connection_accepter.example_peering_accepter: Creation complete after 0s [id=pcx-0abcdef1234567890]
aws_route.consumer_vpc_to_provider: Creation complete after 0s [id=r-rtb-0123456789abcdef01080289495]
aws_route.provider_vpc_to_consumer: Creation complete after 0s [id=r-rtb-0fedcba98765432101080289496]
...
Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
```
Testing Connectivity:
Deploy a basic EC2 instance in each VPC subnet. From the consumer EC2, you should be able to ping the private IP of the provider EC2 (after ensuring security groups allow ICMP traffic).
Common mistake: Forgetting to update both VPCs' route tables to point at the peering connection for the other VPC's destination CIDR block. This is a frequent cause of "no route to host" errors. Also ensure that security groups on both sides permit the necessary traffic.
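For the ping test above, the instance security group rules might look like the following sketch (the resource name is an assumption; ICMP is opened only from the peer VPC's CIDR, and a mirror-image group would be needed on the consumer side):

```hcl
# Hypothetical security group for the provider test instance: allow ICMP only
# from the consumer VPC's CIDR so the connectivity ping can succeed.
resource "aws_security_group" "provider_instance_sg" {
  vpc_id      = aws_vpc.provider_vpc.id
  name        = "backendstack-provider-instance-sg-2026"
  description = "Allow ICMP from the consumer VPC for connectivity testing"

  ingress {
    from_port   = -1 # all ICMP types
    to_port     = -1
    protocol    = "icmp"
    cidr_blocks = [aws_vpc.consumer_vpc.cidr_block]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```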
Step 3: Implement Transit Gateway (if chosen)
To use TGW instead of peering, ensure any peering resources are destroyed first. Then, apply the TGW-related Terraform code.
Terraform Code (from "How It Works" - Transit Gateway section):

```hcl
# ... (VPC and subnet definitions go here) ...

# Create a Transit Gateway
resource "aws_ec2_transit_gateway" "example_tgw" { ... }

# Attach VPCs to the TGW
resource "aws_ec2_transit_gateway_vpc_attachment" "provider_vpc_attachment" { ... }
resource "aws_ec2_transit_gateway_vpc_attachment" "consumer_vpc_attachment" { ... }

# Create the TGW route table and associate/propagate routes
resource "aws_ec2_transit_gateway_route_table" "main_tgw_rt" { ... }
resource "aws_ec2_transit_gateway_route_table_association" "provider_vpc_association" { ... }
resource "aws_ec2_transit_gateway_route_table_association" "consumer_vpc_association" { ... }
resource "aws_ec2_transit_gateway_route_table_propagation" "provider_vpc_propagation" { ... }
resource "aws_ec2_transit_gateway_route_table_propagation" "consumer_vpc_propagation" { ... }

# Update VPC route tables to point to the TGW
resource "aws_route" "provider_vpc_to_tgw" { ... }
resource "aws_route" "consumer_vpc_to_tgw" { ... }
```

Command:

```shell
$ terraform apply -auto-approve
```

Expected Output (truncated):

```text
...
aws_ec2_transit_gateway.example_tgw: Creation complete after 30s [id=tgw-0abcdef1234567890]
aws_ec2_transit_gateway_vpc_attachment.provider_vpc_attachment: Creation complete after 15s [id=tgw-attach-0123456789abcdef0]
aws_ec2_transit_gateway_vpc_attachment.consumer_vpc_attachment: Creation complete after 15s [id=tgw-attach-0fedcba9876543210]
aws_ec2_transit_gateway_route_table.main_tgw_rt: Creation complete after 0s [id=tgw-rtb-0abcdef1234567890]
aws_ec2_transit_gateway_route_table_association.provider_vpc_association: Creation complete after 0s
aws_ec2_transit_gateway_route_table_association.consumer_vpc_association: Creation complete after 0s
aws_ec2_transit_gateway_route_table_propagation.provider_vpc_propagation: Creation complete after 0s
aws_ec2_transit_gateway_route_table_propagation.consumer_vpc_propagation: Creation complete after 0s
aws_route.provider_vpc_to_tgw: Creation complete after 0s [id=r-rtb-0123456789abcdef01080289495]
aws_route.consumer_vpc_to_tgw: Creation complete after 0s [id=r-rtb-0fedcba98765432101080289496]
...
Apply complete! Resources: 9 added, 0 changed, 0 destroyed.
```
Testing Connectivity:
Similar to peering, deploy EC2 instances in each VPC and ping or make an HTTP request between them. The traffic will now traverse the TGW.
Common mistake: An incorrect `destination_cidr_block` in the VPC route tables. Teams often point `0.0.0.0/0` at the TGW when it is the primary egress; if it is not, only the specific CIDRs of other attached networks should be routed to the TGW. Misconfigured TGW route table associations or propagations can also silently prevent communication.
Step 4: Implement PrivateLink (if chosen)
If you only need a specific service in the provider VPC to be accessible from the consumer VPC, PrivateLink is the choice. Again, ensure previous connectivity resources are removed before applying PrivateLink.
Terraform Code (from "How It Works" - PrivateLink section):

```hcl
# ... (VPC and subnet definitions go here) ...

# Security group for the NLB
resource "aws_security_group" "nlb_sg" { ... }

# Network Load Balancer (NLB)
resource "aws_lb" "example_nlb" { ... }

# NLB target group
resource "aws_lb_target_group" "example_target_group" { ... }

# NLB listener
resource "aws_lb_listener" "example_nlb_listener" { ... }

# VPC Endpoint Service (provider side)
resource "aws_vpc_endpoint_service" "example_service" { ... }

# Security group for the endpoint
resource "aws_security_group" "endpoint_sg" { ... }

# VPC endpoint (consumer side)
resource "aws_vpc_endpoint" "example_endpoint" { ... }

# Grant permissions for the consumer account
resource "aws_vpc_endpoint_service_allowed_principal" "example_allowed_principal" { ... }

# Data source for the current account ID
data "aws_caller_identity" "current" {}

# Output the DNS name of the VPC endpoint
output "private_api_endpoint_dns_name" { ... }
```

Command:

```shell
$ terraform apply -auto-approve
```

Expected Output (truncated):

```text
...
aws_lb.example_nlb: Creation complete after 1m0s [id=arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/network/backendstack-private-api-nlb-2026/0123456789abcdef]
aws_lb_target_group.example_target_group: Creation complete after 0s [id=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/backendstack-private-api-tg-2026/0123456789abcdef]
aws_lb_listener.example_nlb_listener: Creation complete after 0s [id=arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/network/backendstack-private-api-nlb-2026/0123456789abcdef/0123456789abcdef]
aws_vpc_endpoint_service.example_service: Creation complete after 0s [id=vpce-svc-0abcdef1234567890]
aws_vpc_endpoint.example_endpoint: Creation complete after 1m30s [id=vpce-0fedcba9876543210]
aws_vpc_endpoint_service_allowed_principal.example_allowed_principal: Creation complete after 0s
...
Apply complete! Resources: 10 added, 0 changed, 0 destroyed.

Outputs:

private_api_endpoint_dns_name = "vpce-0fedcba9876543210-01234567.vpce-svc-0abcdef1234567890.us-east-1.vpce.amazonaws.com"
```
Testing Connectivity:
From an EC2 instance in the consumer VPC, make an HTTP request to the `private_api_endpoint_dns_name` output by Terraform. This traffic privately reaches your service behind the NLB in the provider VPC.
Common mistake: Security group misconfigurations. The NLB's security group must allow inbound traffic from the PrivateLink endpoint (specifically, the CIDR of the endpoint's ENIs, which are usually within the consumer VPC's subnet CIDRs). Additionally, the endpoint's security group must allow outbound traffic to the NLB. The service behind the NLB also needs a security group allowing traffic from the NLB.
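Following that guidance, one way to tighten the NLB's ingress is sketched below. It assumes the open inline `ingress` rule on `nlb_sg` from the earlier example is removed first, since Terraform does not mix inline rules and standalone `aws_security_group_rule` resources cleanly; it also follows the assumption above that the endpoint ENIs live in the consumer subnet's CIDR.

```hcl
# Hypothetical hardening: admit NLB traffic only from the consumer subnet
# hosting the PrivateLink endpoint ENIs, instead of 0.0.0.0/0.
resource "aws_security_group_rule" "nlb_from_endpoint_enis" {
  type              = "ingress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  cidr_blocks       = [aws_subnet.consumer_subnet.cidr_block]
  security_group_id = aws_security_group.nlb_sg.id
  description       = "Allow API traffic from PrivateLink endpoint ENIs only"
}
```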
Production Readiness: Beyond Basic Connectivity
Implementing inter-VPC connectivity requires careful consideration of monitoring, cost, and security to ensure a robust production environment.
Cost Management
The financial implications often drive the choice between these solutions.
VPC Peering: Generally the cheapest for minimal setups. Costs are primarily for inter-AZ or inter-region data transfer. For example, if you have two VPCs in the same region but different AZs, you pay standard inter-AZ data transfer rates.
Transit Gateway: Incurs an hourly charge per attachment (e.g., $0.05/hour in `us-east-1`) plus a data processing charge per GB (e.g., $0.02/GB). While this can cost more than a single peering connection, it often becomes significantly more cost-effective than managing a sprawl of peering connections and their associated data transfer costs at scale. The savings in administrative overhead are also substantial.
PrivateLink: Charges an hourly fee per VPC endpoint (e.g., $0.01/hour per AZ) and a data processing charge per GB (e.g., $0.01/GB). For a single service, this is manageable. However, if every microservice requires its own PrivateLink endpoint, or if numerous consumers access the same service, these costs can add up quickly. It's a premium for simplicity and isolation.
Guidance: For fewer than 5 VPCs with infrequent, low-volume communication, peering might still be cost-effective. For complex, hub-and-spoke networks with many VPCs, TGW almost always provides superior cost-to-management ratio. PrivateLink is justified for specific secure service exposure, accepting a potentially higher unit cost for strong isolation and simplified network routing.
Monitoring & Alerting
Effective monitoring is crucial for diagnosing network issues and ensuring operational stability.
VPC Flow Logs: Indispensable for all three solutions. Enable Flow Logs on all relevant VPCs/subnets to capture traffic metadata. This allows you to see source/destination IPs, ports, protocols, and whether traffic was accepted or rejected. Analyze Flow Logs in CloudWatch Logs or export to S3 for further analysis in tools like Athena or Splunk.
CloudWatch Metrics:
- Transit Gateway: CloudWatch publishes metrics such as `BytesIn`, `BytesOut`, `PacketsIn`, `PacketsOut`, and packet drop counts (`PacketDropCountBlackhole`, `PacketDropCountNoRoute`) for the TGW and its attachments. Monitor these for unusual spikes, drops, or errors indicating potential issues or traffic shifts.
- Network Load Balancer (for PrivateLink): Monitor NLB metrics such as `ActiveFlowCount`, `NewFlowCount`, `HealthyHostCount`, and `ConsumedLCUs` to understand service health and traffic patterns.
- VPC Endpoints (for PrivateLink): Direct endpoint metrics are limited, but monitoring the network interface metrics on the ENIs created by PrivateLink endpoints can provide insight into traffic volume.
Alerting: Set up CloudWatch alarms on key metrics. For example, alert when `PacketDropCountNoRoute` or `PacketDropCountBlackhole` on the TGW exceeds a threshold, or when `ActiveFlowCount` on an NLB drops suddenly. Also alert on Flow Log patterns that show rejected connections between endpoints that are expected to communicate.
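As one concrete illustration, a minimal sketch of such an alarm in Terraform follows; the threshold and the commented-out SNS topic are placeholder assumptions to tune for your environment.

```hcl
# Hypothetical CloudWatch alarm on blackhole packet drops at the Transit Gateway.
resource "aws_cloudwatch_metric_alarm" "tgw_blackhole_drops" {
  alarm_name          = "backendstack-tgw-blackhole-drops-2026"
  alarm_description   = "Packets are being dropped by a blackhole route on the TGW"
  namespace           = "AWS/TransitGateway"
  metric_name         = "PacketDropCountBlackhole"
  statistic           = "Sum"
  period              = 300 # evaluate over 5-minute windows
  evaluation_periods  = 1
  threshold           = 100 # placeholder; tune to your baseline
  comparison_operator = "GreaterThanThreshold"

  dimensions = {
    TransitGateway = aws_ec2_transit_gateway.example_tgw.id
  }

  # alarm_actions = [aws_sns_topic.network_alerts.arn] # assumed SNS topic for notifications
}
```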
Security
Security postures vary significantly across these options.
VPC Peering: Security relies entirely on distributed controls: Network Access Control Lists (NACLs) at the subnet level and Security Groups at the instance level. Managing these across many peered VPCs can be error-prone and lead to security gaps. There is no central point for network policy enforcement.
Transit Gateway: Offers a centralized point for routing, which can be leveraged for security. You can attach network security appliances (like a firewall VPC with a Gateway Load Balancer or AWS Network Firewall) to the TGW and route all inter-VPC traffic through it for centralized inspection and policy enforcement. TGW route tables provide granular control over which VPCs can communicate with each other. For example, you can create separate TGW route tables for "shared services," "production," and "development" VPCs, defining explicit access rules.
PrivateLink: Provides the highest level of network isolation. Traffic never leaves the Amazon network, nor does it traverse public internet. IP addresses are private. The connection is established at the service level, meaning only the specific service is exposed, minimizing the attack surface. Service providers maintain control over which AWS accounts can create endpoints to their service.
Edge Cases & Failure Modes
IP Overlaps: VPC Peering fundamentally fails if CIDR blocks overlap. Transit Gateway can route between non-overlapping CIDRs but does not solve the IP overlap problem within a VPC itself. PrivateLink completely bypasses IP routing conflicts as it uses interface endpoints with private IPs within the consumer VPC, making it ideal for multi-account scenarios where IP planning is difficult.
Transitive Routing: VPC Peering explicitly does not support transitive routing, requiring a full mesh for all-to-all communication. Transit Gateway is built for transitive routing, simplifying complex network topologies. PrivateLink is for direct service consumption, not general routing.
Cross-Region Connectivity: Both VPC Peering and Transit Gateway can be configured for cross-region connectivity, though data transfer costs increase significantly. PrivateLink supports cross-region endpoints for certain AWS services and custom services exposed via NLBs.
Debugging: When network issues arise, VPC Flow Logs are your first diagnostic tool. For TGW, the VPC Reachability Analyzer can trace potential network paths and identify misconfigurations in TGW route tables or security groups. For PrivateLink, verify that NLB targets are healthy and that security groups on both the endpoint and NLB sides allow the traffic. DNS resolution of the endpoint name is also a common troubleshooting point.
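Since Flow Logs are the first diagnostic tool for all three options, here is a minimal sketch of enabling them for the provider VPC, delivered to S3 (which avoids the IAM role required for CloudWatch Logs delivery). The bucket name is a placeholder; S3 bucket names are globally unique.

```hcl
# Hypothetical S3 bucket to receive flow log records
resource "aws_s3_bucket" "flow_logs" {
  bucket = "backendstack-flow-logs-2026" # placeholder; must be globally unique
}

# Capture accepted and rejected traffic metadata for the whole provider VPC
resource "aws_flow_log" "provider_vpc" {
  vpc_id               = aws_vpc.provider_vpc.id
  traffic_type         = "ALL" # or ACCEPT / REJECT to narrow the capture
  log_destination      = aws_s3_bucket.flow_logs.arn
  log_destination_type = "s3"
}
```

The records landing in S3 can then be queried with Athena to hunt for `REJECT` entries between endpoints that should be able to communicate.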
Summary & Key Takeaways
Choosing the right AWS inter-VPC connectivity solution is a critical architectural decision that impacts scalability, security, and operational overhead. There is no one-size-fits-all answer; the optimal choice depends on your specific requirements and growth trajectory.
For simple, low-VPC environments (2-3 VPCs) with limited, point-to-point communication needs: Start with VPC Peering. It's straightforward and cost-effective for these minimal use cases.
For growing, multi-account, multi-VPC environments (more than 5-10 VPCs) requiring hub-and-spoke connectivity or integration with on-premises networks: Adopt AWS Transit Gateway. It provides a scalable, manageable, and secure backbone for your cloud network, dramatically reducing operational complexity.
For highly secure, private consumption of specific services across VPCs or accounts, particularly to avoid IP overlap issues and simplify consumer-side routing: Leverage AWS PrivateLink. It's ideal for exposing APIs, databases, or third-party SaaS offerings privately.
What to avoid:
Avoid creating a spaghetti mesh of VPC Peering connections when your environment grows beyond a few VPCs. The management and security overhead will quickly become unsustainable.
Do not attempt to use PrivateLink for general inter-VPC routing. It's a service-specific solution, not a general network connectivity fabric.
Don't underestimate the ongoing operational cost of an ill-suited networking solution. The initial "cheaper" option can lead to far greater costs in troubleshooting, security incidents, and lost productivity in the long run.