How I Cut a Startup's AWS Bill by 54% Without Touching Their Product
I'm Eli Arama, a DevOps engineer with 8+ years helping startups stop bleeding money on cloud infrastructure they don't need.
The Call
A Series B startup reached out in a panic. Their AWS bill had crept from $12K to $38K/month over 18 months, and their investors were asking questions. The engineering team was 14 people. Nobody owned infrastructure. Sound familiar?
I spent three weeks inside their AWS accounts. Here's exactly what I found and what I did about it.
Week 1: The Audit
Before changing anything, I mapped every dollar. Most teams skip this step because it's tedious. That's exactly why their bills are out of control.
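One way to start that mapping is to pull a month of spend from AWS Cost Explorer grouped by service and rank it. The sketch below is a minimal version under stated assumptions: it only parses the response shape that boto3's `get_cost_and_usage` returns when `GroupBy` is set (per-group amounts sit under `Groups[].Metrics` as strings), the helper name `spend_by_service` is hypothetical, and it ignores pagination (`NextPageToken`), which a real audit would need to follow.

```python
from collections import defaultdict


def spend_by_service(response: dict, metric: str = "UnblendedCost") -> list[tuple[str, float]]:
    """Sum a Cost Explorer get_cost_and_usage response per group key, largest first.

    Assumes the request used GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    so each group carries a single key: the service name.
    """
    totals: dict[str, float] = defaultdict(float)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            # Cost Explorer returns amounts as strings, e.g. "123.45"
            totals[service] += float(group["Metrics"][metric]["Amount"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

In practice the input would come from `boto3.client("ce").get_cost_and_usage(...)` with `Granularity="MONTHLY"`, `Metrics=["UnblendedCost"]`, and the `GroupBy` shown in the docstring. Grouping by a cost-allocation tag instead of `SERVICE` is what ties dollars to workloads, if the account tags consistently.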
What I Found
$7,400/month in idle resources. Three staging environments nobody had used in four months, each running its own EKS control plane ($73/mo), three m5.large worker nodes ($210/mo), a db.t3.medium RDS instance ($53/mo), a NAT Gateway ($33/mo), and an ALB ($25/mo). That's roughly $394 per environment, or about $1,180 for all three. On top of that: a db.r5.large Multi-AZ RDS instance ($555/mo) running a test database that two engineers had forgotten about, a NAT Gateway in a VPC with no active workloads ($33/mo), seven unattached EBS volumes totaling 2TB ($160/mo), and three unused Elastic IPs ($11/mo). Those itemized resources come to about $1,940/month; the rest of the $7,400 was scattered across forgotten Lambda functions, idle ElastiCache nodes, and CloudWatch log groups retaining data indefinitely.
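Tallying the line items is a cheap sanity check before deleting anything. This sketch only encodes the monthly figures quoted in the paragraph (it makes no AWS calls), and the dictionary labels are illustrative.

```python
# Monthly costs (USD) quoted in the audit; no AWS calls are made here.
STAGING_ENV = {
    "EKS control plane": 73,
    "3x m5.large worker nodes": 210,
    "db.t3.medium RDS": 53,
    "NAT Gateway": 33,
    "ALB": 25,
}
OTHER_IDLE = {
    "db.r5.large Multi-AZ test DB": 555,
    "NAT Gateway in empty VPC": 33,
    "7 unattached EBS volumes (2TB)": 160,
    "3 unused Elastic IPs": 11,
}

per_env = sum(STAGING_ENV.values())
staging_total = 3 * per_env  # three identical staging environments
itemized = staging_total + sum(OTHER_IDLE.values())
unitemized = 7_400 - itemized  # Lambda, ElastiCache, CloudWatch logs, etc.

print(f"per environment: ${per_env:,}")
print(f"all staging:     ${staging_total:,}")
print(f"itemized idle:   ${itemized:,}")
print(f"rest of $7,400:  ${unitemized:,}")
```

The five listed components come to $394 per environment, so the itemized resources account for $1,941 of the $7,400 category and the unitemized remainder is $5,459.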
$5,800/month in oversized instances. Their production EKS cluster was running 12 m5.2xlarge nodes ($0.384/hr each) at 14% average CPU utilization: $3,364/mo in compute, mostly sitting idle. Downsizing to 5 m5.xlarge nodes ($0.192/hr) with a cluster autoscaler would handle the same workload for $701/mo, saving $2,663. Their primary RDS was a db.r5.4xlarge Multi-AZ ($4.00/hr, $2,920/mo) handling 200 queries per second; a db.r5.xlarge Multi-AZ ($1.16/hr, $847/mo) can handle that easily, saving $2,073. A second analytics RDS on db.r5.2xlarge Single-AZ ($1.16/hr, $847/mo) was running at 8% utilization and could drop to db.r5.large ($0.58/hr, $423/mo), saving another $424. Together, those three changes account for about $5,160 of the $5,800.
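The rightsizing arithmetic is hourly rate × instance count × hours per month. This sketch assumes 730 hours per month and reuses the hourly rates quoted in the paragraph (actual rates vary by region and database engine); `monthly` is a hypothetical helper name.

```python
HOURS_PER_MONTH = 730  # the usual monthly-hours convention in AWS pricing


def monthly(hourly_rate: float, count: int = 1) -> int:
    """On-demand monthly cost, rounded to whole dollars."""
    return round(hourly_rate * count * HOURS_PER_MONTH)


# (before, after) monthly costs using the hourly rates quoted above.
changes = {
    "EKS nodes": (monthly(0.384, 12), monthly(0.192, 5)),
    "primary RDS": (monthly(4.00), monthly(1.16)),
    "analytics RDS": (monthly(1.16), monthly(0.58)),
}

for name, (before, after) in changes.items():
    print(f"{name}: ${before:,} -> ${after:,} (saves ${before - after:,})")

total = sum(before - after for before, after in changes.values())
print(f"total: ${total:,}/month")
```

Rounding each line to whole dollars before subtracting is why the analytics saving comes out at $424 rather than $423.40.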
$5,100/month in missing commitments. Zero Reserved Instances. Zero Savings Plans. After rightsizing, their steady-state compute spend was roughly $15K/mo — all at full on-demand pricing for workloads running 24/7 for over a year. A 1-year Compute Savings Plan (no upfront) would save roughly 30%, and RDS Reserved Instances (1-year, no upfront) would save about 30% on the databases.
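A commitment's value is roughly the steady-state on-demand spend it covers times its discount. The sketch below applies the paragraph's "roughly 30%" to the "roughly $15K/mo" baseline and also works backward from the $5,100 figure; both inputs are the approximations the text presents them as, and the helper names are hypothetical.

```python
def commitment_savings(on_demand_monthly: float, discount: float) -> float:
    """Monthly savings from covering steady on-demand spend at a given discount."""
    return on_demand_monthly * discount


def implied_discount(savings_monthly: float, on_demand_monthly: float) -> float:
    """Blended discount implied by a monthly savings figure."""
    return savings_monthly / on_demand_monthly


estimate = commitment_savings(15_000, 0.30)
blended = implied_discount(5_100, 15_000)
print(f"at 30%: ${estimate:,.0f}/month")
print(f"$5,100 implies a blended discount of {blended:.0%}")
```

At exactly 30%, $15K of steady-state spend yields about $4,500/month; the $5,100 figure corresponds to a blended discount of about 34% on the same base, so the two agree only to within the rounding implied by "roughly".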
$2,100/month in data transfer waste. Kubernetes services were spread across three AZs with no topology-aware routing. Cross-AZ traffic is billed at $0.01/GB in each direction, so every GB that crosses an AZ boundary effectively costs $0.02, and their 30TB/mo of cross-AZ service mesh traffic was costing $600/mo unnecessarily. The bigger hit: all S3 traffic was routed through NAT Gateways at $0.045/GB processing. With 25TB/mo of S3 transfers, that's $1,125/mo in NAT processing fees for traffic a free VPC Gateway Endpoint carries with no processing charge. The remaining ~$375 was miscellaneous egress that could be reduced with CloudFront caching.
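Both network charges are linear in volume, so the paragraph's figures follow directly from the per-GB rates it quotes. This sketch assumes those rates ($0.01/GB each way for cross-AZ, $0.045/GB for NAT processing, which differ in some regions) and decimal terabytes; the helper names are illustrative.

```python
GB_PER_TB = 1_000  # decimal terabytes, matching the figures above

CROSS_AZ_PER_GB_EACH_WAY = 0.01  # billed on both the sending and receiving side
NAT_PROCESSING_PER_GB = 0.045


def cross_az_cost(tb_per_month: float) -> float:
    # Each GB crossing an AZ boundary is charged once out and once in.
    return tb_per_month * GB_PER_TB * CROSS_AZ_PER_GB_EACH_WAY * 2


def nat_processing_cost(tb_per_month: float) -> float:
    return tb_per_month * GB_PER_TB * NAT_PROCESSING_PER_GB


mesh = cross_az_cost(30)
s3_via_nat = nat_processing_cost(25)
other_egress = 2_100 - mesh - s3_via_nat
print(f"cross-AZ mesh: ${mesh:,.0f}")
print(f"S3 via NAT:    ${s3_via_nat:,.0f}")
print(f"remaining:     ${other_egress:,.0f}")
```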
That's $20,400/month in clear waste — 54% of their bill — before making a single architectural change.
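The four categories can be checked in a few lines; the figures are the ones stated above, and the script only verifies the sum, the share of the bill, and the annualized number quoted in the results.

```python
findings = {
    "idle resources": 7_400,
    "oversized instances": 5_800,
    "missing commitments": 5_100,
    "data transfer": 2_100,
}
bill_before = 38_000

monthly_savings = sum(findings.values())
share = monthly_savings / bill_before
print(f"monthly savings: ${monthly_savings:,}")
print(f"share of bill:   {share:.1%}")
print(f"bill after:      ${bill_before - monthly_savings:,}")
print(f"annual savings:  ${monthly_savings * 12:,}")
```

The share works out to about 53.7%, which rounds to 54%; the annual figure is $244,800, rounded to $245,000 in the results.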
Week 2: The Quick Wins
I prioritize by impact and risk. Kill the obvious waste first, resize second, commit last.
Day 1-2: Delete the Dead Weight
Terminated the three unused staging environments, removed the forgotten test RDS instance, deleted the orphaned EBS volumes and Elastic IPs, tore down the empty VPC and its NAT Gateway, and cleaned up the idle Lambda functions and ElastiCache nodes. No approvals needed: nobody was using any of it.
Savings: $7,400/month.
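Of the cleanup above, the EBS volumes and Elastic IPs are the easiest to find programmatically. A minimal sketch, assuming you've already fetched `describe_volumes` and `describe_addresses` responses with boto3 (both are real EC2 API calls); the `orphaned_resources` helper is hypothetical and only reports candidates, it deletes nothing.

```python
def orphaned_resources(volumes_response: dict, addresses_response: dict) -> dict:
    """List unattached EBS volumes and unassociated Elastic IPs.

    Expects the dict shapes returned by boto3's EC2 describe_volumes and
    describe_addresses calls. Reports candidates only; deletes nothing.
    """
    volumes = [
        (v["VolumeId"], v["Size"])  # Size is in GiB
        for v in volumes_response.get("Volumes", [])
        if v.get("State") == "available"  # "available" means not attached
    ]
    # An Elastic IP with no AssociationId isn't attached to anything.
    ips = [
        a.get("AllocationId", a.get("PublicIp"))
        for a in addresses_response.get("Addresses", [])
        if "AssociationId" not in a
    ]
    return {
        "volumes": volumes,
        "total_gib": sum(size for _, size in volumes),
        "elastic_ips": ips,
    }
```

With boto3, the inputs would come from `ec2 = boto3.client("ec2")`, then `ec2.describe_volumes(Filters=[{"Name": "status", "Values": ["available"]}])` and `ec2.describe_addresses()`; accounts with many volumes need the `describe_volumes` paginator.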
Day 3-4: Right-Size the Compute
Downsized the EKS nodes from 12x m5.2xlarge to 5x m5.xlarge and added a cluster autoscaler so nodes scale with actual demand instead of sitting idle. Moved the primary RDS from db.r5.4xlarge to db.r5.xlarge and the analytics DB from db.r5.2xlarge to db.r5.large, both during maintenance windows.
This is where most engineers get nervous. "What if we need the headroom?" You don't. If CPU on the old fleet never exceeded 20% in six months, it isn't going to spike to 80% tomorrow. And if load does climb, the cluster autoscaler adds nodes: five m5.xlarge is the baseline, not the ceiling.
Savings: $5,800/month.
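A CPU-only check shows why the five-node baseline works and why the autoscaler still matters. The sketch converts the old fleet's utilization into absolute vCPUs (an m5.2xlarge has 8 vCPUs, an m5.xlarge has 4) and sizes the new fleet against an assumed 70% per-node ceiling; the threshold and helper name are illustrative, not values from the engagement.

```python
import math


def nodes_needed(used_vcpu: float, vcpu_per_node: int, target_util: float) -> int:
    """Smallest node count that keeps CPU at or below target_util."""
    return math.ceil(used_vcpu / (vcpu_per_node * target_util))


OLD_VCPU = 12 * 8  # 12 x m5.2xlarge, 8 vCPU each
NEW_VCPU_PER_NODE = 4  # m5.xlarge
TARGET = 0.70  # assumed utilization ceiling per node

avg_used = OLD_VCPU * 0.14  # 14% average CPU
peak_used = OLD_VCPU * 0.20  # never exceeded 20%

print(f"average: {avg_used:.1f} vCPU -> {nodes_needed(avg_used, NEW_VCPU_PER_NODE, TARGET)} nodes")
print(f"peak:    {peak_used:.1f} vCPU -> {nodes_needed(peak_used, NEW_VCPU_PER_NODE, TARGET)} nodes")
```

Average load fits on five nodes; sustained peaks near 20% of the old fleet would have the autoscaler add about two more. This ignores memory: an m5.xlarge has 16 GiB versus 32 GiB on an m5.2xlarge, so pod memory requests deserve the same check.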
Day 5: Fix the Network
Added an S3 VPC Gateway Endpoint (free — zero hourly or per-GB charges). Enabled topology-aware routing in the service mesh to prefer same-AZ communication. Added CloudFront in front of the API for cacheable responses.
Savings: $2,100/month.
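For the S3 endpoint, the main detail is attaching it to the route tables of the private subnets that were sending S3 traffic through the NAT Gateway. A sketch under assumptions: the VPC ID, region, and route-table IDs are placeholders, and `s3_gateway_endpoint_params` is a hypothetical helper that only builds the request arguments.

```python
def s3_gateway_endpoint_params(vpc_id: str, region: str, route_table_ids: list[str]) -> dict:
    """Keyword arguments for EC2 CreateVpcEndpoint creating an S3 Gateway Endpoint.

    A gateway endpoint adds a route for the S3 prefix list to each listed
    route table, so S3 traffic from those subnets bypasses the NAT Gateway.
    """
    if not route_table_ids:
        raise ValueError("attach at least one route table, or no traffic will use the endpoint")
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }
```

The result would be passed to boto3 as `boto3.client("ec2", region_name=region).create_vpc_endpoint(**params)`. Gateway endpoints only reach S3 buckets in the same region as the VPC, so cross-region S3 traffic still takes the NAT path.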
Week 3: Lock In the Rates
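Once the environment was right-sized and I knew exactly what was running, I set up a 1-year Compute Savings Plan (no upfront) for the baseline EKS workload and 1-year Reserved Instances (no upfront) for the production databases. Both give roughly 30% off on-demand pricing. The Savings Plan doesn't lock into specific instance types: it applies across EC2 instance families, sizes, and regions. The RDS Reserved Instances are tied to a database engine and instance family, though for engines such as MySQL and PostgreSQL they're size-flexible within that family.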
Savings: $5,100/month.
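Savings Plans are bought as an hourly commitment, so a conservative way to size one is to commit only to the floor of observed usage and leave anything above it on-demand. A minimal sketch, assuming you have hourly on-demand-equivalent compute spend for the lookback period and a uniform discount (the paragraph's "roughly 30%"); the function name and sample numbers are hypothetical.

```python
def savings_plan_commitment(hourly_on_demand: list[float], discount: float) -> float:
    """Hourly commitment the observed usage would have fully consumed every hour.

    Uses the lowest observed hour of on-demand-equivalent spend as the baseline.
    Savings Plan commitments are expressed in discounted $/hour, so the baseline
    is scaled by (1 - discount). Assumes one uniform discount rate.
    """
    if not hourly_on_demand:
        raise ValueError("need at least one hour of usage data")
    baseline = min(hourly_on_demand)
    return baseline * (1 - discount)
```

For example, hourly spend of $10.00, $12.50, $11.00, and $14.00 at a 30% discount gives a $7.00/hour commitment. Committing to the minimum rather than the average keeps the commitment fully used every hour; spend above it is simply billed at on-demand rates.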
The Result
| | Before | After |
|---|---|---|
| Monthly AWS bill | $38,000 | $17,600 |
| EKS node count (avg) | 12 x m5.2xlarge | 5 x m5.xlarge |
| Primary RDS | db.r5.4xlarge Multi-AZ | db.r5.xlarge Multi-AZ |
| Savings Plans | None | 1-year Compute (no upfront) |
| Idle resources | 15+ | 0 |
| S3 data path | Through NAT Gateway | VPC Gateway Endpoint |
Total monthly savings: $20,400. Annual savings: $245,000.
Zero downtime. No product changes. No performance impact. The P99 latency actually improved after the network changes because we eliminated unnecessary cross-AZ hops.
Why This Keeps Happening
Every startup I've worked with has some version of this problem. The pattern is always the same:
- Nobody owns the bill. Engineering spins up resources. Finance sees the invoice 30 days later. Nobody connects the two.
- Fear of downsizing. "We might need it" is the most expensive sentence in cloud computing.
- Dev environments that never die. Every feature branch gets a staging environment. Nobody writes the teardown script.
- On-demand everything. Savings Plans feel like a commitment. So does paying roughly 40% more for the same compute.
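A teardown policy is easier to enforce when every environment carries an expiry date. A minimal sketch, assuming a hypothetical `expires-at` tag holding an ISO 8601 timestamp with a UTC offset; it only selects environments to remove and leaves the actual teardown to whatever tooling created them.

```python
from datetime import datetime, timezone
from typing import Optional

EXPIRY_TAG = "expires-at"  # hypothetical tag, e.g. "2024-07-01T00:00:00+00:00"


def environments_to_tear_down(
    environments: dict[str, dict[str, str]], now: Optional[datetime] = None
) -> list[str]:
    """Names of environments whose expiry tag has passed or is missing.

    `environments` maps an environment name to its tags. Tag values must
    include a UTC offset so they compare cleanly with the aware `now`.
    A missing tag counts as expired, so nothing escapes cleanup by accident.
    """
    now = now or datetime.now(timezone.utc)
    due = []
    for name, tags in environments.items():
        expires = tags.get(EXPIRY_TAG)
        if expires is None or datetime.fromisoformat(expires) <= now:
            due.append(name)
    return sorted(due)
```

Treating an untagged environment as expired is the design choice that matters here: it makes "nobody wrote the teardown" fail loudly instead of silently accruing cost.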
What I'd Tell a CTO
If your AWS bill is growing faster than your revenue, you don't have a cloud problem — you have a visibility problem. You need someone to sit in your account for a week, map every dollar to a workload, and tell you what's waste.
It's not glamorous work. But $245K/year buys a lot of engineering time.
Think your cloud bill is higher than it should be? It probably is. Get in touch and I'll tell you where the money is going.
Need help with this?
I help startups and scale-ups build AI solutions and reliable cloud infrastructure.