
$228,000 in 6 Months. One 10-Minute Audit. Here's What Was Hiding in Plain Sight

A mid-sized ecommerce company was seeing its AWS bill swing between $150K and $250K monthly with no clear explanation. This post walks through a 10-minute forensic cloud cost audit using Deductive AI connected to AWS via MCP. The audit uncovered three structural cost drivers invisible to traditional dashboards: a massively over-provisioned ElastiCache cluster (544% over-provisioned), cross-region ECR pulls taxing every container image deployment, and an aggregate EKS footprint of 100 clusters whose baseline costs were silently compounding. All individual resource-level optimizations were working perfectly. The waste was architectural, not resource-level. The post makes the case that cloud forensics is a reasoning problem, not a data problem, and that AI-driven tools that think in systems are what surface this kind of waste.

Pratyush Verma
Sameer Agarwal
Vandit Gandotra
6 min read

TL;DR

A mid-sized ecommerce company had a cloud bill swinging between $150K and $250K a month. Nobody could explain why. We pointed Deductive AI at their AWS infrastructure and ran a 10-minute forensic audit. Here's what was hiding:

A cache cluster that grew in the dark. ElastiCache had ballooned to ~150 cache.r5.large nodes across four regions, 544% over-provisioned for the actual load it was protecting. Aurora Serverless and DynamoDB were handling traffic fine. The cache was just there, billing quietly, for months.

A cross-region tax nobody had mapped. The ECR container registry was set up in a different region than the Kubernetes clusters pulling from it. Every image pull was crossing a region boundary. This doesn't show up anywhere obvious in Cost Explorer. It only surfaces when you correlate registry location with cluster topology.

100 clusters with a baseline cost problem. The issue wasn't that any individual cluster was expensive. It was the aggregate footprint: 100 EKS clusters across four regions, each carrying worker nodes, load balancers, and networking overhead, including staging environments that didn't need dedicated clusters.

The IaC paradox. Every resource-level optimization was working perfectly. Spot adoption near 100%, Aurora auto-pause active, S3 costs at $0.16 total. The inefficiency wasn't at the resource level. It was architectural. No amount of instance right-sizing would have found any of this.

Cloud forensics is a reasoning problem, not a data problem. You already have the data. You just need something that thinks in systems, not silos.

The Setup: A Simple Question with a Complicated Answer

It started with a deceptively simple question from a mid-sized ecommerce company: "How can Deductive AI help us reduce our cloud costs?"

Like most engineering teams, they had a general sense of AWS costs, but "general sense" is the enemy of optimization. They had ~100 Kubernetes clusters, up to 10,000 nodes during peak load, 700 services spread across four AWS regions, and a monthly bill that swung wildly between $150K and $250K. Nobody could explain why.

Here is what we suggested: instead of manually clicking through AWS Cost Explorer or building yet another dashboard, grant Deductive access to Cost Explorer, connect it via Deductive MCP, an AI-powered infrastructure investigation platform, and ask it to perform a full forensic analysis of their cloud spend.

What followed was a masterclass in why AI-driven infrastructure analysis isn't just faster, it's fundamentally different.

The 10-Minute Audit That Replaced a Week of Work

Within minutes, Deductive pulled the customer's complete 12-month cost history, broke it down by service, region, and usage type, and surfaced the first surprise:

Month-by-month spend analysis

Total 6-month spend: $228,000 (excluding all credits and refunds — raw usage only).

AWS spend: 6-month forensic graph

A human analyst would look at this table and see "costs went up, then down." Deductive saw something more precise: a 155% cost volatility between the lowest and highest months, with anomalous ~20% drops in September and January that warranted investigation.
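The volatility figure itself is simple arithmetic once the monthly totals are in hand. A minimal sketch, using made-up monthly figures (only the 155% swing mirrors the audit; the real totals came from Cost Explorer):

```python
# Hypothetical monthly totals in USD, chosen for illustration only.
monthly_spend = {
    "Aug": 25_000, "Sep": 20_000, "Oct": 38_000,
    "Nov": 45_000, "Dec": 51_000, "Jan": 41_000,
}

def cost_volatility(spend: dict) -> float:
    """Percent swing between the cheapest and most expensive month."""
    lo, hi = min(spend.values()), max(spend.values())
    return (hi - lo) / lo * 100

print(f"Volatility: {cost_volatility(monthly_spend):.0f}%")  # -> Volatility: 155%
```

The same one-liner also flags the anomalous months: any month more than ~20% below its neighbors is worth a closer look.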

But the real story wasn't in the totals. It was in the breakdown.

Anomalies Beneath the Surface

What Deductive found in 10 minutes

Traditional cost dashboards show you where your money goes. Deductive showed us where it shouldn't be going.

  • ElastiCache volatility: 155% cost swing, a caching layer that was scaling unpredictably. Deductive flagged it as the highest-volatility service and tied it to regional deployment patterns.
  • EKS extended support: 142% volatility, Kubernetes costs fluctuating without clear workload correlation. The AI identified inefficient scaling and recommended workload optimization.
  • EC2 m6a.2xlarge (us-west-1): 126% volatility, an instance type costing more than its utilization justified.

Crucially, Deductive didn't stop at "ElastiCache is expensive." It correlated ElastiCache with ECR cross-region pulls, EKS cluster scaling, and a hub-spoke architecture (us-west-1 as primary hub) that was driving unnecessary data transfer costs.

It also discovered we were running ~100 EKS clusters across four US regions (us-west-1, us-east-1, us-east-2, us-west-2), each carrying a baseline cost, at minimum a few worker nodes, load balancers, and networking overhead. The question wasn't whether individual clusters were expensive — they weren't. The question was whether the aggregate footprint was justified, especially for staging environments that may not need dedicated clusters.
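The cross-region pull finding reduces to a join between registry location and cluster topology. A sketch of that correlation, with an invented topology (the real estate had ~100 clusters, so the same check runs over a much larger map):

```python
# Flag EKS clusters whose image pulls cross a region boundary.
# The registry region and cluster names here are illustrative, not the customer's.
ecr_region = "us-west-1"                  # where the container images live
eks_clusters = {
    "prod-a": "us-west-1",
    "prod-b": "us-east-1",
    "staging-1": "us-east-2",
    "staging-2": "us-west-2",
}

# Any cluster in a different region pays data-transfer cost on every pull.
cross_region_pulls = {name: region for name, region in eks_clusters.items()
                      if region != ecr_region}

for name, region in sorted(cross_region_pulls.items()):
    print(f"{name}: every image pull pays {ecr_region} -> {region} transfer")
```

This is exactly the kind of correlation that never appears on a single Cost Explorer page, because the registry region and the cluster region live in two different services.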

The Cache Cluster That Grew in the Dark

Deductive's next finding was the most dramatic. The ElastiCache infrastructure had undergone a 155% cost increase from September to December, growing from a modest footprint to ~150 cache.r5.large nodes spread across four regions:

ElastiCache cost by region

Here's where Deductive's cross-system correlation became invaluable. It correlated the traffic pattern to ElastiCache with DynamoDB and Aurora usage, leveraging its understanding of the system architecture and using Aurora Serverless v2 and DynamoDB usage as the baseline.

The conclusion was unavoidable: our customer was aggressively over-scaling their ElastiCache cluster to protect a workload whose traffic was low and predictable. It was like hiring 10 security guards for a lemonade stand.
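The over-provisioning math is back-of-envelope once you know the node type. The node count and instance type (cache.r5.large, 13.07 GiB of memory) come from the audit; the working-set size below is a hypothetical figure chosen to illustrate how a ~544% result falls out:

```python
# Back-of-envelope reconstruction of the over-provisioning figure.
NODE_MEMORY_GIB = 13.07            # memory of one cache.r5.large node
nodes = 150
provisioned_gib = nodes * NODE_MEMORY_GIB    # ~1960 GiB of cache capacity

working_set_gib = 304.4            # hypothetical peak data actually cached

over_pct = (provisioned_gib - working_set_gib) / working_set_gib * 100
nodes_needed = -(-working_set_gib // NODE_MEMORY_GIB)   # ceiling division

print(f"~{over_pct:.0f}% over-provisioned; ~{nodes_needed:.0f} nodes would do")
```

In other words, a fleet an order of magnitude smaller would have held the same working set.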

The IaC Cross-Reference

This is where Deductive's ability to correlate infrastructure-as-code with actual cloud spend proved most valuable. Deductive ingested the customer's IaC repository, containing Pulumi and Terraform code, and compared what was provisioned against what was actually consumed.

IaC said it. The bill proved it.

The IaC-to-spend correlation revealed that our cost optimizations for individual resources were excellent. Spot instances, serverless databases, lifecycle policies — all doing exactly what they should.

The cost inefficiency wasn't at the resource level. It was at the architecture level: too many regions, an over-aggressive cache-scaling strategy, a centralized ECR registry serving cross-region pulls, and no cost allocation tags to track spend per customer. No amount of instance right-sizing would fix structural costs.
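One way to picture the IaC-to-spend cross-reference is as a join between what the repo declares and what the bill records: flag line items nobody declared, and declarations whose spend has drifted far past what was budgeted. A toy sketch with invented names and dollar figures:

```python
# All resource names and figures below are invented, for illustration only.
declared = {   # resource -> monthly cost the IaC author budgeted for
    "elasticache-prod": 4_000,
    "aurora-serverless": 2_500,
    "eks-prod": 12_000,
}
billed = {     # resource -> actual monthly cost on the bill
    "elasticache-prod": 26_000,
    "aurora-serverless": 2_400,
    "eks-prod": 13_000,
    "ecr-data-transfer": 3_100,   # no IaC declaration maps to this line item
}

untracked = set(billed) - set(declared)          # spend with no declared owner
drifted = {r: billed[r] / declared[r]            # billed at >2x the budget
           for r in declared if billed[r] > 2 * declared[r]}

print("untracked:", sorted(untracked))
print("drifted:", {r: f"{x:.1f}x" for r, x in drifted.items()})
```

Resources that pass both checks, like the Aurora and EKS entries here, are the "optimizations working perfectly" side of the paradox; the audit's findings all lived in the other two buckets.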

Why This Matters: The Case for AI-Driven Cloud Forensics

The Old Way Doesn't Scale

Traditional cloud cost analysis follows a predictable pattern: export a CSV from Cost Explorer, open it in a spreadsheet, sort by cost, squint at the numbers, and maybe catch the obvious stuff. This approach has three fatal flaws:

  1. It's siloed. Cost data lives in billing. Utilization data lives in CloudWatch. Kubernetes metrics live in Prometheus. Cross-referencing them manually is tedious and error-prone.
  2. It's backward-looking. By the time you notice a cost spike in your monthly review, you've already been paying it for weeks.
  3. It misses correlations. A human looking at ECR costs won't instinctively check which regions the Kubernetes clusters are deployed in. Deductive did.
What Makes AI Different

In our 10-minute investigation, Deductive performed analysis that would have taken a human team days: it audited the entire cloud stack and ranked every savings opportunity by impact.

The key insight isn't that AI is faster (though it is). It's that AI reasons across data boundaries that humans naturally silo. The connection between ECR region, cluster location, and data transfer costs exists in three different AWS services. Deductive found it because it doesn't think in service silos, it thinks in systems.

The Hidden Cost of Not Looking

Our customers’ monthly expenses aren't outrageous by any standard. But the 155% cost volatility between months tells a story of infrastructure that's reacting rather than planning. The October-December spike wasn't caused by a traffic surge, it was caused by provisioning decisions that nobody reviewed.

Every engineering team has a version of this story. The cache cluster grew because someone scaled up during an incident and never scaled back down, or bolted on a scaling policy tuned only to the request patterns of a few hot shards. The container images ship across regions because the ECR registry was set up in a different region than the clusters it serves.

These aren't failures of engineering. They're failures of visibility. And without cost allocation tags, budgets, and multi-region Config, you can't do per-customer unit economics, you can't detect anomalies early, and you can't catch drift. Governance isn't a cost, it's the infrastructure that prevents costs from becoming surprises.
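The governance point is enforceable mechanically: a pre-deploy check can refuse any resource missing the allocation tags that per-customer unit economics depends on. A sketch of such a check, with illustrative tag keys (pick whatever your chargeback model actually needs):

```python
# Required tag keys are illustrative, not a prescribed schema.
REQUIRED_TAGS = {"customer", "environment", "cost-center"}

def missing_tags(tags: dict) -> set:
    """Return the allocation tags a resource is missing."""
    return REQUIRED_TAGS - tags.keys()

resources = {   # invented resources with their current tags
    "elasticache-prod": {"environment": "prod"},
    "eks-staging-7": {"customer": "acme", "environment": "staging",
                      "cost-center": "eng"},
}

for name, tags in resources.items():
    if gaps := missing_tags(tags):
        print(f"{name}: unattributable spend, missing {sorted(gaps)}")
```

Run in CI against IaC plans, a check like this turns "we can't do per-customer unit economics" from a discovery made during an audit into a build failure made before deploy.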

Conclusion

Cloud forensics is a reasoning problem, not a data problem. You already have the data: Cost and Usage Reports, CloudWatch, resource inventory, infrastructure-as-code definitions. What you need is something that can query it conversationally, correlate across services, regions, and time, and surface anomalies you didn't know to look for.

Deductive did all of this in one session. No scripts. No manual aggregation. No week-long spreadsheet exercises.

The transformation: From "We should probably look at our cloud costs" to "Here's a prioritized optimization plan with specific savings, implementation timelines, and the architectural changes to get started."

Every infrastructure has stories hiding in the billing data, cost anomalies that explain themselves when you correlate them with the right metrics, provisioning decisions that made sense once but don't anymore, and architectural patterns that quietly drain budget month after month.

The question isn't whether your cloud bill has hidden costs. It's whether you've looked.

Connect Deductive to your infrastructure and find out.

This analysis was performed using Deductive AI against production AWS infrastructure. All cost figures reflect actual resource consumption with credits and refunds excluded.
