Lambda Provisioned Concurrency AutoScaling is Awesome. Make sure you understand how it works!
A quick intro to Provisioned Concurrency
At re:Invent 2019, AWS added a flavor of AWS Lambda that lets you provision Lambda capacity ahead of time. It's called Provisioned Concurrency (PC). You tell AWS how much concurrency you want to keep warm and AWS handles the rest. This lets you execute Lambda functions with very low latency and without cold starts, as long as requests are served from the provisioned capacity.
Since AWS is setting aside capacity on your behalf, PC has a cost. You can read about the pricing here, but at a high level, as long as you are actively using roughly 60% of your allocation you’ll break even, with the added benefit of always-warm functions. Remember, concurrency is measured as TPS * Invoke Duration (in seconds). The shorter your functions run, the less concurrency you need.
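To make the TPS * Invoke Duration rule concrete, here is a minimal sketch. The function name and the sample traffic numbers are made up for illustration, and the duration is taken in milliseconds and converted to seconds to keep the arithmetic simple.

```python
import math


def required_concurrency(tps: int, avg_duration_ms: int) -> int:
    """Estimate concurrency as TPS * invoke duration (in seconds), rounded up.

    Illustrative helper, not an AWS API. The duration is passed in
    milliseconds and divided by 1000 to get seconds.
    """
    return math.ceil(tps * avg_duration_ms / 1000)


# 100 requests per second, each running for 200 ms
print(required_concurrency(100, 200))   # 20

# Same traffic, but each invocation takes a full second
print(required_concurrency(100, 1000))  # 100
```

Cutting the duration from 1 second to 200 ms reduces the concurrency you need from 100 to 20 for the same traffic.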
Auto scale for cost control
You should use Application Auto Scaling to optimize your PC costs. It offers a policy type called target tracking (TargetTrackingScaling).
This flavor of auto scaling tries to maintain a utilization percentage you specify. When you enable it, two CloudWatch alarms are deployed, and both monitor the ProvisionedConcurrencyUtilization metric:
- A scale-up alarm that requires 3 data points of 1 minute each
- A scale-down alarm that requires 15 data points over 15 minutes
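The behavior of these two alarms can be sketched in a few lines of Python. This is a simplified model, not how CloudWatch is actually implemented: it assumes one utilization data point per minute, treats an alarm as firing only when every point in its window breaches, and uses made-up thresholds (0.7 and 0.6). The real thresholds are derived by AWS from the target you configure.

```python
from typing import List, Optional


def evaluate_alarms(
    utilization: List[float],
    scale_up_threshold: float,
    scale_down_threshold: float,
    up_points: int = 3,
    down_points: int = 15,
) -> Optional[str]:
    """Return "scale_up", "scale_down", or None for a utilization series.

    `utilization` holds one value per minute, oldest first (assumption).
    The thresholds are hypothetical inputs for this sketch.
    """
    recent_up = utilization[-up_points:]
    if len(recent_up) == up_points and all(
        u > scale_up_threshold for u in recent_up
    ):
        return "scale_up"

    recent_down = utilization[-down_points:]
    if len(recent_down) == down_points and all(
        u < scale_down_threshold for u in recent_down
    ):
        return "scale_down"

    return None


# Three hot minutes in a row trip the scale-up alarm.
print(evaluate_alarms([0.9, 0.9, 0.9], 0.7, 0.6))                   # scale_up

# Fifteen quiet minutes trip the scale-down alarm.
print(evaluate_alarms([0.3] * 15, 0.7, 0.6))                        # scale_down

# A single one-minute spike does neither.
print(evaluate_alarms([0.3] * 10 + [0.95] + [0.3] * 4, 0.7, 0.6))   # None
```

The last case is the one to keep in mind: in this model, a burst shorter than three minutes never triggers a scale up, and it also restarts the 15-minute window for scaling down.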