Lambda Provisioned Concurrency AutoScaling is Awesome. Make sure you understand how it works!
A quick intro to Provisioned Concurrency
At re:Invent 2019, AWS added a flavor of AWS Lambda that lets you pre-provision Lambda capacity ahead of time. It's called Provisioned Concurrency (PC). You tell AWS how much concurrency you want to keep warm, and AWS keeps that many execution environments initialized for you. This lets you execute Lambda functions with very low latency and no cold starts.
Since AWS is setting aside capacity on your behalf, Provisioned Concurrency has a cost. You can read about the pricing here, but at a high level, as long as you are actively using roughly 60% of your allocation you'll break even compared with on-demand pricing, with the added benefit of always-warm functions. Remember, concurrency is measured as TPS * invoke duration (in seconds). The shorter your functions run, the less concurrency you need.
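To make the concurrency formula concrete, here is a minimal Python sketch. The function name `required_concurrency` and the traffic numbers are made up for illustration; only the TPS * duration relationship comes from the text above.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Estimate concurrent executions as TPS * average invoke duration (seconds).

    The product is rounded to 6 decimals before taking the ceiling so that
    floating point noise does not push the result up by one.
    """
    return math.ceil(round(requests_per_second * avg_duration_seconds, 6))


# 100 requests per second, each running for 200 ms
print(required_concurrency(100, 0.2))   # 20

# Same traffic, but each invocation takes only 50 ms
print(required_concurrency(100, 0.05))  # 5
```

Cutting the duration from 200 ms to 50 ms drops the estimate from 20 to 5, which is why shorter functions need less provisioned concurrency for the same traffic.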
Auto scale for cost control
You should use Application Auto Scaling to optimize your PC costs. It offers a policy type called target tracking (TargetTrackingScaling in the API).
This flavor of auto scaling tries to maintain a utilization percentage you specify. When you enable it, two CloudWatch alarms are created, and both monitor the ProvisionedConcurrencyUtilization metric:
- A scale up alarm that requires 3 data points of 1 minute each
- A scale down alarm that requires 15 data points of 1 minute each (15 minutes in total)
Both of these alarms use the Average statistic by default.
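As a rough sketch of how such a policy could be set up from Python, the helpers below build the arguments for the Application Auto Scaling `register_scalable_target` and `put_scaling_policy` calls in boto3. The function name `my-function`, the alias `live`, the capacity bounds, and the 0.7 target are hypothetical values chosen for illustration, not recommendations.

```python
from typing import Any, Dict


def scalable_target_args(function_name: str, alias: str,
                         min_pc: int, max_pc: int) -> Dict[str, Any]:
    """Arguments for register_scalable_target on a Lambda alias."""
    return {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "MinCapacity": min_pc,
        "MaxCapacity": max_pc,
    }


def target_tracking_policy_args(function_name: str, alias: str,
                                target_utilization: float) -> Dict[str, Any]:
    """Arguments for put_scaling_policy using target tracking."""
    return {
        # Policy name is an arbitrary, hypothetical naming scheme.
        "PolicyName": f"{function_name}-{alias}-pc-target-tracking",
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Utilization is expressed as a fraction (0.7 means 70%).
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization",
            },
        },
    }


def apply_policy(client: Any, function_name: str, alias: str,
                 min_pc: int, max_pc: int, target_utilization: float) -> None:
    """Register the alias as a scalable target, then attach the policy.

    `client` is expected to be boto3.client("application-autoscaling").
    """
    client.register_scalable_target(
        **scalable_target_args(function_name, alias, min_pc, max_pc))
    client.put_scaling_policy(
        **target_tracking_policy_args(function_name, alias, target_utilization))


# Example (requires boto3 and AWS credentials):
#   import boto3
#   apply_policy(boto3.client("application-autoscaling"),
#                "my-function", "live", min_pc=5, max_pc=50,
#                target_utilization=0.7)
```

Keeping the argument builders separate from the API calls makes it easy to inspect or unit test the configuration without touching AWS.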
Make sure you evaluate your traffic patterns and confirm that they will actually trigger the above alarms. The scale up alarm looks at the average ProvisionedConcurrencyUtilization across a 3-minute window of 1-minute data points. If the metric exceeds your target value, the alarm triggers and auto scaling calculates the scaling adjustment from the metric and the target value.
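One way to check a traffic pattern against the scale up alarm is a small simulation. The sketch below is a simplified model, not the actual CloudWatch evaluation logic: it assumes utilization is sampled once per second, averaged per minute, and that the alarm fires only when 3 consecutive 1-minute averages exceed the target. All names and numbers are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def minute_average(per_second_utilization: Sequence[float]) -> float:
    """Average utilization over one minute (assumed per-second samples)."""
    return fmean(per_second_utilization)


def scale_up_alarm_fires(minute_averages: Sequence[float], target: float,
                         required_points: int = 3) -> bool:
    """Simplified model: True if `required_points` consecutive 1-minute
    averages are all above the target utilization."""
    breaching = 0
    for value in minute_averages:
        # Count consecutive breaching minutes; reset on any quiet minute.
        breaching = breaching + 1 if value > target else 0
        if breaching >= required_points:
            return True
    return False


# A 10-second burst at 100% utilization inside an otherwise quiet minute
burst_minute = minute_average([1.0] * 10 + [0.1] * 50)
print(round(burst_minute, 2))  # 0.25

# The short burst never pushes a 1-minute average above a 70% target
print(scale_up_alarm_fires([0.1, burst_minute, 0.1, 0.1], target=0.7))  # False

# Sustained load above the target for 3 minutes does trigger the alarm
print(scale_up_alarm_fires([0.1, 0.8, 0.85, 0.9], target=0.7))  # True
```

In this model the burst fully uses the provisioned capacity for 10 seconds, yet the minute average is only 0.25, so the alarm stays quiet.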


Be aware of your traffic patterns
Functions that execute quickly with traffic patterns that burst quickly may not trigger your Provisioned Concurrency…