Optimize Everything + re-optimize (The Ultimate Guide to AWS Lambda Development Chapter 3)

George Mao
9 min read · Feb 2, 2024

Chapter 2 of this guide focused on themes you should follow to improve the way you build and deploy packages for Lambda. This chapter focuses on one of the most overlooked but critical things you need to do for your Lambda functions: Optimization.

We’ll start by covering Lambda billing and performance basics to get an idea of what to optimize. Then cover optimization techniques and finally talk about why you should re-optimize on a regular basis.

If you don’t read anything else in this guide, just remember:

Optimization and Cost should be required parts of your development process. We are all responsible for the cost of our systems.

Lambda Billing Basics

Lambda pricing has two components:

  1. The first is simple: 20 cents per million invokes. This is usually negligible for most use cases; I don't even account for it when estimating the cost of my function.
  2. The second is Duration of execution, measured in GB-seconds. A GB-second is how long your function ran, weighted by the memory setting you configured. Think of it like this:

Duration Cost = Memory Configured (GB) * Duration (Seconds)

There are only 3 things you need to know:

  • Lambda supports both x86 and arm64 architectures but the pricing formula is the same. Just reduce all x86 prices by 20% to get an estimate for the equivalent cost on arm.
  • The two factors, Memory and Duration, carry equal weight: cost is directly proportional to each.
  • AWS uses memory CONFIGURED not memory consumed in their pricing.

Let's look at an example:

If you have a function configured at 128 MB and it runs for 1 second, it costs the same as a function configured at 256 MB that runs for half a second.

Hit the AWS pricing calculator if you don’t believe me :)
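
Or run the numbers yourself. Here's a quick back-of-the-envelope version of the duration formula in Python; the per GB-second rate below is an approximate x86 figure, so check the pricing page for your region:

# Illustrative duration-cost comparison (rate is approximate; check your region's pricing)
PRICE_PER_GB_SECOND = 0.0000166667

def duration_cost(memory_mb, duration_seconds, invokes=1_000_000):
    gb = memory_mb / 1024
    return gb * duration_seconds * PRICE_PER_GB_SECOND * invokes

print(duration_cost(128, 1.0))   # 128 MB running for 1 second
print(duration_cost(256, 0.5))   # 256 MB running for half a second -> same cost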

Lambda Performance Basics

AWS doesn't allow us to configure anything related to CPU. Instead, the Memory you allocate to the function determines a predefined, proportional CPU allocation: the more Memory you set, the more CPU power AWS gives you. AWS does not publish exactly what allocation they use, but in recent training events they've shared an approximation:

From a recent AWS developer session

All you need to know is that it's linear: double the Memory, double your CPU allocation. Halve your memory, halve your CPU allocation.

However, allocation does not directly equal performance. The code you write and the workload characteristics will result in a "sweet spot" for your Lambda memory configuration. You can keep increasing memory, but at some point there will not be any further performance improvements.

Optimization Techniques

Our goal is to identify the workload pattern and choose the best Memory setting for optimal efficiency. Workloads typically come in a few flavors: CPU constrained, I/O constrained (i.e., lots of IO), or Memory constrained.

In many cases it can be cheaper to give a workload more memory than it strictly needs so it completes faster. In most cases we want to execute as fast as possible, but you should consider your workload's SLA. Also, is your workload synchronous or asynchronous? Do you just need to complete the work within a certain window of time? If so, it's likely you can lower Memory, run the workload slower, still meet your SLA, and save money!

Let's look at a piece of code that represents a CPU intensive task. This code simply computes SHA-512 hashes in a large loop, which is heavily impacted by CPU power.

import hashlib
import random
import string
from datetime import datetime

N = 51200  # length of the random input string

def lambda_handler(event, context):
    m = hashlib.sha512()

    # Generate a random long string
    rstr = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N))

    t1 = datetime.now()
    print("Start time: " + str(t1))

    # Repeatedly compute the digest
    for i in range(10000):
        m.update(rstr.encode())
        m.digest()

    t2 = datetime.now()
    print("End time: " + str(t2))
    print("Difference: " + str(t2 - t1))

This code will perform nearly identically every time it's executed since there are no external dependencies. The baseline for this test is 128 MB and ~18 seconds of execution duration.

Let’s observe what happens when we start increasing memory:

  • 2x memory to 256 MB → 2x performance to ~9 seconds
  • 4x memory to 1024 MB → 4x performance to ~2.3 seconds
  • 2x memory to 2048 MB → 2x performance to 1.3 seconds
  • All memory values past 2048 see no performance gains

Performance scales linearly until we hit the 2048MB mark. If you were to configure memory past 2048MB you’re just spending extra money with no value.

As you can see from this exercise, it is essential to tune Memory configurations as part of your development process. You can accomplish this manually like I did above but I highly recommend using the AWS Lambda Power Tuner. This was originally created by Alex Casalboni at AWS. My organization enhanced the Power Tuner to use the latest AWS SDK (v3), convert to arm based functions, and use ESBuild — all discussed in Chapter 2 of this guide.

Run Power Tuner

The Power Tuner creates multiple versions of your target Lambda (at different Memory settings), invokes them, and analyzes the performance. It produces a chart that helps you visualize the Memory/Performance ratio.
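
If you want to script a run, here's a rough sketch of kicking off the deployed Power Tuner state machine with boto3. The ARNs are placeholders, and the input fields follow the Power Tuner's documented execution input (powerValues, num, payload, strategy, etc.):

import json
import boto3

sfn = boto3.client("stepfunctions")

# Placeholder ARNs: your deployed Power Tuner state machine and the function to tune
response = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:powerTuningStateMachine",
    input=json.dumps({
        "lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
        "powerValues": [128, 256, 512, 1024, 2048, 3072],
        "num": 50,                   # invocations per memory setting
        "payload": {},               # test event for your function
        "parallelInvocation": True,
        "strategy": "cost",          # or "speed" / "balanced"
    })
)
print(response["executionArn"])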

Here’s how to interpret this graph:

  • The x-Axis represents different memory configs. These are default settings that you can customize
  • The y-Axis represents Duration (the 2nd component in Lambda pricing)
  • The Red line represents performance (lower is better)
  • The Blue line represents Cost (lower is better)

At 256 MB this function took nearly 10,000 ms to run, but costs were fairly low. When we double to 512 MB the function's performance drastically improves (roughly 100%) to under 5,000 ms. But pay attention to the cost: it actually stayed pretty flat!

This is where the pricing formula works to our advantage. We doubled memory (doubling the per-second rate) but ran twice as fast, resulting in a very similar total cost.

Let's double memory again to 1024 MB; this results in another ~100% improvement in performance (under ~2,500 ms).

One more doubling to 2048 MB results in further performance gains, down to roughly 1,300 ms.

All memory values past 2048 MB result in steadily higher costs without any real performance gains. At that point this particular code is no longer CPU constrained, so the extra allocation does not improve performance any further.

Based on this, I would recommend a 2048 MB memory setting for this function. You can run this test again with memory settings between 2048 MB and 3072 MB to further refine the value.

You may find that you need to INCREASE memory rather than decrease it to reach optimal cost efficiency.

Here’s another tuning report. Where would you set memory for this function? Join the #BelieveInServerless community on Discord to discuss!

There are a few other high-value optimizations you should add to your optimization playbook: Provisioned Concurrency, Arm Architecture, SnapStart, and Log Optimization.

Provisioned Concurrency (PC)

Provisioned Concurrency keeps a set number of execution environments initialized and ready, which eliminates cold starts, but you pay for that capacity whether or not it is used. You should enable Autoscaling anytime you enable PC to control costs. This is the same concept as autoscaling a fleet of EC2 servers to match your compute capacity to traffic. To get an idea of the concurrency your workload generates, use this general formula:

Concurrency = RPS * Average Duration (seconds)

If you're driving 100 invoke requests per second to Lambda and they average 1 second each, then the function will consume 100 concurrency. If that same function's duration increases to 2 seconds, then concurrency increases to 200. If it drops to half a second, then concurrency decreases to 50.

I usually advise running your functions without PC for a period of time to understand the actual concurrency generated by your workload AND the pattern. Look at your ConcurrentExecutions metric using the Max statistic over 1 week or longer to determine this. Here’s an example:

ConcurrentExecutions over 1 week

There's a very well defined and repeatable daily pattern. Concurrency baselines around 5, ramps up throughout the day, and peaks around 15–16. There are a couple of random spikes to 22, but those are outliers.
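
If you'd rather pull that metric programmatically than eyeball the console, a boto3 sketch along these lines works (the function name is a placeholder):

from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch")

# Hourly max ConcurrentExecutions for the last week
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Lambda",
    MetricName="ConcurrentExecutions",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(days=7),
    EndTime=datetime.utcnow(),
    Period=3600,
    Statistics=["Maximum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"])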

Autoscaling can be configured using either a Scheduled mode or a TargetTracking mode. You can read about how to configure those options in the AWS Blogs or the docs. In this case, I would choose Scheduled since the traffic pattern is well defined with a well defined min/max range. If there is no pattern, you should use the TargetTracking mode to allow AWS to automatically maintain a certain percentage of utilization of your PC allocation.
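
For the TargetTracking mode, a minimal sketch using the Application Auto Scaling APIs looks something like this. The function/alias name, the 5–20 capacity range, and the 70% utilization target are placeholder values you'd tune for your own workload:

import boto3

aas = boto3.client("application-autoscaling")

resource_id = "function:my-function:live"  # placeholder function and alias

# Register the alias's Provisioned Concurrency as a scalable target
aas.register_scalable_target(
    ServiceNamespace="lambda",
    ResourceId=resource_id,
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    MinCapacity=5,
    MaxCapacity=20,
)

# Track roughly 70% utilization of the provisioned concurrency
aas.put_scaling_policy(
    ServiceNamespace="lambda",
    ResourceId=resource_id,
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    PolicyName="pc-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 0.70,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
        },
    },
)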

Finally, just be aware of a few caveats about autoscaling. I wrote in detail about the default scaling configuration here.

Move Architecture to arm

Lambda supports running your functions on either the x86 or arm hardware architecture. The current default is x86, but there's roughly a 20% cost savings from running on arm, simply due to hardware economics. Most, but not all, workloads will perform better on arm. You should test your workload once you make the change.

If you're using any libraries, make sure you use the arm optimized version (if available).

Move your function to arm with the Architectures attribute in SAM or CFT:

Type: AWS::Serverless::Function
Properties:
  Architectures:
    - arm64 # instead of x86_64
  ... # other properties

Use SnapStart (for Java runtimes only)

We all know using Java in Lambda usually means long cold start times, especially if you use any flavor of Spring in your code. I've seen cold starts in the 20+ second range :( But most enterprises have a large Java code base and it's simply easier to reuse that code. AWS released SnapStart at re:Invent 2022 to address this problem.

SnapStart is a free configuration option available for Java based functions. You have to upgrade to the Java 11 runtime or newer. Here's what you need to know:

  • AWS pre-initializes your function and generates a snapshot that has performed most of the heavy JDK inits. That snapshot is cached and resumed when the function is invoked, resulting in huge performance gains
  • You can reduce cold start timings as much as 80%+
  • It’s now possible to cold start in under a second WITH Java/Spring
  • It's FREE!

Be sure to check out Sam Dengler’s blog post about this feature and Mark Sailes’s re:Invent talk.
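
Turning SnapStart on is a small configuration change. Here's a sketch using boto3; the function name is a placeholder, and remember SnapStart applies to published versions rather than $LATEST:

import boto3

lambda_client = boto3.client("lambda")

# Enable SnapStart on published versions of the function
lambda_client.update_function_configuration(
    FunctionName="my-java-function",   # placeholder
    SnapStart={"ApplyOn": "PublishedVersions"},
)

# Publish a new version so a snapshot is created and cached
lambda_client.publish_version(FunctionName="my-java-function")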

Optimize your Logs

By default, Lambda generates a single Log Group in this format: /aws/lambda/[lambda-function-name]. Everything you write to standard out ends up in Log Streams that live in that single Log Group.

CloudWatch Logs charges $0.50 per GB to ingest logs and $0.03 per GB per month to store them. By default these logs never expire and continue to cost you forever! Every function you've ever executed has produced logs that are still costing you money.

You should prioritize two things: Reduce log output and Expire unnecessary logs.

  • AWS introduced an Infrequent Access log class that is 50% cheaper ($0.25 vs $0.50 per GB ingested), but you lose most of the interactive features such as Subscription Filters, Live Tail, and EMF. I don't recommend this option for anything that is actively running, but if you have old accounts waiting for decommissioning, or a logging strategy that funnels all logs elsewhere, it's a good way to cut costs (see the sketch below).
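
If you do go that route, the log class is chosen when the log group is created. A sketch with boto3, assuming your SDK version supports the logGroupClass parameter (the log group name is a placeholder):

import boto3

logs = boto3.client("logs")

# Create the function's log group ahead of time with the Infrequent Access class
logs.create_log_group(
    logGroupName="/aws/lambda/my-function",   # placeholder
    logGroupClass="INFREQUENT_ACCESS",
)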

Lastly, set Retention policies on your log groups based on the environment the function operates in. It's the RetentionInDays attribute in CloudFormation:

myLogGroup:
  Type: "AWS::Logs::LogGroup"
  Properties:
    RetentionInDays: 7

You should be more aggressive in environments such as QA/Load testing where long-term log retention does not matter. Here are some suggestions you can customize for your organization:

  • Dev (3 days) — I like low retention here.
  • QA (5 days) — QA cycles take time to complete and validate.
  • Load Test (3 days) — These are planned, time boxed events.
  • Prod (30 days) — Customize this to meet your data retention needs.

Re-optimize!

Your business needs, workload pattern, available AWS features, and even pricing will change over time. Make sure you're re-evaluating all of these on a regular basis to stay optimized.

Summary

The TL;DR:

  • Everyone is responsible for cost!
  • Optimization has to be a required part of your dev cycle.
  • Re-evaluate on an ongoing basis as business needs change.

Join us on Discord in #BelieveInServerless to talk about these. Let me know if there are more themes you follow!

Written by George Mao

Head of Specialist Architects @ Google Cloud. I lead a team of experts responsible for helping customers solve their toughest challenges and adopt GCP at scale