alexcasalboni / aws-lambda-power-tuning

AWS Lambda Power Tuning is an open-source tool that can help you visualize and fine-tune the memory/power configuration of Lambda functions. It runs in your own AWS account - powered by AWS Step Functions - and it supports three optimization strategies: cost, speed, and balanced.

License: Apache License 2.0

JavaScript 85.13% Shell 1.09% HCL 7.23% C# 1.73% TypeScript 1.11% Python 1.59% Batchfile 0.22% Go 1.89%
aws aws-lambda cloud cost lambda performance serverless stepfunctions

aws-lambda-power-tuning's Introduction

Hi there 👋

Alex's GitHub stats

Interested in serverless cost/performance optimization? Check this out:

AWS Lambda Power Tuning

aws-lambda-power-tuning's People

Contributors

alexcasalboni, andrestoll, andybkay, arishlabroo, bobsut, claudiopastorini, cledevedec, clementmarcilhacy, clete2, dependabot[bot], dz902, ellisms, fhightower, gino247, gliptak, grzegorzpapkala, hscheib, lavanya0513, ldcorentin, matteo-ronchetti, mettke, neiljed, parro, rrhodes, smosek, tam-alex, teknogeek0, tljdebrouwer, tonysherman


aws-lambda-power-tuning's Issues

Make Node.js 10.x runtime an option

As of May 15, 2019, AWS supports Node.js 10.x in Lambda. It would be nice to have an option to run the power-tuning Lambdas on the same runtime as the function being tuned.

Link to the announcement: AWS What's New

Implement dynamic parallelism

This new feature will make Lambda Power Tuning much more flexible: https://aws.amazon.com/blogs/aws/new-step-functions-support-for-dynamic-parallelism/

Now you can't easily test new memory configurations for each state machine execution, as memory configurations are hard-coded in the state machine structure.

With dynamic parallelism, we'll be able to provide a list of memory configurations as input and dynamically test only those configurations (without any deploy-time parameter).
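With dynamic parallelism, the hard-coded branches could collapse into a single Map state that iterates over the input's power values. An illustrative Amazon States Language sketch (not the project's actual definition; state names and the Executor ARN are placeholders):

```json
{
  "PowerTuning": {
    "Type": "Map",
    "ItemsPath": "$.powerValues",
    "MaxConcurrency": 0,
    "Iterator": {
      "StartAt": "Executor",
      "States": {
        "Executor": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:executor",
          "End": true
        }
      }
    },
    "Next": "Cleaner"
  }
}
```

With this shape, changing the tested memory configurations only requires changing the execution input, not redeploying the state machine.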

--dry-run

Per discussion in #37 (comment)

This might be overkill, because num=1 would cover it, but having a way to verify that the user's setup is correct could benefit a lot of people.

I personally always run dry runs when they exist, especially when I'm a beginner in an area and/or the stakes of a mistake are high.

Provide a context to Lambda function when running the lambda power tuning

The lambda function that is being tuned requires a context.

How can we pass the context to this lambda function when running the tuning step function using the following:
{
  "lambdaARN": "lambda ARN",
  "powerValues": [128, 256],
  "num": 5,
  "payload": {},
  "parallelInvocation": true,
  "strategy": "balanced"
}
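For reference, a context cannot be supplied through this input: the Lambda runtime generates a fresh context object for every invocation, and only the payload field reaches the tuned function, as its event argument. A minimal sketch (hypothetical handler, not part of this repo):

```javascript
// Sketch: the tuning input's "payload" arrives as `event`; `context` is
// runtime-provided (functionName, awsRequestId, ...) and cannot be injected.
const handler = async (event, context) => {
    // event  -> the "payload" object from the state machine input
    // context -> generated by the Lambda runtime on each invocation
    return { received: event, fn: context.functionName };
};
```

If the tuned function needs specific data, it has to travel inside the payload (the event), not the context.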

Cannot add memory option

When I deployed via the console, I added a 2048MB option to the comma-separated list:

image

But the state machine doesn't reflect this:

image

Is it supposed to work? (I thought you were doing something along the lines of deploying a custom resource and then using it in the same stack, like what SLS does for EventBridge.)

Reset $LATEST memory configuration after state machine execution

@alexcasalboni Great job on the tool, very handy & useful!

I ran the test on my Lambda function with all possible memory settings. Initially, my function had 512 MB assigned. After the tests completed (I confirmed that Cleaner & Finalizer are green), the function was left with 3008 MB. I also checked that the versions created during the test were removed (which is expected), but the memory stayed at the maximum value used during the tests.

Is this expected?

Thanks!

Environment variable minRAM must contain string

July 4, 2018
serverless/serverless#5094 (comment)
After updating today to 1.28.0, Serverless (or a dependency) now expects all environment variables to be strings. This sounds reasonable, but it's a breaking change so I'm making people aware.

Serverless: Excluding development dependencies...

  Serverless Error ---------------------------------------

  Environment variable minRAM must contain string

  Get Support --------------------------------------------
     Docs:          docs.serverless.com
     Bugs:          github.com/serverless/serverless/issues
     Issues:        forum.serverless.com

  Your Environment Information -----------------------------
     OS:                     win32
     Node Version:           8.11.1
     Serverless Version:     1.28.0

Manually changing the values in serverless.base.yml to strings fixes the issue:

    minRAM: '128'
    minCost: '0.000000208'

Use a unique payload for every run

Hi @alexcasalboni

#85 issue didn't seem to convey my problem properly.

Currently, when 6 powerValues are given and num is 5, the Lambda function is invoked 30 times in total.
If the function deletes one record whose ID is passed in the payload on each invocation, you need at least as many distinct payloads as total invocations.

However, under the current specification, you cannot provide more payloads than the value of num.

As a result, once the first powerValue finishes, the record specified in the payload has already been deleted, so the ID no longer exists when the second and subsequent powerValues run. The function no longer performs its real work, and the required power value cannot be determined.

To solve this, the payload mechanism needs to accept at least as many payloads as the total number of executions.

Alternatively, a feature that lets you specify a Lambda function to run before and after each execution would make a single payload sufficient. I think such a hook would be very useful.

Multiple optimization strategies

As mentioned in #30, new optimization strategies could be more useful in specific use cases.

The default strategy could remain cost, but a few more can be implemented.

The second most straightforward strategy is speed, and we should implement it in a way that makes it easy for new contributors to add new strategies.

The Finalizer function takes all the statistics as input and will return the optimal configuration, so everything can be implemented there.
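To make the idea concrete, here is an illustrative sketch of how a Finalizer could pick the optimal configuration per strategy (function and field names are assumptions, not the project's actual code):

```javascript
// Each stat entry is assumed to look like:
// { value, averagePrice, averageDuration }
function findOptimal(stats, strategy = 'cost') {
    const comparators = {
        cost: (a, b) => a.averagePrice - b.averagePrice,
        speed: (a, b) => a.averageDuration - b.averageDuration,
    };
    const cmp = comparators[strategy];
    if (!cmp) throw new Error(`Unknown strategy: ${strategy}`);
    // sort a copy so the caller's array is left untouched
    return [...stats].sort(cmp)[0];
}
```

Adding a new strategy then amounts to registering one more comparator, which keeps the contribution surface small.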

InvalidParameterValueException on v3.2.3

Just deployed the latest v3.2.3 and every time I run the new state machine I get the following error:

"error": {
    "Error": "InvalidParameterValueException",
    "Cause": "{\"errorType\":\"InvalidParameterValueException\",\"errorMessage\":\"The role defined for the function cannot be assumed by Lambda.\",\"trace\":[\"InvalidParameterValueException: The role defined for the function cannot be assumed by Lambda.\",\"    at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:51:27)\",\"    at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:55:8)\",\"    at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)\",\"    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)\",\"    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:683:14)\",\"    at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)\",\"    at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)\",\"    at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10\",\"    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)\",\"    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:685:12)\"]}"
  }
}

Optionally configure a single Lambda ARN at deploy-time

The current IAM statement for initializer, execution and optimizer looks like this:

          Statement:
            - Effect: Allow
              Action:
                - lambda:GetAlias
                - lambda:PublishVersion
                - lambda:UpdateFunctionConfiguration
                - lambda:CreateAlias
                - lambda:UpdateAlias
              Resource: '*'

I think some security-concerned users would rather avoid that Resource: '*'.

We could allow them to optionally configure a Lambda ARN (or prefix) so that these IAM policies are a bit more fine-grained.

Technically, this would be a CFN Parameter (e.g. lambdaResource), directly referenced via !Ref.

Feature Request - Leverage X-Ray to analyze different segments

I have a Lambda that makes an external network call as part of its execution, which is a variable I can't control. It would be awesome if this tool leveraged AWS X-Ray to report on the effect of memory tuning on the various segments of execution.

image

Questions this approach could answer:

  • How do different memory allocations affect my initialization segment? This is a primary factor in cold start times.
  • How do different memory allocations affect TLS connection setup times? I've heard the effect can be significant; it would be helpful to quantify it.

Restrict Lambda IAM permissions

The current role has full access to AWS Lambda:

iamRoleStatements:
    - Effect: Allow
      Action:
        - 'lambda:*'
      Resource: '*'

Since we want the lambdaARN to be given at runtime, we can't really restrict the Resource parameter. We could restrict the set of actions, though. Also, experienced users can always force Resource to be the Lambda Function(s) they want to optimize.

As far as actions are concerned, Initializer, Executor, Finalizer and Cleaner need the following Lambda permissions (only 7 out of 28):

  • GetAlias
  • UpdateFunctionConfiguration
  • PublishVersion
  • DeleteFunction (always with Qualifier)
  • CreateAlias
  • DeleteAlias
  • Invoke
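Under these constraints, the statement could be narrowed to exactly those actions. A sketch (note that the invoke permission is named lambda:InvokeFunction in IAM; the Resource wildcard stays because lambdaARN is a runtime input):

```yaml
Statement:
  - Effect: Allow
    Action:
      - lambda:GetAlias
      - lambda:UpdateFunctionConfiguration
      - lambda:PublishVersion
      - lambda:DeleteFunction
      - lambda:CreateAlias
      - lambda:DeleteAlias
      - lambda:InvokeFunction
    Resource: '*'  # experienced users can narrow this to specific function ARNs
```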

ResourceNotFoundException: Functions from 'us-east-1' are not reachable in this region ('us-west-1')

It seems cross-region invocation is not available for Lambda, so the Step Function should be deployed in each region. Am I right, or is this just a code-level restriction that could be improved?

Full trace from CloudWatch Logs:

`START RequestId: c48d4975-d571-46d7-9835-9a7f84e0300f Version: $LATEST
2019-05-14T13:32:20.857Z c48d4975-d571-46d7-9835-9a7f84e0300f { ResourceNotFoundException: Functions from 'us-east-1' are not reachable in this region ('us-west-1')
at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:48:27)
at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:52:8)
at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:683:14)
at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request. (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
at Request. (/var/runtime/node_modules/aws-sdk/lib/request.js:685:12)
message: 'Functions from 'us-east-1' are not reachable in this region ('us-west-1')',
code: 'ResourceNotFoundException',
time: 2019-05-14T13:32:20.857Z,
requestId: 'b3418c32-764c-11e9-ab60-7505d4afde13',
statusCode: 404,
retryable: false,
retryDelay: 52.181729400208376 }
2019-05-14T13:32:20.932Z c48d4975-d571-46d7-9835-9a7f84e0300f Error: Interrupt
at /var/task/initializer.js:68:27
at
at process._tickDomainCallback (internal/process/next_tick.js:228:7)
2019-05-14T13:32:20.932Z c48d4975-d571-46d7-9835-9a7f84e0300f Error: Interrupt
at /var/task/initializer.js:68:27
at
at process._tickDomainCallback (internal/process/next_tick.js:228:7)
2019-05-14T13:32:20.932Z c48d4975-d571-46d7-9835-9a7f84e0300f Error: Interrupt
at /var/task/initializer.js:68:27
at
at process._tickDomainCallback (internal/process/next_tick.js:228:7)
2019-05-14T13:32:20.932Z c48d4975-d571-46d7-9835-9a7f84e0300f Error: Interrupt
at /var/task/initializer.js:68:27
at
at process._tickDomainCallback (internal/process/next_tick.js:228:7)
2019-05-14T13:32:20.932Z c48d4975-d571-46d7-9835-9a7f84e0300f Error: Interrupt
at /var/task/initializer.js:68:27
at
at process._tickDomainCallback (internal/process/next_tick.js:228:7)
2019-05-14T13:32:20.934Z c48d4975-d571-46d7-9835-9a7f84e0300f
{
"errorMessage": "Interrupt",
"errorType": "Error",
"stackTrace": [
"/var/task/initializer.js:68:27",
"",
"process._tickDomainCallback (internal/process/next_tick.js:228:7)"
]
}

END RequestId: c48d4975-d571-46d7-9835-9a7f84e0300f
REPORT RequestId: c48d4975-d571-46d7-9835-9a7f84e0300f Duration: 1510.06 ms Billed Duration: 1600 ms Memory Size: 128 MB Max Memory Used: 59 MB `

Cleanup after failed execution

I noticed that if a Lambda function runs long (e.g. 200 seconds) in the default non-parallel mode, the Executor will time out, because it runs the invocations in sequence.

Note: the documentation is wrong; the parameter is parallelInvocation, not enableParallel.

The failure results in the Executor exceeding its 300-second run time after the 2nd invocation. If you set num to 10, it would take 200 * 10 seconds to finish, which it never will.

In case of failure, it should still clean up the aliases/versions it created.

Also, the error message is unhelpful; it should say that the Executor timed out, or otherwise indicate that the person should run these in parallel mode.

Error

Lambda.Unknown
Cause

The cause could not be determined because Lambda did not return an error type.

Side note: it would be nice to finish the remaining configurations instead of cancelling on the first failure. If you have an array of power values from 128 upward and 128 is too small, it breaks the testing for all the rest.

Question around parallelInvocations

When I run the framework with parallelInvocation=false, the invocation times are great, in the millisecond range. But when we run with the same flag set to true, times are in the hundreds of seconds. Ours is a business-critical app and it needs to stay under 75 ms. Can you please explain a bit more how parallelInvocation is implemented and what happens under the hood? Though reviewing the code is easy, please share your thoughts and insights as well, so we can learn and adjust our code accordingly.

Document IAM role permissions

As mentioned in #5, the Resource attribute of the default IAM role could be restricted so that the state machine can only interact with the configured Lambda Function(s).

Resource is set to * by default because the original goal was to provide any lambdaARN at runtime.

We should document how to update such configuration manually, or eventually implement an additional parameter at generation-time.

The new parameter could look like this:

$ npm run generate -- -A ACCOUNT_ID -L arn:aws:lambda:*:*:function:MyFunctionName

Improve weighted payload logging in case of invocation error

I prepared a function for each CRUD operation on my data.
I wanted to collect statistics for the delete function, but the record referenced in the payload is deleted by the first execution, and an error occurs from the second invocation onward.
To avoid this, I prepared more records than the total number of executions and gave every payload a weight of 1, but the execution failed with an "Invalid payload weight (num is too small)" error.

Please tell me a good way to handle such a function.

Weighted optimization strategy

Based on @pavelloz's feedback in #31, we could have a configurable weight between speed and cost.

Since there are many different use cases and very subjective ways to optimize for cost vs. speed, such a weight would need to be very well documented, imho.

In the long-term, we might be able to "categorize" a given function into some sort of optimization class based on the speed-cost relation across memory configurations, and come up with a globally optimal strategy for each class.

FYI @matteo-ronchetti is already working on the first iteration of this :)

Add state machine invocation command

For now, you have to manually start the state machine and provide the correct input.

There should be a simple command that would take care of:

  • generate the input object based on user-provided params
  • start the state machine and monitor its status
  • fetch the state machine output and clearly visualize it
  • eventually, the script could also set the new power level to the optimal one (or reset it to the original value)

Use the correct regional base price

The base price for Lambda executions (128MB, 100ms) is $0.0000002083 in almost every region.

Here are the regions where the price is slightly different:

  • Hong Kong (ap-east-1): $0.0000002865 (+37%)
  • Cape Town (af-south-1): $0.0000002763 (+32%)
  • Bahrain (me-south-1): $0.0000002583 (+24%)

This difference should be considered in two places:

  • in the state machine output results.stateMachine.lambdaCost
  • in the visualization (charts)

We should update the utilities utils.computePrice(...) and utils.computeTotalCost(...), used by the Executor function.

Thanks #75 for bringing this up.
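A possible fix, sketched below with illustrative names (not the project's actual code): keep a small map of regional base prices and fall back to the common default.

```javascript
// Lambda base price for 128MB per 100ms billing unit.
const DEFAULT_BASE_PRICE = 0.0000002083;
const REGIONAL_BASE_PRICES = {
    'ap-east-1': 0.0000002865,  // Hong Kong
    'af-south-1': 0.0000002763, // Cape Town
    'me-south-1': 0.0000002583, // Bahrain
};

function lambdaBasePrice(region) {
    return REGIONAL_BASE_PRICES[region] || DEFAULT_BASE_PRICE;
}

function computePrice(region, memoryMB, billedMs) {
    // price scales linearly with memory; billing granularity was 100ms
    const billedUnits = Math.ceil(billedMs / 100);
    return lambdaBasePrice(region) * (memoryMB / 128) * billedUnits;
}
```

The same lookup would need to reach the visualization layer so the charts use the region's price as well.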

Is the relation between RAM configuration and cost per ms correct?

Hi Alex!

After some testing with the tool I have seen incorrect numbers on the graph.

I have compared the time/cost results of one Lambda execution in my region (EU, Ireland) with the values that appear when hovering over the graph. And...

Here my calculation:

  1. Data from AWS Lambda pricing (EU, Ireland):
  • RAM configuration: 128 MB
  • Cost per 100 ms: $0.0000002083
  • Cost per 1 ms: $0.0000000020830
  2. State machine execution graph results:
  • Size: 128 MB
  • Time: 366 ms
  • Cost: $0.00000083
    image

I guess the cost is incorrect because...

Cost with Lambda pricing (EU, Ireland): 366 ms * $0.0000000020830 per ms = $0.0000007623780

Also with other RAM configurations, the comparison with my calculations fails...

I don't know if I'm doing something wrong... It was only a doubt during my research about Lambda performance. 👍

Thank you in advance!
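The gap is most likely just billing granularity: at the time, Lambda billed in 100 ms increments, so a 366 ms execution is billed as 400 ms. A quick check using the base price quoted above:

```javascript
// 366ms rounds up to 4 billing units of 100ms each.
const basePrice = 0.0000002083;            // EU (Ireland), 128MB per 100ms
const billedUnits = Math.ceil(366 / 100);  // 4
const cost = billedUnits * basePrice;      // 0.0000008332, i.e. ~$0.00000083
```

This matches the value shown on the graph, so the tool appears to be using billed duration rather than raw duration.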

Invocation error with Cannot read property of undefined

Though the execution succeeded, it shows an error and an empty object in the output.

The input I passed is below, as mentioned in the guide (I changed the ARN to my Lambda's ARN).

{
  "lambdaARN": "your-lambda-function-arn",
  "powerValues": [128, 256, 512, 1024, 2048, 3008],
  "num": 10,
  "payload": {},
  "parallelInvocation": true,
  "strategy": "cost"
}

Should I pass a payload?

Here is the complete error:

{
"error": "Error",
"cause": {
"errorType": "Error",
"errorMessage": "Invocation error: {"errorType":"TypeError","errorMessage":"Cannot read property 'id' of undefined","trace":["TypeError: Cannot read property 'id' of undefined"," at Runtime.module.exports.get [as handler] (/var/task/todos/get.js:13:32)"," at Runtime.handleOnce (/var/runtime/Runtime.js:66:25)"]}",
"trace": [
"Error: Invocation error: {"errorType":"TypeError","errorMessage":"Cannot read property 'id' of undefined","trace":["TypeError: Cannot read property 'id' of undefined"," at Runtime.module.exports.get [as handler] (/var/task/todos/get.js:13:32)"," at Runtime.handleOnce (/var/runtime/Runtime.js:66:25)"]}",
" at /var/task/executor.js:114:19",
" at processTicksAndRejections (internal/process/task_queues.js:97:5)",
" at async Promise.all (index 1)",
" at async runInParallel (/var/task/executor.js:119:5)",
" at async Runtime.module.exports.handler (/var/task/executor.js:31:19)"
]
}
}

ResourceNotFoundException: Function not found

I see the error below when I run power tuning with parallelInvocation: false. But when I run with parallelInvocation: true, it works.

MyInput:

{
  "<LambdaARN>",
  "powerValues": [128, 256, 512, 1024, 2048, 3008],
  "num": 100,
  "payload": {
    "headers": {
      "Authorization": "<Auth Token>",
      "x-api-key": "<API Key>"
    }
  },
  "parallelInvocation": false,
  "strategy": "balanced",
  "balanceWeight": 0.5
}

Error:

{
  "error": "Lambda.Unknown",
  "cause": "The cause could not be determined because Lambda did not return an error type."
}
{
  "error": "ResourceNotFoundException",
  "cause": {
    "errorType": "ResourceNotFoundException",
    "errorMessage": "Function not found: arn:aws:lambda:<region>:<accno>:function:GetCustomerProfile:RAM256",
    "trace": [
      "ResourceNotFoundException: Function not found: arn:aws:lambda:<region>-<accno>:function:GetCustomerProfile:RAM256",
      "    at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:51:27)",

Can you please help on this?

Default nodejs version runtime

Looking at https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html

I wonder if https://github.com/alexcasalboni/aws-lambda-power-tuning/blob/master/template.yml#L27 should be 10.x by default.

I know there are differences between Amazon Linux and Amazon Linux 2, and in my own case I could not migrate one Lambda that uses aws-chrome-lambda because of incompatibilities.

But there is a case to be made for either setting it to the current AWS-recommended version, or adding a sentence about it in the documentation as a heads-up. :)

Or maybe I don't understand how it works yet; I'm not the best at reading SAM/CloudFormation manifests.

Refactor Node.js code (new ES)

The current implementation is not very readable because of promise & callback hell.

I'd like to refactor it to use async/await syntax.

Can we use the tool for stress testing?

If you test only one power configuration and use a very large num, this tool could be used for stress testing Lambda functions and visualize average cost and execution time.

We could design a different visualization to be used when testing only one power configuration, where we could visualize more detailed statistics.

Compute and report total cost of state machine execution

The state machine could return its own total cost of execution.

I think this would add more transparency to Lambda Power Tuning.

The cost should include both Lambda execution costs and Step Functions execution costs (even though the max step transitions is always around 15-20, which means less than $0.0005 per state machine execution).

Depending on the value of num, Lambda costs will likely outweigh Step Functions cost. For example, even with a "no-op" function and num: 100, we can expect the overall Lambda cost to be around $0.001.

I will add some more documentation about costs too.

Missing or empty optimal value

Running the tuning app with version 3.2.3, I get an error possibly related to the new dryRun parameter:

{
  "errorType": "Error",
  "errorMessage": "Missing or empty optimal value",
  "trace": [
    "Error: Missing or empty optimal value",
    "    at validateInput (/var/task/optimizer.js:40:15)",
    "    at Runtime.module.exports.handler (/var/task/optimizer.js:14:5)",
    "    at Runtime.handleOnce (/var/runtime/Runtime.js:66:25)"
  ]
}

when the input is:

{
  "lambdaARN": "lambdaARN",
  "powerValues": [
    128,
    256,
    512,
    1024,
    2048,
    3008
  ],
  "num": 30,
  "payload": {},
  "strategy": "speed",
  "dryRun": true,
  "parallelInvocation": false,
  "stats": [
    {
      "averagePrice": 2.08e-7,
      "averageDuration": 36.37111111111112,
      "totalCost": 0.0000066560000000000045,
      "value": 128
    },
    {
      "averagePrice": 4.16e-7,
      "averageDuration": 2.3211111111111116,
      "totalCost": 0.000012480000000000008,
      "value": 256
    },
    {
      "averagePrice": 8.32e-7,
      "averageDuration": 2.4144444444444444,
      "totalCost": 0.000024960000000000015,
      "value": 512
    },
    {
      "averagePrice": 0.000001664,
      "averageDuration": 2.355,
      "totalCost": 0.00004992000000000003,
      "value": 1024
    },
    {
      "averagePrice": 0.000003328,
      "averageDuration": 2.364444444444444,
      "totalCost": 0.00009984000000000006,
      "value": 2048
    },
    {
      "averagePrice": 0.0000048880000000000005,
      "averageDuration": 2.255555555555555,
      "totalCost": 0.00014664000000000005,
      "value": 3008
    }
  ],
  "analysis": null
}

I'm not sure why the analysis field is null. I don't define it in my input to the Step Function, so whatever generates it seems to have a problem with dryRun.
Removing dryRun, the run works fine.

Regional base price selection (Step Functions)

Similarly to #77, we should use the correct regional price of Step Functions based on where the state machine is executed (which might be different from the input function's region!).

Each state transition costs $0.000025 in almost every region, i.e. $0.025 per 1,000 transitions. Here are the prices per 1,000 transitions, including the exceptions:

  • default: $0.025
  • eu-south-1: $0.02625
  • us-west-1: $0.0279
  • af-south-1: $0.02975
  • ap-east-1: $0.0275
  • ap-south-1: $0.0285
  • ap-northeast-2: $0.0271
  • eu-west-3: $0.0297
  • me-south-1: $0.0275
  • sa-east-1: $0.0375
  • us-gov-east-1: $0.03
  • us-gov-west-1: $0.03

The state machine execution cost is computed here:

module.exports.stepFunctionsCost = (nPower) => +(0.000025 * (6 + nPower)).toFixed(5);

The formula to compute the # of state transitions is: 6 + COUNT(POWERVALUES), therefore the Step Functions cost will be REGIONAL_COST * NUMBER_OF_TRANSITIONS.
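A regional version of the formula could look like this (sketch with illustrative names; the price table would carry the exceptions listed above):

```javascript
// Price per 1,000 state transitions; 'default' covers most regions.
const SFN_PRICE_PER_1K = {
    'default': 0.025,
    'us-west-1': 0.0279,
    'af-south-1': 0.02975,
    'ap-east-1': 0.0275,
    // ...remaining exceptions from the list above
};

function stepFunctionsCost(nPower, region) {
    const per1k = SFN_PRICE_PER_1K[region] || SFN_PRICE_PER_1K['default'];
    const transitions = 6 + nPower; // 6 fixed steps + one branch per power value
    return +((per1k / 1000) * transitions).toFixed(5);
}
```

For the default region and six power values this yields the same result as the current hard-coded formula.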

Customizable execution timeout

Currently, the Executor timeout is 300 seconds (5 minutes) and there is no way to customize it if you deploy via SAR.

Also, the same timeout should be configured on the state machine task to make sure the timeout error is properly handled (otherwise Lambda.Unknown is detected instead of States.Timeout).

It could be a simple CloudFormation parameter, used for both values.

3.1.1: Execution output: {} after running execute.sh

Steps:

  1. ./deploy.sh
  2. Result (already ran the deploy, just wanted to make sure it was deployed):
Waiting for changeset to be created..
Error: No changes to deploy. Stack lambda-power-tuning is up to date
  3. ./execute.sh
  4. Result:
-n .
// etc
SUCCEEDED
Execution output:
{}

Checked the logs: aws stepfunctions get-execution-history --profile default --execution-arn $EXECUTION_ARN

There are 96 entries in the logs like:

        {
            "timestamp": 1578505766.01,
            "type": "LambdaFunctionFailed",
            "id": 124,
            "previousEventId": 96,
            "lambdaFunctionFailedEventDetails": {
                "error": "Error",
                "cause": "{\"errorType\":\"Error\",\"errorMessage\":\"Invocation error: {\\\"errorType\\\":\\\"string\\\",\\\"errorMessage\\\":\\\"{\\\\\\\"statusCode\\\\\\\":\\\\\\\"500\\\\\\\",\\\\\\\"message\\\\\\\":\\\\\\\"An unexpected error occurred\\\\\\\"}\\\",\\\"trace\\\":[]}\",\"trace\":[\"Error: Invocation error: {\\\"errorType\\\":\\\"string\\\",\\\"errorMessage\\\":\\\"{\\\\\\\"statusCode\\\\\\\":\\\\\\\"500\\\\\\\",\\\\\\\"message\\\\\\\":\\\\\\\"An unexpected error occurred\\\\\\\"}\\\",\\\"trace\\\":[]}\",\"    at utils.range.map (/var/task/executor.js:67:19)\",\"    at process._tickCallback (internal/process/next_tick.js:68:7)\"]}"
            }
        },

So it appears to have failed, but the failure wasn't caught, and an empty output was returned.

Config:

{
    "lambdaARN": "arn:aws:lambda:us-west-2:etcetc",
    "powerValues": "ALL",
    "num": 5,
    "parallelInvocation": true,
    "strategy": "speed",
    "payload": [ {...} ]
}

Tried with explicit powerValues and parallelInvocation: false as well.

Payload does not support GET Methods

I am trying to test Lambda functions that expect query parameters. I am passing the params via the payload, but it throws invocation errors with null object references.
Our functions expect query parameters. Any chance we can support query params passed within the payload?

memorySize of the Lambdas should be 128

There is no reason for the tester functions to be so big:

functions:
  initializer:
    handler: lambda/initializer.handler
    memorySize: 128
    timeout: 60
  executor:
    handler: lambda/executor.handler
    memorySize: 128
    timeout: 300
  cleaner:
    handler: lambda/cleaner.handler
    memorySize: 128
    timeout: 60
  finalizer:
    handler: lambda/finalizer.handler
    memorySize: 128
    timeout: 60

Integrate stats visualization (chart)

@matteo-ronchetti has developed a simple web interface that we can integrate into the state machine output. This way, users can simply click on a link/URL and visualize useful numbers about cost and performance.

This should be easy to implement in the finalizer function, or maybe as a third parallel step.

I've considered making this an opt-in feature, but I think most users will benefit from it and I can't see any relevant data privacy concern since you can simply not click on that link.

The UI is currently hosted as an Amplify Console app here: https://master.d19f2a8daatc3f.amplifyapp.com

You can provide input data including it in the URL hash: https://master.d19f2a8daatc3f.amplifyapp.com/index.html#gAAAAQACAAQABg==;AACAQQAAAEEAAIBAMzMzQGZmBkA=;CtcjPG8SAzwK16M7vHQTPKabRDw=

The hash structure is as follows: <encode(power_values)>;<encode(execution_time)>;<encode(execution_cost)>.

For example:

let sizes = [128, 256, 512, 1024, 1536];
let times = [16.0, 8.0, 4.0, 2.8, 2.1];
let costs = [0.01, 0.008, 0.005, 0.009, 0.012];
let hash = encode(sizes, Int16Array) + ";" + encode(times) + ";" + encode(costs);

where

const base64js = require('base64-js');

function encode(input, c = Float32Array) {
    input = new c(input);
    if (!(input instanceof Uint8Array)) {
        input = new Uint8Array(input.buffer)
    }
    return base64js.fromByteArray(input);
}

Optimise multiple functions at once

As stated in #55, it would be nice to be able to optimize multiple chained functions at once. @alexcasalboni stated that, in his opinion, the optimum for the overall chain would correlate with the optima of the individual functions.

I researched a bit, and there is research (a bachelor thesis I sadly can't share) showing that the Pareto optimum for all functions can indeed differ from simply optimizing each function individually.

It would be nice to be able to optimize a chain of functions, or even a Step Functions state machine, with this tool.

API Gateway end-to-end testing?

How could we support this?

Would it be mutually exclusive wrt Lambda or maybe a totally independent branch?

APIGW comes with many configurations that might make performance tuning less reliable such as caching, WAF, endpoint type (regional or edge-optimized), etc.

We could simply invoke the API endpoint instead of the Lambda function, but I'm not 100% sure of what the benefit would be.

Set of test payloads

Let's take the hello world of the Lambda world as an example: compressing an image from S3 using sharp.

Having a function that does a complicated thing which depends so heavily on its input forces us to test it with multiple different variants, i.e.:

  • big file
  • small file
  • format 1, format 2, format 3
  • extremely big file
  • transformations passed as options

It would be nice if Power Tuning could take multiple events as inputs and, before recommending a power value, take into consideration the output of all the different tests.

Ideally, I could specify the share of runs for any given test (e.g. medium file with JPEG format: 50%, extremely big file: 3%, transformations: 5%), so that the extremely big file would not skew the results too much (averages like to do that).
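For illustration, a weighted input could look like the sketch below (field names are an assumption about how such a feature might be shaped; weights are relative shares, reusing the percentages above):

```json
{
  "lambdaARN": "your-lambda-function-arn",
  "powerValues": [128, 256, 512, 1024],
  "num": 100,
  "payload": [
    { "payload": { "test": "medium-jpeg" }, "weight": 50 },
    { "payload": { "test": "extremely-big-file" }, "weight": 3 },
    { "payload": { "test": "with-transformations" }, "weight": 5 }
  ]
}
```

Each payload would then receive a share of the num invocations proportional to its weight, so rare-but-heavy inputs inform the recommendation without dominating the average.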

AWS no longer supports nodejs4.3

Deploying gives this error: An error occurred: CleanerLambdaFunction - The runtime parameter of nodejs4.3 is no longer supported for creating or updating AWS Lambda functions. We recommend you use the new runtime (nodejs8.10) while creating or updating functions. (Service: AWSLambdaInternal; Status Code: 400; Error Code: InvalidParameterValueException; Request ID: aa86828b-4748-11e9-9731-7d5c960221fe)

Changing all references to be nodejs8.10 appears to work
