Lambda Pattern: Hopper

Reusable patterns for Lambda

Hopper:

  1. A container for a loose bulk material such as grain, rock, or rubbish, typically one that tapers downward and is able to discharge its contents at the bottom.
  2. A person or thing that hops.

A simple pattern I’ve been using lately when working with serverless architecture is what I’ve been calling a hopper: a Lambda function that takes as its argument a path to some semi-structured data, iterates over it, and passes each piece on to another Lambda function that performs some well-defined, isolated task. The results can then be passed on to some other medium for display or further processing.
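
Concretely, the event that triggers the hopper only needs to carry two pieces of information: the bucket holding the data and the name of the function to fan out to. Something like the following, where both names are placeholders:

{ "config_bucket": "my-config-bucket", "lambda_function": "my-target-function" }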

The main reasons behind this pattern are:

  • Breaking up a Lambda function that hits the five-minute runtime limit,
  • Getting around request rate limits on S3: if you’re making queries that involve a lot of objects, break them into chunks,
  • Promoting simpler, modular code with a well-defined purpose,
  • Having defined patterns when working with code-based infrastructure, so you’re not reinventing the wheel.

The great thing about this pattern is that it’s pretty easy to set up if you’ve used serverless architecture before. If you’re looking for a good first project with Lambda, you could check out my prior blog post about managing the tag configuration of your AWS instances. The hopper pattern is also a good first step if you need to do some bulk processing but want to start small: the hopper can execute as many other Lambda functions as it can within its five-minute window, running them in parallel (remembering Lambda has a default limit of 100 concurrent executions). This could be a sensible stepping stone before setting something up on EC2 or EMR.
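
To make the fan-out side concrete, here is a minimal sketch of what a worker function on the receiving end might look like. The hopper passes its own event through with a dataset_prefix key added; everything else here (the object fetch and the process placeholder) is hypothetical and stands in for your actual task:

import boto3

s3 = boto3.client('s3')

def process(data):
    # Placeholder for your well-defined, isolated task.
    pass

def handler(event, context):
    # The hopper forwards its own event, adding the S3 key of the
    # chunk this invocation is responsible for.
    bucket = event['config_bucket']
    key = event['dataset_prefix']
    # Hypothetical processing: fetch the object and hand it off.
    obj = s3.get_object(Bucket=bucket, Key=key)
    process(obj['Body'].read())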

In this article I’ll go through this reusable pattern, using S3 as the holding place for data sets, a Lambda function with a Python handler as the hopper, the related roles and permissions, and a CloudWatch Event Rule that triggers the Lambda function on a regular schedule. I’ll also include scripts to automate the creation and destruction of the infrastructure. To begin with, we’ll look at the Python code for the hopper.

Python Handler Code

The handler code is very straightforward: I check the event payload to confirm that the bucket containing the configuration is present, as is the name of the Lambda function to invoke. We then iterate over the dataset and send a request through Boto to invoke the passed function. So far, so good.

import boto3
import json

# Clients are created once at module level so they can be reused
# across warm invocations.
s3 = boto3.resource('s3')
lambda_client = boto3.client('lambda')

def handler(event, context):
    print 'Event:', event
    if 'config_bucket' not in event:
        raise Exception('Missing config_bucket')
    if 'lambda_function' not in event:
        raise Exception('Missing lambda_function')

    bucket = s3.Bucket(event['config_bucket'])
    # Note: list_objects returns at most 1000 keys per call; for larger
    # buckets you would need to paginate using the returned Marker.
    result = bucket.meta.client.list_objects(
        Bucket=bucket.name
    )
    print result
    if not result.get('Contents'):
        raise Exception('Missing S3 content')
    for dataset_object in result.get('Contents'):
        event['dataset_prefix'] = dataset_object['Key']
        # InvocationType='Event' invokes the target asynchronously, so
        # the hopper doesn't wait on each worker to finish.
        response = lambda_client.invoke(
            FunctionName=event['lambda_function'],
            InvocationType='Event',
            Payload=json.dumps(event)
        )
        print response
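
Once deployed, you can also invoke the hopper by hand with a test payload rather than waiting for the schedule to fire. Here’s a minimal sketch using Boto; the bucket and target function names are placeholders for resources you already have:

import boto3
import json

lambda_client = boto3.client('lambda')

# Placeholder names: substitute your configuration bucket and target function.
payload = {
    'config_bucket': 'my-config-bucket',
    'lambda_function': 'my-target-function'
}

# RequestResponse blocks until the hopper returns, so any error
# surfaces immediately in the response payload.
response = lambda_client.invoke(
    FunctionName='hopper',
    InvocationType='RequestResponse',
    Payload=json.dumps(payload)
)
print response['Payload'].read()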

Now that we have the handler, we need to put it into a compressed format and send it to S3 for access, but before that I’ll define the infrastructure that needs to exist for the Lambda function to operate. To do this I’ll use the CloudFormation templating language, specifying our infrastructure as a code artifact.

Hopper CloudFormation Template

For the hopper to work the following AWS infrastructure resources are required:

  • An IAM role with S3 and Lambda access,
  • A Lambda permission allowing the CloudWatch Event Rule to invoke the hopper,
  • The Lambda function itself,
  • A CloudWatch Event Rule to trigger the hopper on a regular basis, in this case every fifteen minutes.
---
  Description: 'Lambda function Hopper, looks for available datasets and runs a Lambda function.'
  Parameters:
    ConfigBucketName:
      Description: The name of the S3 bucket containing configuration.
      Type: String
    LambdaFunctionName:
      Description: The name of the Lambda function being executed.
      Type: String
  Resources:
    HopperLambdaRole:
      Type: AWS::IAM::Role
      Properties:
        ManagedPolicyArns:
          - arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
          - arn:aws:iam::aws:policy/AWSLambdaFullAccess
        AssumeRolePolicyDocument:
          Statement:
            - Action: sts:AssumeRole
              Effect: Allow
              Principal:
                Service: lambda.amazonaws.com
    HopperLambdaPermission:
      Type: "AWS::Lambda::Permission"
      Properties:
        Principal: "events.amazonaws.com"
        Action: lambda:InvokeFunction
        FunctionName: "hopper"
        SourceArn: !GetAtt HopperEventRule.Arn
    HopperLambdaFunction:
      Type: AWS::Lambda::Function
      Properties:
        FunctionName: "hopper"
        Handler: handler.handler
        Role:
          !GetAtt HopperLambdaRole.Arn
        Code:
          S3Bucket:
            !Ref ConfigBucketName
          S3Key: hopper.zip
        Runtime: python2.7
        MemorySize: "512"
        Timeout: "240"
    HopperEventRule:
      Type: AWS::Events::Rule
      Properties:
        Description: CloudWatch Event Rule to initiate the hopper which initiates the target Lambda function on the datasets in a round robin fashion.
        Name: "hopper"
        ScheduleExpression: "rate(15 minutes)"
        State: ENABLED
        Targets:
          - Arn:
              !GetAtt HopperLambdaFunction.Arn
            Id:
              'Hopper'
            Input:
              !Sub '{ "lambda_function": "${LambdaFunctionName}", "config_bucket": "${ConfigBucketName}" }'

Save this as Hopper.yml. Next we’ll define some scripts that orchestrate the compression and uploading of the handler script, then create a stack with the required infrastructure.
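
Before creating a stack it can be handy to sanity-check the template. A quick sketch using Boto (this catches syntax problems, not logical ones):

import boto3

cf = boto3.client('cloudformation', region_name='ap-southeast-2')

with open('Hopper.yml') as template:
    # validate_template raises a ClientError if the template is malformed.
    print cf.validate_template(TemplateBody=template.read())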

Infrastructure Provisioning Script

Here I’ve written a couple of Bash scripts that deploy our Lambda function and bring it down when it’s no longer needed. Be sure to run chmod +x deploy_hopper.sh && chmod +x cleanup_hopper.sh beforehand so the scripts can be executed. They can then be run as follows:

./deploy_hopper.sh <s3_config_bucket> <lambda_function_name>

and

./cleanup_hopper.sh <s3_config_bucket>

This project does not include the code for the Lambda function you wish to run, nor the S3 bucket containing the configuration to be iterated over; it’s assumed these already exist and can be referenced.

Deploy script: deploy_hopper.sh

Here’s the deploy script. It compresses the handler function and sends it to your configuration bucket to be referenced in the CloudFormation template. It then writes out the CloudFormation parameters and creates a stack from the template and parameter file. Finally it cleans up after itself, removing the parameter file and the local zip.

#!/bin/bash -e

AWS_REGION="--region ap-southeast-2"
s3_config_path=${1}
lambda_function_name=${2}

zip -r9 hopper.zip handler.py
aws s3 cp ${AWS_REGION} hopper.zip s3://${s3_config_path}/
hopper_stack_name=Hopper

cat << EOF > params.json
[
{"ParameterKey":"ConfigBucketName","ParameterValue":"${s3_config_path}"},
{"ParameterKey":"LambdaFunctionName","ParameterValue":"${lambda_function_name}"}
]
EOF

echo $(date -u +"%Y-%m-%dT%H:%M:%S") Creating ${hopper_stack_name}.
aws ${AWS_REGION} cloudformation create-stack \
  --capabilities CAPABILITY_IAM \
  --stack-name ${hopper_stack_name} \
  --template-body file://Hopper.yml \
  --parameters file://params.json
aws ${AWS_REGION} cloudformation wait stack-create-complete --stack-name ${hopper_stack_name}

rm params.json
rm hopper.zip

Decommission script: cleanup_hopper.sh

This script deletes the previously created stack and removes the Lambda package that was uploaded to S3. The final command includes a wait to confirm the delete has completed, in case the script is chained with other scripts; the wait can safely be removed if it isn’t necessary.

#!/bin/bash -e

AWS_REGION="--region ap-southeast-2"
hopper_stack_name=Hopper
s3_config_path=${1}
echo $(date -u +"%Y-%m-%dT%H:%M:%S") Deleting ${hopper_stack_name}.
aws cloudformation ${AWS_REGION} delete-stack --stack-name ${hopper_stack_name}
aws s3 rm ${AWS_REGION} s3://${s3_config_path}/hopper.zip
aws cloudformation ${AWS_REGION} wait stack-delete-complete --stack-name ${hopper_stack_name}

Once created, put these scripts in the same directory as handler.py and Hopper.yml and run the deployment script. You now have a fully automated building block to add to your infrastructure arsenal!
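
If you want to confirm everything landed, here’s a quick check with Boto that the function exists and the event rule is enabled (region assumed to match the deploy script):

import boto3

region = 'ap-southeast-2'  # same region the deploy script targets

print boto3.client('lambda', region_name=region).get_function(
    FunctionName='hopper')['Configuration']['FunctionArn']
print boto3.client('events', region_name=region).describe_rule(
    Name='hopper')['State']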

Summary

As you can see, this pattern can be used to extend your Lambda functions beyond the usual AWS limits. It’s also easily extensible: variables can be added to the code to take into account build numbers or environments. Post your results or errata in the comments; I’m really interested to see how people go with this. I initially got caught out with the Event Rule getting invocation errors, and the Lambda permission specified above saved the day. Big thanks to Rowan Udell for his article on CloudWatch Event Rules!

Finally, as with most AWS resources, creating and running this infrastructure has a cost associated with it. Remember that whatever you’re processing with the hopper and storing in S3 will have a price attached, especially given that the hopper is attached to a repeating event rule. Deprovision the stack once you’re done to save your credit card.

Update (25/02/2017): I’ve added the code to a GitHub repository, setting up a projects pattern for future examples. The code can be found at https://github.com/galactose/ashinycloud-projects.
