
How to deploy custom ML models on AWS Sagemaker

September 14, 2021

Different Paths to deployment 

Deploying a model to production usually involves a few components:

  1. Storage for the model artifacts
  2. The inference code that uses the model artifacts and an estimator object to make predictions
  3. A web application that accepts HTTP requests and routes them to the inference code

The first can either sit on the same server or be served through dedicated storage such as AWS S3, Google Cloud Storage, or on-premise storage.

The inference code usually lives in the same codebase as the serving web application.

The web application will also handle authentication and authorization of the resources. 

The web application should ideally also handle load balancing to distribute the incoming requests. Depending on how frequently inferences are requested, this can be a deal-breaker in serving your customers.
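To make the three components concrete, here is a dependency-free sketch that collapses them into one tiny WSGI service (the repo used later in this article is built on Flask instead; the model, weights, and routes below are made up):

```python
import io
import json

# 1. The "model artifact" -- in practice loaded from local disk or S3.
MODEL = {"weights": [0.2, -0.1, 0.4, 0.3]}

# 2. Inference code that uses the artifact.
def predict(features):
    return sum(w * x for w, x in zip(MODEL["weights"], features))

# 3. A web application that routes HTTP requests to the inference code.
def app(environ, start_response):
    if environ.get("PATH_INFO") == "/ping":  # health check
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]
    if environ.get("PATH_INFO") == "/invocations":  # inference route
        size = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(size).decode()
        features = [float(v) for v in body.split(",")]
        start_response("200 OK", [("Content-Type", "application/json")])
        return [json.dumps({"prediction": predict(features)}).encode()]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]

# Exercise the app in-process, without starting a server.
def call(path, body=b""):
    environ = {"PATH_INFO": path, "CONTENT_LENGTH": str(len(body)),
               "wsgi.input": io.BytesIO(body)}
    captured = {}
    def start_response(code, headers):
        captured["status"] = code
    out = b"".join(app(environ, start_response))
    return captured["status"], out
```

The /ping and /invocations routes are not arbitrary: they mirror the contract SageMaker expects from a serving container, which is why they reappear later in this article.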

At Curacel we take a fully-managed AWS approach using its SageMaker platform. SageMaker provides a complete end-to-end ML platform that enables you to label data, train, deploy, and monitor models, all leveraging the cloud. It also supports situations where you have trained your model on another machine and just want to take advantage of its managed inference infrastructure.

In this article, we detail the steps you can take to deploy a custom machine learning model, all the way to serving it through an API.

Steps on how to successfully deploy your model using AWS SageMaker:

In this section, I will give you the basic practical know-how for the operations involved. To make this process easy to follow, we have set up a GitHub repo that follows the original repo (found here) but uses a different model. First of all, we need to understand the whole process, and the diagram below aims to explain it.

According to the diagram, we assume your ML model provides some sort of prediction for a mobile client. This could also be a web application or any other system. The mobile client talks directly to a Laravel backend through a RESTful API.

To enable your machine learning service to work with the backend, you need to build a separate service for it. By doing this, you can easily make changes to your model without obstructing the flow of operations of your Laravel backend. This is one of the benefits of a microservice architecture.

The first stage of this process is to containerize your model in a Flask application using Docker. After successful builds and testing, you push the resulting Docker image to Amazon's Elastic Container Registry (ECR). Next, you create a notebook using AWS SageMaker's notebook instances to interact with the Docker image on ECR. Once the connection is successful, you create a SageMaker endpoint linked to your SageMaker notebook. Using a Lambda function, we invoke the SageMaker endpoint, and finally we use AWS API Gateway to expose a JSON interface that the backend will talk to.

This is the API that your backend hits when your prediction algorithm is needed in action. As we go on, I will explain how making updates to the model will work.
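The flow just described can be reduced to stub functions to make the hand-offs concrete. In production each call below is a network hop (an HTTPS request, a Lambda invocation, an endpoint invocation); every name and value here is hypothetical:

```python
def sagemaker_endpoint(csv_payload):
    """The model container behind the SageMaker endpoint."""
    return "setosa"  # canned prediction, for illustration only

def lambda_handler(event, context=None):
    """The Lambda function that invokes the endpoint."""
    return {"statusCode": 200, "body": sagemaker_endpoint(event["body"])}

def api_gateway(http_body):
    """The API Gateway route that fronts the Lambda."""
    return lambda_handler({"body": http_body})

def laravel_backend(features):
    """What the mobile client talks to over the RESTful API."""
    response = api_gateway(",".join(str(f) for f in features))
    return response["body"]

prediction = laravel_backend([5.1, 3.5, 1.4, 0.2])
```

Keeping each stage behind its own interface like this is what lets you swap the model out later without touching the backend.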

  1. Setting up and testing locally:

This is the first stage of deploying with AWS. To get this section started, here are some things you need on your system: Postman, Docker, and a virtual machine for Docker if you are using Windows. You can clone this repo. It contains all the files needed for deployment, and the folder structure is designed so that AWS can access the files it needs.

Looking through the folder you will see a Dockerfile. It specifies the language, libraries, and other dependencies needed to run your model in the container. For instance, if you want to use a neural network, the Dockerfile is where you install the required libraries. At the end of the Dockerfile, where you see “COPY ftree /opt/program”, ftree is the name of the directory that contains the train, serve, prediction, and other files. If you would like to rename that folder, make sure to replace ftree with the new name in the Dockerfile.

Before building any Docker images, here are some things to do, especially if you are using Windows. When you clone this repo to your local machine, the folder will be named “rforest-aws-container”. On your Git Bash CLI, type:

find rforest-aws-container -type f -exec dos2unix {} \;

This converts all the files from DOS/Mac format to Unix format.

It is really important, especially for Windows users, to use the Git Bash CLI.
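What dos2unix does is mechanical: it rewrites Windows-style CRLF line endings as Unix LF. This matters because train and serve are scripts the container executes, and a stray \r after the shebang line breaks them inside Linux. In Python terms (a toy equivalent, not a replacement for the tool):

```python
def dos2unix(data: bytes) -> bytes:
    """CRLF -> LF, which is all the conversion amounts to."""
    return data.replace(b"\r\n", b"\n")

converted = dos2unix(b"#!/bin/sh\r\necho training\r\n")
```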

Then, change the directory into the “rforest-aws-container” directory and run the following:

chmod +x ftree/train

chmod +x ftree/serve

Now that this has been completed, you can build the Docker image by running:

docker build -t my-tree . 

Note: “my-tree” is the name of the Docker image; you can name yours whatever you want.

Also, do not forget the “.” after your image name. If you are a Windows user, make sure your virtual machine is active.

When your build is successful, your cli should look like the image below.

You can also run “docker images” to see all the images that have been built successfully.

The next step in this process is to train the model in the container. You do this by running:

(Linux users)

docker run --rm -v  $(pwd)/local_test/test_dir:/opt/ml my-tree train

(windows users)

docker run  --rm -v /$(pwd)/local_test/test_dir:/opt/ml my-tree train 

You should see the following output on your CLI:

Starting the training process ...
Training process is complete.

Next, we deploy the server locally so we can make requests:

(Linux users)

docker run --rm --network=host -v  $(pwd)/local_test/test_dir:/opt/ml my-tree serve

(windows users)

docker run  --rm  --network=host -v /$(pwd)/local_test/test_dir:/opt/ml my-tree serve 

The image below shows that your server is live and you can make requests to it. 

There are two ways to test the server. First, open another Git Bash CLI and type:

curl http://<docker_ip>:8080/ping

To obtain your Docker IP, type the following in the new Git Bash CLI you just opened:

docker-machine ip default

The output obtained would be similar to the following:

On the other terminal, the image below will be displayed:

The second way is to use Postman. Similar to the first method, all you have to do is make a GET request using Postman. The link to make the GET request is:


If you go through the ftree folder, you will see a file that is the Flask application. Going through it, you will see that “/ping” is the GET request path and “/invocations” is the POST request path. To make a POST request, we do the following.

  • In the headers section set the Key to Content-Type and Value to text/csv.
  • Set the body to raw.
  • The data to be sent is :

  –   5.1,3.5,1.4,0.2

The link to make the post request to is:


You should have an output similar to the following image.
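On the server side, the /invocations handler has to turn exactly this text/csv body back into numbers before calling the model. A minimal sketch of that parsing step (illustrative, not the repo's exact code):

```python
import csv
import io

def parse_csv_payload(body):
    """Turn a text/csv request body into rows of floats."""
    return [[float(value) for value in row]
            for row in csv.reader(io.StringIO(body))]

rows = parse_csv_payload("5.1,3.5,1.4,0.2")
```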

Another way of testing the post functionality is to:

  • Press Ctrl+C to stop the running server
  • cd local_test

Then run the following command:

./ <docker_ip>:8080 payload.csv text/csv

The output would be similar to the image below.

Since that works fine, you can change back to the parent directory with:

cd ..

Good, everything works locally.

Note: After your first successful build, if somewhere along the line you decide to change the code, you have to rebuild the image for the change to take effect. The only thing to note is that when rebuilding an image, you have to add a tag.

An example of such is:

docker build -t my-tree:v.0 .

Now, when you train and serve, you add the tag:

docker run --rm -v  $(pwd)/local_test/test_dir:/opt/ml my-tree:v.0 train

docker run --rm --network=host -v  $(pwd)/local_test/test_dir:/opt/ml my-tree:v.0 serve

  2. Moving to AWS Services:

Creating the ECR:

Now that everything works locally, it is time to push our working image to the AWS Elastic Container Registry (ECR). You can create a repository here.

Scroll to the bottom and click Create repository.

It is best to give your repository the same name as your Docker image.

Now that you have created a repo, you have to add the following permissions to allow you to push to the repository you just created.

Click on permissions:

Then click on Edit Policy JSON

Copy the following and replace what is currently there:

  {
    "Version": "2008-10-17",
    "Statement": [{
      "Sid": "AllowPushPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::< aws_account_id >:user/image-data-access"
      },
      "Action": [
        "ecr:GetDownloadUrlForLayer", "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability", "ecr:PutImage",
        "ecr:InitiateLayerUpload", "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload"
      ]
    }]
  }
Save it.

Go through the link on how to set up AWS CLI on your computer.

For Windows users: you can download the CLI version meant for Windows, but use the CLI commands meant for Linux/macOS as long as you are using your Git Bash CLI.

Back on your AWS ECR page, click “View push commands” to see the commands needed to push your local image to the AWS repo you created.

Once the push is successful you should see the docker image in the repository created:

Copy the Image URI because it will be needed in the next section.

Connecting to Amazon SageMaker Jupyter Notebooks:

Now that we have our repository properly set up, all we need to do is connect it to Amazon SageMaker Jupyter notebooks. Click this link to create a notebook instance and open a Jupyter notebook. You can name it whatever you want. Make your notebook similar to the notebook in this link.

I had issues with the output_path, so I had to create it manually in an AWS S3 bucket.

The training and deployment instance types used are the smallest instance types AWS provides; if your model needs more resources, check out this link for more details on both instance types.

When you do not need the Endpoint anymore, you can delete it with the last line of code.
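Under the hood, the notebook's deploy step boils down to three boto3 calls: create the model from the ECR image, create an endpoint configuration, and create the endpoint. A sketch of that sequence, with every name (model, config, endpoint, role) purely illustrative, and the AWS calls wrapped in a function so nothing runs without credentials:

```python
def production_variant(model_name, instance_type="ml.t2.medium", count=1):
    """Build the ProductionVariants entry that create_endpoint_config expects."""
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "InstanceType": instance_type,
        "InitialInstanceCount": count,
    }

def deploy_endpoint(image_uri, model_data_url, role_arn, name="random-forest"):
    """Create model -> endpoint config -> endpoint; returns the endpoint name.
    Requires AWS credentials and is not executed here."""
    import boto3
    sm = boto3.client("sagemaker")
    sm.create_model(ModelName=name, ExecutionRoleArn=role_arn,
                    PrimaryContainer={"Image": image_uri,
                                      "ModelDataUrl": model_data_url})
    sm.create_endpoint_config(EndpointConfigName=name + "-config",
                              ProductionVariants=[production_variant(name)])
    sm.create_endpoint(EndpointName=name + "-endpoint",
                       EndpointConfigName=name + "-config")
    return name + "-endpoint"
```

Deleting the endpoint when you are done (the notebook's last line) is a single sm.delete_endpoint(EndpointName=...) call; it matters because a live endpoint is billed for every hour it runs.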

Connecting to Amazon Lambda Function:

Head over to AWS Lambda here and create a new function.

When you have created the Lambda function, you can paste the following into the Lambda code section. Then click Deploy.

import os
import io
import boto3
import json
import csv

# grab environment variables
ENDPOINT_NAME = os.environ['ENDPOINT_NAME']
# grab runtime client
runtime = boto3.client('runtime.sagemaker')

def lambda_handler(event, context):
    # Load data from POST request
    data = json.loads(json.dumps(event))
    # Grab the payload
    payload = data['body']
    # Invoke the model. In this case the data type is a CSV but can be other things such as a JSON
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='application/json',
                                       Body=payload)
    # Get the body of the response from the model
    result = response['Body'].read().decode()

    # Return it along with the status code of 200, meaning the call was successful
    return {
        'statusCode': 200,
        'body': result
    }
Note: I purposely set ContentType=’application/json’; this is done in order to demonstrate the functionality of CloudWatch later.

In the code, you will notice an ENDPOINT_NAME environment variable. To set it properly, click on Configuration, then Environment variables. Set ENDPOINT_NAME to the name of your SageMaker endpoint and save it.
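Before wiring up IAM and API Gateway, you can sanity-check the handler's logic locally by injecting a stub in place of the boto3 runtime client. Everything here (the endpoint name, the canned "setosa" response) is illustrative:

```python
import json

class FakeBody:
    """Mimics the streaming body returned by invoke_endpoint."""
    def __init__(self, data):
        self._data = data
    def read(self):
        return self._data

class FakeRuntime:
    """Stand-in for boto3's 'runtime.sagemaker' client."""
    def invoke_endpoint(self, EndpointName, ContentType, Body):
        return {"Body": FakeBody(b"setosa")}  # canned prediction

def handle(event, runtime):
    """The same logic as the Lambda handler, with the client injected."""
    payload = json.loads(json.dumps(event))["body"]
    response = runtime.invoke_endpoint(EndpointName="my-endpoint",
                                       ContentType="text/csv",
                                       Body=payload)
    return {"statusCode": 200, "body": response["Body"].read().decode()}

result = handle({"body": "5.1,3.5,1.4,0.2"}, FakeRuntime())
```

This catches payload-shape mistakes before you pay the round trip of deploy, request, and CloudWatch log dive.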

Now that we have that handled, head over to AWS IAM here to add some permissions to your Lambda function. Click on Roles, then select your Lambda function's role.

You should see a page similar to the page below:

Click on Attach policies, and add AmazonSageMakerFullAccess.

After all that has been done, click on the AWSLambdaBasicExecutionRole, click JSON, add “sagemaker:InvokeEndpoint” in the Action section, then click Review policy and save your changes.

All that remains is setting up API Gateway and viewing the logs on CloudWatch for debugging, if needed.

Connecting to API Gateway:

The last part of this connection is to create an API to interact with the Lambda function. Head over to AWS API Gateway here and create an API.

Choose HTTP API and click build. 

Add the lambda Integration.  

Set the configuration route, 

And add a stage, in this case, “production”. Set it to Auto-deploy.

Click create.

The production API should be provided below.

On the left-hand side of the page click CORS and set the following parameters as seen below.

Then click Save.

Now we can make a request to the endpoint we created. Head over to Postman and replace the local link with the new link.

The link should be of the following format:

Note: This link may no longer be active by the time you read this.

Now, send a request, your postman screen should look like this: 

  3. Debugging with CloudWatch:

Head over to CloudWatch with this link, click Log groups, and search for /aws/lambda/myForestFunction

View the logs to understand where the issue is coming from. In this case, it says “This predictor only supports CSV data”. Now we can go back to the Lambda function for the project, set ContentType=’text/csv’, and click Deploy.
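That error message comes from a content-type guard in the serving code, which rejects anything that is not CSV. Conceptually (illustrative names and status codes, not the repo's exact code):

```python
def check_content_type(content_type):
    """Accept only CSV payloads, as the predictor does."""
    if content_type == "text/csv":
        return 200, "ok"
    return 415, "This predictor only supports CSV data"

status, message = check_content_type("application/json")
```

With ContentType set to ’text/csv’ in the Lambda, the guard passes and the request reaches the model.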

Make the request again, and your postman should look like this:

There are times when your model takes a while to return a prediction. To account for this, head over to the Lambda function, click Configuration, then General configuration, and edit the timeout value.

Things to note: 

  1. If the Lambda logs are not precise enough, you can check the SageMaker logs: in the log groups, search for /aws/sagemaker/Endpoints/random-forest-classification-endpoint-v
  2. If fixing a bug involves going back to your local computer and re-pushing, make sure to terminate the old endpoint and also replace the old image with the new image in the notebook.

That is it: you have deployed your custom model on AWS.

The authors:

Innocent Udeogu is the Head of Innovation at Curacel and leads the Data Engineering team.

Kenechi Ojukwu is a data scientist working on end-to-end machine learning projects at Curacel.