AWS Machine Learning Blog

Deploy TensorFlow models with Amazon Elastic Inference using a flexible new Python API available in EI-enabled TensorFlow 1.12

Amazon Elastic Inference (EI) now supports the latest version of TensorFlow, 1.12. It provides EIPredictor, a new, easy-to-use Python API function for deploying TensorFlow models using EI accelerators. You can now use EIPredictor within your inference scripts as an alternative to TensorFlow Serving when running TensorFlow models with EI. EIPredictor also makes it easy to experiment and compare performance with and without EI. This blog post shows you how to use EIPredictor to deploy your models on EI.

Let me start with some background. Amazon Elastic Inference is a new capability we launched at re:Invent 2018. EI provides a significantly more cost-effective way to accelerate your deep learning inference workloads than standalone GPU instances. EI lets you attach accelerators to any Amazon SageMaker or Amazon EC2 instance type and gives you the low-latency, high-throughput benefits of GPU acceleration at a much lower cost (savings of up to 75%). You can use EI to deploy TensorFlow, Apache MXNet, and ONNX models for inference.

Using TensorFlow Serving to run models on EI

At the launch of Amazon EI we introduced EI-enabled TensorFlow Serving, which provides an easy way to run your TensorFlow models with EI accelerators without making any code changes. Just start a model server with EI-enabled TensorFlow Serving, point it at your trained TensorFlow SavedModel, and make calls to it. EI-enabled TensorFlow Serving uses the same API as normal TensorFlow Serving. The only difference is that the entry point is a different binary, named AmazonEI_TensorFlow_Serving_v1.12_v1. Here is an example command that you can use to launch the server:

$ AmazonEI_TensorFlow_Serving_v1.12_v1 --model_name=ssdresnet --model_base_path=/tmp/ssd_resnet50_v1_coco --port=9000
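
After the server is up, you can call it with a standard TensorFlow Serving client, because EI-enabled TensorFlow Serving exposes the same prediction API. Here is a minimal client sketch, assuming the tensorflow-serving-api package is installed, the server above is listening on port 9000, and the model’s serving signature accepts an image batch named inputs (as the SSD model used later in this post does):

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Connect to the EI-enabled TensorFlow Serving process started above
channel = grpc.insecure_channel('localhost:9000')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build a PredictRequest for the model named 'ssdresnet'
request = predict_pb2.PredictRequest()
request.model_spec.name = 'ssdresnet'

# Placeholder input: a single 300x300 RGB image of zeros; replace with real image data
img = np.zeros((1, 300, 300, 3), dtype=np.uint8)
request.inputs['inputs'].CopyFrom(tf.make_tensor_proto(img))

# Send the request with a 60-second timeout and print the returned output tensor names
result = stub.Predict(request, 60.0)
print(result.outputs.keys())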

You can find EI-enabled TensorFlow Serving in the AWS Deep Learning AMIs (here’s a tutorial), or you can download the package from this Amazon S3 bucket so you can build it into your own custom Amazon Machine Image (AMI) or Docker container. EI-enabled TensorFlow Serving extends TensorFlow’s high performance model serving system to work seamlessly with EI. It automates accelerator discovery, secures your inference requests over the network with TLS encryption, and restricts access with AWS Identity and Access Management (IAM) policies.

Using EIPredictor to run models on EI

EIPredictor is a simple Python function for performing inference on a pretrained model. It is a new API function available within EI-enabled TensorFlow. It’s also available in the Deep Learning AMI and for download from Amazon S3. You can use EIPredictor in the following ways:

  • You can use EIPredictor with a saved model or a frozen graph. It’s similar to the TensorFlow Predictor. See the EI documentation for using EIPredictor with these model formats.
  • You can disable EI by setting the use_ei flag, which defaults to True, to False. This is useful for seeing how your model performs with and without EI acceleration (see the latency comparison sketch after this list).
  • EIPredictor can also be created from a TensorFlow Estimator. Given a trained Estimator, you first export a SavedModel. Refer to the SavedModel documentation for more details. Example usage:
    saved_model_dir = estimator.export_savedmodel(my_export_dir, serving_input_fn)
    ei_predictor = EIPredictor(export_dir=saved_model_dir)
    # Once the EIPredictor is created, run inference using the following:
    output_dict = ei_predictor(feed_dict)
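
As an example of the use_ei flag mentioned above, the following sketch compares inference latency with and without EI acceleration. The SavedModel directory and the input batch are placeholders; substitute the directory you exported (for instance, the saved_model_dir produced by the Estimator export shown above) and an input shaped to match your model’s serving signature:

import time
import numpy as np
from tensorflow.contrib.ei.python.predictor.ei_predictor import EIPredictor

# Hypothetical SavedModel directory and input batch; adjust these to your model
saved_model_dir = '/tmp/my_saved_model/1/'
feed_dict = {'inputs': np.zeros((1, 300, 300, 3), dtype=np.uint8)}

for use_ei in (True, False):
    ei_predictor = EIPredictor(saved_model_dir, use_ei=use_ei)
    # Warm-up request: the first call loads the model (and, with EI, the accelerator)
    ei_predictor(feed_dict)
    start = time.time()
    ei_predictor(feed_dict)
    print("use_ei=%s took %f seconds" % (use_ei, time.time() - start))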

The following code sample shows the available parameters for this function:

ei_predictor = EIPredictor(model_dir,
           signature_def_key=None,
           signature_def=None,
           input_names=None,
           output_names=None,
           tags=None,
           graph=None,
           config=None,
           use_ei=True)

output_dict = ei_predictor(feed_dict)
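
If you’re not sure which tensor names to pass as input_names and output_names, you can inspect the SavedModel’s signatures with the saved_model_cli tool that ships with TensorFlow. For example, for the SSD model used later in this post:

$ saved_model_cli show --dir /tmp/ssd_resnet50_v1_coco/1/ --all

If you omit input_names and output_names, EIPredictor uses the model’s default serving signature, as the second predictor in the example below does.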

Example of running a model with EIPredictor

Here’s an example you can try that serves a Single Shot Detector (SSD) model with a ResNet backbone using EIPredictor. This example assumes that you’ve launched an EC2 instance with an EI accelerator attached. We’ll use the latest Deep Learning AMI for this example.

  1. The first step is to activate the TensorFlow Elastic Inference environment. Note that this step is specific to the Deep Learning AMI. You don’t need it if you built the EI-enabled TensorFlow library into your own custom AMI. You can choose between the Python 2 and Python 3 TensorFlow EI environments. I’ll use Python 2 for this example:
    $ source activate amazonei_tensorflow_p27
  2. Download the ResNet SSD model example from Amazon S3.
    $ curl -O https://s3-us-west-2.amazonaws.com/aws-tf-serving-ei-example/ssd_resnet.zip
  3. Unzip the model to /tmp. You can skip this step if you already have the model unzipped there.
    $ unzip ssd_resnet.zip -d /tmp
  4. Download a picture of three dogs to your current directory.
    $ curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/3dogs.jpg
  5. Now open a text editor, such as vim, and paste in the following inference script. Save the file as ssd_resnet_predictor.py.
    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function
    
    import os
    import sys
    import numpy as np
    import tensorflow as tf
    import matplotlib.image as mpimg
    import time
    from tensorflow.contrib.ei.python.predictor.ei_predictor import EIPredictor
    
    tf.app.flags.DEFINE_string('image', '', 'path to image in JPEG format')
    FLAGS = tf.app.flags.FLAGS
    if(FLAGS.image == ''):
      print("Supply an Image using '--image [path/to/image]'")
      exit(1)
    coco_classes_txt = "https://raw.githubusercontent.com/amikelive/coco-labels/master/coco-labels-paper.txt"
    local_coco_classes_txt = "/tmp/coco-labels-paper.txt"
    # Downloading coco labels
    os.system("curl -o %s -O %s" % (local_coco_classes_txt, coco_classes_txt))
    # Setting default number of predictions
    NUM_PREDICTIONS = 20
    # Reading coco labels to a list
    with open(local_coco_classes_txt) as f:
      classes = ["No Class"] + [line.strip() for line in f.readlines()]
    
    
    def main(_):
      # Reading the test image given by the user
      img = mpimg.imread(FLAGS.image)
      # Setting batch size to 1
      img = np.expand_dims(img, axis=0)
      # Setting up EIPredictor Input
      ssd_resnet_input = {'inputs': img}
    
      print('Running SSD Resnet on EIPredictor using specified input and outputs')
      # This is the EIPredictor interface, using specified input and outputs
      eia_predictor = EIPredictor(
          # Model directory where the saved model is located
          model_dir='/tmp/ssd_resnet50_v1_coco/1/',
          # Specifying the inputs to the Predictor
          input_names={"inputs": "image_tensor:0"},
          # Mapping output names to output tensors for the Predictor
          output_names={"detection_classes": "detection_classes:0", "num_detections": "num_detections:0",
                        "detection_boxes": "detection_boxes:0"},
      )
    
      pred = None
      # Iterating over the predictions. The first inference request can take several seconds to complete
      for curpred in range(NUM_PREDICTIONS):
        if(curpred == 0):
          print("The first inference request loads the model into the accelerator and can take several seconds to complete. Please standby!")
        # Start the timer
        start = time.time()
        # This is where the inference actually happens
        pred = eia_predictor(ssd_resnet_input)
        print("Inference %d took %f seconds" % (curpred, time.time()-start))
    
      # Getting the number of objects detected in the input image from the output of the predictor
      num_detections = int(pred["num_detections"])
      print("%d detection[s]" % (num_detections))
      # Getting the class ids from the output
      detection_classes = pred["detection_classes"][0][:num_detections]
      # Mapping the class ids to class names from the coco labels
      print([classes[int(i)] for i in detection_classes])
    
      print('Running SSD Resnet on EIPredictor using default Signature Def')
      # This is the EIPredictor interface using the default Signature Def
      eia_predictor = EIPredictor(
          # Model directory where the saved model is located
          model_dir='/tmp/ssd_resnet50_v1_coco/1/',
      )
    
      # Iterating over the predictions. The first inference request can take several seconds to complete
      for curpred in range(NUM_PREDICTIONS):
        if(curpred == 0):
          print("The first inference request loads the model into the accelerator and can take several seconds to complete. Please standby!")
        # Start the timer
        start = time.time()
        # This is where the inference actually happens
        pred = eia_predictor(ssd_resnet_input)
        print("Inference %d took %f seconds" % (curpred, time.time()-start))
    
      # Getting the number of objects detected in the input image from the output of the predictor
      num_detections = int(pred["num_detections"])
      print("%d detection[s]" % (num_detections))
      # Getting the class ids from the output
      detection_classes = pred["detection_classes"][0][:num_detections]
      # Mapping the class ids to class names from the coco labels
      print([classes[int(i)] for i in detection_classes])
    
    
    if __name__ == "__main__":
      tf.app.run()
    
  6. Run the inference script.
    $ python ssd_resnet_predictor.py --image 3dogs.jpg

Conclusion

You now have two convenient ways to run your TensorFlow models on cost-efficient EI accelerators, depending on your preference. Give them a try and let us know what you think at amazon-ei-feedback@amazon.com.

You can learn more about Amazon Elastic Inference on the product page and in the user guide. For instructions on using the Deep Learning AMI with EI, check out the AWS Deep Learning AMI documentation.

About the Author


Dominic Divakaruni is the Product Manager for Amazon Elastic Inference. He builds services that help customers scale production machine learning applications. In his spare time, he enjoys drumming with his son and working on cars.