AWS for M&E Blog
Source monitoring for AWS Elemental MediaLive via Amazon Rekognition
Introduction
AWS Elemental MediaLive is a broadcast-grade video encoder with a comprehensive suite of metrics that can be monitored for stream quality and health, along with a full set of alerts that can be acted upon to ensure the best viewer experience. There are scenarios, however, where a source is present on a MediaLive input but the video is impaired in a way that does not trigger an alert. For example, an upstream source may be routed incorrectly to an unused input on a router, or a playout server may freeze, producing black video, color bars, or frozen frames. These incidents can go undetected unless an operator is watching the channel. The following solution can help.
The solution
By default, MediaLive generates input source thumbnails for all encode pipelines, and it is these thumbnails that are retrieved and passed to Amazon Rekognition for analysis. No model training is required to use the image properties analysis feature. The following is an example of the first few lines of a typical response to an "IMAGE_PROPERTIES" request:
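(The snippet below is an illustrative sketch trimmed to the quality section; the exact values vary from image to image.)

```json
{
    "ImageProperties": {
        "Quality": {
            "Brightness": 43.36,
            "Sharpness": 9.33,
            "Contrast": 79.67
        }
    }
}
```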
A vast amount of information can be returned regarding the color make-up of the supplied image. We are only interested in the quality section at the top, as these constantly changing values provide enough information for detection. The values can be extracted, summed, and pushed to Amazon CloudWatch to produce a graph over time. If a black frame is received, the values all fall to near zero and remain at that level while the input stays black. A similar principle catches an input that has frozen, as the values remain constant at a non-zero level.
With the addition of some CloudWatch math, we can generate a difference graph of the values and trigger an alarm when the calculated metric fails to change for a number of consecutive data points. This can be used to alert operators to a problem or to take automatic action, such as switching to another input source. Two consecutive data points have been chosen in this example rather than three, as the first of a group of three will generally be the spike produced as the image changes to the black or frozen frame.
Following is an example of a simple version of the solution, using Amazon EventBridge Scheduler to trigger an AWS Lambda function at regular intervals:
EventBridge Scheduler can be configured to invoke the function at most once every minute, meaning it takes over 3 minutes to obtain the data points for a confident alarm trigger. This is useful for a proof of concept or for low-tier channels; a more advanced solution with greatly increased detection speed is detailed later.
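As a sketch of the scheduling piece, assuming EventBridge Scheduler is used to invoke the function (the schedule name, function ARN, and role ARN below are placeholders for your own resources):

```python
import boto3

scheduler = boto3.client("scheduler")

# Invoke the monitoring Lambda function once per minute (the minimum rate).
# The ARNs are placeholders; the role must allow EventBridge Scheduler to
# invoke the function.
scheduler.create_schedule(
    Name="medialive-thumbnail-monitor",
    ScheduleExpression="rate(1 minute)",
    FlexibleTimeWindow={"Mode": "OFF"},
    Target={
        "Arn": "arn:aws:lambda:us-east-1:111122223333:function:thumbnail-monitor",
        "RoleArn": "arn:aws:iam::111122223333:role/scheduler-invoke-lambda",
    },
)
```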
In the previous CloudWatch example, the ImageProperties metric is the sum of Brightness, Contrast, and Sharpness, as combined in the Lambda script. The Difference metric is generated using the following CloudWatch math expression: ABS(DIFF(m1)), where m1 is ImageProperties. This expression calculates the absolute difference from the previous sample.
The same expression is used when configuring a CloudWatch alarm. Setting the alarm to trigger when the Difference metric falls below a threshold of approximately 10 to 15 is a good starting point.
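A sketch of that alarm in boto3 follows; the namespace, metric dimensions, and channel ID are assumptions chosen to match the Lambda sketch below, and the threshold should be tuned to your content:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when ABS(DIFF(m1)) stays at or below the threshold for two
# consecutive data points, suggesting a black or frozen input.
cloudwatch.put_metric_alarm(
    AlarmName="medialive-static-input-pipeline-0",
    Metrics=[
        {
            "Id": "m1",
            "MetricStat": {
                "Metric": {
                    "Namespace": "MediaLiveSourceMonitor",  # assumed namespace
                    "MetricName": "ImageProperties",
                    "Dimensions": [
                        {"Name": "ChannelId", "Value": "1234567"},  # placeholder
                        {"Name": "Pipeline", "Value": "0"},
                    ],
                },
                "Period": 60,
                "Stat": "Average",
            },
            "ReturnData": False,
        },
        {
            "Id": "e1",
            "Expression": "ABS(DIFF(m1))",
            "Label": "Difference",
            "ReturnData": True,
        },
    ],
    ComparisonOperator="LessThanOrEqualToThreshold",
    Threshold=15,
    EvaluationPeriods=2,
    DatapointsToAlarm=2,
    TreatMissingData="breaching",
)
```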
The basic Python Lambda code to demonstrate this is as follows:
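The listing below is an illustrative sketch rather than a definitive implementation: it assumes the MediaLive DescribeThumbnails API returns the thumbnail body base64 encoded, and the environment variable names and metric dimensions are placeholders.

```python
import base64
import os

import boto3

CHANNEL_ID = os.environ["CHANNEL_ID"]        # MediaLive channel ID
PIPELINE_ID = os.environ["PIPELINE_ID"]      # pipeline number, e.g. "0"
NAMESPACE = os.environ["METRIC_NAMESPACE"]   # CloudWatch namespace

medialive = boto3.client("medialive")
rekognition = boto3.client("rekognition")
cloudwatch = boto3.client("cloudwatch")


def lambda_handler(event, context):
    # Fetch the current input thumbnail for the chosen pipeline.
    thumbs = medialive.describe_thumbnails(
        ChannelId=CHANNEL_ID,
        PipelineId=PIPELINE_ID,
        ThumbnailType="CURRENT_ACTIVE",
    )
    body = thumbs["ThumbnailDetails"][0]["Thumbnails"][0]["Body"]

    # Ask Rekognition for the image properties of the thumbnail.
    response = rekognition.detect_labels(
        Image={"Bytes": base64.b64decode(body)},
        Features=["IMAGE_PROPERTIES"],
    )
    quality = response["ImageProperties"]["Quality"]

    # Sum the quality values and publish them as a single metric.
    value = quality["Brightness"] + quality["Sharpness"] + quality["Contrast"]
    cloudwatch.put_metric_data(
        Namespace=NAMESPACE,
        MetricData=[
            {
                "MetricName": "ImageProperties",
                "Dimensions": [
                    {"Name": "ChannelId", "Value": CHANNEL_ID},
                    {"Name": "Pipeline", "Value": PIPELINE_ID},
                ],
                "Value": value,
            }
        ],
    )
    return {"ImageProperties": value}
```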
Note that you will need to configure the environment variables for the MediaLive channel ID, pipeline number, and CloudWatch namespace before using the Lambda function.
Advanced solution
For lower-latency detection on high-tier content, and to handle a larger number of channels more effectively, we can build on the previous solution with AWS Fargate.
The updated solution can take more frequent samples, for example every 10 seconds, bringing the detection time down to just 30 seconds to trigger an alert. A separate Python thread for each channel is used in place of the EventBridge Scheduler / Lambda pair. As MediaLive channels are started and stopped, the events are passed via Amazon EventBridge to an Amazon SQS queue to automatically create or destroy the threads, ensuring channels are only monitored while they are running. High-resolution metrics are employed instead of the standard one-minute metrics.
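As a sketch of how those channel events might be routed (the rule name and queue ARN are placeholders, and the queue still needs a resource policy allowing EventBridge to send to it):

```python
import json

import boto3

events = boto3.client("events")

# Forward MediaLive channel RUNNING/STOPPED state changes to an SQS queue,
# where the Fargate task picks them up to create or destroy monitor threads.
events.put_rule(
    Name="medialive-channel-state-change",
    EventPattern=json.dumps(
        {
            "source": ["aws.medialive"],
            "detail-type": ["MediaLive Channel State Change"],
            "detail": {"state": ["RUNNING", "STOPPED"]},
        }
    ),
)
events.put_targets(
    Rule="medialive-channel-state-change",
    Targets=[
        {
            "Id": "monitor-queue",
            "Arn": "arn:aws:sqs:us-east-1:111122223333:medialive-monitor-queue",
        }
    ],
)
```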
The remainder of the solution remains as before, with alarms configured on the CloudWatch math metric, but there is no reason why the difference detection could not be done in the Python code itself, emitting a simple zero or one to trigger a CloudWatch alarm. A similar principle could be employed to monitor the audio levels emitted by MediaLive to CloudWatch for silence detection.
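A rough sketch of that variation, with an in-memory cache of the previous value per channel (the function name, threshold, and metric name are illustrative):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

DIFF_THRESHOLD = 15   # tune per content, as with the CloudWatch alarm above
last_value = {}       # previous summed quality value per channel


def publish_static_flag(namespace, channel_id, value):
    """Emit 1 when the summed quality value has stopped changing, else 0."""
    previous = last_value.get(channel_id)
    static = int(previous is not None and abs(value - previous) <= DIFF_THRESHOLD)
    last_value[channel_id] = value

    cloudwatch.put_metric_data(
        Namespace=namespace,
        MetricData=[
            {
                "MetricName": "StaticInput",
                "Dimensions": [{"Name": "ChannelId", "Value": channel_id}],
                "Value": static,
                "StorageResolution": 1,  # high-resolution metric
            }
        ],
    )
```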
The advanced solution is available on GitHub.
Summary
The two solutions described in this blog add monitoring confidence on MediaLive channels, above and beyond that already provided by the service. They can also serve as building blocks for additional monitoring, tailored to specific customer requirements where appropriate.