AWS for Industries

Running Cadence JasperGold formal verification on AWS at scale

Introduction

As the size and complexity of modern integrated circuits grow, semiconductor development teams are challenged to provide verification coverage for these larger, more complex chips within the same tight schedules.

Formal verification is an Electronic Design Automation (EDA) technology commonly used by development teams to achieve this goal. However, the verification effort for modern System-on-Chip (SoC) designs grows exponentially with the complexity of the design under test (DUT) itself.

To explore how semiconductor customers can use the scale and flexibility of the AWS Cloud for design verification, Cadence and AWS launched a joint investigation into the benefits of scaling up the JasperGold® Formal Verification Platform on AWS, at a scale that customers can achieve on their own. We used the Scale-Out Computing on AWS (SOCA) solution to scale the JasperGold workload up to 960 cores.

This blog post describes the runtime improvements that customers can gain by scaling JasperGold workloads on AWS. We describe what we learned along the way and provide guidance on how customers can leverage the combination of JasperGold and AWS to augment on-premises compute farms. This enables customers to complete the verification of SoC designs and gain adequate assurance of design correctness within the limits of their project schedules.

Verification engineers, CAD engineers involved in verification flows, and IT decision makers at semiconductor companies and semiconductor IP providers will see how combining these two technologies enables customers of all sizes to fully realize the benefits of formal verification.

Scaling Motivation

Depending on the nature of the verification task and method, the time to complete verification can be reduced by splitting the workload into multiple tasks that run in parallel. Any tests that are independent of one another can run concurrently, reducing the overall time to completion with no loss of verification quality.

Formal verification can benefit particularly well from such parallelism since a formal test bench for these complex designs typically consists of thousands of properties, or assertions, which can easily be analyzed in parallel by today’s leading tools such as JasperGold.

Given enough licenses for the formal tool, the limiting factor in reducing the verification schedule through parallelism becomes the availability of compute resources.

In addition, the verification load may vary considerably from week to week depending on the size of the organization and the number of concurrent projects.

As a result, many organizations see economic value in augmenting their own compute farms with cloud-based resources.

JasperGold on AWS Project

At the Jasper User Group in October 2020, we presented a joint Cadence and AWS proof of concept that exercised various degrees of parallelism using JasperGold on AWS (JAWS) to verify a design based on a Cadence Tensilica processor. The results are shown in figure 1.

Figure 1: Results of JAWS Proof-of-Concept

We chose three configurations to run on: first, the Cadence network using 12 cores, representative of the resources available to formal verification engineers at many mid-size JasperGold customers; second, the Cadence network extended with Amazon EC2 instances on AWS to provide 96 cores; and third, a 960-core configuration.

We ran JasperGold on the Tensilica design with each configuration, using the pre-existing formal environment consisting of 343 properties.

The table on the left of figure 1 shows the results for each iteration step, and the graph on the right sums the steps to show the cumulative time to complete the iterations. For example, the grey bars on the right show the aggregate time to complete the first three iterations.

We ran three iterations on all three compute resource configurations with each iteration increasing the time allowed per property.

  • The 96-core configuration reduced the time to reach the same results by 5.5X.
  • The 960-core configuration reduced the time by 20X.
  • Due to time constraints, we ran the fourth iteration on the 960-core configuration only. This iteration found new useful proofs, including 23 with cycle bounds large enough to be valuable, and, importantly, found 4 deep security bugs.
  • The 960-core run completed this fourth iteration in an additional 4 hours and 22 minutes.

Had we run this fourth iteration on the 12-core configuration, we estimate it would have taken 5 days to complete. At 96 cores, it would have taken around 22 hours.

These results clearly demonstrate not just the substantial time savings of reaching the same results with higher degrees of parallelism, but also the deeper verification and bug discovery that become possible with large-scale parallelism.

Setting up your SOCA Environment

Step one is for you or your IT admin to deploy the Scale-Out Computing on AWS (SOCA) solution. For guidance on how to set up SOCA for EDA workloads, check out the blog Scaling EDA Workloads using Scale-Out Computing on AWS.

Once the solution is deployed and you have a user account for the SOCA portal, use your web browser to access the SOCA website. You should see a window similar to figure 2.

Figure 2: SOCA Browser Window

Create a desktop session to run your JasperGold sessions. We found that selecting an EC2 instance with 8 vCPUs and 32 GB of memory (for example, m5.2xlarge) worked well for JasperGold proof jobs. Select a session duration (1 day or longer).

The desktop session uses NICE DCV and looks like a typical VNC session, but natively supports compression and pixel encryption. This EC2 instance becomes your own private virtual machine for running the JasperGold ProofGrid manager and submitting additional proof jobs.

SOCA provides two file systems, mounted at /data and /fsx, with different levels of performance, availability, and size. /data is a Linux mount of Amazon Elastic File System (Amazon EFS), recommended for user home directories, while /fsx is a Linux mount of Amazon FSx for Lustre, a high-performance file system recommended for running EDA workloads.
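To confirm that both file systems are mounted in your desktop session, a quick check such as the following works (standard Linux commands; output will vary with your deployment):

# Verify the EFS (/data) and FSx for Lustre (/fsx) mounts are present
df -h /data /fsx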

For this project we created the following directories:

Home directory:               /data/home/<username>
Scratch area:                 /fsx/<username>
Tool installation directory:  /fsx/apps/jasper_2020.09
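As a sketch, the scratch and tool directories can be created from the desktop session as follows (the home directory is typically created when your SOCA user account is provisioned, and writing under /fsx/apps may require admin privileges):

# Create a per-user scratch area on the high-performance file system
mkdir -p /fsx/$USER
# Create the shared tool installation directory (may require admin privileges)
mkdir -p /fsx/apps/jasper_2020.09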

We recommend setting up a license server within the network deployed by the SOCA solution. If your current JasperGold licenses are not served from such a server, please contact your Cadence account manager and ask about the Cloud Passport solution. Your IT admin will provide the name of the license server, such as ip-xxx-xxx-xxx-xxx. Then set the following environment variable so the tools can find the license server:

setenv CDS_LIC_FILE 5280@ip-xxx-xxx-xxx-xxx
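To confirm that the license server is reachable before launching jobs, you can query it with the standard FlexNet lmutil utility. The path below is a hypothetical location; adjust it to wherever lmutil lives in your Cadence installation:

# Query the license daemon at the host/port held in CDS_LIC_FILE (hypothetical lmutil path)
<jasper_install_dir>/bin/lmutil lmstat -c $CDS_LIC_FILE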

We used the following procedure to specify worker instances for JasperGold engine jobs. Engine jobs are submitted via /opt/pbs/bin/qsub. Two types of job queues are available in the SOCA solution (a minimal submission example follows the list):

  • alwayson queue: lets users create EC2 instances up front; the instances join the alwayson queue, and jobs submitted to that queue run on them according to their CPU/memory requirements. The instances are terminated after their jobs complete and they have been idle for a configurable idle time.
  • Automatic provisioning (normal) queue: lets the SOCA solution dynamically provision EC2 instances based on job requirements; the instances join the normal queue. This queue is useful for submitting prove/hunt jobs.
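As a minimal sketch of a direct submission to the alwayson queue (the select syntax mirrors the ProofGrid shell commands shown later; /bin/hostname is just a placeholder payload):

# Submit a one-core, 4 GB job to the alwayson queue; it runs once a matching instance is available
/opt/pbs/bin/qsub -q alwayson -l select=1:ncpus=1:mem=4gb -- /bin/hostname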

Instance creation takes approximately 10 minutes: the EC2 instance is allocated, the operating system boots, and the SOCA cluster automation scripts complete, enabling the instance to join the compute cluster.

For the JAWS exercise we created EC2 instances in the alwayson queue; some experiments were run in the normal queue as well. The command to launch instances in the alwayson queue must be run on the scheduler host.

We typically created multiple machines, each with 48 cores and 384 GB of memory (for example, m5.24xlarge). The machines shut down after being idle for 10 minutes.

For the largest configuration, we used the following launch command to create 20 machines, each with 48 cores (for a total of 960 cores, with 8 GB of memory per core):

/apps/soca/$SOCA_CONFIGURATION/python/latest/bin/python3 \
/apps/soca/$SOCA_CONFIGURATION/cluster_manager/add_nodes.py \
   --job_owner $USER \
   --job_name $USER \
   --ht_support false \
   --instance_ami ami-0ad1f2458d3807474 \
   --base_os "centos7" \
   --root_size 4 \
   --instance_type m5.24xlarge \
   --desired_capacity 20 \
   --queue alwayson \
   --keep_forever false \
   --terminate_when_idle 10

AWS provides a range of EC2 instance types with varying number of cores and memory. Details can be found here: https://thinkwithwp.com/ec2/instance-types.

Running JasperGold Jobs in the SOCA Environment

Figures 3 and 4 show the flowcharts for running JasperGold in the alwayson and normal queues.

Figure 3: Running JasperGold in alwayson Queue

Figure 4: Running JasperGold in normal Queue

The JasperGold ProofGrid manager setup varies depending on the queue in use. ProofGrid jobs are submitted through the open-source OpenPBS qsub command.

Using already existing machines in the alwayson queue:

set_proofgrid_socket_communication off
set_proofgrid_shell {/opt/pbs/bin/qsub -z -r n -q alwayson -W block=true -l select=1:mem=8gb:ncpus=1 -- }

Create EC2 instances dynamically in the normal queue:

set_proofgrid_socket_communication off
set_proofgrid_shell {/opt/pbs/bin/qsub -z -r n -q normal -W block=true -l instance_type=m5.large -- }
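Putting the pieces together, a JasperGold Tcl session using the alwayson queue might look like the sketch below. prove -all stands in for whatever proof command your flow uses, and the design and properties are assumed to be already elaborated:

# Route engine jobs through OpenPBS rather than local sockets
set_proofgrid_socket_communication off
# Each proof engine becomes a blocking qsub into the alwayson queue
set_proofgrid_shell {/opt/pbs/bin/qsub -z -r n -q alwayson -W block=true -l select=1:mem=8gb:ncpus=1 -- }
# Launch proofs on all properties across the cluster
prove -all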

To check status of jobs:

/opt/pbs/bin/qstat            # status of engine jobs in the queue
/opt/pbs/bin/pbsnodes -aSjL   # status of instances created

For more information, contact your local Cadence or AWS account teams.

For more information on Cadence JasperGold, go to https://www.cadence.com/en_US/home/tools/system-design-and-verification/formal-and-static-verification/jasper-gold-verification-platform.html

For more information on EDA workloads on AWS, go to https://thinkwithwp.com/semiconductor

Ahmed Elzeftawi

Ahmed Elzeftawi is a Semiconductor and EDA Partner Solutions Architect at Amazon Web Services. Prior to joining AWS, he was a Cloud Product Management Director at Cadence Design Systems, where he architected and deployed multiple compute clusters for EDA workloads on AWS. He has 20 years of experience in Electronic Design Automation, chip design, and high performance compute clusters. Ahmed holds a Bachelor of Science in electrical and communications engineering from Cairo University and an MBA from Santa Clara University.

Mike Pedneau

Mike Pedneau is an Application Engineer Architect at Cadence Design Systems. He is responsible for driving complex formal solutions as part of the JasperGold product portfolio. Over the past 30+ years, he has architected and deployed complex verification methodologies at Texas Instruments, Verisity Design, and AMD. Mike holds a BS in Electrical Engineering from Texas A&M and an MS in Computer Engineering from The University of Texas at Austin.