LiDAR 3D point cloud labeling with Velodyne LiDAR sensor in Amazon SageMaker Ground Truth
LiDAR is a key enabling technology in growing autonomous markets, such as robotics, industrial, infrastructure, and automotive. LiDAR delivers precise 3D data about its environment in real time to provide “vision” for autonomous solutions. For autonomous vehicles (AVs), nearly every carmaker uses LiDAR to augment camera and radar systems for a comprehensive perception stack capable of safely navigating complex roadway environments. Computer vision systems can use the 3D maps generated by LiDAR sensors for object detection, object classification, and scene segmentation. Like any other supervised machine learning (ML) system, the point cloud data generated by LiDAR sensors should be labeled correctly in order for the ML model to make correct inferences. This allows AVs to operate smoothly and efficiently, avoiding incidents and collisions with objects, pedestrians, vehicles, and other road users.
In this post, we demonstrate how to label 3D point cloud data generated by Velodyne LiDAR sensors using Amazon SageMaker Ground Truth. We break down the process of sending data for annotation so that you can obtain precise, high-quality results.
The code for this example is available on GitHub.
Solution overview
SageMaker Ground Truth is a data labeling service that you can use to create high-quality labeled datasets for various types of ML use cases. SageMaker Ground Truth is a capability in Amazon SageMaker, which is a comprehensive and fully managed ML service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready environment.
In addition to LiDAR data, we also include camera images, using the sensor fusion feature in SageMaker Ground Truth to deliver robust visual information about the scenes that annotators are labeling. Through sensor fusion, annotators can adjust labels in the 3D scene as well as in 2D images. It delivers the unique capability to ensure that annotations in LiDAR data are mirrored in 2D imagery, making the process more efficient.
With SageMaker Ground Truth, 3D point cloud data generated by a Velodyne LiDAR sensor mounted on a vehicle can be labeled for tracking moving objects. In this challenging use case, we can follow the trajectory of an object like a car or a pedestrian in a dynamic environment, while our point of reference is also moving. In this case, our point of reference is a car equipped with a Velodyne LiDAR sensor.
To perform this task, we walk through the following topics:
- Velodyne technology
- The dataset
- Creating a labeling job
- The point cloud sequence input manifest file
- Building the sequence input manifest file
- The label category configuration file
- Specifying the job resources
- Completing a labeling job
Prerequisites
To implement the solution in this post, you must have the following prerequisites:
- An AWS account for running the code.
- An Amazon Simple Storage Service (Amazon S3) bucket you can write to. The bucket must be in the same Region as the SageMaker notebook instance. We can also define a valid S3 prefix; all the files related to this experiment are stored under that prefix in our bucket. We must attach a CORS policy to this bucket. For instructions, refer to Configuring cross-origin resource sharing (CORS); a minimal example policy is shown after this list.
- An AWS Identity and Access Management (IAM) role to access SageMaker.
- A SageMaker notebook instance.
- Familiarity with the Ground Truth 3D point cloud labeling job.
- Familiarity with Python and NumPy.
- Basic understanding of SageMaker.
- Basic familiarity with the AWS Command Line Interface (AWS CLI).
- The code and associated dataset.
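For the CORS prerequisite, the following is a minimal example policy (in the JSON form accepted by the S3 console's CORS configuration editor) that allows the labeling UI to read objects from the bucket. Treat it as a starting point and tighten it to meet your own security requirements; refer to the CORS documentation linked above for the exact policy recommended for your setup.

```json
[
    {
        "AllowedHeaders": [],
        "AllowedMethods": ["GET"],
        "AllowedOrigins": ["*"],
        "ExposeHeaders": []
    }
]
```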
Velodyne technology
LiDAR can be divided into different categories, including scanning LiDAR and flash LiDAR. Conventionally scanning LiDAR uses mechanical rotation to spin the sensor for 360-degree detection. Velodyne, which invented the industry’s first 3D LiDAR, continues to innovate and launch new rotational products with cutting-edge technology. Velodyne’s Ultra Puck is a scanning LiDAR sensor that uses Velodyne’s patented surround view technology. It provides a full 360-degree environmental view to deliver accurate real-time 3D data. The Ultra Puck has a compact form factor and delivers the real-time object detection needed for safe navigation and reliable operation. With a combination of optimal power and high performance, this sensor provides distance and calibrated reflectivity measurements at all rotational angles. It’s an ideal solution for robotics, mapping, security, driver assistance, and autonomous navigation. Besides the LiDAR sensor itself, Velodyne has created the Vella Development Kit (VDK), a collection of tools, hardware, and documentation that facilitate access to the Velodyne’s autonomy software stack. The VDK can be configured for different custom interfaces and environments, providing you with a broad range of applications for increased autonomy and improved safety.
Additionally, the VDK can reduce the upfront work you would have to otherwise put in to enable an end-to-end data collection and annotation pipeline by providing the following necessary capabilities:
- Clock synchronization between LiDAR, odometry, and camera frames
- LiDAR-to-vehicle 5-DOF extrinsic calibration (z is not observable)
- LiDAR-to-camera extrinsic, intrinsic, and distortion calibration
- Collection of motion-compensated (intra-frame or multi-frame), synchronized LiDAR point clouds and camera images
To develop vehicle-based perception capabilities, Velodyne's software team has set up its own data collection vehicle with one of its Ultra Puck LiDAR units, a camera, and GPS/IMU sensors mounted to the vehicle hood. In the subsequent steps, we refer to the internal processes this team uses with the VDK to prepare, collect, and annotate the data needed to develop vehicle-based perception capabilities, as an example for other customers trying to solve their own perception use cases.
Clock synchronization
Accurate clock synchronization of the LiDAR, odometry, and camera outputs is crucial for any multi-sensor application that combines those data streams. For best results, you should use a PTP synchronization system with a primary clock and support from all sensors. One advantage of PTP is the ability to synchronize multiple devices to high accuracy with a single timing source. Such a system can achieve synchronization accuracy better than 1 microsecond. Other solutions include PPS distribution and per-device time sources. As an alternative, the VDK supports software synchronization using time-of-arrival timestamping, which can be a great way to get an application off the ground quickly in the absence of proper clock synchronization infrastructure. This can result in timestamping errors on the order of 1–10 milliseconds due to a combination of latency and queuing delays at various levels of the network infrastructure and host operating system, which may or may not be acceptable, depending on the application.
LiDAR vehicle calibration
The LiDAR vehicle calibration estimates the extrinsic position of the LiDAR in the vehicle frame along five axes. The z value is unobservable; therefore, you must measure it independently. Our process is a targetless calibration technique, but it works well in an environment where the ground is relatively flat and there are contiguous static object features rather than dynamic (vehicles, pedestrians) or non-contiguous (shrubs and bushes) features. Think of a parking lot with few obstacles and buildings with flat facades. The presence of geometric structures is ideal for improving the calibration quality. The user is required to drive in predefined driving patterns indicated by the VDK to expose most of the parameters. One minute of data is sufficient for this calibration. After the data is uploaded to Velodyne's platform service, the calibration takes place in the cloud and the result is made available within 24 hours. For the purposes of this notebook, the calibration parameters have already been processed and provided.
The LiDAR dataset
The dataset and resources used in this notebook are provided by Velodyne. This dataset contains one continuous scene from an autonomous vehicle experiment driving around on a highway in California. The entire scene contains 60 frames. The dataset contents are as follows:
- lidar_cam_calib_vlp32_06_10_2021.yaml – Camera calibration information, one camera only
- images/ – Camera footage for each frame
- poses/ – Pose JSON file containing LiDAR extrinsic matrix for each frame
- rectified_scans_local/ – .pcd files in the LiDAR sensor's local coordinate system
Run the following code to download the dataset locally and then upload to your S3 bucket, which we defined in the initialization section:
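The complete download helper is provided in the accompanying notebook. The following is a simplified sketch of the same step, in which the dataset URL, bucket name, and prefix are placeholder assumptions you should replace with your own values:

```python
import os
import tarfile
import urllib.request

import boto3

# Hypothetical values -- replace with the dataset URL from the accompanying GitHub
# repository and the bucket/prefix you defined in the initialization section.
DATASET_URL = "https://example.com/velodyne-groundtruth-dataset.tar.gz"  # placeholder URL
BUCKET = "my-groundtruth-bucket"
PREFIX = "velodyne-lidar-demo"
LOCAL_DATASET_DIR = "dataset"

# Download and extract the dataset locally.
urllib.request.urlretrieve(DATASET_URL, "dataset.tar.gz")
with tarfile.open("dataset.tar.gz") as archive:
    archive.extractall(LOCAL_DATASET_DIR)

# Upload every extracted file to Amazon S3 under the chosen prefix.
s3 = boto3.client("s3")
for root, _, files in os.walk(LOCAL_DATASET_DIR):
    for file_name in files:
        local_path = os.path.join(root, file_name)
        s3_key = f"{PREFIX}/{os.path.relpath(local_path, LOCAL_DATASET_DIR)}"
        s3.upload_file(local_path, BUCKET, s3_key)
```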
Create a labeling job
As the next step, we need to create a data labeling job in SageMaker Ground Truth. We select the task type as object tracking. For more information about 3D point cloud labeling task types, refer to 3D Point Cloud Task types. To create an object tracking point cloud labeling job, we need to add the following resources as the labeling job inputs:
- Point cloud sequence input manifest – A JSON file defining the point cloud frame sequence and associated sensor fusion data. For more information, see Create a Point Cloud Sequence Input Manifest.
- Input manifest file – The input file for the labeling job. Each line of the manifest file contains a link to a sequence file defined in the point cloud sequence input manifest.
- Label category configuration file – This file is used to specify your labels, label category, frame attributes, and worker instructions. For more information, see Create a Labeling Category Configuration File with Label Category and Frame Attributes.
- Predefined AWS resources – Includes the following:
- Pre-annotation Lambda ARN – Refer to PreHumanTaskLambdaArn.
- Annotation consolidation ARN – The AWS Lambda function used to consolidate labels from different workers. Refer to AnnotationConsolidationLambdaArn.
- Workforce ARN – Defines which workforce type we want to use. Refer to Create and Manage Workforces for more details.
- HumanTaskUiArn – Defines the worker UI template used to complete the labeling job. It should have a format similar to `arn:aws:sagemaker:<region>:123456789012:human-task-ui/PointCloudObjectTracking`.
Keep in mind the following:
- There should not be an entry for the `UiTemplateS3Uri` parameter.
- Your `LabelAttributeName` must end in `-ref`. For example, `ot-labels-ref`.
- The number of workers specified in `NumberOfHumanWorkersPerDataObject` should be 1.
- 3D point cloud labeling doesn't support active learning, so we shouldn't specify values for parameters in `LabelingJobAlgorithmsConfig`.
- 3D point cloud object tracking labeling jobs can take multiple hours to complete. You should specify a longer time limit for these labeling jobs in `TaskTimeLimitInSeconds` (up to 7 days, or 604,800 seconds).
Point cloud sequence input manifest file
The following are the most important steps in generating a sequence input manifest file:
- Convert the 3D points to a world coordinate system.
- Generate the sensor extrinsic matrix to enable the sensor fusion feature in SageMaker Ground Truth.
The LiDAR sensor is mounted on a moving vehicle (ego vehicle), which captures the data in its own frame of reference. To perform object tracking, we need to convert this data to a global frame of reference to account for the moving ego vehicle itself. This is the world coordinate system.
Sensor fusion is a feature in SageMaker Ground Truth that synchronizes the 3D point cloud frame side by side with the camera frame. This provides visual context for human labelers and allows labelers to adjust annotation in 3D and 2D images synchronously. For instructions on matrix transformation, refer to Labeling data for 3D object tracking and sensor fusion in Amazon SageMaker Ground Truth.
The `generate_transformed_pcd_from_point_cloud` function performs the coordinate translation and then generates the 3D point data file, which SageMaker Ground Truth can consume.
To translate the data from the local (sensor) coordinate system to the global coordinate system, multiply each point in a 3D frame by the extrinsic matrix for the LiDAR sensor.
SageMaker Ground Truth renders the 3D point cloud data in either Compact Binary Pack (.bin) or ASCII (.txt) format. Files in these formats need to contain information about the location (x, y, and z coordinates) of all points that make up that frame, and, optionally, the intensity (i) or the red, green, and blue color channels (r, g, b) of each point.
To read more about SageMaker Ground Truth accepted raw 3D data formats, see Accepted Raw 3D Data Formats.
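As an illustration of these two steps, the following is a simplified sketch (not the notebook's actual `generate_transformed_pcd_from_point_cloud` implementation), assuming the point cloud is an N x 3 NumPy array and the pose file provides a 4 x 4 LiDAR extrinsic matrix:

```python
import numpy as np

def transform_to_world(points_xyz: np.ndarray, lidar_extrinsic: np.ndarray) -> np.ndarray:
    """Apply a 4 x 4 LiDAR extrinsic matrix to an N x 3 array of sensor-frame points."""
    ones = np.ones((points_xyz.shape[0], 1))
    points_homogeneous = np.hstack([points_xyz, ones])      # N x 4 homogeneous coordinates
    world_points = points_homogeneous @ lidar_extrinsic.T   # rotate and translate into the world frame
    return world_points[:, :3]

def write_ascii_frame(world_points: np.ndarray, intensity: np.ndarray, out_path: str) -> None:
    """Write one frame as an ASCII .txt file with an 'x y z i' line per point."""
    frame = np.hstack([world_points, intensity.reshape(-1, 1)])
    np.savetxt(out_path, frame, fmt="%.6f")
```

Writing only the x, y, z, and i columns matches the ASCII format described above; append r, g, b columns if you want colored point clouds.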
Build the sequence input manifest file
The next step is to build the point cloud sequence input manifest file. The steps listed in this section are also available in the notebook.
- Read the point cloud data from the `.pcd` file, the LiDAR extrinsic matrix from the pose file, and the camera extrinsic, intrinsic, and distortion data from the camera calibration `.yaml` file.
- Perform a per-frame transform of the raw point cloud to the global frame of reference. Generate and store an ASCII (.txt) file for each frame in Amazon S3.
- Extract the ego vehicle pose from the LiDAR extrinsic matrix.
- Build the sensor position in the global coordinate system by extracting the camera pose from the camera inverse extrinsic matrix.
- Provide camera calibration parameters (such as distortion and skew).
- Build the array of data frames. Reference the ASCII file location, define the vehicle position in the world coordinate system, and so on.
- Create the sequence manifest file `sequence.json`.
- Create our input manifest file. Each line identifies a single sequence file we just uploaded.
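For reference, here is an abbreviated, hypothetical sketch of a single frame entry in `sequence.json`. The field names follow the point cloud sequence input manifest schema, but the values are placeholders, and the full schema (including the camera distortion parameters) is described in the documentation linked earlier:

```json
{
    "seq-no": 1,
    "prefix": "s3://my-groundtruth-bucket/velodyne-lidar-demo/",
    "number-of-frames": 60,
    "frames": [
        {
            "frame-no": 0,
            "unix-timestamp": 1623355200.0,
            "frame": "frames/frame_0.txt",
            "format": "text/xyzi",
            "ego-vehicle-pose": {
                "position": {"x": 0.0, "y": 0.0, "z": 0.0},
                "heading": {"qx": 0.0, "qy": 0.0, "qz": 0.0, "qw": 1.0}
            },
            "images": [
                {
                    "image-path": "images/frame_0.png",
                    "unix-timestamp": 1623355200.0,
                    "fx": 1000.0,
                    "fy": 1000.0,
                    "cx": 960.0,
                    "cy": 540.0,
                    "position": {"x": 0.0, "y": 0.0, "z": 0.0},
                    "heading": {"qx": 0.0, "qy": 0.0, "qz": 0.0, "qw": 1.0},
                    "camera-model": "pinhole"
                }
            ]
        }
    ]
}
```

Each line of the input manifest file then points to one sequence file, for example: `{"source-ref": "s3://my-groundtruth-bucket/velodyne-lidar-demo/manifests/sequence.json"}`.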
Label category configuration file
Our label category configuration file is used to specify labels, or classes, for our labeling job. When we use the object detection or object tracking task types, we can also include label attributes in our label category configuration file. Workers can assign one or more attributes we provide to annotations to give more information about that object. For example, we may want to use the attribute `occluded` to have workers identify when an object is partially obstructed. Let's look at an example of the label category configuration file for an object detection or object tracking labeling job:
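The following is a trimmed, hypothetical example with two labels and an `occluded` category attribute; refer to Create a Labeling Category Configuration File with Label Category and Frame Attributes for the full set of supported fields:

```json
{
    "documentVersion": "2020-03-01",
    "labels": [
        {
            "label": "Car",
            "categoryAttributes": [
                {
                    "name": "occluded",
                    "type": "string",
                    "enum": ["not-occluded", "partially-occluded", "fully-occluded"]
                }
            ]
        },
        {
            "label": "Pedestrian"
        }
    ],
    "instructions": {
        "shortInstruction": "Draw a tight cuboid around each object and track it across frames.",
        "fullInstruction": "Use the camera images for additional context when the point cloud is sparse."
    }
}
```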
Specify the job resources
As the next step, we specify various labeling job resources:
- Human task UI ARN – HumanTaskUiArn is a resource that defines the worker task template used to render the worker UI and tools for the labeling job. This attribute is defined under `UiConfig`, and the resource name is determined by the Region and task type.
- Work team resource – In this example, we use private team resources. For instructions, refer to Create a Private Workforce (Amazon Cognito Console). When we're done, we reference that work team's ARN in the labeling job configuration.
- Pre-annotation Lambda ARN and post-annotation Lambda ARN – The service-owned Lambda functions for the 3D point cloud object tracking task type.
- HumanTaskConfig – We use this to specify our work team and configure our labeling job task. Feel free to update the task description.

All of these resources come together in the code sketch that follows this list.
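The following is a hedged sketch of those resource definitions. The account IDs in the service-owned ARNs vary by Region, so substitute the values from the Ground Truth documentation, and replace the work team ARN with your own:

```python
import boto3

region = boto3.session.Session().region_name

# Worker task UI for 3D point cloud object tracking (service-owned account ID varies by Region).
human_task_ui_arn = f"arn:aws:sagemaker:{region}:123456789012:human-task-ui/PointCloudObjectTracking"

# Private work team created in the Amazon Cognito console (replace with your own ARN).
workteam_arn = f"arn:aws:sagemaker:{region}:111122223333:workteam/private-crowd/my-private-team"

# Service-owned pre-annotation and annotation consolidation Lambda functions for the
# 3D point cloud object tracking task type (account ID varies by Region).
pre_human_task_lambda_arn = f"arn:aws:lambda:{region}:123456789012:function:PRE-3DPointCloudObjectTracking"
annotation_consolidation_lambda_arn = f"arn:aws:lambda:{region}:123456789012:function:ACS-3DPointCloudObjectTracking"

human_task_config = {
    "WorkteamArn": workteam_arn,
    "UiConfig": {"HumanTaskUiArn": human_task_ui_arn},
    "PreHumanTaskLambdaArn": pre_human_task_lambda_arn,
    "AnnotationConsolidationConfig": {
        "AnnotationConsolidationLambdaArn": annotation_consolidation_lambda_arn
    },
    "TaskTitle": "3D point cloud object tracking",
    "TaskDescription": "Track vehicles and pedestrians across the 60-frame sequence",
    "NumberOfHumanWorkersPerDataObject": 1,
    "TaskTimeLimitInSeconds": 604800,            # up to 7 days, as noted earlier
    "TaskAvailabilityLifetimeInSeconds": 864000,
    "MaxConcurrentTaskCount": 100,
}
```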
Create the labeling job
Next, we create the labeling request, as shown in the following code:
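The following is a condensed sketch of that request; it reuses the hypothetical `BUCKET`, `PREFIX`, and `human_task_config` values from the earlier sketches, and the IAM role ARN is a placeholder:

```python
# Hypothetical locations -- replace with the values used earlier in the notebook.
input_manifest_uri = f"s3://{BUCKET}/{PREFIX}/manifests/input.manifest"
label_category_config_uri = f"s3://{BUCKET}/{PREFIX}/manifests/label_category_config.json"
output_uri = f"s3://{BUCKET}/{PREFIX}/output/"
role_arn = "arn:aws:iam::111122223333:role/MySageMakerExecutionRole"  # placeholder IAM role

ground_truth_request = {
    "LabelingJobName": "velodyne-3d-object-tracking",
    "LabelAttributeName": "ot-labels-ref",   # must end in -ref
    "InputConfig": {
        "DataSource": {"S3DataSource": {"ManifestS3Uri": input_manifest_uri}}
    },
    "OutputConfig": {"S3OutputPath": output_uri},
    "LabelCategoryConfigS3Uri": label_category_config_uri,
    "RoleArn": role_arn,
    "HumanTaskConfig": human_task_config,
}
```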
Finally, we create the labeling job:
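A minimal sketch of the final call with boto3, followed by a quick status check:

```python
import boto3

sagemaker_client = boto3.client("sagemaker")

# Submit the labeling job, then poll its status.
sagemaker_client.create_labeling_job(**ground_truth_request)
response = sagemaker_client.describe_labeling_job(
    LabelingJobName=ground_truth_request["LabelingJobName"]
)
print(response["LabelingJobStatus"])
```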
Complete a labeling job
When our labeling job is ready, we can add ourselves to our private work team and experiment with the worker's portal. We should receive an email with the portal link, our user name, and a temporary password. When we log in, we choose the labeling job from the list to open the worker's portal. (It may take a few minutes for a new labeling job to show up in the portal.) For more information about setting up workers and providing worker instructions, refer to the SageMaker Ground Truth documentation.
When we’re are done with the labeling job, we can choose Submit, and then view the output data in the S3 output location we specified earlier.
Conclusion
In this post, we showed how to create a 3D point cloud labeling job for object tracking on data captured using Velodyne's LiDAR sensor. We followed step-by-step instructions and ran the provided code to create a SageMaker Ground Truth labeling job to label the 3D point cloud data. The labels created with this job can be used to train the object detection, object recognition, and object tracking models commonly used in autonomous vehicle scenarios.
If you are interested in labeling 3D point cloud data captured via Velodyne’s LiDAR sensor, follow the steps in this article to label the data using Amazon SageMaker Ground Truth.
About the Authors
Sharath Nair leads the Computer Vision team that focuses on building perception algorithms for some of Velodyne's software products, such as Object Detection & Tracking, Semantic Segmentation, and SLAM. Prior to Velodyne, Sharath worked on autonomous vehicles and robotics and has been involved in this space for the past 6 years.
Oliver Monson is a Senior Data Operations Manager at Velodyne Lidar, responsible for the data pipelines and acquisition strategies that support the development of perception software. Prior to Velodyne, Oliver has managed operational teams executing on HD mapping, geospatial, and archaeological applications.
John Kua is Director of Software Engineering at Velodyne, overseeing the System Integration and Robotics, Vella Go, and Software Production teams. Prior to joining Velodyne, John spent over a decade building multimodal sensor platforms for a wide range of 3D localization and mapping applications in commercial and government applications. These platforms included a wide array of sensors including visible light, thermal, and hyperspectral cameras, lidar, GPS, IMUs, and even gamma-ray spectrometers and imagers.
Sally Frykman, Chief Marketing Officer at Velodyne, oversees the strategic development and execution of global marketing and communications programs that advance the company’s innovative vision and goals. Her multifaceted role encompasses a wide array of responsibilities, including promotion of the Velodyne brand, thought leadership development, and robust sales lead generation fueled by highly engaging digital marketing. Previously, Sally worked in public education and social work.
Nitin Wagh is a Senior Business Development Manager for Amazon AI. He likes the opportunity to help customers understand machine learning and the power of Augmented AI in the AWS Cloud. In his spare time, he loves spending time with his family on outdoor activities.
James Wu is a Senior AI/ML Specialist Solutions Architect at AWS, helping customers design and build AI/ML solutions. James's work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in the marketing and advertising industries.
Farooq Sabir is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He holds PhD and MS degrees in Electrical Engineering from The University of Texas at Austin and an MS in Computer Science from the Georgia Institute of Technology. He has over 15 years of work experience and also likes to teach and mentor college students. At AWS, he helps customers formulate and solve their business problems in data science, machine learning, computer vision, artificial intelligence, numerical optimization, and related domains. Based in Dallas, Texas, he and his family love to travel and take long road trips.