Data Types and the Frame Objects

The input to our sensor fusion application is a series of LIDAR points, radar points, and camera images that will be rendered and labeled. Because of the size of the objects involved, the data must be JSON-encoded (or protobuf-encoded) and accessible via a URL passed in through the task request. To annotate a point cloud frame, format the data in one of our accepted formats, upload the data as a file, and then send a request to the Scale API, much as you would when processing image files.

Below are our definitions for our various object types for the JSON format, and for an entire point cloud frame. The protobuf format is largely identical, and can be downloaded here; the difference is that camera intrinsic parameters are encoded as a oneof within the CameraImage message type, and thus no camera_model field is needed.

Definition: Vector3

{
  "x": 1,
  "y": 2,
  "z": 3
}

Vector3 objects are used to represent positions, and are JSON objects with 3 properties.

x (float): x value
y (float): y value
z (float): z value

Definition: LidarPoint

{
  "x": 1,
  "y": 2,
  "z": 3,
  "i": 0.5,
  "d": 2,
  "t": 1541196976735462000,
  "is_ground": true
}

LidarPoint objects are mainly used to represent LIDAR points, and are JSON objects.

x (float): x value
y (float): y value
z (float): z value
i (float, optional): intensity value, a number between 0 and 1
d (integer, optional): non-negative device id identifying which device captured the point, for points coming from multiple sensors
t (float, optional): timestamp of the sensor's detection of the LiDAR point, in nanoseconds
is_ground (boolean, optional): flag indicating whether the point is part of the ground. Defaults to false if not specified
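
As a rough sketch of how these fields fit together, the helper below builds a LidarPoint as a Python dict and includes optional fields only when they are provided. The function name make_lidar_point and its range checks are illustrative assumptions based on the field descriptions above, not part of the Scale API.

```python
def make_lidar_point(x, y, z, i=None, d=None, t=None, is_ground=None):
    """Build a LidarPoint dict; optional fields are added only when given."""
    point = {"x": float(x), "y": float(y), "z": float(z)}
    if i is not None:
        # Intensity is documented as a number between 0 and 1.
        if not 0.0 <= i <= 1.0:
            raise ValueError("intensity i must be between 0 and 1")
        point["i"] = float(i)
    if d is not None:
        # Device id is documented as a non-negative integer.
        if not isinstance(d, int) or d < 0:
            raise ValueError("device id d must be a non-negative integer")
        point["d"] = d
    if t is not None:
        point["t"] = t  # nanoseconds
    if is_ground is not None:
        point["is_ground"] = bool(is_ground)
    return point


point = make_lidar_point(1, 2, 3, i=0.5, d=2, is_ground=True)
```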


Good things to note:

z is up, so the x-y plane should be flat with the ground

Scale processes point coordinates as 32-bit floats. If your coordinates are greater than 10^5, your point cloud may suffer from rounding effects.

To avoid this, we recommend applying the negative position of the first frame as an offset to all points and calibrations, so that the first frame is at position (0,0,0) and all subsequent frames are expressed as offsets from it.
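
One way to apply such an offset before uploading is sketched below. It assumes each frame is a Python dict shaped like the Frame definition in this document, and that camera image positions share the same coordinate system as the points; the function name recenter is hypothetical. The subtraction runs in Python's 64-bit floats, so the rounding only happens after the coordinates are small.

```python
def recenter(frames):
    """Return copies of the frames shifted so the first device_position is (0, 0, 0)."""
    origin = frames[0]["device_position"]

    def shift(vec):
        # Copy the dict so optional keys (i, d, t, is_ground) are preserved.
        out = dict(vec)
        for axis in ("x", "y", "z"):
            out[axis] = vec[axis] - origin[axis]
        return out

    recentered = []
    for frame in frames:
        new_frame = dict(frame)
        new_frame["device_position"] = shift(frame["device_position"])
        new_frame["points"] = [shift(p) for p in frame["points"]]
        if "radar_points" in frame:
            # Only the position moves; direction is a velocity and stays unchanged.
            new_frame["radar_points"] = [
                {**rp, "position": shift(rp["position"])}
                for rp in frame["radar_points"]
            ]
        if "images" in frame:
            new_frame["images"] = [
                {**img, "position": shift(img["position"])}
                for img in frame["images"]
            ]
        recentered.append(new_frame)
    return recentered
```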

Why 10^5 as a suggested limit?
Given that 32-bit floats carry 23 bits of mantissa precision (2^23 ≈ 8.4 × 10^6, or roughly 10^7), coordinates on the order of 10^5 leave only about 2 decimal places of precision (10^7 / 10^5 = 10^2).

Assuming a meter-based unit of measure, this means the point precision would be to the nearest centimeter. Going above 10^5 reduces the precision to an often unacceptable degree, hence the warning here.
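
The effect can be checked with a short sketch that measures the gap between neighboring 32-bit floats at different magnitudes. It uses only Python's standard struct module; the helper names to_float32 and float32_spacing are ours, and the spacing helper assumes positive, finite inputs.

```python
import struct


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def float32_spacing(value):
    """Gap between the float32 nearest to a positive value and the next one up."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - to_float32(value)


for magnitude in (1e2, 1e5, 1e7):
    # At 1e5 the gap is 2**-7 = 0.0078125, i.e. just under a centimeter in meters.
    print(magnitude, float32_spacing(magnitude))
```

At 10^7 the gap grows to a full unit, which is why recentering large coordinates matters.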

Definition: Quaternion

{
  "x": 1,
  "y": 1,
  "z": 1,
  "w": 1
}

Quaternion objects are used to represent rotation. We use the Hamilton quaternion convention, where i^2 = j^2 = k^2 = ijk = -1, i.e. the right-handed convention.
The quaternion represented by the tuple (x, y, z, w) is equal to w + x*i + y*j + z*k.

x (float): x value
y (float): y value
z (float): z value
w (float): w value
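
To make the convention concrete, here is a minimal sketch of the Hamilton product on quaternions stored as dicts with the x, y, z, w keys defined above. The function name hamilton_product is an illustrative choice; the checks at the end confirm i^2 = -1, ij = k, and ijk = -1.

```python
def hamilton_product(a, b):
    """Multiply quaternions a * b, each a dict with keys x, y, z, w."""
    return {
        "w": a["w"] * b["w"] - a["x"] * b["x"] - a["y"] * b["y"] - a["z"] * b["z"],
        "x": a["w"] * b["x"] + a["x"] * b["w"] + a["y"] * b["z"] - a["z"] * b["y"],
        "y": a["w"] * b["y"] - a["x"] * b["z"] + a["y"] * b["w"] + a["z"] * b["x"],
        "z": a["w"] * b["z"] + a["x"] * b["y"] - a["y"] * b["x"] + a["z"] * b["w"],
    }


# Basis elements i, j, k and the scalar -1 in (x, y, z, w) form.
I = {"x": 1, "y": 0, "z": 0, "w": 0}
J = {"x": 0, "y": 1, "z": 0, "w": 0}
K = {"x": 0, "y": 0, "z": 1, "w": 0}
MINUS_ONE = {"x": 0, "y": 0, "z": 0, "w": -1}

i_squared = hamilton_product(I, I)
i_times_j = hamilton_product(I, J)
ijk = hamilton_product(hamilton_product(I, J), K)
```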

Definition: GPSPose

{
  "lat": 1,
  "lon": 1,
  "bearing": 1
}

GPSPose objects are used to represent the pose (location and direction) of the robot in the world.

lat (float): latitude for the location of the robot. Number between -90 and 90
lon (float): longitude for the location of the robot. Number between -180 and 180
bearing (float): the direction the robot is facing, expressed as an absolute bearing. Number between 0 and 360, interpreted as decimal degrees. 0 degrees represents facing North, 90 degrees represents facing East, 180 degrees represents facing South, and 270 degrees represents facing West.
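
The bearing convention can be illustrated with a small sketch that turns a bearing into a unit vector with east and north components. The function name bearing_to_east_north is hypothetical and only meant to show which way the angle increases (clockwise from North).

```python
import math


def bearing_to_east_north(bearing_degrees):
    """Unit vector (east, north) for an absolute bearing in decimal degrees."""
    angle = math.radians(bearing_degrees % 360.0)
    # Bearing is measured clockwise from North, so east uses sin and north uses cos.
    return math.sin(angle), math.cos(angle)


east, north = bearing_to_east_north(90.0)  # facing East: east is 1, north is about 0
```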

Definition: CameraImage

CameraImage objects represent an image and the camera position/heading used to record the image.

Camera models supported:

timestamp (float, optional): The timestamp, in nanoseconds, at which the photo was taken
image_url (string): URL of the image file
scale_factor (float, optional): Factor by which the image has been downscaled (if the original image is 1920x1208 and image_url refers to a 960x604 image, scale_factor=2)
position (Vector3): World-normalized position of the camera
heading (Quaternion): Vector <x, y, z, w> indicating the quaternion of the camera direction; note that the z-axis of the camera frame represents the camera's optical axis. See Heading Examples for examples.
priority (integer, optional): A higher value indicates that the camera takes precedence over other cameras when a single object appears in multiple camera views. If you are using a mix of long range and short range cameras with overlapping coverage, you should set the short range cameras to priority 1 (default priority is 0).
camera_index (integer, optional): A number to identify which camera on the car this is (to be used in any 2D/3D linking task sent after completion of the sensor fusion task). If not specified, it will be inferred from the CameraImage's position in the array.
camera_model (string, optional): Either fisheye for the OpenCV fisheye model, or brown_conrady for the pinhole model with Brown-Conrady distortion. Defaults to brown_conrady.
fx (float): focal length in x direction (in pixels)
fy (float): focal length in y direction (in pixels)
cx (float): principal point x value
cy (float): principal point y value
skew (float, optional): camera skew coefficient
k1 (float, optional): 1st radial distortion coefficient (Brown-Conrady, fisheye, omnidirectional)
k2 (float, optional): 2nd radial distortion coefficient (Brown-Conrady, fisheye, omnidirectional)
k3 (float, optional): 3rd radial distortion coefficient (Brown-Conrady, fisheye, omnidirectional)
k4 (float, optional): 4th radial distortion coefficient (fisheye only)
p1 (float, optional): 1st tangential distortion coefficient (Brown-Conrady, omnidirectional)
p2 (float, optional): 2nd tangential distortion coefficient (Brown-Conrady, omnidirectional)
xi (float, optional): reference frame offset for omnidirectional model

For example, to represent a camera pointing along the positive x-axis (i.e. the camera's z-axis points along the world's x-axis) and oriented normally (the camera's y-axis points along the world's negative z-axis), the corresponding heading is 0.5 - 0.5i + 0.5j - 0.5k.
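
The heading in this example can be checked numerically. The sketch below rotates the camera's axes by the quaternion using the standard rotation matrix for a unit Hamilton quaternion; it assumes, consistent with the example, that the heading maps camera-frame vectors into world coordinates. The function name rotate is illustrative.

```python
def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q (dict with keys x, y, z, w)."""
    x, y, z, w = q["x"], q["y"], q["z"], q["w"]
    vx, vy, vz = v
    # Each component is one row of the rotation matrix applied to v.
    return (
        (1 - 2 * (y * y + z * z)) * vx + 2 * (x * y - w * z) * vy + 2 * (x * z + w * y) * vz,
        2 * (x * y + w * z) * vx + (1 - 2 * (x * x + z * z)) * vy + 2 * (y * z - w * x) * vz,
        2 * (x * z - w * y) * vx + 2 * (y * z + w * x) * vy + (1 - 2 * (x * x + y * y)) * vz,
    )


# 0.5 - 0.5i + 0.5j - 0.5k in (x, y, z, w) form.
heading = {"x": -0.5, "y": 0.5, "z": -0.5, "w": 0.5}

optical_axis = rotate(heading, (0, 0, 1))  # camera z-axis -> world +x
camera_y = rotate(heading, (0, 1, 0))      # camera y-axis -> world -z
```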

Definition: RadarPoint

{
  "position": {
    "x": 100.1,
    "y": 150.12,
    "z": 200.2
  },
  "direction": {
    "x": 1,
    "y": 0,
    "z": 0
  },
  "size": 0.5
}

RadarPoint objects are used to define an individual radar point and Doppler in a particular frame.

position (Vector3): A vector defining the position of the RADAR point in the same frame of reference as the frame's points array
direction (Vector3, optional): A vector defining the velocity (direction and magnitude) of a potential Doppler associated with the RADAR point. This vector is relative to the individual RADAR point, and in the global reference frame. The magnitude of the vector should correspond to the speed in m/s, and the length of the Doppler shown to the labeler will vary based on the magnitude. So, if the Doppler is 1 m/s in the positive x direction, then the direction could be {"x": 1, "y": 0, "z": 0}.
size (float between 0 and 1, optional, default 1): The strength of the radar return, where larger numbers are stronger radar returns. This value will be used to determine the brightness of the point displayed to the labeler.

Definition: Frame

Frame objects represent all the point cloud, image, and other data that is sent to the annotator.

device_position (Vector3): Position of the LIDAR sensor or car with respect to a static frame of reference, i.e. a pole at (0,0,0) remains at (0,0,0) throughout all frames. This should use the same coordinate system as the points and radar_points.
device_heading (Quaternion): Heading of the car or robot that the LIDAR is on top of with respect to a static frame of reference, expressed as a Quaternion. See Heading Examples for examples.
device_gps_pose (GPSPose, optional): GPS pose (location and bearing) of the robot in the world. The GPS pose provided should correspond to the best estimate of the pose of the same point as defined in device_position and device_heading.
points (list of LidarPoint): Series of points representing the LIDAR point cloud, normalized with respect to a static frame of reference, i.e. a pole at (0,0,0) remains at (0,0,0) throughout all frames. This should use the same coordinate system as device_position and radar_points.
radar_points (list of RadarPoint, optional): A list of RadarPoints corresponding to the given frame, defining objects which should be labeled using a combination of radar and camera. This should use the same coordinate system as device_position and points.
images (list of CameraImage, optional): A list of CameraImage objects that can be superimposed over the LIDAR data.
timestamp (float, optional): The starting timestamp of the sensor rotation, in nanoseconds
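
As a minimal sketch of how these pieces combine, the code below assembles a Frame as a Python dict and serializes it with the standard json module. The helper names build_frame and save_frame, the image URL, and the intrinsic values are all placeholders chosen for illustration, not values or functions from the Scale API.

```python
import json


def build_frame(device_position, device_heading, points,
                images=None, radar_points=None, timestamp=None):
    """Assemble a Frame dict; optional fields are included only when given."""
    frame = {
        "device_position": device_position,
        "device_heading": device_heading,
        "points": points,
    }
    if images is not None:
        frame["images"] = images
    if radar_points is not None:
        frame["radar_points"] = radar_points
    if timestamp is not None:
        frame["timestamp"] = timestamp
    return frame


def save_frame(frame, path):
    """Write one Frame to a JSON file, ready to be uploaded."""
    with open(path, "w") as fh:
        json.dump(frame, fh)


frame = build_frame(
    device_position={"x": 0, "y": 0, "z": 0},
    device_heading={"x": 0, "y": 0, "z": 0, "w": 1},  # identity rotation
    points=[{"x": 1, "y": 2, "z": 3, "i": 0.5}],
    images=[{
        "image_url": "https://example.com/camera_0.jpg",  # placeholder URL
        "position": {"x": 0, "y": 0, "z": 1.5},
        "heading": {"x": -0.5, "y": 0.5, "z": -0.5, "w": 0.5},
        "fx": 1000.0, "fy": 1000.0, "cx": 960.0, "cy": 604.0,  # placeholder intrinsics
    }],
    radar_points=[{
        "position": {"x": 100.1, "y": 150.12, "z": 200.2},
        "direction": {"x": 1, "y": 0, "z": 0},
        "size": 0.5,
    }],
)
serialized = json.dumps(frame)
```

Calling save_frame(frame, "frame_0.json") once per frame would produce the files to upload.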

Example JSON file for a Frame

Steps to Process a series of LIDAR data

  1. Create a Frame JSON object and save to a file; alternatively, create a LidarFrame protobuf message and save to a file. Repeat for each lidar frame.
  2. Upload the files to a Scale-accessible location (e.g. an S3 bucket) and record the URLs.
  3. Send a POST request to