Data Types and the Frame Object

The input to our sensor fusion application is a series of LIDAR points, radar points, and camera images to be rendered and labeled. Because of the size of the objects involved, the data must be JSON-encoded (or protobuf-encoded) and accessible via a URL passed in through the task request. In short, to annotate a point cloud frame, format the data in one of our accepted formats, upload the data as a file, and then send a request to the Scale API, just as you would for image files.

Below are the definitions of our various object types in the JSON format, and of an entire point cloud frame. The protobuf format is largely identical and can be downloaded here; the one difference is that camera intrinsic parameters are encoded as a oneof within the CameraImage message type, so no camera_model field is needed.

Definition: Vector3

{
  "x": 1,
  "y": 2,
  "z": 3
}

Vector3 objects are used to represent positions, and are JSON objects with 3 properties.

📘 Note

  • z is up, so the x-y plane should be flat with the ground
  • Scale processes point coordinates as 32-bit floats. If your coordinates are greater than 10^5, your point cloud may suffer from rounding effects. We recommend shifting point clouds so that all points have small values, as in the sketch below.
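
A minimal sketch of such a shift, assuming the points for a frame sit in a NumPy array. The offset value here is hypothetical; any fixed offset works, as long as it is applied consistently to points, device_position, and radar_points across all frames.

import numpy as np

# Hypothetical fixed offset for a scene with large (e.g. UTM-like) coordinates.
OFFSET = np.array([631512.0, 4142311.0, 52.0])

def shift_frame(points: np.ndarray, device_position: np.ndarray):
    """Subtract one fixed offset so all coordinates stay small enough for float32."""
    return points - OFFSET, device_position - OFFSET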

| Property | Type | Description |
| --- | --- | --- |
| x | float | x value |
| y | float | y value |
| z | float | z value |

Definition: LidarPoint

{
  "x": 1,
  "y": 2,
  "z": 3,
  "i": 0.5,
  "d": 2,
  "t": 1541196976735462000,
  "is_ground": true
}

LidarPoint objects are used to represent LIDAR points, and are JSON objects.

| Property | Type | Description |
| --- | --- | --- |
| x | float | x value |
| y | float | y value |
| z | float | z value |
| i | float (optional) | Intensity value. Number between 0 and 1 |
| d | integer (optional) | Non-negative device id identifying points from multiple sensors (i.e. which device captured the point) |
| t | float (optional) | Timestamp of the sensor's detection of the LIDAR point, in nanoseconds |
| is_ground | boolean (optional) | Flag indicating whether the point is part of the ground. Defaults to false if not specified |
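
As a small sketch of building this list, here is a conversion of an (N, 4) NumPy array of [x, y, z, intensity] rows into LidarPoint dicts; the array layout is an assumption, so adapt it to your own data.

import numpy as np

def to_lidar_points(cloud: np.ndarray) -> list:
    """Convert [x, y, z, intensity] rows into LidarPoint dicts."""
    return [
        {"x": float(x), "y": float(y), "z": float(z), "i": float(i)}
        for x, y, z, i in cloud
    ]

print(to_lidar_points(np.array([[1.0, 2.0, 3.0, 0.5]])))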

Definition: Quaternion

{
  "x": 1,
  "y": 1,
  "z": 1,
  "w": 1
}

Quaternion objects are used to represent rotation. We use the Hamilton quaternion convention, where i^2 = j^2 = k^2 = ijk = -1, i.e. the right-handed convention.
The quaternion represented by the tuple (x, y, z, w) is equal to w + x*i + y*j + z*k.

| Property | Type | Description |
| --- | --- | --- |
| x | float | x value |
| y | float | y value |
| z | float | z value |
| w | float | w value |

Definition: GPSPose

{
  "lat": 1,
  "lon": 1,
  "bearing": 1
}

GPSPose objects are used to represent the pose (location and direction) of the robot in the world.

| Property | Type | Description |
| --- | --- | --- |
| lat | float | Latitude of the robot's location. Number between -90 and 90 |
| lon | float | Longitude of the robot's location. Number between -180 and 180 |
| bearing | float | Absolute bearing the robot is facing, as a number between 0 and 360 interpreted as decimal degrees: 0 degrees is North, 90 is East, 180 is South, and 270 is West |
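
For illustration only, a sketch converting a bearing to a unit direction vector, assuming an east-north-up world frame (x east, y north, z up); your own world frame may be oriented differently.

import math

def bearing_to_direction(bearing_deg: float) -> dict:
    """Bearing 0 = north (+y), 90 = east (+x), under an assumed ENU frame."""
    rad = math.radians(bearing_deg)
    return {"x": math.sin(rad), "y": math.cos(rad), "z": 0.0}

print(bearing_to_direction(90.0))  # ~{"x": 1.0, "y": 0.0, "z": 0.0}, i.e. facing East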

Definition: CameraImage

CameraImage objects represent an image and the camera position/heading used to record the image.

Three camera models are supported: brown_conrady (a pinhole model with Brown-Conrady distortion, the default), fisheye (the OpenCV fisheye model), and omnidirectional. The distortion coefficients in the table below note which models they apply to.

| Property | Type | Description |
| --- | --- | --- |
| timestamp | float (optional) | The timestamp, in nanoseconds, at which the photo was taken |
| image_url | string | URL of the image file |
| scale_factor | float (optional) | Factor by which the image has been downscaled (if the original image is 1920x1208 and image_url refers to a 960x604 image, scale_factor=2) |
| position | Vector3 | World-normalized position of the camera |
| heading | Quaternion | Quaternion (x, y, z, w) giving the camera's orientation; note that the z-axis of the camera frame is the camera's optical axis. See Heading Examples for examples |
| priority | integer (optional) | A higher value indicates that this camera takes precedence over other cameras when a single object appears in multiple camera views. If you are using a mix of long-range and short-range cameras with overlapping coverage, set the short-range cameras to priority 1 (the default priority is 0) |
| camera_index | integer (optional) | A number identifying which camera on the car this is (used in any 2D/3D linking task sent after completion of the sensor fusion task). If not specified, it is inferred from the CameraImage's position in the array |
| camera_model | string (optional) | Either fisheye for the OpenCV fisheye model, or brown_conrady for the pinhole model with Brown-Conrady distortion. Defaults to brown_conrady |
| fx | float | Focal length in the x direction (in pixels) |
| fy | float | Focal length in the y direction (in pixels) |
| cx | float | Principal point x value |
| cy | float | Principal point y value |
| skew | float (optional) | Camera skew coefficient |
| k1 | float (optional) | 1st radial distortion coefficient (Brown-Conrady, fisheye, omnidirectional) |
| k2 | float (optional) | 2nd radial distortion coefficient (Brown-Conrady, fisheye, omnidirectional) |
| k3 | float (optional) | 3rd radial distortion coefficient (Brown-Conrady, fisheye, omnidirectional) |
| k4 | float (optional) | 4th radial distortion coefficient (fisheye only) |
| p1 | float (optional) | 1st tangential distortion coefficient (Brown-Conrady, omnidirectional) |
| p2 | float (optional) | 2nd tangential distortion coefficient (Brown-Conrady, omnidirectional) |
| xi | float (optional) | Reference frame offset for the omnidirectional model |
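
The other definitions include a JSON example, so for consistency here is an illustrative CameraImage. Every value below is hypothetical; the heading matches the worked example that follows (a camera pointing along the world's positive x-axis), using the default brown_conrady model.

{
  "timestamp": 1541196976735462000,
  "image_url": "https://example.com/camera0/frame000.jpg",
  "position": {
    "x": 1.2,
    "y": 0,
    "z": 1.6
  },
  "heading": {
    "x": -0.5,
    "y": 0.5,
    "z": -0.5,
    "w": 0.5
  },
  "camera_model": "brown_conrady",
  "fx": 1000,
  "fy": 1000,
  "cx": 960,
  "cy": 604,
  "k1": -0.03,
  "k2": 0.01,
  "p1": 0,
  "p2": 0
}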

For example, to represent a camera pointing along the positive x-axis (i.e. the camera's z-axis points along the world's x-axis) and oriented normally (the camera's y-axis points along the world's negative z-axis), the corresponding heading is 0.5 - 0.5i + 0.5j - 0.5k.
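
To sanity-check a heading, here is a short sketch using SciPy, whose scalar-last [x, y, z, w] quaternion convention matches the one above (so 0.5 - 0.5i + 0.5j - 0.5k becomes [-0.5, 0.5, -0.5, 0.5]).

from scipy.spatial.transform import Rotation

# Heading from the example above, in (x, y, z, w) order.
heading = Rotation.from_quat([-0.5, 0.5, -0.5, 0.5])

print(heading.apply([0, 0, 1]))  # camera z-axis (optical axis) -> [1, 0, 0], world +x
print(heading.apply([0, 1, 0]))  # camera y-axis -> [0, 0, -1], world -z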

Definition: RadarPoint

{
  "position": {
    "x": 100.1,
    "y": 150.12,
    "z": 200.2
  },
  "direction": {
    "x": 1,
    "y": 0,
    "z": 0
  },
  "size": 0.5
}

RadarPoint objects define an individual radar point, and optionally its Doppler velocity, in a particular frame.

| Property | Type | Description |
| --- | --- | --- |
| position | Vector3 | A vector defining the position of the RADAR point, in the same frame of reference as the frame's points array |
| direction | Vector3 (optional) | A vector defining the velocity (direction and magnitude) of a potential Doppler associated with the RADAR point. The vector originates at the individual RADAR point and is expressed in the global reference frame. Its magnitude should correspond to the speed in m/s; the length of the Doppler shown to the labeler varies with the magnitude. For example, if the Doppler is 1 m/s in the positive x direction, the direction would be {"x": 1, "y": 0, "z": 0} |
| size | float (optional, default 1) | A float from 0 to 1 describing the strength of the radar return, where larger numbers are stronger returns. This value determines the brightness of the point displayed to the labeler |

Definition: Frame

Frame objects represent all the point cloud, image, and other data that is sent to the annotator.

| Property | Type | Description |
| --- | --- | --- |
| device_position | Vector3 | Position of the LIDAR sensor or car with respect to a static frame of reference, i.e. a pole at (0,0,0) remains at (0,0,0) throughout all frames. This should use the same coordinate system as points and radar_points |
| device_heading | Quaternion | Heading of the car or robot that the LIDAR is mounted on, with respect to a static frame of reference, expressed as a Quaternion. See Heading Examples for examples |
| device_gps_pose | GPSPose (optional) | GPS pose (location and bearing) of the robot in the world. The GPS pose provided should correspond to the best estimate of the pose of the same point as defined in device_position and device_heading |
| points | list of LidarPoint | Series of points representing the LIDAR point cloud, normalized with respect to a static frame of reference, i.e. a pole at (0,0,0) remains at (0,0,0) throughout all frames. This should use the same coordinate system as device_position and radar_points |
| radar_points | list of RadarPoint (optional) | A list of RadarPoints corresponding to the given frame, defining objects which should be labeled using a combination of radar and camera. This should use the same coordinate system as device_position and points |
| images | list of CameraImage (optional) | A list of CameraImage objects that can be superimposed over the LIDAR data |
| timestamp | float (optional) | The starting timestamp of the sensor rotation, in nanoseconds |

Example JSON file for a Frame

https://static.scale.com/scaleapi-lidar-pointclouds/example.json

Steps to Process a Series of LIDAR Data

  1. Create a Frame JSON object and save it to a file; alternatively, create a LidarFrame protobuf message and save it to a file. Repeat for each LIDAR frame.
  2. Upload the files to a Scale-accessible location (e.g. an S3 bucket) and record the URLs.
  3. Send a POST request to https://api.scale.com/v1/task/lidarannotation, as in the sketch below.
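
An illustrative sketch of step 3, using Python and the requests library. The payload fields shown (project, callback_url, attachment_type, attachments, labels) follow the general Scale task-creation pattern but are assumptions here; consult the task request documentation for the exact schema.

import requests

API_KEY = "live_xxxxxxxx"  # hypothetical placeholder for your Scale API key

payload = {
    "project": "lidar_project",                     # hypothetical project name
    "callback_url": "https://example.com/callback",
    "attachment_type": "json",
    "attachments": [                                # one URL per Frame file, in order
        "https://your-bucket.s3.amazonaws.com/frames/frame000.json",
        "https://your-bucket.s3.amazonaws.com/frames/frame001.json",
    ],
    "labels": ["car", "pedestrian"],                # hypothetical label set
}

response = requests.post(
    "https://api.scale.com/v1/task/lidarannotation",
    json=payload,
    auth=(API_KEY, ""),  # HTTP basic auth, with the API key as the username
)
response.raise_for_status()
print(response.json())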