The input to our sensor fusion application is a series of LIDAR points, radar points, and camera images that will be rendered and labeled. Because of the size of the objects involved, the data must be JSON-encoded (or protobuf-encoded) and accessible via a URL passed in through the task request. To annotate a point cloud frame, format the data in one of our accepted formats, upload it as a file, and then send a request to the Scale API, much as you would for an image task.
Below are our definitions for the various object types in the JSON format, and for an entire point cloud frame. The protobuf format is largely identical and can be downloaded here; the difference is that camera intrinsic parameters are encoded as a oneof within the CameraImage message type, and thus no camera_model field is needed.
Definition: Vector2
Vector2
{
"x": 1,
"y": 2
}
Vector2 objects are used to represent positions, and are JSON objects with 2 properties.
Property | Type | Description |
---|---|---|
x | float | x value |
y | float | y value |
Definition: Vector3
Vector3
{
"x": 1,
"y": 2,
"z": 3
}
Vector3 objects are used to represent positions, and are JSON objects with 3 properties.
Property | Type | Description |
---|---|---|
x | float | x value |
y | float | y value |
z | float | z value |
Definition: LidarPoint
LidarPoint
{
"x": 1,
"y": 2,
"z": 3,
"i": 0.5,
"d": 2,
"t": 1541196976735462000,
"is_ground": true
}
LidarPoint objects are mainly used to represent LIDAR points, and are JSON objects.
Property | Type | Description |
---|---|---|
x | float | x value |
y | float | y value |
z | float | z value |
i | float (optional) | optional intensity value. Number between 0 and 1 |
d | integer (optional) | optional non-negative device id to identify points from multiple sensors (i.e. what device captured the point) |
t | float (optional) | optional timestamp of the sensor's detection of the LiDAR point, in nanoseconds |
is_ground | boolean (optional) | optional flag to indicate whether the point is part of the ground. If not specified, the flag defaults to false |
Good things to note:
- z is up, so the x-y plane should be flat with the ground.
- Scale processes point coordinates as 32-bit floats. If your coordinates are greater than 10^5, your point cloud may suffer from rounding effects.
As a best practice, apply the negative position of the first frame as an offset to all points and calibrations, so that the first frame sits at position (0, 0, 0) and all subsequent frames are expressed as offsets from it (a sketch of this follows below).
Why 10^5 as a suggested limit?
A 32-bit float has 23 bits of mantissa precision (2^23 ≈ 10^7), so coordinates on the order of 10^5 leave only about two decimal places of precision. Assuming a meter-based unit of measure, this means point precision would be limited to roughly the nearest centimeter. Going above 10^5 reduces the precision to an often unacceptable degree, hence the warning here.
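As a rough sketch of that offset recommendation, the snippet below subtracts the first frame's device position from every device position and LIDAR point before upload. The field names follow the Frame and LidarPoint definitions in this document; the file names and the assumption that all frames live in one JSON list are ours.

```python
import json

def recenter_frames(frames):
    """Shift all frames so the first frame's device_position becomes (0, 0, 0),
    keeping coordinates well below the ~10^5 range where 32-bit float rounding
    becomes noticeable. Camera positions and radar points would need the same
    offset; they are omitted here for brevity."""
    origin = dict(frames[0]["device_position"])  # copy before mutating in place
    for frame in frames:
        for axis in ("x", "y", "z"):
            frame["device_position"][axis] -= origin[axis]
        for point in frame["points"]:
            for axis in ("x", "y", "z"):
                point[axis] -= origin[axis]
    return frames

# Hypothetical usage: frames.json holds a list of Frame objects.
with open("frames.json") as f:
    frames = json.load(f)
with open("frames_recentered.json", "w") as f:
    json.dump(recenter_frames(frames), f)
```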
Definition: Quaternion
Quaternion
{
"x": 1,
"y": 1,
"z": 1,
"w": 1
}
Quaternion objects are used to represent rotation. We use the Hamilton quaternion convention, where i^2 = j^2 = k^2 = ijk = -1, i.e. the right-handed convention. The quaternion represented by the tuple (x, y, z, w) is equal to w + x*i + y*j + z*k.
Property | Type | Description |
---|---|---|
x | float | x value |
y | float | y value |
z | float | z value |
w | float | w value |
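For reference when building headings and device rotations by hand, below is a minimal sketch of constructing a Hamilton-convention quaternion from an axis-angle rotation; the helper name is ours, not part of the Scale API.

```python
import math

def quaternion_from_axis_angle(axis, angle_rad):
    """Return a Hamilton-convention quaternion {x, y, z, w} for a right-handed
    rotation of angle_rad radians about the given axis."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    ax, ay, az = ax / norm, ay / norm, az / norm
    s = math.sin(angle_rad / 2.0)
    return {"x": ax * s, "y": ay * s, "z": az * s, "w": math.cos(angle_rad / 2.0)}

# A 90-degree yaw, i.e. a rotation about the z-axis (z is up in this format):
print(quaternion_from_axis_angle((0, 0, 1), math.pi / 2))
# {'x': 0.0, 'y': 0.0, 'z': 0.7071..., 'w': 0.7071...}
```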
Definition: BoundingBox
BoundingBox
{
"top": -85,
"left": -170,
"bottom": 85,
"right": 170
}
Specifies the lat/lon bounds (in degrees) of a BoundingBox. Used for transforming from world coordinates to pixel coordinates.
Property | Type | Description |
---|---|---|
top | float | Top boundary of box. Value must be between -90 and 90 |
left | float | Left boundary of box. Value must be between -180 and 180 |
bottom | float | Bottom boundary of box. Value must be between -90 and 90 |
right | float | Right boundary of box. Value must be between -180 and 180 |
Definition: GPSPose
GPSPose
{
"lat": 1,
"lon": 1,
"bearing": 1
}
GPSPose objects are used to represent the pose (location and direction) of the robot in the world.
Property | Type | Description |
---|---|---|
lat | float | latitude for the location of the robot. Number between -90 and 90 |
lon | float | longitude for the location of the robot. Number between -180 and 180 |
bearing | float | The direction the robot is facing, as an absolute bearing. Number between 0 and 360, interpreted as decimal degrees: 0 degrees represents facing North, 90 degrees East, 180 degrees South, and 270 degrees West. |
Definition: CameraImage
CameraImage objects represent an image and the camera position/heading used to record the image.
Camera models supported:
- Pinhole model with Brown-Conrady distortion (supports k1, k2, k3, p1, p2 distortion parameters)
- OpenCV fisheye model with radial distortion (supports k1, k2, k3, k4 distortion parameters)
- OpenCV omnidirectional model (supports k1, k2, k3, p1, p2 distortion parameters)
Property | Type | Description |
---|---|---|
timestamp | float (optional) | The timestamp, in nanoseconds, at which the photo was taken |
image_url | string | URL of the image file |
scale_factor | float (optional) | Factor by which image has been downscaled (if the original image is 1920x1208 and image_url refers to a 960x604 image, scale_factor=2 ) |
position | Vector3 | World-normalized position of the camera |
heading | Quaternion | Vector <x, y, z, w> indicating the quaternion of the camera direction; note that the z-axis of the camera frame represents the camera's optical axis. See Heading Examples for examples. |
priority | integer (optional) | A higher value indicates that the camera takes precedence over other cameras when a single object appears in multiple camera views. If you are using a mix of long range and short range cameras with overlapping coverage, you should set the short range cameras to priority 1 (default priority is 0). |
camera_index | integer (optional) | This is required if you are using Scale Mapping. A number to identify which camera on the car this is (to be used in any 2D/3D linking task sent after completion of sensor fusion task). If not specified, will be inferred from the CameraImage's position in the array. |
camera_model | string (optional) | Either fisheye for the OpenCV fisheye model, or brown_conrady for the pinhole model with Brown-Conrady distortion. Defaults to brown_conrady . |
fx | float | focal length in x direction (in pixels) |
fy | float | focal length in y direction (in pixels) |
cx | float | principal point x value |
cy | float | principal point y value |
skew | float (optional) | camera skew coefficient |
k1 | float (optional) | 1st radial distortion coefficient (Brown-Conrady, fisheye, omnidirectional) |
k2 | float (optional) | 2nd radial distortion coefficient (Brown-Conrady, fisheye, omnidirectional) |
k3 | float (optional) | 3rd radial distortion coefficient (Brown-Conrady, fisheye, omnidirectional) |
k4 | float (optional) | 4th radial distortion coefficient (fisheye only) |
p1 | float (optional) | 1st tangential distortion coefficient (Brown-Conrady, omnidirectional) |
p2 | float (optional) | 2nd tangential distortion coefficient (Brown-Conrady, omnidirectional) |
xi | float (optional) | reference frame offset for omnidirectional model |
For example, to represent a camera pointing along the positive x-axis (i.e. the camera's z-axis points along the world's x-axis) and oriented normally (the camera's y-axis points along the world's negative z-axis), the corresponding heading is 0.5 - 0.5i + 0.5j - 0.5k.
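This example can be checked numerically. The short sketch below (the rotate helper is ours, written only for illustration) rotates the camera-frame axes by the stated heading and confirms that the camera's z-axis lands on world +x and its y-axis on world -z.

```python
def rotate(q, v):
    """Rotate vector v = (vx, vy, vz) by the Hamilton quaternion
    q = {x, y, z, w}, i.e. compute q * v * q^-1 for a unit quaternion."""
    x, y, z, w = q["x"], q["y"], q["z"], q["w"]
    vx, vy, vz = v
    return (
        (1 - 2 * (y * y + z * z)) * vx + 2 * (x * y - w * z) * vy + 2 * (x * z + w * y) * vz,
        2 * (x * y + w * z) * vx + (1 - 2 * (x * x + z * z)) * vy + 2 * (y * z - w * x) * vz,
        2 * (x * z - w * y) * vx + 2 * (y * z + w * x) * vy + (1 - 2 * (x * x + y * y)) * vz,
    )

heading = {"x": -0.5, "y": 0.5, "z": -0.5, "w": 0.5}  # 0.5 - 0.5i + 0.5j - 0.5k
print(rotate(heading, (0, 0, 1)))  # camera z-axis -> (1.0, 0.0, 0.0), world +x
print(rotate(heading, (0, 1, 0)))  # camera y-axis -> (0.0, 0.0, -1.0), world -z
```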
Definition: RadarPoint
RadarPoint
{
"position": {
"x": 100.1,
"y": 150.12,
"z": 200.2
},
"direction": {
"x": 1,
"y": 0,
"z": 0
},
"size": 0.5
}
RadarPoint objects are used to define an individual radar point and Doppler in a particular frame.
Property | Type | Description |
---|---|---|
position | Vector3 | A vector defining the position of the RADAR point in the same frame-of-reference as the frame's points array |
direction | Vector3 (optional) | A vector defining the velocity (direction and magnitude) of a potential Doppler associated with the RADAR point. This vector is relative to the individual RADAR point, and in the global reference frame. The magnitude of the vector should correspond to the speed in m/s, and the length of the Doppler shown to the labeler will vary based on the magnitude. So, if the Doppler is 1 m/s in the positive x direction, then the direction could be {"x": 1, "y": 0, "z": 0}. |
size | float (optional, default 1), between 0 and 1 | A float from 0 to 1 describing the strength of the radar return, where larger numbers are stronger radar returns. This value is used to determine the brightness of the point displayed to the labeler. |
Definition: AnnotationRule
AnnotationRule
This object enforces certain annotation relationships.
Property | Type | Description |
---|---|---|
must_derive_from | Array<DeriveFrom > | List of DeriveFrom objects that define the relationships between annotations |
Definition: DeriveFrom
DeriveFrom
This object enforces that when line annotations are used to form ("derive") a polygon annotation, the labels of the involved annotations must come from a specified set; a hypothetical example follows the table below.
Property | Type | Description |
---|---|---|
from | Array<string > | A list of line labels or group names |
to | Array<string > | A list of polygon labels or group names whose edges must be from lines |
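As a purely hypothetical illustration (the label names are ours, not part of the format), an AnnotationRule requiring that polygons labeled "Road Surface" be derived only from lines labeled "Lane Line" or "Curb" could be expressed as:

```python
# Hypothetical AnnotationRule: polygons labeled "Road Surface" must be derived
# from lines labeled "Lane Line" or "Curb". The label names are illustrative.
annotation_rule = {
    "must_derive_from": [
        {
            "from": ["Lane Line", "Curb"],  # line labels or group names
            "to": ["Road Surface"],         # polygon labels whose edges must come from those lines
        }
    ]
}
```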
Definition: RegionOfInterest2d
This object allows Scale to perform the correct transformation from lat/lon world coordinates to pixels, by identifying the pixel coordinates of the camera location on the provided aerial imagery. If this is not provided, the camera context images will not render on the task.
Property | Type | Description |
---|---|---|
bounding_box | BoundingBox | The bounds of the area the image covers in Lat Long coordinates. Validation is in place to ensure that latitude goes from -90 to 90 degrees, and longitude from -180 to 180 degrees. |
crs | string (optional) | Coordinate reference system used. Currently, only EPSG:4326 is supported |
Definition: RegionOfInterest3d
This object crops the attachments' points to a rectangle on the XY plane, centered around position and rotated counterclockwise about the z-axis. It must be submitted for any LiDAR TopDown annotation task, and defines the bounds to which the point cloud is restricted for annotation. The x and y dimensions are measured in meters, and the greater of the two is used to create a square on the XY plane. The RegionOfInterest3d differs from geofencing in that it crops in 3D world space as opposed to 2D orthographic image space. If the RegionOfInterest3d specified is larger than the point cloud, the orthographic image will contain empty/black space where there are no points. If the RegionOfInterest3d contains no points, an error will be thrown. Annotations are translated into the RegionOfInterest3d coordinate frame, but are translated back before the response is sent to the customer. A sketch of the cropping semantics is shown after the table below.
Property | Type | Description |
---|---|---|
position | Vector2 | Position of the center of the region to be cropped, in meters with respect to the center of the scene (0,0) |
dimensions | Vector2 | Dimensions of the region to be cropped on the XY plane, in meters with respect to the position of the center of the region to be cropped as defined above |
rotation | float | Specifies the rotation counterclockwise to the z-axis, in degrees |
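To make the cropping semantics concrete, here is a small sketch of a point-in-region test under our reading of the definition above. The counterclockwise rotation convention matches the table; testing against the full rectangle (rather than the square orthographic image Scale builds from the greater dimension) is a simplification of ours, and Scale performs the actual crop server-side.

```python
import math

def point_in_region(point, region):
    """Return True if a point lies inside a RegionOfInterest3d on the XY plane:
    a rectangle centered at region["position"], with region["dimensions"] in
    meters, rotated counterclockwise about the z-axis by region["rotation"]
    degrees. Illustrative only."""
    theta = math.radians(region["rotation"])
    dx = point["x"] - region["position"]["x"]
    dy = point["y"] - region["position"]["y"]
    # Rotate the offset into the region's local frame (inverse of the region rotation).
    local_x = dx * math.cos(theta) + dy * math.sin(theta)
    local_y = -dx * math.sin(theta) + dy * math.cos(theta)
    return (abs(local_x) <= region["dimensions"]["x"] / 2
            and abs(local_y) <= region["dimensions"]["y"] / 2)

region = {"position": {"x": 10, "y": -5}, "dimensions": {"x": 100, "y": 60}, "rotation": 30}
print(point_in_region({"x": 12, "y": 0, "z": 1.5}, region))  # True
```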
Definition: CameraContext
CameraContext objects are non-primary images that can be referenced during labeling. They are used to provide additional information to annotators, but are not annotated themselves.
Property | Type | Description |
---|---|---|
type | object | Must be either "lidar_camera", "world_camera", or "pixel_camera". "lidar_camera": region_of_interest_3d must be sent, as an ortho_projection is needed to correctly handle Lidar cameras. "world_camera": the images are from a camera with lat/lon coordinates; a CameraContext of this type must include the lat/lon of the camera position and the link to the image attachment, as well as region_of_interest_2d to allow the correct transformation from lat/lon to the main image's pixel coordinate system. "pixel_camera": the reference images are in the same coordinate system as the main image; only the type, the link to the image, and the camera position in x, y, z are required. |
lat | float (required for type world_camera) | Latitude of the position of the camera |
long | float (required for type world_camera) | Longitude of the position of the camera |
alt | float | Altitude of the position of the camera |
attachment | string | Link to the camera image |
camera_position | Vector3 (required for type pixel_camera) | Position of the camera in the same coordinate system as the provided image |
frame | int | Frame of the image |
Definition: Link
Links are created to represent a relationship between two Annotation objects.
Property | Type | Description |
---|---|---|
to | string | Annotation UUID |
from | string | Annotation UUID |
label | string | Selected from the link labels defined in submitted taxonomy |
attributes | object | Dict of all attribute names and selected values for the Link |
Definition: OrthoResponse
OrthoResponse
{
"response_type": "ortho_response",
"annotations": [
{
"label": "Lane Line",
"uuid": "3927f821-bfca-4cc9-8f6c-4fcf3c9e524e",
"vertices": [
{
"x": 1.400146484375,
"y": 5.218505859375
},
{
"x": 1003.986572265625,
"y": 1053.152587890625
},
{
"x": 2984.986572265625,
"y": 2701.152587890625
}
],
"type": "line"
}
]
}
The OrthoResponse contains annotations in the local coordinates of the task. Note that the Annotations returned in the OrthoResponse are in 2D.
Property | Type | Description |
---|---|---|
response_type | string | Constant “ortho_response” |
annotations | Array<Annotation > | In the 2D pixel coordinate space of the projected TopDown task image |
links | Array<Link> | List of Link objects relating annotations in the response |
Definition: WorldResponse
WorldResponse
{
"response_type": "world_response",
"annotations": [
{
"label": "Lane Line",
"uuid": "3927f821-bfca-4cc9-8f6c-4fcf3c9e524e",
"vertices_3d": [
{
"x": 10014.00146484375,
"y": 10052.18505859375,
"z": 5.153
},
...
],
"type": "line"
}
]
}
The WorldResponse contains annotations that are re-projected from the region_of_interest_3d into the world scene coordinates of the original Lidar data. The GroundMesh is used to assign altitude (Z-coordinates) to all Annotation vertices.
Note that aerial imagery tasks have their WorldResponse and OrthoResponse in the same coordinate space, as no region_of_interest_3d is specified.
Property | Type | Description |
---|---|---|
response_type | string | Constant “world_response” |
annotations | Array<Annotation > | In 3D world coordinate space |
links | Array<Link> | List of Link objects relating annotations in the response |
Definition: CameraResponse
CameraResponse
[
{
"response_type": "camera_response",
"annotations": [
{
"label": "Lane Line",
"uuid": "3927f821-bfca-4cc9-8f6c-4fcf3c9e524e",
"vertices_3d": [
...
],
"type": "line"
}
],
"frame_number": 0,
"camera_index": 0,
"metadata": {}
},
...
]
For each Lidar Camera Context Attachment, a CameraResponse is generated, which holds an array of all World annotations projected into the 3D coordinate space from the point of view of the Lidar camera. In the LidarTopdown final task response, the "camera" field holds an array of CameraResponse objects, one for each Lidar Camera Context Attachment.
Property | Type | Description |
---|---|---|
response_type | string | Constant “camera_response” |
annotations | Array<Annotation > | In 3D world coordinate space from the perspective of the indicated camera context attachment |
frame_number | float | Matches the frame_number on the submitted camera context attachment |
camera_index | float | Matches the camera_index on the submitted camera context attachment |
Definition: ImageResponse
ImageResponse
[
{
"response_type": "image_response",
"annotations": [
{
"label": "Lane Line",
"uuid": "3927f821-bfca-4cc9-8f6c-4fcf3c9e524e",
"vertices": [
...
],
"type": "line"
}
],
"frame_number": 0,
"camera_index": 0,
"metadata": {}
},
...
]
For each Lidar Camera Context Attachment, the ImageResponse holds an array of all World annotations projected onto the camera context image itself. These annotations are in the image space of the camera context image, and as such are 2D annotations. As with the CameraResponse, the LidarTopdown final task response holds an array of ImageResponse objects, one for each Lidar Camera Context Attachment.
Property | Type | Description |
---|---|---|
response_type | string | Constant "image_response" |
annotations | Array<Annotation > | In the 2D pixel coordinate space of the camera context image |
frame_number | float | Matches the frame_number on the submitted camera context attachment |
camera_index | float | Matches the camera_index on the submitted camera context attachment |
Definition: Frame
Frame objects represent all the point cloud, image, and other data that is sent to the annotator.
Property | Type | Description |
---|---|---|
device_position | Vector3 | position of the LIDAR sensor or car with respect to a static frame of reference, i.e. a pole at (0,0,0) remains at (0,0,0) throughout all frames. This should use the same coordinate system as the points and radar_points . |
device_heading | Quaternion | Heading of the car or robot that the LIDAR is on top of with respect to a static frame of reference, expressed as a Quaternion . See Heading Examples for examples. |
device_gps_pose | GPSPose (optional) | GPS pose (location and bearing) of the robot in the world. The GPS pose provided should correspond to the best estimate of the pose of the same point as defined in device_position and device_heading . |
points | list of LidarPoint | Series of points representing the LIDAR point cloud, normalized with respect to a static frame of reference, i.e. a pole at (0,0,0) remains at (0,0,0) throughout all frames. This should use the same coordinate system as device_position and radar_points. For LidarTopDown: using the LiDAR points, Scale generates two images used for labeling: (1) a flattened 2D image with point cloud density, elevation, and ego trajectory, and (2) a ground mesh to use when projecting annotations into camera images. Multi-pass LiDAR data is ideal for LidarTopDown annotation, as sparse areas cannot be annotated, leading to global inconsistency. |
radar_points | list of RadarPoint (optional) | A list of RadarPoint s corresponding to the given frame, defining objects which should be labeled using a combination of radar and camera. This should use the same coordinate system as device_position and points . |
images | list of CameraImage (optional) | A list of CameraImage objects that can be superimposed over the LIDAR data. For LidarTopDown: annotations are projected into camera images as an additional reference. You may optionally provide a metadata field, which will be included in each item of the camera response and image response. Camera frame rates that differ from the Lidar are supported, and you may specify "images": [] when there are no camera images for a given frame. |
timestamp | float (optional) | The starting timestamp of the sensor rotation, in nanoseconds |
Example JSON file for a Frame
https://static.scale.com/scaleapi-lidar-pointclouds/example.json
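A minimal sketch of assembling such a Frame in Python and saving it to a file is shown below. Only a couple of points and a single camera are included, and the image URL, intrinsics, and file name are placeholders of our own; see the Frame, LidarPoint, and CameraImage definitions above for the full set of fields.

```python
import json

# All values below are placeholders; see the Frame, LidarPoint, and CameraImage
# definitions above for the full set of fields.
frame = {
    "device_position": {"x": 0, "y": 0, "z": 0},
    "device_heading": {"x": 0, "y": 0, "z": 0, "w": 1},  # identity rotation
    "points": [
        {"x": 1.0, "y": 2.0, "z": 0.1, "i": 0.5},
        {"x": -3.2, "y": 4.7, "z": 0.0, "i": 0.8},
    ],
    "radar_points": [],
    "images": [
        {
            "image_url": "https://example.com/cam0/frame000.jpg",  # placeholder
            "position": {"x": 0, "y": 0, "z": 1.5},
            "heading": {"x": -0.5, "y": 0.5, "z": -0.5, "w": 0.5},  # faces world +x
            "fx": 1000.0, "fy": 1000.0, "cx": 960.0, "cy": 604.0,
            "camera_index": 0,
        }
    ],
}

with open("frame000.json", "w") as f:
    json.dump(frame, f)
```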
Steps to Process a series of LIDAR data
- Create a Frame JSON object and save it to a file; alternatively, create a LidarFrame protobuf message and save it to a file. Repeat for each lidar frame.
- Upload the files to a Scale-accessible location (e.g. an S3 bucket) and record the URLs.
- Send a POST request to https://api.scale.com/v1/task/lidarannotation
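A hedged sketch of that final request, using the Python requests library, is shown below. The payload fields (project, callback_url, attachment_type, attachments, labels) are not defined in this section and are our assumptions; consult the task creation documentation for the authoritative parameter list.

```python
import requests

# The payload fields below are placeholders/assumptions, not the authoritative
# parameter list for the lidarannotation endpoint.
payload = {
    "project": "my_lidar_project",
    "callback_url": "https://example.com/callback",
    "attachment_type": "json",
    "attachments": [
        "https://my-bucket.s3.amazonaws.com/frames/frame000.json",
        "https://my-bucket.s3.amazonaws.com/frames/frame001.json",
    ],
    "labels": ["Car", "Pedestrian"],
}

response = requests.post(
    "https://api.scale.com/v1/task/lidarannotation",
    json=payload,
    auth=("YOUR_SCALE_API_KEY", ""),  # HTTP Basic auth with the API key as username
)
print(response.status_code, response.json())
```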