The input to our sensor fusion application is a series of LIDAR points, radar points, and camera images to be rendered and labeled. Because of the size of the data involved, we require that it be JSON-encoded (or protobuf-encoded) and accessible via a URL passed in through the task request. In short, to annotate a point cloud frame, format the data in one of our accepted formats, upload the data as a file, and then send a request to the Scale API, much as you would when processing image files.
Below are the definitions of our various object types for the JSON format, and for an entire point cloud frame. The protobuf format is largely identical and can be downloaded here; the difference is that camera intrinsic parameters are encoded as a oneof within the CameraImage message type, so no camera_model field is needed.
Definition: Vector3
Vector3
{
"x": 1,
"y": 2,
"z": 3
}
Vector3 objects are used to represent positions, and are JSON objects with three properties.
Property | Type | Description |
---|---|---|
x | float | x coordinate of the position |
y | float | y coordinate of the position |
z | float | z coordinate of the position |
Definition: LidarPoint
LidarPoint
{
"x": 1,
"y": 2,
"z": 3,
"i": 0.5,
"d": 2,
"t": 1541196976735462000,
"is_ground": true
}
LidarPoint objects are used to represent individual LIDAR points, and are JSON objects.
Property | Type | Description |
---|---|---|
x | float | x coordinate of the point |
y | float | y coordinate of the point |
z | float | z coordinate of the point |
i | float (optional) | optional intensity value, a number between 0 and 1 |
d | integer (optional) | optional non-negative device id to identify points from multiple sensors (i.e. which device captured the point) |
t | float (optional) | optional timestamp of the sensor's detection of the LiDAR point, in nanoseconds |
is_ground | boolean (optional) | optional flag to indicate whether the point is part of the ground. If not specified, defaults to false |
Good things to note:
- z is up, so the x-y plane should be flat with the ground.
- Scale processes point coordinates as 32-bit floats. If your coordinates are greater than 10^5, your point cloud may suffer from rounding effects.
As a best practice, apply the negative of the first frame's position as an offset to all points and calibrations, so that the first frame sits at position (0, 0, 0) and all subsequent frames are expressed as offsets from it (a short sketch follows the note below).
Why 10^5 as a suggested limit?
A 32-bit float has 23 bits of mantissa precision, so coordinates on the order of 10^5 leave only about two decimal places of precision (2^23 ≈ 10^7). Assuming a meter-based unit of measure, this means point precision would be to the nearest centimeter. Going above 10^5 reduces the precision to an often unacceptable degree, hence the warning here.
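To illustrate the offset recommendation above, here is a minimal sketch that assumes each frame is held as a Python dict shaped like the Frame definition later in this document; the recenter_frames helper is hypothetical, not part of the API.

```python
# Minimal sketch: re-center all frames on the first frame's device position so that
# coordinates stay well below 10^5 and 32-bit float precision is preserved.
# Assumes each frame is a dict shaped like the Frame definition in this document.

def recenter_frames(frames):
    """Subtract the first frame's device_position from every frame, in place."""
    if not frames:
        return frames
    # Copy the origin before mutating anything, since frames[0] is shifted too.
    origin = dict(frames[0]["device_position"])

    for frame in frames:
        for axis in ("x", "y", "z"):
            frame["device_position"][axis] -= origin[axis]
        for point in frame.get("points", []):
            for axis in ("x", "y", "z"):
                point[axis] -= origin[axis]
        for radar_point in frame.get("radar_points", []):
            for axis in ("x", "y", "z"):
                radar_point["position"][axis] -= origin[axis]
        for image in frame.get("images", []):
            for axis in ("x", "y", "z"):
                image["position"][axis] -= origin[axis]
    return frames
```

After this shift, the first frame's device_position is (0, 0, 0) and every other frame's pose and points are simply offsets from it.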
Definition: Quaternion
Quaternion
{
"x": 1,
"y": 1,
"z": 1,
"w": 1
}
Quaternion objects are used to represent rotation. We use the Hamilton quaternion convention, where i^2 = j^2 = k^2 = ijk = -1, i.e. the right-handed convention. The quaternion represented by the tuple (x, y, z, w) is equal to w + x*i + y*j + z*k.
Property | Type | Description |
---|---|---|
x | float | x component of the quaternion |
y | float | y component of the quaternion |
z | float | z component of the quaternion |
w | float | w (scalar) component of the quaternion |
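As a sanity check on this convention, the following sketch (not part of the API) converts an (x, y, z, w) quaternion into a 3x3 rotation matrix using plain Python.

```python
# Minimal sketch: convert an (x, y, z, w) Hamilton quaternion, as used throughout
# this document, into a 3x3 rotation matrix. Pure Python, no dependencies.

def quaternion_to_matrix(q):
    """q is a dict with keys x, y, z, w; the quaternion should be unit length."""
    x, y, z, w = q["x"], q["y"], q["z"], q["w"]
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ]

# Identity rotation: w = 1, x = y = z = 0.
assert quaternion_to_matrix({"x": 0, "y": 0, "z": 0, "w": 1}) == [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
```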
Definition: GPSPose
GPSPose
{
"lat": 1,
"lon": 1,
"bearing": 1
}
GPSPose objects are used to represent the pose (location and direction) of the robot in the world.
Property | Type | Description |
---|---|---|
lat | float | latitude of the robot's location, a number between -90 and 90 |
lon | float | longitude of the robot's location, a number between -180 and 180 |
bearing | float | bearing (heading) of the robot |
Definition: CameraImage
CameraImage objects represent an image and the camera position/heading used to record the image.
Camera models supported:
- Pinhole model with Brown-Conrady distortion (supports k1, k2, k3, p1, p2 distortion parameters)
- OpenCV fisheye model with radial distortion (supports k1, k2, k3, k4 distortion parameters)
- OpenCV omnidirectional model (supports k1, k2, k3, p1, p2 distortion parameters)
Property | Type | Description |
---|---|---|
timestamp | float (optional) | The timestamp, in nanoseconds, at which the photo was taken |
image_url | string | URL of the image file |
scale_factor | float (optional) | Factor by which the image has been downscaled (if the original image is 1920x1208 and image_url refers to a 960x604 image, scale_factor is 2) |
position | Vector3 | World-normalized position of the camera |
heading | Quaternion | Quaternion giving the camera's orientation in the world frame (see the worked example below) |
priority | integer (optional) | A higher value indicates that the camera takes precedence over other cameras when a single object appears in multiple camera views. If you are using a mix of long-range and short-range cameras with overlapping coverage, set the short-range cameras to priority 1 (the default priority is 0). |
camera_index | integer (optional) | A number to identify which camera on the car this is (used in any 2D/3D linking task sent after completion of the sensor fusion task). If not specified, it will be inferred from the CameraImage's position in the images array. |
camera_model | string (optional) | Which of the supported camera models above to use for the intrinsics; determines which distortion parameters apply |
fx | float | focal length in the x direction (in pixels) |
fy | float | focal length in the y direction (in pixels) |
cx | float | principal point x value |
cy | float | principal point y value |
skew | float (optional) | camera skew coefficient |
k1 | float (optional) | 1st radial distortion coefficient (Brown-Conrady, fisheye, omnidirectional) |
k2 | float (optional) | 2nd radial distortion coefficient (Brown-Conrady, fisheye, omnidirectional) |
k3 | float (optional) | 3rd radial distortion coefficient (Brown-Conrady, fisheye, omnidirectional) |
k4 | float (optional) | 4th radial distortion coefficient (fisheye only) |
p1 | float (optional) | 1st tangential distortion coefficient (Brown-Conrady, omnidirectional) |
p2 | float (optional) | 2nd tangential distortion coefficient (Brown-Conrady, omnidirectional) |
xi | float (optional) | reference frame offset for the omnidirectional model |
For example, to represent a camera pointing along the positive x-axis (i.e. the camera's z-axis points along the world's x-axis) and oriented normally (the camera's y-axis points along the world's negative z-axis), the corresponding heading is 0.5 - 0.5i + 0.5j - 0.5k.
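To make the camera conventions concrete, here is a minimal sketch, not part of the API, that projects a world-frame point into pixel coordinates using CameraImage-style position, heading, and pinhole intrinsics with Brown-Conrady distortion (skew is ignored). It treats heading as the rotation taking camera-frame vectors to world-frame vectors, consistent with the example above; numpy and all sample values are assumptions.

```python
# Minimal sketch: project a world-frame point into a pinhole camera with
# Brown-Conrady distortion, using CameraImage-style fields. Illustrative only.
import numpy as np

def quaternion_to_matrix(q):
    """(x, y, z, w) Hamilton quaternion -> 3x3 rotation matrix (camera -> world)."""
    x, y, z, w = q["x"], q["y"], q["z"], q["w"]
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def project_point(point_world, camera):
    """Return (u, v) pixel coordinates, or None if the point is behind the camera."""
    R = quaternion_to_matrix(camera["heading"])            # camera -> world
    t = np.array([camera["position"][k] for k in "xyz"])
    p = np.array([point_world[k] for k in "xyz"])

    # World -> camera: invert the camera pose (R is orthonormal, so R^-1 = R^T).
    X, Y, Z = R.T @ (p - t)
    if Z <= 0:
        return None

    # Normalized image coordinates, then Brown-Conrady radial + tangential distortion.
    xn, yn = X / Z, Y / Z
    k1, k2, k3 = camera.get("k1", 0), camera.get("k2", 0), camera.get("k3", 0)
    p1, p2 = camera.get("p1", 0), camera.get("p2", 0)
    r2 = xn * xn + yn * yn
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xd = xn * radial + 2 * p1 * xn * yn + p2 * (r2 + 2 * xn * xn)
    yd = yn * radial + p1 * (r2 + 2 * yn * yn) + 2 * p2 * xn * yn

    # Intrinsics: focal lengths and principal point are in pixels (skew omitted).
    u = camera["fx"] * xd + camera["cx"]
    v = camera["fy"] * yd + camera["cy"]
    return u, v

# Hypothetical camera facing the world's +x axis, as in the heading example above.
camera = {
    "position": {"x": 0.0, "y": 0.0, "z": 1.5},
    "heading": {"w": 0.5, "x": -0.5, "y": 0.5, "z": -0.5},
    "fx": 1000.0, "fy": 1000.0, "cx": 960.0, "cy": 604.0,
    "k1": 0.0, "k2": 0.0, "k3": 0.0, "p1": 0.0, "p2": 0.0,
}
print(project_point({"x": 10.0, "y": 0.0, "z": 1.5}, camera))  # ~ (960.0, 604.0)
```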
Definition: RadarPoint
RadarPoint
{
"position": {
"x": 100.1,
"y": 150.12,
"z": 200.2
},
"direction": {
"x": 1,
"y": 0,
"z": 0
},
"size": 0.5
}
RadarPoint objects are used to define an individual RADAR point and its Doppler in a particular frame.
Property | Type | Description |
---|---|---|
position | Vector3 | A vector defining the position of the RADAR point, in the same frame of reference as the frame's points |
direction | Vector3 | A vector defining the velocity (direction and magnitude) of a potential Doppler associated with the RADAR point. This vector is relative to the individual RADAR point and expressed in the global reference frame; its magnitude should correspond to the measured Doppler speed |
size | float (optional) | A float controlling the rendered size of the RADAR point |
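The direction field often has to be built from a driver-reported radial Doppler speed. The sketch below shows one plausible way to do that; the detection fields and the radar_direction helper are hypothetical and not part of the API.

```python
# Hedged sketch: turn a radial Doppler speed into a RadarPoint "direction" vector by
# scaling the sensor-to-detection unit vector. The inputs are hypothetical driver
# outputs expressed in the same global frame as the RADAR point position.
import math

def radar_direction(position, doppler_speed, sensor_origin):
    """Scale the unit vector from the sensor to the detection by the Doppler speed."""
    dx = position["x"] - sensor_origin["x"]
    dy = position["y"] - sensor_origin["y"]
    dz = position["z"] - sensor_origin["z"]
    norm = math.sqrt(dx * dx + dy * dy + dz * dz) or 1.0  # avoid division by zero
    return {
        "x": dx / norm * doppler_speed,
        "y": dy / norm * doppler_speed,
        "z": dz / norm * doppler_speed,
    }
```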
Definition: Frame
Frame objects represent all the point cloud, image, and other data that is sent to the annotator.
Property | Type | Description |
---|---|---|
device_position | Vector3 | Position of the LIDAR sensor or car with respect to a static frame of reference (e.g. a pole at a fixed position in the world) |
device_heading | Quaternion | Heading of the car or robot that the LIDAR is on top of, with respect to the same static frame of reference, expressed as a Quaternion |
device_gps_pose | GPSPose | GPS pose (location and bearing) of the robot in the world. The GPS pose provided should correspond to the best estimate of the pose of the same point as defined in device_position |
points | list of LidarPoint | Series of points representing the LIDAR point cloud, normalized with respect to the same static frame of reference as device_position |
radar_points | list of RadarPoint | A list of RadarPoint objects for this frame |
images | list of CameraImage | A list of CameraImage objects for this frame |
timestamp | float (optional) | The starting timestamp of the sensor rotation, in nanoseconds |
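Putting the definitions together, a minimal sketch of assembling one Frame and writing it to disk might look like this; all values and the output filename are illustrative, not real sensor data.

```python
# Minimal sketch: assemble a single Frame from the definitions above and write it
# to a JSON file ready for upload. All values are illustrative.
import json

frame = {
    "device_position": {"x": 0.0, "y": 0.0, "z": 0.0},
    "device_heading": {"x": 0.0, "y": 0.0, "z": 0.0, "w": 1.0},
    "points": [
        {"x": 1.0, "y": 2.0, "z": 0.1, "i": 0.5},
        {"x": -3.2, "y": 0.4, "z": 0.0},
    ],
    "radar_points": [],
    "images": [
        {
            "image_url": "https://example.com/frame0_cam0.jpg",  # hypothetical URL
            "position": {"x": 0.0, "y": 0.0, "z": 1.5},
            "heading": {"w": 0.5, "x": -0.5, "y": 0.5, "z": -0.5},
            "fx": 1000.0, "fy": 1000.0, "cx": 960.0, "cy": 604.0,
        }
    ],
    "timestamp": 1541196976735462000,
}

with open("frame_000.json", "w") as f:
    json.dump(frame, f)
```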
Example JSON file for a Frame
https://static.scale.com/scaleapi-lidar-pointclouds/example.json
Steps to Process a series of LIDAR data
- Create a Frame JSON object and save it to a file; alternatively, create a LidarFrame protobuf message and save it to a file. Repeat for each lidar frame.
- Upload the files to a Scale-accessible location (e.g. an S3 bucket) and record the URLs.
- Send a POST request to https://api.scale.com/v1/task/lidarannotation to create the task, referencing the uploaded frame URLs.
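As an illustration of the last step, a request might look like the sketch below. It is hedged: the basic-auth scheme and every payload field shown (project, callback_url, attachments, attachment_type, labels) are assumptions that should be checked against the lidarannotation task documentation before use.

```python
# Hedged sketch of the final step: create a lidarannotation task by POSTing the
# uploaded frame URLs. The payload fields and basic-auth scheme shown here are
# assumptions; confirm them against the task endpoint documentation.
import requests

SCALE_API_KEY = "live_xxxxxxxx"  # placeholder, not a real key
frame_urls = [
    "https://your-bucket.s3.amazonaws.com/frames/frame_000.json",  # hypothetical
    "https://your-bucket.s3.amazonaws.com/frames/frame_001.json",
]

payload = {
    "project": "lidar_annotation_project",           # assumed field
    "callback_url": "https://example.com/callback",  # assumed field
    "attachments": frame_urls,                       # assumed field: one URL per Frame
    "attachment_type": "json",                       # assumed field
    "labels": ["car", "pedestrian", "cyclist"],      # assumed field
}

response = requests.post(
    "https://api.scale.com/v1/task/lidarannotation",
    json=payload,
    auth=(SCALE_API_KEY, ""),  # assumed: HTTP basic auth with the API key as username
)
response.raise_for_status()
print(response.json())
```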