Cuboids

Given a cuboid entry in params.geometries, Scale will annotate your image or video with perspective cuboids and return the vertices of the cuboids. If camera intrinsics and extrinsics are provided as well, Scale will return scale-invariant 3D coordinates with respect to the camera, i.e. assuming the camera is at the origin. See https://scale.com/blog/3d-cuboids-annotations for a detailed explanation of how we can augment 2D cuboid responses.

Parameters

Parameter

Type

Default

Description

objects_to_annotate

array

[]

A list of strings or LabelDescription objects.

min_height

integer

0

The minimum height, in pixels, of the cuboids to be annotated.

min_width

integer

0

The minimum width, in pixels, of the cuboids to be annotated.

camera_intrinsics

object

null

An object that defines the camera intrinsics, in the format {fx: number, fy: number, cx: number, cy: number, scalefactor: number, skew: number} (skew defaults to 0, scalefactor defaults to 1). scalefactor corrects the focal lengths and offsets when the image sent has different dimensions from the original photo; for example, if the attachment is half the size of the original, set scalefactor to 2. Use in conjunction with camera_rotation_quaternion and camera_height to get perspective-corrected cuboids and 3D points.

camera_rotation_quaternion

object

null

An object that defines the rotation of the camera in relation to the world, expressed as a quaternion in the format {w: number, x: number, y: number, z: number}. Note that the z-axis of the camera frame represents the camera's optical axis. Use in conjunction with camera_intrinsics and camera_height to get perspective-corrected cuboids and 3D points.

camera_height

number

null

The height of the camera above the ground, in meters. Use in conjunction with camera_rotation_quaternion and camera_intrinsics to get perspective-corrected cuboids and 3D points.
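
As a rough illustration of how these parameters fit together, the sketch below assembles a cuboid entry for params.geometries as a plain Python dictionary. The helper name build_cuboid_geometry and all numeric values are made up for this example, and the sketch assumes that the camera fields sit inside the cuboid entry alongside objects_to_annotate. It only builds the dictionary; how the payload is submitted is outside its scope.

```python
import json


def build_cuboid_geometry(labels, intrinsics=None, rotation=None,
                          camera_height=None, min_height=0, min_width=0):
    """Assemble a cuboid geometry entry (hypothetical helper, not part of any SDK)."""
    cuboid = {
        "objects_to_annotate": list(labels),
        "min_height": min_height,
        "min_width": min_width,
    }
    provided = [field is not None for field in (intrinsics, rotation, camera_height)]
    # The three camera parameters are documented as working together,
    # so this sketch rejects a partial set instead of sending it.
    if any(provided) and not all(provided):
        raise ValueError(
            "provide camera_intrinsics, camera_rotation_quaternion "
            "and camera_height together"
        )
    if all(provided):
        # skew defaults to 0 and scalefactor to 1, as described in the table.
        cuboid["camera_intrinsics"] = {"skew": 0, "scalefactor": 1, **intrinsics}
        cuboid["camera_rotation_quaternion"] = dict(rotation)
        cuboid["camera_height"] = camera_height
    return {"cuboid": cuboid}


geometries = build_cuboid_geometry(
    labels=["car", "truck"],
    # Illustrative values; the attachment is assumed to be half the original size.
    intrinsics={"fx": 1000.0, "fy": 1000.0, "cx": 640.0, "cy": 360.0, "scalefactor": 2},
    rotation={"w": 1.0, "x": 0.0, "y": 0.0, "z": 0.0},
    camera_height=1.5,
    min_height=10,
    min_width=10,
)
print(json.dumps(geometries, indent=2))
```

Rejecting a partial set of camera parameters is a design choice of this sketch, not documented behavior; it simply surfaces the mistake before a task is created.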

Cuboid Annotation Response Format

The cuboid annotations returned in the response have the following fields:

Key

Type

Description

uuid

string

A computer-generated unique identifier for this annotation.

In video annotation tasks, this can be used to track the same object across frames.

type

string

String to indicate geometry type: cuboid

label

string

The label of this annotation, chosen from the objects_to_annotate array for its geometry. In video annotation tasks, any annotation objects with the same uuid will have the same label across all frames.

attributes

object

See the Annotation Attributes section for more details about the attributes response field.

vertices

array of Vertex objects

A list of Vertex objects defining all visible vertices of the cuboid. See the Vertex section for more details.

edges

array of Edge objects

A list of Edge objects defining the edges of the cuboid. See the Edge section for more details.

points_2d

array of {x, y} coordinate objects

If camera_rotation_quaternion, camera_intrinsics, and camera_height were provided, contains the projected 2D coordinates of all 8 vertices of the cuboid after perspective correction. See the diagram below for the order in which the points are returned.

points_3d

array of {x, y, z} coordinate objects

If camera_rotation_quaternion, camera_intrinsics, and camera_height were provided, contains the 3D coordinates (arbitrarily scaled, relative to the camera location) of all 8 vertices of the cuboid after perspective correction. See the diagram below for the order in which the points are returned.

Points on the cuboid are returned in this order for both points_2d and points_3d:

       3-------2
      /|      /|
     / |     / |
    0-------1  |
    |  7----|--6
    | /     | /
    4-------5
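
One way to use this ordering is to recover the relative dimensions of a cuboid from points_3d. The sketch below assumes a reading of the diagram in which edge 0-1 runs along the width, 0-3 along the depth, and 0-4 along the height. The function names and sample coordinates are hypothetical, and because points_3d is arbitrarily scaled, only the ratios between the results are meaningful.

```python
import math

# The 12 edges of the cuboid as index pairs, read off the diagram above.
CUBOID_EDGES = [
    (0, 1), (1, 2), (2, 3), (3, 0),  # top face
    (4, 5), (5, 6), (6, 7), (7, 4),  # bottom face
    (0, 4), (1, 5), (2, 6), (3, 7),  # vertical edges
]


def distance(p, q):
    """Euclidean distance between two {x, y, z} coordinate objects."""
    return math.sqrt(
        (p["x"] - q["x"]) ** 2 + (p["y"] - q["y"]) ** 2 + (p["z"] - q["z"]) ** 2
    )


def relative_dimensions(points_3d):
    """Return width, depth and height in the (arbitrary) units of points_3d."""
    if len(points_3d) != 8:
        raise ValueError("expected 8 points, got %d" % len(points_3d))
    return {
        "width": distance(points_3d[0], points_3d[1]),
        "depth": distance(points_3d[0], points_3d[3]),
        "height": distance(points_3d[0], points_3d[4]),
    }


# Made-up sample: a box 2 wide, 4 deep and 1.5 tall, in front of the camera.
sample_points_3d = [
    {"x": -1.0, "y": 0.0, "z": 5.0},  # 0
    {"x": 1.0, "y": 0.0, "z": 5.0},   # 1
    {"x": 1.0, "y": 0.0, "z": 9.0},   # 2
    {"x": -1.0, "y": 0.0, "z": 9.0},  # 3
    {"x": -1.0, "y": 1.5, "z": 5.0},  # 4
    {"x": 1.0, "y": 1.5, "z": 5.0},   # 5
    {"x": 1.0, "y": 1.5, "z": 9.0},   # 6
    {"x": -1.0, "y": 1.5, "z": 9.0},  # 7
]

dims = relative_dimensions(sample_points_3d)
print(dims)
```

The same CUBOID_EDGES list can be used with points_2d to draw the full wireframe, including the edges that are hidden in the image.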

Definition: Vertex

Key

Type

Description

x

number

The distance, in pixels, between the vertex and the left border of the image.

y

number

The distance, in pixels, between the vertex and the top border of the image.

type

string

Always vertex.

description

string

An enum describing the position of the vertex, which is one of:

  • face-topleft
  • face-bottomleft
  • face-topright
  • face-bottomright
  • side-topcorner
  • side-bottomcorner

Definition: Edge

Key

Type

Description

x1

number

The distance, in pixels, between the first vertex of the edge and the left border of the image.

y1

number

The distance, in pixels, between the first vertex of the edge and the top border of the image.

x2

number

The distance, in pixels, between the second vertex of the edge and the left border of the image.

y2

number

The distance, in pixels, between the second vertex of the edge and the top border of the image.

type

string

Always edge.

description

string

An enum describing the position of the edge, which is one of:

  • face-top
  • face-bottom
  • face-left
  • face-right
  • side-top
  • side-bottom

Example response:

{
  ...,
  "response": {
    "annotations": [
      {
        "label": "car",
        "vertices": [
          {
            "description": "face-topleft",
            "y": 270,
            "x": 293,
            "type": "vertex"
          },
          {
            "description": "face-bottomleft",
            "y": 437,
            "x": 293,
            "type": "vertex"
          },
          {
            "description": "face-topright",
            "y": 270,
            "x": 471,
            "type": "vertex"
          },
          {
            "description": "face-bottomright",
            "y": 437,
            "x": 471,
            "type": "vertex"
          },
          {
            "description": "side-topcorner",
            "y": 286,
            "x": 607,
            "type": "vertex"
          },
          {
            "description": "side-bottomcorner",
            "y": 373,
            "x": 607,
            "type": "vertex"
          }
        ],
        "edges": [
          {
            "description": "face-top",
            "x1": 293,
            "y1": 270,
            "x2": 471,
            "y2": 270,
            "type": "edge"
          },
          {
            "description": "face-right",
            "x1": 471,
            "y1": 270,
            "x2": 471,
            "y2": 437,
            "type": "edge"
          },
          {
            "description": "face-bottom",
            "x1": 471,
            "y1": 437,
            "x2": 293,
            "y2": 437,
            "type": "edge"
          },
          {
            "description": "face-left",
            "x1": 293,
            "y1": 437,
            "x2": 293,
            "y2": 270,
            "type": "edge"
          },
          {
            "description": "side-top",
            "x1": 471,
            "y1": 270,
            "x2": 607,
            "y2": 286,
            "type": "edge"
          },
          {
            "description": "side-bottom",
            "x1": 471,
            "y1": 437,
            "x2": 607,
            "y2": 373,
            "type": "edge"
          }
        ],
        "points_2d": [
          {
            "y": 270,
            "x": 293
          },
          {
            "y": 437,
            "x": 293
          },
          {
            "y": 270,
            "x": 471
          },
          {
            "y": 437,
            "x": 471
          },
          {
            "y": 286,
            "x": 607
          },
          {
            "y": 373,
            "x": 607
          },
          {
            "y": 373,
            "x": 607
          },
          {
            "y": 373,
            "x": 607
          }
        ],
        "points_3d": [
          {
            "z": 0,
            "y": 270,
            "x": 293
          },
          {
            "z": 0,
            "y": 437,
            "x": 293
          },
          {
            "z": 0,
            "y": 270,
            "x": 471
          },
          {
            "z": 0,
            "y": 437,
            "x": 471
          },
          {
            "z": 0,
            "y": 286,
            "x": 607
          },
          {
            "z": 0,
            "y": 373,
            "x": 607
          },
          {
            "z": 0,
            "y": 373,
            "x": 607
          },
          {
            "z": 0,
            "y": 373,
            "x": 607
          }
        ]
      }
    ]
  },
  ...
}
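
To show how a client might consume such a response, the sketch below indexes the vertices by their description, checks that every edge endpoint coincides with a listed vertex, and measures the face outlined by the face-* vertices in pixels. The function names are hypothetical, and the annotation dictionary is a trimmed copy of the example above (all six vertices, two of the six edges) rather than a full response.

```python
def vertices_by_description(annotation):
    """Map each vertex description to its (x, y) pixel position."""
    return {v["description"]: (v["x"], v["y"]) for v in annotation["vertices"]}


def dangling_edges(annotation):
    """Return descriptions of edges with an endpoint that is not a listed vertex."""
    known = set(vertices_by_description(annotation).values())
    return [
        e["description"]
        for e in annotation["edges"]
        if (e["x1"], e["y1"]) not in known or (e["x2"], e["y2"]) not in known
    ]


def face_size(annotation):
    """Width and height, in pixels, of the face outlined by the face-* vertices."""
    v = vertices_by_description(annotation)
    width = v["face-topright"][0] - v["face-topleft"][0]
    height = v["face-bottomleft"][1] - v["face-topleft"][1]
    return width, height


# Trimmed copy of the example annotation above.
annotation = {
    "label": "car",
    "vertices": [
        {"description": "face-topleft", "y": 270, "x": 293, "type": "vertex"},
        {"description": "face-bottomleft", "y": 437, "x": 293, "type": "vertex"},
        {"description": "face-topright", "y": 270, "x": 471, "type": "vertex"},
        {"description": "face-bottomright", "y": 437, "x": 471, "type": "vertex"},
        {"description": "side-topcorner", "y": 286, "x": 607, "type": "vertex"},
        {"description": "side-bottomcorner", "y": 373, "x": 607, "type": "vertex"},
    ],
    "edges": [
        {"description": "face-top", "x1": 293, "y1": 270,
         "x2": 471, "y2": 270, "type": "edge"},
        {"description": "side-bottom", "x1": 471, "y1": 437,
         "x2": 607, "y2": 373, "type": "edge"},
    ],
}

print(face_size(annotation))       # (178, 167)
print(dangling_edges(annotation))  # []
```

The face size computed this way can be compared against min_width and min_height when sanity-checking returned annotations, on the assumption that those limits refer to the same pixel measurements.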