Avoiding Duplicate Tasks

Creating duplicate tasks is an issue every team should take care to avoid.

Scale AI provides two mechanisms that prevent its task creation endpoints from creating duplicate tasks. Both let you resubmit requests that may have failed in transit, or that otherwise need to be retried, without the risk of creating duplicates.

Option 1: The unique_id field

The unique_id field is available on every task type Scale provides.

Once a unique_id has been submitted to Scale, any later task creation request with the same unique_id fails with a 409 error whose message identifies the conflicting task:

Request body:

{
    "unique_id": "s3://bucket/file.png",
    "instruction": "Do the thing",
    "callback_url": "http://www.example.com/callback",
    ...
}

Response:

{
    "status_code": 409,
    "error": "The unique_id (\"s3://bucket/file.png\") is already used for a different task (602c399c6d092c00115aa3c9)."
}
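If a retry hits this 409, you may want the ID of the task that already exists. The sketch below pulls it out of the error message with a regular expression. It assumes the message keeps exactly the wording shown in the example response; that wording is not presented as a stable contract, so treat the parsing as a convenience, not something to depend on.

```python
import json
import re
from typing import Optional

# Matches the message format from the example 409 response.
# Assumption: this wording is stable; it is not documented as a contract.
_CONFLICT_RE = re.compile(
    r'The unique_id \("(?P<unique_id>.*?)"\) is already used '
    r'for a different task \((?P<task_id>[0-9a-f]+)\)'
)


def conflicting_task_id(response_body: str) -> Optional[str]:
    """Return the ID of the existing task named in a 409 response, else None."""
    payload = json.loads(response_body)
    if payload.get("status_code") != 409:
        return None
    match = _CONFLICT_RE.search(payload.get("error", ""))
    return match.group("task_id") if match else None


body = json.dumps({
    "status_code": 409,
    "error": 'The unique_id ("s3://bucket/file.png") is already used '
             'for a different task (602c399c6d092c00115aa3c9).',
})
print(conflicting_task_id(body))  # 602c399c6d092c00115aa3c9
```

You could then fetch that existing task instead of creating a new one.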

Values passed into the unique_id field are permanently associated with the task and will always be returned to you when retrieving tasks from our platform.

You can query tasks by unique_id at any time using our Task Retrieval endpoints.

Best Practices:

  1. Think of unique_id as your own customizable ID for a task. Ideally, it should be easy to look up from the data you already have on your side. A good unique_id might be the filename being submitted, or other metadata such as a scene or run ID that you use internally.

  2. Uniqueness of unique_id is enforced globally, across all projects and task types. If you want uniqueness only within a project or task type, prepend or append the project or task type to the unique_id itself.
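Both practices can be combined in a small helper that builds a unique_id from your own key (such as a filename) plus a project and task type prefix. This is a sketch: the `/` separator and the helper name are arbitrary choices for illustration, not a Scale convention.

```python
SEPARATOR = "/"  # arbitrary choice for this sketch, not a Scale convention


def scoped_unique_id(project: str, task_type: str, key: str) -> str:
    """Build a unique_id that only has to be unique within one project and task type."""
    for part in (project, task_type):
        # Forbidding the separator in the prefix parts means the first two
        # separators always delimit project and task type, so two different
        # (project, task_type, key) triples can never produce the same string.
        if SEPARATOR in part:
            raise ValueError(f"{part!r} must not contain {SEPARATOR!r}")
    return f"{project}{SEPARATOR}{task_type}{SEPARATOR}{key}"


print(scoped_unique_id("street-scenes", "comparison", "s3://bucket/file.png"))
# street-scenes/comparison/s3://bucket/file.png
```

The key itself may contain the separator (as an S3 path does); only the prefix parts are restricted.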

Option 2: The Idempotency-Key header

To use this feature, include the header Idempotency-Key: <key> in your request. You, the client, are responsible for ensuring that your keys are unique. We recommend using version 4 UUIDs.

curl "https://api.scale.com/v1/task/comparison" \
  -u "{{ApiKey}}:" \
  -H "Idempotency-Key: UNIQUE_IDENTIFIER" \
  -d callback_url="http://www.example.com/callback" \
  ...
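For reference, here is a rough standard-library Python equivalent of the curl call above. It builds the request without sending it and generates the key with `uuid.uuid4()`. Only `callback_url` is filled in; the remaining task fields elided in the curl example are left to you, and `build_task_request` is a hypothetical helper name.

```python
import base64
import urllib.parse
import urllib.request
import uuid

API_URL = "https://api.scale.com/v1/task/comparison"


def build_task_request(api_key: str, fields: dict, idempotency_key: str) -> urllib.request.Request:
    """Build (but do not send) a task creation request like the curl example."""
    # curl -u "KEY:" sends HTTP Basic auth with the API key as the username
    # and an empty password.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    # curl -d sends form-encoded fields, so encode them the same way.
    body = urllib.parse.urlencode(fields).encode()
    # Note: urllib normalizes header names (e.g. to "Idempotency-key");
    # HTTP header names are case-insensitive, so this is harmless.
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Idempotency-Key": idempotency_key,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


# Generate the key once per logical task and reuse it for every retry of that task.
key = str(uuid.uuid4())
req = build_task_request("YOUR_API_KEY", {"callback_url": "http://www.example.com/callback"}, key)
# To send it: urllib.request.urlopen(req)
```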

The result of every request that specifies an idempotency key is saved. If we later receive a matching request with the same idempotency key, we return the saved response and create no additional task. This holds even when the saved response is an error. Keys are removed after 24 hours.

If an incoming request has the same idempotency key as a saved request but different parameters, or comes from a different user, we return a 409 error.

In rare situations, we may return a 429 error if two matching requests with identical idempotency keys are made simultaneously. In this case, it is safe to retry.
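The rules above translate into a simple retry policy: generate one key per logical request, resend with that same key after a network failure or a 429, and treat any other response as final, since a resend with the same key would only get the same answer back. In the sketch below, `send` is a hypothetical transport you supply (for example, a wrapper around the request built from the curl example); the attempt count and backoff values are arbitrary.

```python
import time
import uuid
from typing import Callable, Tuple

# Hypothetical transport for this sketch: sends one task creation request with
# the given Idempotency-Key and returns (status_code, body). Network failures
# are expected to raise OSError (ConnectionError, TimeoutError, URLError, ...).
Send = Callable[[str], Tuple[int, str]]


def create_task_with_retries(
    send: Send, max_attempts: int = 5, base_delay: float = 0.5
) -> Tuple[int, str]:
    """Retry one logical task creation, reusing a single idempotency key."""
    key = str(uuid.uuid4())  # generated once, reused on every attempt
    for attempt in range(max_attempts):
        try:
            status, body = send(key)
        except OSError:
            pass  # outcome unknown; the same key makes resending safe
        else:
            if status != 429:
                # Success, 409, or a saved error: resending with the same
                # key would only get the same answer back, so stop here.
                return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

Keep retries well inside the 24-hour key lifetime; after that, a resend with the same key is treated as a new request.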

When would I use this instead of the unique_id field?
The header-based approach is useful in retry logic that handles network errors or other transient failures, where you immediately resend the exact same request. Because a retry with an unchanged payload returns the original saved response rather than a 409 error, your integration code can handle the retry's result the same way it handles a first-attempt response.

You can also use both options at the same time.

Workflow Support

Because a unique_id is permanently tied to its task, it can be hard to recover on your own if something unexpected happens. Two features help support more robust workflows.

Canceling Tasks
When canceling a task, you can set the clear_unique_id query parameter on the request to clear the task's unique_id. See the Cancel Task endpoint for more details.
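As a sketch, the cancel URL with this parameter could be built as follows. The path `https://api.scale.com/v1/task/<task_id>/cancel` and the value `true` are assumptions for illustration; confirm both against the Cancel Task endpoint reference before relying on them.

```python
import urllib.parse

# Assumption: path and parameter value as described in the Cancel Task
# endpoint reference; verify there before use.
CANCEL_URL = "https://api.scale.com/v1/task/{task_id}/cancel"


def cancel_url(task_id: str, clear_unique_id: bool = False) -> str:
    """Build the cancel URL, optionally asking Scale to clear the task's unique_id."""
    url = CANCEL_URL.format(task_id=urllib.parse.quote(task_id, safe=""))
    if clear_unique_id:
        url += "?" + urllib.parse.urlencode({"clear_unique_id": "true"})
    return url


print(cancel_url("602c399c6d092c00115aa3c9", clear_unique_id=True))
# https://api.scale.com/v1/task/602c399c6d092c00115aa3c9/cancel?clear_unique_id=true
```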

Errored Tasks
Sometimes a task runs into an error after it is submitted, especially while its attachments are being processed.

Everywhere you can specify a unique_id, you can also specify clear_unique_id_on_error: true. As the parameter name suggests, if the task reaches an error status, its unique_id is automatically unset, so you can submit a new task with the same unique_id.
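A minimal sketch of a JSON task creation body that opts into this behavior, reusing the field values from the unique_id example. `task_payload` is a hypothetical helper, and any task-type-specific parameters are passed through as extra keyword arguments.

```python
import json


def task_payload(unique_id: str, instruction: str, callback_url: str, **fields) -> str:
    """Serialize a task creation body that frees its unique_id if the task errors."""
    payload = {
        "unique_id": unique_id,
        # If the task reaches an error status, its unique_id is unset so the
        # same value can be submitted again.
        "clear_unique_id_on_error": True,
        "instruction": instruction,
        "callback_url": callback_url,
        **fields,  # other task-type-specific parameters
    }
    return json.dumps(payload)


print(task_payload("s3://bucket/file.png", "Do the thing", "http://www.example.com/callback"))
```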