Improve Quality with Training & Evaluation Tasks

Quality tasks help you monitor and improve the performance of your labelers. There are 2 types of quality tasks:
- Training Tasks: A subset of audited tasks that Annotators must complete before attempting live tasks from your production batch. These tasks make up the onboarding course that all Annotators must pass (while meeting a certain quality bar) in order to onboard onto your project.
- Evaluation Tasks: A subset of audited tasks used to track Annotator quality after they've onboarded onto your project. To Annotators, they look like any other task on the project. However, because the correct labels are already known, we can evaluate how well each Annotator performed on the task. This ensures that labelers continue to meet a high quality bar for the entire time they work on the project. Labelers who drop below the quality threshold can be set to be automatically taken off the project (see the sketch after the note below for how such a threshold check might work).

Note: Training and evaluation tasks are not served to labelers in calibration batches. They are served only in production batches.
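To make the quality threshold concrete, here is a minimal Python sketch of how evaluation-task results might be rolled up into a per-annotator accuracy and compared against a project bar. All names and the 0.80 threshold are hypothetical illustrations, not Studio's actual API.

```python
from dataclasses import dataclass

# Hypothetical record of one annotator's attempt at an evaluation task.
@dataclass
class EvalResult:
    annotator_id: str
    task_id: str
    correct: bool  # did the attempt match the known-good label?

QUALITY_THRESHOLD = 0.80  # assumed project-specific quality bar

def annotators_below_threshold(results: list[EvalResult]) -> set[str]:
    """Return annotators whose evaluation-task accuracy falls below the bar."""
    attempts: dict[str, list[bool]] = {}
    for r in results:
        attempts.setdefault(r.annotator_id, []).append(r.correct)
    return {
        annotator
        for annotator, outcomes in attempts.items()
        if sum(outcomes) / len(outcomes) < QUALITY_THRESHOLD
    }

if __name__ == "__main__":
    results = [
        EvalResult("alice", "t1", True),
        EvalResult("alice", "t2", True),
        EvalResult("bob", "t1", False),
        EvalResult("bob", "t2", False),
    ]
    print(annotators_below_threshold(results))  # {'bob'}
```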

Determining Whether to Create a Training or Evaluation Task

To ensure the quality of your labels, you'll need to decide which audited tasks to use as Training tasks and which to use as Evaluation tasks.

If you think the task would be a good one for all Annotators to complete and get some practice on before moving on to the live Production Batch tasks, it would make sense to make the task a Training task. Remember to think about your Training tasks as a set - make sure they cover a good breadth of the data variability of your dataset. These tasks should generally be easier, as it will be the first time an Annotator encounters your data.

If you think the task would be a good one for measuring the quality of your Production Batch tasks, it would make sense to make the task an Evaluation task. Remember to think about your Evaluation tasks as a set - make sure they cover a good breadth of the data variability of your dataset. These tasks should generally be harder, since they will be randomly served to Annotators to gauge quality and accuracy. Note that since they tend to be harder, your general Production Batch quality should be higher than your Evaluation task quality.
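If it helps to reason about this split programmatically, the sketch below (hypothetical Python, not Studio functionality) proposes easier audited tasks as training and harder ones as evaluation, while backfilling the training set so every class is still represented.

```python
from dataclasses import dataclass

# Hypothetical shape of an audited task; the fields are illustrative only.
@dataclass
class AuditedTask:
    task_id: str
    label_class: str   # e.g. "cat", "dog", "bird"
    difficulty: str    # "easy" or "hard"

def split_quality_tasks(
    tasks: list[AuditedTask],
) -> tuple[list[AuditedTask], list[AuditedTask]]:
    """Propose easier tasks as training and harder ones as evaluation,
    keeping at least one training task per class for breadth."""
    training = [t for t in tasks if t.difficulty == "easy"]
    evaluation = [t for t in tasks if t.difficulty == "hard"]
    covered = {t.label_class for t in training}
    # Backfill any class with no easy example so the training set
    # still spans the dataset's variability.
    for t in tasks:
        if t.label_class not in covered:
            training.append(t)
            covered.add(t.label_class)
    return training, evaluation
```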

Creating a Training or Evaluation Task

You can create a quality task (a training or evaluation task) from any audited task.

You can create a batch of data and label it yourself, with the intention of creating a very specific set of training/evaluation tasks. You would then audit the tasks you completed and approve them all. This is helpful if you are just getting started and want to create training tasks for your annotators to start off on.

OR, you can have your annotators start working on tasks and create quality tasks out of their completed output as your project progresses, establishing a continuous cycle of improvement.

From the audit modal (when you approve or fix/save a task), you will have the option to create either a training task or an evaluation task out of it.

- Creating a training task will add the task to the onboarding course annotators have to take prior to labeling on your project.
- Creating an evaluation task will add the task to a golden dataset that annotators will be secretly graded against in the future to gauge their performance.

NOTE: Evaluation Tasks are automatically split into initial and review phases based on the changes you made in the audit. If you rejected/fixed the task and then made the appropriate corrections to the attempted annotation, that Evaluation Task becomes a Review Phase Evaluation Task. If you accepted the task, it becomes an Initial Phase Evaluation Task.
- Initial Phase Evaluation Tasks measure a Tasker’s ability to complete an annotation task from start to finish.
- Review Phase Evaluation Tasks measure a Tasker’s ability to take the completed work from another Tasker, and make corrections as needed.

Initial Phase Tasks are created when you approve a task and make an evaluation task out of it. Review Phase Tasks are created when you fix a task and make an evaluation task out of it.
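The mapping from audit action to evaluation phase can be restated in a few lines. The sketch below uses hypothetical Python names to summarize the rule; it is not Studio code.

```python
from enum import Enum

class AuditAction(Enum):
    APPROVED = "approved"  # you accepted the attempted annotation as-is
    FIXED = "fixed"        # you rejected it and made corrections

class EvalPhase(Enum):
    INITIAL = "initial"    # measures annotating a task from scratch
    REVIEW = "review"      # measures correcting another Tasker's work

def evaluation_phase(action: AuditAction) -> EvalPhase:
    """Approved audits yield Initial Phase evaluation tasks;
    fixed audits yield Review Phase evaluation tasks."""
    return EvalPhase.INITIAL if action is AuditAction.APPROVED else EvalPhase.REVIEW
```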


If you want to convert an initial phase task into a review phase task, navigate to the “Quality Lab,” open up “Initial phase tasks,” click into the task you want to convert, click the “convert” button, and select “convert to review phase evaluation task.”


Using Concepts & Difficulties

It is important that you create a diverse set of quality tasks. For example, for a 3-class categorization problem, you would want an equal balance across all 3 classes.

To help create class balance, Studio lets you add concepts and a difficulty to each quality task. Concepts describe what the evaluation task is about, whereas Difficulty describes how difficult the task is to complete. Tagging quality tasks with concepts and difficulties allows us to serve them in a more balanced way to Annotators, obtaining more holistic quality signals on production batches.

For example, if you created 30 evaluation tasks but 10 of them tested a similar error, the tagged concepts would let us serve annotators a more holistic set of evaluation tasks rather than tasks that are all of the same nature.
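As a rough illustration of how concept tags enable this balancing, the following hypothetical Python sketch round-robins across concepts so that an over-represented concept (like the 10 similar tasks above) cannot dominate what an annotator is served. It is not Studio's serving logic.

```python
import random

def pick_balanced(eval_tasks: dict[str, list[str]], n: int, seed: int = 0) -> list[str]:
    """Round-robin across concepts so no single concept dominates the sample."""
    rng = random.Random(seed)
    # Shuffle each concept's pool so selection within a concept is random.
    pools = {concept: rng.sample(tasks, len(tasks)) for concept, tasks in eval_tasks.items()}
    concepts = list(pools)
    chosen: list[str] = []
    i = 0
    while len(chosen) < n and any(pools.values()):
        concept = concepts[i % len(concepts)]
        if pools[concept]:
            chosen.append(pools[concept].pop())
        i += 1
    return chosen

if __name__ == "__main__":
    pool = {
        "occlusion": [f"occ_{k}" for k in range(10)],  # over-represented concept
        "small_objects": ["small_1", "small_2"],
        "night_scenes": ["night_1", "night_2"],
    }
    print(pick_balanced(pool, n=6))  # mixes concepts instead of serving only "occlusion"
```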

Your Quality Lab page is mission control for all the quality tasks you've created. From this page, you can see the list of tasks that you've designated as training or evaluation tasks.