Types of evaluation tasks


Evaluation work at Sovrano AI falls into several categories. You may encounter one or more of these depending on your project assignment.

RLHF (Reinforcement Learning from Human Feedback)

You're shown a prompt and two or more AI-generated responses. Your job is to rank the responses (or, when there are only two, select the better one) and write a justification explaining your reasoning. This is the most common task type.

Data Annotation & Labelling

You label, tag, or classify data according to a provided rubric. This can include text, images, or multimodal content. Precision and consistency are critical.

Red-Teaming & Safety

You attempt to find vulnerabilities, biases, or safety issues in AI model outputs. This requires creative thinking and domain knowledge to identify edge cases.

Response Writing (SFT)

You write ideal responses to prompts from scratch. These become training examples for supervised fine-tuning (SFT) of the model. This is the highest-skill, highest-pay task type, because your response quality directly shapes model behaviour.

Rating & Scoring

You evaluate a single AI response against a rubric, scoring it on criteria like accuracy, helpfulness, completeness, clarity, and safety. Calibration with other evaluators is important.
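To make "calibration with other evaluators" concrete, here is a minimal sketch of one common way to quantify agreement between two evaluators: Cohen's kappa, which corrects raw agreement for chance. This is an illustration only, not a description of how Sovrano AI measures calibration; the function name `cohens_kappa` and the sample scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two evaluators who scored the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both evaluators must score the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: fraction of items given identical scores.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement, estimated from each evaluator's own score distribution.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[score] * counts_b[score] for score in counts_a) / (n * n)
    if expected == 1.0:
        # Both evaluators gave every item the same single score.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical 1-5 accuracy scores from two evaluators on the same eight responses.
evaluator_a = [5, 4, 4, 2, 3, 5, 1, 4]
evaluator_b = [5, 4, 3, 2, 3, 5, 2, 4]
kappa = cohens_kappa(evaluator_a, evaluator_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

A kappa near 1 means the two evaluators apply the rubric almost identically, while a value near 0 means they agree no more often than chance would predict. Plain kappa treats a one-point disagreement the same as a four-point one; for ordinal rubric scores, a weighted variant may be a better fit.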

Frequently asked questions

Can I switch between task types within the same project?

This depends on the project. Some projects involve multiple task types and will rotate you between them. Others are focused on a single type. Your project lead will clarify what's expected during onboarding.

Are there minimum quality standards for each task type?

Yes. Each task type has specific quality rubrics. For RLHF, justifications must be specific and evidence-based. For annotation, precision and consistency are measured against calibration sets. Your project lead will share the relevant quality guidelines during onboarding.

What if there are no tasks available for me right now?

Task availability fluctuates based on project needs. If your queue is empty, check back later or reach out to your project lead on Discord. Gaps between tasks are normal, especially between project phases.

Do different task types pay different hourly rates?

Yes, rates vary by task type. Response Writing (SFT) and Red-Teaming typically command higher rates because of the skill level required, and RLHF and annotation rates vary by domain complexity. Your own rate is the one specified in your contract; it doesn't change when you move between task types unless your contract explicitly states otherwise.

Which task type is best for beginners?

Rating & Scoring and RLHF tasks are generally the most accessible for new evaluators. They have clear rubrics and structured formats. Response Writing (SFT) is the most demanding and is typically offered to experienced evaluators.

Can I request to be assigned a specific task type?

You can express a preference to your project lead, but task assignments are based on project needs and your demonstrated abilities. Performing well on your current tasks is the best way to influence future assignments.

How do I know if I'm doing a task correctly?

Review the project guidelines and rubric carefully. Use Discord to ask your project lead if you're unsure. Some projects include calibration tasks at the start to help you understand expectations. Pay attention to any feedback on your submissions.

What happens if I consistently struggle with a task type?

Your project lead will typically provide feedback and guidance. If quality doesn't improve, you may be reassigned to a different task type or project. This is not necessarily a negative outcome: different people excel at different types of work.
