Evaluation work at Sovrano AI falls into several categories. You may encounter one or more of these depending on your project assignment.
RLHF (Reinforcement Learning from Human Feedback)
You're shown a prompt and two or more AI-generated responses. Your job is to rank the responses from best to worst and write a justification explaining your reasoning. This is the most common task type.
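To make the workflow concrete, here is a minimal sketch of what a single comparison record might look like. The field names and structure are purely illustrative assumptions, not Sovrano AI's actual schema:

```python
# Hypothetical RLHF comparison record; field names are illustrative,
# not Sovrano AI's actual data format.
import json

comparison = {
    "prompt": "Explain recursion to a beginner.",
    "responses": {
        "A": "Recursion is when a function calls itself on a smaller input...",
        "B": "Recursion means repeating things.",
    },
    # Ranking lists response IDs best-first; here the evaluator prefers A.
    "ranking": ["A", "B"],
    "justification": (
        "Response A defines the concept and points toward a concrete "
        "example; response B is too vague to teach anything."
    ),
}

print(json.dumps(comparison, indent=2))
```

The justification field is what distinguishes this work from simple voting: it records *why* one response is better, which is what the training pipeline ultimately learns from.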
Data Annotation & Labelling
You label, tag, or classify data according to a provided rubric. This can include text, images, or multimodal content. Precision and consistency are critical.
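Consistency usually means every label must come from the rubric's fixed label set. A small sketch of that check, with an assumed (hypothetical) label set and record shape:

```python
# Hypothetical labelling sketch: verify each label belongs to the
# rubric's allowed set before submission. Label names are illustrative.
ALLOWED_LABELS = {"safe", "unsafe", "needs_review"}

def invalid_items(items):
    """Return items whose label falls outside the rubric (ideally empty)."""
    return [item for item in items if item["label"] not in ALLOWED_LABELS]

batch = [
    {"id": 1, "text": "How do I bake bread?", "label": "safe"},
    {"id": 2, "text": "Ambiguous request about chemicals", "label": "needs_review"},
]

print(invalid_items(batch))  # → [] when every label is in the rubric
```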
Red-Teaming & Safety
You attempt to find vulnerabilities, biases, or safety issues in AI model outputs. This requires creative thinking and domain knowledge to identify edge cases.
Response Writing (SFT)
You write ideal responses to prompts from scratch. These become training examples for the model. This is the highest-skill, highest-pay task type — your response quality directly shapes model behaviour.
Rating & Scoring
You evaluate a single AI response against a rubric, scoring it on criteria like accuracy, helpfulness, completeness, clarity, and safety. Calibration with other evaluators is important.
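Calibration can be checked by comparing two evaluators' scores on the same response. This sketch assumes a 1–5 scale and uses the criteria named above; the function and scale are illustrative assumptions, not a Sovrano AI tool:

```python
# Hypothetical calibration check: find the largest per-criterion gap
# between two evaluators' scores (1-5 scale assumed).
CRITERIA = ["accuracy", "helpfulness", "completeness", "clarity", "safety"]

def max_disagreement(scores_a, scores_b):
    """Largest absolute score gap across the rubric's criteria."""
    return max(abs(scores_a[c] - scores_b[c]) for c in CRITERIA)

evaluator_1 = {"accuracy": 4, "helpfulness": 5, "completeness": 4,
               "clarity": 5, "safety": 5}
evaluator_2 = {"accuracy": 4, "helpfulness": 4, "completeness": 3,
               "clarity": 5, "safety": 5}

print(max_disagreement(evaluator_1, evaluator_2))  # → 1
```

A gap of 0–1 on every criterion suggests the evaluators share a calibration; consistently larger gaps are a signal to revisit the rubric together.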