The entire purpose of human evaluation is to provide genuine human judgment. Delegating that judgment to AI tools defeats the point of the exercise and produces training data that actively degrades model quality. Our policy on AI/LLM usage is strict and non-negotiable.
What is allowed
- Grammar and spelling correction — You may use tools like Grammarly or a spell-checker to clean up typos and grammatical errors in your written justifications.
- Minor phrasing refinement — Small adjustments to tone or wording are acceptable, as long as the ideas, reasoning, and structure are entirely your own.
What is strictly prohibited
- Using AI to evaluate or assess response quality. You may not ask ChatGPT, Claude, Gemini, or any other AI system to tell you which response is better.
- Using AI to generate justifications or explanations. Your rationale for why one response is better must come from your own reasoning.
- Using AI to predict code behaviour or runtime results. If a task involves evaluating code, you must assess it yourself.
- Having AI compose or substantially rewrite your submissions. AI may not write your responses, even partially. Light grammar cleanup is the only exception.
Enforcement
Submissions are monitored for signals of AI-generated content. If your work is flagged as AI-assisted beyond the allowed scope:
- Your submission will be rejected.
- You will receive a warning or, in serious cases, immediate contract termination.
- Repeated violations result in a permanent ban from the platform.
The rule of thumb: AI tools may fix how you write, but never what you write. The judgment, analysis, and reasoning must be yours.