Quality Framework

The UTL Quality & QA Framework

Every label we deliver is accurate, consistent, and auditable. Our 6-step quality pipeline is the backbone of everything we do.

Definitions

What We Measure

Three pillars of annotation quality that we track across every project.

Accuracy

Does the label match the ground truth? We measure accuracy against gold sets and expert reviews, targeting 95–99% or higher depending on task complexity.
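As a rough illustration (not a view into our production tooling), accuracy against a gold set reduces to the share of delivered labels that match the reference labels. The function and field names below are hypothetical:

```python
# Minimal sketch: accuracy against a gold set.
# The record shape (item_id -> label) is hypothetical.

def gold_set_accuracy(delivered: dict[str, str], gold: dict[str, str]) -> float:
    """Share of gold-set items whose delivered label matches the reference label."""
    matched = sum(
        1 for item_id, gold_label in gold.items()
        if delivered.get(item_id) == gold_label
    )
    return matched / len(gold)

# Example: 3 of 4 gold items labeled correctly -> 75%
score = gold_set_accuracy(
    delivered={"a": "cat", "b": "dog", "c": "dog", "d": "cat"},
    gold={"a": "cat", "b": "dog", "c": "cat", "d": "cat"},
)
print(f"{score:.0%}")  # 75%
```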

Consistency

Do different annotators produce the same labels for the same data? We track inter-annotator agreement (IAA) to ensure labeling consistency across teams.
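IAA can be quantified in several ways; Cohen's kappa for a pair of annotators is one common choice. The sketch below is illustrative only and does not imply we mandate a specific metric:

```python
# Minimal sketch of one common IAA metric: Cohen's kappa for two annotators.
from collections import Counter

def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Agreement between two annotators, corrected for chance agreement."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a | freq_b)
    return (observed - expected) / (1 - expected)

# Two annotators agree on 3 of 4 items -> kappa = 0.5
print(cohens_kappa(["cat", "dog", "dog", "cat"], ["cat", "dog", "cat", "cat"]))
```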

Completeness

Is every required label present? Are all edge cases handled? We enforce completeness checks through schema validation and automated QA rules.
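A completeness check of this kind can be as simple as validating every record against a required-field schema before it moves downstream. The rules below are a made-up example, not our production rule set:

```python
# Minimal sketch: completeness check via schema validation.
# REQUIRED_FIELDS and ALLOWED_LABELS are hypothetical example rules.
REQUIRED_FIELDS = {"item_id", "label", "annotator_id", "timestamp"}
ALLOWED_LABELS = {"positive", "neutral", "negative"}

def completeness_errors(record: dict) -> list[str]:
    """Return a list of completeness and validity errors for one labeled record."""
    errors = [f"missing field: {field}" for field in REQUIRED_FIELDS - record.keys()]
    if "label" in record and record["label"] not in ALLOWED_LABELS:
        errors.append(f"label not in schema: {record['label']!r}")
    return errors

# Reports the missing fields and the out-of-schema label for this record.
print(completeness_errors({"item_id": "42", "label": "positve"}))
```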

Pipeline

Our 6-Step Quality Pipeline

01

Task Design & Guideline Creation

We co-design labeling guidelines with your team. This includes edge-case documentation, decision trees for ambiguous cases, visual examples, and rubrics for subjective tasks. Guidelines are versioned and change-logged.

02

Gold Set Creation & Calibration

We curate gold-standard labeled examples for every task type. These are used for annotator onboarding, ongoing calibration, and performance benchmarking. Gold sets are refreshed quarterly or when guidelines change.

03

Production Labeling

Domain-trained annotators work within managed pods. Each pod has a dedicated PM and QA lead. Throughput, accuracy, and time-per-task are tracked daily.

04

Multi-Layer Review & Adjudication

L1 annotators → L2 reviewers → L3 adjudicators. Disagreements are resolved through structured adjudication with a disagreement taxonomy. Every label has an audit trail.

05

Metrics & Dashboards

Real-time visibility into inter-annotator agreement (IAA), per-class accuracy, error rates, and annotator performance. Metrics inform continuous improvement.
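For illustration, a per-class accuracy breakdown of the kind a dashboard might surface can be computed from (reference, delivered) label pairs; the batch and labels below are invented:

```python
# Minimal sketch: per-class accuracy over a reviewed batch.
# Each pair is (reference_label, delivered_label); the data is hypothetical.
from collections import defaultdict

def per_class_accuracy(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Accuracy broken down by reference class."""
    correct, total = defaultdict(int), defaultdict(int)
    for reference, delivered in pairs:
        total[reference] += 1
        correct[reference] += int(reference == delivered)
    return {cls: correct[cls] / total[cls] for cls in total}

batch = [("cat", "cat"), ("cat", "dog"), ("dog", "dog"), ("dog", "dog")]
print(per_class_accuracy(batch))  # {'cat': 0.5, 'dog': 1.0}
```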

06

Delivery & Continuous Feedback

Structured delivery with acceptance reports, format validation, and metadata. Client feedback is integrated into the next iteration cycle. Regression testing ensures consistency across versions.

Review Structure

L1 / L2 / L3 Reviewer Hierarchy

L1

Annotators

Domain-trained annotators who perform initial labeling following project guidelines. Calibrated against gold sets weekly.

L2

Reviewers

Senior annotators who review L1 output, flag errors, and enforce consistency. Each reviewer covers 3–5 annotators.

L3

Adjudicators

QA leads who resolve disagreements, update guidelines, and sign off on delivery batches. Final authority on ambiguous cases.

Details

Additional Quality Controls

Sampling-based QA with configurable error budgets per batch (see the sketch after this list)

Disagreement taxonomy with structured resolution protocols

Guideline versioning with change logs and impact analysis

Full audit trail — every label traced to annotator, reviewer, and timestamp

Delivery acceptance criteria agreed before project starts

Metrics report included with every batch delivery

Continuous monitoring across production batches

Gold set refresh cycles tied to guideline updates
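To make the first item above concrete, one possible shape of a sampling-based error-budget check is sketched below; the sample rate, budget value, and error predicate are placeholders, not contractual defaults:

```python
# Minimal sketch: sampling-based batch QA against a configurable error budget.
# SAMPLE_RATE and ERROR_BUDGET are placeholder values, not contractual defaults.
import random

SAMPLE_RATE = 0.10    # fraction of each batch re-reviewed by QA
ERROR_BUDGET = 0.02   # maximum tolerated error rate in the sample

def batch_passes_qa(batch: list[dict], is_error) -> bool:
    """Re-review a random sample of the batch and compare its error rate to the budget."""
    sample_size = max(1, int(len(batch) * SAMPLE_RATE))
    sample = random.sample(batch, sample_size)
    error_rate = sum(is_error(record) for record in sample) / sample_size
    return error_rate <= ERROR_BUDGET

# Example: count a record as an error when QA's label disagrees with the delivered one.
batch = [{"label": "cat", "qa_label": "cat"}, {"label": "dog", "qa_label": "cat"}]
print(batch_passes_qa(batch, is_error=lambda r: r["label"] != r["qa_label"]))
```

In practice the sample rate, budget, and escalation path for a failing batch would be agreed per project as part of the delivery acceptance criteria noted above.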

Download the QA Checklist

A step-by-step checklist for building and auditing your annotation QA pipeline.

Want to See Our QA in Action?

Start with a pilot project and experience our quality pipeline firsthand.