COMPUTER VISION

Image & Video Annotation
Services

Pixel-perfect annotations for object detection, segmentation, tracking, pose estimation, and 3D point cloud labeling — with measurable IoU thresholds, per-class accuracy tracking, and multi-tier QA at every step. From single-class bounding boxes to complex multi-sensor fusion annotation.

  • 5M+ images annotated
  • IoU ≥ 0.92 average quality score
  • 50K+ frames/week capacity
  • L1→L2→L3 QA pipeline
  • < 2px boundary precision
  • 25+ automated QA rules
TASK TYPES

Six Core Annotation Capabilities

Each task type includes specific accuracy benchmarks, throughput rates, and quality controls. All configurable to your project requirements.

Bounding Boxes (2D, Oriented & 3D Cuboids)

2D axis-aligned bounding boxes, oriented (rotated) bounding boxes, and 3D cuboid annotations for object detection across all domains. Configurable overlap policies, occlusion flags, truncation percentages, and difficulty scoring per annotation.

TECHNICAL DETAILS
  • Axis-aligned + oriented (rotated) bounding boxes
  • Occlusion handling: visibility flags (fully visible, partially occluded, heavily occluded)
  • Difficulty scoring: easy/medium/hard per annotation for curriculum learning
  • 3D cuboid annotation for LiDAR point clouds and stereo imagery
  • Truncation percentage: 0–100% for objects at image boundaries
  • Multi-class support: unlimited class hierarchy with parent-child relationships
PERFORMANCE
  • Accuracy: IoU ≥ 0.90 (standard), ≥ 0.95 (precision-critical)
  • Throughput: 2K–8K boxes/annotator/day
  • Error rate: < 2% false positives
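
A minimal sketch of how a box-level IoU gate like this can be checked; the corner-coordinate (x1, y1, x2, y2) box format and the sample numbers are illustrative assumptions, not our internal schema:

```python
def iou(a: tuple[float, float, float, float],
        b: tuple[float, float, float, float]) -> float:
    """Intersection-over-Union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

# A box a couple of pixels off still clears the standard 0.90 gate;
# precision-critical projects raise the threshold to 0.95.
assert iou((10, 10, 110, 110), (12, 11, 111, 109)) >= 0.90
```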

Semantic, Instance & Panoptic Segmentation

Pixel-perfect semantic segmentation, instance segmentation with unique object IDs, and panoptic segmentation combining both. Support for 100+ class taxonomies with polygon, brush, superpixel, and SAM-assisted annotation tools.

TECHNICAL DETAILS
  • Semantic segmentation: every pixel assigned a class label
  • Panoptic segmentation: combined stuff (background) + things (countable objects)
  • Brush + superpixel tools for complex organic shapes (medical, agricultural)
  • Instance segmentation: individual object masks with unique IDs
  • Polygon annotation: sub-pixel vertex coordinates, < 2px boundary deviation
  • SAM-assisted pre-annotation with human correction and validation
PERFORMANCE
  • Accuracy: mIoU ≥ 0.88 (standard), ≥ 0.92 (precision)
  • Throughput: 200–500 masks/annotator/day
  • Boundary: < 2px boundary deviation
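
For segmentation the equivalent check is mean IoU over classes. A minimal sketch, assuming two integer label maps of the same shape:

```python
import numpy as np

def miou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean IoU over the classes present in either label map."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:                     # class absent from both maps
            continue
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))            # compared to the 0.88/0.92 targets
```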

Keypoints & Pose Estimation

Configurable skeleton definitions for human pose (17–133 keypoints), hand tracking (21 points), facial landmarks (68–478 points), and custom articulated objects. Each keypoint includes visibility flags (visible, occluded, out-of-frame) and confidence indicators.

TECHNICAL DETAILS
  • Human pose: COCO 17-point, MPII 16-point, or custom skeleton definitions
  • Facial landmarks: 68-point (dlib), 478-point (MediaPipe), or custom configurations
  • Visibility flags: visible / self-occluded / occluded by other / out-of-frame
  • Hand tracking: 21 keypoints per hand with finger joint angles
  • Animal pose: custom skeletons for dogs, horses, birds (AP-10K compatible)
  • Inter-annotator consistency: OKS (Object Keypoint Similarity) ≥ 0.85
PERFORMANCE
  • Accuracy: OKS ≥ 0.85
  • Throughput: 500–1.5K poses/annotator/day
  • Visibility: multi-level flags per keypoint
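
The OKS figure follows the COCO keypoint evaluation. A sketch of that formula, assuming arrays of per-keypoint pixel distances, visibility flags, and the per-keypoint COCO sigma constants:

```python
import numpy as np

def oks(dists: np.ndarray, vis: np.ndarray,
        sigmas: np.ndarray, area: float) -> float:
    """COCO Object Keypoint Similarity between two labelings of one object."""
    labeled = vis > 0                      # only score labeled keypoints
    if not labeled.any():
        return 0.0
    e = dists[labeled] ** 2 / (2.0 * area * (2 * sigmas[labeled]) ** 2)
    return float(np.exp(-e).mean())        # gated against the 0.85 target
```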

Video Object Tracking

Multi-object tracking (MOT) with persistent IDs maintained through occlusions, re-entries, and camera transitions. Keyframe annotation with linear/spline interpolation and manual correction. Support for single-object tracking (SOT), MOT, and multi-camera cross-view tracking.

TECHNICAL DETAILS
  • Persistent object IDs: maintained through occlusions up to 60+ frames
  • Re-identification: same object re-entering frame receives original ID
  • Temporal action boundaries: activity start/end with ±1 frame precision
  • Keyframe annotation + interpolation: linear and spline with human correction
  • Multi-camera tracking: cross-view identity linking with shared ID namespace
  • Track-level attributes: object class, behavior state, direction, speed estimate
PERFORMANCE
  • Accuracy: MOTA ≥ 0.90, IDF1 ≥ 0.85
  • ID switches: ≤ 0.5% per sequence
  • Throughput: 500–2K video-min/week
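
A sketch of the keyframe-plus-interpolation workflow described above: boxes are hand-labeled on keyframes, machine-filled in between, and human-corrected where motion is non-linear. The box format is an assumption:

```python
def interpolate_box(frame_a: int, box_a: list[float],
                    frame_b: int, box_b: list[float],
                    frame: int) -> list[float]:
    """Linearly interpolate an (x1, y1, x2, y2) box between two keyframes."""
    t = (frame - frame_a) / (frame_b - frame_a)
    return [a + t * (b - a) for a, b in zip(box_a, box_b)]

# Frames 10 and 20 are hand-labeled; frame 15 is interpolated, then
# reviewed and corrected by an annotator if the motion was non-linear.
assert interpolate_box(10, [0, 0, 100, 100],
                       20, [50, 20, 150, 120], 15) == [25, 10, 125, 110]
```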

Image Classification & Tagging

Single-label, multi-label, and hierarchical classification with configurable confidence thresholds. Support for fine-grained recognition (breed identification, species classification), quality assessment (defect grading), and content moderation across millions of images.

TECHNICAL DETAILS
  • Single-label classification with top-1 and top-3 predictions
  • Hierarchical taxonomy: parent-child class relationships with inheritance rules
  • Quality/condition assessment: 5-point Likert scales for subjective attributes
  • Multi-label tagging: unlimited tags per image with confidence scores
  • Fine-grained recognition: 500+ sub-categories within a domain
  • Active learning integration: model-confidence-based sample routing
PERFORMANCE
  • Accuracy: ≥ 97% top-1 classification accuracy
  • Throughput: 5K–20K images/annotator/day
  • Agreement: κ ≥ 0.90
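
The κ target refers to chance-corrected inter-annotator agreement; here is a minimal Cohen's kappa sketch for two annotators labeling the same images (the specific kappa variant is assumed for illustration):

```python
import numpy as np

def cohens_kappa(a: np.ndarray, b: np.ndarray) -> float:
    """Chance-corrected agreement between two annotators' label vectors."""
    po = float(np.mean(a == b))                      # observed agreement
    pe = sum(float(np.mean(a == c)) * float(np.mean(b == c))
             for c in np.union1d(a, b))              # agreement by chance
    return 1.0 if pe == 1.0 else (po - pe) / (1.0 - pe)

ann1 = np.array(["cat", "dog", "dog", "cat", "bird"])
ann2 = np.array(["cat", "dog", "cat", "cat", "bird"])
print(cohens_kappa(ann1, ann2))   # ≈ 0.69 here; the production gate is 0.90
```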

3D Point Cloud & Sensor Fusion

3D bounding cuboid annotation in LiDAR point clouds with heading angle, velocity estimation, and multi-frame tracking. Semantic segmentation of point clouds, lane/road boundary marking, and sensor fusion annotation linking LiDAR to camera imagery.

TECHNICAL DETAILS
  • 3D bounding cuboids: position, dimensions, and orientation (yaw heading, pitch, roll)
  • Lane and road boundary annotation in 3D space
  • Sensor fusion: linked annotations across LiDAR, camera, radar, and IMU
  • Point-level semantic segmentation: ground, vehicle, pedestrian, vegetation, etc.
  • Multi-frame tracking: 3D cuboid trajectories with velocity and acceleration estimates
  • Timestamp synchronization: ≤ 10ms alignment across sensor modalities
PERFORMANCE
  • Accuracy: 3D IoU ≥ 0.70 (standard), heading error < 5°
  • Throughput: 200–500 frames/annotator/day
  • Fusion: multi-sensor linked annotations
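
One subtlety behind the heading-error figure is angle wrap-around: a naive subtraction would score a 359° yaw against a 1° ground truth as a 358° error. A minimal sketch of the wrapped check, with yaw assumed in radians:

```python
import math

def heading_error_deg(yaw_pred: float, yaw_gt: float) -> float:
    """Absolute yaw difference in degrees, wrapped into [-180, 180]."""
    diff = (yaw_pred - yaw_gt + math.pi) % (2 * math.pi) - math.pi
    return abs(math.degrees(diff))

# 359° vs 1° is a 2° error, well inside the < 5° target.
assert heading_error_deg(math.radians(359), math.radians(1)) < 5.0
```
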
INDUSTRIES

Where Our CV Annotations Are Used

Domain-specific annotation protocols for safety-critical and high-accuracy applications across six major verticals.

Autonomous Driving

LiDAR-camera fusion, 3D cuboid tracking, lane detection, traffic sign/light recognition, pedestrian tracking, and edge-case scenario annotation across ODD (Operational Design Domain) coverage matrices.

3D cuboid + tracking · Multi-sensor fusion · ODD coverage tracking · ISO 21448 (SOTIF) aligned

Medical Imaging

DICOM annotation for CT, MRI, X-ray, histopathology WSI, and ophthalmology. Organ segmentation, lesion classification, landmark detection, and measurement by board-certified radiologists and pathologists.

Board-certified annotators · Dice ≥ 0.90 · HIPAA/IRB compliant · DICOM-SEG output

Retail Intelligence

Product recognition, shelf compliance analysis, planogram verification, visual search annotation, customer behavior tracking, and inventory management for retail AI systems.

SKU-level taxonomy · Planogram compliance · Behavior tracking · Multi-camera linking

Manufacturing QC

Surface defect detection, assembly verification, weld inspection, dimensional compliance, and quality grading for industrial quality control on high-speed production lines.

Defect taxonomy (ISO 9001) · Sub-mm precision · Rare defect focus · Production-line sync

Surveillance & Security

Person detection, action recognition, anomaly detection, crowd analysis, license plate recognition, and multi-camera tracking with privacy-compliant annotation workflows.

Privacy-first (face blur) · Multi-camera ReID · Action recognition · 24/7 temporal coverage

Agriculture & Earth Observation

Drone and satellite imagery annotation for crop health monitoring, pest detection, land use classification, yield estimation, and infrastructure monitoring across growing seasons.

Multispectral annotation · BBCH growth stages · Disease severity (0–9) · Temporal tracking
QUALITY

CV-Specific Quality Controls

Computer vision annotation demands pixel-level precision. Here's how we maintain it across large teams and complex projects.

Gold Set Calibration

IoU threshold validation against expert-labeled gold sets. Annotators must achieve 0.85+ IoU on the gold set before touching production data. Gold sets refreshed 10% monthly to prevent memorization.

IoU ≥ 0.85 on gold set required for production access
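
In code the gate is a simple threshold over an annotator's gold-set scores; this sketch assumes box-level IoUs against the expert labels have already been computed (whether the 0.85 applies per item or to the mean is a scoping detail; the mean is used here):

```python
def passes_gold_set(ious_vs_expert: list[float],
                    threshold: float = 0.85) -> bool:
    """Gate production access on mean IoU against expert gold labels."""
    return sum(ious_vs_expert) / len(ious_vs_expert) >= threshold

# Averages 0.87 IoU on the gold set, so the annotator clears the gate;
# the gold set itself is rotated (10% refreshed monthly).
assert passes_gold_set([0.91, 0.88, 0.82, 0.87])
```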

Multi-Reviewer Pipeline (L1→L2→L3)

L1 annotators produce initial labels. L2 reviewers audit 100% of output (not sampling). L3 adjudicators resolve disagreements and edge cases. Complex annotations always pass through at least two sets of eyes.

100% L2 review coverage (no sampling)

Per-Class Metrics Tracking

We track accuracy, precision, recall, and IoU per class — not just aggregate metrics. Rare but critical classes (pedestrians, small objects, defects) get extra QA attention and dedicated review queues.

Per-class IoU, precision, recall dashboards updated hourly

Automated Consistency Checks

Rule-based validation catches common errors: overlapping bounding boxes, missing labels, impossible polygon shapes, label-class mismatches, and boundary violations. Errors flagged before human review.

25+ automated validation rules per project
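
Two of those rules, sketched; the annotation dict shape and field names are assumptions for illustration:

```python
def validate(ann: dict, img_w: int, img_h: int,
             taxonomy: set[str]) -> list[str]:
    """Return rule violations for one box annotation (empty list = clean)."""
    errors = []
    x1, y1, x2, y2 = ann["box"]
    if not (0 <= x1 < x2 <= img_w and 0 <= y1 < y2 <= img_h):
        errors.append("boundary violation or degenerate box")
    if ann["label"] not in taxonomy:
        errors.append("label not in project taxonomy")
    return errors   # violations are flagged before the item reaches L2 review
```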

Drift Detection & Alerts

Statistical monitoring across batches detects quality drift before it impacts your model. Batch-over-batch IoU, accuracy, and error-type distributions are compared. Automatic alerts trigger recalibration when drift exceeds ±2%.

Drift alerts at ±2% threshold, recalibration within 4 hours
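
A minimal sketch of the batch-over-batch comparison; whether the ±2% is absolute or relative, and which metric it tracks, are agreed per project (relative mean IoU is assumed here):

```python
def drift_exceeded(batch_ious: list[float], baseline_mean: float,
                   tolerance: float = 0.02) -> bool:
    """Alert when a batch's mean IoU drifts more than ±2% from baseline."""
    batch_mean = sum(batch_ious) / len(batch_ious)
    return abs(batch_mean - baseline_mean) / baseline_mean > tolerance

# A slide from a 0.92 baseline to 0.89 (≈3.3% relative) triggers the
# alert and a recalibration cycle within 4 hours.
assert drift_exceeded([0.89] * 100, baseline_mean=0.92)
```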

Edge Case Libraries

Growing libraries of ambiguous and edge-case examples with documented resolution decisions. Used for annotator training, guideline refinement, and quality audit. Every edge case becomes a reusable training asset.

100+ documented edge cases per mature project
COMPATIBILITY

Export Formats & Tool Integration

We deliver data in any format your training pipeline needs, and work with the tools you already use.

Export Formats

COCO JSON
Pascal VOC XML
YOLO TXT
TFRecord
Cityscapes
KITTI
MOT Challenge
NIfTI/NRRD
LabelMe JSON
Custom Schema
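
For reference, the shape of a minimal COCO detection export, shown as the equivalent Python dict with illustrative values; note that COCO boxes are [x, y, width, height], not corner pairs:

```python
import json

coco_export = {
    "images": [
        {"id": 1, "file_name": "frame_0001.jpg",
         "width": 1920, "height": 1080}
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 3,
         "bbox": [604.0, 212.0, 182.0, 96.0],   # [x, y, w, h] in pixels
         "area": 17472.0, "iscrowd": 0}
    ],
    "categories": [
        {"id": 3, "name": "car", "supercategory": "vehicle"}
    ],
}

with open("annotations.json", "w") as f:
    json.dump(coco_export, f)
```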

Tool Compatibility

CVAT
Label Studio
Labelbox
V7
Supervisely
Roboflow
VGG Image Annotator
Custom Platforms
COMPARISON

UTL CV Annotation vs. Typical Providers

Capability                               | UTL Data Engine                  | Typical Providers
Gold set calibration                     | ✓ Monthly 10% refresh            | Initial calibration only
L2 review coverage                       | ✓ 100% (no sampling)             | 5–10% sampling
Accuracy tracking                        | ✓ Per-class IoU/precision/recall | Aggregate metrics only
3D cuboid + multi-sensor fusion          | ✓                                | 2D only
Automated consistency validation         | ✓ 25+ rules per project          | Basic checks
Drift detection with auto-alerts         | ✓                                | ✗
Edge case libraries with decision docs   | ✓                                | ✗
SAM-assisted pre-annotation              | ✓                                | ✗
Annotator onboarding                     | ✓ 20+ hr, domain-trained         | 2–4 hr training
Polygon accuracy                         | ✓ Sub-pixel, < 2px               | 5–10px typical

“UTL reduced our annotation rework by over 50%. Their gold set calibration and per-class IoU tracking caught quality issues that our previous vendor missed entirely. The 100% L2 review coverage is what makes the difference — no more sampling-based QA surprises.”
ML Engineering Lead
Series B Autonomous Vehicle Company
FAQS

Computer Vision Questions

What accuracy thresholds do you guarantee?

Default targets: IoU ≥ 0.90 for bounding boxes, mIoU ≥ 0.88 for segmentation, OKS ≥ 0.85 for keypoints, 3D IoU ≥ 0.70 for cuboids. For precision-critical projects (medical, AV), we configure higher thresholds (≥ 0.95 for 2D, ≥ 0.92 for segmentation). All thresholds are agreed during scoping and validated against gold sets.

Can you annotate LiDAR point clouds and multi-sensor data?

Yes. We annotate LiDAR point clouds with 3D cuboids, point-level segmentation, and trajectory tracking. Multi-sensor fusion links annotations across LiDAR, camera, radar, and IMU with ≤ 10ms timestamp synchronization. Our teams are trained on common AV sensor configurations.

What video annotation volume can you handle?

Steady-state: 500–2K video-minutes/week per project with MOTA ≥ 0.90 tracking accuracy. For burst projects, we scale to 5K+ video-minutes/week with parallel annotator teams. Multi-object tracking, re-identification, and temporal action segmentation are all supported.

How do you handle edge cases and occlusions?

We build edge-case libraries during guideline creation (100+ documented cases per mature project) and maintain a structured decision taxonomy. Occluded objects receive visibility flags (fully visible, partially occluded, heavily occluded) and truncation percentages. New edge cases discovered during production are documented and added to the library with resolution decisions.

Do you use model-assisted pre-annotation such as SAM?

Yes. We use the Segment Anything Model (SAM) for pre-annotation to accelerate segmentation tasks. Human annotators review, correct, and validate all SAM-generated masks. This hybrid approach typically delivers 2–3× throughput improvement while maintaining our quality standards.
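
As a sketch of what box-prompted SAM pre-annotation looks like with the public segment_anything package (checkpoint path, frame, and box prompt are placeholders; every generated mask is only a draft for human correction):

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a SAM checkpoint (path is a placeholder) and wrap it in a predictor.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image_rgb = np.zeros((1080, 1920, 3), dtype=np.uint8)   # placeholder frame
predictor.set_image(image_rgb)

# A loose box prompt from an annotator; SAM proposes a tight draft mask.
masks, scores, _ = predictor.predict(
    box=np.array([604, 212, 786, 308]),
    multimask_output=False,
)
draft_mask = masks[0]   # routed to an L1 annotator for correction/validation
```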

How quickly can a project ramp up?

Scoping + guideline design: 3–5 days. Team assembly + calibration: 5–7 days. Pilot (1K–5K samples): 5–10 days. First labeled batch by Day 20. Full production velocity by Day 25. We maintain pre-qualified teams across major domains for faster ramp-up.

Need Pixel-Perfect Annotations?

Let's discuss your computer vision data pipeline — from task design to quality-assured delivery. We'll scope a pilot within 48 hours.