COMPUTER VISION

Image & Video Annotation
Services

Pixel-perfect annotations for object detection, segmentation, tracking, pose estimation, and 3D point cloud labeling — with measurable IoU thresholds, per-class accuracy tracking, and multi-tier QA at every step. From single-class bounding boxes to complex multi-sensor fusion annotation.

  • 5M+ images annotated
  • IoU ≥ 0.92 average quality score
  • 50K+ frames/week capacity
  • L1→L2→L3 QA pipeline
  • < 2px boundary precision
  • 25+ automated QA rules
TASK TYPES

Six Core Annotation Capabilities

Each task type includes specific accuracy benchmarks, throughput rates, and quality controls. All configurable to your project requirements.

Bounding Boxes (2D, Oriented & 3D Cuboids)

2D axis-aligned bounding boxes, oriented (rotated) bounding boxes, and 3D cuboid annotations for object detection across all domains. Configurable overlap policies, occlusion flags, truncation percentages, and difficulty scoring per annotation.

TECHNICAL DETAILS
  • Axis-aligned + oriented (rotated) bounding boxes
  • Occlusion handling: visibility flags (fully visible, partially occluded, heavily occluded)
  • Difficulty scoring: easy/medium/hard per annotation for curriculum learning
  • 3D cuboid annotation for LiDAR point clouds and stereo imagery
  • Truncation percentage: 0–100% for objects at image boundaries
  • Multi-class support: unlimited class hierarchy with parent-child relationships
PERFORMANCE
  • Accuracy: IoU ≥ 0.90 (standard), ≥ 0.95 (precision-critical)
  • Throughput: 2K–8K boxes/annotator/day
  • Error rate: < 2% false positives
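
A minimal sketch of how a box-level IoU gate like this can be checked; the corner-coordinate (x1, y1, x2, y2) box format and the sample numbers are illustrative assumptions, not our internal schema:

```python
def iou(a: tuple[float, float, float, float],
        b: tuple[float, float, float, float]) -> float:
    """Intersection-over-Union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

# A box a couple of pixels off still clears the standard 0.90 gate;
# precision-critical projects raise the threshold to 0.95.
assert iou((10, 10, 110, 110), (12, 11, 111, 109)) >= 0.90
```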

Semantic, Instance & Panoptic Segmentation

Pixel-perfect semantic segmentation, instance segmentation with unique object IDs, and panoptic segmentation combining both. Support for 100+ class taxonomies with polygon, brush, superpixel, and SAM-assisted annotation tools.

TECHNICAL DETAILS
  • Semantic segmentation: every pixel assigned a class label
  • Panoptic segmentation: combined stuff (background) + things (countable objects)
  • Brush + superpixel tools for complex organic shapes (medical, agricultural)
  • Instance segmentation: individual object masks with unique IDs
  • Polygon annotation: sub-pixel vertex coordinates, < 2px boundary deviation
  • SAM-assisted pre-annotation with human correction and validation
PERFORMANCE
  • Accuracy: mIoU ≥ 0.88 (standard), ≥ 0.92 (precision)
  • Throughput: 200–500 masks/annotator/day
  • Boundary: < 2px boundary deviation
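
For segmentation the equivalent check is mean IoU over classes. A minimal sketch, assuming two integer label maps of the same shape:

```python
import numpy as np

def miou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean IoU over the classes present in either label map."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:                     # class absent from both maps
            continue
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))            # compared to the 0.88/0.92 targets
```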

Keypoints & Pose Estimation

Configurable skeleton definitions for human pose (17–133 keypoints), hand tracking (21 points), facial landmarks (68–478 points), and custom articulated objects. Each keypoint includes visibility flags (visible, occluded, out-of-frame) and confidence indicators.

TECHNICAL DETAILS
  • Human pose: COCO 17-point, MPII 16-point, or custom skeleton definitions
  • Facial landmarks: 68-point (dlib), 478-point (MediaPipe), or custom configurations
  • Visibility flags: visible / self-occluded / occluded by other / out-of-frame
  • Hand tracking: 21 keypoints per hand with finger joint angles
  • Animal pose: custom skeletons for dogs, horses, birds (AP-10K compatible)
  • Inter-annotator consistency: OKS (Object Keypoint Similarity) ≥ 0.85
PERFORMANCE
  • Accuracy: OKS ≥ 0.85
  • Throughput: 500–1.5K poses/annotator/day
  • Visibility: multi-level flags per keypoint
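
The OKS figure follows the COCO keypoint evaluation. A sketch of that formula, assuming arrays of per-keypoint pixel distances, visibility flags, and the per-keypoint COCO sigma constants:

```python
import numpy as np

def oks(dists: np.ndarray, vis: np.ndarray,
        sigmas: np.ndarray, area: float) -> float:
    """COCO Object Keypoint Similarity between two labelings of one object."""
    labeled = vis > 0                      # only score labeled keypoints
    if not labeled.any():
        return 0.0
    e = dists[labeled] ** 2 / (2.0 * area * (2 * sigmas[labeled]) ** 2)
    return float(np.exp(-e).mean())        # gated against the 0.85 target
```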

Video Object Tracking

Multi-object tracking (MOT) with persistent IDs maintained through occlusions, re-entries, and camera transitions. Keyframe annotation with linear/spline interpolation and manual correction. Support for single-object tracking (SOT), MOT, and multi-camera cross-view tracking.

TECHNICAL DETAILS
  • Persistent object IDs: maintained through occlusions up to 60+ frames
  • Re-identification: same object re-entering frame receives original ID
  • Temporal action boundaries: activity start/end with ±1 frame precision
  • Keyframe annotation + interpolation: linear and spline with human correction
  • Multi-camera tracking: cross-view identity linking with shared ID namespace
  • Track-level attributes: object class, behavior state, direction, speed estimate
PERFORMANCE
  • Accuracy: MOTA ≥ 0.90, IDF1 ≥ 0.85
  • ID switches: ≤ 0.5% per sequence
  • Throughput: 500–2K video-min/week
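
A sketch of the keyframe-plus-interpolation workflow described above: boxes are hand-labeled on keyframes, machine-filled in between, and human-corrected where motion is non-linear. The box format is an assumption:

```python
def interpolate_box(frame_a: int, box_a: list[float],
                    frame_b: int, box_b: list[float],
                    frame: int) -> list[float]:
    """Linearly interpolate an (x1, y1, x2, y2) box between two keyframes."""
    t = (frame - frame_a) / (frame_b - frame_a)
    return [a + t * (b - a) for a, b in zip(box_a, box_b)]

# Frames 10 and 20 are hand-labeled; frame 15 is interpolated, then
# reviewed and corrected by an annotator if the motion was non-linear.
assert interpolate_box(10, [0, 0, 100, 100],
                       20, [50, 20, 150, 120], 15) == [25, 10, 125, 110]
```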

Image Classification & Tagging

Single-label, multi-label, and hierarchical classification with configurable confidence thresholds. Support for fine-grained recognition (breed identification, species classification), quality assessment (defect grading), and content moderation across millions of images.

TECHNICAL DETAILS
  • Single-label classification with top-1 and top-3 predictions
  • Hierarchical taxonomy: parent-child class relationships with inheritance rules
  • Quality/condition assessment: 5-point Likert scales for subjective attributes
  • Multi-label tagging: unlimited tags per image with confidence scores
  • Fine-grained recognition: 500+ sub-categories within a domain
  • Active learning integration: model-confidence-based sample routing
PERFORMANCE
  • Accuracy: ≥ 97% top-1 classification accuracy
  • Throughput: 5K–20K images/annotator/day
  • Agreement: κ ≥ 0.90
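
The κ target refers to chance-corrected inter-annotator agreement; here is a minimal Cohen's kappa sketch for two annotators labeling the same images (the specific kappa variant is assumed for illustration):

```python
import numpy as np

def cohens_kappa(a: np.ndarray, b: np.ndarray) -> float:
    """Chance-corrected agreement between two annotators' label vectors."""
    po = float(np.mean(a == b))                      # observed agreement
    pe = sum(float(np.mean(a == c)) * float(np.mean(b == c))
             for c in np.union1d(a, b))              # agreement by chance
    return 1.0 if pe == 1.0 else (po - pe) / (1.0 - pe)

ann1 = np.array(["cat", "dog", "dog", "cat", "bird"])
ann2 = np.array(["cat", "dog", "cat", "cat", "bird"])
print(cohens_kappa(ann1, ann2))   # ≈ 0.69 here; the production gate is 0.90
```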

3D Point Cloud & Sensor Fusion

3D bounding cuboid annotation in LiDAR point clouds with heading angle, velocity estimation, and multi-frame tracking. Semantic segmentation of point clouds, lane/road boundary marking, and sensor fusion annotation linking LiDAR to camera imagery.

TECHNICAL DETAILS
  • 3D bounding cuboids: position, dimensions, and orientation (yaw heading, pitch, roll)
  • Lane and road boundary annotation in 3D space
  • Sensor fusion: linked annotations across LiDAR, camera, radar, and IMU
  • Point-level semantic segmentation: ground, vehicle, pedestrian, vegetation, etc.
  • Multi-frame tracking: 3D cuboid trajectories with velocity and acceleration estimates
  • Timestamp synchronization: ≤ 10ms alignment across sensor modalities
PERFORMANCE
  • Accuracy: 3D IoU ≥ 0.70 (standard), heading error < 5°
  • Throughput: 200–500 frames/annotator/day
  • Fusion: multi-sensor linked annotations
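
One subtlety behind the heading-error figure is angle wrap-around: a naive subtraction would score a 359° yaw against a 1° ground truth as a 358° error. A minimal sketch of the wrapped check, with yaw assumed in radians:

```python
import math

def heading_error_deg(yaw_pred: float, yaw_gt: float) -> float:
    """Absolute yaw difference in degrees, wrapped into [-180, 180]."""
    diff = (yaw_pred - yaw_gt + math.pi) % (2 * math.pi) - math.pi
    return abs(math.degrees(diff))

# 359° vs 1° is a 2° error, well inside the < 5° target.
assert heading_error_deg(math.radians(359), math.radians(1)) < 5.0
```
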
INDUSTRIES

Where Our CV Annotations Are Used

Domain-specific annotation protocols for safety-critical and high-accuracy applications across six major verticals.

Autonomous Driving

LiDAR-camera fusion, 3D cuboid tracking, lane detection, traffic sign/light recognition, pedestrian tracking, and edge-case scenario annotation across ODD (Operational Design Domain) coverage matrices.

3D cuboid + tracking · Multi-sensor fusion · ODD coverage tracking · ISO 21448 (SOTIF) aligned

Medical Imaging

DICOM annotation for CT, MRI, X-ray, histopathology WSI, and ophthalmology. Organ segmentation, lesion classification, landmark detection, and measurement by board-certified radiologists and pathologists.

Board-certified annotators · Dice ≥ 0.90 · HIPAA/IRB compliant · DICOM-SEG output

Retail Intelligence

Product recognition, shelf compliance analysis, planogram verification, visual search annotation, customer behavior tracking, and inventory management for retail AI systems.

SKU-level taxonomy · Planogram compliance · Behavior tracking · Multi-camera linking

Manufacturing QC

Surface defect detection, assembly verification, weld inspection, dimensional compliance, and quality grading for industrial quality control on high-speed production lines.

Defect taxonomy (ISO 9001) · Sub-mm precision · Rare defect focus · Production-line sync

Surveillance & Security

Person detection, action recognition, anomaly detection, crowd analysis, license plate recognition, and multi-camera tracking with privacy-compliant annotation workflows.

Privacy-first (face blur) · Multi-camera ReID · Action recognition · 24/7 temporal coverage

Agriculture & Earth Observation

Drone and satellite imagery annotation for crop health monitoring, pest detection, land use classification, yield estimation, and infrastructure monitoring across growing seasons.

Multispectral annotation · BBCH growth stages · Disease severity (0–9) · Temporal tracking
QUALITY

CV-Specific Quality Controls

Computer vision annotation demands pixel-level precision. Here's how we maintain it across large teams and complex projects.

Gold Set Calibration

IoU threshold validation against expert-labeled gold sets. Annotators must achieve 0.85+ IoU on the gold set before touching production data. Gold sets refreshed 10% monthly to prevent memorization.

IoU ≥ 0.85 on gold set required for production access
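
In code the gate is a simple threshold over an annotator's gold-set scores; this sketch assumes box-level IoUs against the expert labels have already been computed (whether the 0.85 applies per item or to the mean is a scoping detail; the mean is used here):

```python
def passes_gold_set(ious_vs_expert: list[float],
                    threshold: float = 0.85) -> bool:
    """Gate production access on mean IoU against expert gold labels."""
    return sum(ious_vs_expert) / len(ious_vs_expert) >= threshold

# Averages 0.87 IoU on the gold set, so the annotator clears the gate;
# the gold set itself is rotated (10% refreshed monthly).
assert passes_gold_set([0.91, 0.88, 0.82, 0.87])
```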

Multi-Reviewer Pipeline (L1→L2→L3)

L1 annotators produce initial labels. L2 reviewers audit 100% of output (not sampling). L3 adjudicators resolve disagreements and edge cases. Complex annotations always pass through at least two sets of eyes.

100% L2 review coverage (no sampling)

Per-Class Metrics Tracking

We track accuracy, precision, recall, and IoU per class — not just aggregate metrics. Rare but critical classes (pedestrians, small objects, defects) get extra QA attention and dedicated review queues.

Per-class IoU, precision, recall dashboards updated hourly

Automated Consistency Checks

Rule-based validation catches common errors: overlapping bounding boxes, missing labels, impossible polygon shapes, label-class mismatches, and boundary violations. Errors flagged before human review.

25+ automated validation rules per project
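
Two of those rules, sketched; the annotation dict shape and field names are assumptions for illustration:

```python
def validate(ann: dict, img_w: int, img_h: int,
             taxonomy: set[str]) -> list[str]:
    """Return rule violations for one box annotation (empty list = clean)."""
    errors = []
    x1, y1, x2, y2 = ann["box"]
    if not (0 <= x1 < x2 <= img_w and 0 <= y1 < y2 <= img_h):
        errors.append("boundary violation or degenerate box")
    if ann["label"] not in taxonomy:
        errors.append("label not in project taxonomy")
    return errors   # violations are flagged before the item reaches L2 review
```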

Drift Detection & Alerts

Statistical monitoring across batches detects quality drift before it impacts your model. Batch-over-batch IoU, accuracy, and error-type distributions are compared. Automatic alerts trigger recalibration when drift exceeds ±2%.

Drift alerts at ±2% threshold, recalibration within 4 hours
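
A minimal sketch of the batch-over-batch comparison; whether the ±2% is absolute or relative, and which metric it tracks, are agreed per project (relative mean IoU is assumed here):

```python
def drift_exceeded(batch_ious: list[float], baseline_mean: float,
                   tolerance: float = 0.02) -> bool:
    """Alert when a batch's mean IoU drifts more than ±2% from baseline."""
    batch_mean = sum(batch_ious) / len(batch_ious)
    return abs(batch_mean - baseline_mean) / baseline_mean > tolerance

# A slide from a 0.92 baseline to 0.89 (≈3.3% relative) triggers the
# alert and a recalibration cycle within 4 hours.
assert drift_exceeded([0.89] * 100, baseline_mean=0.92)
```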

Edge Case Libraries

Growing libraries of ambiguous and edge-case examples with documented resolution decisions. Used for annotator training, guideline refinement, and quality audit. Every edge case becomes a reusable training asset.

100+ documented edge cases per mature project
COMPATIBILITY

Export Formats & Tool Integration

We deliver data in any format your training pipeline needs, and work with the tools you already use.

Export Formats

COCO JSON
Pascal VOC XML
YOLO TXT
TFRecord
Cityscapes
KITTI
MOT Challenge
NIfTI/NRRD
LabelMe JSON
Custom Schema
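
For reference, the shape of a minimal COCO detection export, shown as the equivalent Python dict with illustrative values; note that COCO boxes are [x, y, width, height], not corner pairs:

```python
import json

coco_export = {
    "images": [
        {"id": 1, "file_name": "frame_0001.jpg",
         "width": 1920, "height": 1080}
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 3,
         "bbox": [604.0, 212.0, 182.0, 96.0],   # [x, y, w, h] in pixels
         "area": 17472.0, "iscrowd": 0}
    ],
    "categories": [
        {"id": 3, "name": "car", "supercategory": "vehicle"}
    ],
}

with open("annotations.json", "w") as f:
    json.dump(coco_export, f)
```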

Tool Compatibility

CVAT
Label Studio
Labelbox
V7
Supervisely
Roboflow
VGG Image Annotator
Custom Platforms
COMPARISON

UTL CV Annotation vs. Typical Providers

Capability                               | UTL Data Engine                  | Typical Providers
Gold set calibration                     | ✓ Monthly 10% refresh            | Initial calibration only
L2 review coverage                       | ✓ 100% (no sampling)             | 5–10% sampling
Accuracy tracking                        | ✓ Per-class IoU/precision/recall | Aggregate metrics only
3D cuboid + multi-sensor fusion          | ✓                                | 2D only
Automated consistency validation         | ✓ 25+ rules per project          | Basic checks
Drift detection with auto-alerts         | ✓                                | ✗
Edge case libraries with decision docs   | ✓                                | ✗
SAM-assisted pre-annotation              | ✓                                | ✗
Annotator onboarding                     | ✓ 20+ hr, domain-trained         | 2–4 hr training
Polygon accuracy                         | ✓ Sub-pixel, < 2px               | 5–10px typical

“UTL reduced our annotation rework by over 50%. Their gold set calibration and per-class IoU tracking caught quality issues that our previous vendor missed entirely. The 100% L2 review coverage is what makes the difference — no more sampling-based QA surprises.”
ML Engineering Lead
Series B Autonomous Vehicle Company
FAQS

Computer Vision Questions

What accuracy thresholds do you guarantee?

Default targets: IoU ≥ 0.90 for bounding boxes, mIoU ≥ 0.88 for segmentation, OKS ≥ 0.85 for keypoints, 3D IoU ≥ 0.70 for cuboids. For precision-critical projects (medical, AV), we configure higher thresholds (≥ 0.95 for 2D, ≥ 0.92 for segmentation). All thresholds are agreed during scoping and validated against gold sets.

Can you annotate LiDAR point clouds and multi-sensor data?

Yes. We annotate LiDAR point clouds with 3D cuboids, point-level segmentation, and trajectory tracking. Multi-sensor fusion links annotations across LiDAR, camera, radar, and IMU with ≤ 10ms timestamp synchronization. Our teams are trained on common AV sensor configurations.

What video annotation volume can you handle?

Steady-state: 500–2K video-minutes/week per project with MOTA ≥ 0.90 tracking accuracy. For burst projects, we scale to 5K+ video-minutes/week with parallel annotator teams. Multi-object tracking, re-identification, and temporal action segmentation are all supported.

How do you handle edge cases and occlusions?

We build edge-case libraries during guideline creation (100+ documented cases per mature project) and maintain a structured decision taxonomy. Occluded objects receive visibility flags (fully visible, partially occluded, heavily occluded) and truncation percentages. New edge cases discovered during production are documented and added to the library with resolution decisions.

Do you use model-assisted pre-annotation such as SAM?

Yes. We use the Segment Anything Model (SAM) for pre-annotation to accelerate segmentation tasks. Human annotators review, correct, and validate all SAM-generated masks. This hybrid approach typically delivers 2–3× throughput improvement while maintaining our quality standards.
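
As a sketch of what box-prompted SAM pre-annotation looks like with the public segment_anything package (checkpoint path, frame, and box prompt are placeholders; every generated mask is only a draft for human correction):

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a SAM checkpoint (path is a placeholder) and wrap it in a predictor.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image_rgb = np.zeros((1080, 1920, 3), dtype=np.uint8)   # placeholder frame
predictor.set_image(image_rgb)

# A loose box prompt from an annotator; SAM proposes a tight draft mask.
masks, scores, _ = predictor.predict(
    box=np.array([604, 212, 786, 308]),
    multimask_output=False,
)
draft_mask = masks[0]   # routed to an L1 annotator for correction/validation
```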

How quickly can a project ramp up?

Scoping + guideline design: 3–5 days. Team assembly + calibration: 5–7 days. Pilot (1K–5K samples): 5–10 days. First labeled batch by Day 20. Full production velocity by Day 25. We maintain pre-qualified teams across major domains for faster ramp-up.

Need Pixel-Perfect Annotations?

Let's discuss your computer vision data pipeline — from task design to quality-assured delivery. We'll scope a pilot within 48 hours.