Vision API Reference

This page documents ManipulaPy.vision, the computer vision module. It covers camera configuration and calibration, stereo vision, PyBullet integration, and optional YOLO-based object detection.

Tip

For conceptual explanations, see Vision User Guide.

Quick Navigation

Vision Class

Camera Management

Configuration Methods

Extrinsic Matrix Computation

Image Acquisition

PyBullet Integration

Object Detection

YOLO Integration

Stereo Vision Pipeline

Rectification Setup

Image Processing

3D Reconstruction

PyBullet Debug Interface

Debug Slider Setup

Resource Management

Cleanup Methods

Utility Functions

Data Structures and Configuration

Internal Storage Format

Camera Storage (self.cameras):

self.cameras[index] = {
    "name": str,
    "translation": [x, y, z],
    "rotation": [roll_deg, pitch_deg, yaw_deg],  # Euler angles in degrees
    "fov": float,                                # field of view in degrees
    "near": float,                               # near clipping distance
    "far": float,                                # far clipping distance
    "intrinsic_matrix": np.ndarray,              # shape (3, 3)
    "distortion_coeffs": np.ndarray,             # shape (5,)
    "use_opencv": bool,
    "device_index": int,
    "extrinsic_matrix": np.ndarray,              # shape (4, 4)
}
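
The extrinsic_matrix entry is a 4x4 homogeneous transform that can be derived from the translation and rotation entries. The following is a minimal sketch of such a derivation, not the module's confirmed implementation. The helper name make_extrinsic and the rotation composition order (Rz @ Ry @ Rx) are assumptions made for illustration.

```python
import numpy as np

def make_extrinsic(translation, rotation_deg):
    """Build a 4x4 homogeneous transform from a translation and Euler angles in degrees.

    Assumption: roll, pitch, yaw rotate about x, y, z and compose as Rz @ Ry @ Rx.
    """
    roll, pitch, yaw = np.radians(rotation_deg)
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)

    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])

    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx   # rotation block
    T[:3, 3] = translation     # translation column
    return T

# A camera 1.5 units above the origin, yawed by 90 degrees.
T = make_extrinsic([0.0, 0.0, 1.5], [0.0, 0.0, 90.0])
```

With zero rotation and zero translation the helper returns the identity matrix, which is a quick sanity check for any chosen convention.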

Stereo Configuration Attributes:

# Stereo processing state
self.stereo_enabled: bool
self.left_cam_cfg: dict
self.right_cam_cfg: dict

# Rectification maps
self.left_map_x: np.ndarray
self.left_map_y: np.ndarray
self.right_map_x: np.ndarray
self.right_map_y: np.ndarray

# 3D reconstruction
self.Q: np.ndarray  # 4x4 disparity-to-depth matrix
self.stereo_matcher: cv2.StereoSGBM
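
To illustrate what the 4x4 disparity-to-depth matrix Q does, the sketch below builds a simplified Q for an ideal rectified pair and reprojects one pixel. It follows the general layout of the matrix produced by cv2.stereoRectify, with a simplified sign convention and identical principal points in both cameras. The helper names make_q and reproject, and the numeric values, are assumptions chosen for illustration.

```python
import numpy as np

def make_q(f, cx, cy, baseline):
    """Simplified disparity-to-depth matrix for an ideal rectified stereo pair."""
    return np.array([
        [1.0, 0.0, 0.0, -cx],
        [0.0, 1.0, 0.0, -cy],
        [0.0, 0.0, 0.0, f],
        [0.0, 0.0, 1.0 / baseline, 0.0],
    ])

def reproject(Q, u, v, disparity):
    """Map a pixel (u, v) with a disparity value to a 3D point via Q."""
    X, Y, Z, W = Q @ np.array([u, v, disparity, 1.0])
    return np.array([X, Y, Z]) / W  # homogeneous divide

Q = make_q(f=500.0, cx=320.0, cy=240.0, baseline=0.1)
point = reproject(Q, u=370.0, v=240.0, disparity=50.0)
# Depth follows Z = f * baseline / disparity = 500 * 0.1 / 50 = 1.0
```

In practice, cv2.reprojectImageTo3D applies the same mapping to a whole disparity image at once.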

Default Camera Configuration

When no camera_configs argument is provided, the following default configuration is used:

default_config = {
    "name": "default_camera",
    "translation": [0, 0, 0],
    "rotation": [0, 0, 0],
    "fov": 60,
    "near": 0.1,
    "far": 5.0,
    "intrinsic_matrix": [[500, 0, 320],
                        [0, 500, 240],
                        [0, 0, 1]],
    "distortion_coeffs": [0, 0, 0, 0, 0],
    "use_opencv": False,
    "device_index": 0
}
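
As a rough illustration of what the default intrinsic_matrix encodes, the sketch below projects a 3D point in the camera frame to pixel coordinates with the standard pinhole model. The helper name project and the sample point are assumptions. The principal point (320, 240) is consistent with a 640x480 image, although the source does not state the image size.

```python
import numpy as np

# Default intrinsic matrix from the configuration above.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

def project(K, point_cam):
    """Project a 3D point in the camera frame (Z forward) to pixel coordinates."""
    uvw = K @ np.asarray(point_cam, dtype=float)
    return uvw[:2] / uvw[2]  # perspective divide

# A point 0.1 units to the right of the optical axis at depth 1.0.
pixel = project(K, [0.1, 0.0, 1.0])
# u = 500 * 0.1 / 1.0 + 320 = 370, v = 240
```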

Error Handling and Validation

YOLO Model Management

  • Loading Failure: If the model fails to load, yolo_model is set to None and the module continues to operate without detection

  • Detection Fallback: Returns empty arrays when YOLO is unavailable

  • Input Validation: Checks rgb_image and depth_image for None or otherwise invalid values
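
The fallback behaviour listed above can be sketched as follows. This is a hypothetical stand-in, not the Vision class itself: the class name DetectorSketch, the loader argument, the shape check, and the form of the returned pair of arrays are all assumptions made for illustration.

```python
import numpy as np

class DetectorSketch:
    """Hypothetical stand-in illustrating the fallback pattern."""

    def __init__(self, loader=None):
        # `loader` is a hypothetical callable that returns a model or raises.
        try:
            self.yolo_model = loader() if loader is not None else None
        except Exception as exc:
            print(f"YOLO model could not be loaded: {exc}")
            self.yolo_model = None  # keep running without detection

    def detect(self, rgb_image, depth_image):
        # Assumed return format: a pair of empty arrays when nothing can be detected.
        empty = (np.empty((0, 3)), np.empty((0,)))
        if self.yolo_model is None:
            return empty
        if rgb_image is None or depth_image is None:
            return empty
        if rgb_image.shape[:2] != depth_image.shape[:2]:
            return empty  # assumed notion of "invalid": mismatched image sizes
        return self.yolo_model(rgb_image, depth_image)

def failing_loader():
    raise FileNotFoundError("weights file missing")

detector = DetectorSketch(loader=failing_loader)
positions, orientations = detector.detect(np.zeros((4, 4, 3)), np.zeros((4, 4)))
```

Returning empty arrays rather than raising lets callers iterate over the result unconditionally, whether or not a model is present.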

Stereo Processing Errors

  • Configuration Validation: Checks that the stereo configs contain the required keys

  • Runtime State Checking: Verifies that stereo_enabled is set before running stereo operations

  • Map Initialization: Ensures rectification maps have been computed before they are used
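
A minimal sketch of these three checks is shown below. The class name StereoStateSketch, the list of required keys, and the exception types are assumptions, since the source does not specify them.

```python
# Assumed set of required keys; the module's actual list may differ.
REQUIRED_KEYS = ("intrinsic_matrix", "distortion_coeffs", "translation", "rotation")

class StereoStateSketch:
    """Hypothetical stand-in showing config validation and state checks."""

    def __init__(self, left_cfg=None, right_cfg=None):
        self.stereo_enabled = False
        self.left_map_x = None  # set once rectification maps are computed
        if left_cfg is not None and right_cfg is not None:
            for side, cfg in (("left", left_cfg), ("right", right_cfg)):
                missing = [k for k in REQUIRED_KEYS if k not in cfg]
                if missing:
                    raise ValueError(f"{side} stereo config is missing keys: {missing}")
            self.stereo_enabled = True

    def rectify(self, left_img, right_img):
        if not self.stereo_enabled:
            raise RuntimeError("Stereo is not enabled")
        if self.left_map_x is None:
            raise RuntimeError("Rectification maps have not been computed")
        return left_img, right_img  # placeholder; the actual remapping is omitted

state = StereoStateSketch()
try:
    state.rectify(None, None)
except RuntimeError as exc:
    message = str(exc)
```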

OpenCV Device Handling

  • Device Access Failure: Raises RuntimeError with descriptive message

  • Capture Validation: Uses cap.isOpened() to verify device accessibility

  • Resource Cleanup: Releases capture devices automatically in the destructor and through an explicit cleanup method
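
The device check can be sketched as below. cv2.VideoCapture, isOpened(), and release() are real OpenCV calls. The helper name open_capture, the factory argument, the error message, and the UnavailableDevice stand-in are assumptions added so the sketch runs without a physical camera.

```python
def open_capture(device_index, factory=None):
    """Open a capture device and raise RuntimeError if it is not accessible."""
    if factory is None:
        import cv2  # imported lazily so the sketch loads without OpenCV installed
        factory = cv2.VideoCapture
    cap = factory(device_index)
    if not cap.isOpened():
        cap.release()  # free the handle before reporting the failure
        raise RuntimeError(f"Could not open camera device {device_index}")
    return cap

class UnavailableDevice:
    """Stand-in capture object that behaves like a device that failed to open."""

    def __init__(self, index):
        self.released = False

    def isOpened(self):
        return False

    def release(self):
        self.released = True

try:
    open_capture(3, factory=UnavailableDevice)
except RuntimeError as exc:
    error_message = str(exc)
```

With a real camera attached, open_capture(0) would return an opened cv2.VideoCapture that the caller is responsible for releasing.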

PyBullet Integration Safety

  • Parameter Reading: Safe handling of missing debug parameters

  • Matrix Validation: Debug logging for view/projection matrix inspection

  • Client State: No assumptions about PyBullet connection status

Performance Considerations

Memory Management

  • Image Arrays: Contiguous memory layout for OpenCV operations

  • Rectification Maps: Persistent storage for repeated stereo processing

  • Point Clouds: Filtered arrays to reduce memory footprint
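
As an illustration of point cloud filtering, the sketch below drops non-finite points and points outside a depth range, and returns a contiguous array. The helper name filter_points and the filtering criteria are assumptions, since the source does not state which filters the module applies.

```python
import numpy as np

def filter_points(points, near, far):
    """Keep finite points whose z coordinate lies strictly between near and far."""
    points = np.ascontiguousarray(points, dtype=np.float32)
    finite = np.isfinite(points).all(axis=1)   # reject NaN and infinite entries
    z = points[:, 2]
    in_range = (z > near) & (z < far)          # reject points outside the depth range
    return points[finite & in_range]

cloud = np.array([
    [0.0, 0.0, 1.0],
    [0.0, 0.0, np.inf],   # invalid depth
    [0.0, 0.0, 10.0],     # beyond the far limit
    [np.nan, 0.0, 2.0],   # invalid coordinate
    [0.1, 0.2, 0.5],
])
filtered = filter_points(cloud, near=0.1, far=5.0)
```

Boolean-mask indexing returns a new array, so the result holds only the retained points rather than a view into the full cloud.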

Computational Efficiency

  • YOLO Inference: Single forward pass per image

  • Stereo Matching: Uses StereoSGBM as a balance between match quality and speed

  • Depth Scaling: Vectorized operations for range conversion
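
The vectorized depth scaling can be sketched as follows. The expression is the standard conversion from an OpenGL-style normalized depth buffer, such as the one PyBullet's getCameraImage returns, to linear depth between the near and far planes. Whether the module uses exactly this expression is an assumption, and the helper name linearize_depth is hypothetical.

```python
import numpy as np

def linearize_depth(depth_buffer, near, far):
    """Convert a normalized depth buffer in [0, 1] to linear depth in [near, far]."""
    depth_buffer = np.asarray(depth_buffer, dtype=np.float64)
    # Applied elementwise to the whole array; no Python-level loop is needed.
    return far * near / (far - (far - near) * depth_buffer)

buf = np.array([[0.0, 0.5],
                [0.9, 1.0]])
depth = linearize_depth(buf, near=0.1, far=5.0)
# A buffer value of 0 maps to the near plane and 1 maps to the far plane.
```

The mapping is nonlinear: most of the buffer's range is spent close to the near plane, which is why a buffer value of 0.5 corresponds to a depth much nearer to 0.1 than to 5.0.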

Threading Considerations

  • OpenCV Capture: Single-threaded device access

  • YOLO Processing: GPU acceleration when available

  • PyBullet Integration: Main thread simulation access required

See Also

External Dependencies

  • OpenCV – Computer vision algorithms and stereo processing

  • PyBullet – Physics simulation and camera rendering

  • Ultralytics YOLO – Object detection framework

  • NumPy – Numerical array operations

  • Matplotlib – Debug visualization