Vision User Guide๏ƒ

This comprehensive guide covers the Vision module in ManipulaPy, which provides advanced computer vision capabilities for robotic perception, including stereo vision, object detection, and PyBullet integration.

Overview๏ƒ

The Vision module is a unified computer vision system that brings together:

  • Monocular and stereo camera support with flexible configuration

  • YOLO-based object detection for real-time obstacle identification

  • PyBullet virtual cameras with interactive debugging sliders

  • Stereo vision pipeline for 3D reconstruction and depth estimation

  • Camera calibration utilities for precise geometric measurements

๐Ÿ“ท

Multi-Camera Support

Configure multiple cameras with individual intrinsics, extrinsics, and distortion parameters

๐Ÿค–

YOLO Integration

Real-time object detection with YOLOv8 for robust obstacle identification

๐ŸŽฎ

PyBullet Debug

Interactive virtual cameras with real-time parameter adjustment

๐Ÿ‘๏ธ

Stereo Vision

Complete stereo pipeline from rectification to 3D point cloud generation

Getting Started๏ƒ

Basic Camera Setup๏ƒ

The simplest way to start with the Vision module:

from ManipulaPy.vision import Vision
import numpy as np

# Create a basic vision system with default settings
vision = Vision()

# Capture an image (requires PyBullet environment)
rgb_image, depth_image = vision.capture_image()

print(f"๐Ÿ“ธ Captured RGB image: {rgb_image.shape}")
print(f"๐Ÿ“ Captured depth image: {depth_image.shape}")

Custom Camera Configuration๏ƒ

For more control, configure your cameras explicitly:

# Define camera parameters
camera_config = {
    "name": "workspace_camera",
    "translation": [0.0, 0.0, 1.0],  # 1 meter above workspace
    "rotation": [0, 45, 0],           # Look down at 45 degrees
    "fov": 60,                        # Field of view in degrees
    "near": 0.1,                      # Near clipping plane
    "far": 10.0,                      # Far clipping plane
    "intrinsic_matrix": np.array([
        [500, 0, 320],    # fx, 0, cx
        [0, 500, 240],    # 0, fy, cy
        [0, 0, 1]         # 0, 0, 1
    ], dtype=np.float32),
    "distortion_coeffs": np.zeros(5, dtype=np.float32),  # k1,k2,p1,p2,k3
    "use_opencv": False,              # Use PyBullet cameras
    "device_index": 0                 # Camera device index
}

# Create vision system with custom configuration
vision = Vision(camera_configs=[camera_config])

Core Features๏ƒ

Object Detection with YOLO๏ƒ

The Vision module integrates YOLOv8 for robust object detection:

# Capture images
rgb_image, depth_image = vision.capture_image(camera_index=0)

# Detect obstacles with 3D positioning
obstacle_positions, orientations = vision.detect_obstacles(
    depth_image=depth_image,
    rgb_image=rgb_image,
    depth_threshold=5.0,    # Only consider objects within 5 meters
    camera_index=0,
    step=2                  # Depth sampling step for efficiency
)

# Process detected obstacles
print(f"๐Ÿ” Detected {len(obstacle_positions)} obstacles")

for i, (pos, orientation) in enumerate(zip(obstacle_positions, orientations)):
    print(f"Obstacle {i+1}:")
    print(f"  ๐Ÿ“ Position: [{pos[0]:.2f}, {pos[1]:.2f}, {pos[2]:.2f}] meters")
    print(f"  ๐Ÿงญ Orientation: {orientation:.1f} degrees")

The object detection pipeline:

  1. YOLO Detection: Identifies objects in RGB images with bounding boxes

  2. Depth Analysis: Uses depth information within bounding boxes

  3. 3D Positioning: Converts 2D detections to 3D world coordinates

  4. Orientation Estimation: Computes object orientation in the XY plane

PyBullet Virtual Cameras๏ƒ

For simulation and debugging, use PyBulletโ€™s virtual cameras:

# Create an interactive debug camera system
debug_vision = Vision(
    use_pybullet_debug=True,    # Enable PyBullet debug sliders
    show_plot=True              # Display camera feed in matplotlib
)

# The debug interface provides real-time sliders for:
# - Camera position (target_x, target_y, target_z)
# - Camera orientation (yaw, pitch, roll)
# - View parameters (distance, up axis)
# - Projection settings (width, height, FOV, near/far planes)

Debug Interface Features:

  • Real-time parameter adjustment via PyBullet GUI sliders

  • Live camera feed displayed in matplotlib window

  • Matrix visualization for view and projection matrices

  • Interactive positioning for optimal camera placement

Stereo Vision Pipeline๏ƒ

For 3D reconstruction, configure a stereo camera pair:

# Configure left camera
left_camera_config = {
    "name": "left_camera",
    "translation": [0.0, 0.0, 0.5],
    "rotation": [0, 0, 0],
    "intrinsic_matrix": np.array([
        [600, 0, 320],
        [0, 600, 240],
        [0, 0, 1]
    ], dtype=np.float32),
    "distortion_coeffs": np.zeros(5, dtype=np.float32)
}

# Configure right camera (10cm baseline)
right_camera_config = left_camera_config.copy()
right_camera_config["name"] = "right_camera"
right_camera_config["translation"] = [0.1, 0.0, 0.5]  # 10cm to the right

# Create stereo vision system
stereo_vision = Vision(stereo_configs=(left_camera_config, right_camera_config))

# Compute rectification maps (do this once)
stereo_vision.compute_stereo_rectification_maps(image_size=(640, 480))

# Capture stereo images
left_image, _ = stereo_vision.capture_image(0)  # Left camera
right_image, _ = stereo_vision.capture_image(1) # Right camera

# Process stereo pipeline
left_rect, right_rect = stereo_vision.rectify_stereo_images(left_image, right_image)
disparity_map = stereo_vision.compute_disparity(left_rect, right_rect)
point_cloud = stereo_vision.disparity_to_pointcloud(disparity_map)

print(f"๐ŸŒ Generated point cloud with {len(point_cloud)} 3D points")

Stereo Pipeline Steps:

  1. Image Rectification: Align stereo images for disparity computation

  2. Disparity Calculation: Use StereoSGBM for robust disparity estimation

  3. 3D Reconstruction: Convert disparity to 3D points using camera geometry

  4. Point Cloud Filtering: Remove invalid and distant points

Advanced Usage๏ƒ

Multiple Camera Systems๏ƒ

Configure and manage multiple cameras simultaneously:

# Define multiple camera configurations
camera_configs = [
    {  # Overview camera
        "name": "overview_camera",
        "translation": [0, 0, 2.0],
        "rotation": [0, 90, 0],     # Look straight down
        "fov": 80,
        "intrinsic_matrix": np.array([[400, 0, 320], [0, 400, 240], [0, 0, 1]], dtype=np.float32),
        "distortion_coeffs": np.zeros(5, dtype=np.float32)
    },
    {  # Side view camera
        "name": "side_camera",
        "translation": [1.0, 0, 0.5],
        "rotation": [0, 0, 90],     # Look sideways
        "fov": 60,
        "intrinsic_matrix": np.array([[500, 0, 320], [0, 500, 240], [0, 0, 1]], dtype=np.float32),
        "distortion_coeffs": np.zeros(5, dtype=np.float32)
    }
]

# Create multi-camera vision system
multi_vision = Vision(camera_configs=camera_configs)

# Capture from different cameras
overview_rgb, overview_depth = multi_vision.capture_image(camera_index=0)
side_rgb, side_depth = multi_vision.capture_image(camera_index=1)

# Detect obstacles from multiple viewpoints
obstacles_overview, _ = multi_vision.detect_obstacles(overview_depth, overview_rgb, camera_index=0)
obstacles_side, _ = multi_vision.detect_obstacles(side_depth, side_rgb, camera_index=1)

print(f"๐Ÿ“ท Overview camera detected {len(obstacles_overview)} obstacles")
print(f"๐Ÿ“ท Side camera detected {len(obstacles_side)} obstacles")

OpenCV Camera Integration๏ƒ

Use real hardware cameras with OpenCV:

# Configure real camera with OpenCV
real_camera_config = {
    "name": "usb_camera",
    "translation": [0, 0, 0],
    "rotation": [0, 0, 0],
    "fov": 60,
    "intrinsic_matrix": np.array([
        [800, 0, 320],    # Values from camera calibration
        [0, 800, 240],
        [0, 0, 1]
    ], dtype=np.float32),
    "distortion_coeffs": np.array([-0.1, 0.05, 0, 0, 0], dtype=np.float32),  # From calibration
    "use_opencv": True,        # Enable OpenCV capture
    "device_index": 0          # USB camera device ID
}

# Create vision system with real camera
real_vision = Vision(camera_configs=[real_camera_config])

# Note: capture_image() will use OpenCV for image acquisition
# when use_opencv=True

Camera Calibration Parameters๏ƒ

Understanding the camera configuration parameters:

# Intrinsic matrix format:
# [fx  0  cx]
# [0  fy  cy]
# [0   0   1]
#
# Where:
# fx, fy = focal lengths in pixels
# cx, cy = principal point (image center) in pixels

intrinsic_matrix = np.array([
    [focal_x, 0, center_x],
    [0, focal_y, center_y],
    [0, 0, 1]
], dtype=np.float32)

# Distortion coefficients: [k1, k2, p1, p2, k3]
# k1, k2, k3 = radial distortion coefficients
# p1, p2 = tangential distortion coefficients
distortion_coeffs = np.array([k1, k2, p1, p2, k3], dtype=np.float32)

# Extrinsic parameters (pose in world coordinates):
# translation = [x, y, z] position in meters
# rotation = [roll, pitch, yaw] in degrees

Performance Optimization๏ƒ

Memory Management๏ƒ

# For long-running applications, manage resources carefully
vision = Vision(camera_configs=configs)

try:
    while True:
        # Capture and process images
        rgb, depth = vision.capture_image()
        obstacles, _ = vision.detect_obstacles(depth, rgb)

        # Process obstacles...

        # Clean up large arrays if needed
        del rgb, depth

finally:
    # Always release resources
    vision.release()

Efficient Object Detection๏ƒ

# Optimize detection parameters for performance
obstacles, orientations = vision.detect_obstacles(
    depth_image=depth,
    rgb_image=rgb,
    depth_threshold=3.0,     # Limit detection range
    camera_index=0,
    step=4                   # Increase step size for speed (lower accuracy)
)

# For real-time applications, consider:
# - Reducing image resolution
# - Increasing step size
# - Limiting depth threshold
# - Processing every nth frame

Error Handling and Debugging๏ƒ

Robust Error Handling๏ƒ

try:
    # Create vision system
    vision = Vision(camera_configs=configs)

    # Attempt image capture
    rgb, depth = vision.capture_image(camera_index=0)

    if rgb is None or depth is None:
        print("โŒ Failed to capture images")
        return

    # Attempt object detection
    obstacles, orientations = vision.detect_obstacles(depth, rgb)

    if len(obstacles) == 0:
        print("โš ๏ธ No obstacles detected")
    else:
        print(f"โœ… Detected {len(obstacles)} obstacles")

except RuntimeError as e:
    print(f"โŒ Vision system error: {e}")
except Exception as e:
    print(f"โŒ Unexpected error: {e}")
finally:
    if 'vision' in locals():
        vision.release()

Debugging Tips๏ƒ

# Enable debug logging
import logging
logging.basicConfig(level=logging.DEBUG)

# Create vision with detailed logging
vision = Vision(camera_configs=configs, logger_name="DebugVision")

# Check YOLO model status
if vision.yolo_model is None:
    print("โš ๏ธ YOLO model not loaded - object detection disabled")
else:
    print("โœ… YOLO model loaded successfully")

# Verify camera configuration
for idx, camera in vision.cameras.items():
    print(f"๐Ÿ“ท Camera {idx}: {camera['name']}")
    print(f"   Position: {camera['translation']}")
    print(f"   Rotation: {camera['rotation']}")
    print(f"   FOV: {camera['fov']}ยฐ")

Common Issues and Solutions๏ƒ

Issue: No objects detected by YOLO

# Solutions:
# 1. Check if YOLO model loaded properly
if vision.yolo_model is None:
    print("Install ultralytics: pip install ultralytics")

# 2. Verify image quality
rgb, depth = vision.capture_image()
if rgb.max() == 0:
    print("Image is completely black - check lighting/camera")

# 3. Adjust detection confidence
# Lower confidence threshold in detect_obstacles()

Issue: Poor stereo reconstruction

# Solutions:
# 1. Ensure proper camera calibration
# 2. Check baseline distance (should be 5-15% of working distance)
# 3. Verify image rectification quality

left_rect, right_rect = vision.rectify_stereo_images(left, right)
# Rectified images should be aligned horizontally

Issue: Inaccurate 3D positions

# Solutions:
# 1. Calibrate intrinsic matrix precisely
# 2. Verify depth image scaling
# 3. Check coordinate frame conventions

# Debug depth values
print(f"Depth range: {depth.min():.3f} - {depth.max():.3f}")
print(f"Near/far planes: {camera['near']} - {camera['far']}")

Real-World Applications๏ƒ

Robot Navigation๏ƒ

from ManipulaPy.path_planning import TrajectoryPlanning

# Integrated obstacle detection for path planning
def safe_navigation():
    # Detect current obstacles
    rgb, depth = vision.capture_image()
    obstacles, _ = vision.detect_obstacles(depth, rgb, depth_threshold=2.0)

    # Update robot's environmental model
    planner = TrajectoryPlanning(robot, urdf_file, dynamics, joint_limits)

    # Plan collision-free trajectory
    safe_trajectory = planner.joint_trajectory(
        thetastart=current_position,
        thetaend=target_position,
        Tf=5.0,
        N=100,
        method=5
    )

    return safe_trajectory, obstacles

Pick and Place Operations๏ƒ

def pick_and_place_with_vision():
    # Detect objects in workspace
    rgb, depth = vision.capture_image(camera_index=0)  # Overhead camera
    objects, orientations = vision.detect_obstacles(depth, rgb, depth_threshold=1.0)

    if len(objects) == 0:
        print("No objects found to pick")
        return

    # Select closest object
    closest_idx = np.argmin([np.linalg.norm(obj) for obj in objects])
    target_object = objects[closest_idx]
    target_orientation = orientations[closest_idx]

    print(f"๐ŸŽฏ Targeting object at: {target_object}")
    print(f"๐Ÿงญ Object orientation: {target_orientation:.1f}ยฐ")

    # Plan approach trajectory
    # ... (integrate with kinematics and planning)

Best Practices๏ƒ

  1. Camera Placement - Position cameras for optimal workspace coverage - Avoid backlighting and reflective surfaces - Ensure sufficient lighting for object detection

  2. Calibration - Use high-quality calibration patterns (checkerboards) - Capture calibration images from multiple angles - Verify calibration accuracy before deployment

  3. Performance - Choose appropriate image resolutions for your application - Balance detection accuracy with processing speed - Use temporal filtering for stable object tracking

  4. Robustness - Implement proper error handling for all vision operations - Use multiple cameras for redundancy when possible - Validate detection results before using in control loops

  5. Integration - Coordinate vision frame rates with control loop timing - Transform coordinates to robot base frame consistently - Use vision confidence scores in decision making

See Also๏ƒ