Path Planning API Reference

This page documents ManipulaPy.path_planning, the module for optimized trajectory generation with adaptive GPU/CPU execution, CUDA acceleration, and collision avoidance.

Tip

For conceptual explanations, see Trajectory Planning User Guide.

Quick Navigation

OptimizedTrajectoryPlanning Class

TrajectoryPlanning Class

Utility Functions

CPU Optimization Functions

Usage Examples

Optimized Trajectory Planning:

from ManipulaPy.path_planning import OptimizedTrajectoryPlanning
from ManipulaPy.urdf_processor import URDFToSerialManipulator

# Setup with optimization options
processor = URDFToSerialManipulator("robot.urdf")
planner = OptimizedTrajectoryPlanning(
    processor.serial_manipulator,
    "robot.urdf",
    processor.dynamics,
    joint_limits=[(-np.pi, np.pi)] * 6,
    use_cuda=None,  # Auto-detect
    cuda_threshold=100,  # Use GPU for problems > 100*6 = 600 elements
    memory_pool_size_mb=256,  # 256MB GPU memory pool
    enable_profiling=True
)

# Generate trajectory with automatic GPU/CPU selection
trajectory = planner.joint_trajectory(
    thetastart=[0, 0, 0, 0, 0, 0],
    thetaend=[0.5, -0.3, 0.8, 0.1, -0.2, 0.4],
    Tf=3.0,
    N=2000,  # Large enough to trigger GPU
    method=5
)

# Check performance stats
stats = planner.get_performance_stats()
print(f"GPU usage: {stats['gpu_usage_percent']:.1f}%")
print(f"Average GPU time: {stats['avg_gpu_time']*1000:.2f}ms")

Factory Function Usage:

from ManipulaPy.path_planning import create_optimized_planner

# Create planner with auto-optimized settings
planner = create_optimized_planner(
    serial_manipulator=processor.serial_manipulator,
    urdf_path="robot.urdf",
    dynamics=processor.dynamics,
    joint_limits=[(-np.pi, np.pi)] * 6,
    gpu_memory_mb=512,
    enable_profiling=False
)

Batch Trajectory Processing:

# Generate 10 trajectories simultaneously
batch_size = 10
num_joints = 6

starts = np.random.uniform(-np.pi, np.pi, (batch_size, num_joints))
ends = np.random.uniform(-np.pi, np.pi, (batch_size, num_joints))

# Process all trajectories in parallel on GPU
batch_trajectories = planner.batch_joint_trajectory(
    thetastart_batch=starts,
    thetaend_batch=ends,
    Tf=2.0,
    N=500,
    method=3
)

print(f"Batch shape: {batch_trajectories['positions'].shape}")  # (10, 500, 6)

Performance Comparison:

from ManipulaPy.path_planning import compare_implementations

# Compare CPU vs GPU performance
results = compare_implementations(
    serial_manipulator=processor.serial_manipulator,
    urdf_path="robot.urdf",
    dynamics=processor.dynamics,
    joint_limits=[(-np.pi, np.pi)] * 6,
    test_params={"N": 5000, "Tf": 3.0, "method": 5}
)

print(f"CPU time: {results['cpu']['time']:.4f}s")
if results['gpu']['available']:
    print(f"GPU time: {results['gpu']['time']:.4f}s")
    print(f"Speedup: {results['gpu']['speedup']:.2f}x")
    print(f"Max position difference: {results['accuracy']['max_pos_diff']:.2e}")

Advanced Dynamics Integration:

# Generate trajectory
trajectory = planner.joint_trajectory(
    thetastart=[0, 0, 0, 0, 0, 0],
    thetaend=[1.0, -0.5, 0.8, 0.2, -0.3, 0.6],
    Tf=4.0,
    N=1000,
    method=5
)

# Compute required torques with GPU acceleration
torques = planner.inverse_dynamics_trajectory(
    trajectory["positions"],
    trajectory["velocities"],
    trajectory["accelerations"],
    gravity_vector=[0, 0, -9.81],
    Ftip=[0, 0, -10, 0, 0, 0]  # 10N downward force
)

# Forward dynamics simulation
simulation = planner.forward_dynamics_trajectory(
    thetalist=[0, 0, 0, 0, 0, 0],
    dthetalist=[0, 0, 0, 0, 0, 0],
    taumat=torques,
    g=[0, 0, -9.81],
    Ftipmat=np.zeros((1000, 6)),
    dt=0.004,
    intRes=5
)

Performance Monitoring:

# Run multiple operations
for i in range(5):
    trajectory = planner.joint_trajectory(
        thetastart=np.random.uniform(-1, 1, 6),
        thetaend=np.random.uniform(-1, 1, 6),
        Tf=2.0,
        N=1000,
        method=3
    )

# Analyze performance
stats = planner.get_performance_stats()
print(f"Total operations: {stats['gpu_calls'] + stats['cpu_calls']}")
print(f"GPU efficiency: {stats['gpu_usage_percent']:.1f}%")
print(f"Memory transfers: {stats['memory_transfers']}")

# Benchmark different problem sizes
benchmark_results = planner.benchmark_performance([
    {"N": 100, "joints": 6, "name": "Small"},
    {"N": 2000, "joints": 6, "name": "Large"},
    {"N": 1000, "joints": 12, "name": "Many joints"},
])

for name, result in benchmark_results.items():
    print(f"{name}: {result['total_time']:.4f}s (GPU: {result['used_gpu']})")

Memory Management:

# Clean up GPU resources explicitly
planner.cleanup_gpu_memory()

# Reset performance stats
planner.reset_performance_stats()

Cartesian Space Planning with Hybrid Computation:

# Define start and end poses
T_start = np.eye(4)
T_start[:3, 3] = [0.5, 0.2, 0.3]

T_end = np.eye(4)
T_end[:3, 3] = [0.3, 0.4, 0.5]
# Add rotation
from ManipulaPy.utils import rotation_matrix_z
T_end[:3, :3] = rotation_matrix_z(np.pi/3)

# Generate Cartesian trajectory (GPU for derivatives, CPU for orientations)
cart_traj = planner.cartesian_trajectory(
    Xstart=T_start,
    Xend=T_end,
    Tf=3.0,
    N=1000,
    method=5
)

# Visualize
planner.plot_cartesian_trajectory(cart_traj, Tf=3.0)

Adaptive Threshold Tuning:

# The planner automatically adapts thresholds based on performance
initial_threshold = planner.cpu_threshold

# Run several operations
for _ in range(10):
    trajectory = planner.joint_trajectory(
        thetastart=np.random.uniform(-1, 1, 6),
        thetaend=np.random.uniform(-1, 1, 6),
        Tf=2.0,
        N=800,  # Around threshold
        method=3
    )

# Check if threshold adapted
final_threshold = planner.cpu_threshold
print(f"Threshold adapted: {initial_threshold} -> {final_threshold}")

Key Features

Adaptive Execution:
  • Automatic GPU/CPU selection based on problem size

  • Intelligent threshold adaptation based on performance history

  • Graceful fallback on GPU errors or memory constraints

CUDA Acceleration:
  • 2D parallelized kernels for optimal GPU utilization

  • Shared memory optimization for time-scaling computations

  • Memory pooling to reduce allocation overhead

  • Pinned memory transfers for maximum PCIe bandwidth

Memory Management:
  • Per-instance GPU array caching

  • Global memory pool with automatic cleanup

  • Configurable memory pool sizes

  • Explicit memory management controls

Performance Monitoring:
  • Detailed timing statistics for GPU vs CPU usage

  • Automatic performance tracking and analysis

  • Built-in benchmarking capabilities

  • Adaptive threshold tuning based on empirical performance

Batch Processing:
  • 3D parallelized batch trajectory generation

  • Efficient memory management for large batches

  • Automatic load balancing across GPU cores

Robust Operation:
  • Comprehensive error handling with fallbacks

  • Automatic recovery from GPU memory issues

  • Extensive logging for debugging and optimization

Performance Characteristics

GPU Acceleration Thresholds:
  • Small problems (N × joints < 200): CPU (lower overhead)

  • Medium problems (200 ≤ N × joints < 5000): Adaptive selection

  • Large problems (N × joints ≥ 5000): GPU preferred

Memory Usage:
  • Joint trajectory: ~12 bytes per (time, joint) element

  • Cartesian trajectory: ~36 bytes per time step

  • Dynamics computation: ~20 bytes per (time, joint) element

Typical Speedups (on modern GPU):
  • Joint trajectories: 5-20× for N > 1000

  • Batch processing: 10-50× for large batches

  • Dynamics computation: 3-15× for long trajectories

Configuration Guidelines

GPU Memory Pool Sizing:
  • Small robots (≤6 DOF): 128-256 MB

  • Medium robots (7-12 DOF): 256-512 MB

  • Large robots (>12 DOF): 512+ MB

CUDA Threshold Tuning:
  • High-end GPUs: Lower threshold (50-100)

  • Mid-range GPUs: Medium threshold (100-200)

  • Integrated GPUs: Higher threshold (200-500)

Batch Size Recommendations:
  • Memory-limited: batch_size ≤ 50

  • Compute-limited: batch_size ≤ 200

  • High-memory systems: batch_size ≤ 1000

See Also