Path Planning API Reference
This page documents ManipulaPy.path_planning, the module for optimized trajectory generation with adaptive GPU/CPU execution, CUDA acceleration, and collision avoidance.
Tip
For conceptual explanations, see Trajectory Planning User Guide.
OptimizedTrajectoryPlanning Class
TrajectoryPlanning Class
Utility Functions
CPU Optimization Functions
Usage Examples
Optimized Trajectory Planning:
from ManipulaPy.path_planning import OptimizedTrajectoryPlanning
from ManipulaPy.urdf_processor import URDFToSerialManipulator
# Setup with optimization options
processor = URDFToSerialManipulator("robot.urdf")
planner = OptimizedTrajectoryPlanning(
processor.serial_manipulator,
"robot.urdf",
processor.dynamics,
joint_limits=[(-np.pi, np.pi)] * 6,
use_cuda=None, # Auto-detect
cuda_threshold=100, # Use GPU for problems > 100*6 = 600 elements
memory_pool_size_mb=256, # 256MB GPU memory pool
enable_profiling=True
)
# Generate trajectory with automatic GPU/CPU selection
trajectory = planner.joint_trajectory(
thetastart=[0, 0, 0, 0, 0, 0],
thetaend=[0.5, -0.3, 0.8, 0.1, -0.2, 0.4],
Tf=3.0,
N=2000, # Large enough to trigger GPU
method=5
)
# Check performance stats
stats = planner.get_performance_stats()
print(f"GPU usage: {stats['gpu_usage_percent']:.1f}%")
print(f"Average GPU time: {stats['avg_gpu_time']*1000:.2f}ms")
Factory Function Usage:
from ManipulaPy.path_planning import create_optimized_planner
# Create planner with auto-optimized settings
planner = create_optimized_planner(
serial_manipulator=processor.serial_manipulator,
urdf_path="robot.urdf",
dynamics=processor.dynamics,
joint_limits=[(-np.pi, np.pi)] * 6,
gpu_memory_mb=512,
enable_profiling=False
)
Batch Trajectory Processing:
# Generate 10 trajectories simultaneously
batch_size = 10
num_joints = 6
starts = np.random.uniform(-np.pi, np.pi, (batch_size, num_joints))
ends = np.random.uniform(-np.pi, np.pi, (batch_size, num_joints))
# Process all trajectories in parallel on GPU
batch_trajectories = planner.batch_joint_trajectory(
thetastart_batch=starts,
thetaend_batch=ends,
Tf=2.0,
N=500,
method=3
)
print(f"Batch shape: {batch_trajectories['positions'].shape}") # (10, 500, 6)
Performance Comparison:
from ManipulaPy.path_planning import compare_implementations
# Compare CPU vs GPU performance
results = compare_implementations(
serial_manipulator=processor.serial_manipulator,
urdf_path="robot.urdf",
dynamics=processor.dynamics,
joint_limits=[(-np.pi, np.pi)] * 6,
test_params={"N": 5000, "Tf": 3.0, "method": 5}
)
print(f"CPU time: {results['cpu']['time']:.4f}s")
if results['gpu']['available']:
print(f"GPU time: {results['gpu']['time']:.4f}s")
print(f"Speedup: {results['gpu']['speedup']:.2f}x")
print(f"Max position difference: {results['accuracy']['max_pos_diff']:.2e}")
Advanced Dynamics Integration:
# Generate trajectory
trajectory = planner.joint_trajectory(
thetastart=[0, 0, 0, 0, 0, 0],
thetaend=[1.0, -0.5, 0.8, 0.2, -0.3, 0.6],
Tf=4.0,
N=1000,
method=5
)
# Compute required torques with GPU acceleration
torques = planner.inverse_dynamics_trajectory(
trajectory["positions"],
trajectory["velocities"],
trajectory["accelerations"],
gravity_vector=[0, 0, -9.81],
Ftip=[0, 0, -10, 0, 0, 0] # 10N downward force
)
# Forward dynamics simulation
simulation = planner.forward_dynamics_trajectory(
thetalist=[0, 0, 0, 0, 0, 0],
dthetalist=[0, 0, 0, 0, 0, 0],
taumat=torques,
g=[0, 0, -9.81],
Ftipmat=np.zeros((1000, 6)),
dt=0.004,
intRes=5
)
Performance Monitoring:
# Run multiple operations
for i in range(5):
trajectory = planner.joint_trajectory(
thetastart=np.random.uniform(-1, 1, 6),
thetaend=np.random.uniform(-1, 1, 6),
Tf=2.0,
N=1000,
method=3
)
# Analyze performance
stats = planner.get_performance_stats()
print(f"Total operations: {stats['gpu_calls'] + stats['cpu_calls']}")
print(f"GPU efficiency: {stats['gpu_usage_percent']:.1f}%")
print(f"Memory transfers: {stats['memory_transfers']}")
# Benchmark different problem sizes
benchmark_results = planner.benchmark_performance([
{"N": 100, "joints": 6, "name": "Small"},
{"N": 2000, "joints": 6, "name": "Large"},
{"N": 1000, "joints": 12, "name": "Many joints"},
])
for name, result in benchmark_results.items():
print(f"{name}: {result['total_time']:.4f}s (GPU: {result['used_gpu']})")
Memory Management:
# Clean up GPU resources explicitly
planner.cleanup_gpu_memory()
# Reset performance stats
planner.reset_performance_stats()
Cartesian Space Planning with Hybrid Computation:
# Define start and end poses
T_start = np.eye(4)
T_start[:3, 3] = [0.5, 0.2, 0.3]
T_end = np.eye(4)
T_end[:3, 3] = [0.3, 0.4, 0.5]
# Add rotation
from ManipulaPy.utils import rotation_matrix_z
T_end[:3, :3] = rotation_matrix_z(np.pi/3)
# Generate Cartesian trajectory (GPU for derivatives, CPU for orientations)
cart_traj = planner.cartesian_trajectory(
Xstart=T_start,
Xend=T_end,
Tf=3.0,
N=1000,
method=5
)
# Visualize
planner.plot_cartesian_trajectory(cart_traj, Tf=3.0)
Adaptive Threshold Tuning:
# The planner automatically adapts thresholds based on performance
initial_threshold = planner.cpu_threshold
# Run several operations
for _ in range(10):
trajectory = planner.joint_trajectory(
thetastart=np.random.uniform(-1, 1, 6),
thetaend=np.random.uniform(-1, 1, 6),
Tf=2.0,
N=800, # Around threshold
method=3
)
# Check if threshold adapted
final_threshold = planner.cpu_threshold
print(f"Threshold adapted: {initial_threshold} -> {final_threshold}")
Key Features
- Adaptive Execution:
Automatic GPU/CPU selection based on problem size
Intelligent threshold adaptation based on performance history
Graceful fallback on GPU errors or memory constraints
- CUDA Acceleration:
2D parallelized kernels for optimal GPU utilization
Shared memory optimization for time-scaling computations
Memory pooling to reduce allocation overhead
Pinned memory transfers for maximum PCIe bandwidth
- Memory Management:
Per-instance GPU array caching
Global memory pool with automatic cleanup
Configurable memory pool sizes
Explicit memory management controls
- Performance Monitoring:
Detailed timing statistics for GPU vs CPU usage
Automatic performance tracking and analysis
Built-in benchmarking capabilities
Adaptive threshold tuning based on empirical performance
- Batch Processing:
3D parallelized batch trajectory generation
Efficient memory management for large batches
Automatic load balancing across GPU cores
- Robust Operation:
Comprehensive error handling with fallbacks
Automatic recovery from GPU memory issues
Extensive logging for debugging and optimization
Performance Characteristics
- GPU Acceleration Thresholds:
Small problems (N × joints < 200): CPU (lower overhead)
Medium problems (200 ≤ N × joints < 5000): Adaptive selection
Large problems (N × joints ≥ 5000): GPU preferred
- Memory Usage:
Joint trajectory: ~12 bytes per (time, joint) element
Cartesian trajectory: ~36 bytes per time step
Dynamics computation: ~20 bytes per (time, joint) element
- Typical Speedups (on modern GPU):
Joint trajectories: 5-20× for N > 1000
Batch processing: 10-50× for large batches
Dynamics computation: 3-15× for long trajectories
Configuration Guidelines
- GPU Memory Pool Sizing:
Small robots (≤6 DOF): 128-256 MB
Medium robots (7-12 DOF): 256-512 MB
Large robots (>12 DOF): 512+ MB
- CUDA Threshold Tuning:
High-end GPUs: Lower threshold (50-100)
Mid-range GPUs: Medium threshold (100-200)
Integrated GPUs: Higher threshold (200-500)
- Batch Size Recommendations:
Memory-limited: batch_size ≤ 50
Compute-limited: batch_size ≤ 200
High-memory systems: batch_size ≤ 1000
See Also
CUDA Kernels API Reference – Low-level CUDA acceleration functions
Kinematics API Reference – Forward and inverse kinematics for trajectory execution
Dynamics API Reference – Dynamics computations for torque calculation
Control API Reference – Controllers for trajectory following
Potential Field API Reference – Collision avoidance and path optimization
Optimized Path Planning User Guide – Conceptual overview and planning strategies