DiffCloud: Real-to-sim from Point Clouds with Differentiable Simulation and Rendering of Deformable Objects
Priya Sundaresan, Rika Antonova, Jeannette Bohg
Stanford University
{priyasun, rika.antonova, bohg}@stanford.edu
Abstract
Research in manipulation of deformable objects is typically conducted on a limited range of scenarios, because handling each scenario on hardware takes significant effort. Realistic simulators with support for various types of deformations and interactions have the potential to speed up experimentation with novel tasks and algorithms. However, for highly deformable objects it is challenging to align the output of a simulator with the behavior of real objects. Manual tuning is not intuitive, hence automated methods are needed.
We view this alignment problem as a joint perception-inference challenge and demonstrate how to use recent neural network architectures to successfully perform simulation parameter inference from real point clouds. We analyze the performance of various architectures, comparing their data and training requirements. Furthermore, we propose to leverage differentiable point cloud sampling and differentiable simulation to significantly reduce the time to achieve the alignment. We employ an efficient way to propagate gradients from point clouds to simulated meshes and further through to the physical simulation parameters, such as mass and stiffness. Experiments with highly deformable objects show that our method can achieve comparable or better alignment with real object behavior, while reducing the time needed to achieve this by more than an order of magnitude.
Video
System Overview
Real Data Collection
We use a Kinova Gen3 robot arm to manipulate a deformable object and record multi-view point clouds using two Intel Realsense D435 cameras. Using knowledge of the kinematics and geometry of the robot, we mask out the arm from point cloud frames, capturing only the deformable object of interest. The merged point clouds serve as input to DiffCloud.
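The masking and merging steps above can be sketched as follows. This is a minimal illustration, not the actual DiffCloud pipeline: `mask_robot_points` and `merge_views` are hypothetical helper names, and a real system would rasterize the robot model into a per-camera mask rather than threshold 3D distances.

```python
import numpy as np

def mask_robot_points(points, robot_points, radius=0.02):
    # Drop any point within `radius` of a sampled robot surface point,
    # leaving only the deformable object. (Toy stand-in for projecting
    # the known robot geometry into each camera as a mask.)
    d = np.linalg.norm(points[:, None, :] - robot_points[None, :, :], axis=-1)
    keep = d.min(axis=1) > radius
    return points[keep]

def merge_views(cloud_a, cloud_b, T_b_to_a):
    # Transform view b into view a's frame with a 4x4 extrinsic matrix,
    # then concatenate the two clouds into one merged observation.
    homo = np.hstack([cloud_b, np.ones((len(cloud_b), 1))])
    cloud_b_in_a = (T_b_to_a @ homo.T).T[:, :3]
    return np.vstack([cloud_a, cloud_b_in_a])
```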
Hardware Setup: We visualize the Kinova robot arm and multi-view Realsense cameras on a cloth fold task.
Point Cloud Processing: Given robot masks and depth images, we obtain merged, masked point clouds of the deformable object, shown for the cloth lift task above.
Differentiable Simulation
We use the mesh-based differentiable simulator DiffSim [1], which supports simulation of deformable objects and their interactions with rigid bodies. For each real scenario considered, we set up an analogue in simulation such that the scale of all objects and the trajectories executed are identical. However, the behavior of the deformable object still differs between simulation and reality due to a physical parameter mismatch, which DiffCloud resolves by aligning the simulation parameters with the real observations.

Scenarios considered
Differentiable Point Cloud Sampling + Loss Propagation
During each iteration of optimization, we roll out the simulation with the currently estimated parameters and use a differentiable point cloud sampling procedure as in [2] to convert the mesh states to point clouds. We use a variation of the Chamfer loss to compare the simulated point clouds against the real ones, and propagate the losses back to the underlying parameters. This update procedure is repeated until the maximum number of optimization epochs is exceeded or the loss falls below a threshold, indicating that we have found parameters that best describe the observed point clouds.
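The core of the comparison step is the Chamfer loss. As a rough sketch, a symmetric Chamfer distance between two point sets can be written as below; the actual pipeline uses a differentiable implementation (e.g. from PyTorch3D [2]) so gradients flow back through the sampled points to the mesh and the physical parameters, whereas this numpy version only illustrates the value being computed.

```python
import numpy as np

def chamfer_distance(p, q):
    # Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3):
    # for each point in p, the squared distance to its nearest neighbor in q,
    # averaged over p, plus the same term in the other direction.
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)   # (N, M) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```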
Experiments
We evaluate DiffCloud against non-differentiable baselines: MeteorNet, PointNet++, and MLP. All baselines are architectures for processing point clouds or sequences of point clouds. We generate a noise-injected, synthetic dataset of rendered deformable object point clouds from the above scenarios with varied input physical parameters, and train all baselines to regress the parameters given point cloud input.
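The baseline setup above (sample parameters, simulate noisy observations, fit a regressor from observations back to parameters) can be illustrated with a heavily simplified sketch. Everything here is hypothetical: `toy_observe` stands in for rendering a point cloud, the observation is a 3-dim feature vector rather than a point cloud sequence, and a least-squares linear model stands in for MeteorNet/PointNet++/MLP.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_observe(stiffness, mass, noise=0.01):
    # Hypothetical stand-in for rendering a noisy observation of a
    # deformable object simulated with the given physical parameters.
    x = np.array([mass * 9.81 / stiffness, stiffness * 0.1, mass])
    return x + noise * rng.standard_normal(3)

# Synthetic dataset: sampled (stiffness, mass) pairs and their observations.
params = rng.uniform([1.0, 0.05], [10.0, 0.5], size=(200, 2))
obs = np.stack([toy_observe(k, m) for k, m in params])

# Least-squares linear regressor in place of a learned point cloud network.
A = np.hstack([obs, np.ones((len(obs), 1))])
W, *_ = np.linalg.lstsq(A, params, rcond=None)

def predict(observation):
    # Regress (stiffness, mass) from an observation feature vector.
    return np.append(observation, 1.0) @ W
```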
Real2Sim Qualitative Experiments
We evaluate DiffCloud on performing parameter estimation for the cloth lift and cloth fold scenarios.
We find that DiffCloud is able to find stiffness and mass properties that describe the real point cloud sequences. In the sequences below, DiffCloud correctly infers that the blue polka-dot cloth and black cloth collapse inwards in the lift and fold scenarios, whereas the paper towel and light blue towel retain their shape in both settings.
Lift
Fold
Real RGB
DiffCloud Result
Real Point Cloud
Real RGB
DiffCloud Result
Real Point Cloud
Sim2Sim Qualitative Experiments
We evaluate DiffCloud on parameter estimation for randomly generated target point cloud sequences in the band-stretch and vest-hang scenarios.
DiffCloud finds parameters that match a target sequence via gradient descent from an initial guess.
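The gradient-descent loop can be sketched with a toy, analytically differentiable stand-in for the simulator. This is not DiffSim: `simulate` here is just the equilibrium sag of a mass on a spring, parameterized by compliance (1 / stiffness) so the fit is well conditioned, and the gradient is written out by hand instead of coming from a differentiable simulation and rendering pipeline.

```python
GRAVITY = 9.81

def simulate(compliance, mass):
    # Toy stand-in for the differentiable simulator: equilibrium sag of a
    # mass hanging on a spring, x = m * g * c, where c = 1 / stiffness.
    return mass * GRAVITY * compliance

def fit_compliance(target_x, mass, c_init=1.0, lr=0.5, steps=50):
    # Gradient descent on the squared error between the simulated and
    # target observation, using the analytic gradient dL/dc = 2 (x - t) m g.
    c = c_init
    for _ in range(steps):
        x = simulate(c, mass)
        grad = 2.0 * (x - target_x) * mass * GRAVITY
        c -= lr * grad
    return c
```

Starting from a poor initial guess (c = 1, i.e. stiffness 1), the loop recovers the target parameters, mirroring how DiffCloud descends from an initial parameter guess toward values that reproduce the target point cloud sequence.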
Overlay
Initial Guess
DiffCloud Result
Target
Band Stretch
Low stiffness, high mass
Elastic, taut
Vest Hang
Highly deformable
Shape-retaining
Quantitative Evaluation
DiffCloud achieves loss lower than or comparable to that of non-differentiable baselines, which require substantial training data and hours of training time. Furthermore, the parameters found by DiffCloud intuitively describe the real deformable objects.
References
[1] Qiao, Yi-Ling, et al. "Scalable Differentiable Physics for Learning and Control." ICML 2020.
[2] Ravi, Nikhila, et al. "Accelerating 3D Deep Learning with PyTorch3D." NeurIPS WiML Workshop 2020.