DiffCloud: Real-to-sim from Point Clouds with Differentiable Simulation and Rendering of Deformable Objects

Priya Sundaresan, Rika Antonova, Jeannette Bohg

Stanford University

{priyasun, rika.antonova, bohg}@stanford.edu

Paper [Link] Code [Link]

Abstract

Research in manipulation of deformable objects is typically conducted on a limited range of scenarios, because handling each scenario on hardware takes significant effort. Realistic simulators with support for various types of deformations and interactions have the potential to speed up experimentation with novel tasks and algorithms. However, for highly deformable objects it is challenging to align the output of a simulator with the behavior of real objects. Manual tuning is not intuitive, hence automated methods are needed.

We view this alignment problem as a joint perception-inference challenge and demonstrate how to use recent neural network architectures to successfully perform simulation parameter inference from real point clouds. We analyze the performance of various architectures, comparing their data and training requirements. Furthermore, we propose to leverage differentiable point cloud sampling and differentiable simulation to significantly reduce the time to achieve the alignment. We employ an efficient way to propagate gradients from point clouds to simulated meshes and further through to the physical simulation parameters, such as mass and stiffness. Experiments with highly deformable objects show that our method can achieve comparable or better alignment with real object behavior, while reducing the time needed to achieve this by more than an order of magnitude.

Video

diffcloud.mp4

System Overview

Real Data Collection

We use a Kinova Gen3 robot arm to manipulate a deformable object and record multi-view point clouds using two Intel Realsense D435 cameras. Using knowledge of the kinematics and geometry of the robot, we mask out the arm from point cloud frames, capturing only the deformable object of interest. The merged point clouds serve as input to DiffCloud.
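The arm-masking step can be sketched in NumPy. This is a simplified illustration, not the actual DiffCloud code: `mask_robot_points` and the bounding-sphere approximation of the robot links are hypothetical stand-ins for masking based on the robot's known kinematics and geometry.

```python
import numpy as np

def mask_robot_points(points, link_spheres, margin=0.02):
    """Remove points that fall within any robot-link bounding sphere.

    points: (N, 3) merged point cloud from both cameras.
    link_spheres: list of (center, radius) pairs approximating robot links,
                  assumed to come from forward kinematics (hypothetical).
    margin: extra clearance (meters) around each link.
    """
    keep = np.ones(len(points), dtype=bool)
    for center, radius in link_spheres:
        dist = np.linalg.norm(points - center, axis=1)
        keep &= dist > radius + margin
    return points[keep]
```

Points surviving the mask from all camera views are then merged into the single cloud that DiffCloud consumes.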

Hardware Setup: We visualize the Kinova robot arm and multi-view Realsense cameras on a cloth fold task.

Point Cloud Processing: Given robot masks and depth images, we obtain merged, masked point clouds of the deformable object, shown for the cloth lift task above.

Differentiable Simulation

We use the mesh-based differentiable simulator DiffSim [1], which supports simulation of deformable objects and their interactions with rigid bodies. For each real scenario considered, we set up an analogue in simulation such that the scale of all objects and the trajectories executed are identical. However, the behavior of the deformable object still differs between simulation and reality due to a mismatch in physical parameters, which DiffCloud aims to resolve.

Scenarios considered

Differentiable Point Cloud Sampling + Loss Propagation

During each iteration of optimization, we roll out the simulation with the currently estimated parameters and use a differentiable point cloud sampling procedure as in [2] to convert the mesh states to point clouds. We use a variation of the Chamfer loss to compare the simulated point clouds against the real ones, and propagate the loss gradients back to the underlying parameters. This update procedure repeats until either the maximum number of optimization epochs is exceeded or the loss falls below a threshold, indicating that we have found parameters that best describe the observed point clouds.
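The loop above can be illustrated with a self-contained NumPy sketch. Everything here is a stand-in: `chamfer` is a plain symmetric Chamfer distance (the paper uses a variation of it), `toy_sim` replaces the actual DiffSim rollout with a one-parameter "sag" model, and a finite-difference gradient replaces the analytic gradients that the differentiable simulator and renderer provide.

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between point clouds a (N,3) and b (M,3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def toy_sim(stiffness, rest_cloud):
    """Stand-in for a simulator rollout: stiffer cloth sags less.
    The real pipeline samples point clouds from simulated meshes."""
    out = rest_cloud.copy()
    out[:, 2] -= (1.0 / stiffness) * (1.0 - rest_cloud[:, 0])  # hypothetical sag
    return out

rng = np.random.default_rng(0)
rest = rng.uniform(0.0, 1.0, size=(200, 3))
target = toy_sim(4.0, rest)      # "real" observation, hidden stiffness = 4
theta, lr, eps = 1.0, 2.0, 1e-4  # initial guess, step size, FD epsilon
for _ in range(100):
    # Finite differences stand in for autodiff through sim + sampling.
    g = (chamfer(toy_sim(theta + eps, rest), target)
         - chamfer(toy_sim(theta - eps, rest), target)) / (2 * eps)
    theta -= lr * g  # gradient step on the physical parameter
```

In the real system, the same loop runs with gradients flowing from the Chamfer loss through the differentiable point cloud sampler into DiffSim's parameters such as mass and stiffness.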

Experiments

We evaluate DiffCloud against non-differentiable baselines: MeteorNet, PointNet++, and MLP. All baselines are architectures for processing point clouds or sequences of point clouds. We generate a noise-injected, synthetic dataset of rendered deformable object point clouds from the above scenarios with varied input physical parameters, and train all baselines to regress the parameters given point cloud input.
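A minimal sketch of this regression setup, under strong simplifying assumptions: hand-crafted global statistics replace the learned MeteorNet/PointNet++/MLP encoders, a linear least-squares fit replaces gradient-based training, and the synthetic clouds here are generated by a made-up rule (spread shrinking with a hidden parameter) rather than rendered from the simulator.

```python
import numpy as np

def cloud_features(cloud):
    """Global statistics standing in for a learned point cloud encoder."""
    return np.concatenate([cloud.mean(axis=0), cloud.std(axis=0)])

# Hypothetical synthetic dataset: each cloud's spread depends on a hidden
# physical parameter (e.g. stiffness) that the regressor must recover.
rng = np.random.default_rng(1)
params = rng.uniform(1.0, 5.0, size=50)
clouds = [rng.normal(scale=1.0 / p, size=(100, 3)) for p in params]
X = np.stack([cloud_features(c) for c in clouds])
X = np.hstack([X, np.ones((len(X), 1))])  # bias column

# Least-squares fit from features to parameters, a stand-in for training
# the baseline networks on the rendered synthetic dataset.
w, *_ = np.linalg.lstsq(X, params, rcond=None)
pred = X @ w
```

The baselines in the paper play the role of `w` here: once trained on synthetic data, they map an observed point cloud (sequence) directly to parameter estimates in a single forward pass.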

Real2Sim Qualitative Experiments

We evaluate DiffCloud on performing parameter estimation for the cloth lift and cloth fold scenarios.

We find that DiffCloud is able to find stiffness and mass properties that describe the real point cloud sequences. In the sequences below, DiffCloud correctly infers that the blue polka-dot cloth and black cloth collapse inwards in the lift and fold scenarios, whereas the paper towel and light blue towel retain their shape in both settings.

Lift

Fold

Real RGB

DiffCloud Result

Real Point Cloud

Real RGB

DiffCloud Result

Real Point Cloud

Sim2Sim Qualitative Experiments

We evaluate DiffCloud on parameter estimation for randomly generated target point cloud sequences in the band-stretch and vest-hang scenarios.

DiffCloud finds parameters that match a target sequence via gradient descent, starting from an initial guess.

Overlay

Initial Guess

DiffCloud Result

Target

Band Stretch

Low stiffness, high mass

Elastic, taut

Vest Hang

Highly deformable

Shape-retaining

Quantitative Evaluation

DiffCloud achieves lower or comparable loss relative to non-differentiable baselines, which require substantial training data and hours of training time. Furthermore, the parameters found by DiffCloud intuitively describe the real deformable objects.

References

[1] Qiao, Yi-Ling, et al. "Scalable Differentiable Physics for Learning and Control." ICML 2020.

[2] Ravi, Nikhila, et al. "Accelerating 3D Deep Learning with PyTorch3D." Neural Information Processing Systems WiML Workshop 2020.