
Shape completion of a mug. Left: If the handle is occluded from the camera view, we predict the uncertain region (pink) that contains the handle, resulting from pose ambiguity. The mug is reconstructed in the region not affected by pose ambiguity (gray). Right: If the handle is visible in the input point cloud (blue), the handle is reconstructed directly and no uncertain region is predicted.

Video

Abstract

Shape completion, i.e., predicting the complete geometry of an object from a partial observation, is highly relevant for several downstream tasks, most notably robotic manipulation. When basing planning or prediction of real grasps on object shape reconstruction, an indication of severe geometric uncertainty is indispensable. In particular, there can be an irreducible uncertainty in extended regions about the presence of entire object parts when given ambiguous object views.

To treat this important case, we propose two novel methods for predicting such uncertain regions as straightforward extensions of any method for predicting local spatial occupancy, one through postprocessing occupancy scores, the other through direct prediction of an uncertainty indicator. We compare these methods together with two known approaches to probabilistic shape completion.

Moreover, we generate a dataset, derived from ShapeNet, of realistically rendered depth images of object views with ground-truth annotations for the uncertain regions. We train on this dataset and test each method in shape completion and prediction of uncertain regions for known and novel object instances and on synthetic and real data. While direct uncertainty prediction is by far the most accurate in the segmentation of uncertain regions, both novel methods outperform the two baselines in shape completion and uncertain region prediction, and avoiding the predicted uncertain regions increases the quality of grasps for all tested methods.

Dataset

We generate a dataset derived from the ShapeNet mugs category with realistically rendered depth images and ground-truth annotations for uncertain regions. The dataset contains 201 mug instances with handles, split into training (70%), validation (10%), and test (20%) sets. Point clouds are back-projected from depth images rendered with BlenderProc2, and online augmentation adds sensor-like noise and removes points based on their surface normals.
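As a rough illustration of the online augmentation step, the sketch below jitters points with Gaussian noise and drops points whose surface normals face away from the camera, mimicking how real depth sensors degrade on grazing surfaces. The noise level, drop probabilities, and function name are illustrative assumptions, not the values used for the dataset.

import numpy as np

def augment_point_cloud(points, normals, view_dir, noise_std=0.002,
                        max_drop_prob=0.5, rng=None):
    """Jitter points with Gaussian noise and drop points on grazing surfaces.

    points   : (N, 3) array of 3D points
    normals  : (N, 3) array of unit surface normals
    view_dir : (3,) direction from the camera toward the scene
    All parameter values are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Additive Gaussian noise on the 3D coordinates.
    noisy = points + rng.normal(scale=noise_std, size=points.shape)

    # Cosine between each surface normal and the direction back to the camera.
    cos = -(normals @ view_dir) / np.linalg.norm(view_dir)

    # Points on grazing surfaces (small cosine) are dropped more often.
    drop_prob = max_drop_prob * (1.0 - np.clip(cos, 0.0, 1.0))
    keep = rng.random(len(points)) > drop_prob
    return noisy[keep]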

For sim2real evaluation, we test on real-world RGB-D data from the BOP Challenge datasets: Linemod (LM), HomebrewedDB (HB), YCB-Video (YCBV), and Toyota Light (TYOL), containing a total of nine different mugs captured with Primesense and Kinect sensors.

Pose Ambiguity

Animation: multiple possible handle positions arising from pose ambiguity.

When the handle is hidden from view by the mug's body, many possible orientations of the mug yield the same observation. The uncertain region (red) covers all handle locations under rigid transformations of the mug that produce an identical 2D projection of the input point cloud (blue). We estimate this region by randomly transforming the object and comparing the resulting projections: whenever a transformed view is identical to the observation but places the handle elsewhere, the uncertain region grows accordingly.
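The sampling idea above can be sketched as follows. This is a hedged illustration rather than the authors' implementation; render_depth and handle_occupancy are assumed helper functions that render a depth image of the transformed mesh and voxelize its handle, respectively.

import numpy as np

def estimate_uncertain_region(mesh, observed_depth, sample_poses,
                              render_depth, handle_occupancy, tol=1e-3):
    """Union of handle occupancies over all sampled rigid transforms whose
    rendered view is indistinguishable from the observed depth image."""
    uncertain = None
    for pose in sample_poses:                       # candidate rigid transforms
        depth = render_depth(mesh, pose)            # synthetic view under this pose
        # Keep the pose only if its projection matches the observation:
        # same silhouette and (numerically) identical depth values.
        if not np.array_equal(np.isfinite(depth), np.isfinite(observed_depth)):
            continue
        valid = np.isfinite(depth)
        if valid.any() and np.abs(depth[valid] - observed_depth[valid]).max() > tol:
            continue
        occ = handle_occupancy(mesh, pose)          # boolean voxel grid of the handle
        uncertain = occ if uncertain is None else (uncertain | occ)
    return uncertain                                # None if no pose matched

Poses that place the handle behind the mug body pass the projection test, so the union of their handle voxels grows into the uncertain region shown in red.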

Method 1: Binary with Gradient Criterion

Extracting uncertain regions from occupancy predictions using gradient thresholding

(a) Slice of a side view through the predicted occupancy probability grid. With a small lower threshold, a region possibly containing a handle appears behind the mug (light red), but the mug body itself is also included (dark red). (b) Using the gradient of the predicted occupancy probability, with its average as an upper threshold, this unwanted region is discarded by taking only the intersection of the regions shown in red as uncertain.

This method extracts uncertain regions by thresholding both the predicted occupancy probability and the magnitude of its spatial gradient, without requiring any extra annotation for training. The gradient magnitude is expected to be large near the predicted object surface but small inside truly uncertain regions farther away from it.
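On a dense occupancy probability grid, the extraction could look roughly like the sketch below: a small lower threshold on the occupancy probability, combined with the mean gradient magnitude as an upper threshold, as described above. The grid layout and threshold values are assumptions.

import numpy as np

def extract_regions(occ_prob, occ_threshold=0.5, low_threshold=0.1):
    """Split a dense occupancy probability grid (D, H, W) into
    occupied and uncertain voxels using the gradient criterion."""
    # Spatial gradient magnitude of the predicted occupancy probability.
    gx, gy, gz = np.gradient(occ_prob)
    grad_mag = np.sqrt(gx**2 + gy**2 + gz**2)

    occupied = occ_prob >= occ_threshold

    # (a) Everything above a small lower threshold (mug body + possible handle).
    candidate = occ_prob >= low_threshold
    # (b) Keep only voxels whose gradient magnitude is below its average:
    # near the reconstructed surface the probability changes sharply,
    # inside truly uncertain regions it stays flat.
    flat = grad_mag < grad_mag.mean()

    uncertain = candidate & flat & ~occupied
    return occupied, uncertain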

Method 2: Trinary Classification

Trinary classification directly predicting free, occupied, and uncertain regions

We extend the binary classification (free/occupied) to a trinary classification with classes: free, occupied, and uncertain. The model directly predicts the uncertain region (red) containing all possible handle locations, while reconstructing the mug body (gray) where geometry is certain.

This approach requires ground-truth uncertain region labels for training but achieves significantly higher accuracy in uncertain region segmentation compared to the binary gradient method and baseline approaches.
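A minimal sketch of such a trinary output, assuming a PyTorch implicit-occupancy backbone that yields a feature vector per query point; compared to a binary occupancy head, only the number of output logits and the loss change. The class indices, layer sizes, and names are assumptions.

import torch
import torch.nn as nn

FREE, OCCUPIED, UNCERTAIN = 0, 1, 2   # assumed class indices

class TrinaryHead(nn.Module):
    """Per-query-point classifier over {free, occupied, uncertain}."""
    def __init__(self, feature_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim, 128), nn.ReLU(),
            nn.Linear(128, 3),            # three logits instead of one
        )

    def forward(self, point_features):
        # point_features: (batch, num_query_points, feature_dim)
        return self.mlp(point_features)   # (batch, num_query_points, 3)

def trinary_loss(logits, labels):
    """Cross-entropy over the three classes; labels come from the
    ground-truth mesh plus the annotated uncertain region."""
    return nn.functional.cross_entropy(
        logits.reshape(-1, 3), labels.reshape(-1))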

Quantitative Results

Novel View Generalization

F1 Score

Novel view F1 scores for occupied (blue) and uncertain (red) regions

Grasp Collision Risk (GCR)

Novel view GCR without (blue) and with (red) uncertain region filtering

Methods left to right: Trinary, Binary, VAE, Dropout. F1: occupied (blue), uncertain (red). GCR: without filtering (blue), with filtering (red) — lower is better.

Novel Instance Generalization

F1 Score

Novel instance F1 scores for occupied (blue) and uncertain (red) regions

Grasp Collision Risk (GCR)

Novel instance GCR without (blue) and with (red) uncertain region filtering

Methods left to right: Trinary, Binary, VAE, Dropout. F1: occupied (blue), uncertain (red). GCR: without filtering (blue), with filtering (red) — lower is better.

Key findings: The trinary method achieves by far the highest F1 score for uncertain region prediction (49.5% for novel views, 35.5% for novel instances), compared to binary (31.8%, 23.1%), VAE (18.7%, 20.1%), and dropout (19.4%, 18.9%). For occupied region prediction, trinary and binary reach roughly 86% and 80% F1, respectively, and both significantly outperform the baselines.

Grasp Collision Risk (GCR): Using the predicted uncertain region to filter grasps reduces the GCR for all methods. For novel views, the binary method achieves the lowest GCR with uncertainty filtering (13.6%), followed by the trinary method (14.5%). The baselines benefit less from filtering because their uncertain region predictions are less accurate.
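For reference, the F1 scores above are per-class segmentation scores; on voxelized predictions and ground truth they could be computed as in the sketch below (the boolean voxel-mask representation is an assumption about the evaluation setup).

import numpy as np

def f1_score(pred_mask, gt_mask):
    """F1 between two boolean voxel masks (e.g. occupied or uncertain)."""
    tp = np.logical_and(pred_mask, gt_mask).sum()
    fp = np.logical_and(pred_mask, ~gt_mask).sum()
    fn = np.logical_and(~pred_mask, gt_mask).sum()
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)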

Uncertainty-Aware Grasping

Grasping with and without uncertain region filtering

Grasping a mug with an occluded handle. Left: Without filtering against the predicted uncertain region, the planned grasp collides with the hidden handle. Right: Discarding grasps that intersect the uncertain region avoids the collision and thus improves grasp quality.
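A hedged sketch of this filtering step: grasps whose gripper volume comes too close to the predicted uncertain region are discarded before execution. The grasp representation, the gripper_points_fn helper, and the clearance value are assumptions, not the paper's grasp pipeline.

import numpy as np

def filter_grasps(grasps, uncertain_points, gripper_points_fn, clearance=0.005):
    """Keep only grasps whose (approximated) gripper volume stays clear
    of the predicted uncertain region.

    grasps            : list of 4x4 gripper poses
    uncertain_points  : (M, 3) points sampled from the uncertain region
    gripper_points_fn : assumed helper returning (K, 3) points covering
                        the gripper volume in the gripper frame
    """
    kept = []
    for pose in grasps:
        # Transform the gripper point model into the world frame.
        local = gripper_points_fn()
        world = local @ pose[:3, :3].T + pose[:3, 3]
        # Minimum distance between the gripper and the uncertain region.
        d = np.linalg.norm(world[:, None, :] - uncertain_points[None, :, :], axis=-1)
        if d.min() > clearance:
            kept.append(pose)
    return kept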

Qualitative Sim2Real Results

Figure grid columns, left to right: HB Primesense (HBpri), HB Kinect (HBkin), Linemod (LM), Toyota Light (TYOL), YCB-Video 48 (YCBV48), YCB-Video 55 (YCBV55).

Qualitative sim2real results (Fig. 5). From top to bottom: Input point cloud, predicted mesh, ground-truth mesh. Models trained only on synthetic ShapeNet data generalize to real-world RGB-D sensor data from BOP Challenge datasets (HomebrewedDB, Linemod, Toyota Light, YCB-Video).

Sim2Real Results: Binary Method

Binary method sim2real results showing shape completion with uncertain region prediction

Binary method sim2real results. The binary model uses occupancy score thresholding combined with a gradient criterion to extract uncertain regions (red) without requiring ground-truth uncertain region labels for training.

Sim2Real Results: Trinary Method

Trinary method sim2real results showing shape completion with uncertain region prediction

Trinary method sim2real results. The trinary model predicts both completed shapes (gray) and uncertain regions (red) for mugs with occluded handles. When the handle is visible in the input (blue), it is reconstructed directly. The trinary method achieves the highest accuracy in uncertain region segmentation.

Poster

BibTeX

@inproceedings{humt2023shape,
  title={Shape Completion with Prediction of Uncertain Regions},
  author={Humt, Matthias and Winkelbauer, Dominik and Hillenbrand, Ulrich},
  booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2023}
}