Publications
Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats

Deep learning-based 3D human pose estimation performs best when trained on large amounts of labeled data, making combined learning from many datasets an important research direction. One obstacle to this endeavor are the different skeleton formats provided by different datasets, i.e., they do not label the same set of anatomical landmarks. There is little prior research on how to best supervise one model with such discrepant labels. We show that simply using separate output heads for different skeletons results in inconsistent depth estimates and insufficient information sharing across skeletons. As a remedy, we propose a novel affine-combining autoencoder (ACAE) method to perform dimensionality reduction on the number of landmarks. The discovered latent 3D points capture the redundancy among skeletons, enabling enhanced information sharing when used for consistency regularization. Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model, which outperforms prior work on a range of benchmarks, including the challenging 3D Poses in the Wild (3DPW) dataset. Our code and models are available for research purposes.
@inproceedings{Sarandi23WACV,
author = {S\'ar\'andi, Istv\'an and Hermans, Alexander and Leibe, Bastian},
title = {Learning {3D} Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats},
booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
year = {2023},
}
Mask3D for 3D Semantic Instance Segmentation

Modern 3D semantic instance segmentation approaches predominantly rely on specialized voting mechanisms followed by carefully designed geometric clustering techniques. Building on the successes of recent Transformer-based methods for object detection and image segmentation, we propose the first Transformer-based approach for 3D semantic instance segmentation. We show that we can leverage generic Transformer building blocks to directly predict instance masks from 3D point clouds. In our model called Mask3D each object instance is represented as an instance query. Using Transformer decoders, the instance queries are learned by iteratively attending to point cloud features at multiple scales. Combined with point features, the instance queries directly yield all instance masks in parallel. Mask3D has several advantages over current state-of-the-art approaches, since it neither relies on (1) voting schemes which require hand-selected geometric properties (such as centers) nor (2) geometric grouping mechanisms requiring manually-tuned hyper-parameters (e.g. radii) and (3) enables a loss that directly optimizes instance masks. Mask3D sets a new state-of-the-art on ScanNet test (+6.2 mAP), S3DIS 6-fold (+10.1 mAP), STPLS3D (+11.2 mAP) and ScanNet200 test (+12.4 mAP).
@article{Schult23ICRA,
title = {{Mask3D for 3D Semantic Instance Segmentation}},
author = {Schult, Jonas and Engelmann, Francis and Hermans, Alexander and Litany, Or and Tang, Siyu and Leibe, Bastian},
booktitle = {{International Conference on Robotics and Automation (ICRA)}},
year = {2023}
}
Surface Maps via Adaptive Triangulations

We present a new method to compute continuous and bijective maps (surface homeomorphisms) between two or more genus-0 triangle meshes. In contrast to previous approaches, we decouple the resolution at which a map is represented from the resolution of the input meshes. We discretize maps via common triangulations that approximate the input meshes while remaining in bijective correspondence to them. Both the geometry and the connectivity of these triangulations are optimized with respect to a single objective function that simultaneously controls mapping distortion, triangulation quality, and approximation error. A discrete-continuous optimization algorithm performs both energy-based remeshing as well as global second-order optimization of vertex positions, parametrized via the sphere. With this, we combine the disciplines of compatible remeshing and surface map optimization in a unified formulation and make a contribution in both fields. While existing compatible remeshing algorithms often operate on a fixed pre-computed surface map, we can now globally update this correspondence during remeshing. On the other hand, bijective surface-to-surface map optimization previously required computing costly overlay meshes that are inherently tied to the input mesh resolution. We achieve significant complexity reduction by instead assessing distortion between the approximating triangulations. This new map representation is inherently more robust than previous overlay-based approaches, is less intricate to implement, and naturally supports mapping between more than two surfaces. Moreover, it enables adaptive multi-resolution schemes that, e.g., first align corresponding surface regions at coarse resolutions before refining the map where needed. We demonstrate significant speedups and increased flexibility over state-of-the art mapping algorithms at similar map quality, and also provide a reference implementation of the method.
@article{schmidt2023surface,
title={Surface Maps via Adaptive Triangulations},
author={Schmidt, Patrick and Pieper, D\"orte and Kobbelt, Leif},
year={2023},
journal={Computer Graphics Forum},
volume={42},
number={2},
}
Gaining the High Ground: Teleportation to Mid-Air Targets in Immersive Virtual Environments

Most prior teleportation techniques in virtual reality are bound to target positions in the vicinity of selectable scene objects. In this paper, we present three adaptations of the classic teleportation metaphor that enable the user to travel to mid-air targets as well. Inspired by related work on the combination of teleports with virtual rotations, our three techniques differ in the extent to which elevation changes are integrated into the conventional target selection process. Elevation can be specified either simultaneously, as a connected second step, or separately from horizontal movements. A user study with 30 participants indicated a trade-off between the simultaneous method leading to the highest accuracy and the two-step method inducing the lowest task load as well as receiving the highest usability ratings. The separate method was least suitable on its own but could serve as a complement to one of the other approaches. Based on these findings and previous research, we define initial design guidelines for mid-air navigation techniques.
Point2Vec for Self-Supervised Representation Learning on Point Clouds

Recently, the self-supervised learning framework data2vec has shown inspiring performance for various modalities using a masked student-teacher approach. However, it remains open whether such a framework generalizes to the unique challenges of 3D point clouds.To answer this question, we extend data2vec to the point cloud domain and report encouraging results on several downstream tasks. In an in-depth analysis, we discover that the leakage of positional information reveals the overall object shape to the student even under heavy masking and thus hampers data2vec to learn strong representations for point clouds. We address this 3D-specific shortcoming by proposing point2vec, which unleashes the full potential of data2vec-like pre-training on point clouds. Our experiments show that point2vec outperforms other self-supervised methods on shape classification and few-shot learning on ModelNet40 and ScanObjectNN, while achieving competitive results on part segmentation on ShapeNetParts. These results suggest that the learned representations are strong and transferable, highlighting point2vec as a promising direction for self-supervised learning of point cloud representations.
@article{abouzeid2023point2vec,
title = {{Point2Vec for Self-Supervised Representation Learning on Point Clouds}},
author = {Abou Zeid, Karim and Schult, Jonas and Hermans, Alexander and Leibe, Bastian},
journal = {arXiv},
year = {2023},
}
Modeling the Droplet Impact on the Substrate with Surface Preparation in Thermal Spraying with SPH
The properties of thermally sprayed coatings depend heavily on their microstructure. The microstructure is determined by the dynamics of the impact of the droplets on the substrate surface and the subsequent overlapping of the previously solidified and deformed droplets. Substrate preparation prior to spraying ensures strong adhesion of the coating. This includes roughening and preheating of the substrate surface. In the present study, the smoothed particle hydrodynamics (SPH) method is used to model the Al2O3 impact on a preheated substrate and a roughened substrate surface. A semi-implicit enthalpy–porosity method is applied to simulate the solidification process in the mushy zone. In addition, an implicit correction for SPH simulations is used to improve the performance and stability of the simulation. To investigate the dynamics of heat transfer in the contact between the surface and the droplet, the discretization of the substrate is also taken into account. The results show that the studied substrate surface conditions affect the splat morphology and the solidification process. Subsequently, the simulation of multiple droplets for coating formation is also performed and analyzed.
@Article{BHJ+23,
author = {Kirsten Bobzin and Hendrik Heinemann and Kevin Jasutyn and Stefan Rhys Jeske and Jan Bender and Sergej Warkentin and Oleg Mokrov and Rahul Sharma and Uwe Reisgen},
journal = {Journal of Thermal Spray Technology},
title = {Modeling the Droplet Impact on the Substrate with Surface Preparation in Thermal Spraying with SPH},
year = {2023},
month = {jan},
doi = {10.1007/s11666-023-01534-0},
}
Previous Year (2022)