header

Publications


 

STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos


Ali Athar, Sabarinath Mahadevan, Aljoša Ošep, Laura Leal-Taixé, Bastian Leibe
ECCV '20
pubimg

Existing methods for instance segmentation in videos typically involve multi-stage pipelines that follow the tracking-by-detection paradigm and model a video clip as a sequence of images. Multiple networks are used to detect objects in individual frames, and then associate these detections over time. Hence, these methods are often non-end-to-end trainable and highly tailored to specific tasks. In this paper, we propose a different approach that is well-suited to a variety of tasks involving instance segmentation in videos. In particular, we model a video clip as a single 3D spatio-temporal volume, and propose a novel approach that segments and tracks instances across space and time in a single stage. Our problem formulation is centered around the idea of spatio-temporal embeddings which are trained to cluster pixels belonging to a specific object instance over an entire video clip. To this end, we introduce (i) novel mixing functions that enhance the feature representation of spatio-temporal embeddings, and (ii) a single-stage, proposal-free network that can reason about temporal context. Our network is trained end-to-end to learn spatio-temporal embeddings as well as parameters required to cluster these embeddings, thus simplifying inference. Our method achieves state-of-the-art results across multiple datasets and tasks.

» Show BibTeX

@inproceedings{AtharMahadevan20ECCV,
title={STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos},
author={Athar, Ali and Mahadevan, Sabarinath and O{\v{s}}ep, Aljo{\v{s}}a and Leal-Taix{\'e}, Laura and Leibe, Bastian},
booktitle=ECCV,
year={2020}
}





Inter-Surface Maps via Constant-Curvature Metrics


Patrick Schmidt, Marcel Campen, Janis Born, Leif Kobbelt
SIGGRAPH 2020
pubimg

We propose a novel approach to represent maps between two discrete surfaces of the same genus and to minimize intrinsic mapping distortion. Our maps are well-defined at every surface point and are guaranteed to be continuous bijections (surface homeomorphisms). As a key feature of our approach, only the images of vertices need to be represented explicitly, since the images of all other points (on edges or in faces) are properly defined implicitly. This definition is via unique geodesics in metrics of constant Gaussian curvature. Our method is built upon the fact that such metrics exist on surfaces of arbitrary topology, without the need for any cuts or cones (as asserted by the uniformization theorem). Depending on the surfaces' genus, these metrics exhibit one of the three classical geometries: Euclidean, spherical or hyperbolic. Our formulation handles constructions in all three geometries in a unified way. In addition, by considering not only the vertex images but also the discrete metric as degrees of freedom, our formulation enables us to simultaneously optimize the images of these vertices and images of all other points.

» Show BibTeX

@article{schmidt2020intersurface,
author = {Schmidt, Patrick and Campen, Marcel and Born, Janis and Kobbelt, Leif},
title = {Inter-Surface Maps via Constant-Curvature Metrics},
journal = {ACM Transactions on Graphics},
issue_date = {July 2020},
volume = {39},
number = {4},
month = jul,
year = {2020},
articleno = {119},
url = {https://doi.org/10.1145/3386569.3392399},
doi = {10.1145/3386569.3392399},
publisher = {ACM},
address = {New York, NY, USA},
}





3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation


Francis Engelmann, Martin Bokeloh, Alireza Fathi, Bastian Leibe, Matthias Nießner
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020
pubimg

We present 3D-MPA, a method for instance segmentation on 3D point clouds. Given an input point cloud, we propose an object-centric approach where each point votes for its object center. We sample object proposals from the predicted object centers. Then we learn proposal features from grouped point features that voted for the same object center. A graph convolutional network introduces inter-proposal relations, providing higher-level feature learning in addition to the lower-level point features. Each proposal comprises a semantic label, a set of associated points over which we define a foreground-background mask, an objectness score and aggregation features. Previous works usually perform non-maximum-suppression (NMS) over proposals to obtain the final object detections or semantic instances. However, NMS can discard potentially correct predictions. Instead, our approach keeps all proposals and groups them together based on the learned aggregation features. We show that grouping proposals improves over NMS and outperforms previous state-of-the-art methods on the tasks of 3D object detection and semantic instance segmentation on the ScanNetV2 benchmark and the S3DIS dataset.

» Show BibTeX

@inproceedings{Engelmann20CVPR,
title = {{3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation}},
author = {Engelmann, Francis and Bokeloh, Martin and Fathi, Alireza and Leibe, Bastian and Nie{\ss}ner, Matthias},
booktitle = {{IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}},
year = {2020}
}





DualConvMesh-Net: Joint Geodesic and Euclidean Convolutions on 3D Meshes


Jonas Schult*, Francis Engelmann*, Theodora Kontogianni, Bastian Leibe
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020 (Oral)
pubimg

We propose DualConvMesh-Nets (DCM-Net) a family of deep hierarchical convolutional networks over 3D geometric data that combines two types of convolutions. The first type, geodesic convolutions, defines the kernel weights over mesh surfaces or graphs. That is, the convolutional kernel weights are mapped to the local surface of a given mesh. The second type, Euclidean convolutions, is independent of any underlying mesh structure. The convolutional kernel is applied on a neighborhood obtained from a local affinity representation based on the Euclidean distance between 3D points. Intuitively, geodesic convolutions can easily separate objects that are spatially close but have disconnected surfaces, while Euclidean convolutions can represent interactions between nearby objects better, as they are oblivious to object surfaces. To realize a multi-resolution architecture, we borrow well-established mesh simplification methods from the geometry processing domain and adapt them to define mesh-preserving pooling and unpooling operations. We experimentally show that combining both types of convolutions in our architecture leads to significant performance gains for 3D semantic segmentation, and we report competitive results on three scene segmentation benchmarks.

» Show BibTeX

@inproceedings{Schult20CVPR,
author = {Jonas Schult* and
Francis Engelmann* and
Theodora Kontogianni and
Bastian Leibe},
title = {{DualConvMesh-Net: Joint Geodesic and Euclidean Convolutions on 3D Meshes}},
booktitle = {{IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}},
year = {2020}
}





Siam R-CNN: Visual Tracking by Re-Detection


Paul Voigtlaender, Jonathon Luiten, Philip Torr, Bastian Leibe
CVPR'20
pubimg

We present Siam R-CNN, a Siamese re-detection architecture which unleashes the full power of two-stage object detection approaches for visual object tracking. We combine this with a novel tracklet-based dynamic programming algorithm, which takes advantage of re-detections of both the first-frame template and previous-frame predictions, to model the full history of both the object to be tracked and potential distractor objects. This enables our approach to make better tracking decisions, as well as to re-detect tracked objects after long occlusion. Finally, we propose a novel hard example mining strategy to improve Siam RCNN’s robustness to similar looking objects. The proposed tracker achieves the current best performance on ten tracking benchmarks, with especially strong results for long-term tracking.

» Show BibTeX

@inproceedings{Voigtlaender20CVPR,
title={Siam R-CNN: Visual Tracking by Re-Detection},
author={Paul Voigtlaender and Jonathon Luiten and Philip H. S. Torr and Bastian Leibe},
year={2020},
booktitle={CVPR},
}





Making a Case for 3D Convolutions for Object Segmentation in Videos


Sabarinath Mahadevan, Ali Athar, Aljoša Ošep, Sebastian Hennen, Laura Leal-Taixé, Bastian Leibe
BMVC'20
pubimg

The task of object segmentation in videos is usually accomplished by processing appearance and motion information separately using standard 2D convolutional networks, followed by a learned fusion of the two sources of information. On the other hand, 3D convolutional networks have been successfully applied for video classification tasks, but have not been leveraged as effectively to problems involving dense per-pixel interpretation of videos compared to their 2D convolutional counterparts and lag behind the aforementioned networks in terms of performance. In this work, we show that 3D CNNs can be effectively applied to dense video prediction tasks such as salient object segmentation. We propose a simple yet effective encoder-decoder network architecture consisting entirely of 3D convolutions that can be trained end-to-end using a standard cross-entropy loss. To this end, we leverage an efficient 3D encoder, and propose a 3D decoder architecture, that comprises novel 3D Global Convolution layers and 3D Refinement modules. Our approach outperforms existing state-of-the-arts by a large margin on the DAVIS'16 Unsupervised, FBMS and ViSal dataset benchmarks in addition to being faster, thus showing that our architecture can efficiently learn expressive spatio-temporal features and produce high quality video segmentation masks.

» Show BibTeX

@inproceedings{Mahadevan20BMVC,
title={Making a Case for 3D Convolutions for Object Segmentation in Videos},
author={Mahadevan, Sabarinath and Athar, Ali and O{\v{s}}ep, Aljo{\v{s}}a and Hennen, Sebastian and Leal-Taix{\'e}, Laura and Leibe, Bastian},
booktitle={BMVC},
year={2020}
}





Implicit Frictional Boundary Handling for SPH


Jan Bender, Tassilo Kugelstadt, Marcel Weiler, Dan Koschier
IEEE Transactions on Visualization and Computer Graphics
pubimg

In this paper, we present a novel method for the robust handling of static and dynamic rigid boundaries in Smoothed Particle Hydrodynamics (SPH) simulations. We build upon the ideas of the density maps approach which has been introduced recently by Koschier and Bender. They precompute the density contributions of solid boundaries and store them on a spatial grid which can be efficiently queried during runtime. This alleviates the problems of commonly used boundary particles, like bumpy surfaces and inaccurate pressure forces near boundaries. Our method is based on a similar concept but we precompute the volume contribution of the boundary geometry. This maintains all benefits of density maps but offers a variety of advantages which are demonstrated in several experiments. Firstly, in contrast to the density maps method we can compute derivatives in the standard SPH manner by differentiating the kernel function. This results in smooth pressure forces, even for lower map resolutions, such that precomputation times and memory requirements are reduced by more than two orders of magnitude compared to density maps. Furthermore, this directly fits into the SPH concept so that volume maps can be seamlessly combined with existing SPH methods. Finally, the kernel function is not baked into the map such that the same volume map can be used with different kernels. This is especially useful when we want to incorporate common surface tension or viscosity methods that use different kernels than the fluid simulation.

» Show BibTeX

@Article{BKWK2020,
author = {Jan Bender and Tassilo Kugelstadt and Marcel Weiler and Dan Koschier },
title = {Implicit Frictional Boundary Handling for SPH},
journal = {IEEE Transactions on Visualization and Computer Graphics},
year = {2020},
publisher = {IEEE},
volume={26},
number={10},
pages={2982-2993},
doi={10.1109/TVCG.2020.3004245},
}





Dilated Point Convolutions: On the Receptive Field Size of Point Convolutions on 3D Point Clouds


Francis Engelmann, Theodora Kontogianni, Bastian Leibe
International Conference on Robotics and Automation (ICRA) 2020
pubimg

In this work, we propose Dilated Point Convolutions (DPC). In a thorough ablation study, we show that the receptive field size is directly related to the performance of 3D point cloud processing tasks, including semantic segmentation and object classification. Point convolutions are widely used to efficiently process 3D data representations such as point clouds or graphs. However, we observe that the receptive field size of recent point convolutional networks is inherently limited. Our dilated point convolutions alleviate this issue, they significantly increase the receptive field size of point convolutions. Importantly, our dilation mechanism can easily be integrated into most existing point convolutional networks. To evaluate the resulting network architectures, we visualize the receptive field and report competitive scores on popular point cloud benchmarks.

» Show BibTeX

@inproceedings{Engelmann20ICRA,
author = {Engelmann, Francis and Kontogianni, Theodora and Leibe, Bastian},
title = {{Dilated Point Convolutions: On the Receptive Field Size of Point Convolutions on 3D Point Clouds}},
booktitle = {{International Conference on Robotics and Automation (ICRA)}},
year = {2020}
}





Track to Reconstruct and Reconstruct to Track


Jonathon Luiten, Tobias Fischer, Bastian Leibe
RA-L 2020 / ICRA 2020
pubimg

Object tracking and 3D reconstruction are often performed together, with tracking used as input for reconstruction. However, the obtained reconstructions also provide useful information for improving tracking. We propose a novel method that closes this loop, first tracking to reconstruct, and then reconstructing to track. Our approach, MOTSFusion (Multi-Object Tracking, Segmentation and dynamic object Fusion), exploits the 3D motion extracted from dynamic object reconstructions to track objects through long periods of complete occlusion and to recover missing detections. Our approach first builds up short tracklets using 2D optical flow, and then fuses these into dynamic 3D object reconstructions. The precise 3D object motion of these reconstructions is used to merge tracklets through occlusion into long-term tracks, and to locate objects when detections are missing. On KITTI, our reconstruction-based tracking reduces the number of ID switches of the initial tracklets by more than 50%, and outperforms all previous approaches for both bounding box and segmentation tracking.

» Show BibTeX

@article{luiten2020track,
title={Track to Reconstruct and Reconstruct to Track},
author={Luiten, Jonathon and Fischer, Tobias and Leibe, Bastian},
journal={IEEE Robotics and Automation Letters},
volume={5},
number={2},
pages={1803--1810},
year={2020},
publisher={IEEE}
}





UnOVOST: Unsupervised Offline Video Object Segmentation and Tracking


Jonathon Luiten, Idil Esen Zulfikar, Bastian Leibe
WACV 2020
pubimg

We address Unsupervised Video Object Segmentation (UVOS), the task of automatically generating accurate pixel masks for salient objects in a video sequence and of tracking these objects consistently through time, without any input about which objects should be tracked. Towards solving this task, we present UnOVOST (Unsupervised Offline Video Object Segmentation and Tracking) as a simple and generic algorithm which is able to track and segment a large variety of objects. This algorithm builds up tracks in a number stages, first grouping segments into short tracklets that are spatio-temporally consistent, before merging these tracklets into long-term consistent object tracks based on their visual similarity. In order to achieve this we introduce a novel tracklet-based Forest Path Cutting data association algorithm which builds up a decision forest of track hypotheses before cutting this forest into paths that form long-term consistent object tracks. When evaluating our approach on the DAVIS 2017 Unsupervised dataset we obtain state-of-the-art performance with a mean J &F score of 67.9% on the val, 58% on the test-dev and 56.4% on the test-challenge benchmarks, obtaining first place in the DAVIS 2019 Unsupervised Video Object Segmentation Challenge. UnOVOST even performs competitively with many semi-supervised video object segmentation algorithms even though it is not given any input as to which objects should be tracked and segmented.

» Show BibTeX

@inproceedings{luiten2020unovost,
title={UnOVOST: Unsupervised Offline Video Object Segmentation and Tracking},
author={Luiten, Jonathon and Zulfikar, Idil Esen and Leibe, Bastian},
booktitle={Proceedings of the IEEE Winter Conference on Applications in Computer Vision},
year={2020}
}





Metric-Scale Truncation-Robust Heatmaps for 3D Human Pose Estimation


István Sárándi, Timm Linder, Kai O. Arras, Bastian Leibe
IEEE International Conference on Automatic Face and Gesture Recognition (FG) 2020
pubimg

Heatmap representations have formed the basis of 2D human pose estimation systems for many years, but their generalizations for 3D pose have only recently been considered. This includes 2.5D volumetric heatmaps, whose X and Y axes correspond to image space and the Z axis to metric depth around the subject. To obtain metric-scale predictions, these methods must include a separate, explicit post-processing step to resolve scale ambiguity. Further, they cannot encode body joint positions outside of the image boundaries, leading to incomplete pose estimates in case of image truncation. We address these limitations by proposing metric-scale truncation-robust (MeTRo) volumetric heatmaps, whose dimensions are defined in metric 3D space near the subject, instead of being aligned with image space. We train a fully-convolutional network to estimate such heatmaps from monocular RGB in an end-to-end manner. This reinterpretation of the heatmap dimensions allows us to estimate complete metric-scale poses without test-time knowledge of the focal length or person distance and without relying on anthropometric heuristics in post-processing. Furthermore, as the image space is decoupled from the heatmap space, the network can learn to reason about joints beyond the image boundary. Using ResNet-50 without any additional learned layers, we obtain state-of-the-art results on the Human3.6M and MPI-INF-3DHP benchmarks. As our method is simple and fast, it can become a useful component for real-time top-down multi-person pose estimation systems. We make our code publicly available to facilitate further research.

» Show BibTeX

@inproceedings{Sarandi20FG,
title={Metric-Scale Truncation-Robust Heatmaps for 3{D} Human Pose Estimation},
author={S\'ar\'andi, Istv\'an and Linder, Timm and Arras, Kai O. and Leibe, Bastian},
booktitle={Automatic Face and Gesture Recognition, 2020 IEEE Int. Conf. on},
year={2020}
}





Higher-Order Time Integration for Deformable Solids


Fabian Löschner, Andreas Longva, Stefan Jeske, Tassilo Kugelstadt, Jan Bender
Computer Graphics Forum
pubimg

Visually appealing and vivid simulations of deformable solids represent an important aspect of physically based computer animation. For the temporal discretization, it is customary in computer animation to use first-order accurate integration methods, such as Backward Euler, due to their simplicity and robustness. Although there is notable research on second-order methods, their use is not widespread. Many of these well-known methods have significant drawbacks such as severe numerical damping or scene-dependent time step restrictions to ensure stability. In this paper, we discuss the most relevant requirements on such methods in computer animation and motivate the interest beyond first-order accuracy. Keeping these requirements in mind, we investigate several promising methods from the families of diagonally implicit Runge-Kutta (DIRK) and Rosenbrock methods which currently do not appear to have considerable popularity in this field. We show that the usage of such methods improves the visual quality of physical animations. In addition, we demonstrate that they allow distinctly more control over damping at lower computational cost than classical methods. As part of our theoretical contribution, we review aspects of simulations that are often considered more intricate with higher-order methods, such as contact handling. To this end, we derive an implicit linearized contact model based on a predictor-corrector approach that leads to consistent behavior with higher-order integrators as predictors. Our contact model is well suited for the simulation of stiff, nonlinear materials with the integration methods presented in this paper and more common methods such as Backward Euler alike.

» Show BibTeX

@article{LLJKB20,
author = {Fabian L{\"{o}}schner and Andreas Longva and Stefan Jeske and Tassilo Kugelstadt and Jan Bender},
title = {Higher-Order Time Integration for Deformable Solids},
year = {2020},
journal = {Computer Graphics Forum},
volume = {39},
number = {8}
}





Inferring a User’s Intent on Joining or Passing by Social Groups


Andrea Bönsch, Alexander R. Bluhm, Jonathan Ehret, Torsten Wolfgang Kuhlen
20th ACM International Conference on Intelligent Virtual Agents 2020 (IVA'20)
pubimg

Modeling the interactions between users and social groups of virtual agents (VAs) is vital in many virtual-reality-based applications. However, only little research on group encounters has been conducted yet. We intend to close this gap by focusing on the distinction between joining and passing-by a group. To enhance the interactive capacity of VAs in these situations, knowing the user’s objective is required to showreasonable reactions. To this end,we propose a classification scheme which infers the user’s intent based on social cues such as proxemics, gazing and orientation, followed by triggering believable, non-verbal actions on the VAs.We tested our approach in a pilot study with overall promising results and discuss possible improvements for further studies.

» Show BibTeX

@inproceedings{10.1145/3383652.3423862,
author = {B\"{o}nsch, Andrea and Bluhm, Alexander R. and Ehret, Jonathan and Kuhlen, Torsten W.},
title = {Inferring a User's Intent on Joining or Passing by Social Groups},
year = {2020},
isbn = {9781450375863},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3383652.3423862},
doi = {10.1145/3383652.3423862},
abstract = {Modeling the interactions between users and social groups of virtual agents (VAs) is vital in many virtual-reality-based applications. However, only little research on group encounters has been conducted yet. We intend to close this gap by focusing on the distinction between joining and passing-by a group. To enhance the interactive capacity of VAs in these situations, knowing the user's objective is required to show reasonable reactions. To this end, we propose a classification scheme which infers the user's intent based on social cues such as proxemics, gazing and orientation, followed by triggering believable, non-verbal actions on the VAs. We tested our approach in a pilot study with overall promising results and discuss possible improvements for further studies.},
booktitle = {Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents},
articleno = {10},
numpages = {8},
keywords = {virtual agents, joining a group, social groups, virtual reality},
location = {Virtual Event, Scotland, UK},
series = {IVA '20}
}





Evaluating the Influence of Phoneme-Dependent Dynamic Speaker Directivity of Embodied Conversational Agents’ Speech


Jonathan Ehret, Jonas Stienen, Chris Brozdowski, Andrea Bönsch, Irene Mittelberg, Michael Vorländer, Torsten Wolfgang Kuhlen
20th ACM International Conference on Intelligent Virtual Agents 2020 (IVA'20)
pubimg

Generating natural embodied conversational agents within virtual spaces crucially depends on speech sounds and their directionality. In this work, we simulated directional filters to not only add directionality, but also directionally adapt each phoneme. We therefore mimic reality where changing mouth shapes have an influence on the directional propagation of sound. We conducted a study (n = 32) evaluating naturalism ratings, preference and distinguishability of omnidirectional speech auralization compared to static and dynamic, phoneme-dependent directivities. The results indicated that participants cannot distinguish dynamic from static directivity. Furthermore, participants’ preference ratings aligned with their naturalism ratings. There was no unanimity, however, with regards to which auralization is the most natural.

» Show BibTeX

@inproceedings{10.1145/3383652.3423863,
author = {Ehret, Jonathan and Stienen, Jonas and Brozdowski, Chris and B\"{o}nsch, Andrea and Mittelberg, Irene and Vorl\"{a}nder, Michael and Kuhlen, Torsten W.},
title = {Evaluating the Influence of Phoneme-Dependent Dynamic Speaker Directivity of Embodied Conversational Agents' Speech},
year = {2020},
isbn = {9781450375863},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3383652.3423863},
doi = {10.1145/3383652.3423863},
abstract = {Generating natural embodied conversational agents within virtual spaces crucially depends on speech sounds and their directionality. In this work, we simulated directional filters to not only add directionality, but also directionally adapt each phoneme. We therefore mimic reality where changing mouth shapes have an influence on the directional propagation of sound. We conducted a study (n = 32) evaluating naturalism ratings, preference and distinguishability of omnidirectional speech auralization compared to static and dynamic, phoneme-dependent directivities. The results indicated that participants cannot distinguish dynamic from static directivity. Furthermore, participants' preference ratings aligned with their naturalism ratings. There was no unanimity, however, with regards to which auralization is the most natural.},
booktitle = {Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents},
articleno = {17},
numpages = {8},
keywords = {phoneme-dependent directivity, directional 3D sound, speech, embodied conversational agents, virtual acoustics},
location = {Virtual Event, Scotland, UK},
series = {IVA '20}
}





Rilievo: Artistic Scene Authoring via Interactive Height Map Extrusion in VR


Sevinc Eroglu, Patric Schmitz, Carlos Aguilera Martinez, Jana Rusch, Leif Kobbelt, Torsten Wolfgang Kuhlen
ACM SIGGRAPH 2020 Art Papers. Published in Leonardo Journal.
pubimg

The authors present a virtual authoring environment for artistic creation in VR. It enables the effortless conversion of 2D images into volumetric 3D objects. Artistic elements in the input material are extracted with a convenient VR-based segmentation tool. Relief sculpting is then performed by interactively mixing different height maps. These are automatically generated from the input image structure and appearance. A prototype of the tool is showcased in an analog-virtual artistic workflow in collaboration with a traditional painter. It combines the expressiveness of analog painting and sculpting with the creative freedom of spatial arrangement in VR.

» Show BibTeX

@article{eroglu2020rilievo,
title={Rilievo: Artistic Scene Authoring via Interactive Height Map Extrusion in VR},
author={Eroglu, Sevinc and Schmitz, Patric and Martinez, Carlos Aguilera and Rusch, Jana and Kobbelt, Leif and Kuhlen, Torsten W},
journal={Leonardo},
volume={53},
number={4},
pages={438--441},
year={2020},
publisher={MIT Press}
}





Fast and Robust QEF Minimization using Probabilistic Quadrics


Philip Trettner, Leif Kobbelt
Computer Graphics Forum (Proc. EUROGRAPHICS 2020)
pubimg

Error quadrics are a fundamental and powerful building block in many geometry processing algorithms. However, finding the minimizer of a given quadric is in many cases not robust and requires a singular value decomposition or some ad-hoc regularization. While classical error quadrics measure the squared deviation from a set of ground truth planes or polygons, we treat the input data as genuinely uncertain information and embed error quadrics in a probabilistic setting ("probabilistic quadrics") where the optimal point minimizes the expected squared error. We derive closed form solutions for the popular plane and triangle quadrics subject to (spatially varying, anisotropic) Gaussian noise. Probabilistic quadrics can be minimized robustly by solving a simple linear system - 50x faster than SVD. We show that probabilistic quadrics have superior properties in tasks like decimation and isosurface extraction since they favor more uniform triangulations and are more tolerant to noise while still maintaining feature sensitivity. A broad spectrum of applications can directly benefit from our new quadrics as a drop-in replacement which we demonstrate with mesh smoothing via filtered quadrics and non-linear subdivision surfaces.

» Show BibTeX

@article {10.1111:cgf.13933,
journal = {Computer Graphics Forum},
title = {{Fast and Robust QEF Minimization using Probabilistic Quadrics}},
author = {Trettner, Philip and Kobbelt, Leif},
year = {2020},
publisher = {The Eurographics Association and John Wiley & Sons Ltd.},
ISSN = {1467-8659},
DOI = {10.1111/cgf.13933}
}





High-Fidelity Point-Based Rendering of Large-Scale 3D Scan Datasets


Patric Schmitz, Timothy Blut, Christian Mattes, Leif Kobbelt
IEEE Computer Graphics and Applications
pubimg

Digitalization of 3D objects and scenes using modern depth sensors and high-resolution RGB cameras enables the preservation of human cultural artifacts at an unprecedented level of detail. Interactive visualization of these large datasets, however, is challenging without degradation in visual fidelity. A common solution is to fit the dataset into available video memory by downsampling and compression. The achievable reproduction accuracy is thereby limited for interactive scenarios, such as immersive exploration in Virtual Reality (VR). This degradation in visual realism ultimately hinders the effective communication of human cultural knowledge. This article presents a method to render 3D scan datasets with minimal loss of visual fidelity. A point-based rendering approach visualizes scan data as a dense splat cloud. For improved surface approximation of thin and sparsely sampled objects, we propose oriented 3D ellipsoids as rendering primitives. To render massive texture datasets, we present a virtual texturing system that dynamically loads required image data. It is paired with a single-pass page prediction method that minimizes visible texturing artifacts. Our system renders a challenging dataset in the order of 70 million points and a texture size of 1.2 terabytes consistently at 90 frames per second in stereoscopic VR.




The Impact of a Virtual Agent’s Non-Verbal Emotional Expression on a User’s Personal Space Preferences


Andrea Bönsch, Sina Radke, Jonathan Ehret, Ute Habel, Torsten Wolfgang Kuhlen
20th ACM International Conference on Intelligent Virtual Agents 2020 (IVA'20)
pubimg

Virtual-reality-based interactions with virtual agents (VAs) are likely subject to similar influences as human-human interactions. In either real or virtual social interactions, interactants try to maintain their personal space (PS), an ubiquitous, situative, flexible safety zone. Building upon larger PS preferences to humans and VAs with angry facial expressions, we extend the investigations to whole-body emotional expressions. In two immersive settings–HMD and CAVE–66 males were approached by an either happy, angry, or neutral male VA. Subjects preferred a larger PS to the angry VA when being able to stop him at their convenience (Sample task), replicating previous findings, and when being able to actively avoid him (PassBy task). In the latter task, we also observed larger distances in the CAVE than in the HMD.

» Show BibTeX

@inproceedings{10.1145/3383652.3423888,
author = {B\"{o}nsch, Andrea and Radke, Sina and Ehret, Jonathan and Habel, Ute and Kuhlen, Torsten W.},
title = {The Impact of a Virtual Agent's Non-Verbal Emotional Expression on a User's Personal Space Preferences},
year = {2020},
isbn = {9781450375863},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3383652.3423888},
doi = {10.1145/3383652.3423888},
abstract = {Virtual-reality-based interactions with virtual agents (VAs) are likely subject to similar influences as human-human interactions. In either real or virtual social interactions, interactants try to maintain their personal space (PS), an ubiquitous, situative, flexible safety zone. Building upon larger PS preferences to humans and VAs with angry facial expressions, we extend the investigations to whole-body emotional expressions. In two immersive settings-HMD and CAVE-66 males were approached by an either happy, angry, or neutral male VA. Subjects preferred a larger PS to the angry VA when being able to stop him at their convenience (Sample task), replicating previous findings, and when being able to actively avoid him (Pass By task). In the latter task, we also observed larger distances in the CAVE than in the HMD.},
booktitle = {Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents},
articleno = {12},
numpages = {8},
keywords = {personal space, virtual reality, emotions, virtual agents},
location = {Virtual Event, Scotland, UK},
series = {IVA '20}
}





The 19 Unifying Questionnaire Constructs of Artificial Social Agents: An IVA Community Analysis


Siska Fitrianie, Merijn Bruijnes, Deborah Richards, Andrea Bönsch, Willem-Paul Brinkman
20th ACM International Conference on Intelligent Virtual Agents 2020 (IVA'20)
pubimg

In this paper, we report on the multi-year Intelligent Virtual Agents (IVA) community effort, involving more than 80 researchers worldwide, researching the IVA community interests and practises in evaluating human interaction with an artificial social agent (ASA). The effort is driven by previous IVA workshops and plenary IVA discussions related to the methodological crisis on the evaluation of ASAs. A previous literature review showed a continuous practise of creating new questionnaires instead of reusing validated questionnaires. We address this issue by examining questionnaire measurement constructs used in empirical studies between 2013 to 2018 published in the IVA conference. We identified 189 constructs used in 89 questionnaires that are reported across 81 studies. Although these constructs have different names, they often measure the same thing. In this paper, we, therefore, present a unifying set of 19 constructs that captures more than 80% of the 189 constructs initially identified. We established this set in two steps. First, 49 researchers classified the constructs in broad theoretically based categories. Next, 23 researchers grouped the constructs in each category on their similarity. The resulting 19 groups form a unifying set of constructs, which will be the basis for the future questionnaire instrument of human-ASA interaction.

Nominated for the Best Paper Award.

» Show BibTeX

@inproceedings{10.1145/3383652.3423873,
author = {Fitrianie, Siska and Bruijnes, Merijn and Richards, Deborah and B\"{o}nsch, Andrea and Brinkman, Willem-Paul},
title = {The 19 Unifying Questionnaire Constructs of Artificial Social Agents: An IVA Community Analysis},
year = {2020},
isbn = {9781450375863},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3383652.3423873},
doi = {10.1145/3383652.3423873},
abstract = {In this paper, we report on the multi-year Intelligent Virtual Agents (IVA) community effort, involving more than 80 researchers worldwide, researching the IVA community interests and practises in evaluating human interaction with an artificial social agent (ASA). The effort is driven by previous IVA workshops and plenary IVA discussions related to the methodological crisis on the evaluation of ASAs. A previous literature review showed a continuous practise of creating new questionnaires instead of reusing validated questionnaires. We address this issue by examining questionnaire measurement constructs used in empirical studies between 2013 to 2018 published in the IVA conference. We identified 189 constructs used in 89 questionnaires that are reported across 81 studies. Although these constructs have different names, they often measure the same thing. In this paper, we, therefore, present a unifying set of 19 constructs that captures more than 80% of the 189 constructs initially identified. We established this set in two steps. First, 49 researchers classified the constructs in broad theoretically based categories. Next, 23 researchers grouped the constructs in each category on their similarity. The resulting 19 groups form a unifying set of constructs, which will be the basis for the future questionnaire instrument of human-ASA interaction.},
booktitle = {Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents},
articleno = {21},
numpages = {8},
keywords = {evaluation instrument, user study, Artificial social agent, questionnaire, measurement construct},
location = {Virtual Event, Scotland, UK},
series = {IVA '20}
}





High-Performance Image Filters via Sparse Approximations


Kersten Schuster, Philip Trettner, Leif Kobbelt
Proceedings of the ACM on Computer Graphics and Interactive Techniques, Vol. 3, No. 2, 2020
pubimg

We present a numerical optimization method to find highly efficient (sparse) approximations for convolutional image filters. Using a modified parallel tempering approach, we solve a constrained optimization that maximizes approximation quality while strictly staying within a user-prescribed performance budget. The results are multi-pass filters where each pass computes a weighted sum of bilinearly interpolated sparse image samples, exploiting hardware acceleration on the GPU. We systematically decompose the target filter into a series of sparse convolutions, trying to find good trade-offs between approximation quality and performance. Since our sparse filters are linear and translation-invariant, they do not exhibit the aliasing and temporal coherence issues that often appear in filters working on image pyramids. We show several applications, ranging from simple Gaussian or box blurs to the emulation of sophisticated Bokeh effects with user-provided masks. Our filters achieve high performance as well as high quality, often providing significant speed-up at acceptable quality even for separable filters. The optimized filters can be baked into shaders and used as a drop-in replacement for filtering tasks in image processing or rendering pipelines.




Cost Minimizing Local Anisotropic Quad Mesh Refinement


Max Lyon, David Bommes, Leif Kobbelt
Eurographics Symposium on Geometry Processing 2020
pubimg

Quad meshes as a surface representation have many conceptual advantages over triangle meshes. Their edges can naturally be aligned to principal curvatures of the underlying surface and they have the flexibility to create strongly anisotropic cells without causing excessively small inner angles. While in recent years a lot of progress has been made towards generating high quality uniform quad meshes for arbitrary shapes, their adaptive and anisotropic refinement remains difficult since a single edge split might propagate across the entire surface in order to maintain consistency. In this paper we present a novel refinement technique which finds the optimal trade-off between number of resulting elements and inserted singularities according to a user prescribed weighting. Our algorithm takes as input a quad mesh with those edges tagged that are prescribed to be refined. It then formulates a binary optimization problem that minimizes the number of additional edges which need to be split in order to maintain consistency. Valence 3 and 5 singularities have to be introduced in the transition region between refined and unrefined regions of the mesh. The optimization hence computes the optimal trade-off and places singularities strategically in order to minimize the number of consistency splits — or avoids singularities where this causes only a small number of additional splits. When applying the refinement scheme iteratively, we extend our binary optimization formulation such that previous splits can be undone if this prevents degenerate cells with small inner angles that otherwise might occur in anisotropic regions or in the vicinity of singularities. We demonstrate on a number of challenging examples that the algorithm performs well in practice.




A Three-Level Approach to Texture Mapping and Synthesis on 3D Surfaces


Kersten Schuster, Philip Trettner, Patric Schmitz, Leif Kobbelt
Proceedings of the ACM on Computer Graphics and Interactive Techniques, Vol. 3, No. 1, 2020
pubimg

We present a method for example-based texturing of triangular 3D meshes. Our algorithm maps a small 2D texture sample onto objects of arbitrary size in a seamless fashion, with no visible repetitions and low overall distortion. It requires minimal user interaction and can be applied to complex, multi-layered input materials that are not required to be tileable. Our framework integrates a patch-based approach with per-pixel compositing. To minimize visual artifacts, we run a three-level optimization that starts with a rigid alignment of texture patches (macro scale), then continues with non-rigid adjustments (meso scale) and finally performs pixel-level texture blending (micro scale). We demonstrate that the relevance of the three levels depends on the texture content and type (stochastic, structured, or anisotropic textures).

» Show BibTeX

@article{schuster2020,
author = {Schuster, Kersten and Trettner, Philip and Schmitz, Patric and Kobbelt, Leif},
title = {A Three-Level Approach to Texture Mapping and Synthesis on 3D Surfaces},
year = {2020},
issue_date = {Apr 2020},
publisher = {The Association for Computers in Mathematics and Science Teaching},
address = {USA},
volume = {3},
number = {1},
url = {https://doi.org/10.1145/3384542},
doi = {10.1145/3384542},
journal = {Proc. ACM Comput. Graph. Interact. Tech.},
month = apr,
articleno = {1},
numpages = {19},
keywords = {material blending, surface texture synthesis, texture mapping}
}





Immersive Sketching to Author Crowd Movements in Real-time


Andrea Bönsch, Sebastian J. Barton, Jonathan Ehret, Torsten Wolfgang Kuhlen
20th ACM International Conference on Intelligent Virtual Agents 2020 (IVA'20)
pubimg

the flow of virtual crowds in a direct and interactive manner. Here, options to redirect a flow by sketching barriers, or guiding entities based on a sketched network of connected sections are provided. As virtual crowds are increasingly often embedded into virtual reality (VR) applications, 3D authoring is of interest.

In this preliminary work, we thus present a sketch-based approach for VR. First promising results of a proof-of-concept are summarized and improvement suggestions, extensions, and future steps are discussed.

» Show BibTeX

@inproceedings{10.1145/3383652.3423883,
author = {B\"{o}nsch, Andrea and Barton, Sebastian J. and Ehret, Jonathan and Kuhlen, Torsten W.},
title = {Immersive Sketching to Author Crowd Movements in Real-Time},
year = {2020},
isbn = {9781450375863},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3383652.3423883},
doi = {10.1145/3383652.3423883},
abstract = {Sketch-based interfaces in 2D screen space allow to efficiently author the flow of virtual crowds in a direct and interactive manner. Here, options to redirect a flow by sketching barriers, or guiding entities based on a sketched network of connected sections are provided.As virtual crowds are increasingly often embedded into virtual reality (VR) applications, 3D authoring is of interest. In this preliminary work, we thus present a sketch-based approach for VR. First promising results of a proof-of-concept are summarized and improvement suggestions, extensions, and future steps are discussed.},
booktitle = {Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents},
articleno = {11},
numpages = {3},
keywords = {virtual crowds, virtual reality, sketch-based interface, authoring},
location = {Virtual Event, Scotland, UK},
series = {IVA '20}
}





Unsupervised Segmentation of Indoor 3D Point Cloud: Application to Object-based Classification


Florent Poux, Christian Mattes, Leif Kobbelt
3D GeoInfo Conference 2020
pubimg

Point cloud data of indoor scenes is primarily composed of planar-dominant elements. Automatic shape segmentation is thus valuable to avoid labour intensive labelling. This paper provides a fully unsupervised region growing segmentation approach for efficient clustering of massive 3D point clouds. Our contribution targets a low-level grouping beneficial to object-based classification. We argue that the use of relevant segments for object-based classification has the potential to perform better in terms of recognition accuracy, computing time and lowers the manual labelling time needed. However, fully unsupervised approaches are rare due to a lack of proper generalisation of user-defined parameters. We propose a self-learning heuristic process to define optimal parameters, and we validate our method on a large and richly annotated dataset (S3DIS) yielding 88.1% average F1-score for object-based classification. It permits to automatically segment indoor point clouds with no prior knowledge at commercially viable performance and is the foundation for efficient indoor 3D modelling in cluttered point clouds.

» Show BibTeX

@Article{poux2020b,
author = {Poux, F. and Mattes, C. and Kobbelt, L.},
title = {UNSUPERVISED SEGMENTATION OF INDOOR 3D POINT CLOUD: APPLICATION TO OBJECT-BASED CLASSIFICATION},
journal = {ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences},
volume = {XLIV-4/W1-2020},
year = {2020},
pages = {111--118},
url = {https://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XLIV-4-W1-2020/111/2020/},
doi = {10.5194/isprs-archives-XLIV-4-W1-2020-111-2020}
}





Initial User-Centered Design of a Virtual Reality Heritage System: Applications for Digital Tourism


Florent Poux, Quentin Valembois, Christian Mattes, Leif Kobbelt, Roland Billen
Remote Sensing
pubimg

Reality capture allows for the reconstruction, with a high accuracy, of the physical reality of cultural heritage sites. Obtained 3D models are often used for various applications such as promotional content creation, virtual tours, and immersive experiences. In this paper, we study new ways to interact with these high-quality 3D reconstructions in a real-world scenario. We propose a user-centric product design to create a virtual reality (VR) application specifically intended for multi-modal purposes. It is applied to the castle of Jehay (Belgium), which is under renovation, to permit multi-user digital immersive experiences. The article proposes a high-level view of multi-disciplinary processes, from a needs analysis to the 3D reality capture workflow and the creation of a VR environment incorporated into an immersive application. We provide several relevant VR parameters for the scene optimization, the locomotion system, and the multi-user environment definition that were tested in a heritage tourism context.

» Show BibTeX

@article{poux2020a,
title={Initial User-Centered Design of a Virtual Reality Heritage System: Applications for Digital Tourism},
volume={12},
ISSN={2072-4292},
url={http://dx.doi.org/10.3390/rs12162583},
DOI={10.3390/rs12162583},
number={16},
journal={Remote Sensing},
publisher={MDPI AG},
author={Poux, Florent and Valembois, Quentin and Mattes, Christian and Kobbelt, Leif and Billen, Roland},
year={2020},
month={Aug},
pages={2583}
}





Reposing Humans by Warping 3D Features


Markus Knoche, István Sárándi, Bastian Leibe
Workshop on Towards Human-Centric Image/Video Synthesis -- Conference on Computer Vision and Pattern Recognition (CVPRW'20)
pubimg

We address the problem of reposing an image of a human into any desired novel pose. This conditional image-generation task requires reasoning about the 3D structure of the human, including self-occluded body parts. Most prior works are either based on 2D representations or require fitting and manipulating an explicit 3D body mesh. Based on the recent success in deep learning-based volumetric representations, we propose to implicitly learn a dense feature volume from human images, which lends itself to simple and intuitive manipulation through explicit geometric warping. Once the latent feature volume is warped according to the desired pose change, the volume is mapped back to RGB space by a convolutional decoder. Our state-of-the-art results on the DeepFashion and the iPER benchmarks indicate that dense volumetric human representations are worth investigating in more detail.

» Show BibTeX

@inproceedings{Knoche20CVPRW,
author = {Markus Knoche and Istv\'an S\'ar\'andi and Bastian Leibe},
title = {Reposing Humans by Warping 3{D} Features},
booktitle = {CVPR Workshop on Towards Human-Centric Image/Video Synthesis},
year = {2020}
}





Calibratio - A Small, Low-Cost, Fully Automated Motion-to-Photon Measurement Device


Sebastian Pape, Marcel Krüger, Jan Müller, Torsten Wolfgang Kuhlen
10th Workshop on Software Engineering and Architectures for Realtime Interactive Systems (SEARIS), 2020
pubimg

Since the beginning of the design and implementation of virtual environments, these systems have been built to give the users the best possible experience. One detrimental factor for the user experience was shown to be a high end-to-end latency, here measured as motionto-photon latency, of the system. Thus, a lot of research in the past was focused on the measurement and minimization of this latency in virtual environments. Most existing measurement-techniques require either expensive measurement hardware like an oscilloscope, mechanical components like a pendulum or depend on manual evaluation of samples. This paper proposes a concept of an easy to build, low-cost device consisting of a microcontroller, servo motor and a photo diode to measure the motion-to-photon latency in virtual reality environments fully automatically. It is placed or attached to the system, calibrates itself and is controlled/monitored via a web interface. While the general concept is applicable to a variety of VR technologies, this paper focuses on the context of CAVE-like systems.

» Show BibTeX

@InProceedings{Pape2020a,
author = {Sebastian Pape and Marcel Kr\"{u}ger and Jan M\"{u}ller and Torsten W. Kuhlen},
title = {{Calibratio - A Small, Low-Cost, Fully Automated Motion-to-Photon Measurement Device}},
booktitle = {10th Workshop on Software Engineering and Architectures for Realtime Interactive Systems (SEARIS)},
year = {2020},
month={March}
}





Joint Dual-Tasking in VR: Outlining the Behavioral Design of Interactive Human Companions Who Walk and Talk with a User


Andrea Bönsch, Torsten Wolfgang Kuhlen
IEEE Virtual Humans and Crowds for Immersive Environments (VHCIE), 2020
pubimg

To resemble realistic and lively places, virtual environments are increasingly often enriched by virtual populations consisting of computer-controlled, human-like virtual agents. While the applications often provide limited user-agent interaction based on, e.g., collision avoidance or mutual gaze, complex user-agent dynamics such as joint locomotion combined with a secondary task, e.g., conversing, are rarely considered yet. These dual-tasking situations, however, are beneficial for various use-cases: guided tours and social simulations will become more realistic and engaging if a user is able to traverse a scene as a member of a social group, while platforms to study crowd and walking behavior will become more powerful and informative. To this end, this presentation deals with different areas of interaction dynamics, which need to be combined for modeling dual-tasking with virtual agents. Areas covered are kinematic parameters for the navigation behavior, group shapes in static and mobile situations as well as verbal and non-verbal behavior for conversations.

» Show BibTeX

@InProceedings{Boensch2020a,
author = {Andrea B\"{o}nsch and Torsten W. Kuhlen},
title = {{Joint Dual-Tasking in VR: Outlining the Behavioral Design of Interactive Human Companions Who Walk and Talk with a User}},
booktitle = {IEEE Virtual Humans and Crowds for Immersive Environments (VHCIE)},
year = {2020},
month={March}
}





Towards a Graphical User Interface for Exploring and Fine-Tuning Crowd Simulations


Andrea Bönsch, Marcel Jonda, Jonathan Ehret, Torsten Wolfgang Kuhlen
IEEE Virtual Humans and Crowds for Immersive Environments (VHCIE), 2020
pubimg

Simulating a realistic navigation of virtual pedestrians through virtual environments is a recurring subject of investigations. The various mathematical approaches used to compute the pedestrians’ paths result, i.a., in different computation-times and varying path characteristics. Customizable parameters, e.g., maximal walking speed or minimal interpersonal distance, add another level of complexity. Thus, choosing the best-fitting approach for a given environment and use-case is non-trivial, especially for novice users.

To facilitate the informed choice of a specific algorithm with a certain parameter set, crowd simulation frameworks such as Menge provide an extendable collection of approaches with a unified interface for usage. However, they often miss an elaborated visualization with high informative value accompanied by visual analysis methods to explore the complete simulation data in more detail – which is yet required for an informed choice. Benchmarking suites such as SteerBench are a helpful approach as they objectively analyze crowd simulations, however they are too tailored to specific behavior details. To this end, we propose a preliminary design of an advanced graphical user interface providing a 2D and 3D visualization of the crowd simulation data as well as features for time navigation and an overall data exploration.

» Show BibTeX

@InProceedings{Boensch2020b,
author = {Andrea B\"{o}nsch and Marcel Jonda and Jonathan Ehret and Torsten W. Kuhlen},
title = {{Towards a Graphical User Interface for Exploring and Fine-Tuning Crowd Simulations}},
booktitle = {IEEE Virtual Humans and Crowds for Immersive Environments (VHCIE)},
year = {2020},
month={March}
}





Talk: Insite: A Generalized Pipeline for In-transit Visualization and Analysis


Simon Oehrl, Jan Müller, Ali Can Demiralp, Marcel Krüger, Sebastian Spreizer, Benjamin Weyers, Torsten Wolfgang Kuhlen
NEST Conference 2020
pubimg

Neuronal network simulators are essential to computational neuroscience, enabling the study of the nervous system through in-silico experiments. Through utilization of high-performance computing resources, these simulators are able to simulate increasingly complex and large networks of neurons today. It also creates new challenges for the analysis and visualization of such simulations. In-situ and in-transport strategies are popular approaches in these scenarios. They enable live monitoring of running simulations and parameter adjustment in the case of erroneous configurations which can save valuable compute resources.

This talk will present the current status of our pipeline for in-transport analysis and visualization of neuronal network simulator data. The pipeline is able to couple with NEST along other simulators with data management (querying, filtering and merging) from multiple simulator instances. Finally, the data is passed to end-user applications for visualization and analysis. The goal is to be integrated into third party tools such as the multi-view visual analysis toolkit ViSimpl.




Talk: Proximity in Social VR - Interpersonal Distance between a User and Virtual Agents


Andrea Bönsch
3rd Workshop on "Person-to-Person Interaction: From Analysis to Applications", 2020

Proxemic is a well known social behavioral measure, where the interpersonal distance between interactans is evaluated - either in real or in virtual social encounters. Given the prominent role of emotional expressions in our everyday social interactions, we investigated how emotions of a virtual agent affect proxemic adaptions while taking the aspects spatial constellation between user and agent as well as user’s level of dynamics into account.



DR-SPAAM: A Spatial-Attention and Auto-regressive Model for Person Detection in 2D Range Data


Dan Jia, Alexander Hermans, Bastian Leibe
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2020
pubimg

Detecting persons using a 2D LiDAR is a challenging task due to the low information content of 2D range data. To alleviate the problem caused by the sparsity of the LiDAR points, current state-of-the-art methods fuse multiple previous scans and perform detection using the combined scans. The downside of such a backward looking fusion is that all the scans need to be aligned explicitly, and the necessary alignment operation makes the whole pipeline more expensive -- often too expensive for real-world applications. In this paper, we propose a person detection network which uses an alternative strategy to combine scans obtained at different times. Our method, Distance Robust SPatial Attention and Auto-regressive Model (DR-SPAAM), follows a forward looking paradigm. It keeps the intermediate features from the backbone network as a template and recurrently updates the template when a new scan becomes available. The updated feature template is in turn used for detecting persons currently in the scene. On the DROW dataset, our method outperforms the existing state-of-the-art, while being approximately four times faster, running at 87.2 FPS on a laptop with a dedicated GPU and at 22.6 FPS on an NVIDIA Jetson AGX embedded GPU. We release our code in PyTorch and a ROS node including pre-trained models.




MeTRAbs: Metric-Scale Truncation-Robust Heatmaps for Absolute 3D Human Pose Estimation


István Sárándi, Timm Linder, Kai O. Arras, Bastian Leibe
ArXiv
pubimg

Heatmap representations have formed the basis of human pose estimation systems for many years, and their extension to 3D has been a fruitful line of recent research. This includes 2.5D volumetric heatmaps, whose X and Y axes correspond to image space and Z to metric depth around the subject. To obtain metric-scale predictions, 2.5D methods need a separate post-processing step to resolve scale ambiguity. Further, they cannot localize body joints outside the image boundaries, leading to incomplete estimates for truncated images. To address these limitations, we propose metric-scale truncation-robust (MeTRo) volumetric heatmaps, whose dimensions are all defined in metric 3D space, instead of being aligned with image space. This reinterpretation of heatmap dimensions allows us to directly estimate complete, metric-scale poses without test-time knowledge of distance or relying on anthropometric heuristics, such as bone lengths. To further demonstrate the utility our representation, we present a differentiable combination of our 3D metric-scale heatmaps with 2D image-space ones to estimate absolute 3D pose (our MeTRAbs architecture). We find that supervision via absolute pose loss is crucial for accurate non-root-relative localization. Using a ResNet-50 backbone without further learned layers, we obtain state-of-the-art results on Human3.6M, MPI-INF-3DHP and MuPoTS-3D. Our code will be made publicly available to facilitate further research.

» Show BibTeX

@article{sarandi2020metrabs,
title={MeTRAbs: Metric-Scale Truncation-Robust Heatmaps for Absoute 3{D} Human Pose Estimation},
author={Istv\'an S\'ar\'andi and Timm Linder and Kai O. Arras and Bastian Leibe},
journal={arXiv preprint arXiv:2007.07227},
year={2020}
}





Accurately Solving Physical Systems with Graph Learning


Han Shao, Tassilo Kugelstadt, Wojciech Palubicki, Jan Bender, Sören Pirk, Dominik L. Michels
arXiv
pubimg

Iterative solvers are widely used to accurately simulate physical systems. These solvers require initial guesses to generate a sequence of improving approximate solutions. In this contribution, we introduce a novel method to accelerate iterative solvers for physical systems with graph networks (GNs) by predicting the initial guesses to reduce the number of iterations. Unlike existing methods that aim to learn physical systems in an end-to-end manner, our approach guarantees long-term stability and therefore leads to more accurate solutions. Furthermore, our method improves the run time performance of traditional iterative solvers. To explore our method we make use of position-based dynamics (PBD) as a common solver for physical systems and evaluate it by simulating the dynamics of elastic rods. Our approach is able to generalize across different initial conditions, discretizations, and realistic material properties. Finally, we demonstrate that our method also performs well when taking discontinuous effects into account such as collisions between individual rods.

» Show BibTeX

@misc{shao2020accurately,
title={Accurately Solving Physical Systems with Graph Learning},
author={Han Shao and Tassilo Kugelstadt and Wojciech Pa{\l{}}ubicki and Jan Bender and S{\"o}ren Pirk and Dominik L. Michels},
year={2020},
eprint={2006.03897},
archivePrefix={arXiv},
primaryClass={physics.comp-ph}
}





Single-Shot Panoptic Segmentation


Mark Weber, Jonathon Luiten, Bastian Leibe
Arxiv
pubimg

We present a novel end-to-end single-shot method that segments countable object instances (things) as well as background regions (stuff) into a non-overlapping panoptic segmentation at almost video frame rate. Current state-of-the-art methods are far from reaching video frame rate and mostly rely on merging instance segmentation with semantic background segmentation. Our approach relaxes this requirement by using an object detector but is still able to resolve inter- and intra-class overlaps to achieve a non-overlapping segmentation. On top of a shared encoder-decoder backbone, we utilize multiple branches for semantic segmentation, object detection, and instance center prediction. Finally, our panoptic head combines all outputs into a panoptic segmentation and can even handle conflicting predictions between branches as well as certain false predictions. Our network achieves 32.6% PQ on MS-COCO at 21.8 FPS, opening up panoptic segmentation to a broader field of applications.

» Show BibTeX

@article{weber2019single,
title={Single-Shot Panoptic Segmentation},
author={Weber, Mark and Luiten, Jonathon and Leibe, Bastian},
journal={arXiv preprint arXiv:1911.00764},
year={2019}
}






Previous Year (2019)
Disclaimer Home Visual Computing institute RWTH Aachen University