004 Datenverarbeitung; Informatik
Refine
Year of publication
Document Type
- Bachelor Thesis (6)
- Diploma Thesis (5)
- Doctoral Thesis (5)
- Master's Thesis (3)
- Part of Periodical (3)
- Study Thesis (2)
- Conference Proceedings (1)
Language
- English (25) (remove)
Keywords
- Bildverarbeitung (2)
- Computer Graphics (2)
- Computergraphik (2)
- Graphik (2)
- Line Space (2)
- OpenGL (2)
- Volumen-Rendering (2)
- Acceleration Structures (1)
- Action Recognition (1)
- Action Segmentation (1)
Institute
- Institut für Computervisualistik (25) (remove)
Leichte Sprache (LS, easy-to-read German) is a simplified variety of German. It is used to provide barrier-free texts for a broad spectrum of people, including lowliterate individuals with learning difficulties, intellectual or developmental disabilities (IDD) and/or complex communication needs (CCN). In general, LS authors are proficient in standard German and do not belong to the aforementioned group of people. Our goal is to empower the latter to participate in written discourse themselves. This requires a special writing system whose linguistic support and ergonomic software design meet the target group’s specific needs. We present EasyTalk a system profoundly based on natural language processing (NLP) for assistive writing in an extended variant of LS (ELS). EasyTalk provides users with a personal vocabulary underpinned with customizable communication symbols and supports in writing at their individual level of proficiency through interactive user guidance. The system minimizes the grammatical knowledge needed to produce correct and coherent complex contents by intuitively formulating linguistic decisions. It provides easy dialogs for selecting options from a natural-language paraphrase generator, which provides context-sensitive suggestions for sentence components and correctly inflected word forms. In addition, EasyTalk reminds users to add text elements that enhance text comprehensibility in terms of audience design (e.g., time and place of an event) and improve text coherence (e.g., explicit connectors to express discourse-relations). To tailor the system to the needs of the target group, the development of EasyTalk followed the principles of human-centered design (HCD). Accordingly, we matured the system in iterative development cycles, combined with purposeful evaluations of specific aspects conducted with expert groups from the fields of CCN, LS, and IT, as well as L2 learners of the German language. In a final case study, members of the target audience tested the system in free writing sessions. The study confirmed that adults with IDD and/or CCN who have low reading, writing, and computer skills can write their own personal texts in ELS using EasyTalk. The positive feedback from all tests inspires future long-term studies with EasyTalk and further development of this prototypical system, such as the implementation of a so-called Schreibwerkstatt (writing workshop)
On the recognition of human activities and the evaluation of its imitation by robotic systems
(2023)
This thesis addresses the problem of action recognition through the analysis of human motion and the benchmarking of its imitation by robotic systems.
For our action recognition related approaches, we focus on presenting approaches that generalize well across different sensor modalities. We transform multivariate signal streams from various sensors to a common image representation. The action recognition problem on sequential multivariate signal streams can then be reduced to an image classification task for which we utilize recent advances in machine learning. We demonstrate the broad applicability of our approaches formulated as a supervised classification task for action recognition, a semi-supervised classification task for one-shot action recognition, modality fusion and temporal action segmentation.
For action classification, we use an EfficientNet Convolutional Neural Network (CNN) model to classify the image representations of various data modalities. Further, we present approaches for filtering and the fusion of various modalities on a representation level. We extend the approach to be applicable for semi-supervised classification and train a metric-learning model that encodes action similarity. During training, the encoder optimizes the distances in embedding space for self-, positive- and negative-pair similarities. The resulting encoder allows estimating action similarity by calculating distances in embedding space. At training time, no action classes from the test set are used.
Graph Convolutional Network (GCN) generalized the concept of CNNs to non-Euclidean data structures and showed great success for action recognition directly operating on spatio-temporal sequences like skeleton sequences. GCNs have recently shown state-of-the-art performance for skeleton-based action recognition but are currently widely neglected as the foundation for the fusion of various sensor modalities. We propose incorporating additional modalities, like inertial measurements or RGB features, into a skeleton-graph, by proposing fusion on two different dimensionality levels. On a channel dimension, modalities are fused by introducing additional node attributes. On a spatial dimension, additional nodes are incorporated into the skeleton-graph.
Transformer models showed excellent performance in the analysis of sequential data. We formulate the temporal action segmentation task as an object detection task and use a detection transformer model on our proposed motion image representations. Experiments for our action recognition related approaches are executed on large-scale publicly available datasets. Our approaches for action recognition for various modalities, action recognition by fusion of various modalities, and one-shot action recognition demonstrate state-of-the-art results on some datasets.
Finally, we present a hybrid imitation learning benchmark. The benchmark consists of a dataset, metrics, and a simulator integration. The dataset contains RGB-D image sequences of humans performing movements and executing manipulation tasks, as well as the corresponding ground truth. The RGB-D camera is calibrated against a motion-capturing system, and the resulting sequences serve as input for imitation learning approaches. The resulting policy is then executed in the simulated environment on different robots. We propose two metrics to assess the quality of the imitation. The trajectory metric gives insights into how close the execution was to the demonstration. The effect metric describes how close the final state was reached according to the demonstration. The Simitate benchmark can improve the comparability of imitation learning approaches.
Ray tracing acceleration through dedicated data structures has long been an important topic in computer graphics. In general, two different approaches are proposed: spatial and directional acceleration structures. The thesis at hand presents an innovative combined approach of these two areas, which enables a further acceleration of the tracing process of rays. State-of-the-art spatial data structures are used as base structures and enhanced by precomputed directional visibility information based on a sophisticated abstraction concept of shafts within an original structure, the Line Space.
In the course of the work, novel approaches for the precomputed visibility information are proposed: a binary value that indicates whether a shaft is empty or non-empty as well as a single candidate approximating the actual surface as a representative candidate. It is shown how the binary value is used in a simple but effective empty space skipping technique, which allows a performance gain in ray tracing of up to 40% compared to the pure base data structure, regardless of the spatial structure that is actually used. In addition, it is shown that this binary visibility information provides a fast technique for calculating soft shadows and ambient occlusion based on blocker approximations. Although the results contain a certain inaccuracy error, which is also presented and discussed, it is shown that a further tracing acceleration of up to 300% compared to the base structure is achieved. As an extension of this approach, the representative candidate precomputation is demonstrated, which is used to accelerate the indirect lighting computation, resulting in a significant performance gain at the expense of image errors. Finally, techniques based on two-stage structures and a usage heuristic are proposed and evaluated. These reduce memory consumption and approximation errors while maintaining the performance gain and also enabling further possibilities with object instancing and rigid transformations.
All performance and memory values as well as the approximation errors are measured, presented and discussed. Overall, the Line Space is shown to result in a considerate improvement in ray tracing performance at the cost of higher memory consumption and possible approximation errors. The presented findings thus demonstrate the capability of the combined approach and enable further possibilities for future work.
The mitral valve is one of four human heart valves. It is located in the left heart and acts as a unidirectional passageway for blood between the left atrium and the left ventricle. A correctly functioning mitral valve prevents a backflow of blood into the pulmonary circulation (lungs) and thus constitutes a vital part of the cardiac cycle. Pathologies of the mitral valve can manifest in a variety of symptoms with severity ranging from chest pain and fatigue to pulmonary edema (fluid accumulation in the tissue and air space of lungs), which may ultimately cause respiratory failure.
Malfunctioning mitral valves can be restored through complex surgical interventions, which greatly benefit from intensive planning and pre-operative analysis. Visualization techniques provide a possibility to enhance such preparation processes and can also facilitate post-operative evaluation. The work at hand extends current research in this field, building upon patient-specific mitral valve segmentations developed at the German Cancer Research Center, which result in triangulated 3D models of the valve surface. The core of this work will be the construction of a 2D-view of these models through global parameterization, a method that can be used to establish a bijective mapping between a planar parameter domain and a surface embedded in higher dimensions.
A flat representation of the mitral valve provides physicians with a view of the whole surface at once, similar to a map. This allows assessment of the valve's area and shape without the need for different viewing angles. Parts of the valve that are occluded by geometry in 3D become visible in 2D.
An additional contribution of this work will be the exploration of different visualizations of the 3D and 2D mitral valve representations. Features of the valve can be highlighted by associating them with specified colors, which can for instance directly convey pathology indicators.
Quality and effectiveness of the proposed methods were evaluated through a survey conducted at the Heidelberg University Hospital.
This thesis addresses the automated identification and localization of a time-varying number of objects in a stream of sensor data. The problem is challenging due to its combinatorial nature: If the number of objects is unknown, the number of possible object trajectories grows exponentially with the number of observations. Random finite sets are a relatively new theory that has been developed to derive at principled and efficient approximations. It is based around set-valued random variables that contain an unknown number of elements which appear in arbitrary order and are themselves random. While extensively studied in theory, random finite sets have not yet become a leading paradigm in practical computer vision and robotics applications. This thesis explores random finite sets in visual tracking applications. The first method developed in this thesis combines set-valued recursive filtering with global optimization. The problem is approached in a min-cost flow network formulation, which has become a standard inference framework for multiple object tracking due to its efficiency and optimality. A main limitation of this formulation is a restriction to unary and pairwise cost terms. This circumstance makes integration of higher-order motion models challenging. The method developed in this thesis approaches this limitation by application of a Probability Hypothesis Density filter. The Probability Hypothesis Density filter was the first practically implemented state estimator based on random finite sets. It circumvents the combinatorial nature of data association itself by propagation of an object density measure that can be computed efficiently, without maintaining explicit trajectory hypotheses. In this work, the filter recursion is used to augment measurements with an additional hidden kinematic state to be used for construction of more informed flow network cost terms, e.g., based on linear motion models. The method is evaluated on public benchmarks where a considerate improvement is achieved compared to network flow formulations that are based on static features alone, such as distance between detections and appearance similarity. A second part of this thesis focuses on the related task of detecting and tracking a single robot operator in crowded environments. Different from the conventional multiple object tracking scenario, the tracked individual can leave the scene and later reappear after a longer period of absence. Therefore, a re-identification component is required that picks up the track on reentrance. Based on random finite sets, the Bernoulli filter is an optimal Bayes filter that provides a natural representation for this type of problem. In this work, it is shown how the Bernoulli filter can be combined with a Probability Hypothesis Density filter to track operator and non-operators simultaneously. The method is evaluated on a publicly available multiple object tracking dataset as well as on custom sequences that are specific to the targeted application. Experiments show reliable tracking in crowded scenes and robust re-identification after long term occlusion. Finally, a third part of this thesis focuses on appearance modeling as an essential aspect of any method that is applied to visual object tracking scenarios. Therefore, a feature representation that is robust to pose variations and changing lighting conditions is learned offline, before the actual tracking application. This thesis proposes a joint classification and metric learning objective where a deep convolutional neural network is trained to identify the individuals in the training set. At test time, the final classification layer can be stripped from the network and appearance similarity can be queried using cosine distance in representation space. This framework represents an alternative to direct metric learning objectives that have required sophisticated pair or triplet sampling strategies in the past. The method is evaluated on two large scale person re-identification datasets where competitive results are achieved overall. In particular, the proposed method better generalizes to the test set compared to a network trained with the well-established triplet loss.
This paper describes the robot Lisa used by team
homer@UniKoblenz of the University of Koblenz Landau, Germany, for the participation at the RoboCup@Home 2016 in Leipzig, Germany. A special focus is put on novel system components and the open source contributions of our team. We have released packages for object recognition, a robot face including speech synthesis, mapping and navigation, speech recognition interface via android and a GUI. The packages are available (and new packages will be released) on http://wiki.ros.org/agas-ros-pkg.
Proceedings of the 9th Open German-Russian Workshop on Pattern Recognition and Image Understanding
(2015)
The Proceedings of the 9th Open German-Russian Workshop on Pattern Recognition and Image Understanding include publications (extended abstracts), that cover but are not limited to the following topics: - Mathematical Theory of Pattern Recognition, Image and Speech Processing, Analysis, Recognition and Understanding. - Cognitive Technologies, Information Technologies, Automated Systems and Software for Pattern Recognition, Image, Speech and Signal Processing, Analysis and Understanding - Databases, Knowledge Bases, and Linguistic Tools - Special-Purpose Architectures, Software and Hardware Tools - Vision and Sensor Data Interpretation for Robotics - Industrial, Medical, Multimedia and Other Applications - Algorithms, Software, Automated Systems and Information Technologies in Bioinformatics and Medical Informatics. The workshop took place from December 1st-5th, 2014, at the University of Koblenz-Landau in Koblenz, Germany.
The mitral valve is one of the four valves in the human heart. It is located in the left heart chamber and its function is to control the blood flow from the left atrium to the left ventricle. Pathologies can lead to malfunctions of the valve so that blood can flow back to the atrium. Patients with a faulty mitral valve function may suffer from fatigue and chest pain. The functionality can be surgically restored, which is often a long and exhaustive intervention. Thorough planning is necessary to ensure a safe and effective surgery. This can be supported by creating pre-operative segmentations of the mitral valve. A post-operative analysis can determine the success of an intervention. This work will combine existing and new ideas to propose a new approach to (semi-)automatically create such valve models. The manual part can guarantee a high quality model and reliability, whereas the automatic part contributes to saving valuable labour time.
The main contributions of the automatic algorithm are an estimated semantic separation of the two leaflets of the mitral valve and an optimization process that is capable of finding a coaptation-line and -area between the leaflets. The segmentation method can perform a fully automatic segmentation of the mitral leaflets if the annulus ring is already given. The intermediate steps of this process will be integrated into a manual segmentation method so a user can guide the whole procedure. The quality of the valve models generated by the method proposed in this work will be measured by comparing them to completely manually segmented models. This will show that commonly used methods to measure the quality of a segmentation are too general and do not suffice to reflect the real quality of a model. Consequently the work at hand will introduce a set of measurements that can qualify a mitral valve segmentation in more detail and with respect to anatomical landmarks. Besides the intra-operative support for a surgeon, a segmented mitral valve provides additional benefits. The ability to patient-specifically obtain and objectively describe the valve anatomy may be the base for future medical research in this field and automation allows to process large data sets with reduced expert dependency. Further, simulation methods that use the segmented models as input may predict the outcome of a surgery.
In this thesis we present an approach to track a RGB-D camera in 6DOF andconstruct 3D maps. We first acquire, register and synchronize RGB and depth images. After preprocessing we extract FAST features and match them between two consecutive frames. By depth projection we regain the z-value for the inlier correspondences. Afterwards we estimate the camera motion by 3D point set alignment between the correspondence set using least-squares. This local motion estimate is incrementally applied to a global transformation. Additionally wernpresent methods to build maps based on point cloud data acquired by a RGB-D camera. For map creation we use the OctoMap framework and optionally create a colored point cloud map. The system is evaluated with the widespread RGB-D benchmark.
Real-time graphics applications are tending to get more realistic and approximate real world illumination gets more reasonable due to improvement of graphics hardware. Using a wide variation of algorithms and ideas, graphics processing units (GPU) can simulate complex lighting situations rendering computer generated imagery with complicated effects such as shadows, refraction and reflection of light. Particularly, reflections are an improvement of realism, because they make shiny materials, e.g. brushed metals, wet surfaces like puddles or polished floors, appear more realistic and reveal information of their properties such as roughness and reflectance. Moreover, reflections can get more complex, depending on the view: a wet surface like a street during rain for example will reflect lights depending on the distance of the viewer, resulting in more streaky reflection, which will look more stretched, if the viewer is locatedrnfarther away from the light source. This bachelor thesis aims to give an overview of the state-of-the-art in terms of rendering reflections. Understanding light is a basic need to understand reflections and therefore a physical model of light and its reflection will be covered in section 2, followed by the motivational section 2.2, that will give visual appealing examples for reflections from the real world and the media. Coming to rendering techniques, first, the main principle will be explained in section 3 followed by a short general view of a wide variety of approaches that try to generate correct reflections in section 4. This thesis will describe the implementation of three major algorithms, that produce plausible local reflections. Therefore, the developed framework is described in section 5, then three major algorithms will be covered, that are common methods in most current game and graphics engines: Screen space reflections (SSR), parallax-corrected cube mapping (PCCM) and billboard reflections (BBR). After describing their functional principle, they will be analysed of their visual quality and the possibilities of their real-time application. Finally they will be compared to each other to investigate the advantages and disadvantages over each other. In conclusion, the gained experiences will be described by summarizing advantages and disadvantages of each technique and giving suggestions for improvements. A short perspective will be given, trying to create a view of upcoming real-time rendering techniques for the creation of reflections as specular effects.