004 Datenverarbeitung; Informatik
Refine
Year of publication
Document Type
- Diploma Thesis (77)
- Study Thesis (76)
- Bachelor Thesis (37)
- Master's Thesis (13)
- Doctoral Thesis (8)
- Part of Periodical (8)
- Conference Proceedings (3)
Language
- German (195)
- English (25)
- Multiple languages (2)
Keywords
- Bildverarbeitung (13)
- Robotik (10)
- Augmented Reality (8)
- Computergraphik (8)
- OpenGL (8)
- Programmierung (5)
- Shader (5)
- Volumendaten (5)
- Android (4)
- Computergrafik (4)
Institute
- Institut für Computervisualistik (222) (remove)
On the recognition of human activities and the evaluation of its imitation by robotic systems
(2023)
This thesis addresses the problem of action recognition through the analysis of human motion and the benchmarking of its imitation by robotic systems.
For our action recognition related approaches, we focus on presenting approaches that generalize well across different sensor modalities. We transform multivariate signal streams from various sensors to a common image representation. The action recognition problem on sequential multivariate signal streams can then be reduced to an image classification task for which we utilize recent advances in machine learning. We demonstrate the broad applicability of our approaches formulated as a supervised classification task for action recognition, a semi-supervised classification task for one-shot action recognition, modality fusion and temporal action segmentation.
For action classification, we use an EfficientNet Convolutional Neural Network (CNN) model to classify the image representations of various data modalities. Further, we present approaches for filtering and the fusion of various modalities on a representation level. We extend the approach to be applicable for semi-supervised classification and train a metric-learning model that encodes action similarity. During training, the encoder optimizes the distances in embedding space for self-, positive- and negative-pair similarities. The resulting encoder allows estimating action similarity by calculating distances in embedding space. At training time, no action classes from the test set are used.
Graph Convolutional Network (GCN) generalized the concept of CNNs to non-Euclidean data structures and showed great success for action recognition directly operating on spatio-temporal sequences like skeleton sequences. GCNs have recently shown state-of-the-art performance for skeleton-based action recognition but are currently widely neglected as the foundation for the fusion of various sensor modalities. We propose incorporating additional modalities, like inertial measurements or RGB features, into a skeleton-graph, by proposing fusion on two different dimensionality levels. On a channel dimension, modalities are fused by introducing additional node attributes. On a spatial dimension, additional nodes are incorporated into the skeleton-graph.
Transformer models showed excellent performance in the analysis of sequential data. We formulate the temporal action segmentation task as an object detection task and use a detection transformer model on our proposed motion image representations. Experiments for our action recognition related approaches are executed on large-scale publicly available datasets. Our approaches for action recognition for various modalities, action recognition by fusion of various modalities, and one-shot action recognition demonstrate state-of-the-art results on some datasets.
Finally, we present a hybrid imitation learning benchmark. The benchmark consists of a dataset, metrics, and a simulator integration. The dataset contains RGB-D image sequences of humans performing movements and executing manipulation tasks, as well as the corresponding ground truth. The RGB-D camera is calibrated against a motion-capturing system, and the resulting sequences serve as input for imitation learning approaches. The resulting policy is then executed in the simulated environment on different robots. We propose two metrics to assess the quality of the imitation. The trajectory metric gives insights into how close the execution was to the demonstration. The effect metric describes how close the final state was reached according to the demonstration. The Simitate benchmark can improve the comparability of imitation learning approaches.
Leichte Sprache (LS, easy-to-read German) is a simplified variety of German. It is used to provide barrier-free texts for a broad spectrum of people, including lowliterate individuals with learning difficulties, intellectual or developmental disabilities (IDD) and/or complex communication needs (CCN). In general, LS authors are proficient in standard German and do not belong to the aforementioned group of people. Our goal is to empower the latter to participate in written discourse themselves. This requires a special writing system whose linguistic support and ergonomic software design meet the target group’s specific needs. We present EasyTalk a system profoundly based on natural language processing (NLP) for assistive writing in an extended variant of LS (ELS). EasyTalk provides users with a personal vocabulary underpinned with customizable communication symbols and supports in writing at their individual level of proficiency through interactive user guidance. The system minimizes the grammatical knowledge needed to produce correct and coherent complex contents by intuitively formulating linguistic decisions. It provides easy dialogs for selecting options from a natural-language paraphrase generator, which provides context-sensitive suggestions for sentence components and correctly inflected word forms. In addition, EasyTalk reminds users to add text elements that enhance text comprehensibility in terms of audience design (e.g., time and place of an event) and improve text coherence (e.g., explicit connectors to express discourse-relations). To tailor the system to the needs of the target group, the development of EasyTalk followed the principles of human-centered design (HCD). Accordingly, we matured the system in iterative development cycles, combined with purposeful evaluations of specific aspects conducted with expert groups from the fields of CCN, LS, and IT, as well as L2 learners of the German language. In a final case study, members of the target audience tested the system in free writing sessions. The study confirmed that adults with IDD and/or CCN who have low reading, writing, and computer skills can write their own personal texts in ELS using EasyTalk. The positive feedback from all tests inspires future long-term studies with EasyTalk and further development of this prototypical system, such as the implementation of a so-called Schreibwerkstatt (writing workshop)