004 Datenverarbeitung; Informatik
Refine
Year of publication
Document Type
- Diploma Thesis (185)
- Bachelor Thesis (162)
- Study Thesis (136)
- Part of Periodical (126)
- Master's Thesis (81)
- Doctoral Thesis (48)
- Conference Proceedings (6)
- Book (1)
- Habilitation (1)
- Report (1)
Language
- German (543)
- English (201)
- Multiple languages (3)
Has Fulltext
- yes (747) (remove)
Keywords
- Bildverarbeitung (13)
- Augmented Reality (10)
- Robotik (10)
- Computergraphik (9)
- Computersimulation (9)
- OpenGL (8)
- Routing (8)
- Semantic Web (8)
- Computerspiel (7)
- Informatik (7)
Institute
- Fachbereich 4 (270)
- Institut für Computervisualistik (222)
- Institut für Informatik (114)
- Institut für Wirtschafts- und Verwaltungsinformatik (104)
- Institut für Management (48)
- Institut für Softwaretechnik (47)
- Institute for Web Science and Technologies (34)
- Institut für Integrierte Naturwissenschaften (4)
- An-Institute (1)
Im Rahmen dieser Arbeit soll eine Methodik erarbeitet werden, die englische, keyword-basierte Anfragen in SPARQL übersetzt und bewertet. Aus allen generierten SPARQL-Queries sollen die relevantesten ermittelt und ein Favorit bestimmt werden. Das Ergebnis soll in einer Nutzerevaluation bewertet werden.
Traditional Driver Assistance Systems (DAS) like for example Lane Departure Warning Systems or the well-known Electronic Stability Program have in common that their system and software architecture is static. This means that neither the number and topology of Electronic Control Units (ECUs) nor the presence and functionality of software modules changes after the vehicles leave the factory.
However, some future DAS do face changes at runtime. This is true for example for truck and trailer DAS as their hardware components and software entities are spread over both parts of the combination. These new requirements cannot be faced by state-of-the-art approaches of automotive software systems. Instead, a different technique of designing such Distributed Driver Assistance Systems (DDAS) needs to be developed. The main contribution of this thesis is the development of a novel software and system architecture for dynamically changing DAS using the example of driving assistance for truck and trailer. This architecture has to be able to autonomously detect and handle changes within the topology. In order to do so, the system decides which degree of assistance and which types of HMI can be offered every time a trailer is connected or disconnected. Therefore an analysis of the available software and hardware components as well as a determination of possible assistance functionality and a re-configuration of the system take place. Such adaptation can be granted by the principles of Service-oriented Architecture (SOA). In this architectural style all functionality is encapsulated in self-contained units, so-called Services. These Services offer the functionality through well-defined interfaces whose behavior is described in contracts. Using these Services, large-scale applications can be built and adapted at runtime. This thesis describes the research conducted in achieving the goals described by introducing Service-oriented Architectures into the automotive domain. SOA deals with the high degree of distribution, the demand for re-usability and the heterogeneity of the needed components.
It also applies automatic re-configuration in the event of a system change. Instead of adapting one of the frameworks available to this scenario, the main principles of Service-orientation are picked up and tailored. This leads to the development of the Service-oriented Driver Assistance (SODA) framework, which implements the benefits of Service-orientation while ensuring compatibility and compliance to automotive requirements, best-practices and standards. Within this thesis several state-of-the-art Service-oriented frameworks are analyzed and compared. Furthermore, the SODA framework as well as all its different aspects regarding the automotive software domain are described in detail. These aspects include a well-defined reference model that introduces and relates terms and concepts and defines an architectural blueprint. Furthermore, some of the modules of this blueprint such as the re-configuration module and the Communication Model are presented in full detail. In order to prove the compliance of the framework regarding state-of-the-art automotive software systems, a development process respecting today's best practices in automotive design procedures as well as the integration of SODA into the AUTOSAR standard are discussed. Finally, the SODA framework is used to build a full-scale demonstrator in order to evaluate its performance and efficiency.
Thematik dieser Arbeit ist das dreidimensionale Image-Warping für diffuse und reflektierende Oberflächen. Das Warpingverfahren für den reflektierenden Fall gibt es erst seit 2014. Bei diesem neuen Algorithmus treten Artefakte auf, sobald ein Bild für einen alternativen Blickwinkel auf eine sehr unebene Fläche berechnet werden soll.
In dieser Arbeit wird der Weg von einem Raytracer, der die Eingabetexturen erzeugt, über das Warpingverfahren für beide Arten der Oberflächen, bis zur Optimierung des Reflective-Warping-Verfahrens erarbeitet. Schließlich werden die Ergebnisse der Optimierung bewertet und in den aktuellen sowie zukünftigen Stand der Technik eingeordnet.
In der vorliegenden Arbeit sollen weltweit herrschende inhaltliche Ausprägungen und Schwerpunkte des Themengebiets "BMI" bzw. "GMI" mit Hilfe des Literatur-Reviews akademischer Artikel herausgearbeitet werden. Die festgestellten Beziehungen und Zusammenhänge sollen visualisiert und lokalisiert werden, um eine globale Sicht über das Thema herzustellen. Unter anderem sollen die in das finale Set aufgenommenen Artikel auf eine Korrelation zwischen BMI und Controlling bzw. Management hin überprüft werden. Als letzter Schritt soll eine Ableitung möglicher Forschungslücken unternommen werden.
Nutzung von Big Data im Marketing : theoretische Grundlagen, Anwendungsfelder und Best-Practices
(2015)
Die zunehmende Digitalisierung des Alltags und die damit verbundene omnipräsente Datengenerierung bieten für Unternehmen und insbesondere Marketingabteilungen die Chance, Informationen in bisher ungekannter Fülle über ihre Kunden und Produkte zu erhalten. Die Gewinnung solcher Informationen aus riesigen Datenmengen, die durch neue Technologien ermöglicht wird, hat sich dabei unter dem Begriff Big Data etabliert.
Die vorliegende Arbeit analysiert diese Entwicklung im Hinblick auf das Potenzial für die unternehmerische Disziplin des Marketings. Dazu werden die theoretischen Grundlagen des Einsatzes von Big Data im Marketing identifiziert und daraus Anwendungsfelder und Best-Practice-Lösungen abgeleitet. Die Untersuchung basiert auf einer Literaturanalyse zu dem Thema Big Data Marketing, welche neben verschiedenen Studien und Befragungen auch Expertenmeinungen und Zukunftsprognosen einschließt. Die Literatur wird dabei zunächst auf die theoretischen Grundlagen des Konstrukts Big Data analysiert.
Anschließend wird die Eignung von Big Data Lösungen für den Einsatz in Unternehmen geprüft, bevor die Anwendung im Bereich des Marketings konkretisiert und analysiert wird. Es wurde dabei festgestellt, dass anhand der theoretischen Aspekte von Big Data eine starke Eignung für den Einsatz im Rahmen des Marketings besteht. Diese zeichnet sich vor allem durch die detaillierten Informationen über Verhaltensmuster von Kunden und ihre Kaufentscheidungen aus. Weiterhin wurden potenzielle Anwendungsfelder identifiziert, welche besonders im Bereich der Kundenorientierung und der Marktforschung liegen. Im Hinblick auf Best-Practice-Lösungen konnte ein grober Leitfaden für die Integration von Big Data in die Unternehmensorganisation entwickelt werden. Abschließend wurde festgehalten, dass das Thema Big Data eine hohe Relevanz für das Marketing aufweist und dies in der Zukunft maßgeblich mitbestimmen wird.
Today, augmented reality is becoming more and more important in several areas like industrial sectors, medicine, or tourism. This gain of importance can easily be explained by its powerful extension of real world content. Therefore, augmented realty became a way to explain and enhance the real world information. Yet, to create a system which can enhance a scene with additional information, the relation between the system and the real world must be known. In order to establish this relationship a commonly used method is optical tracking. The system calculates its relation to the real world from camera images. To do so, a reference which is known is needed in the scene to serve as an orientation. Today, this is mostly a 2D-marker or a 2D-texture. These are placed in the real world scenery to serve as a reference. But, this is an intrusion in the scene. That is why it is desirable that the system works without such an additional aid. An strategy without manipulating the scene is object-tracking. In this approach, any object from the scene can be used as a reference for the system. As an object is far more complex than a marker, it is harder for the system to establish its relationship with the real world. That is why most methods for 3D-object-tracking reduce the object by not using the whole object as reference. The focus of this thesis is to research how a whole object can be used as a reference in a way that either the system or the camera can be moved in any 360 degree angle around the object without loosing the relation to the real world. As a basis the augmented reality framework, the so called VisionLib, is used. Extensions to this system for 360 degree tracking are implemented in different ways and analyzed in the scope of this work. Also, the different extensions are compared. The best results were achieved by improving the reinitialization process. With this extension, current camera images of the scene are given to the system. With the hek of these images, the system can calculate the relation to the real world faster in case the relation went missing.
Code package managers like Cabal track dependencies between packages. But packages rarely use the functionality that their dependencies provide. This leads to unnecessary compilation of unused parts and to speculative conflicts between package versions where there are no conflicts. In two case studies we show how relevant these two problems are. We then describe how we could avoid them by tracking dependencies not between packages but between individual code fragments.
Einfluss eines Ausrichtungswerkzeugs auf die Bedienbarkeit in unbeaufsichtigten Eyetrackingsystemen
(2015)
Eye gaze trackers are devices that can estimate the direction of gaze of a person. Among usability testing eye tracking also allows persons with decreased limb mobility to control or to interact with the computer. The quality and availability of eye tracking equipment has been increasing while costs have been decreasing. This development leads to entering new markets by using eye tracking as an additional input dimension for a variety of applications. Up to now eye tracking has been supervised by qualified experts, who assured that the important conditions like position in front of the eye tracking device, calibration and light conditions has been kept, while using.
This thesis examines an adjustment tool, which is helping the user to adjust in front of the eye tracker and helping to keep this position during the experiment. Furthermore the accuracy while moving the head has been analysed. In this experiment an remote eye gaze tracker has been used to control a game character in the video game called 'Schau Genau!'. The goal was to determine whether the game is playable without the barrier of adjusting and calibration. The results show that adjusting in front of an eye tracker is not a problem, keeping this position is. Small changes of the head position after the calibration process leads to a lack of accuracy. Giving up the calibration and using someone else calibration shows way bigger deviation. Additional head movement increases error rate and makes controlling more difficult.
The present work introduces a rigid-body physics engine, focusing on the collision detection by GPU. The increasing performance and accessibility of modern graphics cards ensures that they can be also used for algorithms that are meant not only for imaging. This advantage is used to implement an efficient collision detection based on particles. The performance differences between CPU and GPU are presented by using a test environment.
Software systems are often developed as a set of variants to meet diverse requirements. Two common approaches to this are "clone-and-owning" and software product lines. Both approaches have advantages and disadvantages. In previous work we and collaborators proposed an idea which combines both approaches to manage variants, similarities, and cloning by using a virtual platform and cloning-related operators.
In this thesis, we present an approach for aggregating essential metadata to enable a propagate operator, which implements a form of change propagation. For this we have developed a system to annotate code similarities which were extracted throughout the history of a software repository. The annotations express similarity maintenance tasks, which can then either be executed automatically by propagate or have to be performed manually by the user. In this work we outline the automated metadata extraction process and the system for annotating similarities; we explain how the implemented system can be integrated into the workflow of an existing version control system (Git); and, finally, we present a case study using the 101haskell corpus of variants.
Die Arbeitsgruppe Echtzeitsysteme an der Universität Koblenz beschäftigt sich seit mehreren Jahren mit der Thematik autonomes und assistiertes Fahren. Eine große Herausforderung stellen in diesem Zusammenhang mehrgliedrige Fahrzeuge dar, deren Steuerung für den Fahrer während der Rückwärtsfahrt sehr anspruchsvoll ist. Um präzise Manöver zu ermöglichen, können elektronische Fahrerassistenzsysteme zum Einsatz kommen. Im Rahmen vorhergehender Arbeiten sind bereits einige Prototypen entstanden, von denen jedoch keiner eine geeignete Lösung für moderne, zweiachsige Anhänger darstellt. Im Rahmen dieser Arbeit wurde ein prototypisches Fahrerassistenzsystem entwickelt, wobei es noch weiterer Forschungs- und Entwicklungsarbeit bedarf, um das System straßentauglich zu machen.
In this work a framework is developed that is used to create an evaluation scheme for the evaluation of text processing tools. The evaluation scheme is developed using a model-dependent software evaluation approach and the focus of the model-dependent part is the text-processing process which is derived from the Conceptual Analysis Process developed in the GLODERS project. As input data a German court document is used containing two incidents of extortion racketeering which happened in 2011 and 2012. The evaluation of six different tools shows that one tool offers great results for the given dataset when it is compared to manual results. It is able to identify and visualize relations between concepts without any additional manual work. Other tools also offer good results with minor drawbacks. The biggest drawback for some tools is the unavailability of models for the German language. They can perform automated tasks only on English documents. Nonetheless some tools can be enhanced by self-written code which allows users with development experience to apply additional methods.
Durch eine systematische Literaturanalyse sollen die wichtigsten Aspekte des Phänomens Crowdsourcing abgedeckt werden. Da die Summe an Forschungsfragen relativ breit gefächert ist, soll der Fokus der Arbeit auf die im Folgenden aufgelisteten Fragen gelegt werden: Was ist unter dem Begriff Crowdsourcing gezielt zu verstehen? Wie lässt sich das Phänomen Crowdsourcing von anderen angrenzenden Konzepten trennen? Wo liegen die Gemeinsamkeiten und wesentlichen Unterschiede zwischen den einzelnen Konzepten? Welche Ausprägungsformen von Crowdsourcing sind in Theorie und Praxis vorzufinden? In welchen Bereichen kommt Crowdsourcing zum Einsatz? Welche Unternehmen setzen Crowdsourcing erfolgreich um? Welche Plattformen zur Unterstützung von Crowdsourcing sind vorhanden? Welche Ziele bzw. Ergebnisse sollen mit dem Einsatz von Crowdsourcing erreicht bzw. erzielt werden? Wie läuft der Crowdsourcing-Prozess ab und in welche Phasen lässt sich dieser unterteilen? Wie sieht die Wertschöpfung durch Crowdsourcing (a) allgemein und (b) speziell für Unternehmen aus? Welche Chancen und Potenziale sowie Risiken und Grenzen entstehen dabei den Unternehmen? Was lässt sich in Zukunft im Bereich des Crowdsourcing noch verbessern, das heißt in welchen Bereichen besteht noch Forschungsbedarf?
In this thesis we present an approach to track a RGB-D camera in 6DOF andconstruct 3D maps. We first acquire, register and synchronize RGB and depth images. After preprocessing we extract FAST features and match them between two consecutive frames. By depth projection we regain the z-value for the inlier correspondences. Afterwards we estimate the camera motion by 3D point set alignment between the correspondence set using least-squares. This local motion estimate is incrementally applied to a global transformation. Additionally wernpresent methods to build maps based on point cloud data acquired by a RGB-D camera. For map creation we use the OctoMap framework and optionally create a colored point cloud map. The system is evaluated with the widespread RGB-D benchmark.
This thesis presents an approach to optimizing the computation of soft shadows from area lights. The light source is sampled uniformly by traversing shadow rays as packets through an N-tree. This data structure stores an additional line space for every node. A line space stores precomputed information about geometry inside of shafts from one to another side of the node. This visibility information is used to terminate a ray. Additionally the graphics processing unit (short GPU) is used to speed up the computations through parallelism. The scene is rendered with OpenGL and the shadow value is computed on the GPU for each pixel. Evaluating the implementation shows a performance gain of 86% by comparison to the CPU, if using the GPU implementation. Using the line space instead of triangle intersections also increases the performance. The implementation provides good scaling with an increasing amount of triangles and has no visual disadvantages for many rays.
This habilitation thesis collects works addressing several challenges on handling uncertainty and inconsistency in knowledge representation. In particular, this thesis contains works which introduce quantitative uncertainty based on probability theory into abstract argumentation frameworks. The formal semantics of this extension is investigated and its application for strategic argumentation in agent dialogues is discussed. Moreover, both the computational as well as the meaningfulness of approaches to analyze inconsistencies, both in classical logics as well as logics for uncertain reasoning is investigated. Finally, this thesis addresses the implementation challenges for various kinds of knowledge representation formalisms employing any notion of inconsistency tolerance or uncertainty.
The goal of this thesis is to create and develop a concept for a mobile city guide combined with game-based contents.
The application is intented to support flexible and independent exploration of the city of Koblenz.
Based on the geographical data, historical information for and interesting stories of various places were provided in this application. These informations are combined with playful elements in order to create a motivating concept.
Therefore, related approaches were examined and, combined with own ideas, a new concept has been developed. This concept has been prototypically implemented as an Android application and afterwards evaluated by 15 test persons. A questionnaire was used to examine the operability, the motivation of game patterns and the additional value of the application.
Proceedings des FWS 2015
(2016)
Die Aufnahme, Verarbeitung und Analyse farbiger bzw. mehrkanaliger Bilder gewinnt seit Jahren ständig an Bedeutung. Diese Entwicklung wird durch die verbesserten technischen Möglichkeiten und die stetig steigenden Ansprüche aus den vielfältigen Anwendungsfeldern in Industrie, Medizin, Umwelt und Medien befördert. Diesem Trend folgend wurde in Koblenz 1995 erstmals der Workshop Farbbildverarbeitung durchgeführt und hat sich seitdem als jährlich stattfindende Veranstaltung etabliert. Als Veranstaltung der German ColorGroup bietet der Workshop ein Diskussionsforum für Forscher, Entwickler und Anwender, das sich den Problemen der Farbtheorie, Farbmessung, Farbbildaufnahme und spektralen Bildgewinnung ("hyper-spectral imaging") genauso wie der Entwicklung von neuen Methoden und Algorithmen zur Verarbeitung und Analyse von Farbbildern und mehrkanaligen (spektroskopischen) Bilddaten widmet. Ebenso nehmen Fragestellungen der farbtreuen Bildreproduktion auf verschiedenen Ausgabemedien wie auch die Nutzung von Methoden und Verfahren der Farbbildverarbeitung im Rahmen der industriellen Qualitätskontrolle sowie in Robotik und Automatisierung gebührenden Platz ein.
“Did I say something wrong?” A word-level analysis of Wikipedia articles for deletion discussions
(2016)
This thesis focuses on gaining linguistic insights into textual discussions on a word level. It was of special interest to distinguish messages that constructively contribute to a discussion from those that are detrimental to them. Thereby, we wanted to determine whether “I”- and “You”-messages are indicators for either of the two discussion styles. These messages are nowadays often used in guidelines for successful communication. Although their effects have been successfully evaluated multiple times, a large-scale analysis has never been conducted. Thus, we used Wikipedia Articles for Deletion (short: AfD) discussions together with the records of blocked users and developed a fully automated creation of an annotated data set. In this data set, messages were labelled either constructive or disruptive. We applied binary classifiers to the data to determine characteristic words for both discussion styles. Thereby, we also investigated whether function words like pronouns and conjunctions play an important role in distinguishing the two. We found that “You”-messages were a strong indicator for disruptive messages which matches their attributed effects on communication. However, we found “I”-messages to be indicative for disruptive messages as well which is contrary to their attributed effects. The importance of function words could neither be confirmed nor refuted. Other characteristic words for either communication style were not found. Yet, the results suggest that a different model might represent disruptive and constructive messages in textual discussions better.
Der Fachbereich 4 (Informatik) besteht aus fünfundzwanzig Arbeitsgruppen unter der Leitung von Professorinnen und Professoren, die für die Forschung und Lehre in sechs Instituten zusammenarbeiten.
In jedem Jahresbericht stellen sich die Arbeitsgruppen nach einem einheitlichen Muster dar, welche personelle Zusammensetzung sie haben, welche Projekte in den Berichtszeitraum fallen und welche wissenschaftlichen Leistungen erbracht wurden. In den folgenden Kapiteln werden einzelne Parameter aufgeführt, die den Fachbereich in quantitativer Hinsicht, was Drittmitteleinwerbungen, Abdeckung der Lehre, Absolventen oder Veröffentlichungen angeht, beschreiben.
Der Fachbereich 4 (Informatik) besteht aus fünfundzwanzig Arbeitsgruppen unter der Leitung von Professorinnen und Professoren, die für die Forschung und Lehre in sechs Instituten zusammenarbeiten.
In jedem Jahresbericht stellen sich die Arbeitsgruppen nach einem einheitlichen Muster dar, welche personelle Zusammensetzung sie haben, welche Projekte in den Berichtszeitraum fallen und welche wissenschaftlichen Leistungen erbracht wurden. In den folgenden Kapiteln werden einzelne Parameter aufgeführt, die den Fachbereich in quantitativer Hinsicht, was Drittmitteleinwerbungen, Abdeckung der Lehre, Absolventen oder Veröffentlichungen angeht, beschreiben.
The content aggregator platform Reddit has established itself as one of the most popular websites in the world. However, scientific research on Reddit is hindered as Reddit allows (and even encourages) user anonymity, i.e., user profiles do not contain personal information such as the gender. Inferring the gender of users in large-scale could enable the analysis of gender-specific areas of interest, reactions to events, and behavioral patterns. In this direction, this thesis suggests a machine learning approach of estimating the gender of Reddit users. By exploiting specific conventions in parts of the website, we obtain a ground truth for more than 190 million comments of labeled users. This data is then used to train machine learning classifiers to use them to gain insights about the gender balance of particular subreddits and the platform in general. By comparing a variety of different approaches for classification algorithm, we find that character-level convolutional neural network achieves performance with an 82.3% F1 score on a task of predicting a gender of a user based on his/her comments. The score surpasses 85% mark for frequent users with more than 50 comments. Furthermore, we discover that female users are less active on Reddit platform, they write fewer comments and post in fewer subreddits on average, when compared to male users.
Retrospektive Analyse der Ausbreitung und dynamische Erkennung von Web-Tracking durch Sandboxing
(2018)
Aktuelle quantitative Analysen von Web-Tracking bieten keinen umfassenden Überblick über dessen Entstehung, Ausbreitung und Entwicklung. Diese Arbeit ermöglicht durch Auswertung archivierter Webseiten eine rückblickende Erfassung der Entstehungsgeschichte des Web-Trackings zwischen den Jahren 2000 und 2015. Zu diesem Zweck wurde ein geeignetes Werkzeug entworfen, implementiert, evaluiert und zur Analyse von 10000 Webseiten eingesetzt. Während im Jahr 2005 durchschnittlich 1,17 Ressourcen von Drittparteien eingebettet wurden, zeigt sich ein Anstieg auf 6,61 in den darauffolgenden 10 Jahren. Netzwerkdiagramme visualisieren den Trend zu einer monopolisierten Netzstruktur, in der bereits ein einzelnes Unternehmen 80 % der Internetnutzung überwachen kann.
Trotz vielfältiger Versuche, dieser Entwicklung durch technische Maßnahmen entgegenzuwirken, erweisen sich nur wenige Selbst- und Systemschutzmaßnahmen als wirkungsvoll. Diese gehen häufig mit einem Verlust der Funktionsfähigkeit einer Webseite oder mit einer Einschränkung der Nutzbarkeit des Browsers einher. Mit der vorgestellten Studie wird belegt, dass rechtliche Vorschriften ebenfalls keinen hinreichenden Schutz bieten. An Webauftritten von Bildungseinrichtungen werden Mängel bei Erfüllung der datenschutzrechtlichen Pflichten festgestellt. Diese zeigen sich durch fehlende, fehlerhafte oder unvollständige Datenschutzerklärungen, deren Bereitstellung zu den Informationspflichten eines Diensteanbieters gehören.
Die alleinige Berücksichtigung klassischer Tracker ist nicht ausreichend, wie mit einer weiteren Studie nachgewiesen wird. Durch die offene Bereitstellung funktionaler Webseitenbestandteile kann ein Tracking-Unternehmen die Abdeckung von 38 % auf 61 % erhöhen. Diese Situation wird durch Messungen von Webseiten aus dem Gesundheitswesen belegt und aus technischer sowie rechtlicher Perspektive bewertet.
Bestehende systemische Werkzeuge zum Erfassen von Web-Tracking verwenden für ihre Messung die Schnittstellen der Browser. In der vorliegenden Arbeit wird mit DisTrack ein Framework zur Web-Tracking-Analyse vorgestellt, welches eine Sandbox-basierte Messmethodik verfolgt. Dies ist eine Vorgehensweise, die in der dynamischen Schadsoftwareanalyse erfolgreich eingesetzt wird und sich auf das Erkennen von Seiteneffekten auf das umliegende System spezialisiert. Durch diese Verhaltensanalyse, die unabhängig von den Schnittstellen des Browsers operiert, wird eine ganzheitliche Untersuchung des Browsers ermöglicht. Auf diese Weise können systemische Schwachstellen im Browser aufgezeigt werden, die für speicherbasierte Web-Tracking-Verfahren nutzbar sind.
This thesis addresses the automated identification and localization of a time-varying number of objects in a stream of sensor data. The problem is challenging due to its combinatorial nature: If the number of objects is unknown, the number of possible object trajectories grows exponentially with the number of observations. Random finite sets are a relatively new theory that has been developed to derive at principled and efficient approximations. It is based around set-valued random variables that contain an unknown number of elements which appear in arbitrary order and are themselves random. While extensively studied in theory, random finite sets have not yet become a leading paradigm in practical computer vision and robotics applications. This thesis explores random finite sets in visual tracking applications. The first method developed in this thesis combines set-valued recursive filtering with global optimization. The problem is approached in a min-cost flow network formulation, which has become a standard inference framework for multiple object tracking due to its efficiency and optimality. A main limitation of this formulation is a restriction to unary and pairwise cost terms. This circumstance makes integration of higher-order motion models challenging. The method developed in this thesis approaches this limitation by application of a Probability Hypothesis Density filter. The Probability Hypothesis Density filter was the first practically implemented state estimator based on random finite sets. It circumvents the combinatorial nature of data association itself by propagation of an object density measure that can be computed efficiently, without maintaining explicit trajectory hypotheses. In this work, the filter recursion is used to augment measurements with an additional hidden kinematic state to be used for construction of more informed flow network cost terms, e.g., based on linear motion models. The method is evaluated on public benchmarks where a considerate improvement is achieved compared to network flow formulations that are based on static features alone, such as distance between detections and appearance similarity. A second part of this thesis focuses on the related task of detecting and tracking a single robot operator in crowded environments. Different from the conventional multiple object tracking scenario, the tracked individual can leave the scene and later reappear after a longer period of absence. Therefore, a re-identification component is required that picks up the track on reentrance. Based on random finite sets, the Bernoulli filter is an optimal Bayes filter that provides a natural representation for this type of problem. In this work, it is shown how the Bernoulli filter can be combined with a Probability Hypothesis Density filter to track operator and non-operators simultaneously. The method is evaluated on a publicly available multiple object tracking dataset as well as on custom sequences that are specific to the targeted application. Experiments show reliable tracking in crowded scenes and robust re-identification after long term occlusion. Finally, a third part of this thesis focuses on appearance modeling as an essential aspect of any method that is applied to visual object tracking scenarios. Therefore, a feature representation that is robust to pose variations and changing lighting conditions is learned offline, before the actual tracking application. This thesis proposes a joint classification and metric learning objective where a deep convolutional neural network is trained to identify the individuals in the training set. At test time, the final classification layer can be stripped from the network and appearance similarity can be queried using cosine distance in representation space. This framework represents an alternative to direct metric learning objectives that have required sophisticated pair or triplet sampling strategies in the past. The method is evaluated on two large scale person re-identification datasets where competitive results are achieved overall. In particular, the proposed method better generalizes to the test set compared to a network trained with the well-established triplet loss.
This paper describes the robot Lisa used by team
homer@UniKoblenz of the University of Koblenz Landau, Germany, for the participation at the RoboCup@Home 2016 in Leipzig, Germany. A special focus is put on novel system components and the open source contributions of our team. We have released packages for object recognition, a robot face including speech synthesis, mapping and navigation, speech recognition interface via android and a GUI. The packages are available (and new packages will be released) on http://wiki.ros.org/agas-ros-pkg.
Data flow models in the literature are often very fine-grained, which transfers to the data flow analysis performed on them and thus leads to a decrease in the analysis' understandability. Since a data flow model, which abstracts from the majority of implementation details of the program modeled, allows for potentially easier to understand data flow analyses, this master thesis deals with the specification and construction of a highly abstracted data flow model and the application of data flow analyses on this model. The model and the analyses performed on it have been developed in a test-driven manner, so that a wide range of possible data flow scenarios could be covered. As a concrete data flow analysis, a static security check in the form of a detection of insufficient user input sanitization has been performed. To date, there's no data flow model on a similarly high level of abstraction. The proposed solution is therefore unique and facilitates developers without expertise in data flow analysis to perform such analyses.
Software systems have an increasing impact on our daily lives. Many systems process sensitive data or control critical infrastructure. Providing secure software is therefore inevitable. Such systems are rarely being renewed regularly due to the high costs and effort. Oftentimes, systems that were planned and implemented to be secure, become insecure because their context evolves. These systems are connected to the Internet and therefore also constantly subject to new types of attacks. The security requirements of these systems remain unchanged, while, for example, discovery of a vulnerability of an encryption algorithm previously assumed to be secure requires a change of the system design. Some security requirements cannot be checked by the system’s design but only at run time. Furthermore, the sudden discovery of a security violation requires an immediate reaction to prevent a system shutdown. Knowledge regarding security best practices, attacks, and mitigations is generally available, yet rarely integrated part of software development or covering evolution.
This thesis examines how the security of long-living software systems can be preserved taking into account the influence of context evolutions. The goal of the proposed approach, S²EC²O, is to recover the security of model-based software systems using co-evolution.
An ontology-based knowledge base is introduced, capable of managing common, as well as system-specific knowledge relevant to security. A transformation achieves the connection of the knowledge base to the UML system model. By using semantic differences, knowledge inference, and the detection of inconsistencies in the knowledge base, context knowledge evolutions are detected.
A catalog containing rules to manage and recover security requirements uses detected context evolutions to propose potential co-evolutions to the system model which reestablish the compliance with security requirements.
S²EC²O uses security annotations to link models and executable code and provides support for run-time monitoring. The adaptation of running systems is being considered as is round-trip engineering, which integrates insights from the run time into the system model.
S²EC²O is amended by prototypical tool support. This tool is used to show S²EC²O’s applicability based on a case study targeting the medical information system iTrust.
This thesis at hand contributes to the development and maintenance of long-living software systems, regarding their security. The proposed approach will aid security experts: It detects security-relevant changes to the system context, determines the impact on the system’s security and facilitates co-evolutions to recover the compliance with the security requirements.
The mitral valve is one of four human heart valves. It is located in the left heart and acts as a unidirectional passageway for blood between the left atrium and the left ventricle. A correctly functioning mitral valve prevents a backflow of blood into the pulmonary circulation (lungs) and thus constitutes a vital part of the cardiac cycle. Pathologies of the mitral valve can manifest in a variety of symptoms with severity ranging from chest pain and fatigue to pulmonary edema (fluid accumulation in the tissue and air space of lungs), which may ultimately cause respiratory failure.
Malfunctioning mitral valves can be restored through complex surgical interventions, which greatly benefit from intensive planning and pre-operative analysis. Visualization techniques provide a possibility to enhance such preparation processes and can also facilitate post-operative evaluation. The work at hand extends current research in this field, building upon patient-specific mitral valve segmentations developed at the German Cancer Research Center, which result in triangulated 3D models of the valve surface. The core of this work will be the construction of a 2D-view of these models through global parameterization, a method that can be used to establish a bijective mapping between a planar parameter domain and a surface embedded in higher dimensions.
A flat representation of the mitral valve provides physicians with a view of the whole surface at once, similar to a map. This allows assessment of the valve's area and shape without the need for different viewing angles. Parts of the valve that are occluded by geometry in 3D become visible in 2D.
An additional contribution of this work will be the exploration of different visualizations of the 3D and 2D mitral valve representations. Features of the valve can be highlighted by associating them with specified colors, which can for instance directly convey pathology indicators.
Quality and effectiveness of the proposed methods were evaluated through a survey conducted at the Heidelberg University Hospital.
Diese Arbeit soll das von Dietz und Oppermann entwickelte Planspiel „Datenschutz 2.0“ an den heutigen Alltag der Schüler anpassen, die Benutzung in der Sekundarstufe II ermöglichen und die technischen und gesetzlichen Problematiken des Planspiels beheben. Das mit dem Planspiel aufgegriffene Thema Datenschutz ist im rheinland-pfälzischen Informatik-Lehrplan für die Sekundarstufe II verankert. Hier wird der Begriff Datenschutz in der Reihe „Datenerhebung unter dem Aspekt Datenschutz beurteilen“ genannt. Jedoch werden in dem Planspiel keine Daten erhoben, sondern die selbst hinterlassenen Datenspuren untersucht. Diese Form des Datenschutzes ist im Grundkurs in der vorgeschlagenen Reihe „Datensicherheit unter der Berücksichtigung kryptologischer Verfahren erklären und beachten“ unter dem Thema Kommunikation in Rechnernetzen zu finden. Im Leistungskurs steht die Datensicherheit in gleichbenannter Reihe und Thema und in der Reihe „Datenerhebung unter dem Aspekt Datenschutz beurteilen“ im Thema Wechselwirkung zwischen Informatiksysteme, Individuum und Gesellschaft.
The goal of this thesis is to create a recommender system (RS) for business processes, based on the existing ProM plugin RegPFA. To accomplish this task, firstly an interface must be created that sets up and expands a database receiving probabilistic finite automata (PFA) created by RegPFA in tsml format as input. Secondly, a Java program must be designed that uses said database to recommend the process elements that are most likely to follow a given sequence of process elements.
Abstract
This bachelor thesis delivers a comprehensive overview of the topic Internet of Things (IoT). With the help of a first literature review, important characteristics, architectures, and properties have been identified. The main aim of this bachelor thesis is to determine whether the use of IoT in the transport of food, considering the compliance with the cold chain, can provide advantages for companies to reduce food waste. For this purpose, a second literature review has been carried out with food transport systems without the use, as well as with the use of IoT. Based on the literature review, it is possible at the end to determine a theoretical ‘ideal’ system for food transport in refrigerated trucks. The respective used technologies are also mentioned. The findings of several authors have shown that often significant improvements can be achieved in surveillance, transport in general, or traceability of food, and ultimately food waste can be reduced. However, benefits can also be gained using new non-IoT-based technologies. Thus, the main knowledge of this bachelor thesis is that a theoretical ‘ideal’ transport system contains a sensible combination of technologies with and without IoT. This system includes the use of a Wireless Sensor Network (WSN) for real-time food monitoring, as well as an alarm function when the temperature exceeds a maximum. Real-time monitoring with GPS coupled with a monitoring center to prevent traffic jams is another task. Smart and energy-efficient packaging, and finally the use of the new supercooling-technology, make the system significantly more efficient in reducing food waste. These highlights, that when choosing a transport system, which is as efficient and profitable as possible for food with refrigerated transport, companies need not just rely on the use of IoT. On this basis, it is advisable to combine the systems and technologies used so far with IoT in order to avoid as much food waste as possible.
Current political issues are often reflected in social media discussions, gathering politicians and voters on common platforms. As these can affect the public perception of politics, the inner dynamics and backgrounds of such debates are of great scientific interest. This thesis takes user generated messages from an up-to-date dataset of considerable relevance as Time Series, and applies a topic-based analysis of inspiration and agenda setting to it. The Institute for Web Science and Technologies of the University Koblenz-Landau has collected Twitter data generated beforehand by candidates of the European Parliament Election 2019. This work processes and analyzes the dataset for various properties, while focusing on the influence of politicians and media on online debates. An algorithm to cluster tweets into topical threads is introduced. Subsequently, Sequential Association Rules are mined, yielding wide array of potential influence relations between both actors and topics. The elaborated methodology can be configured with different parameters and is extensible in functionality and scope of application.
Data-minimization and fairness are fundamental data protection requirements to avoid privacy threats and discrimination. Violations of data protection requirements often result from: First, conflicts between security, data-minimization and fairness requirements. Second, data protection requirements for the organizational and technical aspects of a system that are currently dealt with separately, giving rise to misconceptions and errors. Third, hidden data correlations that might lead to influence biases against protected characteristics of individuals such as ethnicity in decision-making software. For the effective assurance of data protection needs,
it is important to avoid sources of violations right from the design modeling phase. However, a model-based approach that addresses the issues above is missing.
To handle the issues above, this thesis introduces a model-based methodology called MoPrivFair (Model-based Privacy & Fairness). MoPrivFair comprises three sub-frameworks: First, a framework that extends the SecBPMN2 approach to allow detecting conflicts between security, data-minimization and fairness requirements. Second, a framework for enforcing an integrated data-protection management throughout the development process based on a business processes model (i.e., SecBPMN2 model) and a software architecture model (i.e., UMLsec model) annotated with data protection requirements while establishing traceability. Third, the UML extension UMLfair to support individual fairness analysis and reporting discriminatory behaviors. Each of the proposed frameworks is supported by automated tool support.
We validated the applicability and usability of our conflict detection technique based on a health care management case study, and an experimental user study, respectively. Based on an air traffic management case study, we reported on the applicability of our technique for enforcing an integrated data-protection management. We validated the applicability of our individual fairness analysis technique using three case studies featuring a school management system, a delivery management system and a loan management system. The results show a promising outlook on the applicability of our proposed frameworks in real-world settings.
Nowadays, almost any IT system involves personal data processing. In
such systems, many privacy risks arise when privacy concerns are not
properly addressed from the early phases of the system design. The
General Data Protection Regulation (GDPR) prescribes the Privacy by
Design (PbD) principle. As its core, PbD obliges protecting personal
data from the onset of the system development, by effectively
integrating appropriate privacy controls into the design. To
operationalize the concept of PbD, a set of challenges emerges: First, we need a basis to define privacy concerns. Without such a basis, we are not able to verify whether personal data processing is authorized. Second, we need to identify where precisely in a system, the controls have to be applied. This calls for system analysis concerning privacy concerns. Third, with a view to selecting and integrating appropriate controls, based on the results of system analysis, a mechanism to identify the privacy risks is required. Mitigating privacy risks is at the core of the PbD principle. Fourth, choosing and integrating appropriate controls into a system are complex tasks that besides risks, have to consider potential interrelations among privacy controls and the costs of the controls.
This thesis introduces a model-based privacy by design methodology to handle the above challenges. Our methodology relies on a precise definition of privacy concerns and comprises three sub-methodologies: model-based privacy analysis, modelbased privacy impact assessment and privacy-enhanced system design modeling. First, we introduce a definition of privacy preferences, which provides a basis to specify privacy concerns and to verify whether personal data processing is authorized. Second, we present a model-based methodology to analyze a system model. The results of this analysis denote a set of privacy design violations. Third, taking into account the results of privacy analysis, we introduce a model-based privacy impact assessment methodology to identify concrete privacy risks in a system model. Fourth, concerning the risks, and taking into account the interrelations and the costs of the controls, we propose a methodology to select appropriate controls and integrate them into a system design. Using various practical case studies, we evaluate our concepts, showing a promising outlook on the applicability of our methodology in real-world settings.
Social media platforms such as Twitter or Reddit allow users almost unrestricted access to publish their opinions on recent events or discuss trending topics. While the majority of users approach these platforms innocently, some groups have set their mind on spreading misinformation and influencing or manipulating public opinion. These groups disguise as native users from various countries to spread frequently manufactured articles, strong polarizing opinions in the political spectrum and possibly become providers of hate-speech or extremely political positions. This thesis aims to implement an AutoML pipeline for identifying second language speakers from English social media texts. We investigate style differences of text in different topics and across the platforms Reddit and Twitter, and analyse linguistic features. We employ feature-based models with datasets from Reddit, which include mostly English conversation from European users, and Twitter, which was newly created by collecting English tweets from selected trending topics in different countries. The pipeline classifies language family, native language and origin (Native or non-Native English speakers) of a given textual input. We evaluate the resulting classifications by comparing prediction accuracy, precision and F1 scores of our classification pipeline to traditional machine learning processes. Lastly, we compare the results from each dataset and find differences in language use for topics and platforms. We obtained high prediction accuracy for all categories on the Twitter dataset and observed high variance in features such as average text length especially for Balto-Slavic countries.
In the context of augmented reality we define tracking as a collection of methods to obtain the position and orientation (pose) of a user. By means of various displaying techniques, this ensures a correct visual overlay of graphical information onto the reality perceived. Precise results for calculation of the camera pose are gained by methods of image processing, usually analyzing the pixels of an image and extracing features, which can be recognized over the image sequence. However, these methods do not regard the process of image synthesis or at least in a very simplyfied way. In contrast, the class of model-based methods assumes a given 3D model of the observed scene. Based on the model data features can be identified to establish correspondences in the camera image. From these feature correspondences the camera pose is calculated. An interesting approach is the strategy of analysis-by-synthesis, regarding the computer graphics rendering process for extending the knowledge about the model by information from image synthesis and other environment variables.
In this thesis the components of a tracking system are identified and further it is analyzed, to what extend information about the model, the rendering process and the environment can contribute to the components for improvement of the tracking process using analysis-by-synthesis. In particular, by using knowledge as topological information, lighting or perspective, the feature synthesis and correspondence finding should lead to visually unambiguous features that can be predicted and evaluated to be suitable for stable tracking of the camera pose.
The Web is an essential component of moving our society to the digital age. We use it for communication, shopping, and doing our work. Most user interaction in the Web happens with Web page interfaces. Thus, the usability and accessibility of Web page interfaces are relevant areas of research to make the Web more useful. Eye tracking is a tool that can be helpful in both areas, performing usability testing and improving accessibility. It can be used to understand users' attention on Web pages and to support usability experts in their decision-making process. Moreover, eye tracking can be used as an input method to control an interface. This is especially useful for people with motor impairment, who cannot use traditional input devices like mouse and keyboard. However, interfaces on Web pages become more and more complex due to dynamics, i.e., changing contents like animated menus and photo carousels. We need general approaches to comprehend dynamics on Web pages, allowing for efficient usability analysis and enjoyable interaction with eye tracking. In the first part of this thesis, we report our work on improving gaze-based analysis of dynamic Web pages. Eye tracking can be used to collect the gaze signals of users, who browse a Web site and its pages. The gaze signals show a usability expert what parts in the Web page interface have been read, glanced at, or skipped. The aggregation of gaze signals allows a usability expert insight into the users' attention on a high-level, before looking into individual behavior. For this, all gaze signals must be aligned to the interface as experienced by the users. However, the user experience is heavily influenced by changing contents, as these may cover a substantial portion of the screen. We delineate unique states in Web page interfaces including changing contents, such that gaze signals from multiple users can be aggregated correctly. In the second part of this thesis, we report our work on improving the gaze-based interaction with dynamic Web pages. Eye tracking can be used to retrieve gaze signals while a user operates a computer. The gaze signals may be interpreted as input controlling an interface. Nowadays, eye tracking as an input method is mostly used to emulate mouse and keyboard functionality, hindering an enjoyable user experience. There exist a few Web browser prototypes that directly interpret gaze signals for control, but they do not work on dynamic Web pages. We have developed a method to extract interaction elements like hyperlinks and text inputs efficiently on Web pages, including changing contents. We adapt the interaction with those elements for eye tracking as the input method, such that a user can conveniently browse the Web hands-free. Both parts of this thesis conclude with user-centered evaluations of our methods, assessing the improvements in the user experience for usability experts and people with motor impairment, respectively.
Ray tracing acceleration through dedicated data structures has long been an important topic in computer graphics. In general, two different approaches are proposed: spatial and directional acceleration structures. The thesis at hand presents an innovative combined approach of these two areas, which enables a further acceleration of the tracing process of rays. State-of-the-art spatial data structures are used as base structures and enhanced by precomputed directional visibility information based on a sophisticated abstraction concept of shafts within an original structure, the Line Space.
In the course of the work, novel approaches for the precomputed visibility information are proposed: a binary value that indicates whether a shaft is empty or non-empty as well as a single candidate approximating the actual surface as a representative candidate. It is shown how the binary value is used in a simple but effective empty space skipping technique, which allows a performance gain in ray tracing of up to 40% compared to the pure base data structure, regardless of the spatial structure that is actually used. In addition, it is shown that this binary visibility information provides a fast technique for calculating soft shadows and ambient occlusion based on blocker approximations. Although the results contain a certain inaccuracy error, which is also presented and discussed, it is shown that a further tracing acceleration of up to 300% compared to the base structure is achieved. As an extension of this approach, the representative candidate precomputation is demonstrated, which is used to accelerate the indirect lighting computation, resulting in a significant performance gain at the expense of image errors. Finally, techniques based on two-stage structures and a usage heuristic are proposed and evaluated. These reduce memory consumption and approximation errors while maintaining the performance gain and also enabling further possibilities with object instancing and rigid transformations.
All performance and memory values as well as the approximation errors are measured, presented and discussed. Overall, the Line Space is shown to result in a considerate improvement in ray tracing performance at the cost of higher memory consumption and possible approximation errors. The presented findings thus demonstrate the capability of the combined approach and enable further possibilities for future work.
Graph-based data formats are flexible in representing data. In particular semantic data models, where the schema is part of the data, gained traction and commercial success in recent years. Semantic data models are also the basis for the Semantic Web - a Web of data governed by open standards in which computer programs can freely access the provided data. This thesis is concerned with the correctness of programs that access semantic data. While the flexibility of semantic data models is one of their biggest strengths, it can easily lead to programmers accidentally not accounting for unintuitive edge cases. Often, such exceptions surface during program execution as run-time errors or unintended side-effects. Depending on the exact condition, a program may run for a long time before the error occurs and the program crashes.
This thesis defines type systems that can detect and avoid such run-time errors based on schema languages available for the Semantic Web. In particular, this thesis uses the Web Ontology Language (OWL) and its theoretic underpinnings, i.e., description logics, as well as the Shapes Constraint Language (SHACL) to define type systems that provide type-safe data access to semantic data graphs. Providing a safe type system is an established methodology for proving the absence of run-time errors in programs without requiring execution. Both schema languages are based on possible world semantics but differ in the treatment of incomplete knowledge. While OWL allows for modelling incomplete knowledge through an open-world semantics, SHACL relies on a fixed domain and closed-world semantics. We provide the formal underpinnings for type systems based on each of the two schema languages. In particular, we base our notion of types on sets of values which allows us to specify a subtype relation based on subset semantics. In case of description logics, subsumption is a routine problem. For
the type system based on SHACL, we are able to translate it into a description
logic subsumption problem.
The flexible integration of information from distributed and complex information systems poses a major challenge for organisations. The ontology-based information integration concept SoNBO (Social Network of Business Objects) developed and presented in this dissertation addresses these challenges. In an ontology-based concept, the data structure in the source systems (e.g. operational application systems) is described with the help of a schema (= ontology). The ontology and the data from the source systems can be used to create a (virtualised or materialised) knowledge graph, which is used for information access. The schema can be flexibly adapted to the changing needs of a company regarding their information integration. SoNBO differs from existing concepts known from the Semantic Web (OBDA = Ontology-based Data Access, EKG = Enterprise Knowledge Graph) both in the structure of the company-specific ontology (= Social Network of Concepts) as well as in the structure of the user-specific knowledge graph (= Social Network of Business Objects) and makes use of social principles (known from Enterprise Social Software). Following a Design Science Research approach, the SoNBO framework was developed and the findings documented in this dissertation. The framework provides guidance for the introduction of SoNBO in a company and the knowledge gained from the evaluation (in the company KOSMOS Verlag) is used to demonstrate its viability. The results (SoNBO concept and SoNBO framework) are based on the synthesis of the findings from a structured literature review and the investigation of the status quo of ontology-based information integration in practice: For the status quo in practice, the basic idea of SoNBO is demonstrated in an in-depth case study about the engineering office Vössing, which has been using a self-developed SoNBO application for a few years. The status quo in the academic literature is presented in the form of a structured literature analysis on ontology-based information integration approaches. This dissertation adds to theory in the field of ontology-based information integration approaches (e. g. by an evaluated artefact) and provides an evaluated artefact (the SoNBO Framework) for practice.
In this thesis the possibilities for real-time visualization of OpenVDB
files are investigated. The basics of OpenVDB, its possibilities, as well
as NanoVDB and its GPU port, were studied. A system was developed
using PNanoVDB, the graphics API port of OpenVDB. Techniques were
explored to improve and accelerate a single ray approach of ray tracing.
To prove real-time capability, two single scattering approaches were
also implemented. One of these was selected, further investigated and
optimized to achieve interactive real-time rendering.
It is important to give artists immediate feedback on their adjustments, as
well as the possibility to change all parameters to ensure a user friendly
creation process.
In addition to the optical rendering, corresponding benchmarks were
collected to compare different improvement approaches and to prove
their relevance. Attention was paid to the rendering times and memory
consumption on the GPU to ensure optimal use. A special focus, when
rendering OpenVDB files, was put on the integrability and extensibility of
the program to allow easy integration into an existing real-time renderer
like U-Render.
Social networks are ubiquitous structures that we generate and enrich every-day while connecting with people through social media platforms, emails, and any other type of interaction. While these structures are intangible to us, they carry important information. For instance, the political leaning of our friends can be a proxy to identify our own political preferences. Similarly, the credit score of our friends can be decisive in the approval or rejection of our own loans. This explanatory power is being leveraged in public policy, business decision-making and scientific research because it helps machine learning techniques to make accurate predictions. However, these generalizations often benefit the majority of people who shape the general structure of the network, and put in disadvantage under-represented groups by limiting their resources and opportunities. Therefore it is crucial to first understand how social networks form to then verify to what extent their mechanisms of edge formation contribute to reinforce social inequalities in machine learning algorithms.
To this end, in the first part of this thesis, I propose HopRank and Janus two methods to characterize the mechanisms of edge formation in real-world undirected social networks. HopRank is a model of information foraging on networks. Its key component is a biased random walker based on transition probabilities between k-hop neighborhoods. Janus is a Bayesian framework that allows to identify and rank plausible hypotheses of edge formation in cases where nodes possess additional information. In the second part of this thesis, I investigate the implications of these mechanisms - that explain edge formation in social networks - on machine learning. Specifically, I study the influence of homophily, preferential attachment, edge density, fraction of inorities, and the directionality of links on both performance and bias of collective classification, and on the visibility of minorities in top-k ranks. My findings demonstrate a strong correlation between network structure and machine learning outcomes. This suggests that systematic discrimination against certain people can be: (i) anticipated by the type of network, and (ii) mitigated by connecting strategically in the network.
Semantic Web technologies have been recognized to be key for the integration of distributed and heterogeneous data sources on the Web, as they provide means to define typed links between resources in a dynamic manner and following the principles of dataspaces. The widespread adoption of these technologies in the last years led to a large volume and variety of data sets published as machine-readable RDF data, that once linked constitute the so-called Web of Data. Given the large scale of the data, these links are typically generated by computational methods that given a set of RDF data sets, analyze their content and identify the entities and schema elements that should be connected via the links. Analogously to any other kind of data, in order to be truly useful and ready to be consumed, links need to comply with the criteria of high quality data (e.g., syntactically and semantically accurate, consistent, up-to-date). Despite the progress in the field of machine learning, human intelligence is still essential in the quest for high quality links: humans can train algorithms by labeling reference examples, validate the output of algorithms to verify their performance on a data set basis, as well as augment the resulting set of links. Humans —especially expert humans, however, have limited availability. Hence, extending data quality management processes from data owners/publishers to a broader audience can significantly improve the data quality management life cycle.
Recent advances in human computation and peer-production technologies opened new avenues for human-machine data management techniques, allowing to involve non-experts in certain tasks and providing methods for cooperative approaches. The research work presented in this thesis takes advantage of such technologies and investigates human-machine methods that aim at facilitating link quality management in the Semantic Web. Firstly, and focusing on the dimension of link accuracy, a method for crowdsourcing ontology alignment is presented. This method, also applicable to entities, is implemented as a complement to automatic ontology alignment algorithms. Secondly, novel measures for the dimension of information gain facilitated by the links are introduced. These entropy-centric measures provide data managers with information about the extent the entities in the linked data set gain information in terms of entity description, connectivity and schema heterogeneity. Thirdly, taking Wikidata —the most successful case of a linked data set curated, linked and maintained by a community of humans and bots— as a case study, we apply descriptive and predictive data mining techniques to study participation inequality and user attrition. Our findings and method can help community managers make decisions on when/how to intervene with user retention plans. Lastly, an ontology to model the history of crowd contributions across marketplaces is presented. While the field of human-machine data management poses complex social and technical challenges, the work in this thesis aims to contribute to the development of this still emerging field.
Currently, there are a variety of digital tools in the humanities, such
as annotation, visualization, or analysis software, which support researchers in their work and offer them new opportunities to address different research questions. However, the use of these tools falls far
short of expectations. In this thesis, twelve improvement measures are
developed within the framework of a design science theory to counteract the lack of usage acceptance. By implementing the developed design science theory, software developers can increase the acceptance of their digital tools in the humanities context.
For software engineers, conceptually understanding the tools they are using in the context of their projects is a daily challenge and a prerequisite for complex tasks. Textual explanations and code examples serve as knowledge resources for understanding software languages and software technologies. This thesis describes research on integrating and interconnecting
existing knowledge resources, which can then be used to assist with understanding and comparing software languages and software technologies on a conceptual level. We consider the following broad research questions that we later refine: What knowledge resources can be systematically reused for recovering structured knowledge and how? What vocabulary already exists in literature that is used to express conceptual knowledge? How can we reuse the
online encyclopedia Wikipedia? How can we detect and report on instances of technology usage? How can we assure reproducibility as the central quality factor of any construction process for knowledge artifacts? As qualitative research, we describe methodologies to recover knowledge resources by i.) systematically studying literature, ii.) mining Wikipedia, iii.) mining available textual explanations and code examples of technology usage. The theoretical findings are backed by case studies. As research contributions, we have recovered i.) a reference semantics of vocabulary for describing software technology usage with an emphasis on software languages, ii.) an annotated corpus of Wikipedia articles on software languages, iii.) insights into technology usage on GitHub with regard to a catalog of pattern and iv.) megamodels of technology usage that are interconnected with existing textual explanations and code examples.
On the recognition of human activities and the evaluation of its imitation by robotic systems
(2023)
This thesis addresses the problem of action recognition through the analysis of human motion and the benchmarking of its imitation by robotic systems.
For our action recognition related approaches, we focus on presenting approaches that generalize well across different sensor modalities. We transform multivariate signal streams from various sensors to a common image representation. The action recognition problem on sequential multivariate signal streams can then be reduced to an image classification task for which we utilize recent advances in machine learning. We demonstrate the broad applicability of our approaches formulated as a supervised classification task for action recognition, a semi-supervised classification task for one-shot action recognition, modality fusion and temporal action segmentation.
For action classification, we use an EfficientNet Convolutional Neural Network (CNN) model to classify the image representations of various data modalities. Further, we present approaches for filtering and the fusion of various modalities on a representation level. We extend the approach to be applicable for semi-supervised classification and train a metric-learning model that encodes action similarity. During training, the encoder optimizes the distances in embedding space for self-, positive- and negative-pair similarities. The resulting encoder allows estimating action similarity by calculating distances in embedding space. At training time, no action classes from the test set are used.
Graph Convolutional Network (GCN) generalized the concept of CNNs to non-Euclidean data structures and showed great success for action recognition directly operating on spatio-temporal sequences like skeleton sequences. GCNs have recently shown state-of-the-art performance for skeleton-based action recognition but are currently widely neglected as the foundation for the fusion of various sensor modalities. We propose incorporating additional modalities, like inertial measurements or RGB features, into a skeleton-graph, by proposing fusion on two different dimensionality levels. On a channel dimension, modalities are fused by introducing additional node attributes. On a spatial dimension, additional nodes are incorporated into the skeleton-graph.
Transformer models showed excellent performance in the analysis of sequential data. We formulate the temporal action segmentation task as an object detection task and use a detection transformer model on our proposed motion image representations. Experiments for our action recognition related approaches are executed on large-scale publicly available datasets. Our approaches for action recognition for various modalities, action recognition by fusion of various modalities, and one-shot action recognition demonstrate state-of-the-art results on some datasets.
Finally, we present a hybrid imitation learning benchmark. The benchmark consists of a dataset, metrics, and a simulator integration. The dataset contains RGB-D image sequences of humans performing movements and executing manipulation tasks, as well as the corresponding ground truth. The RGB-D camera is calibrated against a motion-capturing system, and the resulting sequences serve as input for imitation learning approaches. The resulting policy is then executed in the simulated environment on different robots. We propose two metrics to assess the quality of the imitation. The trajectory metric gives insights into how close the execution was to the demonstration. The effect metric describes how close the final state was reached according to the demonstration. The Simitate benchmark can improve the comparability of imitation learning approaches.
Leichte Sprache (LS, easy-to-read German) is a simplified variety of German. It is used to provide barrier-free texts for a broad spectrum of people, including lowliterate individuals with learning difficulties, intellectual or developmental disabilities (IDD) and/or complex communication needs (CCN). In general, LS authors are proficient in standard German and do not belong to the aforementioned group of people. Our goal is to empower the latter to participate in written discourse themselves. This requires a special writing system whose linguistic support and ergonomic software design meet the target group’s specific needs. We present EasyTalk a system profoundly based on natural language processing (NLP) for assistive writing in an extended variant of LS (ELS). EasyTalk provides users with a personal vocabulary underpinned with customizable communication symbols and supports in writing at their individual level of proficiency through interactive user guidance. The system minimizes the grammatical knowledge needed to produce correct and coherent complex contents by intuitively formulating linguistic decisions. It provides easy dialogs for selecting options from a natural-language paraphrase generator, which provides context-sensitive suggestions for sentence components and correctly inflected word forms. In addition, EasyTalk reminds users to add text elements that enhance text comprehensibility in terms of audience design (e.g., time and place of an event) and improve text coherence (e.g., explicit connectors to express discourse-relations). To tailor the system to the needs of the target group, the development of EasyTalk followed the principles of human-centered design (HCD). Accordingly, we matured the system in iterative development cycles, combined with purposeful evaluations of specific aspects conducted with expert groups from the fields of CCN, LS, and IT, as well as L2 learners of the German language. In a final case study, members of the target audience tested the system in free writing sessions. The study confirmed that adults with IDD and/or CCN who have low reading, writing, and computer skills can write their own personal texts in ELS using EasyTalk. The positive feedback from all tests inspires future long-term studies with EasyTalk and further development of this prototypical system, such as the implementation of a so-called Schreibwerkstatt (writing workshop)