OPUS 4 | 004 Data processing; Computer science

Edge Formation and its Influence in Machine Learning (2022)

Espín-Noboa, Lisette

Social networks are ubiquitous structures that we generate and enrich every day as we connect with people through social media platforms, email, and other types of interaction. Although these structures are intangible to us, they carry important information. For instance, the political leaning of our friends can serve as a proxy for our own political preferences. Similarly, the credit scores of our friends can be decisive in the approval or rejection of our own loans. This explanatory power is leveraged in public policy, business decision-making and scientific research because it helps machine learning techniques make accurate predictions. However, these generalizations often benefit the majority of people, who shape the overall structure of the network, and disadvantage under-represented groups by limiting their resources and opportunities. It is therefore crucial to first understand how social networks form, and then to verify to what extent their mechanisms of edge formation reinforce social inequalities in machine learning algorithms. To this end, in the first part of this thesis, I propose two methods, HopRank and Janus, to characterize the mechanisms of edge formation in real-world undirected social networks. HopRank is a model of information foraging on networks. Its key component is a biased random walker based on transition probabilities between k-hop neighborhoods. Janus is a Bayesian framework that identifies and ranks plausible hypotheses of edge formation in cases where nodes possess additional information. In the second part of this thesis, I investigate the implications of these edge-formation mechanisms for machine learning. Specifically, I study the influence of homophily, preferential attachment, edge density, fraction of minorities, and the directionality of links on both the performance and bias of collective classification, and on the visibility of minorities in top-k ranks.
My findings demonstrate a strong correlation between network structure and machine learning outcomes. This suggests that systematic discrimination against certain people can be: (i) anticipated by the type of network, and (ii) mitigated by connecting strategically in the network.
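The HopRank description above (a biased random walker driven by transition probabilities between k-hop neighborhoods) can be illustrated with a minimal sketch. It assumes an undirected graph stored as an adjacency dict, a hypothetical vector `hop_probs` in which `hop_probs[k-1]` is the probability of jumping to a node exactly k hops away, and a uniform choice among the nodes of that k-hop ring. These are illustrative assumptions; the thesis's actual model (how the probabilities are estimated, any teleportation, etc.) may differ.

```python
import random
from collections import deque


def hop_distances(adj, source, max_hop):
    """Breadth-first search returning {node: hop distance} up to max_hop."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hop:
            continue  # do not expand beyond the largest hop we care about
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def k_hop_rings(adj, source, max_hop):
    """Group nodes by their exact hop distance k = 1..max_hop from source."""
    rings = {k: [] for k in range(1, max_hop + 1)}
    for node, d in hop_distances(adj, source, max_hop).items():
        if d >= 1:
            rings[d].append(node)
    return rings


def biased_walk(adj, start, hop_probs, steps, rng=None):
    """Hypothetical HopRank-style walk: first pick a hop distance k with
    probability hop_probs[k-1] (renormalized over non-empty rings), then pick
    a node uniformly at random among the nodes exactly k hops away."""
    rng = rng or random.Random(0)
    max_hop = len(hop_probs)
    path = [start]
    current = start
    for _ in range(steps):
        rings = k_hop_rings(adj, current, max_hop)
        ks = [k for k in rings if rings[k]]
        weights = [hop_probs[k - 1] for k in ks]
        if not ks or sum(weights) == 0:
            break  # no reachable ring with positive probability
        k = rng.choices(ks, weights=weights)[0]
        current = rng.choice(rings[k])
        path.append(current)
    return path
```

Setting `hop_probs = [1.0]` reduces the sketch to a plain random walk over direct neighbors; shifting mass to larger k models a walker that also "jumps" to friends of friends.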

Real-Time Implementation of OpenVDB Rendering (2022)

Gaida, Sebastian

This thesis investigates the possibilities for real-time visualization of OpenVDB files. The fundamentals and capabilities of OpenVDB were studied, as were NanoVDB and its GPU port. A system was developed using PNanoVDB, the graphics-API port of NanoVDB. Techniques were explored to improve and accelerate a single-ray ray-tracing approach. To demonstrate real-time capability, two single-scattering approaches were also implemented; one of them was selected, investigated further and optimized to achieve interactive real-time rendering. Giving artists immediate feedback on their adjustments, together with the ability to change all parameters, is important for a user-friendly creation process. In addition to the visual output, benchmarks were collected to compare the different improvement approaches and to demonstrate their relevance. Attention was paid to rendering times and GPU memory consumption to ensure efficient use of the hardware. When rendering OpenVDB files, particular focus was placed on the integrability and extensibility of the program, to allow easy integration into an existing real-time renderer such as U-Render.
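The single-ray, single-scattering idea can be sketched as fixed-step ray marching with Beer-Lambert attenuation. This is a CPU illustration under stated assumptions, not the thesis's GPU implementation: the analytic `density` function is a hypothetical stand-in for a NanoVDB/PNanoVDB grid lookup (no NanoVDB API is used), and the directional light, isotropic phase function, and fixed step size are simplifications.

```python
import math


def density(p):
    """Hypothetical stand-in for a VDB grid lookup: a unit-radius sphere of
    constant-density fog centred at the origin."""
    x, y, z = p
    return 1.0 if x * x + y * y + z * z <= 1.0 else 0.0


def point_on_ray(origin, direction, t):
    return tuple(o + t * d for o, d in zip(origin, direction))


def transmittance(origin, direction, sigma_t, t_max, step):
    """Beer-Lambert transmittance exp(-tau) along a ray, where the optical
    depth tau is integrated with fixed-step midpoint ray marching."""
    tau = 0.0
    t = 0.5 * step
    while t < t_max:
        tau += sigma_t * density(point_on_ray(origin, direction, t)) * step
        t += step
    return math.exp(-tau)


def single_scatter(origin, direction, light_dir, sigma_t, albedo,
                   t_max, step, shadow_t_max=2.0):
    """Radiance reaching the camera after exactly one scattering event toward
    a directional light of unit radiance, with an isotropic phase function.
    shadow_t_max=2.0 suffices for the unit sphere above (its diameter).
    Returns (radiance, transmittance along the camera ray)."""
    phase = 1.0 / (4.0 * math.pi)
    radiance = 0.0
    view_T = 1.0
    t = 0.5 * step
    while t < t_max:
        p = point_on_ray(origin, direction, t)
        sigma = sigma_t * density(p)
        if sigma > 0.0:
            # Secondary (shadow) ray from the sample point toward the light.
            light_T = transmittance(p, light_dir, sigma_t, shadow_t_max, step)
            # Scattering coefficient is albedo * sigma (sigma_s = albedo * sigma_t).
            radiance += view_T * albedo * sigma * phase * light_T * step
            view_T *= math.exp(-sigma * step)
        t += step
    return radiance, view_T
```

In a real-time renderer the same loop would typically run once per pixel in a shader; the step size and the cost of the nested shadow ray are the main levers for trading image quality against frame time.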

Methods for Human-Machine Link Quality Management on the Web of Data (2022)

Sarasua, Cristina

Semantic Web technologies have been recognized as key to integrating distributed and heterogeneous data sources on the Web, as they provide means to define typed links between resources dynamically, following the principles of dataspaces. The widespread adoption of these technologies in recent years has led to a large volume and variety of data sets published as machine-readable RDF data, which, once linked, constitute the so-called Web of Data. Given the large scale of the data, these links are typically generated by computational methods that, given a set of RDF data sets, analyze their content and identify the entities and schema elements that should be connected via links. As with any other kind of data, links must meet the criteria of high-quality data (e.g., syntactic and semantic accuracy, consistency, timeliness) to be truly useful and ready for consumption. Despite progress in machine learning, human intelligence is still essential in the quest for high-quality links: humans can train algorithms by labeling reference examples, validate the output of algorithms to verify their performance on a per-data-set basis, and augment the resulting set of links. Humans, and expert humans in particular, have limited availability, however. Hence, extending data quality management processes from data owners and publishers to a broader audience can significantly improve the data quality management life cycle. Recent advances in human computation and peer-production technologies have opened new avenues for human-machine data management techniques, making it possible to involve non-experts in certain tasks and providing methods for cooperative approaches. The research presented in this thesis takes advantage of such technologies and investigates human-machine methods that aim to facilitate link quality management in the Semantic Web.
First, focusing on the dimension of link accuracy, a method for crowdsourcing ontology alignment is presented. This method, which is also applicable to entities, is implemented as a complement to automatic ontology alignment algorithms. Second, novel measures for the dimension of information gain facilitated by the links are introduced. These entropy-centric measures show data managers the extent to which the entities in the linked data set gain information in terms of entity description, connectivity and schema heterogeneity. Third, taking Wikidata (the most successful case of a linked data set curated, linked and maintained by a community of humans and bots) as a case study, we apply descriptive and predictive data mining techniques to study participation inequality and user attrition. Our findings and method can help community managers decide when and how to intervene with user retention plans. Lastly, an ontology to model the history of crowd contributions across marketplaces is presented. While the field of human-machine data management poses complex social and technical challenges, the work in this thesis aims to contribute to the development of this still-emerging field.
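To make the idea of an entropy-centric information-gain measure concrete, here is a minimal sketch for the entity-description dimension only. The specific formulation is a hypothetical assumption, not the thesis's definition: it compares the Shannon entropy of an entity's predicate distribution before and after adding the triples of the entity it is linked to. The function names and the example triples are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def description_gain(own_triples, linked_triples):
    """Hypothetical description gain: entropy of the predicate distribution
    after merging in the linked entity's triples, minus the entropy before.
    Triples are (subject, predicate, object) tuples."""
    before = [p for _, p, _ in own_triples]
    after = before + [p for _, p, _ in linked_triples]
    return shannon_entropy(after) - shannon_entropy(before)
```

For example, an entity described only by `rdfs:label` and `rdf:type` (1 bit) that gains two new, distinct predicates through a link ends up at 2 bits, a gain of 1.0. Under this variant the gain can be negative when the linked triples merely repeat existing predicates, so it measures diversity of description rather than sheer volume; connectivity and schema heterogeneity could be treated analogously over other distributions.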

RoboCup 2016 – homer@UniKoblenz (Germany) (2018)

Memmesheimer, Raphael

This paper describes the robot Lisa, used by team homer@UniKoblenz of the University of Koblenz-Landau, Germany, for participation in RoboCup@Home 2016 in Leipzig, Germany. Special focus is put on novel system components and our team's open-source contributions. We have released packages for object recognition, a robot face including speech synthesis, mapping and navigation, a speech recognition interface via Android, and a GUI. The packages are available at http://wiki.ros.org/agas-ros-pkg, where new packages will also be released.

Crowdsourcing for Survey Research : where Amazon Mechanical Turks deviates from conventional survey methods (2015)

Schaarschmidt, Mario ; Ivens, Stefan ; Homscheid, Dirk ; Bilo, Pascal

Information systems research has recently started to use crowdsourcing platforms such as Amazon Mechanical Turk (MTurk) for scientific research. In particular, MTurk provides a scalable, cheap workforce that can also serve as a pool of potential respondents for online survey research. In light of the increasing use of crowdsourcing platforms for survey research, the authors aim to contribute to the understanding of their appropriate use. To this end, they assess whether samples drawn from MTurk deviate from those drawn via conventional online surveys (COS) in their answers on relevant e-commerce variables, and they test the data in a nomological network to assess differences in effects. The authors compare responses from 138 MTurk workers with those of 150 German shoppers recruited via COS. The findings indicate, inter alia, that MTurk workers tend to exhibit higher levels of positive word-of-mouth, perceived risk, customer orientation and commitment to the focal company. The authors discuss the study results, point to limitations, and provide avenues for further research.

Categorising Social Media Business Risks (2014)

Hausmann, Verena ; Williams, Susan P.

The aim of this paper is to identify and understand the risks and issues companies are experiencing from the business use of social media, and to develop a framework for describing and categorising those social media risks. The goal is to contribute to the evolving theorisation of social media risk and to provide a foundation for the further development of social media risk management strategies and processes. The study findings identify thirty risk types organised into five categories (technical, human, content, compliance and reputational). A risk-chain is used to illustrate the complex, interrelated, multi-stakeholder nature of these risks, and directions for future work are identified.

Micro Modelling of User Perception and Generation Processes for Macro Level Predictions in Online Communities (2014)

Schwagereit, Felix ; Gottron, Thomas ; Staab, Steffen

The way information is presented to users in online community platforms influences the way users create new information. This is the case, for instance, in question-answering fora, crowdsourcing platforms and other social computation settings. To better understand the effects of presentation policies on user activity, we introduce a generative model of user behaviour in this paper. By running simulations based on this model of user behaviour, we demonstrate its ability to produce macro-level phenomena comparable to those observed in real-world data.

Extended Description of the Survey on Common Strategies of Vocabulary Reuse in Linked Open Data Modeling (2014)

Schaible, Johann ; Gottron, Thomas ; Scherp, Ansgar

Modeling and publishing Linked Open Data (LOD) involves choosing which vocabulary to use. This choice is far from trivial and poses a challenge to a Linked Data engineer. It covers the search for appropriate vocabulary terms, decisions regarding the number of vocabularies to consider in the design process, and the way of selecting and combining vocabularies. To date, no study has investigated the different strategies of reusing vocabularies for LOD modeling and publishing. In this paper, we present the results of a survey with 79 participants that examines the most preferred vocabulary reuse strategies in LOD modeling. The participants of our survey are LOD publishers and practitioners. Their task was to assess different vocabulary reuse strategies and explain their ranking decisions. We found significant differences between the modeling strategies, which range from reusing popular vocabularies, to minimizing the number of vocabularies, to staying within one domain vocabulary. A particularly interesting insight is that popularity, in the sense of how frequently a vocabulary is used in a data source, is more important than how often individual classes and properties are used in the LOD cloud. Overall, the results of this survey help in understanding how data engineers reuse vocabularies, and they may also be used to develop future vocabulary engineering tools.

Semantically Guided Evolution of SHI ABoxes (2013)

Furbach, Ulrich ; Schon, Claudia

This paper presents a method for the evolution of SHI ABoxes which is based on a compilation technique for the knowledge base. To this end, the ABox is regarded as an interpretation of the TBox that is close to a model. It is shown that the ABox can be used for a semantically guided transformation resulting in an equisatisfiable knowledge base. We use the result of this transformation to efficiently delete assertions from the ABox. Furthermore, insertion of assertions as well as repair of inconsistent ABoxes is addressed. The E-KRHyper theorem prover is used to compute the necessary actions for deletion, insertion and repair.

Concept Network Extraction from Text (2013)

Krukow, Oliver

Large amounts of qualitative data make the use of computer-assisted methods for their analysis inevitable. This thesis introduces Text Mining as an interdisciplinary approach, as well as the methods established in the empirical social sciences for analyzing written utterances. On this basis, a process for extracting concept networks from texts is outlined, and the possibilities of using natural language processing methods within it are highlighted. The core of this process is text processing, whose execution requires software solutions that support both manual and automated work. The requirements these solutions must meet are presented against the background of the initiating project GLODERS, which is devoted to investigating extortion racket systems as part of the global financial system, and the extent to which the two most prominent candidates fulfil them is reviewed. The gap between theory and practical application is closed by a prototypical application of the method to a data set of the research project, using the two software solutions.
