004 Data processing; Computer science
Folksonomies are Web 2.0 platforms where users share resources with each other. Furthermore, they can assign keywords (called tags) to the resources for categorizing and organizing them. Numerous types of resources, like websites (Delicious), images (Flickr), and videos (YouTube), are supported by different folksonomies. Folksonomies are easy to use and thus attract the attention of millions of users. Along with the ease they offer, however, there are also some problems. This thesis addresses different problems of folksonomies and proposes solutions for them. The first problem occurs when users search for relevant resources in folksonomies. Often, the users are not able to find all relevant resources because they do not know which tags are relevant. The second problem is assigning tags to resources. Although many folksonomies (like Delicious) recommend tags for resources, other folksonomies (like Flickr) do not recommend any tags. Tag recommendation helps users to easily tag their resources. The third problem is that tags and resources lack semantics, which leads, for example, to ambiguous tags. The tags lack semantics because they are freely chosen keywords. Automatically identifying the semantics of tags and resources helps to reduce the problems that arise from this freedom of the users in choosing the tags. This thesis proposes methods which exploit semantics to address the problems of search, tag recommendation, and the identification of tag semantics. The semantics are discovered from a variety of sources. In this thesis, we exploit web search engines, online social communities, and the co-occurrences of tags as sources of semantics. Using different sources for discovering semantics reduces the effort needed to build systems which solve the problems mentioned earlier. This thesis evaluates the proposed methods on a large-scale data set. The evaluation results suggest that it is possible to exploit semantics to improve search, the recommendation of tags, and the automatic identification of the semantics of tags and resources.
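The abstract above names tag co-occurrence as one source of semantics. As a minimal, purely illustrative sketch (the tag data and the Jaccard-style score are invented here and are not the method proposed in the thesis), pairwise co-occurrence counts can be turned into a simple relatedness score between tags:

```python
from collections import Counter
from itertools import combinations

# Hypothetical example: each posting is the set of tags one user assigned to one resource.
postings = [
    {"python", "programming", "tutorial"},
    {"python", "scripting"},
    {"programming", "tutorial", "beginner"},
]

# Count how often each tag and each tag pair occurs across postings.
tag_counts = Counter(tag for post in postings for tag in post)
pair_counts = Counter()
for post in postings:
    for a, b in combinations(sorted(post), 2):
        pair_counts[(a, b)] += 1

def relatedness(a: str, b: str) -> float:
    """Jaccard-style co-occurrence score: postings with both tags / postings with either tag."""
    both = pair_counts[tuple(sorted((a, b)))]
    return both / (tag_counts[a] + tag_counts[b] - both)

print(relatedness("python", "programming"))  # 0.333... for the toy data above
```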
Nowadays, almost any IT system involves personal data processing. In such systems, many privacy risks arise when privacy concerns are not properly addressed from the early phases of the system design. The General Data Protection Regulation (GDPR) prescribes the Privacy by Design (PbD) principle. At its core, PbD obliges protecting personal data from the onset of the system development by effectively integrating appropriate privacy controls into the design. To operationalize the concept of PbD, a set of challenges emerges: First, we need a basis to define privacy concerns. Without such a basis, we are not able to verify whether personal data processing is authorized. Second, we need to identify where precisely in a system the controls have to be applied. This calls for a system analysis concerning privacy concerns. Third, with a view to selecting and integrating appropriate controls based on the results of the system analysis, a mechanism to identify the privacy risks is required. Mitigating privacy risks is at the core of the PbD principle. Fourth, choosing and integrating appropriate controls into a system are complex tasks that, besides the risks, have to consider potential interrelations among privacy controls and the costs of the controls.
This thesis introduces a model-based privacy by design methodology to handle the above challenges. Our methodology relies on a precise definition of privacy concerns and comprises three sub-methodologies: model-based privacy analysis, model-based privacy impact assessment, and privacy-enhanced system design modeling. First, we introduce a definition of privacy preferences, which provides a basis to specify privacy concerns and to verify whether personal data processing is authorized. Second, we present a model-based methodology to analyze a system model. The results of this analysis denote a set of privacy design violations. Third, taking into account the results of the privacy analysis, we introduce a model-based privacy impact assessment methodology to identify concrete privacy risks in a system model. Fourth, concerning the risks, and taking into account the interrelations and the costs of the controls, we propose a methodology to select appropriate controls and integrate them into a system design. Using various practical case studies, we evaluate our concepts, showing a promising outlook on the applicability of our methodology in real-world settings.
This dissertation introduces a methodology for the formal specification and verification of user interfaces under security aspects. The methodology allows formal methods to be used pervasively in the specification and verification of human-computer interaction. This work consists of three parts. In the first part, a formal methodology for the description of human-computer interaction is developed. In the second part, existing definitions of computer security are adapted to human-computer interaction and formalized, and a generic formal model of human-computer interaction is developed. In the third part, the methodology is applied to the specification and verification of a secure email client.
Software systems have an increasing impact on our daily lives. Many systems process sensitive data or control critical infrastructure. Providing secure software is therefore indispensable. Such systems are rarely renewed regularly due to the high costs and effort involved. Oftentimes, systems that were planned and implemented to be secure become insecure because their context evolves. These systems are connected to the Internet and are therefore constantly subject to new types of attacks. The security requirements of these systems remain unchanged, while, for example, the discovery of a vulnerability in an encryption algorithm previously assumed to be secure requires a change of the system design. Some security requirements cannot be checked against the system’s design but only at run time. Furthermore, the sudden discovery of a security violation requires an immediate reaction to prevent a system shutdown. Knowledge regarding security best practices, attacks, and mitigations is generally available, yet it is rarely an integrated part of software development and rarely covers evolution.
This thesis examines how the security of long-living software systems can be preserved, taking into account the influence of context evolutions. The goal of the proposed approach, S²EC²O, is to recover the security of model-based software systems using co-evolution.
An ontology-based knowledge base is introduced that is capable of managing common as well as system-specific knowledge relevant to security. A transformation connects the knowledge base to the UML system model. Context knowledge evolutions are detected by means of semantic differences, knowledge inference, and the detection of inconsistencies in the knowledge base.
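Purely to illustrate the idea of such an ontology-based security knowledge base and the detection of a context knowledge evolution, the following sketch uses rdflib with an invented vocabulary (ex:ModelElement, ex:usesAlgorithm, ex:BrokenAlgorithm); it is not S²EC²O's actual ontology or rule catalog:

```python
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/security#")  # hypothetical vocabulary for this sketch
g = Graph()

# System-specific knowledge: a model element relies on a particular encryption algorithm.
g.add((EX.PatientRecordStore, RDF.type, EX.ModelElement))
g.add((EX.PatientRecordStore, EX.usesAlgorithm, EX.SHA1))

# Context evolution: common security knowledge now marks that algorithm as broken.
g.add((EX.SHA1, RDF.type, EX.BrokenAlgorithm))

# Detect model elements whose security assumptions no longer hold.
query = """
PREFIX ex: <http://example.org/security#>
SELECT ?element ?alg WHERE {
  ?element a ex:ModelElement ;
           ex:usesAlgorithm ?alg .
  ?alg a ex:BrokenAlgorithm .
}
"""
for element, alg in g.query(query):
    print(f"Co-evolution needed: {element} relies on broken algorithm {alg}")
```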
A catalog containing rules to manage and recover security requirements uses detected context evolutions to propose potential co-evolutions of the system model that re-establish compliance with the security requirements.
S²EC²O uses security annotations to link models and executable code and provides support for run-time monitoring. The adaptation of running systems is considered, as is round-trip engineering, which integrates insights from run time into the system model.
S²EC²O is complemented by prototypical tool support. This tool is used to show S²EC²O’s applicability in a case study targeting the medical information system iTrust.
The thesis at hand contributes to the development and maintenance of long-living software systems with regard to their security. The proposed approach will aid security experts: it detects security-relevant changes to the system context, determines the impact on the system’s security, and facilitates co-evolutions that recover compliance with the security requirements.
Virtueller Konsum - Warenkörbe, Wägungsschemata und Verbraucherpreisindizes in virtuellen Welten (Virtual Consumption - Market Baskets, Weighting Schemes, and Consumer Price Indices in Virtual Worlds)
(2015)
Virtual worlds have been investigated for several years by various academic disciplines, e.g. sociology, psychology, law, and education. Since the developers of virtual worlds have implemented aspects like scarcity and needs, economic research has also become interested in these virtual environments. Exploring virtual economies mainly deals with the trade in the virtual goods used to satisfy the needs that have emerged. On the one hand, economics analyzes the meaning of virtual trade for the overall interpretation of the economic characteristics of virtual worlds. On the other hand, as some virtual worlds allow virtual-world money to be exchanged for real money and vice versa, so that virtual goods are traded by users for real money, researchers study the impact of the interdependencies between virtual economies and the real world. The presented thesis mainly focuses on trade within virtual worlds in the context of virtual consumption and the observation of consumer prices. For this purpose, the four virtual worlds World of Warcraft, RuneScape, Entropia Universe, and Second Life have been selected. Several components are required to calculate consumer price indices. First, a market basket, which contains the relevant consumed goods existing in virtual worlds, must be developed. Second, a weighting scheme has to be established, which shows how consumption expenditures are distributed across these goods. Third, prices of the relevant consumer goods have to be recorded. The challenge is to apply these real-world methods within virtual worlds. The dissertation therefore contains three corresponding parts of investigation. A first analysis evaluates to what extent virtual worlds can be explored to identify consumable goods. Next, the consumption expenditures of the avatars are examined based on an online survey. Finally, prices of consumable goods are recorded, so that consumer price indices can be calculated. While investigating these components, the thesis focuses not only on the general findings themselves but also on the methodological issues that arise, such as limited access to relevant data, missing legal legitimation, or security concerns of the users. Besides these aspects, the methods used also allow the examination of several other economic aspects, such as the consumption habits of the avatars. At the end of the thesis, it is considered to what extent the economic characteristics of virtual worlds can be compared with those of the real world.
Aspects such as the important role of weapons or the different use of food show significant differences from the real world, caused by the business models of virtual worlds.
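For reference, consumer price indices of the kind described above are commonly constructed as a Laspeyres index, which weights the price relatives of the basket goods with base-period expenditure shares; how the thesis concretely instantiates the basket, weights, and prices for virtual worlds is the subject of the abstract above, not of this formula.

```latex
% Laspeyres consumer price index for period t relative to base period 0
% (p: prices, q: quantities, i ranges over the goods in the market basket)
\[
  P_L^{t} \;=\; \frac{\sum_i p_i^{t}\, q_i^{0}}{\sum_i p_i^{0}\, q_i^{0}}
         \;=\; \sum_i w_i^{0}\, \frac{p_i^{t}}{p_i^{0}},
  \qquad
  w_i^{0} \;=\; \frac{p_i^{0}\, q_i^{0}}{\sum_j p_j^{0}\, q_j^{0}}
\]
```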
Tagging systems are intriguing dynamic systems in which users collaboratively index resources with so-called tags. In order to leverage the full potential of tagging systems, it is important to understand the relationship between the micro-level behavior of the individual users and the macro-level properties of the whole tagging system. In this thesis, we present the Epistemic Dynamic Model, which tries to bridge this gap between micro-level behavior and macro-level properties by developing a theory of tagging systems. The model is based on the assumption that the combined influence of the users' shared background knowledge and the imitation of tag recommendations is sufficient to explain the emergence of the tag frequency distribution and the vocabulary growth in tagging systems. Both macro-level properties of tagging systems are closely related to the emergence of the shared community vocabulary.
With the help of the Epistemic Dynamic Model, we show that the general shape of the tag frequency distribution and of the vocabulary growth has its origin in the shared background knowledge of the users. Tag recommendations can then be used to selectively influence this general shape. In this thesis, we concentrate in particular on studying the influence of recommending a set of popular tags. Recommending popular tags adds a feedback mechanism between the vocabularies of individual users that increases the inter-indexer consistency of the tag assignments. How does this influence the indexing quality in a tagging system? To answer this question, we investigate a methodology for measuring the inter-resource consistency of tag assignments. The inter-resource consistency is an indicator of indexing quality that positively correlates with the precision and recall of query results. It measures the degree to which the tag vectors of indexed resources reflect how the users perceive the similarity between resources. We argue with our model, and show with a user experiment, that recommending popular tags decreases the inter-resource consistency in a tagging system. Furthermore, we show that recommending to users their previously used tags helps to increase the inter-resource consistency. Our measure of the inter-resource consistency complements existing measures for the evaluation and comparison of tag recommendation algorithms, moving the focus to evaluating their influence on the indexing quality.
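As a generic illustration of how tag vectors of indexed resources can be compared, the following sketch computes a plain cosine similarity over invented tag counts; it is not necessarily the exact inter-resource consistency measure of the thesis:

```python
import math
from collections import Counter

# Hypothetical tag vectors: how often each tag was applied to two resources.
resource_a = Counter({"jazz": 12, "music": 9, "saxophone": 4})
resource_b = Counter({"jazz": 7, "music": 11, "concert": 3})

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse tag vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

print(f"perceived similarity: {cosine(resource_a, resource_b):.3f}")
```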
Social networks are ubiquitous structures that we generate and enrich every day while connecting with people through social media platforms, emails, and any other type of interaction. While these structures are intangible to us, they carry important information. For instance, the political leaning of our friends can be a proxy for identifying our own political preferences. Similarly, the credit score of our friends can be decisive in the approval or rejection of our own loans. This explanatory power is being leveraged in public policy, business decision-making, and scientific research because it helps machine learning techniques make accurate predictions. However, these generalizations often benefit the majority of people who shape the general structure of the network and put under-represented groups at a disadvantage by limiting their resources and opportunities. Therefore, it is crucial to first understand how social networks form and then to verify to what extent their mechanisms of edge formation contribute to reinforcing social inequalities in machine learning algorithms.
To this end, in the first part of this thesis, I propose HopRank and Janus, two methods to characterize the mechanisms of edge formation in real-world undirected social networks. HopRank is a model of information foraging on networks. Its key component is a biased random walker based on transition probabilities between k-hop neighborhoods. Janus is a Bayesian framework that makes it possible to identify and rank plausible hypotheses of edge formation in cases where nodes possess additional information. In the second part of this thesis, I investigate the implications of these mechanisms of edge formation on machine learning. Specifically, I study the influence of homophily, preferential attachment, edge density, fraction of minorities, and the directionality of links on both the performance and bias of collective classification, and on the visibility of minorities in top-k ranks. My findings demonstrate a strong correlation between network structure and machine learning outcomes. This suggests that systematic discrimination against certain people can be (i) anticipated by the type of network and (ii) mitigated by connecting strategically in the network.
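A toy sketch of the kind of biased random walk that HopRank describes, with transition probabilities that depend on the hop distance of a neighbor to a seed node; the graph, the bias values, and the walk length are invented for illustration and do not reproduce the thesis's actual model:

```python
import random
import networkx as nx

# Toy graph and an arbitrary seed node; the per-hop bias values are illustrative only.
G = nx.karate_club_graph()
seed = 0
hop = nx.single_source_shortest_path_length(G, seed)  # k-hop distance of each node to the seed
beta = {0: 4.0, 1: 2.0, 2: 1.0}                       # preference per hop distance (default below)

def step(node: int) -> int:
    """Move to a neighbor with probability proportional to the bias of its hop distance."""
    neighbors = list(G.neighbors(node))
    weights = [beta.get(hop[n], 0.5) for n in neighbors]
    return random.choices(neighbors, weights=weights, k=1)[0]

random.seed(42)
visits = {n: 0 for n in G}
node = seed
for _ in range(10_000):
    node = step(node)
    visits[node] += 1

print("most visited nodes:", sorted(visits, key=visits.get, reverse=True)[:5])
```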
The flexible integration of information from distributed and complex information systems poses a major challenge for organisations. The ontology-based information integration concept SoNBO (Social Network of Business Objects) developed and presented in this dissertation addresses this challenge. In an ontology-based concept, the data structure in the source systems (e.g. operational application systems) is described with the help of a schema (= ontology). The ontology and the data from the source systems can be used to create a (virtualised or materialised) knowledge graph, which is used for information access. The schema can be flexibly adapted to the changing needs of a company regarding its information integration. SoNBO differs from existing concepts known from the Semantic Web (OBDA = Ontology-based Data Access, EKG = Enterprise Knowledge Graph) both in the structure of the company-specific ontology (= Social Network of Concepts) and in the structure of the user-specific knowledge graph (= Social Network of Business Objects), and it makes use of social principles known from Enterprise Social Software. Following a Design Science Research approach, the SoNBO framework was developed and the findings were documented in this dissertation. The framework provides guidance for the introduction of SoNBO in a company, and the knowledge gained from the evaluation (in the company KOSMOS Verlag) is used to demonstrate its viability. The results (the SoNBO concept and the SoNBO framework) are based on a synthesis of the findings from a structured literature review and an investigation of the status quo of ontology-based information integration in practice: for the status quo in practice, the basic idea of SoNBO is demonstrated in an in-depth case study of the engineering office Vössing, which has been using a self-developed SoNBO application for a few years. The status quo in the academic literature is presented in the form of a structured literature analysis of ontology-based information integration approaches. This dissertation adds to theory in the field of ontology-based information integration approaches and provides an evaluated artefact (the SoNBO Framework) for practice.
Education and training of the workforce have become an important competitive factor for companies because of the rapid technological changes in the economy and the correspondingly ever shorter innovation cycles. Traditional training methods, however, are limited in their ability to meet a company's resulting demand for education and training, which continues to grow and is needed ever faster. Therefore, the use of technology-based training programs (that is, courseware) is increasing, because courseware enables self-organized and self-paced learning and, through integration into daily work routines, allows an optimal transfer of knowledge and skills, resulting in high learning outcomes. To achieve these prospects, high-quality courseware is required, with quality being defined as supporting learners optimally in achieving their learning goals. Developing high-quality courseware, however, usually requires more effort and takes longer than developing other programs, which limits the availability of such courseware on time and with the required quality.
This dissertation therefore deals with the research question of how courseware has to be developed in order to produce high-quality courseware with less development effort and shorter project duration. In addition to its high quality, this courseware should be optimally aligned with the characteristics and learning goals of the learners as well as with the planned usage scenarios for the knowledge and skills being trained. The IntView Method for the systematic and efficient development of high-quality courseware was defined to answer the research question of this dissertation. It aims at increasing the probability of producing courseware on time, without exceeding project schedules and budgets, while developing a high-quality product optimally focused on the target groups and usage scenarios.
The IntView Method integrates, into a systematic process, those execution variants of all activities and activity steps required to develop high-quality courseware which in their interaction constitute the most efficient way to develop courseware; these variants were identified in a detailed analysis of existing courseware development approaches as well as of production approaches from related fields such as multimedia, web, or software engineering. The main part of the proposed method is therefore a systematic process for engineering courseware that encompasses all courseware lifecycle phases and integrates the activities and methods of all disciplines involved in courseware engineering, including quality assurance spanning the entire lifecycle, into a consolidated process. This process is defined as a lifecycle model as well as a derived process model in the form of a dependency model in order to optimally support courseware project teams in coordinating and synchronizing their project work. In addition to the models, comprehensive, ready-to-apply enactment support materials are provided, consisting of worksheets and document templates that include detailed activity descriptions and examples.
The evaluation of the IntView Method showed that the method, together with the enactment support materials, enables efficient as well as effective courseware development. The projects and case studies conducted in the context of this evaluation demonstrate, on the one hand, that the method is easily adaptable to the production of different kinds of courseware or to different project contexts and, on the other hand, that it can be used efficiently and effectively.
The publication of freely available and machine-readable information has increased significantly in recent years. In particular, the Linked Data initiative has been receiving a lot of attention. Linked Data is based on the Resource Description Framework (RDF), and anybody can simply publish their data in RDF and link it to other datasets. The structure is similar to the World Wide Web, where individual HTML documents are connected with links. Linked Data entities are identified by URIs, which can be dereferenced to retrieve information describing the entity. Additionally, so-called SPARQL endpoints can be used to access the data with an algebraic query language (SPARQL) similar to SQL. By integrating multiple SPARQL endpoints, it is possible to create a federation of distributed RDF data sources which acts like one big data store.
The federation of RDF data differs in several respects from the federation of classical relational database systems. RDF stores are accessed either via SPARQL endpoints or by resolving URIs. There is no coordination between RDF data sources, and machine-readable metadata about a source's data is commonly limited or not available at all. Moreover, there is no common directory which can be used to discover RDF data sources or to ask for sources which offer specific data. The federation of distributed and linked RDF data sources therefore has to deal with various challenges. In order to distribute queries automatically, suitable data sources have to be selected based on the query details and the information that is available about the data sources. Furthermore, minimizing query execution time requires optimization techniques that take into account the execution cost of query operators and the network communication overhead for contacting individual data sources. In this thesis, solutions for these problems are discussed. Moreover, SPLENDID is presented, a new federation infrastructure for distributed RDF data sources which uses optimization techniques based on statistical information.
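To make the notion of a SPARQL endpoint concrete, the following sketch queries the public DBpedia endpoint with the SPARQLWrapper library; the endpoint and query are illustrative only, whereas a federation infrastructure such as SPLENDID selects suitable sources and optimizes joins across many endpoints transparently:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Query a single public SPARQL endpoint; a federation engine would split a query
# like this across several endpoints based on source statistics.
endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label WHERE {
        <http://dbpedia.org/resource/Koblenz> rdfs:label ?label .
        FILTER (lang(?label) = "en")
    }
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["label"]["value"])
```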