004 Datenverarbeitung; Informatik
Refine
Year of publication
Document Type
- Doctoral Thesis (19) (remove)
Keywords
- Software Engineering (3)
- Information Retrieval (2)
- Modellgetriebene Entwicklung (2)
- AUTOSAR (1)
- Abduktion <Logik> (1)
- Adaptation (1)
- Anpassung (1)
- Architektur <Informatik> (1)
- Auditing (1)
- Automotive Systems (1)
Institute
- Institut für Informatik (19) (remove)
Currently, there are a variety of digital tools in the humanities, such
as annotation, visualization, or analysis software, which support researchers in their work and offer them new opportunities to address different research questions. However, the use of these tools falls far
short of expectations. In this thesis, twelve improvement measures are
developed within the framework of a design science theory to counteract the lack of usage acceptance. By implementing the developed design science theory, software developers can increase the acceptance of their digital tools in the humanities context.
For software engineers, conceptually understanding the tools they are using in the context of their projects is a daily challenge and a prerequisite for complex tasks. Textual explanations and code examples serve as knowledge resources for understanding software languages and software technologies. This thesis describes research on integrating and interconnecting
existing knowledge resources, which can then be used to assist with understanding and comparing software languages and software technologies on a conceptual level. We consider the following broad research questions that we later refine: What knowledge resources can be systematically reused for recovering structured knowledge and how? What vocabulary already exists in literature that is used to express conceptual knowledge? How can we reuse the
online encyclopedia Wikipedia? How can we detect and report on instances of technology usage? How can we assure reproducibility as the central quality factor of any construction process for knowledge artifacts? As qualitative research, we describe methodologies to recover knowledge resources by i.) systematically studying literature, ii.) mining Wikipedia, iii.) mining available textual explanations and code examples of technology usage. The theoretical findings are backed by case studies. As research contributions, we have recovered i.) a reference semantics of vocabulary for describing software technology usage with an emphasis on software languages, ii.) an annotated corpus of Wikipedia articles on software languages, iii.) insights into technology usage on GitHub with regard to a catalog of pattern and iv.) megamodels of technology usage that are interconnected with existing textual explanations and code examples.
Traditional Driver Assistance Systems (DAS) like for example Lane Departure Warning Systems or the well-known Electronic Stability Program have in common that their system and software architecture is static. This means that neither the number and topology of Electronic Control Units (ECUs) nor the presence and functionality of software modules changes after the vehicles leave the factory.
However, some future DAS do face changes at runtime. This is true for example for truck and trailer DAS as their hardware components and software entities are spread over both parts of the combination. These new requirements cannot be faced by state-of-the-art approaches of automotive software systems. Instead, a different technique of designing such Distributed Driver Assistance Systems (DDAS) needs to be developed. The main contribution of this thesis is the development of a novel software and system architecture for dynamically changing DAS using the example of driving assistance for truck and trailer. This architecture has to be able to autonomously detect and handle changes within the topology. In order to do so, the system decides which degree of assistance and which types of HMI can be offered every time a trailer is connected or disconnected. Therefore an analysis of the available software and hardware components as well as a determination of possible assistance functionality and a re-configuration of the system take place. Such adaptation can be granted by the principles of Service-oriented Architecture (SOA). In this architectural style all functionality is encapsulated in self-contained units, so-called Services. These Services offer the functionality through well-defined interfaces whose behavior is described in contracts. Using these Services, large-scale applications can be built and adapted at runtime. This thesis describes the research conducted in achieving the goals described by introducing Service-oriented Architectures into the automotive domain. SOA deals with the high degree of distribution, the demand for re-usability and the heterogeneity of the needed components.
It also applies automatic re-configuration in the event of a system change. Instead of adapting one of the frameworks available to this scenario, the main principles of Service-orientation are picked up and tailored. This leads to the development of the Service-oriented Driver Assistance (SODA) framework, which implements the benefits of Service-orientation while ensuring compatibility and compliance to automotive requirements, best-practices and standards. Within this thesis several state-of-the-art Service-oriented frameworks are analyzed and compared. Furthermore, the SODA framework as well as all its different aspects regarding the automotive software domain are described in detail. These aspects include a well-defined reference model that introduces and relates terms and concepts and defines an architectural blueprint. Furthermore, some of the modules of this blueprint such as the re-configuration module and the Communication Model are presented in full detail. In order to prove the compliance of the framework regarding state-of-the-art automotive software systems, a development process respecting today's best practices in automotive design procedures as well as the integration of SODA into the AUTOSAR standard are discussed. Finally, the SODA framework is used to build a full-scale demonstrator in order to evaluate its performance and efficiency.
The publication of freely available and machine-readable information has increased significantly in the last years. Especially the Linked Data initiative has been receiving a lot of attention. Linked Data is based on the Resource Description Framework (RDF) and anybody can simply publish their data in RDF and link it to other datasets. The structure is similar to the World Wide Web where individual HTML documents are connected with links. Linked Data entities are identified by URIs which are dereferenceable to retrieve information describing the entity. Additionally, so called SPARQL endpoints can be used to access the data with an algebraic query language (SPARQL) similar to SQL. By integrating multiple SPARQL endpoints it is possible to create a federation of distributed RDF data sources which acts like one big data store.
In contrast to the federation of classical relational database systems there are some differences for federated RDF data. RDF stores are accessed either via SPARQL endpoints or by resolving URIs. There is no coordination between RDF data sources and machine-readable meta data about a source- data is commonly limited or not available at all. Moreover, there is no common directory which can be used to discover RDF data sources or ask for sources which offer specific data. The federation of distributed and linked RDF data sources has to deal with various challenges. In order to distribute queries automatically, suitable data sources have to be selected based on query details and information that is available about the data sources. Furthermore, the minimization of query execution time requires optimization techniques that take into account the execution cost for query operators and the network communication overhead for contacting individual data sources. In this thesis, solutions for these problems are discussed. Moreover, SPLENDID is presented, a new federation infrastructure for distributed RDF data sources which uses optimization techniques based on statistical information.
This thesis addresses the problem of terrain classification in unstructured outdoor environments. Terrain classification includes the detection of obstacles and passable areas as well as the analysis of ground surfaces. A 3D laser range finder is used as primary sensor for perceiving the surroundings of the robot. First of all, a grid structure is introduced for data reduction. The chosen data representation allows for multi-sensor integration, e.g., cameras for color and texture information or further laser range finders for improved data density. Subsequently, features are computed for each terrain cell within the grid. Classification is performedrnwith a Markov random field for context-sensitivity and to compensate for sensor noise and varying data density within the grid. A Gibbs sampler is used for optimization and is parallelized on the CPU and GPU in order to achieve real-time performance. Dynamic obstacles are detected and tracked using different state-of-the-art approaches. The resulting information - where other traffic participants move and are going to move to - is used to perform inference in regions where the terrain surface is partially or completely invisible for the sensors. Algorithms are tested and validated on different autonomous robot platforms and the evaluation is carried out with human-annotated ground truth maps of millions of measurements. The terrain classification approach of this thesis proved reliable in all real-time scenarios and domains and yielded new insights. Furthermore, if combined with a path planning algorithm, it enables full autonomy for all kinds of wheeled outdoor robots in natural outdoor environments.
In the recent years, Software Engineering research has shown the rise of interest in the empirical studies. Such studies are often based on empirical evidence derived from corpora - collections of software artifacts. While there are established forms of carrying out empirical research (experiments, case studies, surveys, etc.), the common task of preparing the underlying collection of software artifacts is typically addressed in ad hoc manner.
In this thesis, by means of a literature survey we show how frequently software engineering research employs software corpora and using a developed classification scheme we discuss their characteristics. Addressing the lack of methodology, we suggest a method of corpus (re-)engineering and apply it to an existing collection of Java projects.
We report two extensive empirical studies, where we perform a broad and diverse range of analyses on the language for privacy preferences (P3P) and on object-oriented application programming interfaces (APIs). In both cases, we are driven by the data at hand, by the corpus itself, discovering the actual usage of the languages.
Web 2.0 provides technologies for online collaboration of users as well as the creation, publication and sharing of user-generated contents in an interactive way. Twitter, CNET, CiteSeerX, etc. are examples of Web 2.0 platforms which facilitate users in these activities and are viewed as rich sources of information. In the platforms mentioned as examples, users can participate in discussions, comment others, provide feedback on various issues, publish articles and write blogs, thereby producing a high volume of unstructured data which at the same time leads to an information overload. To satisfy various types of human information needs arising from the purpose and nature of the platforms requires methods for appropriate aggregation and automatic analysis of this unstructured data. In this thesis, we propose methods which attempt to overcome the problem of information overload and help in satisfying user information needs in three scenarios.
To this end, first we look at two of the main challenges of sparsity and content quality in Twitter and how these challenges can influence standard retrieval models. We analyze and identify Twitter content features that reflect high quality information. Based on this analysis we introduce the concept of "interestingness" as a static quality measure. We empirically show that our proposed measure helps in retrieving and filtering high quality information in Twitter. Our second contribution relates to the content diversification problem in a collaborative social environment, where the motive of the end user is to gain a comprehensive overview of the pros and cons of a discussion track which results from social collaboration of the people. For this purpose, we develop the FREuD approach which aims at solving the content diversification problem by combining latent semantic analysis with sentiment estimation approaches. Our evaluation results show that the FREuD approach provides a representative overview of sub-topics and aspects of discussions, characteristic user sentiments under different aspects, and reasons expressed by different opponents. Our third contribution presents a novel probabilistic Author-Topic-Time model, which aims at mining topical trends and user interests from social media. Our approach solves this problem by means of Bayesian modeling of relations between authors, latent topics and temporal information. We present results of application of the model to the scientific publication datasets from CiteSeerX showing improved semantically cohesive topic detection and capturing shifts in authors" interest in relation to topic evolution.
Diffusion imaging captures the movement of water molecules in tissue by applying varying gradient fields in a magnetic resonance imaging (MRI)-based setting. It poses a crucial contribution to in vivo examinations of neuronal connections: The local diffusion profile enables inference of the position and orientation of fiber pathways. Diffusion imaging is a significant technique for fundamental neuroscience, in which pathways connecting cortical activation zones are examined, and for neurosurgical planning, where fiber reconstructions are considered as intervention related risk structures.
Diffusion tensor imaging (DTI) is currently applied in clinical environments in order to model the MRI signal due to its fast acquisition and reconstruction time. However, the inability of DTI to model complex intra-voxel diffusion distributions gave rise to an advanced reconstruction scheme which is known as high angular resolution diffusion imaging (HARDI). HARDI received increasing interest in neuroscience due to its potential to provide a more accurate view of pathway configurations in the human brain.
In order to fully exploit the advantages of HARDI over DTI, advanced fiber reconstructions and visualizations are required. This work presents novel approaches contributing to current research in the field of diffusion image processing and visualization. Diffusion classification, tractography, and visualizations approaches were designed to enable a meaningful exploration of neuronal connections as well as their constitution. Furthermore, an interactive neurosurgical planning tool with consideration of neuronal pathways was developed.
The research results in this work provide an enhanced and task-related insight into neuronal connections for neuroscientists as well as neurosurgeons and contribute to the implementation of HARDI in clinical environments.
The amount of information on the Web is constantly increasing and also there is a wide variety of information available such as news, encyclopedia articles, statistics, survey data, stock information, events, bibliographies etc. The information is characterized by heterogeneity in aspects such as information type, modality, structure, granularity, quality and by its distributed nature. The two primary techniques by which users on the Web are looking for information are (1) using Web search engines and (2) browsing the links between information. The dominant mode of information presentation is mainly static in the form of text, images and graphics. Interactive visualizations offer a number of advantages for the presentation and exploration of heterogeneous information on the Web: (1) They provide different representations for different, very large and complex types of information and (2) large amounts of data can be explored interactively using their attributes and thus can support and expand the cognition process of the user. So far, interactive visualizations are still not an integral part in the search process of the Web. The technical standards and interaction paradigms to make interactive visualization usable by the mass are introduced only slowly through standardatization organizations. This work examines how interactive visualizations can be used for the linking and search process of heterogeneous information on the Web. Based on principles in the areas of information retrieval (IR), information visualization and information processing, a model is created, which extends the existing structural models of information visualization with two new processes: (1) linking of information in visualizations and (2) searching, browsing and filtering based on glyphs. The Vizgr toolkit implements the developed model in a web application. In four different application scenarios, aspects of the model will be instantiated and are evaluated in user tests or examined by examples.
This dissertation investigates the usage of theorem provers in automated question answering (QA). QA systems attempt to compute correct answers for questions phrased in a natural language. Commonly they utilize a multitude of methods from computational linguistics and knowledge representation to process the questions and to obtain the answers from extensive knowledge bases. These methods are often syntax-based, and they cannot derive implicit knowledge. Automated theorem provers (ATP) on the other hand can compute logical derivations with millions of inference steps. By integrating a prover into a QA system this reasoning strength could be harnessed to deduce new knowledge from the facts in the knowledge base and thereby improve the QA capabilities. This involves challenges in that the contrary approaches of QA and automated reasoning must be combined: QA methods normally aim for speed and robustness to obtain useful results even from incomplete of faulty data, whereas ATP systems employ logical calculi to derive unambiguous and rigorous proofs. The latter approach is difficult to reconcile with the quantity and the quality of the knowledge bases in QA. The dissertation describes modifications to ATP systems in order to overcome these obstacles. The central example is the theorem prover E-KRHyper which was developed by the author at the Universität Koblenz-Landau. As part of the research work for this dissertation E-KRHyper was embedded into a framework of components for natural language processing, information retrieval and knowledge representation, together forming the QA system LogAnswer.
Also presented are additional extensions to the prover implementation and the underlying calculi which go beyond enhancing the reasoning strength of QA systems by giving access to external knowledge sources like web services. These allow the prover to fill gaps in the knowledge during the derivation, or to use external ontologies in other ways, for example for abductive reasoning. While the modifications and extensions detailed in the dissertation are a direct result of adapting an ATP system to QA, some of them can be useful for automated reasoning in general. Evaluation results from experiments and competition participations demonstrate the effectiveness of the methods under discussion.