
Transparency in Interactive Feature-based Machine Learning: Challenges and Solutions

F. Stoffel

2018
Dissertation

Machine learning is ubiquitous in everyday life; techniques from the field of automated data analysis are used in a wide range of application scenarios, from movie and route recommendations to the automated analysis of data in critical domains. To make appropriate use of such techniques, human trust must be calibrated to the actual trustworthiness of the machine learning techniques. Research shows that without this calibration, machine learning techniques may be disused or misused. In this thesis, we address the problem of providing transparency in feature-based machine learning. In particular, we outline a number of challenges and present solutions for transparency, based on interactive visual interfaces operating on the feature level. First, we elaborate on the connection between trust and transparency, outline the fundamental framework on which this thesis builds, and introduce the different audiences of transparency. We then present interactive visualization- and visual analytics-based solutions for specific aspects of transparency. The first solution addresses the task of error analysis in supervised learning. The proposed visual analytics system contains a number of coordinated views that facilitate sensemaking and reasoning about the influence of single features or groups of features in the machine learning process. The second solution is a visualization technique tailored to the interactive, visual exploration of ambiguous feature sets that arise in certain machine learning scenarios. Statistical and semantic information is combined to present a clear picture of the targeted type of ambiguity, which can be interactively resolved, eventually leading to a more specific feature set with fewer ambiguities. Afterward, we illustrate how the concepts of transparency and observable behavior can be applied in a real-world scenario.
We contribute an interactive, visualization-driven system for exploring spatial clustering that gives the analyst control over the feature set, feature weights, and associated hyperparameters. To observe the different behaviors of spatial clustering, an interactive visualization allows the comparison of different feature combinations and hyperparameters. In the same application domain, we contribute a visual analytics system that enables analysts to interactively visualize the output of a machine learning system together with additional data that share a common spatial context. The system bridges the gap between the analysts operating the machine learning system and the users of its results, which in the targeted scenario are two different user groups. Our solutions show that both groups profit from insights into the feature set of a machine learning system. The thesis concludes with a reflection on further research directions and a summary of the results.
