Information
- Publication Type: PhD-Thesis
- Date: August 2018
- Date (Start): June 2012
- Date (End): August 2018
- Second Supervisor: Harald Piringer
- 1st Reviewer: Helwig Hauser
- 2nd Reviewer: Cagatay Turkay
- Rigorosum: 9 November 2018
- First Supervisor: Eduard Gröller
Abstract
Statistical modeling is a key technology for generating business value from data. While the number of available algorithms and the need for them are growing, the number of people with the skills to effectively use such methods lags behind. Many application domain experts find it hard to use and trust algorithms that come as black boxes with insufficient interfaces for adapting them. The field of Visual Analytics aims to solve this problem through a human-oriented approach that puts users in control of algorithms through interactive visual interfaces. However, designing accessible solutions for a broad set of users while re-using existing, proven algorithms poses significant challenges for the design of analytical infrastructures, visualizations, and interactions.
This thesis provides multiple contributions towards a more human-oriented modeling process: As a theoretical basis, it investigates how user involvement during the execution of algorithms can be realized from a technical perspective. Based on a characterization of needs regarding intermediate feedback and control, a set of formal strategies to realize user involvement in algorithms with different characteristics is presented. Guidelines for the design of algorithmic APIs are identified, and requirements for the re-use of algorithms are discussed. From a survey of frequently used algorithms within R, the thesis concludes that a range of pragmatic options for enabling user involvement in new and existing algorithms exists and should be used.
After these conceptual considerations, the thesis presents two methodological contributions that demonstrate how even inexperienced modelers can be effectively involved in the modeling process. First, a new technique called TreePOD guides the selection of decision trees along trade-offs between accuracy and other objectives, such as interpretability. Users can interactively explore a diverse set of candidate models generated by sampling the parameters of tree construction algorithms. Visualizations provide an overview of possible tree characteristics and guide model selection, while details on the underlying machine learning process are only exposed on demand. A real-world evaluation with domain experts in the energy sector suggests that TreePOD enables users with and without a statistical background to confidently identify suitable decision trees. As the second methodological contribution, the thesis presents a framework for interactively building and validating regression models. The framework addresses limitations of automated regression algorithms regarding incorporating domain knowledge, identifying local dependencies, and building trust in the models. Candidate variables for model refinement are ranked, and their relationship with the target variable is visualized to support an interactive workflow of building regression models. A real-world case study and feedback from domain experts in the energy sector indicate a significant reduction in effort and increased transparency of the modeling process.
All methodological contributions of this work were implemented as part of a commercially distributed Visual Analytics software called Visplore. As the last contribution, this thesis reflects upon years of experience in deploying Visplore for modeling-related tasks in the energy sector. Dissemination and adoption are important aspects of making statistical models more accessible for domain experts, making this work relevant for practitioners and application-oriented researchers alike.
BibTeX
@phdthesis{Muehlbacher_diss_2018,
title = "Human-Oriented Statistical Modeling: Making Algorithms
Accessible through Interactive Visualization",
author = "Thomas M{\"u}hlbacher",
year = "2018",
abstract = "Statistical modeling is a key technology for generating
business value from data. While the number of available
algorithms and the need for them are growing, the number of
people with the skills to effectively use such methods lags
behind. Many application domain experts find it hard to use
and trust algorithms that come as black boxes with
insufficient interfaces for adapting them. The field of Visual
Analytics aims to solve this problem through a human-oriented
approach that puts users in control of algorithms through
interactive visual interfaces. However, designing accessible
solutions for a broad set of users while re-using existing,
proven algorithms poses significant challenges for the
design of analytical infrastructures, visualizations, and
interactions. This thesis provides multiple contributions
towards a more human-oriented modeling process: As a
theoretical basis, it investigates how user involvement
during the execution of algorithms can be realized from a
technical perspective. Based on a characterization of needs
regarding intermediate feedback and control, a set of formal
strategies to realize user involvement in algorithms with
different characteristics is presented. Guidelines for the
design of algorithmic APIs are identified, and requirements
for the re-use of algorithms are discussed. From a survey of
frequently used algorithms within R, the thesis concludes
that a range of pragmatic options for enabling user
involvement in new and existing algorithms exists and should
be used. After these conceptual considerations, the thesis
presents two methodological contributions that demonstrate
how even inexperienced modelers can be effectively involved
in the modeling process. First, a new technique called
TreePOD guides the selection of decision trees along
trade-offs between accuracy and other objectives, such as
interpretability. Users can interactively explore a diverse
set of candidate models generated by sampling the parameters
of tree construction algorithms. Visualizations provide an
overview of possible tree characteristics and guide model
selection, while details on the underlying machine learning
process are only exposed on demand. A real-world evaluation
with domain experts in the energy sector suggests that
TreePOD enables users with and without a statistical
background to confidently identify suitable decision trees.
As the second methodological contribution, the thesis
presents a framework for interactively building and validating
regression models. The framework addresses limitations of
automated regression algorithms regarding incorporating
domain knowledge, identifying local dependencies, and
building trust in the models. Candidate variables for model
refinement are ranked, and their relationship with the
target variable is visualized to support an interactive
workflow of building regression models. A real-world case
study and feedback from domain experts in the energy sector
indicate a significant reduction in effort and increased
transparency of the modeling process. All methodological
contributions of this work were implemented as part of a
commercially distributed Visual Analytics software called
Visplore. As the last contribution, this thesis reflects
upon years of experience in deploying Visplore for
modeling-related tasks in the energy sector. Dissemination
and adoption are important aspects of making statistical
models more accessible for domain experts, making this work
relevant for practitioners and application-oriented
researchers alike.",
month = aug,
address = "Favoritenstrasse 9-11/E193-02, A-1040 Vienna, Austria",
school = "Institute of Computer Graphics and Algorithms, Vienna
University of Technology",
URL = "https://www.cg.tuwien.ac.at/research/publications/2018/Muehlbacher_diss_2018/",
}