Details

Type

  • Bachelor Thesis
  • Student Project
  • Master Thesis

Persons

1-2

Description

In many domains, such as biology, chemistry, medicine, and the humanities, large amounts of data exist. Visual exploratory analysis of these data is often not practicable due to their size and their unstructured nature. Traditional machine learning (ML) requires large-scale labeled training data and a clear target definition, which is typically not available when exploring unknown data. For such large-scale, unstructured, open-ended, and domain-specific problems, we need an interactive approach combining the strengths of ML and human analytical skills into a unified process that helps users to "detect the expected and discover the unexpected".

In a collaborative project with FH St. Pölten, we investigate how humans and machines can learn about and from the data together. TU Wien focuses on visual interfaces that support this joint human-machine data exploration process.

Projects

Within this project, the following student project and theses topics are available:

(BA/PR/DA, 1 person): Visual and (Inter)Active Learning - Design Space (and Study): Active learning is an established and successful strategy for effectively labeling large amounts of data, but users have also found it annoying [Amershi et al., AI Magazine 2014]. At least for simple examples, selecting data points to label from a visualization can be similarly effective but more enjoyable [Bernard et al., TVCG 2017]. The two approaches can also be combined (e.g., [Bernard et al., Visual Computer 2018]). This work should explore the design space of visual interfaces that combine active learning and visual interactive labeling to support labeling of large datasets with real-world complexity (i.e., "beyond MNIST"). For BA/PR, an interface with varying degrees of system and user initiative should be designed and implemented within our existing software framework. For DA, the design space should additionally be validated through a user study based on this implementation.
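
To make the "system initiative" side more concrete: a common active-learning query strategy is uncertainty sampling, where the system proposes the items its current model is least sure about. The following is a minimal sketch in Python (the language of the framework's backend), assuming the model outputs per-class probabilities for each unlabeled item. `least_confidence` and `suggest_queries` are hypothetical helpers, not part of the existing framework, and the probabilities are invented for illustration.

```python
from collections.abc import Sequence


def least_confidence(probs: Sequence[Sequence[float]]) -> list[float]:
    """Uncertainty per item: 1 minus the highest predicted class probability."""
    return [1.0 - max(p) for p in probs]


def suggest_queries(
    probs: Sequence[Sequence[float]], k: int, exclude: Sequence[int] = ()
) -> list[int]:
    """Indices of the k most uncertain items, skipping already labeled ones."""
    skip = set(exclude)
    scores = least_confidence(probs)
    candidates = [i for i in range(len(scores)) if i not in skip]
    # Highest uncertainty first; ties keep their original order (stable sort).
    candidates.sort(key=lambda i: scores[i], reverse=True)
    return candidates[:k]


# Hypothetical classifier output for four unlabeled items (two classes).
probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.99, 0.01]]
print(suggest_queries(probs, 2))               # [1, 2]
print(suggest_queries(probs, 2, exclude=[1]))  # [2, 0]
```

In a mixed-initiative interface, such suggestions could, for example, be highlighted in the visualization while the user remains free to label any other point; the number of suggestions k is one possible knob for how strongly the system steers the process.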

(PR/DA, 1 person): High-Dimensional Aggregates: Using dimensionality reduction techniques, we can visualize the similarity of images, time series, and other types of unstructured data as a 2D projection. These projections can then be analyzed to assess the accuracy of models or to select items for labeling. When visualizing many data instances, aggregation strategies (e.g., clustering) are necessary to avoid visual clutter. Aggregates can be computed either in the projection space or in a higher-dimensional feature space. There are also different ways to visualize aggregates, ranging from simple color-coding of items belonging to the same aggregate to highly abstract representations in which aggregates are replaced by simple graphical elements. This work should explore this design space and integrate into an existing software framework an aggregate visualization technique that allows controlling both the degree of visual abstraction and the space in which the aggregates are computed. For DA, the work shall rigorously validate the design space based on multiple criteria, such as visual clutter and the representativeness of the visualized aggregates. The work can build upon a previous student project on the visualization of high-dimensional clusters [Wolf, student project TU Wien 2023].
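
The choice of aggregation space can be illustrated with a small Python sketch. It uses plain k-means as the clustering step purely as an example (other aggregation strategies are equally possible), and summarizes each aggregate by the mean projected position of its members, e.g., to place a glyph. `kmeans` and `aggregate` are hypothetical helpers, not part of the existing framework, and the toy data is invented.

```python
import math
from collections.abc import Sequence

Point = Sequence[float]


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> list[int]:
    """Plain Lloyd's k-means, deterministically seeded with the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center (Euclidean).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def aggregate(features, positions, k, space="feature"):
    """Cluster in feature or projection space, then summarize each cluster in 2D.

    Returns one (x, y, size) tuple per non-empty cluster, e.g., to draw a glyph
    at the members' mean projected position.
    """
    if space not in ("feature", "projection"):
        raise ValueError("space must be 'feature' or 'projection'")
    labels = kmeans(features if space == "feature" else positions, k)
    glyphs = []
    for c in range(k):
        members = [positions[i] for i, lab in enumerate(labels) if lab == c]
        if members:
            x = sum(p[0] for p in members) / len(members)
            y = sum(p[1] for p in members) / len(members)
            glyphs.append((x, y, len(members)))
    return glyphs


# Toy data: two tight groups in 3D feature space whose members a
# (deliberately distorting) 2D projection places far apart.
features = [(0, 0, 0), (0, 0, 0.1), (10, 10, 10), (10, 10, 10.1)]
positions = [(0, 0), (5, 5), (0.1, 0), (5.1, 5)]
print(aggregate(features, positions, 2, space="feature"))
print(aggregate(features, positions, 2, space="projection"))
```

With this toy data, clustering in feature space yields two glyphs near the middle of the view, each summarizing points that lie far apart on screen, whereas clustering in projection space follows the visible layout. Such mismatches are one reason to expose the aggregation space as a parameter rather than fixing it.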

(BA/PR/DA, 1 person): Complex Interactive Annotation: Interactive data annotation by humans is essential for a machine backend to incrementally learn the underlying semantics of the data. Existing systems typically constrain how users can annotate the data, for instance, to a set of flat labels [see Dudley and Kristensson, ACM TiiS 2018 for a survey]. However, users may wish to express more complex semantics, such as hierarchical or multi-dimensional relations. In this work, the student shall develop an effective model and interface that lets users express such complex semantics and match them with the data items.
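
As a minimal sketch of what an annotation model beyond flat labels could look like, assuming labels form user-defined hierarchies and each item can be annotated along several independent dimensions (e.g., species and habitat). `LabelNode`, `Annotation`, and `items_with_label` are hypothetical names, not an existing API, and matching by label name assumes names are unique within a hierarchy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LabelNode:
    """A label in a user-defined hierarchy (e.g., animal > bird > owl)."""
    name: str
    parent: LabelNode | None = None

    def path(self) -> list[str]:
        """Names from the root down to this label."""
        names, node = [], self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


@dataclass
class Annotation:
    """One item annotated along several independent dimensions."""
    item_id: str
    labels: dict[str, LabelNode] = field(default_factory=dict)  # dimension -> label

    def flat_targets(self) -> dict[str, list[str]]:
        """Expand each label into its full path, e.g., as multi-label training targets."""
        return {dim: node.path() for dim, node in self.labels.items()}


def items_with_label(annotations, dimension, label_name):
    """IDs of items labeled `label_name` (or a descendant of it) in `dimension`.

    Simplification: labels are matched by name, so names must be unique.
    """
    return [
        a.item_id
        for a in annotations
        if dimension in a.labels and label_name in a.labels[dimension].path()
    ]


animal = LabelNode("animal")
bird = LabelNode("bird", animal)
owl = LabelNode("owl", bird)
forest = LabelNode("forest")
annotations = [
    Annotation("img_01", {"species": owl, "habitat": forest}),
    Annotation("img_02", {"species": animal}),
]
print(annotations[0].flat_targets())
# {'species': ['animal', 'bird', 'owl'], 'habitat': ['forest']}
print(items_with_label(annotations, "species", "bird"))  # ['img_01']
```

Expanding a hierarchical label into its full path is one simple way to hand such annotations to a standard multi-label learner; richer relations (e.g., between items) would need a more expressive model.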

Requirements

  • Strong interest in visualization, user interfaces, machine learning, and human-computer interaction
  • Very good programming skills
  • Experience with web technologies (JavaScript, d3, ...) and Python is an advantage
  • Experience with ML libraries is also an advantage

Environment

The projects shall be developed within an existing framework based on a React and TypeScript frontend and a Python backend. 

Responsible

For more information, please contact Manuela Waldner or Johannes Eschner.