Information
- Publication Type: Master Thesis
- Workgroup(s)/Project(s):
- Date: October 2019
- Date (Start): March 2018
- Date (End): 10. October 2019
- Open Access: yes
- First Supervisor:
Abstract
In many domains, the sheer quantity of text documents that have to be parsed increases daily. Keeping up with this continuous text stream requires a considerable investment of time. We developed a classification interface for text streams that learns user-specific topics from the user’s labeling process and partitions the incoming data into these topics. Current approaches that try to derive content categorization from a vast number of unstructured text documents use pre-trained learning models to perform text classification. These models assign predefined categories to the text according to its content. Depending on the use case, a user’s interests might not coincide with the given categories. Moreover, such a model cannot adapt to changing terminology that was not present during training. Besides these factors, users often do not trust pre-trained models because the models are a black box to them. To solve these problems, our application lets users define a classification problem and train a learning model through interaction with a Star Coordinates visualization. The approach that makes this interaction efficient is a variant of active learning. This variant builds on the premise that a learning model can achieve greater accuracy with fewer labeled training instances if a user purposefully provides the data from which it learns. We adapted this strategy for text stream classification by visualizing the topic affiliation probabilities of the learning model and providing novel interaction tools to enhance the model’s performance iteratively. By simulating different selection strategies common in active learning, we found that our visual selection strategies correspond closely to the classic active learning selection strategies. Further, in our preliminary user study, users performed on par with the best simulated selection strategies. Our evaluation concludes that incorporating information visualization into the active learning process is beneficial.
BibTeX
@mastersthesis{mazurek-2018-vac,
  title = "Visual Active Learning for News Stream Classification",
  author = "Michael Mazurek",
  year = "2019",
  abstract = "In many domains, the sheer quantity of text documents that have to be parsed increases daily. Keeping up with this continuous text stream requires a considerable investment of time. We developed a classification interface for text streams that learns user-specific topics from the user’s labeling process and partitions the incoming data into these topics. Current approaches that try to derive content categorization from a vast number of unstructured text documents use pre-trained learning models to perform text classification. These models assign predefined categories to the text according to its content. Depending on the use case, a user’s interests might not coincide with the given categories. Moreover, such a model cannot adapt to changing terminology that was not present during training. Besides these factors, users often do not trust pre-trained models because the models are a black box to them. To solve these problems, our application lets users define a classification problem and train a learning model through interaction with a Star Coordinates visualization. The approach that makes this interaction efficient is a variant of active learning. This variant builds on the premise that a learning model can achieve greater accuracy with fewer labeled training instances if a user purposefully provides the data from which it learns. We adapted this strategy for text stream classification by visualizing the topic affiliation probabilities of the learning model and providing novel interaction tools to enhance the model’s performance iteratively. By simulating different selection strategies common in active learning, we found that our visual selection strategies correspond closely to the classic active learning selection strategies. Further, in our preliminary user study, users performed on par with the best simulated selection strategies. Our evaluation concludes that incorporating information visualization into the active learning process is beneficial.",
  month = oct,
  address = "Favoritenstrasse 9-11/E193-02, A-1040 Vienna, Austria",
  school = "Research Unit of Computer Graphics, Institute of Visual Computing and Human-Centered Technology, Faculty of Informatics, TU Wien",
  URL = "https://www.cg.tuwien.ac.at/research/publications/2019/mazurek-2018-vac/",
}