Information
- Publication Type: Master Thesis
- Workgroup(s)/Project(s): not specified
- Date: 2022
- Date (Start): January 2022
- Date (End): November 2022
- Diploma Examination: 21. November 2022
- Open Access: yes
- First Supervisor: Manuela Waldner
- Pages: 134
- Keywords: visual exploration, visual analytics, natural language processing, bias, transformer models
Abstract
In recent years, the importance of Natural Language Processing has been increasing with more and more fields of application. The word representations, such as word embedding or transformer models, used to transcribe the language are trained using large text corpora that may include stereotypes. These stereotypes may be learned by Natural Language Processing algorithms and lead to biases in their results. Extensive research has been performed on the detection, repair and visualization of the biases in the field of Natural Language Processing. Nevertheless, the methods developed so far mostly focus on word embeddings, or direct and binary biases.To fill the research gap regarding multi-class indirect biases learned by transformer models, this thesis proposes new visualisation interfaces to explore indirect and multi-class biases learned by BERT and XLNet models. These visualisations are based on an indirect quantitative method to measure the potential biases encapsulated in transformer models, the Indirect Logarithmic Probability Bias Score. This metric is adapted from an existing one, to enable the investigation of indirect biases. The evaluation of our new indirect method shows that it enables to reveal known biases and to discover new insights which could not be found using the direct method. Moreover, the user study performed on our visualization interfaces demonstrates that the visualizations supports the exploration of multi-class indirect biases, even though improvements may be needed to fully assist the investigation of the sources of the biases.Additional Files and Images
Weblinks
- Table View
- Scatterplot
- Entry in reposiTUm (TU Wien Publication Database)
- DOI: 10.34726/hss.2022.99921
BibTeX
@mastersthesis{louis-alexandre_dit_petit-frere-2022-veo, title = "Visual Exploration of Indirect Biases in Natural Language Processing Transformer Models", author = "Judith Louis-Alexandre Dit Petit-Frere", year = "2022", abstract = "In recent years, the importance of Natural Language Processing has been increasing with more and more fields of application. The word representations, such as word embedding or transformer models, used to transcribe the language are trained using large text corpora that may include stereotypes. These stereotypes may be learned by Natural Language Processing algorithms and lead to biases in their results. Extensive research has been performed on the detection, repair and visualization of the biases in the field of Natural Language Processing. Nevertheless, the methods developed so far mostly focus on word embeddings, or direct and binary biases.To fill the research gap regarding multi-class indirect biases learned by transformer models, this thesis proposes new visualisation interfaces to explore indirect and multi-class biases learned by BERT and XLNet models. These visualisations are based on an indirect quantitative method to measure the potential biases encapsulated in transformer models, the Indirect Logarithmic Probability Bias Score. This metric is adapted from an existing one, to enable the investigation of indirect biases. The evaluation of our new indirect method shows that it enables to reveal known biases and to discover new insights which could not be found using the direct method. Moreover, the user study performed on our visualization interfaces demonstrates that the visualizations supports the exploration of multi-class indirect biases, even though improvements may be needed to fully assist the investigation of the sources of the biases.", pages = "134", address = "Favoritenstrasse 9-11/E193-02, A-1040 Vienna, Austria", school = "Research Unit of Computer Graphics, Institute of Visual Computing and Human-Centered Technology, Faculty of Informatics, TU Wien", keywords = "visual exploration, visual analytics, natural language processing, bias, transformer models", URL = "https://www.cg.tuwien.ac.at/research/publications/2022/louis-alexandre_dit_petit-frere-2022-veo/", }