Details

Type

  • Bachelor Thesis
  • Student Project
  • Master Thesis

Persons

1-4

Description

A narrator tells a story slightly differently each time. Imagine an invisible conductor controlling the stage lighting, projections, music, and sounds in real time, following the narrator to amplify the experience for the audience.

Tasks

Create a pipeline that starts with Whisper for speech recognition of the narrator. The recognized text is fed into a fine-tuned LLM, which issues commands to the modal generators that control the appearance of the scene. Each student will handle a single modality of their choice. The main focus is on authoring (i.e. programming) a restricted generative environment controllable by a set of parameters, fine-tuning the LLM for it, and crafting queries for using the LLM in real time. Supporting tasks involve connecting the components in a unified environment, preferably .NET, and creating a framework for running the whole pipeline in experiments.
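To make the idea of a "restricted generative environment controllable by a set of parameters" more concrete, the following is a minimal sketch in Python. Everything in it is an illustrative assumption rather than part of the project specification: the lighting parameters and their ranges, the JSON command format, and the `fake_llm` function, which merely stands in for the fine-tuned LLM. Speech recognition is left out; the sketch assumes the transcript already arrives as text chunks. It shows one possible way to keep the LLM output inside the allowed parameter space by dropping unknown parameters and clamping values.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamSpec:
    """Allowed range and default value of one controllable parameter."""
    low: float
    high: float
    default: float


# Hypothetical parameter set for a single modality (stage lighting).
# Names and ranges are assumptions made for this sketch only.
LIGHTING_PARAMS = {
    "brightness": ParamSpec(0.0, 1.0, 0.5),
    "warmth": ParamSpec(0.0, 1.0, 0.5),
    "flicker": ParamSpec(0.0, 1.0, 0.0),
}


def validate_command(raw_json, specs):
    """Parse an LLM reply; keep only known numeric parameters, clamped to range."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    command = {}
    for name, value in data.items():
        spec = specs.get(name)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if spec is None or not is_number:
            continue  # unknown parameter or non-numeric value: ignore it
        command[name] = min(spec.high, max(spec.low, float(value)))
    return command


def fake_llm(transcript):
    """Stand-in for the fine-tuned LLM; returns a JSON string of parameters."""
    text = transcript.lower()
    if "storm" in text:
        # Deliberately includes an out-of-range value and an unknown parameter.
        return json.dumps({"brightness": 0.1, "flicker": 1.7, "fog": 0.9})
    if "sunrise" in text:
        return json.dumps({"brightness": 0.9, "warmth": 0.8})
    return "{}"


def run_pipeline(transcript_chunks, llm=fake_llm, specs=LIGHTING_PARAMS):
    """Apply validated commands chunk by chunk and yield the scene state."""
    state = {name: spec.default for name, spec in specs.items()}
    for chunk in transcript_chunks:
        state.update(validate_command(llm(chunk), specs))
        yield dict(state)


if __name__ == "__main__":
    story = ["A storm gathered over the hills.", "Then came the sunrise."]
    for scene in run_pipeline(story):
        print(scene)
```

In a real setup, the transcript chunks would come from the speech recognizer and `fake_llm` would be replaced by a call to the fine-tuned model. The validation step would remain useful in either case, since it keeps the generator inputs within the authored parameter space regardless of what the model returns.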

Requirements

  • Knowledge of the English language (source code comments and the final report should be in English)
  • Experience with C# and Python
  • Basic knowledge of large language models (LLMs)
  • Docker and Linux knowledge advantageous

Environment

The project should be implemented as a standalone .NET application wrapped in a Docker container. It may use helper libraries, for example:

Responsible

For more information please contact Martin Ilčík.