Details

Type

  • Bachelor Thesis
  • Student Project
  • Master Thesis

Persons

1-2

Description

Large Language Models (LLMs) are evolving fast, and with the most recent models focused on reasoning, it is interesting to ask how far they can be fine-tuned to synthesize programs in domain-specific languages (DSLs) from the visual computing field. This may include the synthesis of 3D models, textures, sounds, music, interfaces, and animations. Examples of such languages are L-systems, GLSL, Blender nodes, and many more.
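To give a concrete picture of such a DSL, the sketch below expands an L-system, one of the languages mentioned above; the axiom and rules are Lindenmayer's classic algae system, chosen here purely as an illustration:

```python
def expand(axiom: str, rules: dict[str, str], steps: int) -> str:
    """Rewrite every symbol of the string according to `rules`, `steps` times."""
    s = axiom
    for _ in range(steps):
        # Symbols without a rule are copied unchanged (identity production).
        s = "".join(rules.get(c, c) for c in s)
    return s

# Lindenmayer's algae system: A -> AB, B -> A
print(expand("A", {"A": "AB", "B": "A"}, 4))  # ABAABABA
```

A fine-tuned model would be asked to produce such rule sets (or GLSL shaders, Blender node graphs, etc.) from natural-language descriptions.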

Tasks

The subject of this topic is to select a recent reasoning LLM (a distilled one able to run locally), select a specific DSL, and design a set of fine-tuning strategies to teach the LLM the syntax and semantics of that DSL. Each strategy will include a generator of synthetic training data and a verification set with real-world data. The strategies should be designed and evaluated scientifically to assess the success of the training. Depending on the project type (bachelor thesis, master thesis, or student project), further components may be added to assist with query interpretation or learning.
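A synthetic training-data generator as mentioned above could, for instance, emit (instruction, program) pairs. The following is a hypothetical sketch, assuming L-systems as the chosen DSL and a simple instruction-tuning data format; the symbol alphabet, rule shapes, and field names are illustrative assumptions, not part of the topic specification:

```python
import random

def make_sample(rng: random.Random) -> dict[str, str]:
    """Generate one synthetic (instruction, program) training pair.

    The program is a small random L-system; the instruction is a
    natural-language description of it (assumed template wording).
    """
    symbols = "FAB"
    axiom = rng.choice(symbols)
    # One random production per symbol, mixing symbols and turtle turns.
    rules = {
        s: "".join(rng.choice(symbols + "+-") for _ in range(rng.randint(2, 4)))
        for s in symbols
    }
    program = f"axiom: {axiom}\n" + "\n".join(f"{k} -> {v}" for k, v in rules.items())
    instruction = (
        f"Write an L-system with axiom {axiom} and one production rule "
        f"for each of the symbols {', '.join(symbols)}."
    )
    return {"instruction": instruction, "output": program}

rng = random.Random(0)  # fixed seed so the dataset is reproducible
dataset = [make_sample(rng) for _ in range(1000)]
```

The verification set, in contrast, would pair real-world programs in the DSL with human-written descriptions, so that the evaluation is not biased by the generator's templates.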

The topic is specified rather generally so that students may follow their own preferences.

Requirements

  • Knowledge of English (source code comments and the final report must be in English)
  • Knowledge of machine learning models
  • Knowledge of Python
  • Interest in formal aspects of programming languages
  • A recent GPU for local training (remote training is always a bit more tedious)

Environment

Unsloth, Ubuntu, Hugging Face, a visual computing DSL of your choice

Responsible

For more information, please contact Martin Ilčík.