Information
- Publication Type: Other Reviewed Publication
- Date: 2021
- Booktitle: Eurographics 2021 - Tutorials
- DOI: 10.2312/egt.20211037
- Editor: Carol O'Sullivan and Dieter Schmalstieg
- Location: Vienna
- Open Access: yes
- Publisher: The Eurographics Association
- Pages: 1 – 5
- Keywords: Parallel Programming, GPU
Abstract
To provide a profound understanding of how CUDA applications can achieve peak performance, the first two parts of this tutorial outline the modern CUDA architecture. Following a basic introduction, we expose how language features are linked to, and constrained by, the underlying physical hardware components. Furthermore, we describe common applications for massively parallel programming, offer a detailed breakdown of potential issues, and list ways to mitigate performance impacts. An exemplary analysis of PTX and SASS snippets illustrates how code patterns in CUDA are mapped to actual hardware instructions. In parts 3 and 4, we focus on novel features that were enabled by the arrival of CUDA 10+ toolkits and the Volta+ architectures, such as ITS, tensor cores, and the graph API. In addition to basic use case demonstrations, we outline our own experiences with these capabilities and their potential performance benefits. We also discuss how long-standing best practices are affected by these changes and describe common caveats for dealing with legacy code on recent GPU models. We show how these considerations can be implemented in practice by presenting state-of-the-art research into task-based GPU scheduling, and how the dynamic adjustment of thread roles and group configurations can significantly increase performance.
Weblinks
- Tutorial Homepage
Course notes and code samples for the tutorial - https://diglib.eg.org:443/handle/10.2312/egt20211037
- Entry in reposiTUm (TU Wien Publication Database)
- DOI: 10.2312/egt.20211037
BibTeX
@inproceedings{kenzel_michael_2021_cuda,
title = "CUDA and Applications to Task-based Programming",
author = "Michael Kenzel and Bernhard Kerbl and Martin Winter and
Markus Steinberger",
year = "2021",
abstract = "To provide a profound understanding of how CUDA applications
can achieve peak performance, the first two parts of this
tutorial outline the modern CUDA architecture. Following a
basic introduction, we expose how language features are
linked to---and constrained by---the underlying physical
hardware components. Furthermore, we describe common
applications for massively parallel programming, offer a
detailed breakdown of potential issues, and list ways to
mitigate performance impacts. An exemplary analysis of PTX
and SASS snippets illustrates how code patterns in CUDA are
mapped to actual hardware instructions. In parts 3 and 4,
we focus on novel features that were enabled by the arrival
of CUDA 10+ toolkits and the Volta+ architectures, such as
ITS, tensor cores, and the graph API. In addition to basic
use case demonstrations, we outline our own experiences with
these capabilities and their potential performance benefits.
We also discuss how long-standing best practices are
affected by these changes and describe common caveats for
dealing with legacy code on recent GPU models. We show how
these considerations can be implemented in practice by
presenting state-of-the-art research into task-based GPU
scheduling, and how the dynamic adjustment of thread roles
and group configurations can significantly increase
performance.",
month = may,
booktitle = "Eurographics 2021 - Tutorials",
doi = "10.2312/egt.20211037",
editor = "Carol O'Sullivan and Dieter Schmalstieg",
location = "Vienna",
publisher = "The Eurographics Association",
pages = "5",
pages = "1--5",
keywords = "Parallel Programming, GPU",
URL = "https://www.cg.tuwien.ac.at/research/publications/2021/kenzel_michael_2021_cuda/",
}