GPU Programming with CUDA

Graphics Processing Units (GPUs) were originally developed for computer gaming and other graphical tasks, but have for many years been used for general-purpose computing in a number of areas. They offer advantages over traditional CPUs because they provide greater computational capability and use high-bandwidth memory systems (memory bandwidth is the main bottleneck for many scientific applications).


Kevin Stratford

Kevin has a background in computational physics and joined EPCC in 2001. He teaches on courses including 'Scientific Programming with Python' and 'GPU Programming with CUDA'.


Neelofer Banglawala

Neelofer is a course organiser for Scientific Python and is also involved in teaching MPI, HPC and Software Carpentry courses.



This introductory course will describe GPUs, and the advantages they offer.

It will teach participants how to start programming GPUs, which cannot be used in isolation but work in conjunction with CPUs.

Important issues affecting performance will be covered.

The course focuses on NVIDIA GPUs, and the CUDA programming language (an extension to C/C++ or Fortran). Please note the course is aimed at application programmers; it does not consider machine learning or any of the packages available in the machine learning arena.

Hands-on practical sessions are included.

You will require your laptop, and your institutional credentials to connect to eduroam. The practical exercises will be run on a web-based system, so all you will need is a relatively recent web browser (Firefox, Chrome and Safari are known to work). You will also need a secure shell client: this is usually the terminal on Linux/macOS platforms, and can be provided by, e.g., PuTTY or MobaXterm on Windows.

This course is free to all academics.


Provisional timetable based on the previous run; may be subject to change.

Day 1

  • 10:00 Introduction
  • 10:20 GPU Concepts/Architectures
  • 11:00 Break
  • 11:20 CUDA Programming
  • 12:00 A first CUDA exercise
  • 13:00 Lunch
  • 14:00 CUDA Optimisations
  • 14:20 Optimisation Exercise
  • 15:00 Break
  • 15:20 Constant and Shared Memory
  • 16:00 Exercise
  • 17:00 Close

Day 2

  • 10:00 Recap
  • 10:30 OpenCL and OpenACC directives
  • 11:00 Break
  • 11:20 OpenCL and/or Directives Exercises
  • 12:00 Guest Lecture: Kyle Jacobs (NVIDIA), Overview of NVIDIA Volta
  • 13:00 Lunch
  • 14:00 Performance portability and Kokkos
  • 14:30 Exercise: Getting started with Kokkos patterns
  • 15:00 Break
  • 15:10 Kokkos memory management
  • 15:30 Memory management exercises
  • 16:00 Close

Course Materials

Slides and exercise material for this course are available from:


The course will be held at University of Birmingham.


Please use the registration page to register for ARCHER courses.


If you have any questions please contact the ARCHER Helpdesk.