chapter nine

9 Data Analysis using GPU computing

 

This chapter covers:

  • How GPU architectures can help with many data analysis algorithms
  • Using Numba to convert Python code to efficient low-level GPU code
  • Writing highly parallel GPU code to work on common data science structures like matrices
  • Using GPU-native data analysis libraries from Python

Graphics Processing Units (GPUs) were originally designed to make graphics applications more efficient: drawing and animation software, computer-aided design and, of course, games!

It eventually became clear that GPUs could do more than graphics processing and could be used for all kinds of computing, hence the term GPGPU: General-Purpose computing on Graphics Processing Units. GPUs are attractive because, for workloads that can be split into many independent operations, they offer substantially more computing power than CPUs. They have been used successfully in many areas, such as scientific computing, crypto, and artificial intelligence. They are widely applicable in data science and in making computing more efficient in general.

9.1 Making sense of GPU computing power

9.1.1 Understanding the advantages of GPUs

9.1.2 The relationship between CPU and GPU

9.1.3 GPU internal architecture

9.1.4 Software architecture considerations

9.2 Using Numba to generate GPU code

9.2.1 Installation of GPU software for Python

9.2.2 The basics of GPU programming with Numba

9.2.3 Revisiting the Mandelbrot example

9.2.4 A NumPy version of the Mandelbrot code

9.3 Performance analysis of GPU code: the case of a CuPy application

9.3.1 GPU-based data analysis libraries

9.3.2 Using CuPy: a high-level data science library

9.3.3 A basic interaction with CuPy

9.3.4 Writing a Mandelbrot generator using Numba

9.3.5 Writing a Mandelbrot generator using CUDA C

9.3.6 Profiling tools for GPU code