3 Naive kernels
This chapter covers
- Naive but correct CUDA kernels
- Using dim3 to launch 2D and 3D thread arrays (a short launch sketch follows this list)
- Implementing naive matrix transpose, GEMM, and softmax
- Sliding-window operations: convolutions and pooling
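As a quick taste of the dim3 item above, here is a minimal, self-contained sketch of launching a 2D grid of threads over a matrix. The kernel, its name, and the sizes are illustrative placeholders, not code from later in the chapter:

```cuda
#include <cuda_runtime.h>

// Placeholder kernel: each thread increments one (row, col) element.
__global__ void touchElement(float *data, int rows, int cols) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols)        // guard the partial blocks at the edges
        data[row * cols + col] += 1.0f;  // trivial per-element work
}

int main(void) {
    const int rows = 1024, cols = 2048;  // arbitrary example sizes
    float *d_data;
    cudaMalloc(&d_data, rows * cols * sizeof(float));
    cudaMemset(d_data, 0, rows * cols * sizeof(float));

    // dim3 holds up to three extents (x, y, z); unspecified ones default to 1.
    dim3 block(16, 16);                          // 16 x 16 = 256 threads per block
    dim3 grid((cols + block.x - 1) / block.x,    // ceil-divide so the grid
              (rows + block.y - 1) / block.y);   // covers the whole matrix
    touchElement<<<grid, block>>>(d_data, rows, cols);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```

The ceil-divide when sizing the grid, paired with the bounds check inside the kernel, is what lets a launch cover matrices whose dimensions are not exact multiples of the block size; the same bookkeeping recurs in the naive kernels that follow.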
In the last chapter, we crossed a major threshold. We went from being a passenger on the .to("cuda") express to getting into the driver’s seat.
This chapter is where we put that knowledge to work. We move from the classic "Hello, World" vector-addition kernel to the real substance of deep learning. Our goal is to get serious practice by implementing several of the most important neural-network operations from scratch. The chapter serves as a road map along which we build naive versions of six key kernels:
- Matrix Transpose: A fundamental data-reshaping operation.
- GEMM (General Matrix Multiplication): The computational heart of every dense layer.
- Softmax: The essential final activation function for classification.
- 1D Convolution: The core operation for processing sequential data such as time series.
- 2D Convolution: The engine of modern computer vision.
- 2D Max Pooling: A critical downsampling and feature-invariance operation.