
This is an excerpt from Manning's book Deep Learning for Vision Systems MEAP V08 livebook.
The image above has a size of 32 x 16, meaning it is 32 pixels wide and 16 pixels tall. The x-axis runs from 0 to 31 and the y-axis from 0 to 15. Overall, the image has 32 x 16 = 512 pixels. In this grayscale image, each pixel contains a value that represents the intensity of light at that location. Pixel values range from 0 to 255: since the pixel value represents light intensity, 0 represents a very dark pixel (black), 255 a very bright one (white), and the values in between represent intermediate shades of gray.
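A minimal sketch of such an image as a NumPy array, assuming random pixel values just for illustration (note that NumPy stores images as rows x columns, i.e. height x width):

```python
import numpy as np

# A hypothetical 32x16 grayscale image: 16 rows (height) by 32 columns (width).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(16, 32), dtype=np.uint8)

height, width = image.shape
total_pixels = image.size  # 16 * 32 = 512

# Every pixel intensity lies between 0 (black) and 255 (white).
print(total_pixels)                             # 512
print(image.min() >= 0 and image.max() <= 255)  # True
```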
Since MLPs only accept input as a 1D vector with dimensions (1, n), they cannot take a raw 2D image matrix with dimensions (x, y). To fit the image into the input layer, we first need to transform it into one long vector with dimensions (1, n) that contains all the pixel values of the image. This process is called image flattening. In this example, the total number (n) of pixels in the image is 28 x 28 = 784. So, to feed this image to our network, we need to flatten the (28 x 28) matrix into one long vector with dimensions (1, 784). The input vector will look like this:
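The flattening step above can be sketched in one line with NumPy's `reshape`, here using a dummy all-zeros 28x28 image in place of a real one:

```python
import numpy as np

# A dummy 28x28 grayscale image standing in for a real input image.
image = np.zeros((28, 28), dtype=np.uint8)

# Flatten the 2D matrix into a single row vector of shape (1, 784).
# reshape(1, -1) keeps one row and infers the column count (28 * 28 = 784).
flat = image.reshape(1, -1)
print(flat.shape)  # (1, 784)
```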
Figure 8.4: The GAN architecture is composed of the Generator and Discriminator networks. Note that the Discriminator network is a typical CNN, whose convolutional layers shrink in spatial size until they reach the flattened layer. The Generator network, on the other hand, is an inverted CNN that starts with a flattened vector; its conv layers grow in size until they match the dimensions of the input images.
```python
from keras.models import Sequential, Model
from keras.layers import (Input, Conv2D, LeakyReLU, Dropout,
                          ZeroPadding2D, BatchNormalization, Flatten, Dense)

def build_discriminator():
    # instantiate a sequential model and name it discriminator
    discriminator = Sequential()

    # add a convolutional layer to the discriminator model
    discriminator.add(Conv2D(32, kernel_size=3, strides=2,
                             input_shape=(28, 28, 1), padding="same"))
    # add a LeakyReLU activation function
    discriminator.add(LeakyReLU(alpha=0.2))
    # add a dropout layer with a 25% dropout probability
    discriminator.add(Dropout(0.25))

    # add a second convolutional layer
    discriminator.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
    # add a zero-padding layer to change the dimensions from 7x7 to 8x8
    discriminator.add(ZeroPadding2D(padding=((0, 1), (0, 1))))
    # add a BatchNormalization layer for faster learning and higher accuracy
    discriminator.add(BatchNormalization(momentum=0.8))
    discriminator.add(LeakyReLU(alpha=0.2))
    discriminator.add(Dropout(0.25))

    # add a third convolutional layer with batch norm, LeakyReLU, and dropout
    discriminator.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
    discriminator.add(BatchNormalization(momentum=0.8))
    discriminator.add(LeakyReLU(alpha=0.2))
    discriminator.add(Dropout(0.25))

    # add a fourth convolutional layer with batch norm, LeakyReLU, and dropout
    discriminator.add(Conv2D(256, kernel_size=3, strides=1, padding="same"))
    discriminator.add(BatchNormalization(momentum=0.8))
    discriminator.add(LeakyReLU(alpha=0.2))
    discriminator.add(Dropout(0.25))

    # flatten the network and add the output Dense layer with a sigmoid activation
    discriminator.add(Flatten())
    discriminator.add(Dense(1, activation='sigmoid'))

    # set the input image shape
    img = Input(shape=(28, 28, 1))
    # run the discriminator model to get the output probability
    probability = discriminator(img)

    # return a Model that takes the image as input and produces the probability output
    return Model(inputs=img, outputs=probability)
```
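A quick sketch of why the `ZeroPadding2D` layer is there: with `padding="same"`, each conv layer's output spatial size is the ceiling of its input size divided by the stride, so the 28x28 input shrinks to 14x14, then 7x7; padding one extra row and column makes it 8x8 so the next stride-2 conv halves it cleanly to 4x4. The arithmetic (not a Keras run, just the size formula) can be checked directly:

```python
import math

def conv_output_size(size, stride):
    # With padding="same", the output spatial size is ceil(size / stride).
    return math.ceil(size / stride)

size = 28
size = conv_output_size(size, 2)  # first conv, stride 2  -> 14
size = conv_output_size(size, 2)  # second conv, stride 2 -> 7
size = size + 1                   # ZeroPadding2D(((0,1),(0,1))) -> 8
size = conv_output_size(size, 2)  # third conv, stride 2  -> 4
size = conv_output_size(size, 1)  # fourth conv, stride 1 -> 4
print(size)                       # 4
```

The final 4x4 feature map with 256 channels is what the `Flatten` layer turns into a 4 * 4 * 256 = 4096-element vector before the sigmoid output.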