1x1 Convolution: Explainer

Tauseef Ahmad
3 min read · May 21, 2023

In this blog, we will dive into the concept of the 1x1 convolution operation, which appeared in the paper ‘Network in Network’ by Lin et al. (2013) and in ‘Going Deeper with Convolutions’ by Szegedy et al. (2014), the paper that proposed the GoogLeNet architecture.


One of the drawbacks of deep convolutional networks (like VGG and AlexNet) is that the number of feature maps often increases with the depth of the network. This can lead to a significant increase in the number of parameters and in computational complexity, especially when large filter sizes such as 5x5 and 7x7 are used. To address this issue, a 1x1 convolution layer can be used. It performs channel-wise pooling, often referred to as feature pooling or a projection layer, and provides a simple way to do ‘dimensionality reduction’ by reducing the number of channels while introducing non-linearity.

Dimensionality Reduction

As the name suggests, the 1x1 convolution is a simple operation that involves convolving an input with filters of size 1x1. Let’s try to understand this with an example. Suppose the output of a convolution layer has shape N, F, H, W, where N is the batch size, F is the number of filters (i.e. output channels), and H, W are the spatial dimensions. After passing this as input to a 1x1 conv layer with G filters, the output will have shape N, G, H, W.

Taking a numerical example, if an input of size 64x64x3 is passed through a single 1x1x3 filter, the output will have the same height and width as the input but only one channel: 64x64x1. Now, consider an input (HxWx192) with a large number of channels, for example 192. In order to reduce the dimensionality, one can apply a 1x1 convolution with, say, 32 filters. The resulting output feature map will have the same spatial dimensions (HxW) but a reduced number of channels (32). At each spatial location, the 1x1 convolution independently applies the set of 32 filters to the input channels; each filter computes a linear combination of the 192 input channels, producing a single output value. A minimal code sketch of this projection is shown below.

Example: Reduction in #channels
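A minimal PyTorch sketch of the example above: a 1x1 convolution projecting a 192-channel feature map down to 32 channels. The spatial size (28x28) and variable names are illustrative choices, not taken from the papers.

```python
import torch
import torch.nn as nn

# Input: batch of 1, 192 channels, 28x28 spatial resolution (sizes chosen for illustration)
x = torch.randn(1, 192, 28, 28)

# 1x1 convolution acting as a projection layer: 32 filters, each of shape 1x1x192
proj = nn.Conv2d(in_channels=192, out_channels=32, kernel_size=1)

y = proj(x)
print(x.shape)  # torch.Size([1, 192, 28, 28])
print(y.shape)  # torch.Size([1, 32, 28, 28]) -- same H and W, fewer channels

# Each output value at a spatial location is a linear combination of the
# 192 input channel values at that same location (plus a bias term).
```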

Computational Efficiency

Next, let’s look at an example that shows how this dimensionality reduction reduces the computational load. Suppose a 28x28x192 input feature map needs to be convolved with 32 filters of size 7x7. Counting one multiplication per weight per output element, this requires about 28 x 28 x 32 x 7 x 7 x 192 ≈ 236 million operations. If we instead first project the input down to, say, 16 channels with a 1x1 convolution and then apply the 7x7 convolution, the total cost drops to roughly 22 million operations (about 2.4 million for the 1x1 layer plus about 19.7 million for the 7x7 layer), a reduction by a factor of roughly 11. This efficiency is especially beneficial in reducing the computational cost of deep neural networks, enabling faster training and inference. The short calculation below reproduces these numbers.
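A back-of-the-envelope check of the operation counts above, counting multiplications only (exact definitions of "ops" vary between sources). The bottleneck width of 16 channels is an assumed value for illustration.

```python
H, W = 28, 28          # spatial size of the feature map (assumed preserved by padding)
C_in, C_out = 192, 32  # input and output channels
K = 7                  # kernel size of the large convolution
C_mid = 16             # 1x1 bottleneck width -- an assumption, not from the original sources

# Direct 7x7 convolution: one K*K*C_in dot product per output element
direct = H * W * C_out * K * K * C_in
print(f"direct 7x7: {direct / 1e6:.0f}M ops")      # ~236M

# 1x1 bottleneck (192 -> 16) followed by the 7x7 convolution (16 -> 32)
bottleneck = H * W * C_mid * C_in + H * W * C_out * K * K * C_mid
print(f"1x1 + 7x7:  {bottleneck / 1e6:.0f}M ops")  # ~22M
print(f"reduction:  ~{direct / bottleneck:.1f}x")  # ~10.7x
```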

Network Design

1x1 convolutions are commonly used in network architectures to control the number of channels and adjust the complexity of the model. They are often employed alongside larger convolutions, as in the Inception modules of GoogLeNet, to create a network with a varying number of channels at different layers, allowing the model to capture features at different scales. A sketch of such a block follows.
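A hedged sketch of how 1x1 convolutions appear in Inception-style blocks: each expensive 3x3 or 5x5 branch is preceded by a 1x1 bottleneck, and a pure 1x1 branch runs in parallel. The module name and channel widths below are illustrative, not the actual GoogLeNet configuration.

```python
import torch
import torch.nn as nn

class MiniInceptionBlock(nn.Module):
    """Runs three branches in parallel and concatenates them along the channel axis."""
    def __init__(self, in_ch):
        super().__init__()
        # Branch 1: pure 1x1 projection
        self.b1 = nn.Conv2d(in_ch, 32, kernel_size=1)
        # Branch 2: 1x1 bottleneck before the more expensive 3x3 convolution
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
        )
        # Branch 3: 1x1 bottleneck before a 5x5 convolution
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )

    def forward(self, x):
        # Concatenate branch outputs along the channel dimension
        return torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)

block = MiniInceptionBlock(in_ch=192)
out = block(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 96, 28, 28]) -- 32 + 32 + 32 channels
```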

Conclusion

Overall, 1x1 convolutions provide a powerful tool for channel-wise feature transformations and dimensionality reduction in deep learning models. They enable efficient and flexible network design while capturing essential relationships between channels in feature maps.

References:

  1. https://hanlab.mit.edu/files/course/slides/MIT-TinyML-Lec07-NAS-I.pdf
  2. https://medium.com/analytics-vidhya/talented-mr-1x1-comprehensive-look-at-1x1-convolution-in-deep-learning-f6b355825578
  3. https://datahacker.rs/introduction-into-1x1-convolution/
