Neural Networks 101: Forward Propagation

A Comprehensive Training Handbook

Mohit Mishra
Nerd For Tech

--

Hello, everyone. I hope you are doing well. This is the second part of my Neural Networks 101 series, and in this post we are going to discuss how machine learning models are trained. This section will primarily cover forward propagation, with a few examples.

You can follow me on Twitter to learn more about topics like this. Every day, I tweet about a variety of subjects, such as software engineering, system design, deep learning, machine learning, and more.

I would advise going through loss functions once before starting this blog. I have already written a detailed blog post about them; here is a link where you can view it.

Now that everything is in order, let’s move on to our topic:

During training, a neural network’s parameters are iteratively adjusted to minimize a predefined loss function. Here is a useful overview of the training procedure, covering forward propagation, the role of loss functions, backpropagation, and gradient descent:

Forward Propagation

Forward propagation is like sending a message through a pipeline. Your input data travels through the neural network, layer by layer. Each neuron in the network takes this data, does some math with it (using weights and biases), and then decides whether to pass the message along to the next neuron, kind of like deciding whether to forward an email.

The main goal of forward propagation is to make predictions. Think of it as guessing what’s going to happen next based on the information you’ve received so far. These predictions are then checked against the real answers to see how accurate they are. This helps the neural network learn and improve over time.

Understanding Neural Architecture

Let’s first review the fundamentals of neural architecture before moving on to forward propagation. Neural networks are made up of interconnected layers of neurons, with each layer fulfilling a distinct purpose: the input layer receives the data, the hidden layers process it, and the output layer produces the predictions.

Source: Image by the author.

Example: Consider a simple feedforward neural network for image classification. The input layer receives pixel values, the hidden layers extract features like edges or textures, and the output layer predicts the image’s class (e.g., cat or dog).

In forward propagation, input data is fed into the neural network layer by layer to generate predictions. Let us examine the steps involved in the process:

Input Layer: The neural network receives the input data. Each input neuron is associated with a particular feature in the data.

Hidden Layers: One or more hidden layers process the data, with neurons performing calculations through weighted connections and activation functions. These computations turn the input data into a representation that helps with making predictions.

Output Layer: The output layer takes the output of the last hidden layer and generates the final predictions. The type of problem (e.g., binary classification or multi-class classification) determines how many neurons the output layer contains.
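To make these steps concrete, here is a minimal NumPy sketch of a single forward pass. The layer sizes and the random weights are purely illustrative assumptions, not a trained model; the point is simply to show data flowing from the input layer, through a hidden layer, to the output layer.

```python
import numpy as np

def relu(z):
    # ReLU activation: keeps positive values, zeroes out negatives
    return np.maximum(0, z)

def softmax(z):
    # Softmax turns raw scores into probabilities that sum to 1
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Illustrative sizes: 4 input features, 8 hidden neurons, 3 output classes
X = rng.normal(size=(5, 4))                    # 5 samples, 4 features (input layer)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)  # hidden layer parameters
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)  # output layer parameters

# Forward propagation: input -> hidden -> output
hidden = relu(X @ W1 + b1)          # hidden layer representation
probs = softmax(hidden @ W2 + b2)   # output layer predictions

print(probs.shape)  # (5, 3): one probability per class for each sample
```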

Let’s see another example

Now let’s look at an example that uses the well-known MNIST dataset to classify handwritten digits. In this instance:

Input Layer: Each MNIST image is a grayscale picture of size 28 × 28, so the input layer consists of 784 input neurons, one per pixel. Every neuron represents a pixel’s intensity, which ranges from 0 to 255.

Hidden Layers: There may be more than one hidden layer, with a different number of neurons in each. These hidden layers carry out the computations that extract features from the input images. For example, a neuron in the first hidden layer might identify horizontal edges, whereas a neuron in the second hidden layer might identify round shapes.

Output Layer: Since MNIST covers only the digits 0 to 9, the output layer is made up of ten neurons, one for each digit. Each neuron’s output represents the likelihood that the corresponding digit is present in the input image. For example, if the output neuron corresponding to digit 7 has the highest probability, the network predicts that the input image is most likely a 7.
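Here is a rough NumPy sketch of such a network to make the shapes concrete. The hidden layer sizes (128 and 64) and the random weights are assumptions made purely for illustration, so the "prediction" here is meaningless; the sketch only demonstrates the flow from 784 inputs through two hidden layers to 10 outputs.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# A fake 28 x 28 grayscale image standing in for an MNIST digit
image = rng.integers(0, 256, size=(28, 28))
x = image.reshape(1, 784) / 255.0    # flatten to 784 inputs, scale to [0, 1]

# Illustrative architecture: 784 -> 128 -> 64 -> 10 (weights are random here)
W1, b1 = rng.normal(scale=0.01, size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(scale=0.01, size=(128, 64)), np.zeros(64)
W3, b3 = rng.normal(scale=0.01, size=(64, 10)), np.zeros(10)

h1 = relu(x @ W1 + b1)        # first hidden layer (in a trained network, might respond to edges)
h2 = relu(h1 @ W2 + b2)       # second hidden layer (might respond to larger shapes)
probs = softmax(h2 @ W3 + b3) # 10 outputs, one probability per digit 0-9

print("Predicted digit:", probs.argmax())  # index of the most probable digit
```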

By going through these phases, the neural network converts raw pixel values into meaningful predictions, demonstrating the strength and effectiveness of deep learning at solving challenging problems like image classification.

A Journey into Equations

Real-world input data is represented as matrices (or, more generally, tensors). Let X be the input data matrix, with each row denoting a sample of data and each column denoting a distinct feature.

For example, in the context of an autonomous vehicle, let X be an m × n matrix that represents sensor readings, where m denotes the number of data samples (time snapshots) and n is the number of sensor measurements.

The weighted sum is computed by multiplying the input data matrix X by the weight matrix W and adding a bias term b. Mathematically, the weighted sum Z is calculated as:

Z = XW + b

For our autonomous vehicle example, let W be an n × k weight matrix and let b be a k-dimensional bias vector, where k is the number of neurons in the layer after this one. The weighted sum Z captures the significance of every sensor measurement in forecasting vehicle behavior.

Activation functions give the network its non-linearity. Let f denote the activation function. The activation matrix A is calculated by applying f element-wise to the weighted sum matrix Z:

A = f(Z)

In our autonomous vehicle example, consider using the ReLU activation function, f(x) = max(0, x). By introducing non-linearity, the ReLU function enables the neural network to capture intricate relationships between sensor readings and vehicle behavior.

Let’s stick with the autonomous car example and say we are using sensor data to determine whether to brake or accelerate the vehicle. The input data matrix X contains sensor readings from cameras, radar, and lidar. The weight matrix W represents the significance of every sensor reading, and the bias term b accounts for outside influences on the vehicle’s behavior.
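Putting the two equations together, here is a small NumPy sketch for the autonomous vehicle example. The sensor values and the dimensions m, n, and k are made up for illustration; the sketch simply computes Z = XW + b and A = f(Z) with ReLU as f.

```python
import numpy as np

rng = np.random.default_rng(7)

m, n, k = 10, 6, 4   # 10 time snapshots, 6 sensor measurements, 4 neurons in the next layer

# X: m x n matrix of (made-up) sensor readings from cameras, radar, and lidar
X = rng.normal(size=(m, n))

# W: n x k weight matrix, b: k-dimensional bias vector
W = rng.normal(size=(n, k))
b = rng.normal(size=k)

# Weighted sum: Z = XW + b  (b is broadcast across the m samples)
Z = X @ W + b

# Activation: A = f(Z), with f as the element-wise ReLU
A = np.maximum(0, Z)

print(Z.shape, A.shape)  # both (10, 4): one activation per sample and neuron
```

In a deeper network, A would simply become the input to the next layer, and the same two steps would repeat until the output layer is reached.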

To sum up, understanding neural network training means exploring forward propagation, neural architecture, and the equations that underpin them. Every part is essential, from the mathematical framework to the way data flows through interconnected neurons. Real-world scenarios like image classification and predicting autonomous vehicle behavior show that neural networks can handle genuinely complex problems.

My name is Mohit Mishra, and I’m a blogger who creates intriguing content that leaves readers wanting more. Anyone interested in machine learning and data science should check out my blog. My writing is designed to keep you engaged and intrigued, with a regular publishing schedule of a new piece every two days. Follow along for in-depth information that will leave you wanting more!

If you liked the article, please clap and follow me, since it will push me to write more and better content. I have also linked my GitHub account and portfolio at the bottom of the blog.

All images and formulas attached have been created by AlexNail and the CodeCogs site, and I do not claim ownership of them.

Thank you for reading my blog post on Neural Networks 101: Forward Propagation. I hope you find it informative and helpful. If you have any questions or feedback, please feel free to leave a comment below.

I also encourage you to check out my portfolio and GitHub. You can find links to both in the description below.

I am always working on new and exciting projects, so be sure to subscribe to my blog so you don’t miss a thing!

Thanks again for reading, and I hope to see you next time!

[Portfolio Link] [Github Link]
