Pedals and Probabilities: A Practical Guide to Understanding Probability (Part 1)

Abhinandhan Raghu
12 min read · Jul 19, 2023

In the intricate world of machine learning algorithms, probability serves as the foundational pillar. To truly decipher the mechanisms and theories of these algorithms, it’s essential to have a firm understanding of probability fundamentals. In this three-part blog series, we delve into the basics of probability and conditional probability, using the engaging context of cycling and supplemented by dynamic Python simulations.

In Part 1, we will first navigate through the basics of probability and conditional probability, highlighting their crucial role in predictions, forecasting, and quantifying uncertainty. Subsequently, we will apply these concepts to a practical scenario — assisting a cyclist in planning a trip. As we journey through this example, we’ll also introduce important concepts such as the Law of Total Probability.

So, let’s embark on this intriguing ride through the landscape of probability, where theory meets practice, and abstract concepts come to life through real-world applications and simulations. In the next part of this series, we’ll further delve into the world of probability, introducing the fascinating subject of probability distributions. We’ll explore the most common distributions and their applications in a practical context, setting the stage for even more complex scenarios in the later parts of this series. Stay tuned for this exciting journey!

Photo by Jay Miller on Unsplash

Setting the Pace: Understanding Probability

The concept of probability measures the likelihood of an event occurring. Take, for instance, a coin toss with a fair coin. The chance of the coin landing either heads or tails is 50%. In probability terms, we say that the event of the coin landing heads or event ‘H’ has a probability of 0.5, and so does event ‘T’, where the coin lands tails. Here, ‘H’ and ‘T’ represent the sample space, which is the set of all possible outcomes. If we extend this to two coin tosses, the sample space becomes {(H,H), (H,T), (T,H), (T,T)}. In the case of a fair coin, each event in this sample space has an equal likelihood of occurring, hence a probability of 0.25 or 25%.
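The two-toss sample space and its probabilities can be enumerated directly; here is a quick sketch in Python:

```python
from itertools import product

# Enumerate the sample space of two tosses of a fair coin
sample_space = list(product("HT", repeat=2))

# Each outcome is equally likely, so each gets probability 1 / |sample space|
probs = {outcome: 1 / len(sample_space) for outcome in sample_space}

print(sample_space)       # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
print(probs[("H", "H")])  # 0.25
```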

Example 1: Weather Forecasting

We frequently encounter the application of probability in weather forecasting. For instance, suppose we want to estimate the likelihood of rain in Toronto on June 27th, 2023. To make this prediction, we can leverage historical weather data from the Environment Canada website. This extensive data provides several weather-related metrics such as daily mean temperature, total rainfall, and total snowfall.

The process to calculate the probability of rain involves determining the ratio of the total number of rainy days in June from 2010 to 2022 to the total number of days during the same period. The Python code snippet provided below supports this analysis by performing the following steps:

· It loads the weather station inventory.

· Filters for the specific Toronto weather station.

· Scrapes the weather data for each year of operation.

· Generates a bar chart depicting the count of rainy days in June from 2010 to 2022.
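The full scraping-and-plotting pipeline is longer than fits here, but its core counting step can be sketched as follows (the function names are illustrative, and the totals reuse the figures reported below):

```python
def count_rainy_days(daily_rain_mm):
    """Count days with any recorded rainfall (> 0 mm)."""
    return sum(1 for mm in daily_rain_mm if mm > 0)

def rain_probability(rainy_days, total_days):
    """Relative-frequency estimate of P(rain on a June day)."""
    return rainy_days / total_days

# Example: four June days, two of them rainy
print(count_rainy_days([0.0, 5.2, 0.0, 1.1]))  # 2

# The article's totals: 127 rainy days out of 390 June days (2010-2022)
print(round(rain_probability(127, 390), 3))  # 0.326
```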

By visually representing this data, we can gain valuable insights about weather patterns in Toronto, specifically the frequency of rainy days in June over the past decade. The bar chart generated will provide a clear picture of the number of rainy days in June for each year from 2010 to 2022, thus aiding in our prediction for June 27th, 2023.

Distribution of Rainy Days in June in Toronto (2010–2022) Source: Image by the author.

As per the chart, there were 127 rainy days in June over the 13 years from 2010 to 2022. Hence, the probability of a rainy day in June 2023 is 127/390, or approximately 0.326 (about 33%).

Example 2: Bike Route Planning

Let’s switch gears and explore a more exciting scenario involving cycling. Consider a cyclist planning a weekend bikepacking tour around Toronto. The cyclist decides on a 2-day or a 3-day trip, depending on factors like route difficulty, fitness levels, and sightseeing spots. The shortlisted routes are Toronto — Niagara Falls, Toronto — Cobourg, and Toronto — Barrie (each with a total trip length of around 125–135 km), with a return trip to Toronto via public transport.

Which route should the cyclist take? Should there be a rest day? How can the cyclist optimize the route for sightseeing and dining spots? We’ll explore these questions in depth in the next blog. For now, applying our understanding of probability, without considering external factors like road and weather conditions, the cyclist is equally likely to choose any of the three routes. Thus, the probability of choosing the route to Barrie is 1/3, or about 33%.

Cyclist’s Shortlisted Routes for Bike Packing Trip Source: Google Maps.

However, in real life, the probability of choosing a route isn’t always equal. It might depend on weather conditions, hotel availability, hotel costs, etc. How does the probability change in these circumstances? That’s where conditional probability comes into play. Conditional probability extends our understanding of probability by considering prior evidence and conditions, providing a more realistic probability value. Let’s explore this concept and its impact on the cyclist’s route selection.

Off the Beaten Path: Conditional Probability

The probability of any event captures the likelihood of that event happening. However, this does not take into account any other external factors or evidence. Conditional probability takes this one step forward by taking into account any prior evidence to give a more realistic probability value.

Let’s imagine a scenario where a coin is tossed 15 times, and our goal is to determine the probability of the coin landing heads on the last toss. Without any prior knowledge, it’s difficult to ascertain whether the coin is fair or biased. Consequently, we might need to adjust our guess for the outcome of the 15th toss based on the results of the preceding 14 tosses.

For instance, if the coin lands heads 12 out of the first 14 times, we can build a stronger case that the coin is not fair, but rather biased towards landing heads. This process is known as conditioning the probability on evidence.

In the absence of such conditioning evidence, we would assume the coin is fair and assign a probability of 50% to heads on the 15th toss, even if the coin is in fact biased.

Bayes’ Theorem: A Powerful Tool in Predictive Analysis

The concept of conditional probability brings us to an important theorem in probability theory — Bayes’ Theorem. Named after the British mathematician Thomas Bayes, this theorem provides a mathematical framework for updating probabilities based on new evidence.

In mathematical terms, Bayes’ theorem can be expressed as:

P(A|B) = P(B|A) × P(A) / P(B)

where:

P(A|B) is the conditional probability of event A occurring given that B is true.

P(B|A) is the conditional probability of event B occurring given that A is true.

P(A) and P(B) are the probabilities of events A and B respectively.

To illustrate this concept, let’s return to our coin toss example. Suppose we start with an initial belief that our coin is fair, assigning a 50% probability to the hypothesis that the coin is fair (P(A)). However, after observing 12 heads in 14 tosses, we might question our initial belief.

The probability of getting 12 heads in 14 tosses given that the coin is fair (P(B|A)) can be calculated using the binomial probability formula (more on binomial probability in a later blog!). It is quite small because getting 12 heads in 14 tosses is unlikely with a fair coin. Let’s assume it is 0.02 (2%).

The probability of getting 12 heads in 14 tosses, regardless of whether the coin is fair or not (P(B)), is also small. Let’s assume P(B) is 0.05 (5%) for the sake of this example.

According to Bayes’ theorem, the updated probability (P(A|B)), the probability that the coin is fair given that 12 out of 14 tosses are heads, is:

P(A|B) = P(B|A) × P(A) / P(B) = (0.02 × 0.5) / 0.05 = 0.2

So, the updated probability that the coin is fair, given our new evidence, is 20%. This is significantly lower than our initial belief of 50%. This updated probability reflects our revised belief about the fairness of the coin after taking into account the new evidence.
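As a quick sketch, the update above can be written in a few lines of Python (the numbers are the article’s illustrative assumptions, not measured values):

```python
def bayes_update(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Prior P(A) = 0.5, likelihood P(B|A) = 0.02, evidence P(B) = 0.05
posterior = bayes_update(prior=0.5, likelihood=0.02, evidence=0.05)
print(round(posterior, 3))  # 0.2
```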

Bayes’ theorem is a powerful tool in many fields, including statistics, computer science, and artificial intelligence. It provides a solid foundation for predictive modeling and machine learning algorithms, which rely on updating predictions as new data becomes available.

Continuing with the cycling example from the previous section, consider the unconditional probability of the cyclist biking along the Toronto — Cobourg route. It is the same as the unconditional probability of biking along the Toronto — Niagara Falls and Toronto — Barrie routes: 1/3, or about 33%, for each. It’s Thursday, August 3, 2023, and the cyclist really wants to do the Toronto — Cobourg route during the weekend of August 5–8, 2023, as long as the forecast chance of rain along this route does not exceed 40%. Let’s also assume that the likelihood of the cyclist choosing Cobourg as the destination reduces by 10% for every 10% increase in the chance of rain during the weekend. How likely, then, is the cyclist to choose the Toronto — Cobourg route? The following chart captures the conditional probability and compares it against the unconditional probability, which is always 1/3.

Unconditional Vs Conditional Probability — Probability of the Cyclist Choosing Toronto — Cobourg Route Source: Image by the author.

This scatter plot visualizes the difference between the unconditional and conditional probabilities of a cyclist choosing the Toronto-Cobourg route, given the chance of rain.

In the plot, the x-axis represents the chance of rain. The blue dots represent the unconditional probability of the cyclist choosing the Cobourg route, which remains constant at 1/3 regardless of the chance of rain.

On the other hand, the colored dots represent the conditional probability, which varies depending on the chance of rain. As the chance of rain increases beyond 40%, the conditional probability of the cyclist choosing the Cobourg route decreases, depicted by the dots descending on the plot. The colors of these dots represent the progression of the chance of rain, starting from blue (0% chance of rain) and progressing to red (100% chance of rain). This illustrates how the probability of the cyclist choosing the Cobourg route decreases as the chance of rain increases.

This plot offers a clear visualization of how conditional probability takes into account additional information (in this case, the chance of rain) and how it can differ significantly from unconditional probability.

Below is the Python code used for this exercise:
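A minimal, plot-free sketch of the underlying model follows; the linear decay rule above the 40% threshold is one interpretation of the text, not the author’s exact code.

```python
BASE_PROB = 1 / 3       # unconditional probability of choosing Cobourg
RAIN_THRESHOLD = 0.4    # cyclist's tolerance for the chance of rain

def cobourg_probability(rain_chance):
    """Conditional probability of choosing the Cobourg route,
    given the forecast chance of rain (0.0 to 1.0)."""
    if rain_chance <= RAIN_THRESHOLD:
        return BASE_PROB
    # Linear decay above the threshold, floored at zero
    return max(0.0, BASE_PROB * (1.0 - (rain_chance - RAIN_THRESHOLD)))

for rain in (0.0, 0.4, 0.7, 1.0):
    print(rain, round(cobourg_probability(rain), 3))
```

The article’s version presumably also renders these values as the scatter plot shown above, colored by the chance of rain.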

Crossroads: The Law of Total Probability (LOTP)

In the previous sections, we explored how the cyclist’s decision to bike to Cobourg could change based on a single condition — the chance of rain. But what happens when multiple variables come into play? Factors such as hilly terrain, scheduled road closures, or limited options for overnight stays can all influence the decision. This is where the Law of Total Probability (LOTP) enters the picture.

LOTP allows us to calculate the overall probability of an event A from a set of conditioning events (X, Y, Z, …) that partition the sample space, that is, events that are mutually exclusive and together cover every possibility. It’s expressed as follows:

P(A) = P(A|X) × P(X) + P(A|Y) × P(Y) + P(A|Z) × P(Z) + …

In our cycling example, we can express the probability of the cyclist biking to Cobourg as:

P(TC) = P(TC|R) × P(R) + P(TC|H) × P(H) + P(TC|RC) × P(RC)

where P(TC) is the overall probability of the cyclist biking to Cobourg,

P(TC|R) is the probability of the cyclist biking to Cobourg given the chances of rain,

P(R) is the probability of the rain along the Toronto — Cobourg route,

P(TC|H) is the probability of the cyclist biking to Cobourg given the route is very hilly,

P(H) is the probability of encountering hilly terrains along the route,

P(TC|RC) is the probability of the cyclist biking to Cobourg given the road closures along the route,

P(RC) is the probability of weekend road closures along the route.

As you can see, the overall probability of the cyclist choosing the Cobourg route is the sum of the conditional probabilities, each weighted by the probability of the condition it depends on. This provides a more comprehensive picture, accounting for multiple factors.
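As a hedged numeric sketch of this weighted sum (the probabilities below are invented purely for illustration):

```python
def total_probability(conditionals, weights):
    """Law of total probability: sum of P(TC|condition) * P(condition)
    over a set of conditions that partition the sample space."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must form a partition"
    return sum(p_cond * w for p_cond, w in zip(conditionals, weights))

# Illustrative values: P(TC|R), P(TC|H), P(TC|RC) and P(R), P(H), P(RC)
p_tc = total_probability([0.20, 0.30, 0.10], [0.5, 0.3, 0.2])
print(round(p_tc, 3))  # 0.21
```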

In the next section, let’s visualize this with a Python simulation.

Navigating Through Uncertainty: Demonstrating the Law of Total Probability

We’ll now illustrate the Law of Total Probability using a practical example involving a cyclist’s decision-making process. We’ll take into account two key factors that might influence the cyclist’s choice of route: the probability of rain and the level of hilly terrain.

The factors are defined as follows:

· The probability of rain: This is a value ranging from 0 to 1, with 0 indicating no chance of rain and 1 indicating that rain is certain.

· The hilly terrain level: This is also a value ranging from 0 to 1, with 0 representing a flat terrain and 1 representing a very hilly terrain.

In our example, we’ll assume that the probability of the cyclist choosing the Cobourg route decreases linearly as either the rain or terrain level increases. If it’s likely to rain or the terrain is hilly, the cyclist is less likely to choose the Cobourg route.

For each combination of rain and terrain levels, we’ll calculate the probability of the cyclist choosing the Cobourg route. By visualizing these probabilities on a 3D plot, we can see how the cyclist’s route choice changes as a function of both the rain and terrain levels, providing a visual demonstration of the Law of Total Probability.

Law of Total Probability Illustration Source: Image by the author.

This 3D plot visualizes the probability of a cyclist choosing the Toronto-Cobourg route, based on the likelihood of rain and the presence of hilly terrain. The x-axis represents the probability of rain, the y-axis represents the probability of encountering hilly terrain, and the z-axis (height) represents the probability of the cyclist choosing the Cobourg route.

As you can see from the plot, as the probability of rain or the probability of hilly terrain increases, the probability of the cyclist choosing the Cobourg route decreases. This is indicated by the slope of the surface, which goes downwards as you move away from the origin (0,0).

At the origin (0,0), where there is no rain and no hilly terrain, the probability of choosing the Cobourg route is at its maximum (1.0). As we move towards higher probabilities of rain or hilly terrain, the likelihood of choosing the Cobourg route decreases, as represented by the descending surface.

In essence, this plot shows how the cyclist’s route choice is influenced by the interplay of two conditional variables — rain and terrain. The lower the chances of rain and hilly terrain, the more likely the cyclist is to choose the Cobourg route, and vice versa. This is a practical demonstration of how conditional probability can be used to make decisions in a complex environment with multiple influencing factors.

Below is the code used for this exercise.
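A plot-free sketch of the grid computation behind the surface follows; the specific linear model, with equal 0.5 weights on rain and terrain, is an assumption consistent with the description, not the author’s exact code.

```python
def route_probability(rain, terrain):
    """Choice probability that decreases linearly in both rain and
    terrain level, equal to 1.0 at (0, 0) and floored at zero."""
    return max(0.0, 1.0 - 0.5 * (rain + terrain))

# Evaluate the model on an 11 x 11 grid of (rain, terrain) levels
steps = [i / 10 for i in range(11)]
grid = [[route_probability(r, t) for r in steps] for t in steps]

print(grid[0][0])    # 1.0  (no rain, flat terrain)
print(grid[10][10])  # 0.0  (certain rain, maximally hilly)
```

Plotting `grid` as a surface over the rain and terrain axes reproduces the downward-sloping shape described above.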

Recap and Looking Ahead

In this first part of our journey through probability, we’ve laid the groundwork by understanding the key concepts of probability and conditional probability. We started with the basics, explaining how probability gives us a way to quantify uncertainty. We then introduced the concept of conditional probability, which refines our probability estimates based on prior evidence or conditions. We saw how conditional probability gives us a more nuanced and accurate picture of the likelihood of events.

We then delved into the Law of Total Probability, a fundamental principle that allows us to compute the probability of an event by considering all possible ways it can occur. We demonstrated this concept through an engaging example of a cyclist deciding on a route based on the likelihood of rain and the level of hilly terrain.

So, what’s next? In the second part of this series, we’ll explore the fascinating world of probability distributions. We’ll delve into the most common distributions like the binomial, normal, and Poisson, and understand their characteristics, assumptions, and applications. This knowledge will set a strong foundation for the third part of the series where we’ll return to our cyclist and apply these distributions to make informed decisions about the route, rest stops, and sightseeing spots.

Stay tuned for an exciting ride through the application of probability in decision making!

Conclusion

Understanding probability and conditional probability is not just crucial in the field of statistics, but it’s also essential in our daily decision-making. These principles lay the groundwork for more advanced topics such as probability distributions, which we’ll explore in the next part of this series.

Whether you’re an experienced data scientist looking to brush up on your knowledge or a beginner just starting out, grasping these fundamentals is an important first step. I look forward to delving into the world of probability distributions in Part 2 and showing how these theories come to life. Thank you for joining me on this journey through the landscape of probability, and I hope to see you in Part 2!
