An Introduction to Basic Probability

Definition and importance of Probability

What is probability?

Generally, the term probability is a measure of one’s belief in the occurrence of a future event. Probability provides a systematic way to quantify randomness and mathematically model random events and phenomena. It is a foundational discipline for fields like statistics and machine learning.

When you hear the word probability, what do you picture? Gambling? Stocks? Most likely, you imagine an event with uncertainty tied to its outcome. But how do we measure something that we are uncertain about?

Probability is a numerical representation of this uncertainty. With this very general definition of probability, we will look for a clearer understanding of it in context and how to utilize it to make inferences about a population.

In basic probability, an event is any specific outcome or set of outcomes that can occur in a given situation or experiment. It can range from simple events such as flipping a coin and getting heads, to more complex events like drawing a specific card from a deck. Each event has an associated probability, which is a number between 0 and 1, where 0 represents an impossible event and 1 represents a certain event.

For now, let’s think of probability in its simplest, mathematical terms and refine the definition of probability as we learn so we can tailor it to our context.

The probability of an event can be calculated by dividing the number of favorable outcomes by the total number of possible outcomes. This approach assumes that all outcomes are equally likely. For example, when rolling a fair six-sided die, the probability of rolling a 3 is 1 out of 6, or 1/6.
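As a minimal sketch of the counting rule above, the following Python snippet divides the number of favorable outcomes by the total number of outcomes. The helper name `classical_probability` is hypothetical and introduced only for illustration; it assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Favorable outcomes divided by total outcomes.

    Assumes all outcomes in sample_space are equally likely.
    """
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

die = [1, 2, 3, 4, 5, 6]

p_three = classical_probability({3}, die)        # rolling a 3
p_even = classical_probability({2, 4, 6}, die)   # rolling an even number

print(p_three, p_even)  # 1/6 1/2
```

Using `Fraction` keeps the result exact, so 1/6 is not rounded to a decimal.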

Importance of probability in everyday life

Understanding probability is important in everyday life because it allows us to make informed decisions, assess risk and interpret things around us.

Probability theory serves as a foundation for many fields, including statistics, data analysis, and decision-making under uncertainty. It is used in everyday situations such as forecasting the weather, predicting traffic congestion, and estimating the odds of winning the lottery. Overall, understanding probability gives us the tools to navigate uncertainty and optimize situations in our personal and professional lives.

Probability Fundamentals

Essential Definitions

Basic probability also involves concepts such as the sample space, which is the set of all possible outcomes, and the complement of an event, which is the set of all outcomes in the sample space that are not in the event (so its probability is the probability that the event does not occur). It utilizes tools such as permutations and combinations to count outcomes in situations with multiple events or objects.

  • An ordered arrangement of r distinct objects is called a permutation. The number of ways of ordering n distinct objects taken r at a time will be designated by the symbol $P_r^n$.
  • The number of combinations of n objects taken r at a time is the number of subsets, each of size r, that can be formed from the n objects. This number will be denoted by $\binom{n}{r}$ (commonly said as “n choose r”).
  • An experiment is the process by which an observation is made.
  • A simple event is an event that cannot be decomposed, and corresponds to only one sample point, denoted by the letter E, with a subscript.
  • A discrete sample space is one that contains either a finite or a countably infinite number of distinct sample points.
  • The sample space associated with an experiment is the set consisting of all possible sample points, denoted by S.
  • A random variable is a real-valued function for which the domain is a sample space.
  • Other definitions associated with sets can be reviewed here.
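To make the counting definitions concrete, here is a short sketch using Python's standard library (`math.perm` and `math.comb`, available from Python 3.8). The values n = 5 and r = 3 and the labels "ABCDE" are arbitrary choices for illustration.

```python
import math
from itertools import combinations, permutations

n, r = 5, 3
items = "ABCDE"  # n = 5 distinct objects (arbitrary labels)

n_permutations = math.perm(n, r)  # ordered arrangements of r objects: n! / (n - r)!
n_combinations = math.comb(n, r)  # unordered subsets of size r: n! / (r! (n - r)!)

# Cross-check the formulas by enumerating the arrangements explicitly.
listed_permutations = list(permutations(items, r))
listed_combinations = list(combinations(items, r))

print(n_permutations, len(listed_permutations))  # 60 60
print(n_combinations, len(listed_combinations))  # 10 10
```

Each subset of size 3 can be ordered in 3! = 6 ways, which is why there are six times as many permutations as combinations.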

Probability serves as a measure of likelihood, or the degree of uncertainty associated with an event or outcome. It provides a quantitative way to express how probable or likely it is for a specific event to occur. By assigning a numerical value between 0 and 1 to an event, probability enables us to compare and assess the chances of different outcomes.

Axioms

The concept of probability is built upon a set of fundamental principles known as the axioms of probability. These axioms serve as the foundation for the mathematical framework of probability theory and ensure its logical consistency. Understanding these axioms is crucial for comprehending and applying probability in various contexts. The three axioms are:

  1. Axiom of Non-Negativity: The probability of any event is a non-negative real number. In other words, the probability of an event cannot be negative. It is denoted by P(A) ≥ 0, where P(A) represents the probability of event A.
  2. Axiom of Normalization: The probability of the entire sample space is equal to 1. The sample space consists of all possible outcomes of a particular experiment or event. Mathematically, this axiom is expressed as P(S) = 1, where S represents the sample space.
  3. Axiom of Countable Additivity: For a countable sequence of mutually exclusive events (events that cannot occur simultaneously), the probability of their union is equal to the sum of their individual probabilities. Mathematically, if A₁, A₂, A₃, … are mutually exclusive events, then the probability of their union is given by

$$P(A_1 \cup A_2 \cup A_3 \cup \dots) = P(A_1) + P(A_2) + P(A_3) + \dots = \sum_{i=1}^{\infty} P(A_i)$$
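The axioms can be checked numerically on a small finite example. The sketch below assumes a fair six-sided die as the probability model; the helper names `prob` and `satisfies_axioms` are hypothetical and exist only for this illustration. On a finite sample space, countable additivity reduces to additivity over finitely many mutually exclusive events.

```python
from fractions import Fraction

# Assumed model: a fair die, each face with probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event, pmf):
    """Probability of an event (a set of outcomes) under the given model."""
    return sum(pmf[outcome] for outcome in event)

def satisfies_axioms(pmf):
    """Check non-negativity and normalization for a finite model."""
    non_negative = all(p >= 0 for p in pmf.values())
    normalized = sum(pmf.values()) == 1
    return non_negative and normalized

# Additivity for two mutually exclusive events.
a = {1, 2}
b = {5}
additive = prob(a | b, fair_die) == prob(a, fair_die) + prob(b, fair_die)

print(satisfies_axioms(fair_die), additive)  # True True
```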

Theorems

  • The mn rule: with m elements a₁, a₂, …, aₘ and n elements b₁, b₂, …, bₙ, it is possible to form mn = m × n pairs containing one element from each group.
    • The mn rule can be extended to any number of sets.
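The mn rule can be illustrated by listing the pairs explicitly. This sketch uses `itertools.product` from the Python standard library; the element labels are placeholders chosen for the example.

```python
from itertools import product

group_a = ["a1", "a2", "a3"]        # m = 3 elements
group_b = ["b1", "b2"]              # n = 2 elements
group_c = ["c1", "c2", "c3", "c4"]  # a third set, to show the extension

pairs = list(product(group_a, group_b))             # one element from each of two groups
triples = list(product(group_a, group_b, group_c))  # one element from each of three groups

print(len(pairs))    # 6  (3 * 2)
print(len(triples))  # 24 (3 * 2 * 4)
```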

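Theorem for Permutations: The number of permutations of n distinct objects taken r at a time is

$$P_r^n = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}$$

Proof: Filling r ordered positions is an application of the extended mn rule. The first position can be filled in n ways, the second in n − 1 ways, and so on, down to n − r + 1 ways for the r-th position. Multiplying these counts gives n(n − 1)…(n − r + 1), which equals n!/(n − r)!.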

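Theorem for Combinations: The number of unordered subsets of size r that can be chosen from n distinct objects is

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

Proof: Each subset of size r can be ordered in r! ways, so the number of ordered arrangements of r objects chosen from n, which is n!/(n − r)!, equals r! times the number of subsets. Dividing by r! gives the result.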

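Theorem for the complement: If A is an event and Ā is its complement, then

$$P(A) = 1 - P(\bar{A})$$

Proof: A and Ā are mutually exclusive and their union is the sample space S. By the axioms, P(A) + P(Ā) = P(S) = 1, so P(A) = 1 − P(Ā).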

The Central Limit Theorem (CLT) is a fundamental concept in probability theory and statistics. It states that, under certain conditions, the sum or average of a large number of independent and identically distributed (i.i.d.) random variables will be approximately normally distributed, regardless of the shape of the original distribution. In other words, the CLT establishes that the sampling distribution of the mean tends to follow a normal distribution as the sample size increases.
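A small simulation can make the Central Limit Theorem more tangible. The sketch below draws many samples from an exponential distribution, which is strongly skewed, and looks at the distribution of the sample means. The choice of distribution, the sample size of 50, the 2000 repetitions, and the seed are all assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def sample_mean(n):
    """Mean of n i.i.d. draws from an exponential distribution with rate 1."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 50              # size of each sample
num_samples = 2000  # number of sample means to collect
means = [sample_mean(n) for _ in range(num_samples)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

# For a normal distribution, roughly 68% of values fall within one
# standard deviation of the mean.
within_one_sd = sum(abs(m - mean_of_means) <= sd_of_means for m in means) / num_samples

print(mean_of_means, sd_of_means, within_one_sd)
```

The exponential distribution with rate 1 has mean 1 and standard deviation 1, so the sample means should center near 1 with a spread near 1/√50 ≈ 0.14, and the share within one standard deviation should be roughly 0.68, even though the individual draws are far from normal.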

Laws

The following laws play an important role in solving probability problems: the multiplicative law of probability and the additive law of probability.

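Theorem (multiplicative law): The probability of the intersection of two events A and B is

$$P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$$

If A and B are independent, then P(A ∩ B) = P(A) P(B).

Proof: This follows directly from the definition of conditional probability, P(B | A) = P(A ∩ B) / P(A), after multiplying both sides by P(A) (assuming P(A) > 0).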

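Theorem (additive law): The probability of the union of two events A and B is

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

If A and B are mutually exclusive, then P(A ∩ B) = 0 and P(A ∪ B) = P(A) + P(B).

Proof: Write A ∪ B as the union of the mutually exclusive events A and Ā ∩ B, so P(A ∪ B) = P(A) + P(Ā ∩ B). Since B is the union of the mutually exclusive events A ∩ B and Ā ∩ B, we have P(Ā ∩ B) = P(B) − P(A ∩ B). Substituting gives the result.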

The law of large numbers is a theorem that explains the outcome of performing an experiment a large number of times. It provides a theoretical justification for the averaging process performed on experiments to obtain precise measurements. It states that if you repeat an experiment independently a large number of times and average the result, then the result should be close to the expected value.
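The law of large numbers can be observed with a simple simulation. This sketch assumes a fair six-sided die, whose expected value is 3.5, and averages an increasing number of rolls; the seed and the sample sizes are arbitrary choices.

```python
import random

rng = random.Random(42)  # fixed seed for reproducibility

def average_of_rolls(n):
    """Average of n independent rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

# The averages should drift toward the expected value of 3.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, average_of_rolls(n))

large_sample_average = average_of_rolls(100_000)
```

With only 10 rolls the average can be far from 3.5, while with 100,000 rolls it is typically within a few hundredths of it.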

Each of these fundamental concepts provides a building block for understanding probability. However, there are several different types of probability, each with its own methods for analyzing and understanding uncertainty in different contexts.

Types of Probability

There are several different types of probability used to understand the likelihood of an event occurring in various contexts. Some of these include:

  1. Classical probability
  2. Empirical probability
  3. Subjective probability
  4. Conditional probability
  5. Joint probability
  6. Marginal probability
  7. Bayesian probability

Classical Probability

This type applies to contexts where all possible outcomes are equally likely. It assumes a well-defined sample space and is sometimes referred to as “a priori” probability. For example, when rolling a fair six-sided die, classical probability assigns a probability of 1/6 to each face.

It is based on theoretical principles and does not depend on empirical data or frequencies.

Empirical Probability

This type is based on empirical data or observed frequencies. Sometimes we refer to this as “experimental” probability, or the frequentist approach. It involves determining the likelihood of an event by counting the occurrences of that event in a sample (its frequency) and dividing by the total number of observations.

We calculate the empirical probability of an event A using the formula:

P(A) = Number of times event A occurs / Total number of trials or observations

For a simple example, suppose we flip a fair coin 100 times. We observe that it lands on Heads 45 times out of the 100 coin flips. The empirical probability of getting heads (A) is then calculated as 45/100, or 0.45.

It is important to note that empirical probability relies on observed frequencies and is subject to variability due to sampling fluctuations. As we collect more data, the empirical probability converges toward the theoretical or true probability, assuming the underlying process remains consistent.
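The formula and the convergence behavior described above can be sketched with simulated coin flips. The function name `empirical_probability`, the seed, and the numbers of flips are assumptions for illustration; the simulated coin is fair, so the true probability of heads is 0.5.

```python
import random

def empirical_probability(n_flips, rng):
    """Fraction of heads observed in n_flips simulated fair coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The example from the text: 45 heads observed in 100 flips.
observed = 45 / 100
print(observed)  # 0.45

# With more flips, the estimate tends to settle near the true value of 0.5.
rng = random.Random(1)
for n_flips in (100, 10_000, 200_000):
    print(n_flips, empirical_probability(n_flips, rng))

large_sample_estimate = empirical_probability(200_000, random.Random(1))
```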

Subjective Probability

This refers to probabilities assigned based on personal beliefs, or, an individual’s subjective degree of confidence in an event occurring.

Subjective probability recognizes that individuals may have different opinions or beliefs about the likelihood of events, even when faced with the same information. It acknowledges that probabilities can be influenced by subjective factors such as biases, intuition, or expert opinions, and can therefore vary from person to person.

Conditional Probability

Conditional probability measures the likelihood of an event occurring given that another event has already occurred. We denote this as P(A|B), read as “the probability of event A given event B.” It is most informative when events are dependent on each other, that is, when knowing that B occurred changes the probability of A.

We calculate this by dividing the joint probability of events A and B by the probability of event B:

P(A|B) = P(A and B) / P(B)

where P(A|B) represents the probability of event A occurring given that event B has occurred, P(A and B) represents the probability of both events A and B occurring together, and P(B) represents the probability of event B occurring, which must be greater than 0.

The calculation and interpretation of conditional probability involve understanding the relationship between events.

Example

For example, let’s consider the scenario of drawing cards from a deck. If we draw two cards successively without replacement, we can calculate the probability of drawing a specific card on the second draw, given that we know the outcome of the first draw.

Let’s say event A is drawing a spade on the second draw, and event B is drawing a spade on the first draw. If we already know that the first card drawn is a spade, we are left with one less spade in the deck for the second draw. The conditional probability of drawing a spade on the second draw, given that the first draw is a spade, can be calculated with the above formula: P(A|B) = P(A and B) / P(B)

Since there are 13 spades out of 52 in a standard deck of cards, P(B) = 13 / 52.

P(A and B) is the probability of drawing a spade on both the first and second draws, which is (13/52) * (12/51) since there are 12 spades remaining among the 51 cards left after the first draw. Dividing by P(B) gives P(A|B) = [(13/52) * (12/51)] / (13/52) = 12/51 ≈ 0.235. Interpreting the result, we can see how the knowledge of the first draw being a spade affects the likelihood of drawing a spade on the second draw: the conditional probability (about 0.235) is lower than the overall probability of drawing a spade (13/52 = 0.25), which indicates that the first draw has influenced the probability of drawing a spade on the second draw.
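The card example can be verified both with the formula and by brute-force enumeration. The sketch below is one possible check; the deck representation (rank and suit tuples) is an assumption made for the example.

```python
from fractions import Fraction
from itertools import permutations

# Formula: P(A|B) = P(A and B) / P(B)
p_b = Fraction(13, 52)                          # first card is a spade
p_a_and_b = Fraction(13, 52) * Fraction(12, 51) # both cards are spades
p_a_given_b = p_a_and_b / p_b

# Brute force: enumerate every ordered pair of distinct cards.
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for suit in suits for rank in range(1, 14)]
draws = list(permutations(deck, 2))  # 52 * 51 equally likely ordered pairs

first_spade = [d for d in draws if d[0][1] == "spades"]
both_spades = [d for d in first_spade if d[1][1] == "spades"]
enumerated = Fraction(len(both_spades), len(first_spade))

print(p_a_given_b, enumerated)  # 4/17 4/17
```

The fraction 4/17 is 12/51 in lowest terms, slightly below the unconditional probability of 1/4.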

Joint Probability

Joint probability refers to the probability of two or more events occurring simultaneously. We denote this as P(A and B) or P(A ∩ B).

We calculate the joint probability by multiplying the probabilities of the individual events if they are independent: P(A and B) = P(A) * P(B). For dependent events, we multiply the probability of one event by the conditional probability of the other given the first: P(A and B) = P(A) * P(B|A).

To calculate this, say we have a table of data we collected of students in our class. Let each person identify as either male or female, and prefer either baseball, basketball, or football. The table below shows the outcomes:

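|        | Baseball | Basketball | Football | Totals |
|--------|----------|------------|----------|--------|
| Female | 23       | 21         | 17       | 61     |
| Male   | 12       | 13         | 14       | 39     |
| Total  | 35       | 34         | 31       | 100    |

Table: Number of male and female students by preferred sport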

In this example, there are two categorical variables, sports and gender.

Say we want to calculate the probability that a randomly selected student is male and prefers baseball. We represent this as:

P (Gender = Male, Sport = Baseball) = 12 / 100 = 0.12

So the probability of both of these occurring at the same time is 12%.

Marginal Probability

Marginal probability refers to the probability of a specific event occurring without considering other events. We obtain this by summing or integrating the joint probabilities over all possible values of the other variables.

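In the previous example, the marginal probability that a randomly selected student is female is 61 / 100 = 0.61, since there are 61 females out of the 100 students. Equivalently, it is the sum of the joint probabilities across the three sports: 23/100 + 21/100 + 17/100.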
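The joint and marginal probabilities for the class survey can be computed directly from the counts in the table. In this sketch the nested dictionary layout and the function names are assumptions chosen for readability.

```python
from fractions import Fraction

# Counts from the table of students by gender and preferred sport.
counts = {
    "Female": {"Baseball": 23, "Basketball": 21, "Football": 17},
    "Male": {"Baseball": 12, "Basketball": 13, "Football": 14},
}
total = sum(sum(row.values()) for row in counts.values())  # 100 students

def joint(gender, sport):
    """P(Gender = gender, Sport = sport)."""
    return Fraction(counts[gender][sport], total)

def marginal_gender(gender):
    """P(Gender = gender): sum the joint probabilities over all sports."""
    return sum(joint(gender, sport) for sport in counts[gender])

def marginal_sport(sport):
    """P(Sport = sport): sum the joint probabilities over all genders."""
    return sum(joint(gender, sport) for gender in counts)

print(float(joint("Male", "Baseball")))    # 0.12
print(float(marginal_gender("Female")))    # 0.61
print(float(marginal_sport("Baseball")))   # 0.35
```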

Bayesian Probability

Bayesian probability is a type of probability that incorporates prior information or beliefs and updates them with new evidence. It is based on Bayes’ theorem, which enables the revision of probabilities as new data becomes available. Bayesian probability allows for a flexible and iterative approach to probability estimation and inference.
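Bayes’ theorem states that P(A|B) = P(B|A) * P(A) / P(B). As a hedged illustration, the sketch below reuses the counts from the class survey table in the Joint Probability section: the prior is the probability that a student is male, the evidence is that the student prefers baseball, and the result is the updated probability that the student is male. The function name `bayes` is hypothetical.

```python
from fractions import Fraction

def bayes(likelihood, prior, evidence):
    """Posterior P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

prior_male = Fraction(39, 100)            # P(Male): 39 of 100 students
likelihood = Fraction(12, 39)             # P(Baseball | Male): 12 of 39 males
evidence = Fraction(35, 100)              # P(Baseball): 35 of 100 students

posterior_male = bayes(likelihood, prior_male, evidence)

print(posterior_male)         # 12/35
print(float(prior_male))      # 0.39
print(float(posterior_male))  # a little over 0.34
```

Learning that the student prefers baseball lowers the probability that the student is male from 0.39 to 12/35, which matches reading the Baseball column directly: 12 of the 35 baseball fans are male.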

These are some of the key types of probability commonly used to analyze uncertainty and make informed decisions in fields such as statistics and mathematics.

Probability Distributions

Probability distributions are mathematical functions that describe the likelihood of different outcomes or events in a given set of circumstances. They provide a way to quantify uncertainty and model random variables in fields such as statistics, physics, and engineering. Probability distributions are fundamental tools in probability theory and statistics, allowing us to analyze data, make predictions, and draw meaningful conclusions.

There are two main types of probability distributions: discrete probability distributions and continuous probability distributions:

  1. Discrete Probability Distributions – used when the random variable can take on a finite or countable number of distinct values. Examples of discrete probability distributions include the binomial distribution, Poisson distribution, and geometric distribution.
  2. Continuous Probability Distributions – used when the random variable can take on any value within a given interval or range. Unlike discrete distributions, continuous distributions are described by probability density functions (PDFs). The probability of any single exact value is zero; the probability that the variable falls within an interval is the area under the PDF curve over that interval. Examples of continuous probability distributions include the normal distribution, uniform distribution, and exponential distribution.

Probability distributions serve as powerful tools for analyzing and understanding random variables and their associated probabilities. By fitting data to appropriate probability distributions, we can describe, summarize, and draw meaningful inferences from empirical observations.
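To contrast the two kinds of distribution, the sketch below evaluates a binomial probability (discrete) with a hand-written probability mass function and an interval probability for the standard normal distribution (continuous) using `statistics.NormalDist` from the Python standard library. The parameters (10 fair coin flips, the interval from −1 to 1) are arbitrary choices for illustration.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Discrete: probability of exactly 5 heads in 10 fair coin flips.
p_five_heads = binomial_pmf(5, 10, 0.5)
total_mass = sum(binomial_pmf(k, 10, 0.5) for k in range(11))  # sums to 1

# Continuous: probabilities are areas over intervals, computed here
# as a difference of cumulative distribution function values.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
p_within_one = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

print(p_five_heads)   # 0.24609375
print(total_mass)
print(p_within_one)   # approximately 0.6827
```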

Expected Value & Variance

Expected Value

The expected value, also known as the mean or average, is a measure of the central tendency of a probability distribution. It represents the average value we would expect to obtain if we repeated an experiment or observed a random variable an infinite number of times. The expected value is a key concept in probability theory and statistics, and it provides valuable information for decision-making and understanding the characteristics of a random variable.

Calculation

To define and calculate the expected value, follow these steps:

  1. Understand the Random Variable: First, identify and define the random variable of interest. A random variable is a numerical outcome that can take on different values with associated probabilities. It can be discrete or continuous.
  2. Define the Probability Distribution: Determine the probability distribution of the random variable. This involves specifying the possible outcomes or values the random variable can take and assigning probabilities to each value. The probability distribution can be provided directly or inferred from available data or information.
  3. Calculate the Expected Value: Once the probability distribution is known, the expected value can be calculated using the formula:
    • For a discrete random variable X: E(X) = Σ(x * P(x))
    • For a continuous random variable X: E(X) = ∫ (x * f(x)) dx
  4. Perform the Calculation: Substitute the appropriate values into the formula and calculate the expected value. For a discrete random variable, multiply each value by its corresponding probability and sum them. For a continuous random variable, integrate the product of each value and its PDF over the entire range.

The resulting value is the expected value. It is particularly useful when comparing different alternatives or making decisions under uncertainty, as it can guide us towards options with higher expected values.

It is important to note that the expected value is not always a value that the random variable will actually take on. It may not even be one of the possible outcomes. However, it serves as a useful summary statistic for understanding the central tendency of the distribution.
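The discrete formula E(X) = Σ(x * P(x)) can be sketched in a few lines. The example assumes a fair six-sided die, and the function name `expected_value` is hypothetical.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum of x * P(x) over all values x of a discrete random variable."""
    return sum(x * p for x, p in pmf.items())

# Assumed distribution: a fair die, each face with probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

e_x = expected_value(fair_die)
print(e_x, float(e_x))  # 7/2 3.5
```

The result, 3.5, is not a face of the die, which illustrates that the expected value need not be one of the possible outcomes.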

Variance

Variance is a measure of the dispersion or spread of a random variable’s distribution. It quantifies the average squared deviation of the random variable from its expected value.

A higher variance indicates a greater degree of variability in the values, while a lower variance suggests outcomes clustered more tightly around the expected value. Variance is a fundamental concept in probability theory and statistics, often used in conjunction with the expected value for a more comprehensive understanding of the characteristics of a random variable.

Calculation

Calculating the variance involves the following steps:

  1. Identify the Random Variable: Determine the random variable for which you want to calculate the variance. The random variable can be discrete or continuous and represents the outcomes or values of interest.
  2. Define the Probability Distribution: Specify the probability distribution associated with the random variable. This involves identifying the possible values the random variable can take and assigning probabilities or probability densities to each value.
  3. Calculate the Expected Value: Before calculating the variance, compute the expected value of the random variable using the method described earlier.
  4. Calculate the Squared Deviations: For each possible value of the random variable, calculate the squared deviation from the expected value. This is done by subtracting the expected value from each value and then squaring the result. The formula for the squared deviation is (x – E(X))², where x is the value of the random variable and E(X) is the expected value.
  5. Calculate the Variance: To calculate the variance, take the weighted average of the squared deviations, where the weights are the probabilities or probability densities of the corresponding values.
    • For a discrete random variable, use the formula: Var(X) = Σ[(x – E(X))² * P(x)]
    • For a continuous random variable, use the formula: Var(X) = ∫ [(x – E(X))² * f(x)] dx
  6. Perform the Calculation: Substitute the appropriate values into the formula and calculate the variance. Sum the squared deviations multiplied by their respective probabilities (for discrete distributions) or integrate them over the entire range (for continuous distributions).
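The steps above can be sketched for a discrete random variable as follows. The example again assumes a fair six-sided die; the function names are hypothetical.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum of x * P(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of (x - E(X))^2 * P(x)."""
    mean = expected_value(pmf)
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

# Assumed distribution: a fair die, each face with probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

var_x = variance(fair_die)

# Equivalent shortcut: Var(X) = E(X^2) - E(X)^2
e_x_squared = sum(x**2 * p for x, p in fair_die.items())
shortcut = e_x_squared - expected_value(fair_die) ** 2

print(var_x, shortcut)  # 35/12 35/12
```

Both routes give 35/12, or about 2.92, so the standard deviation of a single die roll is roughly 1.71.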

Conclusion

Probability is a powerful tool for understanding uncertainty and making informed decisions. By grasping the basics of probability and its applications, you can navigate situations with multiple outcomes more effectively. Whether you’re analyzing data, playing games, or assessing risks, probability provides a solid framework to evaluate and quantify uncertainties in the world around us.

  1. Practical Statistics for Data Scientists
  2. Mathematical Statistics

References:

Spiegel, M. R., Schiller, J., & Srinivasan, R. A. (2013). Probability and Statistics (4th ed.). McGraw Hill.

Wackerly, D. D., Mendenhall, W., & Scheaffer, R. L. (2008). Mathematical Statistics with Applications (7th ed.). Thomson Learning.

Bruce, P., Bruce, A., & Gedeck, P. (2019). Practical Statistics for Data Scientists (2nd ed.). O’Reilly.