A Brief Introduction to Random Variables and Inferential Statistics

Having a solid knowledge of Calculus, Linear Algebra, Probability Theory and Statistics is an essential trait that every great data scientist possesses. A solid understanding of these topics helps an aspiring data scientist a great deal when learning Machine Learning models, and it gives them an edge over their peers.

In this article, I give a brief introduction to some of these concepts and discuss their role in an aspiring data scientist’s career growth.

Random Variable: A random variable is a numerically valued variable that takes on different values with given probabilities. Formally, it is a function that associates a real number with each outcome in a sample space.

Examples of random variables include:

  1. The number of customers entering a store,
  2. The number of tails when we toss a coin,
  3. The sales volume of a store on a particular day, etc.

A random variable is denoted by a capital letter (such as X), and the values it can take are denoted by lowercase letters. If a is an element of the sample space S and the number b is associated with this outcome, then we write X(a) = b.

For example, let X be the random variable for the number of tails in a single coin toss. The sample space is S = {H, T}, and X takes the value 0 or 1 depending on whether the outcome is a head (H) or a tail (T). So X(H) = 0 and X(T) = 1.
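Because a random variable is simply a function from outcomes to numbers, the coin-toss example can be written as a small Python function. This is a minimal sketch; the names `sample_space` and `X` are illustrative choices, not part of any library.

```python
# Sample space of a single coin toss: H = head, T = tail
sample_space = ["H", "T"]

# X maps each outcome to the number of tails it contains
def X(outcome: str) -> int:
    return 1 if outcome == "T" else 0

# Tabulate X over the whole sample space: X(H) = 0 and X(T) = 1
values = {outcome: X(outcome) for outcome in sample_space}
print(values)  # {'H': 0, 'T': 1}
```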

There are two types of random variables.

  • Discrete random variable: one that takes on a countable number of possible values, i.e., isolated points. An example is the number shown when we roll a die.
  • Continuous random variable: one that takes on an uncountable number of possible values, i.e., all values in an interval. An example is the height of a randomly chosen student in a class.

Now, let’s discuss each type of random variable in detail: its probability distribution, its cumulative distribution function (CDF), and how to compute its mean, variance, and standard deviation.

1. Discrete random variable:

Probability distribution: The probability distribution of a discrete random variable X is a list of each possible value of X together with the probability that X takes that value in one trial of the experiment.

The probabilities in the probability distribution of a random variable X must satisfy the following two conditions:

  • Each probability P(x) must be between 0 and 1: 0 ≤ P(x) ≤ 1.
  • The sum of all the possible probabilities is 1: ∑P(x) = 1.
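The two conditions translate directly into a check. This sketch stores a PMF as a Python dict from values to probabilities (an illustrative representation) and uses exact `Fraction`s so the sum-to-one test is not disturbed by floating-point rounding; with floats you would compare using `math.isclose` instead.

```python
from fractions import Fraction

def is_valid_pmf(pmf: dict) -> bool:
    """Check the two conditions on a discrete probability distribution."""
    # Condition 1: every probability lies in [0, 1]
    in_range = all(0 <= p <= 1 for p in pmf.values())
    # Condition 2: the probabilities sum to exactly 1
    sums_to_one = sum(pmf.values()) == 1
    return in_range and sums_to_one

# A fair six-sided die: each face has probability 1/6
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(is_valid_pmf(die))                                     # True
print(is_valid_pmf({0: Fraction(1, 2), 1: Fraction(1, 3)}))  # False: sums to 5/6
```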

Mean and Standard Deviation:

  • The mean (expected value) of a discrete random variable X is the number μ = E(X) = ∑xP(x).
  • The variance (σ²) of a discrete random variable X is the number σ² = ∑(x − μ)²P(x), which is equivalent to σ² = [∑x²P(x)] − μ².
  • The standard deviation is the square root of the variance: σ = √σ².
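As a worked example, here is a small sketch computing μ, σ² (both forms) and σ for a fair six-sided die, an illustrative choice of distribution. Exact fractions make it easy to see that the two variance formulas agree.

```python
import math
from fractions import Fraction

# PMF of a fair six-sided die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Mean: mu = sum of x * P(x)
mu = sum(x * p for x, p in pmf.items())

# Variance, definition form: sum of (x - mu)^2 * P(x)
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Variance, shortcut form: sum of x^2 * P(x) minus mu^2
var_short = sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2

sigma = math.sqrt(var_def)  # standard deviation

print(mu, var_def, var_short)  # 7/2 35/12 35/12
print(round(sigma, 4))         # 1.7078
```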

Cumulative Distribution Function: The cumulative distribution function (CDF) of a random variable X is defined as F_X(x) = P(X ≤ x) for all x ∈ ℝ.
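For a discrete random variable, the CDF is a step function: it is defined for every real x and jumps by P(k) at each possible value k. A minimal sketch for a fair die (an illustrative choice):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair six-sided die

def cdf(x: float) -> Fraction:
    """F_X(x) = P(X <= x): add up P(k) for every value k not exceeding x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

print(cdf(0), cdf(2.5), cdf(3), cdf(6))  # 0 1/3 1/2 1
```

Note that cdf(2.5) equals cdf(2): between two possible values the CDF stays flat.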

Different distributions of Discrete random variable: Here, four popular distributions of discrete random variables are discussed.

1. Bernoulli Distribution: It arises when we perform an experiment once and the experiment has only two possible outcomes: success and failure. Trials of this type are called Bernoulli trials, and they form the basis for many of the distributions discussed below. Let p be the probability of success, so 1 − p is the probability of failure.

Probability Mass Function (PMF):

$$P(X = x) = p^x (1 - p)^{1 - x}, \quad x \in \{0, 1\}$$

Here, E(X) = p and V(X) = p(1 − p).
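The Bernoulli PMF and its moments can be checked directly. In this sketch the success probability p = 3/10 is an arbitrary illustrative value.

```python
from fractions import Fraction

def bernoulli_pmf(x: int, p: Fraction) -> Fraction:
    """P(X = x) = p^x (1 - p)^(1 - x) for x in {0, 1}, zero otherwise."""
    if x not in (0, 1):
        return Fraction(0)
    return p ** x * (1 - p) ** (1 - x)

p = Fraction(3, 10)  # illustrative success probability
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
var = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))
print(mean, var)  # 3/10 21/100, i.e. p and p(1 - p)
```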

2. Binomial distribution: It describes the number of successes in n independent Bernoulli trials, each with the same probability of success p (so 1 − p is the probability of failure on each trial). For example, if we flip a coin n times, the binomial PMF gives the probability of getting each particular number of heads. The PMF is

$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \dots, n$$

where p is the probability of success, n is the number of trials, and x is the number of successes. Also, E(X) = np and Var(X) = npq, where q = 1 − p.
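A short sketch of the binomial PMF using `math.comb` (Python 3.8+), evaluated for 10 flips of a fair coin as an illustrative case. Summing over the PMF recovers E(X) = np and Var(X) = np(1 − p).

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5  # e.g. 10 flips of a fair coin
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(probs[5])  # 0.24609375 (probability of exactly 5 heads)

mean = sum(x * q for x, q in enumerate(probs))
var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
print(round(mean, 10), round(var, 10))  # 5.0 2.5, i.e. np and np(1 - p)
```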

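3. Geometric Distribution: It describes the number of failures before the first success in a sequence of independent Bernoulli trials. The PMF is

$$P(X = k) = (1 - p)^k \, p, \quad k = 0, 1, 2, \dots$$

where p is the probability of success and k is the number of failures. Here, E(X) = (1 − p)/p and Var(X) = (1 − p)/p². (If X instead counts the number of trials up to and including the first success, then E(X) = 1/p, and the variance is unchanged.)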
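The moments of the "failures before the first success" version can be checked numerically by truncating the infinite sums at a large k. This is a sketch with an illustrative p = 0.25, for which (1 − p)/p = 3 and (1 − p)/p² = 12.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(X = k) = (1 - p)^k p: k failures, then the first success."""
    return (1 - p) ** k * p

p = 0.25
# Truncate the infinite sums; the tail beyond k = 2000 is negligible here
ks = range(2000)
mean = sum(k * geometric_pmf(k, p) for k in ks)
var = sum((k - mean) ** 2 * geometric_pmf(k, p) for k in ks)

print(round(mean, 6), round(var, 6))  # 3.0 12.0
```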

4. Poisson Distribution: This distribution describes the number of events that occur in a fixed interval of time or space. Consider, for example, the number of calls received by a customer care center per hour. We can estimate the average number of calls per hour, but we cannot determine the exact number of calls or the exact times at which they arrive. Each occurrence of an event is assumed to be independent of the other occurrences. The PMF is

$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots$$

where λ is the average number of events in the interval, x is the number of events whose probability we want, and e is Euler’s number. Also, E(X) = Var(X) = λ.
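Continuing the call-center example, this sketch assumes a hypothetical average of λ = 3 calls per hour and evaluates the Poisson PMF. Summing over the first 100 values (the remaining tail is negligible) confirms that the mean and variance both equal λ.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = lam^x e^(-lam) / x!"""
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam = 3.0  # hypothetical: on average 3 calls per hour
print(round(poisson_pmf(0, lam), 4))  # 0.0498 (no calls in an hour)
print(round(poisson_pmf(3, lam), 4))  # 0.224

# Mean and variance both come out equal to lambda
xs = range(100)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)
print(round(mean, 6), round(var, 6))  # 3.0 3.0
```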

2. Continuous random variable:

Probability density function: The probability density function (pdf), denoted f, of a continuous random variable X satisfies the following properties:

  • f(x) ≥ 0 for all x ∈ ℝ
  • f is piecewise continuous
  • $\int_{-\infty}^{\infty} f(x)\,dx = 1$
  • $P(a \le X \le b) = \int_a^b f(x)\,dx$

The following points hold for a continuous random variable:

  • A continuous random variable has zero probability at any single point: P(X = a) = 0 for every a.
  • Probabilities for a continuous random variable are given by areas under its pdf.
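To see probabilities as areas, this sketch uses the illustrative pdf f(x) = 2x on [0, 1] (zero elsewhere) and approximates integrals with a simple midpoint rule. The names `f` and `integrate` are hypothetical helpers, not library functions. The total area is 1, P(0.5 ≤ X ≤ 1) = 1 − 0.25 = 0.75, and an interval of zero width has zero probability.

```python
def f(x: float) -> float:
    """Illustrative pdf: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(g, a: float, b: float, n: int = 100_000) -> float:
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

print(round(integrate(f, 0, 1), 6))    # total area: 1.0
print(round(integrate(f, 0.5, 1), 6))  # P(0.5 <= X <= 1) = 0.75
print(integrate(f, 0.5, 0.5))          # a single point has zero area: 0.0
```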

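Cumulative Distribution Function: Let X have pdf f. Then the cdf F is given by

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$

Mean and Standard Deviation:

  1. The expected value (or mean) of X is given by

$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$

  2. The variance is given by

$$\sigma^2 = Var(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = E(X^2) - \mu^2$$

and the standard deviation is σ = √σ².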
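The same integrals can be evaluated numerically. This sketch assumes the illustrative pdf f(x) = 2x on [0, 1], for which F(x) = x², E(X) = 2/3 and Var(X) = 1/2 − 4/9 = 1/18; `integrate` is a hypothetical midpoint-rule helper.

```python
import math

def f(x: float) -> float:
    """Illustrative pdf: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(g, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def cdf(x: float) -> float:
    """F(x) = integral of f from the left end of the support (0) up to x."""
    return integrate(f, 0, min(max(x, 0.0), 1.0))

mu = integrate(lambda x: x * f(x), 0, 1)               # E(X) = 2/3
var = integrate(lambda x: (x - mu) ** 2 * f(x), 0, 1)  # Var(X) = 1/18
sigma = math.sqrt(var)

print(round(cdf(0.5), 4), round(mu, 4), round(var, 4), round(sigma, 4))
# 0.25 0.6667 0.0556 0.2357
```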

Different distributions of continuous random variable: Here, some popular distributions of continuous random variables are discussed.

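1. Uniform Distribution: A variable X is said to be uniformly distributed on the interval [a, b] if its density function is

$$f(x) = \begin{cases} \frac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

[Figure: graph of the uniform distribution]

Here, E(X) = (a + b)/2 and V(X) = (b − a)²/12.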
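A quick simulation makes the two formulas concrete. This sketch draws samples with `random.uniform` on an illustrative interval [2, 8], where (a + b)/2 = 5 and (b − a)²/12 = 3; the sample statistics should land close to those values (they will not match exactly).

```python
import random
import statistics

a, b = 2.0, 8.0   # illustrative interval endpoints
random.seed(0)    # fixed seed so the run is reproducible
samples = [random.uniform(a, b) for _ in range(100_000)]

# Compare the sample mean and variance with (a+b)/2 and (b-a)^2/12
print(round(statistics.fmean(samples), 2), (a + b) / 2)
print(round(statistics.pvariance(samples), 2), (b - a) ** 2 / 12)
```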

2. Normal Distribution: The normal distribution has the following characteristics:

  • The mean, median and mode of the distribution coincide.
  • The curve of the distribution is bell-shaped and symmetric about the line x = μ.
  • The total area under the curve is 1.
  • Exactly half of the probability lies to the left of the center and the other half to the right.

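The PDF of a random variable X following a normal distribution with mean µ and standard deviation σ is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$$

Here, E(X) = µ and Var(X) = σ², and we write X ~ N(µ, σ²).

[Figure: graph of the PDF of X ~ N(µ, σ²)]

A standard normal distribution is defined as the distribution with mean 0 and standard deviation 1. In that case, the PDF becomes

$$\varphi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$$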
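The normal PDF is short enough to write by hand. This sketch implements the formula and cross-checks it against `statistics.NormalDist` from the Python standard library (3.8+). The parameters µ = 100 and σ = 15 are arbitrary illustrative values.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """f(x) = 1 / (sigma * sqrt(2 pi)) * exp(-(x - mu)^2 / (2 sigma^2))"""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Standard normal density at its peak: 1 / sqrt(2 pi)
print(round(normal_pdf(0.0), 5))  # 0.39894

# Cross-check against the standard library's NormalDist
dist = NormalDist(mu=100, sigma=15)
print(math.isclose(normal_pdf(110, 100, 15), dist.pdf(110)))  # True

# Symmetry about the mean: half the probability lies on each side
print(dist.cdf(100))  # 0.5
```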

3. Exponential Distribution: The exponential distribution is widely used in survival analysis, for example to model the lifetime of a machine or the time until an event occurs, when the failure rate can be assumed to be constant over time.

A random variable X is said to have an exponential distribution if its PDF is

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

with parameter λ > 0, which is also called the rate.

In survival analysis, λ is called the failure rate: the instantaneous rate of failure at time t for a device that has survived up to t. For the exponential distribution, this rate is the same for every t. Here, E(X) = 1/λ and Var(X) = (1/λ)².

[Figure: graph of an exponential distribution]
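The following sketch assumes a hypothetical rate λ = 0.5 failures per year, so the mean lifetime is 1/λ = 2 years. It writes out the PDF and CDF, then simulates lifetimes with `random.expovariate` and compares the sample mean and variance with 1/λ and (1/λ)².

```python
import math
import random
import statistics

lam = 0.5  # hypothetical rate: 0.5 failures per year, so mean life = 2 years

def exp_pdf(x: float) -> float:
    """f(x) = lam * e^(-lam x) for x >= 0, zero otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x: float) -> float:
    """F(x) = 1 - e^(-lam x) for x >= 0: probability of failing by time x."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

random.seed(1)
lifetimes = [random.expovariate(lam) for _ in range(100_000)]

print(statistics.fmean(lifetimes), 1 / lam)          # sample mean vs 1/lam
print(statistics.pvariance(lifetimes), 1 / lam ** 2)  # sample variance vs (1/lam)^2
print(round(exp_cdf(2.0), 4))  # 0.6321: chance of failing before the mean life
```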

Joint Distributions:

Definition for discrete random variables: If discrete random variables X and Y are defined on the same sample space S, then their joint probability mass function (joint pmf) is given by p(x, y) = P(X = x and Y = y), where (x, y) is a pair of possible values for the pair of random variables (X, Y), and p(x, y) satisfies the following conditions:

  • 0 ≤ p(x, y) ≤ 1
  • $\sum_x \sum_y p(x, y) = 1$
  • $P((X, Y) \in A) = \sum\sum_{(x, y) \in A} p(x, y)$

Marginal probability mass functions: Suppose that discrete random variables X and Y have joint pmf p(x, y). Let x1, x2, …, xi, … denote the possible values of X, and let y1, y2, …, yj, … denote the possible values of Y. The marginal probability mass functions (marginal pmf’s) of X and Y are respectively given by

$$p_X(x_i) = \sum_j p(x_i, y_j), \qquad p_Y(y_j) = \sum_i p(x_i, y_j)$$
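Computing marginals amounts to summing the joint pmf over the other variable. This sketch stores a hypothetical joint pmf for two 0/1-valued variables as a dict keyed by (x, y) pairs; the probabilities are made up for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf p(x, y) for X, Y each taking values 0 or 1
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8),
}

p_X = defaultdict(Fraction)  # p_X(x) = sum over y of p(x, y)
p_Y = defaultdict(Fraction)  # p_Y(y) = sum over x of p(x, y)
for (x, y), p in joint.items():
    p_X[x] += p
    p_Y[y] += p

print(dict(p_X))  # {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(dict(p_Y))  # {0: Fraction(3, 8), 1: Fraction(5, 8)}
```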

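Definition for continuous random variables: Two random variables X and Y are jointly continuous if there exists a nonnegative function f_XY: ℝ² → ℝ such that, for any set A ⊆ ℝ², we have

$$P((X, Y) \in A) = \iint_A f_{XY}(x, y)\,dx\,dy$$

The function f_XY(x, y) is called the joint probability density function (PDF) of X and Y.

The marginal PDFs are

$$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx$$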
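In the continuous case the sum becomes an integral. This sketch assumes the illustrative joint pdf f(x, y) = x + y on the unit square (it integrates to 1), whose marginal is f_X(x) = x + 1/2, and integrates out y numerically with a midpoint rule. The helper names are hypothetical.

```python
def f_xy(x: float, y: float) -> float:
    """Illustrative joint pdf: f(x, y) = x + y on the unit square, zero elsewhere."""
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def marginal_x(x: float, n: int = 10_000) -> float:
    """f_X(x) = integral of f(x, y) over y, via the midpoint rule on [0, 1]."""
    h = 1.0 / n
    return sum(f_xy(x, (j + 0.5) * h) for j in range(n)) * h

# Analytically, f_X(x) = x + 1/2 on [0, 1]
for x in (0.0, 0.3, 1.0):
    print(x, round(marginal_x(x), 6), x + 0.5)
```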

Suppose that X and Y are jointly distributed discrete random variables with joint pmf p(x, y). If g(X, Y) is a function of these two random variables, then its expected value is given by

$$E[g(X, Y)] = \sum_x \sum_y g(x, y)\, p(x, y)$$
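The formula is a weighted sum over all (x, y) pairs. This sketch applies it to a hypothetical joint pmf for two 0/1-valued variables, with g(x, y) = xy and g(x, y) = x + y as example functions; `expect` is an illustrative helper name.

```python
from fractions import Fraction

# Hypothetical joint pmf {(x, y): p(x, y)} for X, Y each taking values 0 or 1
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8),
}

def expect(g, joint_pmf: dict) -> Fraction:
    """E[g(X, Y)] = sum over all pairs (x, y) of g(x, y) * p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint_pmf.items())

print(expect(lambda x, y: x * y, joint))  # E[XY] = 1/4
print(expect(lambda x, y: x + y, joint))  # E[X + Y] = 9/8
```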

[Figure: graph of a joint probability distribution]

Sampling: Samples are parts of a population. For example, you might have a list of information on 1000 people (your “sample”) out of 10,000 people (the “population”). You can use that list to draw inferences about the entire population’s behavior. However, it’s not that simple. When you do stats, your sample size has to be appropriate: large enough to support reliable conclusions, but not so large that collecting it is impractical. Once you’ve decided on a sample size, you must use a sound technique to collect the sample from the population.

Sampling Distribution: The sampling distribution of a statistic describes how that statistic is distributed when repeated samples of size n are taken. Sampling a random variable X means generating a value x that X can take, in such a way that the probability of generating x is in accordance with p(x) (respectively, f(x)), the probability mass function (respectively, probability density function) associated with X.
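Both ideas can be simulated with `random.choices`, which draws values according to given weights. This sketch assumes an illustrative pmf on {0, 1, 2} with E(X) = 1.1. It first samples X directly, then builds the sampling distribution of the mean for samples of size n = 25.

```python
import random
import statistics
from collections import Counter

random.seed(42)

# Sampling a random variable: draw values of X with probabilities p(x)
values = [0, 1, 2]
probs = [0.2, 0.5, 0.3]  # illustrative pmf, E(X) = 1.1
draws = random.choices(values, weights=probs, k=100_000)
freq = Counter(draws)
print({v: freq[v] / len(draws) for v in values})  # close to the pmf

# Sampling distribution: the sample mean over many samples of size n
n = 25
sample_means = [statistics.fmean(random.choices(values, weights=probs, k=n))
                for _ in range(10_000)]
print(statistics.fmean(sample_means))  # close to E(X) = 1.1
```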

Central Limit Theorem (CLT): It states that the distribution of sample means approximates a normal distribution (also known as a “bell curve”) as the sample size becomes larger, assuming that all samples are the same size and are drawn independently from a population with finite variance, regardless of the shape of the population distribution.

In other words, the average of your sample means will approach the population mean: add up the means from many samples, divide by the number of samples, and that average will be close to the actual population mean. The spread of those sample means is also predictable. Their standard deviation, called the standard error, equals σ/√n, where σ is the population standard deviation and n is the sample size. It’s a pretty useful phenomenon that can help predict characteristics of a population accurately.

The following points can be deduced from the CLT:

  1. The central limit theorem (CLT) states that the distribution of sample means approximates a normal distribution as the sample size gets larger.
  2. Sample sizes equal to or greater than 30 are commonly considered sufficient for the CLT to hold, although strongly skewed populations may need larger samples.
  3. A key aspect of the CLT is that the mean of the sample means equals the population mean, and the standard deviation of the sample means equals the population standard deviation divided by √n.
  4. A sufficiently large sample size allows the characteristics of a population to be estimated accurately.

One of the simplest demonstrations of the CLT is rolling a fair die. A single roll is uniformly distributed over 1 to 6, but as more rolls are averaged in each sample, the distribution of those sample means looks more and more like a normal distribution.
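The die experiment is easy to simulate. This sketch averages n = 30 rolls per sample (an illustrative choice) over 10,000 samples, then compares the mean and spread of the sample means with μ = 3.5 and σ/√n, where σ = √(35/12) for one roll. As a rough normality check, about 95% of the sample means should fall within 1.96 standard errors of μ.

```python
import math
import random
import statistics

random.seed(7)

def sample_mean_of_rolls(n: int) -> float:
    """Roll a fair die n times and return the average of the rolls."""
    return statistics.fmean(random.randint(1, 6) for _ in range(n))

n = 30          # rolls per sample
reps = 10_000   # number of samples
means = [sample_mean_of_rolls(n) for _ in range(reps)]

mu = 3.5                    # population mean of one roll
sigma = math.sqrt(35 / 12)  # population standard deviation of one roll
se = sigma / math.sqrt(n)   # standard error of the mean
print(statistics.fmean(means), mu)   # mean of sample means vs mu
print(statistics.pstdev(means), se)  # spread of sample means vs sigma / sqrt(n)

# Rough normality check: share of sample means within 1.96 standard errors
share = sum(abs(m - mu) <= 1.96 * se for m in means) / reps
print(round(share, 3))
```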

Confidence Intervals: A confidence interval expresses how much uncertainty there is in an estimate of a particular statistic. Confidence intervals are often reported together with a margin of error (estimate ± margin of error). They tell you how confident you can be that the results from a poll or survey reflect what you would expect to find if it were possible to survey the entire population. Confidence intervals are intrinsically connected to confidence levels.

A 95% confidence interval is the range of values produced by an estimation procedure whose confidence level is 95%. For example, let’s suppose you were surveying a local school to see what the students’ state test scores are. You set a 95% confidence level and find that the 95% confidence interval for the mean score is (780, 900). This means that if you repeated the sampling over and over, about 95 percent of the intervals constructed this way would contain the true mean score. It does not mean that 95 percent of individual scores fall between 780 and 900.
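The "repeat the procedure" interpretation can be checked by simulation. This sketch builds a large-sample interval for a mean, x̄ ± z·s/√n, with z taken from `statistics.NormalDist().inv_cdf` (about 1.96 for 95%). For small samples a t-based interval would be more appropriate. The population below (mean 840, standard deviation 120, samples of 50) is entirely hypothetical, chosen only so that we know the true mean and can count how often the intervals contain it.

```python
import random
import statistics
from statistics import NormalDist

def mean_ci(data, level: float = 0.95) -> tuple:
    """Large-sample (normal-approximation) confidence interval for a mean:
    sample mean +/- z * s / sqrt(n)."""
    n = len(data)
    xbar = statistics.fmean(data)
    se = statistics.stdev(data) / n ** 0.5      # estimated standard error
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% level
    return xbar - z * se, xbar + z * se

# Coverage check on a simulated population whose mean we know
random.seed(3)
true_mu, true_sigma, n = 840, 120, 50  # hypothetical test-score population
trials = 2_000
hits = 0
for _ in range(trials):
    sample = [random.gauss(true_mu, true_sigma) for _ in range(n)]
    low, high = mean_ci(sample)
    hits += low <= true_mu <= high
print(hits / trials)  # roughly 0.95: about 95% of intervals contain true_mu
```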

Aspiring Data Scientist