Posted on October 20, 2017 by Harish Krishnamurthy .
Jensen’s Inequality states that given g, a strictly convex function, and X a random variable, then,
Here we shall consider a scenario of a strictly convex function that maps a uniform random variable and visualize the inequality theorem. Consider a parabolic function with an offset which is strictly convex as shown in the above diagram (cover-figure), where f(x) is the pdf of the uniform random variable. The EM Algorithm is guaranteed to converge as log-likelihood is a strictly concave function and hence the opposite of the inequality holds true. This means any maximum value that the expected value of the probability distribution of the latent variables can take, is guaranteed to lie below the log-likelihood function. Hence, we can always maximize the expected values over many iterations leading to full convergence. The EM derivation follows from the fact that if g is strictly convex, then E[g(X)] = g(E[X]) holds true if and only if X = E[X] with probability 1 which implies X is a constant. At refactored.ai, we constantly work on such problems that help illustrate concepts in Machine Learning through visual mathematical examples.
%matplotlib inline from scipy import stats, integrate import numpy as np import scipy import pandas as pd import matplotlib.pyplot as plt import seaborn as sns
Plot the Strictly Convex Region
c = -12 x = np.arange(-10, 11, 1) y = np.square(x) + c sns.set_style("darkgrid") plt.plot(x, y)
Expected value of a continuous function of a random variable
Let us look at a uniform random variable $X \in U(a, b)$ where the function on the variable is a convex function. The expected value of a function can be derived with preliminary calculus and it results in the equations as shown.
Uniform Random Variable
Let us create a set of uniform random variables with (a, b)s in a list and plot them:
# (a, b)s in a udf urv_list = [(-1, 2), (-1, 3), (-1, 4), (-1, 5), (-1, 6), (-9, 1), (-9, 2), (-9, 3), (-9, 4), (-9, 5), (-9, 6), (-9, 7)] fig, ax = plt.subplots(figsize=(15, 10)) for (a, b) in urv_list: spread = (b - a) linestyles = ['-'] mu = 1/2*(a + b) x_uniform = np.linspace(a-1, b+1, 1000) left = mu - 0.5 * spread dist = stats.uniform(left, spread) plt.plot(x_uniform, dist.pdf(x_uniform),c='green', label=r'$a=%i, b=%i$' % (a, b)) plt.xlim(-15, 15)
Compare E[g(X)] and g(E[X])
x = np.arange(-10, 11, 1) y = np.square(x) + c def g_ex(a, b): ''' Computes g(E[X]) Args: a (float): Initial Point of Uniform Random Variable. b (float): End Point of Uniform Random Variable. Returns: f(E[X]) : Function of the Expected Value. ''' p_x = 1/(b - a) g_ex_val = ((b**3 - a**3)/3.0 + c*(b - a))*p_x return g_ex_val def e_gx(a, b): ''' Computes E[g(X)] Args: a (float): Initial Point of Uniform Random Variable. b (float): End Point of Uniform Random Variable. Returns: E[g(X)] : Expected value of function. ''' mean = (a + b)/2 return mean**2 + c fig, ax = plt.subplots() for (a, b) in urv_list: mean = (a + b)/2 plt.plot(x, y) plt.plot(mean, e_gx(a, b), 'ro', color='green') plt.plot(mean, g_ex(a, b), 'ro') print(e_gx(a, b), g_ex(a, b)) fig.set_size_inches(12, 8) plt.xlim(-5, 5)
You can see from the above that for all distributions, the E[g(X)] >= g[E(X)]. The green dots show the value of the function of expected value of X and the red dots show corresponding expected values of the function aligned on the y-axis.