Jensens Inequality hat Guarantees Convergence of EM Algorithm

Jensens Inequality that Guarantees Convergence of the EM Algorithm. Enjoy Colaberry blogs which cover this and many other cutting-edge topics.

Jensen’s Inequality states that given g, a strictly convex function, and X a random variable, then,

Here we shall consider a scenario of a strictly convex function that maps a uniform random variable and visualize the inequality theorem. Consider a parabolic function with an offset that is strictly convex as shown in the above diagram (cover figure), where f(x) is the pdf of the uniform random variable. The EM Algorithm is guaranteed to converge as log-likelihood is a strictly concave function and hence the opposite of the inequality holds. This means any maximum value that the expected value of the probability distribution of the latent variables can take, is guaranteed to lie below the log-likelihood function. Hence, we can always maximize the expected values over many iterations leading to full convergence. The EM derivation follows from the fact that if g is strictly convex, then E[g(X)] = g(E[X]) holds if and only if X = E[X] with probability 1 which implies X is a constant. At refactored.ai, we constantly work on problems that help illustrate concepts in Machine Learning through visual mathematical examples.

%matplotlib inline
from scipy import stats, integrate

import numpy as np
import scipy
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Plot the Strictly Convex Region

c = -12

x = np.arange(-10, 11, 1)
y = np.square(x) + c

sns.set_style("darkgrid")
plt.plot(x, y)

The expected value of a continuous function of a random variable

Let us look at a uniform random variable $X \in U(a, b)$ where the function on the variable is a convex function. The expected value of a function can be derived with preliminary calculus and it results in the equations as shown.

Uniform Random Variable

Let us create a set of uniform random variables with (a, b)s in a list and plot them:

# (a, b)s in a udf
urv_list = [(-1, 2), (-1, 3), (-1, 4), (-1, 5), (-1, 6),
            (-9, 1), (-9, 2), (-9, 3), (-9, 4), (-9, 5), 
            (-9, 6), (-9, 7)]

fig, ax = plt.subplots(figsize=(15, 10))

for (a, b) in urv_list:
    spread = (b - a)
    linestyles = ['-']
    mu = 1/2*(a + b)
    x_uniform = np.linspace(a-1, b+1, 1000)
    
    left = mu - 0.5 * spread
    dist = stats.uniform(left, spread)
    plt.plot(x_uniform, dist.pdf(x_uniform),c='green',
    label=r'$a=%i, b=%i

Compare E[g(X)] and g(E[X])

x = np.arange(-10, 11, 1)
y = np.square(x) + c

def g_ex(a, b):
    '''
    Computes g(E[X])
    Args:
        a (float): Initial Point of Uniform Random Variable.
        b (float): End Point of Uniform Random Variable.
    Returns:
        f(E[X]) : Function of the Expected Value.
        
    '''
    p_x = 1/(b - a)
    g_ex_val = ((b**3 - a**3)/3.0 + c*(b - a))*p_x
    return g_ex_val


def e_gx(a, b):
    '''
    Computes E[g(X)]
    Args:
        a (float): Initial Point of Uniform Random Variable.
        b (float): End Point of Uniform Random Variable.
    Returns:
        E[g(X)] : Expected value of function.
        
    '''
    mean = (a + b)/2
    return mean**2 + c

fig, ax = plt.subplots()

for (a, b) in urv_list:
    mean = (a + b)/2
    plt.plot(x, y)
    plt.plot(mean, e_gx(a, b), 'ro', color='green')  
    plt.plot(mean, g_ex(a, b), 'ro')
    print(e_gx(a, b), g_ex(a, b))

fig.set_size_inches(12, 8)
plt.xlim(-5, 5)

You can see from the above that for all distributions, the E[g(X)] >= g[E(X)]. The green dots show the value of the function of the expected value of X and the red dots show the corresponding expected values of the function aligned on the y-axis.

% (a, b)) plt.xlim(-15, 15)

Compare E[g(X)] and g(E[X])