Speed up iterative algorithm in Python

Tags: python, numpy

I'm implementing the forward recursion for a Hidden Markov Model; the necessary steps (a)-(b) are shown below.

[Image: forward recursion equations for the HMM, steps (a)-(b)]
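
Since the image itself is not reproduced here, this is a reconstruction of the scaled forward recursion that the code below implements, in standard HMM notation (inferred from the code, not copied from the image):

$$\alpha_1(i) = \pi_i\, b_i(o_1), \qquad \alpha_t(j) = \Big[\sum_{i=1}^{N} \hat\alpha_{t-1}(i)\, a_{ij}\Big]\, b_j(o_t), \quad t = 2,\dots,T$$

$$c_t = \frac{1}{\sum_{j=1}^{N} \alpha_t(j)}, \qquad \hat\alpha_t(j) = c_t\, \alpha_t(j), \qquad b_j(o_t) = \sum_{m=1}^{M} w_{jm}\, \mathcal{N}(o_t \mid \mu_m, \sigma_m)$$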

Below is my implementation

from scipy.stats import norm
import numpy as np
import random

def gmmdensity(obs, w, mu, sd):
    # compute probability density function of gaussian mixture 
    gauss_mixt = norm.pdf(obs, mu, sd)[:,None]*w
    return gauss_mixt

def alpha(obs, states, A, pi, w, mu, sigma):    
    dens = np.sum(gmmdensity(obs, w, mu, sigma), axis = 2)
    # scaling factor is used to renormalize probabilities in order to
    # avoid numerical underflow
    scaling_factor = np.ones(len(obs))
    alpha_matrix = np.zeros((len(states), len(obs)))
    
    # for t = 0
    alpha_matrix[:,0] = pi*dens[0]
    scaling_factor[0] = 1/np.sum(alpha_matrix[:,0], axis = 0)
    alpha_matrix[:,0] *= scaling_factor[0]
    
    # for t == 1:T
    for t in range(1, len(obs)):
        alpha_matrix[:,t] = np.matmul(alpha_matrix[:,t-1], A)*dens[t]
        scaling_factor[t] = 1/np.sum(alpha_matrix[:,t], axis = 0)
        alpha_matrix[:,t] *= scaling_factor[t]

    return alpha_matrix, scaling_factor

Let's generate some data to run the algorithm

obs = np.concatenate((np.random.normal(0, 1, size = 500), 
                        np.random.normal(1.5, 1, size = 500))).reshape(-1,1)
N = 2 # number of hidden states
M = 3 # number of mixture components
states = list(range(N))
pi = np.array([0.5, 0.5]) # initial probabilities
A = np.array([[0.8, 0.2], [0.3, 0.7]]) # transition matrix
mu = np.array([np.min(obs), np.median(obs), np.max(obs)]) # means of mixture components
sigma = np.array([1, 1, 1]) # standard deviations of mixture components
w = np.array([[0.2, 0.3, 0.5], [0.6, 0.2, 0.2]]) # weights of mixture components

Let's see how fast the algorithm is

%timeit alpha(obs, states, A, pi, w, mu, sigma)
13.6 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

Is there any possibility to make this code faster? I thought about using numba or cython, but I never got either to fully work in this case.

Asked by thesecond
1 Answer

I profiled this code, and found the following ideas were helpful for speeding it up.

  • Saving/loading the alpha_matrix[:,t] vector multiple times is slow. It is faster to keep the vector you are manipulating in a local variable until you are done with it. The same applies to scaling_factor[t].
  • Accessing alpha_matrix[:,t] has poor memory locality: it reads an element, skips several, reads another, and so on. Since the matrix is indexed by [:,t] everywhere, it is faster to swap the two indices so that the time step comes first and each access touches a contiguous row. Once we're done manipulating it, we can transpose it back to its original shape so the rest of your code doesn't have to change.
  • Running np.sum(..., axis=0) on an array with only one axis gives the same result as np.sum() without an axis argument, but passing the axis argument is slower.
  • Multiplying by the w array creates a big (1000, 2, 3) array, whose last dimension you then sum away. You can combine these two steps with einsum(), which is faster because it avoids creating the big intermediate array (see the short equivalence check after this list). I copied the pattern from Using Python numpy einsum to obtain dot product between 2 Matrices.
  • scipy.stats.norm.pdf() is surprisingly slow here; most of the time seems to go into SciPy's generic argument checking rather than the PDF evaluation itself. Re-coding it with the explicit formula for the normal PDF was about 3x faster.
  • The main loop of the algorithm can be extracted into a separate function, which we can then accelerate with Numba. (This provides the bulk of the speedup, but I prefer to do it last, because Numba-compiled functions can't be profiled line by line.)
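
As a quick check of the einsum() point above, the contraction 'ij,kj->ik' gives the same result as multiplying by w and summing the last axis, without ever materializing the (1000, 2, 3) array. This snippet just reuses the arrays defined in the question:

pdf = norm.pdf(obs, mu, sigma)                 # (1000, 3): density of each observation under each component
dens_orig = np.sum(pdf[:, None] * w, axis=2)   # original approach: build (1000, 2, 3), then sum it away
dens_einsum = np.einsum('ij,kj->ik', pdf, w)   # same result, no large intermediate
print(np.allclose(dens_orig, dens_einsum))     # True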

This adds up to roughly a 45x speedup on the provided benchmark:

%timeit alpha_orig(obs, states, A, pi, w, mu, sigma)
14.4 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit alpha(obs, states, A, pi, w, mu, sigma)
308 µs ± 2.02 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Full code:

from scipy.stats import norm
import numpy as np
import random
import numba as nb


@nb.njit()
def fast_normal_pdf(x, mu, sigma):
    # normal pdf computed directly from the formula; broadcasts x against per-component mu and sigma
    x = (x - mu) / sigma
    one_over_sqrt_2pi = 1 / np.sqrt(2 * np.pi)
    return (1 / sigma) * one_over_sqrt_2pi * np.exp(-0.5*x*x)


def gmmdensitysum(obs, w, mu, sd):
    # mixture emission density per observation and state, summed over components via einsum
    pdf = fast_normal_pdf(obs, mu, sd)
    return np.einsum('ij,kj->ik', pdf, w)


@nb.njit()
def alpha_inner(N, A, alpha_matrix, dens, scaling_factor):
    # recursion for t = 1..N-1; keep the current row in a local variable until done with it
    for t in range(1, N):
        current_alpha = (alpha_matrix[t-1] @ A) * dens[t]
        current_scaling_factor = 1 / current_alpha.sum()
        current_alpha *= current_scaling_factor
        scaling_factor[t] = current_scaling_factor
        alpha_matrix[t] = current_alpha


def alpha(obs, states, A, pi, w, mu, sigma):
    dens = gmmdensitysum(obs, w, mu, sigma)
    # scaling factor is used to renormalize probabilities in order to
    # avoid numerical underflow
    scaling_factor = np.ones(len(obs))
    alpha_matrix = np.zeros((len(obs), len(states)))
    
    # for t = 0
    alpha_matrix[0] = pi * dens[0]
    scaling_factor[0] = 1 / np.sum(alpha_matrix[0])
    alpha_matrix[0] *= scaling_factor[0]

    alpha_inner(len(obs), A, alpha_matrix, dens, scaling_factor)

    return alpha_matrix.T, scaling_factor
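
To confirm the rewrite produces the same output as the original, you can keep the original implementation around under the name alpha_orig (as in the benchmark above) and compare:

alpha_new, scale_new = alpha(obs, states, A, pi, w, mu, sigma)
alpha_old, scale_old = alpha_orig(obs, states, A, pi, w, mu, sigma)
print(np.allclose(alpha_new, alpha_old), np.allclose(scale_new, scale_old))  # True True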
Answered by Nick ODell
