Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

The generalized Student-T probability distribution I coded in Python doesn't integrate to 1 (in some cases)

I've been trying to implement the skewed generalized t distribution in Python to model some financial returns. I based my code on formulas found on Wikipedia, and I used the Beta distribution from scipy.

from scipy.special import beta
import numpy as np
from math import sqrt

def sgt(x, params):
# This function accepts an array of 5 parameters [mu, sigma, lambda, p, q]  
    mu, sigma, lam, p, q = params

    v = (q**(-1/p)) / (sqrt((3*lam*lam + 1)*beta(3/p, q-2/p)/beta(1/p, q) - 4*lam*lam*(beta(2/p, q-1/p)/(beta(1/p, q)))**2))
    m = 2*v*sigma*lam*q**(1/p)*beta(2/p, q - 1/p) / beta(1/p, q)
    fx = p / (2*v*sigma*(q**(1/p))*beta(1/p, q)*((abs(x-mu+m)**p/(q*(v*sigma)**p*(lam*np.sign(x-mu+m)+1)**p + 1)+1)**(1/p + q)))

    return fx

Now, the function seems to work perfectly fine for some sets of parameters, but terribly for other sets of parameters.

For example:

dx = 0.001
x_axis = np.arange(-10, 10, dx)

ok_parameters = [0, 2, 0, 3, 8]
bad_parameters = [0, 2, 0, 1.05, 2.1]

ok_distribution = sgt(x_axis, ok_parameters)
bad_distribution = sgt(x_axis, bad_parameters)

If I try to compute the integrals of those two numbers:

a = np.sum(ok_distribution*dx)
b = np.sum(bad_distribution*dx)

I obtain the results a = 1.0013233154393804 and b = 2.2799746093533346. Now, in theory both of these should be 1, but I assume since I approximated the integral the value won't always be exactly 1. In the second case however I don't understand why the value is so high.

Does anyone know what the issue is?

These are the graphs of the ok distribution (blue) and bad distribution (orange)

like image 848
Varsavia Avatar asked Nov 16 '25 05:11

Varsavia


1 Answers

I believe there was just a typo (though I couldn't exactly find where) in your definition sgt. Here is an implementation that works.

%matplotlib inline
import matplotlib.pyplot as plt
from scipy.special import beta
import numpy as np
from math import sqrt
from typing import Union
from scipy import integrate

# Generalised Student T probability Distribution
def generalized_student_t(x:Union[float, np.ndarray], mu:float, sigma:float, 
                          lam:float, p:float, q:float) \
        -> Union[float, np.ndarray]:


    v = q**(-1/p) * ((3*lam**2 + 1)*(beta(3/p, q - 2/p)/beta(1/p,q)) - 4*lam**2*(beta(2/p, q - 1/p)/beta(1/p,q))**2)**(-1/2)

    m = 2*v*sigma*lam*q**(1/p)*beta(2/p,q - 1/p)/beta(1/p,q)   

    fx = p  / (2*v*sigma*q**(1/p)*beta(1/p,q)*(abs(x-mu+m)**p/(q*(v*sigma)**p)*(lam*np.sign(x-mu+m)+1)**p + 1)**(1/p + q))

    return fx

def plot_cdf_pdf(x_axis:np.ndarray, pmf:np.ndarray) -> None:
    """
    Plot the PDF and CDF of the array returned from the function.
    """
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
    ax1.plot(x_axis, pmf)
    ax1.set_title('PDF')
    ax2.plot(x_axis, integrate.cumtrapz(x=x_axis, y=pmf, initial = 0))
    ax2.set_title('CDF')
    pass


dx = 0.0001
x_axis = np.arange(-10, 10, dx)

# Create the Two
distribution1 = generalized_student_t(x=x_axis, mu=0, sigma=1, lam=0, p=2, q=100)
distribution2 = generalized_student_t(x=x_axis, mu=0, sigma=2, lam=0, p=1.05, q=2.1)

plot_cdf_pdf(x_axis=x_axis, pmf=distribution1)
plot_cdf_pdf(x_axis=x_axis, pmf=distribution2)

enter image description here

distribution2

We can also check that the integral of the PDFs are 1

integrate.simps(x=x_axis, y = distribution1)
integrate.simps(x=x_axis, y = distribution2)

We can see the results of the integral are 0.99999999999999978 and 0.99752026308335162. The reason they are not exactly 1 is due the CDF being defined as integral from -infinity to infinity of the PDF.

like image 124
mch56 Avatar answered Nov 17 '25 19:11

mch56



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!