In a Java program, how can I determine whether a dataset follows a normal distribution?
Is it possible?
Is there an API or an algorithm that I can use that determines this?
There are two questions here: how to determine whether a distribution is normal, and how to do so in Java. As the first link shows, the available methods range from formal to informal, depending on how certain you need to be that you are looking at normal data. The second link shows that there are no standard Java packages for statistical analysis, but there are many other ways to implement these tests.
This is a somewhat difficult statistical question that seems deceptively simple if you're not an expert in statistics. Your goal, apparently, is to determine whether the data could plausibly have come from any normal distribution, not one with a pre-specified mean and variance. Probably the best way to do this is with D'Agostino's K-squared test, which measures the sample skewness and kurtosis and compares them to what is expected under normality.
As for Java implementations, there are none that I'm aware of, although I don't regularly use Java. I would be slightly surprised if there were one, as it's a relatively obscure statistical function and Java isn't the most common language for statistics. However, my D language implementation (search in this file for dAgostinoK()) could probably be trivially translated to Java if you already have functions for computing skewness, kurtosis and the CDF of the chi-square distribution.
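To give an idea of what such a translation might look like, here is a minimal Java sketch of the K-squared test. It is not a port of the D code mentioned above. The class and method names (DAgostinoK2, skewnessZ, kurtosisZ, pValue) are made up for illustration, and the formulas are the usual textbook transformations (D'Agostino's for skewness, Anscombe and Glynn's for kurtosis), so check them against a trusted statistics package before relying on the result. One simplification: the statistic is compared to a chi-square distribution with 2 degrees of freedom, whose upper tail is exactly exp(-x/2), so no general chi-square CDF is needed.

```java
/**
 * Sketch of D'Agostino's K-squared omnibus normality test.
 * All names here are illustrative; they do not come from any library.
 * Constant data (zero variance) is not handled and yields NaN.
 */
final class DAgostinoK2 {

    /** k-th central moment using the 1/n (population) convention. */
    static double centralMoment(double[] x, int k) {
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= x.length;
        double sum = 0.0;
        for (double v : x) {
            double d = v - mean, p = 1.0;
            for (int i = 0; i < k; i++) p *= d;
            sum += p;
        }
        return sum / x.length;
    }

    /** Approximately standard-normal score for the sample skewness. */
    static double skewnessZ(double[] x) {
        double n = x.length;
        double m2 = centralMoment(x, 2);
        double g1 = centralMoment(x, 3) / Math.pow(m2, 1.5);
        double y = g1 * Math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)));
        double beta2 = 3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
                / ((n - 2) * (n + 5) * (n + 7) * (n + 9));
        double w2 = -1.0 + Math.sqrt(2.0 * (beta2 - 1.0));
        double delta = 1.0 / Math.sqrt(0.5 * Math.log(w2));
        double alpha = Math.sqrt(2.0 / (w2 - 1.0));
        double t = y / alpha;
        return delta * Math.log(t + Math.sqrt(t * t + 1.0)); // delta * asinh(t)
    }

    /** Approximately standard-normal score for the sample kurtosis. */
    static double kurtosisZ(double[] x) {
        double n = x.length;
        double m2 = centralMoment(x, 2);
        double b2 = centralMoment(x, 4) / (m2 * m2); // non-excess kurtosis
        double expected = 3.0 * (n - 1) / (n + 1);
        double variance = 24.0 * n * (n - 2) * (n - 3)
                / ((n + 1) * (n + 1) * (n + 3) * (n + 5));
        double z = (b2 - expected) / Math.sqrt(variance);
        double sqrtBeta1 = 6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
                * Math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3)));
        double a = 6.0 + 8.0 / sqrtBeta1
                * (2.0 / sqrtBeta1 + Math.sqrt(1.0 + 4.0 / (sqrtBeta1 * sqrtBeta1)));
        double term1 = 1.0 - 2.0 / (9.0 * a);
        // Note: denom == 0 is not treated specially in this sketch.
        double denom = 1.0 + z * Math.sqrt(2.0 / (a - 4.0));
        double term2 = Math.cbrt((1.0 - 2.0 / a) / denom);
        return (term1 - term2) / Math.sqrt(2.0 / (9.0 * a));
    }

    /** p-value of K^2 = Zs^2 + Zk^2 against chi-square with 2 d.o.f. */
    static double pValue(double[] x) {
        if (x.length < 8) {
            // Assumed cutoff: the skewness approximation is usually quoted
            // for n >= 8, and the kurtosis one works better for n >= 20.
            throw new IllegalArgumentException("need at least 8 observations");
        }
        double zs = skewnessZ(x);
        double zk = kurtosisZ(x);
        double k2 = zs * zs + zk * zk;
        return Math.exp(-0.5 * k2); // upper tail of chi-square(2)
    }

    public static void main(String[] args) {
        double[] sample = {2.1, 3.4, 1.9, 2.8, 3.0, 2.5, 2.2, 3.1, 2.7, 2.9,
                           2.4, 3.3, 2.6, 2.0, 3.2, 2.3, 2.8, 2.6, 2.9, 2.5};
        System.out.println("p-value: " + pValue(sample));
    }
}
```

A small p-value (say, below 0.05) suggests the data are unlikely to have come from a normal distribution. A large p-value only means the test found no evidence against normality, not that the data are normal.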