Data science with Java – Part 3: Statistics with Apache Commons Math library

Although some statistical analysis can be done with plain Java 8 code (thanks to lambda expressions and the Stream API), libraries such as Google Guava or the Apache Commons Mathematics Library achieve a lot more with fewer lines of code.
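For comparison, here is a rough sketch of what the plain Java 8 route might look like for the mean and the median. The class name StreamStats, its method names, and the sample data are made up for illustration; only the java.util stream calls come from the JDK.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

// Hypothetical helper class, for illustration only.
final class StreamStats {

    // Arithmetic mean via DoubleStream; returns NaN for an empty array.
    static double mean(double[] data) {
        return Arrays.stream(data).average().orElse(Double.NaN);
    }

    // Median: sort a copy, then take the middle value
    // (or the average of the two middle values for an even count).
    static double median(double[] data) {
        if (data.length == 0) {
            return Double.NaN;
        }
        double[] sorted = Arrays.stream(data).sorted().toArray();
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        double[] testData = {2, 4, 4, 4, 5, 5, 7, 9}; // illustrative data
        DoubleSummaryStatistics summary = Arrays.stream(testData).summaryStatistics();
        System.out.println("The minimum is " + summary.getMin()); // 2.0
        System.out.println("The maximum is " + summary.getMax()); // 9.0
        System.out.println("The mean is " + mean(testData));      // 5.0
        System.out.println("The median is " + median(testData));  // 4.5
    }
}
```

This works, but every further statistic (standard deviation, percentiles, and so on) means more hand-written code, which is where a library starts to pay off.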

I am a big fan of the Apache Software Foundation, so I will set Google Guava aside for now.

The Commons library offers a couple of options for each statistical function.

You can use the DescriptiveStatistics class, passing the array of doubles to its constructor:

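// Assumes "import static java.lang.System.out;" and, for Commons Math 3.x,
// "import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;"
DescriptiveStatistics descriptiveStatistics = new DescriptiveStatistics(testData);
out.println("\nThe mean is " + descriptiveStatistics.getMean());
out.println("The standard deviation is " + descriptiveStatistics.getStandardDeviation());
// the median is the 50th percentile
out.println("The median is " + descriptiveStatistics.getPercentile(50));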

Alternatively, use the individual classes Mean, Median, etc.:

public static double getMean(double[] testData) {
    Mean mean = new Mean();
    return mean.evaluate(testData);
}

StandardDeviation can be constructed to use the sample formula (Bessel's bias correction) by setting the constructor parameter to true; passing false selects the population formula, which divides by n:

private static double getPopulationStandardDeviation(double[] testData) {
    // population formula: divides by n (no bias correction)
    StandardDeviation sdPopulation = new StandardDeviation(false);
    return sdPopulation.evaluate(testData);
}

private static double getBiasCorrectedStandardDeviation(double[] testData) {
    // bias-corrected (sample) formula: n − 1 instead of n in the denominator
    StandardDeviation sdSample = new StandardDeviation(true);
    return sdSample.evaluate(testData);
}
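
To make the effect of that boolean flag concrete, here is a minimal sketch in plain Java of the two formulas. It is not the library's implementation: the class StdDevSketch, the method standardDeviation, the sample data, and the handling of empty and single-element arrays are all my own assumptions for illustration.

```java
// Hypothetical helper class, for illustration only (not Commons Math code).
final class StdDevSketch {

    // biasCorrected = true  -> sample formula, divides by n - 1
    // biasCorrected = false -> population formula, divides by n
    static double standardDeviation(double[] data, boolean biasCorrected) {
        int n = data.length;
        if (n == 0) {
            return Double.NaN; // assumption: undefined for no data
        }
        if (n == 1) {
            return 0.0; // assumption: a single value has no spread
        }
        double mean = 0.0;
        for (double v : data) {
            mean += v;
        }
        mean /= n;

        // sum of squared deviations from the mean
        double sumSq = 0.0;
        for (double v : data) {
            sumSq += (v - mean) * (v - mean);
        }
        int divisor = biasCorrected ? n - 1 : n;
        return Math.sqrt(sumSq / divisor);
    }

    public static void main(String[] args) {
        double[] testData = {2, 4, 4, 4, 5, 5, 7, 9}; // illustrative data
        // divides by n = 8: sqrt(32 / 8) = 2.0
        System.out.println("Population: " + standardDeviation(testData, false));
        // divides by n - 1 = 7: sqrt(32 / 7), about 2.138
        System.out.println("Bias corrected: " + standardDeviation(testData, true));
    }
}
```

The bias-corrected value is always the larger of the two, and the gap shrinks as the number of data points grows.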