Data Science with Java – Part 4: Hypothesis Testing with the inference package

To test whether a certain hypothesis is likely to be true, we can take advantage of the Apache Commons Math inference package.

Considering the tests included in the package is a good opportunity to learn more about statistics and probability theory.

Let's consider the following binomial test about flipping a coin:


		BinomialTest binomialTest = new BinomialTest();

		double nullHypothesis = 0.5; //fair coin
		int numberOfSuccesses = 9; //number of heads (biased coin)
		
		//Two-sided test. H0: p = p0, H1: p ≠ p0.
		AlternativeHypothesis alternativeHypothesis = AlternativeHypothesis.TWO_SIDED;
		int numberOfTrials = 10;

		// Returns the observed significance level, or p-value, associated with
		// a Binomial test.
		double significanceLevel = binomialTest.binomialTest(numberOfTrials, numberOfSuccesses, nullHypothesis,
				alternativeHypothesis);

		double alpha = 0.03; //significance level of the test
		
		// Returns whether the null hypothesis can be rejected at the given
		// significance level alpha.
		// true if the p-value (significanceLevel above) < alpha
		boolean rejected = binomialTest.binomialTest(numberOfTrials, numberOfSuccesses, nullHypothesis,
				alternativeHypothesis, alpha);

		System.out.println("The significance level is " + significanceLevel);
		System.out.println("Can we reject the null hypothesis?" + rejected);

 

The result that we get is:

The significance level is 0.021484375000000003
Can we reject the null hypothesis?true

The p-value (reported above as the significance level) is lower than the chosen alpha of 0.03, which means that we can reject the null hypothesis of a fair coin.
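To see where that number comes from, the p-value can be recomputed by hand in plain Java. This is only a minimal sketch and not the library's implementation: the class and method names are hypothetical, and the two-sided value is assumed to be the total probability of all outcomes that are no more likely than the observed one, which for the fair coin in this example is the same as doubling one tail.

```java
class BinomialPValueSketch {

    // Binomial coefficient "n choose k", built up multiplicatively.
    static double choose(int n, int k) {
        double result = 1.0;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    // P(X = k) for X ~ Binomial(n, p).
    static double probability(int n, int k, double p) {
        return choose(n, k) * Math.pow(p, k) * Math.pow(1 - p, n - k);
    }

    // One-sided (right-tail) p-value: P(X >= observed).
    static double upperTailPValue(int n, int observed, double p) {
        double total = 0.0;
        for (int k = observed; k <= n; k++) {
            total += probability(n, k, p);
        }
        return total;
    }

    // Two-sided p-value (assumption: sum over all outcomes that are
    // at most as likely as the observed one).
    static double twoSidedPValue(int n, int observed, double p) {
        double pObserved = probability(n, observed, p);
        double total = 0.0;
        for (int k = 0; k <= n; k++) {
            double pk = probability(n, k, p);
            // small relative tolerance so that ties are not lost to rounding
            if (pk <= pObserved * (1 + 1e-7)) {
                total += pk;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Two-sided p-value: " + twoSidedPValue(10, 9, 0.5));
        System.out.println("Right-tail p-value: " + upperTailPValue(10, 9, 0.5));
    }
}
```

With 10 trials, 9 heads and p0 = 0.5, the two-sided sum covers 0, 1, 9 and 10 heads, i.e. 22/1024 ≈ 0.0215, in line with the value printed above up to floating-point rounding. The right-tail value is 11/1024 ≈ 0.0107.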

In the next posts I will write about the ChiSquare and KolmogorovSmirnov tests too. Stay tuned! 🙂

Data science with Java – Part 3: Statistics with Apache Commons Math library

Although some statistical analysis can be performed with simple Java 8 code (thanks to lambda functions and the Stream API), a lot more can be achieved with fewer lines of code by using libraries such as Google Guava or the Apache Commons Mathematics Library.
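As a point of comparison, here is roughly what the plain Java 8 route looks like. It is a minimal sketch: the class name, the helper methods and the sample values are made up for illustration.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

class StreamStatisticsSketch {

    // Arithmetic mean; NaN for an empty array.
    static double mean(double[] data) {
        return Arrays.stream(data).average().orElse(Double.NaN);
    }

    // Median computed on a sorted copy; NaN for an empty array.
    static double median(double[] data) {
        if (data.length == 0) {
            return Double.NaN;
        }
        double[] sorted = Arrays.stream(data).sorted().toArray();
        int mid = sorted.length / 2;
        return sorted.length % 2 == 0
                ? (sorted[mid - 1] + sorted[mid]) / 2.0
                : sorted[mid];
    }

    public static void main(String[] args) {
        double[] testData = {2, 4, 4, 4, 5, 5, 7, 9}; // made-up sample values

        // count, min, max, sum and average in a single pass
        DoubleSummaryStatistics stats = Arrays.stream(testData).summaryStatistics();
        System.out.println("count=" + stats.getCount()
                + ", min=" + stats.getMin() + ", max=" + stats.getMax());

        System.out.println("The mean is " + mean(testData));
        System.out.println("The median is " + median(testData));
    }
}
```

The stream API covers counts, sums and averages out of the box, but the median already needs a hand-written helper, which is where a statistics library starts to pay off.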

I am a big fan of the Apache Foundation, so I will set Google Guava aside for now.

The Commons library offers a couple of options for each statistical function.

You can use the class DescriptiveStatistics, passing the array of doubles as a parameter (the snippet assumes a static import of System.out as out):


DescriptiveStatistics descriptiveStatistics = new DescriptiveStatistics(testData);
out.println("\nThe mean is " + descriptiveStatistics.getMean());
out.println("The standard deviation is " + descriptiveStatistics.getStandardDeviation());
out.println("The median is " + descriptiveStatistics.getPercentile(50));

Or you can use the dedicated classes Mean, Median, etc.:


public static double getMean(double[] testData) {
    Mean mean = new Mean();
    return mean.evaluate(testData);
}

The StandardDeviation can be constructed to use the sample formula (Bessel's bias correction) by setting the constructor parameter to “true”; with “false” it uses the population formula:


private static double getPopulationStandardDeviation(double[] testData) {
    // population formula: divides by n (no bias correction)
    StandardDeviation sdPopulation = new StandardDeviation(false);
    return sdPopulation.evaluate(testData);
}


private static double getBiasCorrectedStandardDeviation(double[] testData) {
    // bias-corrected (sample) estimation: n − 1 instead of n in the formula
    StandardDeviation sdSample = new StandardDeviation(true);
    return sdSample.evaluate(testData);
}
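The difference between the two formulas is easy to check by hand. The following is a minimal plain-Java sketch, not the library's code; the class name, method name and sample values are hypothetical.

```java
class StandardDeviationSketch {

    // Standard deviation of the data. With biasCorrected = true the sum of
    // squared deviations is divided by n - 1 (sample formula), otherwise by n
    // (population formula). Assumes at least two values.
    static double standardDeviation(double[] data, boolean biasCorrected) {
        double sum = 0.0;
        for (double value : data) {
            sum += value;
        }
        double mean = sum / data.length;

        double squaredDeviations = 0.0;
        for (double value : data) {
            squaredDeviations += (value - mean) * (value - mean);
        }

        int divisor = biasCorrected ? data.length - 1 : data.length;
        return Math.sqrt(squaredDeviations / divisor);
    }

    public static void main(String[] args) {
        double[] testData = {2, 4, 4, 4, 5, 5, 7, 9}; // made-up sample values
        System.out.println("Population formula (n):     " + standardDeviation(testData, false));
        System.out.println("Sample formula     (n - 1): " + standardDeviation(testData, true));
    }
}
```

For these eight values the mean is 5 and the squared deviations add up to 32, so the population formula gives sqrt(32/8) = 2, while the bias-corrected one gives sqrt(32/7), which is slightly larger.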

 

Data Science with Java – Part 2: CSV data into charts

A nice Java library called opencsv lets you read the content of a CSV file, which you can then turn into charts.

Let's consider, for example, unemployment in Germany since reunification. We will use a CSV file with four columns: the year, followed by the number of unemployed people in Germany as a whole, in West Germany and in East Germany.

1991,2602203,1596457,1005745
1992,2978570,1699273,1279297
1993,3419141,2149465,1269676
1994,3698057,2426276,1271781
1995,3611921,2427083,1184838
1996,3965064,2646442,1318622
1997,4384456,2870021,1514435
1998,4280630,2751535,1529095
1999,4100499,2604720,1495779
2000,3889695,2380987,1508707
2001,3852564,2320500,1532064
2002,4061345,2498392,1562953
2003,4376795,2753181,1623614
2004,4381281,2782759,1598522
2005,4860909,3246755,1614154
2006,4487305,3007158,1480146
2007,3760586,2475528,1285058
2008,3258954,2138778,1120175
2009,3414992,2314215,1100777
2010,3238965,2227473,1011492
2011,2976488,2026545,949943
2012,2897126,1999918,897209
2013,2950338,2080342,869995
2014,2898388,2074553,823835
2015,2794664,2020503,774162
2016,2690975,1978672,712303
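Each row of this file splits into four fields. Here is a minimal sketch of reading one row in plain Java, assuming there are no quoted fields or embedded commas (opencsv handles those cases); the class, the nested Row type and the parse method are hypothetical names used only for illustration.

```java
class UnemploymentRowSketch {

    // One line of the file: year plus the three counts.
    static final class Row {
        public final String year;
        public final int total;
        public final int west;
        public final int east;

        Row(String year, int total, int west, int east) {
            this.year = year;
            this.total = total;
            this.west = west;
            this.east = east;
        }
    }

    // Splits a line such as "1991,2602203,1596457,1005745" into a Row.
    static Row parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 columns but got " + fields.length);
        }
        return new Row(fields[0],
                Integer.parseInt(fields[1]),
                Integer.parseInt(fields[2]),
                Integer.parseInt(fields[3]));
    }

    public static void main(String[] args) {
        Row row = parse("1991,2602203,1596457,1005745");
        System.out.println(row.year + ": total=" + row.total
                + ", west=" + row.west + ", east=" + row.east);
    }
}
```

Keeping the parsing in its own small method makes it possible to check the data before any chart is involved.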

We can represent this data with a line chart by using just JavaFX and the opencsv library:

package de.datascience.charts;

import java.io.FileReader;

import com.opencsv.CSVReader;

import javafx.application.Application;
import javafx.scene.Scene;
import javafx.scene.chart.CategoryAxis;
import javafx.scene.chart.LineChart;
import javafx.scene.chart.NumberAxis;
import javafx.scene.chart.XYChart;
import javafx.stage.Stage;

public class UnemploymentGermany extends Application {

	@Override
	public void start(Stage stage) throws Exception {
		stage.setTitle("Index Chart Sample");
		// lower bound, upper bound, tick unit (one tick every 500,000 people)
		final NumberAxis yAxis = new NumberAxis(0, 5000000, 500000);
		final CategoryAxis xAxis = new CategoryAxis();

		final LineChart<String, Number> lineChart = new LineChart<>(xAxis, yAxis);
		yAxis.setLabel("People without a job");
		xAxis.setLabel("Year");
		lineChart.setTitle("Unemployment in Germany");

		XYChart.Series<String, Number> series = new XYChart.Series<>();
		XYChart.Series<String, Number> seriesWest = new XYChart.Series<>();
		XYChart.Series<String, Number> seriesEast = new XYChart.Series<>();
		
		series.setName("Germany");
		seriesWest.setName("West Germany");
		seriesEast.setName("East Germany");
		
		try (CSVReader dataReader = new CSVReader(new FileReader("docs/unemployment_germany.csv"))) {
			String[] nextLine;
			while ((nextLine = dataReader.readNext()) != null) {
				// columns: year, Germany, West Germany, East Germany
				String year = nextLine[0];
				int unemployed = Integer.parseInt(nextLine[1]);
				series.getData().add(new XYChart.Data<>(year, unemployed));
				int unemployedWest = Integer.parseInt(nextLine[2]);
				seriesWest.getData().add(new XYChart.Data<>(year, unemployedWest));
				int unemployedEast = Integer.parseInt(nextLine[3]);
				seriesEast.getData().add(new XYChart.Data<>(year, unemployedEast));
			}
		}

		lineChart.getData().addAll(series, seriesWest, seriesEast);
		Scene scene = new Scene(lineChart, 500, 400);
		stage.setScene(scene);
		stage.show();
	}

	public static void main(String[] args) {
		launch(args);
	}
}

The output will be the following:

Data Science with Java – Part 1: bar charts with FX

This year several books about using Java for data science have been released, and I am very happy about it! It doesn't have to be Python at any cost.

Let's dive into this new Java adventure. 🙂

Some basic visualization can be achieved with the JavaFX chart classes, which can be found in the “javafx.scene.chart” package.

The following code will create a bar chart of the shares of expenditures in 4 countries by category:

package de.datascience.charts;

import javafx.application.Application;
import javafx.scene.Scene;
import javafx.scene.chart.BarChart;
import javafx.scene.chart.CategoryAxis;
import javafx.scene.chart.NumberAxis;
import javafx.scene.chart.XYChart;
import javafx.stage.Stage;

public class ExpendituresShares extends Application {

    final static String FOOD = "Food";
    final static String HOUSING = "Housing";
    final static String TRANSPORTATION = "Transportation";
    final static String HEALTHCARE = "Health care";
    final static String CLOTHING = "Clothing";
    
    final static String USA="U.S.A.";
    final static String UK="United Kingdom";
    final static String CANADA="Canada";
    final static String JAPAN="Japan";

    final CategoryAxis xAxis = new CategoryAxis();
    final NumberAxis yAxis = new NumberAxis();

    final XYChart.Series<String, Number> usaSeries = new XYChart.Series<>();
    final XYChart.Series<String, Number> canadaSeries2 = new XYChart.Series<>();
    final XYChart.Series<String, Number> ukSeries = new XYChart.Series<>();
    final XYChart.Series<String, Number> japanSeries = new XYChart.Series<>();

    public void simpleBarChartByCountry(Stage stage) {
        stage.setTitle("Bar Chart");
        final BarChart<String, Number> barChart
                = new BarChart<>(xAxis, yAxis);
        barChart.setTitle("Shares of expenditures by Country");
        xAxis.setLabel("Category");
        yAxis.setLabel("Percentage");

        usaSeries.setName(USA);
        addDataItem(usaSeries, FOOD, 14);
        addDataItem(usaSeries, HOUSING, 26);
        addDataItem(usaSeries, TRANSPORTATION, 17);
        addDataItem(usaSeries, HEALTHCARE, 8);
        addDataItem(usaSeries, CLOTHING, 4);

        canadaSeries2.setName(CANADA);
        addDataItem(canadaSeries2, FOOD, 15);
        addDataItem(canadaSeries2, HOUSING, 21);
        addDataItem(canadaSeries2, TRANSPORTATION, 20);
        addDataItem(canadaSeries2, HEALTHCARE, 7);
        addDataItem(canadaSeries2, CLOTHING, 6);

        ukSeries.setName(UK);
        addDataItem(ukSeries, FOOD, 20);
        addDataItem(ukSeries, HOUSING, 24);
        addDataItem(ukSeries, TRANSPORTATION, 15);
        addDataItem(ukSeries, HEALTHCARE, 2);
        addDataItem(ukSeries, CLOTHING, 6);
        
        japanSeries.setName(JAPAN);
        addDataItem(japanSeries, FOOD, 23);
        addDataItem(japanSeries, HOUSING, 22);
        addDataItem(japanSeries, TRANSPORTATION, 10);
        addDataItem(japanSeries, HEALTHCARE, 4);
        addDataItem(japanSeries, CLOTHING, 4);

        Scene scene = new Scene(barChart, 800, 600);
        barChart.getData().addAll(usaSeries, canadaSeries2, ukSeries, japanSeries);
        stage.setScene(scene);
        stage.show();
    }

    public void addDataItem(XYChart.Series<String, Number> series,
            String x, Number y) {
        series.getData().add(new XYChart.Data<>(x, y));
    }

    @Override
    public void start(Stage stage) {
        simpleBarChartByCountry(stage);
    }

    public static void main(String[] args) {
        launch(args);
    }

}

If you run the main method, you should see the following window:
