
Milestone 3 Case Solution


As discussed in Milestones One and Two, the dataset was chosen to assess world population growth in developed and under-developed countries. The aim is to compare the two groups and evaluate whether under-developed countries tend to have a higher population density than developed countries. Various statistical methods could be used to determine how relevant the distributions are to the research question. The dataset was collected from the World Bank database, in which the population data is divided by country into developed and under-developed groups for the years 1960 to 2015. The decision tree previously used to evaluate the dataset is illustrated below.






Weka was used to analyse the raw dataset collected from the World Bank database, which includes country-wise population data for developed and under-developed countries. From this, a decision tree was developed for each variable present in the dataset.

Research question

“Have the under-developed countries developed a higher population density than the developed countries since the turn of the century?”

The objective of this research question was to predict whether the under-developed countries had higher growth in population density than the developed countries.


The dataset was loaded into Weka and, using a top-down tree-building method (the CART model), a decision tree was constructed for developed and under-developed countries.
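To make the top-down idea concrete, the sketch below grows a small regression tree by repeatedly choosing the split that minimises the summed squared error of the two resulting groups, which is the splitting rule CART uses for a numeric target. It is only an illustration, not Weka's implementation: it omits pruning, the function names are hypothetical, and the year and population values are invented for the example.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the best binary split on x, or None."""
    best = None
    distinct = sorted(set(xs))
    # Candidate thresholds are the midpoints between neighbouring x values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


def build_tree(xs, ys, max_depth=2):
    """Split recursively from the root down until max_depth is reached."""
    split = best_split(xs, ys) if max_depth > 0 else None
    if split is None:
        return {"predict": mean(ys)}  # leaf: predict the node average
    threshold = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "left": build_tree([x for x, _ in left], [y for _, y in left], max_depth - 1),
        "right": build_tree([x for x, _ in right], [y for _, y in right], max_depth - 1),
    }


# Invented illustrative data: year versus population (arbitrary units).
years = [1960, 1970, 1980, 1990, 2000, 2010]
population = [1.0, 1.1, 1.2, 3.0, 3.1, 3.2]
tree = build_tree(years, population, max_depth=1)
```

With this toy data the root split falls between 1980 and 1990, where the invented series jumps, and each leaf predicts the average of its side.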

Data Sources 

The dataset was collected from the World Bank database and covers the total population of the world from 1960 to 2015. It contains population data for various countries, along with their country codes, indicator names and indicator codes.

Data Cleaning

The data collected was in raw form, so it was sorted into countries classed as developed economies and those classed as under-developed economies. This enabled us to gather and effectively estimate the total population of the developed and under-developed countries from 1960 to 2015. Other data present in the dataset was not considered for this analysis.
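The sorting and totalling step described above could be sketched as follows. This is a minimal illustration under stated assumptions: the column layout mirrors the World Bank's wide CSV export (one column per year), but the country names, codes, population figures and the developed/under-developed mapping are invented placeholders, and `totals_by_group` is a hypothetical helper rather than part of the original analysis.

```python
import csv
import io
from collections import defaultdict

# Invented sample in the World Bank wide layout (only two year columns shown).
SAMPLE_CSV = """Country Name,Country Code,Indicator Name,Indicator Code,1960,2015
Alpha,AAA,"Population, total",SP.POP.TOTL,100,150
Beta,BBB,"Population, total",SP.POP.TOTL,200,500
Gamma,CCC,"Population, total",SP.POP.TOTL,50,60
Delta,DDD,"Population, total",SP.POP.TOTL,,300
"""

# Hypothetical classification; the real grouping must come from the analyst.
GROUPS = {
    "AAA": "developed",
    "CCC": "developed",
    "BBB": "under-developed",
    "DDD": "under-developed",
}


def totals_by_group(csv_text, groups):
    """Sum population per group and year, skipping unclassified rows and blanks."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in csv.DictReader(io.StringIO(csv_text)):
        group = groups.get(row["Country Code"])
        if group is None:
            continue  # country not classified: left out of the analysis
        for column, value in row.items():
            # Year columns are the purely numeric headers; blanks are missing data.
            if column.isdigit() and value.strip():
                totals[group][int(column)] += float(value)
    return {group: dict(years) for group, years in totals.items()}


totals = totals_by_group(SAMPLE_CSV, GROUPS)
```

Missing values are skipped rather than treated as zero, so a group's total for a year only covers the countries that reported data for it.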


The collected dataset contained 273 rows and around 5988 variables. The correlation coefficient for the under-developed group was 0.8379 and its mean absolute error was 1. Its relative absolute error amounted to 55.5% and its root relative squared error to 48.79%. The total number of instances was six, as illustrated in the exhibit below.


[Exhibit: Weka Explorer output]
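For readers unfamiliar with these error measures, the sketch below shows how the four statistics are commonly defined. It assumes the baseline for the two relative errors is the mean of the actual values, which may differ slightly from how Weka derives its baseline, and the six actual/predicted pairs are invented for illustration rather than taken from the exhibit.

```python
from math import sqrt
from statistics import mean


def evaluation_metrics(actual, predicted):
    """Correlation, MAE, relative absolute error and root relative squared error."""
    n = len(actual)
    a_bar, p_bar = mean(actual), mean(predicted)
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    cov = sum((a - a_bar) * (p - p_bar) for a, p in zip(actual, predicted))
    var_a = sum((a - a_bar) ** 2 for a in actual)
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    return {
        # Pearson correlation between actual and predicted values.
        "correlation": cov / sqrt(var_a * var_p),
        # Average size of the prediction error.
        "mae": sum(abs_err) / n,
        # Errors relative to always predicting the mean, as percentages.
        "rae_percent": 100 * sum(abs_err) / sum(abs(a - a_bar) for a in actual),
        "rrse_percent": 100 * sqrt(sum(sq_err) / var_a),
    }


# Invented example with six instances.
actual = [1, 2, 3, 4, 5, 6]
predicted = [2, 2, 3, 4, 5, 5]
metrics = evaluation_metrics(actual, predicted)
```

A relative error below 100% means the model does better than simply predicting the mean for every instance, which is how the 55.5% and 48.79% figures above can be read.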

