Predicting Dwelling Type Using DecisionTree

This project predicts the type of dwelling unit a Palestinian family lives in, using data from the Palestinian Expenditure and Consumption Survey of 2011 and 2014. The dataset was preprocessed to standardize missing values, unify currencies, and remove irrelevant features. Twenty features were then selected by ranking feature importance with the Extra Trees Classifier and the Decision Tree algorithm. A Decision Tree model was built on the selected features, and its accuracy was evaluated on both training and testing data. Beyond prediction, the project aims to help explain which factors influence the type of housing a family lives in.

Overview

This project includes three code files: two R scripts for preprocessing the data and building the decision tree, and one Python script for selecting features.

R Files

PreProcessing.R: This file contains the code for preprocessing the data, which includes:

NULL Standardization.
Features Merging.
Currency Unification.
Features Removal.
Categorizing Numeric Data.
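
The steps listed above can be illustrated with a short sketch. The repository's preprocessing is written in R; this is a Python/pandas sketch only, and it skips feature merging because this README does not say how features were merged. The column names (household_id, rent, currency), the missing-value codes, the exchange rates, and the bin edges are all hypothetical placeholders, not values taken from the survey or from PreProcessing.R.

```python
import numpy as np
import pandas as pd

# Hypothetical sentinel codes that stand for "missing" in the raw data.
MISSING_CODES = [-99, 9999]
# Placeholder conversion rates to a single currency (NOT real exchange rates).
RATES_TO_ILS = {"ILS": 1.0, "USD": 4.0, "JOD": 5.0}


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # NULL standardization: map every sentinel code to NaN.
    out = out.replace(MISSING_CODES, np.nan)
    # Currency unification: express every amount in one currency.
    out["rent_ils"] = out["rent"] * out["currency"].map(RATES_TO_ILS)
    # Feature removal: drop columns that are no longer needed.
    out = out.drop(columns=["rent", "currency"])
    # Categorizing numeric data: bin the amount into labelled bands.
    out["rent_band"] = pd.cut(
        out["rent_ils"],
        bins=[0, 1000, 2000, np.inf],
        labels=["low", "mid", "high"],
    )
    return out


# Tiny made-up frame, used only to show the transformations.
raw = pd.DataFrame(
    {
        "household_id": [1, 2, 3],
        "rent": [500, -99, 1000],
        "currency": ["ILS", "USD", "USD"],
    }
)
clean = preprocess(raw)
print(clean)
```

Missing values stay missing through every step, so the row whose rent was a sentinel code ends up with NaN in both the converted amount and the band.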

DecisionTree.R: This file contains the code for building the decision tree using the C50 package in R.
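
For readers who do not use R, the build-and-evaluate step can be sketched in Python. Note the swap: the project uses the C50 package (C5.0 trees), while this sketch uses scikit-learn's DecisionTreeClassifier, which implements CART, so the resulting trees will differ. The data here is synthetic, and the split ratio and max_depth are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the survey data: 20 features, 3 dwelling classes.
X, y = make_classification(
    n_samples=500,
    n_features=20,
    n_informative=6,
    n_classes=3,
    random_state=0,
)

# Hold out 30% of the rows for testing (assumed ratio), keeping class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# CART tree from scikit-learn; this is NOT the C5.0 algorithm used in DecisionTree.R.
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X_train, y_train)

# Accuracy on both splits, mirroring the training/testing evaluation.
train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"training accuracy: {train_acc:.3f}")
print(f"testing accuracy:  {test_acc:.3f}")
```

Comparing the two numbers gives a rough check for overfitting: a training accuracy far above the testing accuracy suggests the tree is too deep for the data.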

Python File

ExtraTreesClassifier.py: This file contains the code for applying the Extra Trees Classifier algorithm to determine the most important features, which are then used to build the decision tree.
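
A minimal sketch of this kind of importance ranking is shown below. It is not the contents of ExtraTreesClassifier.py: the helper name rank_features, the n_estimators value, and the synthetic data are assumptions made for illustration. The number of features to keep is left as the top_n parameter.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier


def rank_features(X: pd.DataFrame, y, top_n: int, random_state: int = 0) -> pd.Series:
    """Fit an Extra Trees model and return the top_n features by importance."""
    model = ExtraTreesClassifier(n_estimators=100, random_state=random_state)
    model.fit(X, y)
    # feature_importances_ holds one impurity-based score per column, summing to 1.
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False).head(top_n)


# Synthetic data standing in for the survey features (hypothetical column names).
X_raw, y = make_classification(
    n_samples=300, n_features=40, n_informative=5, random_state=0
)
X = pd.DataFrame(X_raw, columns=[f"f{i}" for i in range(40)])

top = rank_features(X, y, top_n=10)
print(top.index.tolist())
```

The printed list of column names is what would then be carried over to the decision tree step.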

For a full account of the methodology, outcomes, and conclusions, see Report.pdf. FeaturesDetails.xlsx describes all 840 features used in the project. The repository also includes a sample of the dataset, SefSec_2014_HH_weightNew.sav.

Prerequisites

R
Python 3

Usage

Clone or download the repository to your local machine.
Ensure that you have all the prerequisites installed.
Install the required R packages by running install.packages(c("haven", "tidyverse", "Hmisc", "C50")).
Install the required Python packages by running pip install scikit-learn pandas.
Execute the code in PreProcessing.R line by line.
Run ExtraTreesClassifier.py and copy the best 30 features from the console output.
In DecisionTree.R, update the features to the ones you copied.
Execute the code in DecisionTree.R line by line.

obada-jaras/Predicting-Dwelling-Type-Using-DecisionTree
