Sammandy/CreditCard-FraudDetection

Credit Card Fraud

Digital payments and cybercrime are both evolving, and fraud is common in both card-present and card-not-present payments. A dataset of credit card transactions was analyzed with matplotlib and machine learning classification models to identify trends in fraudulent purchases.

Dataset

The dataset was obtained from Kaggle: Credit_Card_Fraud_Data

Exploratory Data Analysis

Pandas was used to import and explore the data, sklearn was used to normalize it, and matplotlib was used to create bar graphs.

card_df
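The import and normalization step can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the column names, the tiny synthetic frame standing in for the Kaggle CSV, the choice of `MinMaxScaler`, and the helper `normalize_columns` are all assumptions.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the Kaggle CSV (assumed column names).
# With the real file this would be something like pd.read_csv(<path to CSV>).
card_df = pd.DataFrame({
    "distance_from_home": [1.2, 57.9, 5.1, 310.4],
    "ratio_to_median_purchase_price": [0.4, 1.9, 6.3, 0.8],
    "used_pin_number": [0, 0, 0, 1],
    "fraud": [0, 0, 1, 0],
})

def normalize_columns(df, columns):
    """Return a copy of df with the given columns rescaled to [0, 1]."""
    scaled = df.copy()
    scaled[columns] = MinMaxScaler().fit_transform(df[columns])
    return scaled

# Only the continuous columns are rescaled; the 0/1 flags are left alone.
continuous = ["distance_from_home", "ratio_to_median_purchase_price"]
scaled_df = normalize_columns(card_df, continuous)
print(scaled_df.describe())
```

Scaling a copy keeps the raw values available for the bar graphs, which are easier to read in the original units.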

The probability of fraud remains constant as the distance from home increases, but at a certain distance it spikes to its maximum value, jumping from 5% to 35%.

distance_from_home
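One way to get the per-bar probabilities for a graph like this is to bin the feature and take the mean of the 0/1 fraud label in each bin. The sketch below assumes the column names `distance_from_home` and `fraud`; the bin edges, the sample rows, and the helper `fraud_rate_by_bin` are made up for illustration.

```python
import pandas as pd

def fraud_rate_by_bin(df, column, bins):
    """Share of fraudulent rows within each bin of `column`."""
    binned = pd.cut(df[column], bins=bins)
    # The mean of a 0/1 label is the fraction of rows labelled fraud.
    return df.groupby(binned, observed=False)["fraud"].mean()

# Illustrative rows only, not values from the real dataset.
sample = pd.DataFrame({
    "distance_from_home": [1, 3, 8, 20, 40, 90, 150, 400],
    "fraud":              [0, 0, 0, 0,  0,  1,  0,   1],
})

rates = fraud_rate_by_bin(sample, "distance_from_home", bins=[0, 10, 100, 1000])
print(rates)
# rates.plot.bar() would draw the result as a bar graph through matplotlib.
```

The same helper could be pointed at the other continuous columns by changing `column` and `bins`.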

As the distance from the last transaction increases, the probability of fraud increases too.

distance_from_last_trans

The bar graph below shows that the probability of fraud remains constant as the ratio to median purchase price increases, but at a certain ratio it spikes from less than 3% to 40% and then to more than 60%. A higher ratio to median purchase price therefore increases the probability of fraud.

ratio_to_median_purchase

The graph below also shows that buying from a repeat retailer does not affect the probability of a transaction being fraudulent.

repeat_retailer

Overall, transactions that used a PIN number are less likely to be fraudulent, while transactions that used a chip are more likely to be fraudulent.

used_pin

used_chip

Building Machine Learning Model

Unbalanced Data:

unbalanced

Balanced Data:

balanced
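Balancing by random undersampling can be sketched as below. This is an illustration under assumptions, not necessarily how the project did it: the label column name `fraud`, the helper `undersample`, and the toy data are all hypothetical.

```python
import pandas as pd

def undersample(df, label="fraud", seed=42):
    """Randomly drop majority-class rows until every class has equal counts."""
    n_minority = int(df[label].value_counts().min())
    parts = [
        group.sample(n=n_minority, random_state=seed)
        for _, group in df.groupby(label)
    ]
    # Shuffle so the classes are not stored in contiguous blocks.
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)

# Toy unbalanced data: 17 legitimate rows and 3 fraudulent ones.
unbalanced = pd.DataFrame({
    "ratio_to_median_purchase_price": [float(i) for i in range(20)],
    "fraud": [0] * 17 + [1] * 3,
})

balanced = undersample(unbalanced)
print(balanced["fraud"].value_counts())
```

Oversampling is the mirror image: sample the minority class with `replace=True` up to the majority count instead of cutting the majority down.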

Logistic Regression and Random Forest Classifier models were then trained on the data. Accuracy and precision were high for both models, at or very close to 1 (see the result images below).
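The training and scoring step can be sketched with scikit-learn as follows. The data here is synthetic and the hyperparameters are arbitrary assumptions, so the printed numbers will not match the project's results; the sketch only shows how accuracy and precision can be computed for both models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split

# Synthetic, roughly balanced stand-in for the balanced transaction data.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest Classifier": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
    }
    print(name, scores[name])
```

Scoring on a held-out split matters here, because accuracy measured on the training rows would overstate how well either model generalizes.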

Oversampling Accuracy and Precision

Logistic Regression:

OLRR

Random Forest Classifier:

ORFCR

Decision Trees:

ODTR

Accuracy and Precision

Logistic Regression:

ULRR

Random Forest Classifier:

URFCR

Decision Trees:

UDTR

Webpage Development

HTML and CSS were used for webpage development. The page includes a brief description of credit card fraud and its significance, along with a form that uses the ML model to predict whether a credit card transaction is legitimate or fraudulent once the fields are filled in.

webpage

Flask App

The Flask application uses the Random Forest Classifier model trained on the undersampled, balanced dataset to determine whether a purchase is fraudulent. The app connects the model to the HTML page: it takes user inputs from the webpage form, feeds them into the model, and returns the result for display on the webpage.
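The form-to-model round trip could look roughly like the sketch below. It is not the project's app: the route name `/predict`, the feature list, the JSON response (the real app renders the result on the HTML page), and the tiny model trained inline in place of the saved Random Forest are all assumptions.

```python
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestClassifier

# Hypothetical form field names, in the order the model expects them.
FEATURES = [
    "distance_from_home",
    "distance_from_last_transaction",
    "ratio_to_median_purchase_price",
    "repeat_retailer",
    "used_chip",
    "used_pin_number",
]

# Toy model standing in for the trained classifier the real app would load.
toy_X = [
    [1, 0.5, 0.4, 1, 1, 0],
    [2, 1.0, 0.8, 1, 0, 1],
    [150, 30, 6.0, 0, 1, 0],
    [300, 80, 9.0, 0, 1, 0],
]
toy_y = [0, 0, 1, 1]  # 1 = fraudulent
model = RandomForestClassifier(n_estimators=25, random_state=0).fit(toy_X, toy_y)

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Read every form field as a float; reject missing or non-numeric input.
    try:
        row = [float(request.form[name]) for name in FEATURES]
    except (KeyError, ValueError):
        return jsonify(error="all fields must be present and numeric"), 400
    label = int(model.predict([row])[0])
    return jsonify(result="fraudulent" if label == 1 else "legit")
```

Saved as `app.py`, this can be started with `flask --app app run`; a production app would load the trained model from disk once at startup rather than fitting one inline.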

Conclusion

The probability of fraud remains constant as the distance from home increases, but at a certain distance it spikes to its maximum value. The probability of fraud also increases as the distance from the last transaction increases. In addition, transactions that used a PIN number are less likely to be fraudulent, while transactions that used a chip are more likely to be fraudulent.
