Data Analysis & Prediction using Python Tools/Machine Learning
This is a CISC 4900 group project on stock analysis across three different markets.
NOTE: PLEASE DO NOT USE OUR MODELS OR PREDICTIONS AS A BASIS FOR BUYING OR SELLING STOCK. THIS PROJECT WAS ONLY MEANT FOR A CS RESEARCH COURSE. ANY INVESTMENT YOU MAKE IS AT YOUR OWN RISK.
Abstract:
This project focuses on how to make a profit in the stock market. A 12-week investment simulation will be conducted, analyzing day trading and bonds in different markets. We want to identify the factors that contribute to a stock's profits. The knowledge gained from this project will be used to increase a person's investing power in the future.
Goals:
The main goal of this project is to investigate investment opportunities, weighing the risks and benefits in order to make educated investment decisions. We aim to gain a detailed understanding of the risks and the opportunities for high returns in each industry. This information will be used to help make smart investment decisions in the future.
We want to analyze specific stock market sectors, namely Technology, Airlines, and Healthcare, based on the global effects of the coronavirus outbreak. By retrieving three different kinds of stocks in each market, we will be able to determine how well each stock ends up doing. We will analyze the stock market data with Python (Pandas, NumPy, matplotlib, Seaborn, etc.). We will also store the data in MySQL, which lets us access it from Jupyter Notebook and the PyCharm Python IDE to generate graphs and analyze the behavior of the data.
Further research is needed in order to complete this step. We also want to be able to implement effective team strategies in order to fully understand processes similar to Scrum.
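As a rough illustration of the kind of analysis described in the goals above, the sketch below computes daily returns and a moving average with Pandas. The prices are synthetic, and the function name `summarize_prices` and the 5-day window are assumptions made for this example, not part of the project code.

```python
import pandas as pd


def summarize_prices(close: pd.Series, window: int = 5) -> pd.DataFrame:
    """Return closing price, daily percentage return, and a moving average."""
    out = pd.DataFrame({"close": close})
    # Daily return: (today's close - yesterday's close) / yesterday's close.
    out["daily_return"] = close.pct_change()
    # Moving average: NaN until `window` closing prices are available.
    out["moving_avg"] = close.rolling(window).mean()
    return out


# Synthetic closing prices standing in for real market data (assumption).
dates = pd.date_range("2020-03-02", periods=10, freq="B")
close = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 107.0, 104.0, 108.0, 110.0, 109.0, 112.0],
    index=dates,
    name="close",
)

summary = summarize_prices(close, window=5)
print(summary.round(4))
```

The same DataFrame could be passed to matplotlib or Seaborn for plotting, or loaded from a database table instead of being built by hand.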
Project Sources:
Language/IDE used:
Python (Pandas, NumPy, matplotlib, Seaborn, etc.) / Jupyter Notebook / PyCharm
Machine Learning: Machine learning is part of data science. The word "learning" means that the algorithms depend on data that is used as a training set to fine-tune the parameters of a model or algorithm. There are many techniques, such as regression, naive Bayes, and clustering. Machine learning also draws on statistics to build a model that predicts the future behavior of data, finding patterns in the data through its algorithms. The data can be numbers, words, images, clicks, or anything else that can be stored digitally. Machine learning is seen as a subset of Artificial Intelligence that builds a mathematical model from a training set in order to make predictions. It is used everywhere from retail to the financial industry to predict what is going to happen in the future or how to improve an outcome. We tried four different types of machine learning algorithms to understand the stock market data we collected. We wanted to see how each model can be trained on our data set and then used for prediction.
Machine Learning
Linear Regression: There are two types of supervised machine learning models: regression and classification. Regression predicts a continuous output value, whereas classification predicts a discrete output. For this model we used scikit-learn, one of the most popular machine learning libraries. Scikit-learn covers most statistical modelling tasks, including regression, classification, and clustering, and provides components such as supervised learning algorithms, unsupervised learning algorithms, and cross-validation. For this particular model we used a supervised algorithm. The growing popularity of machine learning is one of the main reasons to use scikit-learn. We also used Keras with the TensorFlow API. Keras is an open-source, high-level neural-network library written in Python. It supports multiple back-end neural network computation engines, is capable of running on top of TensorFlow, and is designed to enable fast experimentation with deep neural networks.
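A minimal sketch of a supervised regression fit with scikit-learn follows. It assumes a single feature (the day index) and synthetic prices with a linear trend; the 48/12 chronological split and all variable names are choices made for this example and may differ from what the project notebooks do.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic closing prices: a linear trend plus small noise (not real data).
rng = np.random.default_rng(0)
days = np.arange(60)
prices = 100.0 + 0.5 * days + rng.normal(0.0, 1.0, size=days.size)

# Feature = day index, target = price.
X = days.reshape(-1, 1)

# Keep time order: train on the earlier days, test on the most recent ones.
split = 48
X_train, X_test = X[:split], X[split:]
y_train, y_test = prices[:split], prices[split:]

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

slope = model.coef_[0]              # estimated price change per day
r2 = model.score(X_test, y_test)    # R^2 on the held-out days
print(f"estimated slope: {slope:.3f}, R^2 on held-out days: {r2:.3f}")
```

Splitting by time rather than shuffling avoids training on days that come after the ones being predicted.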
Long Short-Term Memory (LSTM) Model: LSTM is an artificial recurrent neural network architecture used in the field of deep learning. The model has feedback connections, so it can process entire sequences of data rather than only single data points, which makes it one of the most useful models for machine learning. LSTM networks have memory blocks that are connected through layers. A block has components that make it more capable than a traditional neuron, including a memory for recent sequences. A block contains gates that manage the block's state and output. A block operates on an input sequence, and each gate within a block uses sigmoid activation units to control whether it is triggered, which makes the change of state and the addition of information flowing through the block conditional. The accompanying figure shows how the LSTM model works.
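To make the gate mechanism described above concrete, here is a sketch of one forward step of a standard LSTM block written in plain NumPy. It is not the Keras layer used in the project; the function name `lstm_step`, the weight layout, the random weights, and the input values are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM block.

    x: input vector, shape (n_in,)
    h_prev, c_prev: previous output and memory state, shape (n_hid,)
    W: (4 * n_hid, n_in), U: (4 * n_hid, n_hid), b: (4 * n_hid,)
    Rows are ordered: input gate, forget gate, candidate, output gate.
    """
    n_hid = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n_hid])               # input gate: how much new information to admit
    f = sigmoid(z[n_hid:2 * n_hid])       # forget gate: how much old memory to keep
    g = np.tanh(z[2 * n_hid:3 * n_hid])   # candidate values for the memory
    o = sigmoid(z[3 * n_hid:4 * n_hid])   # output gate: how much memory to expose
    c = f * c_prev + i * g                # updated block state
    h = o * np.tanh(c)                    # block output
    return h, c


# Random weights stand in for trained parameters (assumption).
rng = np.random.default_rng(0)
n_in, n_hid = 1, 4
W = rng.normal(0.0, 0.5, (4 * n_hid, n_in))
U = rng.normal(0.0, 0.5, (4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

# A short made-up sequence of scaled prices, fed one value at a time.
sequence = np.array([0.1, 0.3, 0.2, 0.5, 0.4])
h = np.zeros(n_hid)
c = np.zeros(n_hid)
for value in sequence:
    h, c = lstm_step(np.array([value]), h, c, W, U, b)
print(h)
```

Because the state `c` is carried from one step to the next, the output after the last value depends on the whole sequence, not only on the final input.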
Support Vector Regression Model: Support Vector Regression (SVR) uses the same principles as the Support Vector Machine (SVM) for classification, with only a few minor differences. In SVR we try to fit the error within a certain threshold, so our objective is to consider the points that lie within the boundary lines around the fitted function. SVR minimizes error by finding the hyperplane that maximizes the margin, keeping in mind that part of the error is tolerated. In classification, an SVM looks for a cutting plane through the data that separates it into two classes in a way that maximizes the distance from the plane to both groups (the margin); SVR applies the same idea to fit a function for which most points fall inside the tolerated margin. For this model we used scikit-learn regression, which predicts continuous-valued attributes associated with an object. Scikit-learn provides simple and efficient tools for predictive data analysis that are accessible to everybody and reusable in various contexts. It is built on NumPy, SciPy, and matplotlib.
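The tolerated-error idea can be sketched with scikit-learn's `SVR`, where the `epsilon` parameter sets the width of the tolerated band around the fitted function. The data below is synthetic, and the kernel, `C`, and `epsilon` values are assumptions picked for this example rather than the project's settings.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data: a smooth curve plus small noise (not real stock data).
rng = np.random.default_rng(1)
X = np.linspace(0.0, 10.0, 80).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0.0, 0.1, size=80)

# epsilon is the tolerated error: residuals smaller than it are not penalized.
model = SVR(kernel="rbf", C=10.0, epsilon=0.2)
model.fit(X, y)
predictions = model.predict(X)

# Count how many training points fall inside the tolerated band.
residuals = np.abs(y - predictions)
inside_tube = residuals <= model.epsilon + 1e-3
n_support = len(model.support_)  # indices of the support vectors

print(f"points inside the epsilon band: {inside_tube.sum()} of {len(y)}")
print(f"support vectors: {n_support}")
```

Only points on or outside the band become support vectors, so a wider `epsilon` usually gives a simpler model that tolerates more error.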
KNN (k-nearest neighbors): KNN is a non-parametric, lazy learning algorithm. Non-parametric means that it does not make any assumptions about the underlying data distribution. In other words, it makes its prediction based on proximity to other data points, regardless of what the numerical feature values represent. KNN is a simple, supervised machine learning algorithm that can be used to solve both classification and regression problems. It is easy to implement and understand, but it has the major drawback of becoming significantly slower as the size of the data in use grows. In our project, the KNN algorithm reads the data set provided to it, runs training, and performs an accuracy test that indicates how suitable the data set is for training.
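The proximity-based prediction described above can be sketched in a few lines of NumPy. This is a from-scratch illustration of KNN regression, not the library routine used in the project; the function name `knn_predict` and the toy data are assumptions.

```python
import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    # Euclidean distance from the query to every stored training point.
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest points.
    nearest = np.argsort(distances)[:k]
    return float(y_train[nearest].mean())


# Toy data: two well-separated groups of points (made up for illustration).
X_train = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_train = np.array([10.0, 12.0, 11.0, 50.0, 52.0, 51.0])

low = knn_predict(X_train, y_train, np.array([2.5]), k=3)
high = knn_predict(X_train, y_train, np.array([10.5]), k=3)
print(low, high)  # 11.0 51.0
```

There is no fitting step: every prediction scans all stored points, which is why KNN slows down as the data set grows.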
Companies that we have looked into:
Technology: Hasan
Apple (AAPL), Facebook (FB), Google (GOOGL), Microsoft (MSFT), Amazon (AMZN)
Airlines: Akash
American Airlines (AAL), Air China Airlines (AIRYY), Delta Airline (DAL), Lufthansa Airlines (DLAKY), Qantas Airlines (QABSY)
Healthcare: Josh
Moderna Inc (MRNA), Vir Biotechnology (VIR), Regeneron Pharmaceuticals Inc. (REGN), AbbVie Inc. (ABBV), Gilead Sciences, Inc. (GILD)