In this project, you will apply the skills you have acquired in this course to operationalize a Machine Learning Microservice API.
You are given a pre-trained sklearn model that has been trained to predict housing prices in Boston from several features, such as the average number of rooms in a home, highway access, the pupil-teacher ratio, and so on. You can read more about the data, which was initially taken from Kaggle, on the data source site. This project tests your ability to operationalize a Python Flask app, in a provided file app.py, that serves out predictions (inference) about housing prices through API calls. This project could be extended to any pre-trained machine learning model, such as those for image recognition and data labeling.
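As a rough sketch of the kind of API call the app serves (a /predict route appears in the expected log output later in this guide; the payload feature names and values below are illustrative assumptions, not the exact input used by make_prediction.sh):

```shell
# Hypothetical prediction request. PAYLOAD's feature names and values are
# assumptions for illustration; see make_prediction.sh for the real input.
PAYLOAD='{"CHAS":{"0":0}, "RM":{"0":6.575}, "TAX":{"0":296.0}, "PTRATIO":{"0":15.3}}'

# Sanity-check that the payload is well-formed JSON before sending it
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload OK"

# With the app running locally (port is an assumption; adjust as needed):
# curl -X POST -H "Content-Type: application/json" -d "$PAYLOAD" http://localhost:8000/predict
```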
AWS Cloud9 helps with environment standardization, which simplifies installation and configuration. (CC7-L2-C6)
- Name: name (e.g. udacityProject4)
- Environment type: Create a new instance for environment (EC2)
- Instance type: m5.large (8GiB RAM + 2 vCPU)
- Platform: Amazon Linux 2
AWS > EC2 > Volumes > Select Volume > Actions > Modify Volume > set to at least 20GiB
Needed only the first time in a new AWS Cloud9 environment:
ssh-keygen -t rsa
cat path_public_key_saved_to
Github > Settings > SSH and GPG keys > New SSH key
- Title: udacityProject4
- Key: public ssh key
git clone git@github.com:pkiage/project-ml-microservice-kubernetes.git
- Name: udacityProject4-server
- AMI: Ubuntu Server 18.04
- Instance type: t3.small or greater
- Key pair: select one (downloaded to your PC)
- Security group: a group with an inbound rule as below
- Type: SSH; Source: Anywhere IPv4
- Configure storage: 20GiB or more
Hint: Don't open VSCode from WSL; rather, navigate to the repository using File Explorer, then open it with VSCode.
VSCode > Open a Remote Window > Connect to Host (Remote-SSH) > Configure SSH Host... > C:\User\xxx\.ssh\config
Host connection_name
HostName public_ipv4_address
User aws_ec2_user1
IdentityFile path_to_ssh_key
- connection_name: udacity_project4
- path_to_ssh_key:
C:\Users\user_name\.ssh\key_name.pem
3. [If "permission denied (public key)" and using Windows OS] Fix the key file permissions
icacls.exe path_to_ssh_key /reset
icacls.exe path_to_ssh_key /grant:r "$($env:username):(r)"
icacls.exe path_to_ssh_key /inheritance:r
Hint:
- Ensure you are using the correct aws configure settings
- Delete the ssh config folders in both
C:\ProgramData\ssh
and C:\Users\<user>\.ssh
- VSCode > Open a Remote Window > Connect to Host (Remote-SSH) > Add New SSH Host..
ssh -i "C:\path\to\key" user@host
- Select
C:\Users\<user>\.ssh
- Click connect
Update existing packages:
sudo apt-get update
Python3:
sudo apt-get upgrade python3
sudo apt-get install python3-venv
Make:
sudo apt install make
sudo apt install apt-transport-https ca-certificates curl software-properties-common -y
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt update -y
apt-cache policy docker-ce
sudo apt install docker-ce -y
sudo systemctl status docker
Add your username to the docker group (to avoid needing sudo each time you run docker):
sudo passwd ubuntu
sudo usermod -aG docker ${USER}
sudo su - ${USER}
id -nG
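The group change only takes effect in a new login shell; a small helper (hypothetical, not part of the repo) makes the membership check explicit:

```shell
# Helper (hypothetical, not from the repo): check whether the current
# user belongs to a named group, using the group list from id -nG.
in_group() {
  id -nG | grep -qw "$1"
}

# After the usermod above and a re-login, this should report ok:
if in_group docker; then echo "docker group ok"; else echo "re-login needed"; fi
```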
VSCode > Source Control > Clone Repository > Repository Url > Clone From Url > Select Location to Clone To
- Repository Url: https://github.com/pkiage/project-ml-microservice-kubernetes.git
cd project-ml-microservice-kubernetes
Hint: ensure you are on the right branch (develop or master):
git checkout branch_name
Tested in Windows OS
Prerequisite:
- Anaconda installed, preferably with the Anaconda PowerShell Prompt
Open Anaconda Prompt
conda create -n udacityproject4 python=3.7.3
conda activate udacityproject4
conda install --yes --file requirements.txt
Run app
python .\app-local.py
Make a prediction via the frontend at http://127.0.0.1:80
Henceforth, preferably work in a Unix OS; tested on a Debian-based distro (Ubuntu).
# CC7-L2-C6
mkdir /tmp/local_environments
python3 -m venv /tmp/local_environments/.devops
source /tmp/local_environments/.devops/bin/activate
make install
sudo wget -O /bin/hadolint https://github.com/hadolint/hadolint/releases/download/v1.16.3/hadolint-Linux-x86_64
sudo chmod +x /bin/hadolint
# OS: Linux; Architecture: x86-64; Release type: Stable; Installer type: Binary download
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
# Install kubectl binary with curl on Linux
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
# curl -LO "https://dl.k8s.io/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl.sha256"
# echo "$(cat kubectl.sha256) kubectl" | sha256sum --check
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
docker --version
minikube version
kubectl version --output=yaml
Task 1: Complete the Dockerfile
- After you complete this file and save it, it is recommended that you go back to your terminal and run make lint again to see if hadolint catches any errors in your Dockerfile.
- You are required to pass these lint checks to pass the project.
make lint
Task 2: Run a Container & Make a Prediction
run_docker.sh
- After a brief waiting period, you should see messages indicating a successful build, along with some indications that your app is being served on port 80 (also, a warning about the development server is to be expected, here).
make_prediction.sh
- In the prediction window, you should see the value of the prediction, and in your main window, where it indicates that your application is running, you should see some log statements print out.
Task 3: Improve Logging & Save Output
- Copy and paste this terminal output, which has the log info, into a text file docker_out.txt
- The docker_out.txt file should include all your log statements plus a line that reads something like “POST /predict HTTP/1.1” 200 -
- The docker_out.txt file will be one of two log output files required for a passing project submission.
sh run_docker.sh
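To capture the run output into docker_out.txt while still watching it live, a tee pipeline can be used (a suggestion, not part of the provided scripts; the echo below is a stand-in for the real command):

```shell
# Mirror stdout and stderr into docker_out.txt while printing it live.
# Replace the echo stand-in with: sh run_docker.sh
sh -c 'echo "app log line"' 2>&1 | tee docker_out.txt
```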
Open another terminal window (Terminal B until explicitly stated otherwise)
[optional] view the home route of the app (as defined in app.py):
curl localhost:8000
Ensure in the virtual environment:
source /tmp/local_environments/.devops/bin/activate
sh make_prediction.sh
In the terminal window where the server was running, press CTRL+C.
Task 4: Upload the Docker Image
- If you’ve successfully implemented authentication and tagging, you should see a successful login statement and a repository name that you specified, printed in your terminal.
- You should also be able to see your image as a repository in your docker hub account
sudo sh upload_docker.sh
Hint:
- Consider running docker login before running upload_docker.sh
- Confirm the upload by viewing hub.docker.com/repository/dockerpath (replace dockerpath with the path specified in upload_docker.sh)
Task 5: Configure Kubernetes to Run Locally
- After minikube starts, a cluster should be running locally. You can check that you have one cluster running by typing kubectl config view where you should see at least one cluster with a certificate-authority and server.
minikube start
Hint: if you get a permission denied error, run:
sudo usermod -aG docker ${USER}
sudo su - ${USER}
If there are disk space issues, you may have to delete files:
rm -rf frontend
rm frontend-localhost-windows.mp4
rm test-predictions.ipynb
Specify memory:
# recommended minimum 1900MB
minikube start --memory=1900mb
or increase the volume size of the EC2 instance and confirm:
df -h
minikube config view
minikube config set disk-size value_from_above_command
minikube delete
minikube start
If you get the provisioning error “error getting ip: IPs output should only be one line, got 2 lines”:
With the other terminal window closed, run:
sudo rm -rf ~/.docker/config.json
sudo chown "$USER":"$USER" /home/"$USER"/.docker -R
sudo chmod g+rwx "$HOME/.docker" -R
Wait for completion
kubectl config view
Task 6: Deploy with Kubernetes and Save Output Logs
run_kubernetes.sh
- Should create a pod with a name you specify
- Initially, your pod may be in the process of being created, as indicated by STATUS: ContainerCreating, but you just have to wait a few minutes until the pod is ready, then you can run the script again.
- Waiting: You can check on your pod’s status with a call to kubectl get pod and you should see the status change to Running. Then you can run the full ./run_kubernetes.sh script again.
make_prediction.sh
- After pod is up and running
- Copy the text output after calling run_kubernetes.sh and paste it into a file kubernetes_out.txt
- This will be the second (out of two) text files that are required for submission
- This output might look quite different from docker_out.txt; this new file should include your pod’s name and status, as well as the port forwarding and handling text.
sudo sh run_kubernetes.sh
Check if the pod is running (wait about 2 minutes):
kubectl get pod
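Instead of re-checking by hand, a small retry helper can poll until the pod reports Running (this helper is hypothetical, not part of the repo; the interval and attempt counts are arbitrary choices):

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
wait_for() {
  cmd=$1; attempts=$2; interval=$3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if eval "$cmd"; then return 0; fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Against a running cluster (commented out; requires kubectl and a pod):
# wait_for 'kubectl get pod | grep -q Running' 30 10
```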
Once running, run run_kubernetes.sh again.
In another terminal
source /tmp/local_environments/.devops/bin/activate
sudo sh make_prediction.sh
Clean up resources (e.g. AWS servers)
Delete the Kubernetes Cluster
minikube delete
├── .circleci
│ └── config.yml ### CircleCI configuration file
├── frontend ### [Flask](https://flask.palletsprojects.com/en/2.2.x/api/#flask.Flask) instance [template](https://flask.palletsprojects.com/en/2.2.x/templating/) folder used in app-local.py
│ └── index.html ### Name of template rendered by [render_template()](https://flask.palletsprojects.com/en/2.2.x/api/#flask.render_template) method - used in app-local.py
├── model_data
| ├── boston_housing_prediction.joblib ### Persisted Python object to be loaded in app.py (pre-trained model)
| └── housing.csv ### Data used in pre-trained model
├── output_txt_files
| ├── docker_out.txt ### Log statements from app.py following executing run_docker.sh
| └── kubernetes_out.txt ### Log statements after running a prediction via Kubernetes deployment
├── .gitignore ### Files and directories to ignore from git history
├── app-local.py ### Python flask app that serves out predictions (inference) about housing prices through API calls - for optional local testing with frontend
├── app.py ### Python flask app that serves out predictions (inference) about housing prices through API calls - deployed on Kubernetes cluster (doesn't include frontend)
├── Dockerfile ### Contains all commands a user could call on command line to assemble an image
├── frontend-localhost-windows.gif ### GIF showing demo of simple web application front-end to accept user input and produce a prediction. Tested on local host using Windows OS.
├── make_prediction.sh ### Sends some input into the containerized application via the appropriate port
├── requirements.txt ### List of Python dependencies for the project
├── rubric.png ### Udacity Project 4 Rubric
├── run_docker.sh ### Enables getting Docker running, locally
├── run_kubernetes.sh ### Deploys application on the Kubernetes cluster (after uploaded docker image and configured Kubernetes so that a cluster is running)
├── test-predictions.ipynb ### Jupyter notebook testing predictions made using the pre-trained model and sample data (includes references to AI ethics discussions)
└── upload_docker.sh ### Uploads the built image to Docker Hub to make it accessible to a Kubernetes cluster
Abbreviation | Description |
---|---|
crim | per capita crime rate by town |
zn | proportion of residential land zoned for lots over 25,000 sq.ft |
indus | proportion of non-retail business acres per town |
chas | Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) |
nox | nitrogen oxides concentration (parts per 10 million) |
rm | average number of rooms per dwelling |
age | proportion of owner-occupied units built prior to 1940 |
dis | weighted mean of distances to five Boston employment centres |
rad | index of accessibility to radial highways |
tax | full-value property-tax rate per $10,000 |
ptratio | pupil-teacher ratio by town |
black | 1000(Bk - 0.63)^2 where Bk is the proportion of black people by town |
lstat | lower status of the population (percent) |
medv | median value of owner-occupied homes in $1000s |
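For reference, a single observation using the feature names above might look like the JSON below (the values are the well-known first row of the classic Boston housing data; whether app.py expects all of these columns, or this casing, is an assumption to verify against make_prediction.sh in the repo):

```shell
# Write an illustrative single-row input; the column set and casing are
# assumptions — check make_prediction.sh / app.py for the real schema.
cat > sample_input.json <<'EOF'
{"crim": 0.00632, "zn": 18.0, "indus": 2.31, "chas": 0,
 "nox": 0.538, "rm": 6.575, "age": 65.2, "dis": 4.09,
 "rad": 1, "tax": 296.0, "ptratio": 15.3,
 "black": 396.9, "lstat": 4.98}
EOF

# Confirm the file is valid JSON
python3 -m json.tool sample_input.json > /dev/null && echo "valid JSON"
```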