diff --git a/source/technical.rst b/source/architecture.rst
similarity index 97%
rename from source/technical.rst
rename to source/architecture.rst
index 7601ccb..221af84 100644
--- a/source/technical.rst
+++ b/source/architecture.rst
@@ -1,5 +1,5 @@
-Service Architecture
---------------------
+Internal Service Architecture
+-----------------------------
The EGI Notebooks service relies on the following technologies to provide its
functionality:
@@ -18,16 +18,13 @@ functionality:
* `Prometheus `_ for monitoring resource consumption.
-* Specific EGI hooks for `monitoring `_
- and `accounting `_.
+* Specific EGI hooks for `monitoring `_,
+ `accounting `_
+ and `backup `_.
* VO-Specific storage/Big data facilities or any pluggable tools into the
notebooks environment can be added to community specific instances.
-.. image:: /_static/egi_notebooks_architecture.png
-
-.. [[File:EGI_Notebooks_Stack.png|center|650px|EGI Notebooks Achitecture]]
-
Kubernetes
::::::::::
@@ -76,6 +73,7 @@ EGI Customisations
EGI Notebooks is deployed as a set of customisations of the `JupyterHub helm
charts `_.
+.. image:: /_static/egi_notebooks_architecture.png
Authentication
==============
diff --git a/source/index.rst b/source/index.rst
index b0804b0..230fd1a 100644
--- a/source/index.rst
+++ b/source/index.rst
@@ -34,9 +34,10 @@ or any additional service requests.
data
integration
communities
- technical
- training
faq
+ training
+ architecture
+ operations
.. to be added back
customisation
diff --git a/source/operations.rst b/source/operations.rst
new file mode 100644
index 0000000..0ff16c9
--- /dev/null
+++ b/source/operations.rst
@@ -0,0 +1,350 @@
+Service Operations
+------------------
+
+In this section you can find the common operational activities required to keep
+the service available to our users.
+
+Initial set-up
+==============
+
+Notebooks VO
+::::::::::::
+
+The resources used for the Notebooks deployments are managed with the
+``vo.notebooks.egi.eu`` VO. Operators of the service should join the VO and
+check the VO entries at the `operations portal `_
+and at `AppDB `_.
+
+Clients installation
+::::::::::::::::::::
+
+To manage the resources you will need these tools installed on your client
+machine (see the installation sketch after the list):
+
+* ``egicli`` for discovering sites and managing tokens,
+
+* ``terraform`` to create the VMs at the providers,
+
+* ``ansible`` to configure the VMs and install kubernetes at the providers,
+
+* ``terraform-inventory`` to get the list of hosts to use from terraform.
+
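+A minimal installation sketch, assuming ``egicli`` and ``ansible`` are
+available from PyPI and using illustrative versions and artifact names for the
+binary tools (check the respective release pages for the exact ones):
+
+.. code-block:: shell
+
+ # Python-based tools
+ $ pip install --user egicli ansible
+ # terraform and terraform-inventory are distributed as binaries
+ $ wget https://releases.hashicorp.com/terraform/0.12.29/terraform_0.12.29_linux_amd64.zip
+ $ unzip terraform_0.12.29_linux_amd64.zip && sudo mv terraform /usr/local/bin/
+ $ wget https://github.com/adammck/terraform-inventory/releases/download/v0.9/terraform-inventory_v0.9_linux_amd64.zip
+ $ unzip terraform-inventory_v0.9_linux_amd64.zip && sudo mv terraform-inventory /usr/local/bin/
+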
+Get the configuration repo
+::::::::::::::::::::::::::
+
+All the configuration of the notebooks is stored in a git repo available in
+keybase. You'll need to be part of the ``opslife`` team in keybase to access it.
+Start by cloning the repo:
+
+.. code-block:: shell
+
+ $ git clone keybase://team/opslife/egi-notebooks
+
+Kubernetes
+==========
+
+We use ``terraform`` and ``ansible`` to build the cluster at one of the EGI Cloud
+providers. If you are building the cluster for the first time, create a new
+directory in your local git repository from the template, add it to the
+repo, and get ``terraform`` ready:
+
+.. code-block:: shell
+
+ $ cp -a template <deployment name>
+ $ git add <deployment name>
+ $ cd <deployment name>/terraform
+ $ terraform init
+
+Using the ``egicli`` you can get the list of projects and their ids
+for a given site:
+
+.. code-block:: shell
+
+ $ egicli endpoint projects --site CESGA
+ id Name enabled site
+ -------------------------------- ------------------- --------- ------
+ 3a8e9d966e644405bf19b536adf7743d vo.access.egi.eu True CESGA
+ 916506ac136741c28e4326975eef0bff vo.emso-eric.eu True CESGA
+ b1d2ef2cc2284c57bcde21cf4ab141e3 vo.nextgeoss.eu True CESGA
+ eb7ff20e603d471cb731bdb83a95a2b5 fedcloud.egi.eu True CESGA
+ fcaf23d103c1485694e7494a59ee5f09 vo.notebooks.egi.eu True CESGA
+
+And with the project ID, you can obtain all the environment variables needed
+to interact with the OpenStack APIs of the site:
+
+.. code-block:: shell
+
+ $ eval "$(egicli endpoint env --site CESGA --project-id fcaf23d103c1485694e7494a59ee5f09)"
+
+Now you are ready to use the ``openstack`` or ``terraform`` CLIs at the site.
+The token obtained is valid for 1 hour; you can refresh it at any time with:
+
+.. code-block:: shell
+
+ $ eval "$(egicli endpoint token --site CESGA --project-id fcaf23d103c1485694e7494a59ee5f09)"
+
+First get the network IDs and pool to use for the site:
+
+.. code-block:: shell
+
+ $ openstack network list
+ +--------------------------------------+-------------------------+--------------------------------------+
+ | ID | Name | Subnets |
+ +--------------------------------------+-------------------------+--------------------------------------+
+ | 1aaf20b6-47a1-47ef-972e-7b36872f678f | net-vo.notebooks.egi.eu | 6465a327-c261-4391-a0f5-d503cc2d43d3 |
+ | 6174db12-932f-4ee3-bb3e-7a0ca070d8f2 | public00 | 6af8c4f3-8e2e-405d-adea-c0b374c5bd99 |
+ +--------------------------------------+-------------------------+--------------------------------------+
+
+In this case we will use ``public00`` as the pool for public IPs and
+``1aaf20b6-47a1-47ef-972e-7b36872f678f`` as the network ID. Check with the provider
+which network is the right one to use. Use these values in the ``terraform.tfvars``
+file:
+
+.. code-block:: terraform
+
+ ip_pool = "public00"
+ net_id = "1aaf20b6-47a1-47ef-972e-7b36872f678f"
+
+You may want to check the right flavors for your VMs and adapt other variables
+in ``terraform.tfvars`` (see the sketch after the flavor list). To get a list
+of flavors you can use:
+
+.. code-block:: shell
+
+ $ openstack flavor list
+ +--------------------------------------+----------------+-------+------+-----------+-------+-----------+
+ | ID | Name | RAM | Disk | Ephemeral | VCPUs | Is Public |
+ +--------------------------------------+----------------+-------+------+-----------+-------+-----------+
+ | 26d14547-96f2-4751-a686-f89a9f7cd9cc | cor4mem8hd40 | 8192 | 40 | 0 | 4 | True |
+ | 42eb9c81-e556-4b63-bc19-4c9fb735e344 | cor2mem2hd20 | 2048 | 20 | 0 | 2 | True |
+ | 4787d9fc-3923-4fc9-b770-30966fc3baee | cor4mem4hd40 | 4096 | 40 | 0 | 4 | True |
+ | 58586b06-7b9d-47af-b9d0-e16d49497d09 | cor24mem62hd60 | 63488 | 60 | 0 | 24 | True |
+ | 635c739a-692f-4890-b8fd-d50963bff00e | cor1mem1hd10 | 1024 | 10 | 0 | 1 | True |
+ | 6ba0080d-d71c-4aff-b6f9-b5a9484097f8 | small | 512 | 2 | 0 | 1 | True |
+ | 6e514065-9013-4ce1-908a-0dcc173125e4 | cor2mem4hd20 | 4096 | 20 | 0 | 2 | True |
+ | 85f66ce6-0b66-4889-a0bf-df8dc23ee540 | cor1mem2hd10 | 2048 | 10 | 0 | 1 | True |
+ | c4aa496b-4684-4a86-bd7f-3a67c04b1fa6 | cor24mem50hd50 | 51200 | 50 | 0 | 24 | True |
+ | edac68c3-50ea-42c2-ae1d-76b8beb306b5 | test-bigHD | 4096 | 237 | 0 | 2 | True |
+ +--------------------------------------+----------------+-------+------+-----------+-------+-----------+
+
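+The chosen flavor names can then be referenced from ``terraform.tfvars``; a
+sketch, using hypothetical variable names (check the template's
+``variables.tf`` for the real ones):
+
+.. code-block:: terraform
+
+ # hypothetical variable names; flavors taken from the list above
+ master_flavor = "cor4mem8hd40"
+ worker_flavor = "cor24mem62hd60"
+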
+Finally, ensure your public ssh key is also listed in the ``cloud-init.yaml``
+file; then you are ready to deploy the cluster with:
+
+.. code-block:: shell
+
+ $ terraform apply
+
+Once your VMs are up and running, it's time to get kubernetes configured and
+running with ansible:
+
+.. code-block:: shell
+
+ $ cd .. # you should now be in <deployment name>
+ $ ANSIBLE_TRANSFORM_INVALID_GROUP_CHARS=silently TF_STATE=./terraform \
+ ansible-playbook --inventory-file=$(which terraform-inventory) \
+ playbooks/k8s.yaml
+
+Interacting with the cluster
+::::::::::::::::::::::::::::
+
+As the master will be on a private IP, you won't be able to interact with it
+directly, but you can still ssh into the VM using the ingress node as a gateway
+host (you can get the different hosts with ``TF_STATE=./terraform terraform-inventory --inventory``):
+
+.. code-block:: shell
+
+ $ ssh -o ProxyCommand="ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -W %h:%p -q egi@<ingress public IP>" \
+       -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null egi@<master private IP>
+ egi@k8s-master:~$ kubectl get nodes
+ NAME STATUS ROLES AGE VERSION
+ k8s-master Ready master 33m v1.15.7
+ k8s-nfs Ready <none> 16m v1.15.7
+ k8s-w-ingress Ready <none> 16m v1.15.7
+ egi@k8s-master:~$ helm list
+ NAME REVISION UPDATED STATUS CHART APP VERSION NAMESPACE
+ certs-man 2 Wed Jan 8 15:56:58 2020 DEPLOYED cert-manager-v0.11.0 v0.11.0 cert-manager
+ cluster-ingress 3 Wed Jan 8 15:56:53 2020 DEPLOYED nginx-ingress-1.7.0 0.24.1 kube-system
+ nfs-provisioner 3 Wed Jan 8 15:56:43 2020 DEPLOYED nfs-client-provisioner-1.2.8 3.1.0 kube-system
+
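+With a recent OpenSSH you can achieve the same more concisely with the ``-J``
+(ProxyJump) option, using the same ingress and master addresses as above:
+
+.. code-block:: shell
+
+ $ ssh -J egi@<ingress public IP> egi@<master private IP>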
+
+Modifying/Destroying the cluster
+::::::::::::::::::::::::::::::::
+
+You should be able to change the number of workers in the cluster, re-apply
+terraform to start them, and then execute the playbook to get them added to the
+cluster, as sketched below.
+
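+A sketch of that workflow, assuming a hypothetical ``worker_count`` variable in
+``terraform.tfvars`` (check the template for the real variable name):
+
+.. code-block:: shell
+
+ $ cd <deployment name>/terraform
+ $ $EDITOR terraform.tfvars   # e.g. bump the hypothetical worker_count
+ $ terraform apply
+ $ cd ..
+ $ ANSIBLE_TRANSFORM_INVALID_GROUP_CHARS=silently TF_STATE=./terraform \
+     ansible-playbook --inventory-file=$(which terraform-inventory) \
+     playbooks/k8s.yaml
+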
+Any changes to the master, NFS or ingress VMs should be done carefully, as those
+will probably break the configuration of the kubernetes cluster and of any
+application running on top.
+
+.. TODO: remove nodes?
+
+.. TODO: update master/ingress/nfs
+
+Destroying the cluster can be done with a single command:
+
+.. code-block:: shell
+
+ $ terraform destroy
+
+Notebooks deployments
+=====================
+
+Once the k8s cluster is up and running, you can deploy a notebooks instance.
+For each deployment you should create a file in the ``deployments`` directory
+following the template provided:
+
+.. code-block:: shell
+
+ $ cp deployments/hub.yaml.template deployments/hub.yaml
+
+Each deployment will need a domain name pointing to your ingress host; you
+can create one at the `FedCloud dynamic DNS service `_.
+
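+You may want to verify that the new name resolves to the public IP of your
+ingress node before continuing (``<your domain>`` is a placeholder for the
+name you registered):
+
+.. code-block:: shell
+
+ $ host <your domain>
+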
+Then you will need to create an OpenID Connect client for EGI Check-in to authorise users
+into the new deployment. You can create a client by going to the `Check-in demo
+OIDC clients management `_.
+Use the following as redirect URL: ``https://<your domain>/hub/oauth_callback``.
+
+In the ``Access`` tab, add ``offline_access`` to the list of scopes. Save the
+client and take note of the client ID and client secret for later.
+
+Finally you will also need three different random strings generated with
+``openssl rand -hex 32`` that will be used as secrets in the file describing
+the deployment.
+
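+For example, to generate the three of them in one go (the labels are only
+illustrative; map them to the secrets expected by the template):
+
+.. code-block:: shell
+
+ $ for s in secret1 secret2 secret3; do echo "$s: $(openssl rand -hex 32)"; done
+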
+Go and edit the deployment description file to add this information (search for
+``# FIXME NEEDS INPUT`` in the file to quickly get there).
+
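+You can list the remaining places that need input with:
+
+.. code-block:: shell
+
+ $ grep -n 'FIXME NEEDS INPUT' deployments/hub.yaml
+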
+For deploying the notebooks instance we will also use ``ansible``:
+
+.. code-block:: shell
+
+ $ ANSIBLE_TRANSFORM_INVALID_GROUP_CHARS=silently TF_STATE=./terraform ansible-playbook \
+ --inventory-file=$(which terraform-inventory) playbooks/notebooks.yaml
+
+The first deployment attempt may fail due to a timeout while the needed
+container images are downloaded. You can re-run the playbook after a while to
+re-deploy.
+
+In the master you can check the status of your deployment (the name of the
+deployment will be the same as the name of your local deployment file):
+
+.. code-block:: shell
+
+ $ helm status hub
+ LAST DEPLOYED: Thu Jan 9 08:14:49 2020
+ NAMESPACE: hub
+ STATUS: DEPLOYED
+
+ RESOURCES:
+ ==> v1/ServiceAccount
+ NAME SECRETS AGE
+ hub 1 6m46s
+ user-scheduler 1 3m34s
+
+ ==> v1/Service
+ NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
+ hub ClusterIP 10.100.77.129 <none> 8081/TCP 6m46s
+ proxy-public NodePort 10.107.127.44 <none> 443:32083/TCP,80:30581/TCP 6m45s
+ proxy-api ClusterIP 10.103.195.6 <none> 8001/TCP 6m45s
+
+ ==> v1/ConfigMap
+ NAME DATA AGE
+ hub-config 4 6m47s
+ user-scheduler 1 3m35s
+
+ ==> v1/PersistentVolumeClaim
+ NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
+ hub-db-dir Pending managed-nfs-storage 6m46s
+
+ ==> v1/ClusterRole
+ NAME AGE
+ hub-user-scheduler-complementary 3m34s
+
+ ==> v1/ClusterRoleBinding
+ NAME AGE
+ hub-user-scheduler-base 3m34s
+ hub-user-scheduler-complementary 3m34s
+
+ ==> v1/RoleBinding
+ NAME AGE
+ hub 6m46s
+
+ ==> v1/Pod(related)
+ NAME READY STATUS RESTARTS AGE
+ continuous-image-puller-flf5t 1/1 Running 0 3m34s
+ continuous-image-puller-scr49 1/1 Running 0 3m34s
+ hub-569596fc54-vjbms 0/1 Pending 0 3m30s
+ proxy-79fb6d57c5-nj8n2 1/1 Running 0 2m22s
+ user-scheduler-9685d654b-9zt5d 1/1 Running 0 3m30s
+ user-scheduler-9685d654b-k8v9p 1/1 Running 0 3m30s
+
+ ==> v1/Secret
+ NAME TYPE DATA AGE
+ hub-secret Opaque 3 6m47s
+
+ ==> v1/DaemonSet
+ NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
+ continuous-image-puller 2 2 2 2 2 <none> 3m34s
+
+ ==> v1/Deployment
+ NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
+ hub 1 1 1 0 6m45s
+ proxy 1 1 1 1 6m45s
+ user-scheduler 2 2 2 2 3m32s
+
+ ==> v1/StatefulSet
+ NAME DESIRED CURRENT AGE
+ user-placeholder 0 0 6m44s
+
+ ==> v1beta1/Ingress
+ NAME HOSTS ADDRESS PORTS AGE
+ jupyterhub notebooktest.fedcloud-tf.fedcloud.eu 80, 443 6m44s
+
+ ==> v1beta1/PodDisruptionBudget
+ NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
+ hub 1 N/A 0 6m48s
+ proxy 1 N/A 0 6m48s
+ user-placeholder 0 N/A 0 6m48s
+ user-scheduler 1 N/A 1 6m47s
+
+ ==> v1/Role
+ NAME AGE
+ hub 6m46s
+
+
+ NOTES:
+ Thank you for installing JupyterHub!
+
+ Your release is named hub and installed into the namespace hub.
+
+ You can find if the hub and proxy is ready by doing:
+
+ kubectl --namespace=hub get pod
+
+ and watching for both those pods to be in status 'Running'.
+
+ You can find the public IP of the JupyterHub by doing:
+
+ kubectl --namespace=hub get svc proxy-public
+
+ It might take a few minutes for it to appear!
+
+ Note that this is still an alpha release! If you have questions, feel free to
+ 1. Read the guide at https://z2jh.jupyter.org
+ 2. Chat with us at https://gitter.im/jupyterhub/jupyterhub
+ 3. File issues at https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues
+
+Updating a deployment
+:::::::::::::::::::::
+
+Just edit the deployment description file and run ansible again. The helm
+release will be upgraded on the cluster.
+
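+This is the same command used for the initial deployment:
+
+.. code-block:: shell
+
+ $ ANSIBLE_TRANSFORM_INVALID_GROUP_CHARS=silently TF_STATE=./terraform ansible-playbook \
+     --inventory-file=$(which terraform-inventory) playbooks/notebooks.yaml
+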
+.. TODO:
+ prometheus
+ grafana
+ accounting
+ backups
+ capacity management
+ share the terraform status