---
layout: default
permalink: automation
description: understanding and troubleshooting
title: Automation
hero:
width: is-10
---
`make up` may take some time, because it handles multiple actions:

- building:
  - downloading docker images of all bundled components (`node`, `nginx`, `python`, `elasticsearch`, `kibana`)
  - compiling the `Vue.js` frontend with `node` into static `html/css/js` files
  - building the `python` backend, with all `pandas` and `scikit-learn` dependencies
- running:
  - the `python` backend
  - a 3-node `elasticsearch` cluster (if you need fewer nodes, just edit the `Makefile` and set `ES_NODES` to 1)
  - `kibana` (required for launching it for now)
  - `nginx`, serving the static compiled frontend and acting as a reverse proxy for the backend, `elasticsearch` and `kibana`
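Once `make up` completes, a quick way to check that everything started is to list the matchID containers; a minimal sketch (the name filter assumes the `matchid-` prefix used by the commands later on this page):

```bash
# Every bundled component should show an "Up" status.
docker ps --format '{{.Names}}\t{{.Status}}' | egrep 'matchid-'
```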
Here is the architecture overview:

All commands here have to be executed from the machine where `docker` runs `matchID` (local or remote).
To stop all components: `make stop`.

Backend commands:
- stop the backend: `make backend-stop`
- start the backend: `make backend`
- restart the backend: `make backend-stop backend`
- get the logs: `docker logs -f matchid-backend`
Frontend commands:
- build only the frontend: `make docker-build`
- stop the frontend: `make frontend-stop`
- start the frontend: `make frontend`
- restart the frontend: `make frontend-stop frontend`
- get the logs: `docker logs -f matchid-frontend`
Elasticsearch is useful for powerful fuzzy matching with Levenshtein distance (though it could be replaced with a phonetic or ngram indexation with postgres), and is required for now by the validation module. A query sketch follows the command list below.
- stop elasticsearch: `make elasticsearch-stop`
- start elasticsearch: `make elasticsearch`
- restart elasticsearch: `make elasticsearch-stop elasticsearch`
- check if your elasticsearch node containers are up: `docker ps | egrep 'matchid-(elasticsearch|esnode)'`; you should have a line for every container
- get the logs of a node: `docker logs -f matchid-elasticsearch`; you may replace `elasticsearch` with `esnode2`, `esnode3`, and so on, depending on the number of nodes you have
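To illustrate the fuzzy matching mentioned above, here is a minimal query sketch; the `persons` index and `name` field are hypothetical placeholders for your own data:

```bash
# Fuzzy match on a hypothetical "persons" index: "fuzziness" is the
# maximum Levenshtein edit distance allowed between the query string
# and the indexed terms.
docker exec -it matchid-elasticsearch curl -XGET \
  'localhost:9200/persons/_search?pretty' \
  -H 'Content-Type: application/json' -d '{
    "query": {
      "match": { "name": { "query": "smith", "fuzziness": 2 } }
    }
  }'
```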
See advanced elasticsearch troubleshooting for configuring your cluster and more about elasticsearch and docker.
Postgres is not mandatory, so it is not up when you start. Postgres can be useful for more reliable storage than elasticsearch, or to replace elasticsearch for quicker ngram matching (a sketch follows the command list below).

- start postgres: `make postgres`
- stop postgres: `make postgres-stop`
- restart postgres: `make postgres-stop postgres`
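As a sketch of the quicker ngram matching mentioned above, postgres ships the `pg_trgm` trigram extension; the container name `matchid-postgres`, the `postgres` user, and the `persons` table are assumptions for illustration:

```bash
# Trigram (ngram) matching with pg_trgm: "%" is the similarity operator
# and similarity() returns a score between 0 and 1. Table and column
# names are hypothetical.
docker exec -it matchid-postgres psql -U postgres -c "
  CREATE EXTENSION IF NOT EXISTS pg_trgm;
  SELECT name, similarity(name, 'smith') AS score
  FROM persons
  WHERE name % 'smith'
  ORDER BY score DESC LIMIT 10;"
```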
Kibana has to be started at startup (it is a dependency of the nginx configuration). It can be useful to analyse your elasticsearch data, but is absolutely not mandatory (it will no longer be a requirement in a future release).

- stop kibana: `make kibana-stop`
- start kibana: `make kibana`
- restart kibana: `make kibana-stop kibana`
Some useful commands:
- `docker stats`: shows all running containers with their resource usage, like a `top`
You can switch to development mode without stopping all the services: `make frontend-stop frontend-dev`.

Please consult contributing to development.
When you run `make start` or `make elasticsearch`, the configuration of elasticsearch is created in `docker-components/docker-compose-elasticsearch-huge.yml`, with one master and `ES_NODES`-1 other nodes.
To change the configuration:

- to change the number of nodes: edit the `Makefile` and change the number of nodes in `ES_NODES` (defaults to 3, tested up to 20); see also the override sketch below
- to change the amount of memory of each node: change `ES_MEM` (java memory)
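As an alternative to editing the `Makefile`, `make` variables can usually be overridden on the command line; a sketch with example values (assuming the `Makefile` does not force these variables with `override`):

```bash
# Rebuild the cluster configuration as a single node with a larger heap.
# ES_NODES and ES_MEM are the variables documented above; the values
# here are only examples.
make elasticsearch-stop
make elasticsearch ES_NODES=1 ES_MEM=4g
```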
We've encountered stability problems with elasticsearch, docker and memory limitation (`mem_limit`). Elasticsearch recommends giving the virtual machine twice the memory of the `jvm` heap, but setting a hard memory limit with `mem_limit` seems to interact badly with docker virtualisation. So we suppressed this limitation, which may lead to performance problems under heavy load. This problem will be followed up in the matchID project, as elasticsearch is an essential component.
to be completed
- to check the status of the elasticsearch cluster: `docker exec -it matchid-elasticsearch curl -XGET localhost:9200/_cluster/health?pretty`; the `status` should be `green` and `number_of_nodes` should match the `ES_NODES` value (default: 3)
- to check the status of your indices: `docker exec -it matchid-elasticsearch curl -XGET localhost:9200/_cat/indices`; each index should be green
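For reference, a healthy cluster answers something like this (abridged; exact fields vary with the elasticsearch version):

```bash
docker exec -it matchid-elasticsearch curl -XGET localhost:9200/_cluster/health?pretty
# {
#   "cluster_name" : "...",
#   "status" : "green",
#   "number_of_nodes" : 3,
#   ...
# }
```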
Note that the elasticsearch API endpoint is accessible from any client (unless you protect this API with the nginx configuration): `http://${matchID-host}/matchID/elasticsearch` is the same as `localhost:9200/` within `docker`. For example, if you run `matchID` on your host:

- cluster health: `curl -XGET http://localhost/matchID/elasticsearch/_cluster/health?pretty`
- indices: `curl -XGET http://localhost/matchID/elasticsearch/_cat/indices`
If you have large indices and multiple nodes, your cluster may take some time to recover coherence just after starting. You can then check the health of the indices. If your cluster stays red for some time (more than one minute), you should restart elasticsearch with `make elasticsearch-stop elasticsearch`. If it is still red, check the `docker logs -f` of each node, as sketched below.
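A sketch for scanning all nodes at once, assuming the default 3-node naming described earlier on this page:

```bash
# Dump the last lines of each node's log; add or remove esnodeN entries
# to match your ES_NODES value.
for node in matchid-elasticsearch matchid-esnode2 matchid-esnode3; do
  echo "=== $node ==="
  docker logs --tail 50 "$node"
done
```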
Many problems can occur:

- rights on the volume are not correct (a check is sketched after this list)
- the volume space is over 85% full
- all indices are red because too many operations were performed while the cluster was in an inconsistent state. This is the worst-case situation, where you'll have to clear all your indices with `curl -XDELETE http://localhost/matchID/elasticsearch/*` and lose all indexed data
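The first two points can be checked from inside a node; a sketch, assuming the `/usr/share/elasticsearch/data` path used by default in elasticsearch docker images (adjust it to your volume mapping):

```bash
# Disk usage of the data volume: elasticsearch stops allocating shards
# near its default 85% low disk watermark.
docker exec -it matchid-elasticsearch df -h /usr/share/elasticsearch/data
# Ownership and permissions of the data directory.
docker exec -it matchid-elasticsearch ls -ld /usr/share/elasticsearch/data
```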
One or many indices may also be red, or at least one yellow. It may take some time, but usually elasticsearch repairs itself. With a persistent yellow state of an index and the right number of documents, you will still be able to access your data, so you can back it up. You should do that with a recipe doing nothing, from the dataset of your index to a `msgpack` dataset. Then try to restore it to another index, and delete the original index. You may have to delete every red and yellow index; a way to list them is sketched below.
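To list only the problematic indices, the `_cat/indices` endpoint accepts a `health` filter:

```bash
# List red and yellow indices through the proxied endpoint used above.
curl -XGET 'http://localhost/matchID/elasticsearch/_cat/indices?health=red'
curl -XGET 'http://localhost/matchID/elasticsearch/_cat/indices?health=yellow'
```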