Add upstream service handling #35

Open: wants to merge 6 commits into develop

cookel2 (Contributor) commented on Jan 10, 2025

What

The changes in this PR are for this ticket: https://jira.ons.gov.uk/browse/DIS-1784

These changes allow upstream services other than zebedee and the dataset API to supply documents for the search index. Two new environment variables control this:

  1. ENABLE_OTHER_SERVICES_REINDEX: must be set to true for the other upstream services to be used.
  2. OTHER_UPSTREAM_SERVICES: the list of other upstream services to use. Each service must provide a URL and an endpoint for getting its resources, and those resources must have the same format as the ones served by the Search Upstream Stub.

By default, Search Reindex Batch does not use any upstream services at all. To use zebedee, the dataset API and/or other upstream services, the relevant environment variable must be set to true; otherwise the resulting index will be empty. If ENABLE_OTHER_SERVICES_REINDEX is set to true locally, the Search Upstream Stub is used by default, because it is already in the default list of OTHER_UPSTREAM_SERVICES.
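The default-off behaviour described above can be summarised with a small shell helper that reports which upstream sources a run would use. This is a hypothetical sketch, not code from dp-search-reindex-batch: the function name `enabled_sources` and its output labels are invented for illustration, and it treats only the literal value `true` as enabled, whereas the service's own config parsing may also accept other boolean spellings.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dp-search-reindex-batch): lists the upstream
# sources a reindex run would use, based on the three ENABLE_* flags.
# A flag counts as enabled only when it is exactly "true"; unset means false.
enabled_sources() {
  local sources=()
  [ "${ENABLE_ZEBEDEE_REINDEX:-false}" = "true" ] && sources+=("zebedee")
  [ "${ENABLE_DATASET_API_REINDEX:-false}" = "true" ] && sources+=("dataset-api")
  [ "${ENABLE_OTHER_SERVICES_REINDEX:-false}" = "true" ] && sources+=("other-upstream-services")
  if [ "${#sources[@]}" -eq 0 ]; then
    # Nothing enabled: the batch has no documents to index.
    echo "none (the index will be empty)"
  else
    echo "${sources[*]}"
  fi
}
```

For example, with the other two flags unset, `ENABLE_OTHER_SERVICES_REINDEX=true enabled_sources` prints `other-upstream-services`, which with the default OTHER_UPSTREAM_SERVICES list means the Search Upstream Stub.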

How to review

Run Search Reindex Batch locally as follows:

  1. Run Colima:
    colima start --cpu 4 --memory 8 --disk 100

  2. Move to the Search Stack directory and clean the existing containers:
    cd dp-compose/v2/stacks/search
    make clean

  3. Set the SERVICE_AUTH_TOKEN:
    export SERVICE_AUTH_TOKEN=<usual value for local auth>

  4. Decide which services to reindex and set the corresponding environment variable(s) to true (any left unset default to false):
    export ENABLE_DATASET_API_REINDEX=true
    export ENABLE_ZEBEDEE_REINDEX=true
    export ENABLE_OTHER_SERVICES_REINDEX=true

  5. If reindexing the dataset API, do the following:

  • Run mongodb:
    cd dp-compose/v2/stacks/v1-compat
    make up
  • Open Robo3T and make sure that the 'datasets' database contains all of the following collections (create any that are missing):
    datasets
    dimension.options
    editions
    instances
    instances_locks
  • Run the following commands:
    cd dp-dataset-api
    git pull
    export DISABLE_GRAPH_DB_DEPENDENCY=true
    make debug
    NB. The dataset API depends on zebedee, so also run:
    cd zebedee
    git pull
    ./run.sh
  6. If reindexing zebedee, run the following commands:
    cd zebedee
    git pull
    ./run.sh

  7. If reindexing other services, run the relevant service(s), e.g.:
    cd dis-search-upstream-stub
    git pull
    make debug

  8. Start the stack:
    cd dp-compose/v2/stacks/search
    make up

  9. In the same terminal window as step 8, run the following commands to reindex the services:
    cd dp-search-reindex-batch
    make debug
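Because every reindex flag defaults to false and the token has to be exported in the terminal that runs the batch, a quick check of the environment before the final `make debug` can save a wasted run. The `preflight` function below is a hypothetical sketch, not part of any of the repositories above: it only checks that SERVICE_AUTH_TOKEN is non-empty and that each ENABLE_* flag is unset, `true` or `false` (the values used in the steps above). It does not check that zebedee, the dataset API or the stub are actually running.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of dp-search-reindex-batch).
# Prints a message to stderr for each problem found and returns non-zero
# if there were any.
preflight() {
  local status=0 var value
  # The token exported earlier must be present in this terminal.
  if [ -z "${SERVICE_AUTH_TOKEN:-}" ]; then
    echo "SERVICE_AUTH_TOKEN is not set in this terminal" >&2
    status=1
  fi
  # Each reindex flag should be unset (treated as false), true or false.
  for var in ENABLE_DATASET_API_REINDEX ENABLE_ZEBEDEE_REINDEX ENABLE_OTHER_SERVICES_REINDEX; do
    value="${!var:-false}"
    case "$value" in
      true | false) ;;
      *)
        echo "$var should be true or false, got '$value'" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}
```

Usage, from the dp-search-reindex-batch directory: `preflight && make debug`, so the batch only starts when the check passes.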

Who can review

!me

cookel2 requested a review from a team as a code owner on January 10, 2025 at 19:34.