diff --git a/docs/vobs/vobs-data-access.ipynb b/docs/vobs/vobs-data-access.ipynb index a7f1d91..6d604ca 100644 --- a/docs/vobs/vobs-data-access.ipynb +++ b/docs/vobs/vobs-data-access.ipynb @@ -2,24 +2,52 @@ "cells": [ { "cell_type": "markdown", - "id": "038d7e27-4cd4-45bc-bb2c-f73b9993e444", + "id": "401ab472-65c1-45d5-b7d1-fd36132764b2", "metadata": {}, "source": [ "# Vector Observatory Data Access\n", "\n", - "MalariaGEN data resources provide an integrated view of malaria vector genomes from across the globe. These data are available to everyone to benefit the science and surveillance of malaria. You can find more information on the vector data resources [here](https://www.malariagen.net/mosquito/).\n", + "MalariaGEN data resources provide an integrated view of malaria vector genomes from across the globe. These data are available to everyone to benefit the science and surveillance of malaria. You can find more information on the vector data resources at .\n", "\n", - "Vector Observatory data are stored in Google Cloud Storage (GCS). The current set-up requires users to request access and authenticate prior to accessing data. \n", + "Vector Observatory data are stored in Google Cloud Storage (GCS) in the US region. The current set-up requires users to request access and authenticate prior to accessing data. " + ] + }, + { + "cell_type": "markdown", + "id": "87eebaca-36c7-456e-b4c4-d45ec16e5c32", + "metadata": {}, + "source": [ + "### Terms of Use\n", + "Data in the Vector Observatory are organised into data releases. All data releases can be accessed for public health and educational purposes as soon as they are released. However, please note that data releases are subject to terms of use which may include an embargo on all public communications including academic publications. The terms of use for each data release can be found on the MalariaGEN website." 
+ ] + }, + { + "cell_type": "markdown", + "id": "6bbc669f-f616-4f3b-abdb-b7de0296e61a", + "metadata": {}, + "source": [ + "### Fair Usage\n", "\n", - "Please note that although all data are available for immediate access for public health and educational purposes, the releases accessible through the Vector Observatory are **subject to different terms of use, including an embargo on public communications**, which encompasses academic publications. Each release, has specific terms of use attached, which are described within each release page.\n", + "Vector Observatory data are currently stored in Google Cloud Storage (GCS) in the US region. Access to Vector Observatory data in Google Cloud is free for all users. However, large transfers of data outside of Google Cloud in the US region substantially increase our running costs, and so we ask users to adhere to the following fair usage policy. This will allow us to continue making the data freely available.\n", "\n", - "---\n", + "- **Data access from Google Colab -** If you are using Google Colab to access data, please check if your allocated virtual machine (VM) is within the US region. If not, please request a new VM by selecting “Runtime > Disconnect and delete runtime” from the Colab menu. \n", + "- **Data access from other Google Cloud services -** If you are using another Google Cloud service such as Vertex AI Workbench, or are using a third party service such as Terra or Coiled which uses VMs within Google Cloud, please ensure that VMs are provisioned within the US region.\n", + "- **Data access from outside Google Cloud -** If you are planning to access data from any computer or VM located outside of Google Cloud, please contact us at support@malariagen.net. 
We can then advise on the most efficient methods for accessing data to both minimise our running costs and ensure you get the best performance.\n", "\n", + "Please note that we monitor data access logs to detect any unexpected large data transfers outside of Google Cloud in the US region, and may temporarily suspend access to users performing large data transfers. If we do suspend access, we will reach out to you to see if we can help optimise your data access." + ] + }, + { + "cell_type": "markdown", + "id": "038d7e27-4cd4-45bc-bb2c-f73b9993e444", + "metadata": {}, + "source": [ + "### Data Access\n", "To access data from the Vector Observatory, you will need to follow these steps:\n", "\n", "#### Step 1. Make sure you have a Google Account\n", "\n", - "To allow us to configure data access permissions, you will need to provide us with an email address that is associated with a Google account. This could be a standard Google (i.e., GMail) account, or alternatively it could be your work email if your employer uses Google Workspace.\n", + "To allow us to configure data access permissions, you will need to provide us with an email address that is associated with a Google account. This could be a standard Google (i.e., Gmail) account, or alternatively it could be your work email address if your employer uses Google Workspace.\n", "\n", "#### Step 2. Fill out the data access request form\n", "\n", @@ -27,7 +55,7 @@ "\n", "> [MalariaGEN cloud data access form](https://forms.gle/kCqistorZyxaU4LP7)\n", "\n", - "All requests for data access will be granted, subject to verification checks and agreement to reasonable use. This is to ensure that the data resources remain accessible to everyone. Submitting this form will allow us to configure storage permissions and monitor storage for excessive network usage in future.\n", + "All requests for data access will be granted subject to verification checks and agreement to reasonable use. 
This is to ensure that the data resources remain accessible to everyone. Submitting this form will allow us to configure storage permissions and monitor storage for excessive network usage in future.\n", "\n", "#### Step 3. Ensure you are using the latest version of the `malariagen_data` Python package\n", "\n", @@ -35,7 +63,7 @@ "\n", "#### Step 4. Set up Google Cloud authentication credentials\n", "\n", - "If you are only accessing data via the `malariagen_data` Python package from within Google Colab, you can skip this step, because authentication credentials will be obtained automatically.\n", + "If you are only accessing data via the `malariagen_data` Python package from within Google Colab, you can skip this step, because authentication credentials will be obtained automatically. If you have filled out the form but are having issues authenticating in Google Colab, you can find a walkthrough video [here](https://drive.google.com/file/d/1dOPimdsPabvOoWkKdomWl8v24LRvjiaj/view?usp=sharing).\n", "\n", "If you are accessing data from any other location, you will need to authenticate with Google Cloud. To do this, you will need to:\n", "\n", @@ -50,7 +78,7 @@ "\n", "3. Authenticate using `gcloud`: \n", "\n", - "- If you need to authenticate within the `malariagen_data` package, you will need to use the following command:\n", + "- If you need to authenticate to use the `malariagen_data` package, you will need to use the following command:\n", "\n", "```bash\n", "gcloud auth application-default login\n",