From 993dccd3a8fe870926508b423b517f6a19842078 Mon Sep 17 00:00:00 2001 From: ahernank Date: Mon, 8 Apr 2024 10:30:18 -0500 Subject: [PATCH 1/5] add Ag3.9 page --- docs/ag3/ag3.9.ipynb | 1529 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1529 insertions(+) create mode 100644 docs/ag3/ag3.9.ipynb diff --git a/docs/ag3/ag3.9.ipynb b/docs/ag3/ag3.9.ipynb new file mode 100644 index 0000000..b509bad --- /dev/null +++ b/docs/ag3/ag3.9.ipynb @@ -0,0 +1,1529 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "LBNBl2exUYWu" + }, + "source": [ + "# Ag3.9\n", + "\n", + "The **[Ag3.9](Ag3.9): _Anopheles gambiae_ data resource** contains single nucleotide polymorphism (SNP) calls, copy number variant (CNV) calls and SNP haplotypes from whole-genome sequencing of 3639 mosquitoes.\n", + "\n", + "More information about this release can be found on the [data resource website](https://www.malariagen.net/data/ag39-anopheles-gambiae-data-resource). \n", + "\n", + "This page provides an introduction to open data resources released as part of `Ag3.9`. \n", + "\n", + "If you have any questions about this guide or how to use the data, please [start a new discussion](https://github.com/malariagen/vector-public-data/discussions/new) on the malariagen/vector-public-data repo on GitHub. If you find any bugs, please [raise an issue](https://github.com/malariagen/vector-public-data/issues/new/choose)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kJqs4cXppk8j" + }, + "source": [ + "## Terms of use\n", + "\n", + "Data from this project will be made publicly available before journal publication. Unless otherwise stated, analyses of project data are ongoing and publications are in preparation by project partners, and it is not permitted to use project data for publication (including any type of communication with the general public) without prior permission from the originating partner studies. 
\n", + "\n", + "Although malaria is generally an endemic rather than an epidemic disease, and the focus of this project is on surveillance of disease vectors rather than pathogens, our data terms of use build on MalariaGEN's approach to data sharing, and adopt norms which have been established for rapid sharing of pathogen genomic data during disease outbreaks. The primary rationale for this approach is that malaria remains a public health emergency, where ethically appropriate and rapid sharing of genomic surveillance data can help to detect and respond to biological threats such as new forms of insecticide resistance, and to adapt malaria vector control strategies to different settings and changing circumstances.\n", + "\n", + "The publication embargo for all data in this release will expire on the **9th of April 2026**. \n", + "\n", + "If you have any questions about the terms of use, please email [data@malariagen.net](mailto:data@malariagen.net)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iNSicUCtpk8j" + }, + "source": [ + "## Partner studies\n", + "\n", + "- [1270-VO-MULTI-PAMGEN (Ethiopia)](https://www.malariagen.net/network/where-we-work/1270-vo-multi-pamgen) - _PAMGEN: Genetic interactions between human populations and malaria parasites in different environmental settings across Africa_\n", + "- [1270-VO-MULTI-PAMGEN (The Gambia)](https://www.malariagen.net/network/where-we-work/1270-vo-multi-pamgen) - _PAMGEN: Genetic interactions between human populations and malaria parasites in different environmental settings across Africa_\n", + "- [1274-VO-KE-KAMAU](https://www.malariagen.net/network/where-we-work/1274-VO-KE-KAMAU) - _PAMCA Anopheles genomics programme - Anopheles gambiae and Anopheles arabiensis genetic diversity and association with insecticide resistance in Kenya_\n", + "- [1280-VO-ZA-MUNHENGA](https://www.malariagen.net/network/where-we-work/1280-VO-ZA-MUNHENGA) - _PAMCA Anopheles genomics programme - Genetic structuring in 
the major malaria vector Anopheles arabiensis and implication on vector control in South Africa_\n", + "- [1281-VO-CM-CHRISTOPHE](https://www.malariagen.net/network/where-we-work/1281-VO-CM-CHRISTOPHE) - _ANOSPP screening of Anopheles species and Plasmodium presence in malaria vectors in Nigeria_\n", + "- [1323-VO-GM-NGWA](https://www.malariagen.net/network/where-we-work/1326-VO-UG-KAYONDO) - _Anopheles gambiae vector surveillance in The Gambia_\n", + "- [1329-VO-GA-CHRISTOPHE](https://www.malariagen.net/network/where-we-work/1329-VO-GA-CHRISTOPHE) - _PAMCA Anopheles genomics programme - Anopheles gambiae vector surveillance in Gabon_\n", + "\n", + "This release also includes data from two studies openly available in the literature: \n", + "- campos-2021 - [_The origin of island populations of the African malaria mosquito, Anopheles coluzzii_](https://doi.org/10.1038/s42003-021-02168-0) & [_Selection of sites for field trials of genetically engineered mosquitoes with gene drive_](https://doi.org/10.1111/eva.13283).\n", + "- [bergey-2019](https://doi.org/10.1111/eva.12878) - _Assessing connectivity despite high diversity in island populations of a malaria mosquito_." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5RHbe7N6pk8k" + }, + "source": [ + "## Whole-genome sequencing and variant calling\n", + "\n", + "All samples in `Ag3.9` have been sequenced individually to high coverage using Illumina technology at the Wellcome Sanger Institute. These sequence data have then been analysed to identify genetic variants such as single nucleotide polymorphisms (SNPs). After variant calling, both the samples and the variants have been through a range of quality control analyses, to ensure the data are of high quality. Both the raw sequence data and the curated variant calls are openly available for download and analysis. 
\n", + "\n", + "\n", + "For further information about the sequencing and variant calling methods used, please see the [methods page](methods)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9Hfchko2pk8l" + }, + "source": [ + "## Data hosting\n", + "\n", + "Data from `Ag3.9` are hosted by several different services. \n", + "\n", + "The SNP data have also been uploaded to Google Cloud, and can be analysed directly within the cloud without having to download or copy any data, including via free interactive computing services such as [MyBinder](https://gke.mybinder.org/) and [Google Colab](https://colab.research.google.com/). Further information about analysing these data in the cloud is provided in the [cloud data access guide](cloud)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lTJ_EnvOpk8l" + }, + "source": [ + "## Sample sets\n", + "\n", + "The samples included in `Ag3.9` have been organised into 10 sample sets. \n", + "\n", + "Each sample set corresponds to a set of mosquito specimens from a contributing study. Study details can be found on the partner study webpages listed above."
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "hGA4d7Yrpk8m", + "outputId": "c29827c1-0361-4926-c227-8f6e76c2a497", + "tags": [ + "remove-input" + ] + }, + "outputs": [], + "source": [ + "#!pip install -qq malariagen_data" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "AnmzLmEgpk8n", + "tags": [ + "remove-input" + ] + }, + "outputs": [], + "source": [ + "import malariagen_data\n", + "ag3 = malariagen_data.Ag3()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {
+ "colab": { + "base_uri": "https://localhost:8080/", + "height": 927 + }, + "id": "qsElasBepk8n", + "outputId": "4bf80a06-c2e8-4d2d-b4a6-99c8c66da7db", + "tags": [ + "remove-input" + ] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>sample_set</th><th>sample_count</th></tr>\n",
+ "    <tr><th>study_id</th><th></th><th></th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>1270-VO-MULTI-PAMGEN</th><td>1270-VO-MULTI-PAMGEN-VMF00218</td><td>273</td></tr>\n",
+ "    <tr><th>1270-VO-MULTI-PAMGEN</th><td>1270-VO-MULTI-PAMGEN-VMF00232</td><td>212</td></tr>\n",
+ "    <tr><th>1274-VO-KE-KAMAU</th><td>1274-VO-KE-KAMAU-VMF00246</td><td>564</td></tr>\n",
+ "    <tr><th>1280-VO-ZA-MUNHENGA</th><td>1280-VO-ZA-MUNHENGA-VMF00222</td><td>223</td></tr>\n",
+ "    <tr><th>1281-VO-CM-CHRISTOPHE</th><td>1281-VO-CM-CHRISTOPHE-VMF00227</td><td>59</td></tr>\n",
+ "    <tr><th>1323-VO-GM-NGWA</th><td>1323-VO-GM-NGWA-VMF00235</td><td>188</td></tr>\n",
+ "    <tr><th>1323-VO-GM-NGWA</th><td>1323-VO-GM-NGWA-VMF00242</td><td>1630</td></tr>\n",
+ "    <tr><th>1329-VO-GA-CHRISTOPHE</th><td>1329-VO-GA-CHRISTOPHE-VMF00228</td><td>146</td></tr>\n",
+ "    <tr><th>bergey-2019</th><td>bergey-2019</td><td>113</td></tr>\n",
+ "    <tr><th>campos-2021</th><td>campos-2021</td><td>163</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>
" + ], + "text/plain": [ + " sample_set sample_count\n", + "study_id \n", + "1270-VO-MULTI-PAMGEN 1270-VO-MULTI-PAMGEN-VMF00218 273\n", + "1270-VO-MULTI-PAMGEN 1270-VO-MULTI-PAMGEN-VMF00232 212\n", + "1274-VO-KE-KAMAU 1274-VO-KE-KAMAU-VMF00246 564\n", + "1280-VO-ZA-MUNHENGA 1280-VO-ZA-MUNHENGA-VMF00222 223\n", + "1281-VO-CM-CHRISTOPHE 1281-VO-CM-CHRISTOPHE-VMF00227 59\n", + "1323-VO-GM-NGWA 1323-VO-GM-NGWA-VMF00235 188\n", + "1323-VO-GM-NGWA 1323-VO-GM-NGWA-VMF00242 1630\n", + "1329-VO-GA-CHRISTOPHE 1329-VO-GA-CHRISTOPHE-VMF00228 146\n", + "bergey-2019 bergey-2019 113\n", + "campos-2021 campos-2021 163" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_sample_sets = ag3.sample_sets(release=\"3.9\")\n", + "df_sample_sets[['study_id','sample_set', 'sample_count']].set_index('study_id')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yJ16OQ0Hpk8o" + }, + "source": [ + "Here is a more detailed breakdown of the samples contained within this sample set, summarised by country, year of collection, and species:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "a1OMvuTxUWpJ", + "outputId": "9f872334-fd50-4649-990a-df60ea71c12c", + "tags": [ + "remove-input" + ] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " \r" + ] + }, + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th></th><th></th><th>taxon</th><th>arabiensis</th><th>coluzzii</th><th>gambiae</th><th>gcx1</th><th>gcx2</th><th>melas</th><th>merus</th><th>quadriannulatus</th><th>unassigned</th></tr>\n",
+ "    <tr><th>study_id</th><th>sample_set</th><th>country</th><th>year</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th rowspan=\"2\">1270-VO-MULTI-PAMGEN</th><th>1270-VO-MULTI-PAMGEN-VMF00218</th><th>Ethiopia</th><th>2021</th><td>273</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>1270-VO-MULTI-PAMGEN-VMF00232</th><th>Gambia, The</th><th>2019</th><td>166</td><td>4</td><td>10</td><td>1</td><td>28</td><td>0</td><td>0</td><td>0</td><td>3</td></tr>\n",
+ "    <tr><th rowspan=\"7\">1274-VO-KE-KAMAU</th><th rowspan=\"7\">1274-VO-KE-KAMAU-VMF00246</th><th rowspan=\"7\">Kenya</th><th>2006</th><td>19</td><td>5</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>2007</th><td>18</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>2013</th><td>24</td><td>0</td><td>4</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>2014</th><td>15</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>2019</th><td>305</td><td>21</td><td>32</td><td>0</td><td>0</td><td>0</td><td>0</td><td>2</td><td>0</td></tr>\n",
+ "    <tr><th>2020</th><td>72</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>2021</th><td>45</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td></tr>\n",
+ "    <tr><th rowspan=\"2\">1280-VO-ZA-MUNHENGA</th><th rowspan=\"2\">1280-VO-ZA-MUNHENGA-VMF00222</th><th rowspan=\"2\">South Africa</th><th>2021</th><td>99</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>2022</th><td>122</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td></tr>\n",
+ "    <tr><th>1281-VO-CM-CHRISTOPHE</th><th>1281-VO-CM-CHRISTOPHE-VMF00227</th><th>Cameroon</th><th>2020</th><td>0</td><td>1</td><td>58</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th rowspan=\"4\">1323-VO-GM-NGWA</th><th rowspan=\"3\">1323-VO-GM-NGWA-VMF00235</th><th rowspan=\"3\">Gambia, The</th><th>2005</th><td>65</td><td>0</td><td>0</td><td>1</td><td>18</td><td>24</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>2014</th><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>2021</th><td>6</td><td>1</td><td>1</td><td>38</td><td>32</td><td>0</td><td>0</td><td>0</td><td>1</td></tr>\n",
+ "    <tr><th>1323-VO-GM-NGWA-VMF00242</th><th>Gambia, The</th><th>2019</th><td>683</td><td>12</td><td>41</td><td>49</td><td>552</td><td>265</td><td>0</td><td>0</td><td>28</td></tr>\n",
+ "    <tr><th>1329-VO-GA-CHRISTOPHE</th><th>1329-VO-GA-CHRISTOPHE-VMF00228</th><th>Gabon</th><th>2020</th><td>1</td><td>0</td><td>144</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td></tr>\n",
+ "    <tr><th>bergey-2019</th><th>bergey-2019</th><th>Uganda</th><th>2015</th><td>0</td><td>0</td><td>113</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th rowspan=\"18\">campos-2021</th><th rowspan=\"18\">campos-2021</th><th>Angola</th><th>2010</th><td>0</td><td>8</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>Benin</th><th>2014</th><td>0</td><td>11</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th rowspan=\"2\">Cameroon</th><th>2003</th><td>0</td><td>4</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>2011</th><td>0</td><td>4</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>Comoros, The Union of the</th><th>2011</th><td>0</td><td>0</td><td>35</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>Equatorial Guinea</th><th>2002</th><td>0</td><td>5</td><td>4</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>Gabon</th><th>2018</th><td>0</td><td>5</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>Guinea-Bissau</th><th>2009</th><td>0</td><td>0</td><td>0</td><td>6</td><td>8</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>Madagascar</th><th>2018</th><td>0</td><td>0</td><td>10</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th rowspan=\"5\">Mali</th><th>2002</th><td>0</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>2004</th><td>0</td><td>5</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>2006</th><td>0</td><td>3</td><td>6</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>2010</th><td>0</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>2012</th><td>0</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th rowspan=\"2\">Sao Tome and Principe</th><th>1998</th><td>0</td><td>12</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>2017</th><td>0</td><td>19</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>Tanzania</th><th>2012</th><td>0</td><td>0</td><td>6</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "    <tr><th>Zambia</th><th>2015</th><td>0</td><td>0</td><td>6</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>
" + ], + "text/plain": [ + "taxon arabiensis \\\n", + "study_id sample_set country year \n", + "1270-VO-MULTI-PAMGEN 1270-VO-MULTI-PAMGEN-VMF00218 Ethiopia 2021 273 \n", + " 1270-VO-MULTI-PAMGEN-VMF00232 Gambia, The 2019 166 \n", + "1274-VO-KE-KAMAU 1274-VO-KE-KAMAU-VMF00246 Kenya 2006 19 \n", + " 2007 18 \n", + " 2013 24 \n", + " 2014 15 \n", + " 2019 305 \n", + " 2020 72 \n", + " 2021 45 \n", + "1280-VO-ZA-MUNHENGA 1280-VO-ZA-MUNHENGA-VMF00222 South Africa 2021 99 \n", + " 2022 122 \n", + "1281-VO-CM-CHRISTOPHE 1281-VO-CM-CHRISTOPHE-VMF00227 Cameroon 2020 0 \n", + "1323-VO-GM-NGWA 1323-VO-GM-NGWA-VMF00235 Gambia, The 2005 65 \n", + " 2014 1 \n", + " 2021 6 \n", + " 1323-VO-GM-NGWA-VMF00242 Gambia, The 2019 683 \n", + "1329-VO-GA-CHRISTOPHE 1329-VO-GA-CHRISTOPHE-VMF00228 Gabon 2020 1 \n", + "bergey-2019 bergey-2019 Uganda 2015 0 \n", + "campos-2021 campos-2021 Angola 2010 0 \n", + " Benin 2014 0 \n", + " Cameroon 2003 0 \n", + " 2011 0 \n", + " Comoros, The Union of the 2011 0 \n", + " Equatorial Guinea 2002 0 \n", + " Gabon 2018 0 \n", + " Guinea-Bissau 2009 0 \n", + " Madagascar 2018 0 \n", + " Mali 2002 0 \n", + " 2004 0 \n", + " 2006 0 \n", + " 2010 0 \n", + " 2012 0 \n", + " Sao Tome and Principe 1998 0 \n", + " 2017 0 \n", + " Tanzania 2012 0 \n", + " Zambia 2015 0 \n", + "\n", + "taxon coluzzii \\\n", + "study_id sample_set country year \n", + "1270-VO-MULTI-PAMGEN 1270-VO-MULTI-PAMGEN-VMF00218 Ethiopia 2021 0 \n", + " 1270-VO-MULTI-PAMGEN-VMF00232 Gambia, The 2019 4 \n", + "1274-VO-KE-KAMAU 1274-VO-KE-KAMAU-VMF00246 Kenya 2006 5 \n", + " 2007 0 \n", + " 2013 0 \n", + " 2014 0 \n", + " 2019 21 \n", + " 2020 0 \n", + " 2021 0 \n", + "1280-VO-ZA-MUNHENGA 1280-VO-ZA-MUNHENGA-VMF00222 South Africa 2021 0 \n", + " 2022 0 \n", + "1281-VO-CM-CHRISTOPHE 1281-VO-CM-CHRISTOPHE-VMF00227 Cameroon 2020 1 \n", + "1323-VO-GM-NGWA 1323-VO-GM-NGWA-VMF00235 Gambia, The 2005 0 \n", + " 2014 0 \n", + " 2021 1 \n", + " 1323-VO-GM-NGWA-VMF00242 Gambia, The 2019 12 \n", + 
"1329-VO-GA-CHRISTOPHE 1329-VO-GA-CHRISTOPHE-VMF00228 Gabon 2020 0 \n", + "bergey-2019 bergey-2019 Uganda 2015 0 \n", + "campos-2021 campos-2021 Angola 2010 8 \n", + " Benin 2014 11 \n", + " Cameroon 2003 4 \n", + " 2011 4 \n", + " Comoros, The Union of the 2011 0 \n", + " Equatorial Guinea 2002 5 \n", + " Gabon 2018 5 \n", + " Guinea-Bissau 2009 0 \n", + " Madagascar 2018 0 \n", + " Mali 2002 2 \n", + " 2004 5 \n", + " 2006 3 \n", + " 2010 2 \n", + " 2012 2 \n", + " Sao Tome and Principe 1998 12 \n", + " 2017 19 \n", + " Tanzania 2012 0 \n", + " Zambia 2015 0 \n", + "\n", + "taxon gambiae \\\n", + "study_id sample_set country year \n", + "1270-VO-MULTI-PAMGEN 1270-VO-MULTI-PAMGEN-VMF00218 Ethiopia 2021 0 \n", + " 1270-VO-MULTI-PAMGEN-VMF00232 Gambia, The 2019 10 \n", + "1274-VO-KE-KAMAU 1274-VO-KE-KAMAU-VMF00246 Kenya 2006 1 \n", + " 2007 0 \n", + " 2013 4 \n", + " 2014 0 \n", + " 2019 32 \n", + " 2020 0 \n", + " 2021 0 \n", + "1280-VO-ZA-MUNHENGA 1280-VO-ZA-MUNHENGA-VMF00222 South Africa 2021 0 \n", + " 2022 0 \n", + "1281-VO-CM-CHRISTOPHE 1281-VO-CM-CHRISTOPHE-VMF00227 Cameroon 2020 58 \n", + "1323-VO-GM-NGWA 1323-VO-GM-NGWA-VMF00235 Gambia, The 2005 0 \n", + " 2014 0 \n", + " 2021 1 \n", + " 1323-VO-GM-NGWA-VMF00242 Gambia, The 2019 41 \n", + "1329-VO-GA-CHRISTOPHE 1329-VO-GA-CHRISTOPHE-VMF00228 Gabon 2020 144 \n", + "bergey-2019 bergey-2019 Uganda 2015 113 \n", + "campos-2021 campos-2021 Angola 2010 0 \n", + " Benin 2014 0 \n", + " Cameroon 2003 0 \n", + " 2011 0 \n", + " Comoros, The Union of the 2011 35 \n", + " Equatorial Guinea 2002 4 \n", + " Gabon 2018 0 \n", + " Guinea-Bissau 2009 0 \n", + " Madagascar 2018 10 \n", + " Mali 2002 0 \n", + " 2004 0 \n", + " 2006 6 \n", + " 2010 0 \n", + " 2012 0 \n", + " Sao Tome and Principe 1998 0 \n", + " 2017 0 \n", + " Tanzania 2012 6 \n", + " Zambia 2015 6 \n", + "\n", + "taxon gcx1 \\\n", + "study_id sample_set country year \n", + "1270-VO-MULTI-PAMGEN 1270-VO-MULTI-PAMGEN-VMF00218 Ethiopia 2021 0 \n", + " 
1270-VO-MULTI-PAMGEN-VMF00232 Gambia, The 2019 1 \n", + "1274-VO-KE-KAMAU 1274-VO-KE-KAMAU-VMF00246 Kenya 2006 0 \n", + " 2007 0 \n", + " 2013 0 \n", + " 2014 0 \n", + " 2019 0 \n", + " 2020 0 \n", + " 2021 0 \n", + "1280-VO-ZA-MUNHENGA 1280-VO-ZA-MUNHENGA-VMF00222 South Africa 2021 0 \n", + " 2022 0 \n", + "1281-VO-CM-CHRISTOPHE 1281-VO-CM-CHRISTOPHE-VMF00227 Cameroon 2020 0 \n", + "1323-VO-GM-NGWA 1323-VO-GM-NGWA-VMF00235 Gambia, The 2005 1 \n", + " 2014 0 \n", + " 2021 38 \n", + " 1323-VO-GM-NGWA-VMF00242 Gambia, The 2019 49 \n", + "1329-VO-GA-CHRISTOPHE 1329-VO-GA-CHRISTOPHE-VMF00228 Gabon 2020 0 \n", + "bergey-2019 bergey-2019 Uganda 2015 0 \n", + "campos-2021 campos-2021 Angola 2010 0 \n", + " Benin 2014 0 \n", + " Cameroon 2003 0 \n", + " 2011 0 \n", + " Comoros, The Union of the 2011 0 \n", + " Equatorial Guinea 2002 0 \n", + " Gabon 2018 0 \n", + " Guinea-Bissau 2009 6 \n", + " Madagascar 2018 0 \n", + " Mali 2002 0 \n", + " 2004 0 \n", + " 2006 0 \n", + " 2010 0 \n", + " 2012 0 \n", + " Sao Tome and Principe 1998 0 \n", + " 2017 0 \n", + " Tanzania 2012 0 \n", + " Zambia 2015 0 \n", + "\n", + "taxon gcx2 \\\n", + "study_id sample_set country year \n", + "1270-VO-MULTI-PAMGEN 1270-VO-MULTI-PAMGEN-VMF00218 Ethiopia 2021 0 \n", + " 1270-VO-MULTI-PAMGEN-VMF00232 Gambia, The 2019 28 \n", + "1274-VO-KE-KAMAU 1274-VO-KE-KAMAU-VMF00246 Kenya 2006 0 \n", + " 2007 0 \n", + " 2013 0 \n", + " 2014 0 \n", + " 2019 0 \n", + " 2020 0 \n", + " 2021 0 \n", + "1280-VO-ZA-MUNHENGA 1280-VO-ZA-MUNHENGA-VMF00222 South Africa 2021 0 \n", + " 2022 0 \n", + "1281-VO-CM-CHRISTOPHE 1281-VO-CM-CHRISTOPHE-VMF00227 Cameroon 2020 0 \n", + "1323-VO-GM-NGWA 1323-VO-GM-NGWA-VMF00235 Gambia, The 2005 18 \n", + " 2014 0 \n", + " 2021 32 \n", + " 1323-VO-GM-NGWA-VMF00242 Gambia, The 2019 552 \n", + "1329-VO-GA-CHRISTOPHE 1329-VO-GA-CHRISTOPHE-VMF00228 Gabon 2020 0 \n", + "bergey-2019 bergey-2019 Uganda 2015 0 \n", + "campos-2021 campos-2021 Angola 2010 0 \n", + " Benin 2014 0 \n", + " 
Cameroon 2003 0 \n", + " 2011 0 \n", + " Comoros, The Union of the 2011 0 \n", + " Equatorial Guinea 2002 0 \n", + " Gabon 2018 0 \n", + " Guinea-Bissau 2009 8 \n", + " Madagascar 2018 0 \n", + " Mali 2002 0 \n", + " 2004 0 \n", + " 2006 0 \n", + " 2010 0 \n", + " 2012 0 \n", + " Sao Tome and Principe 1998 0 \n", + " 2017 0 \n", + " Tanzania 2012 0 \n", + " Zambia 2015 0 \n", + "\n", + "taxon melas \\\n", + "study_id sample_set country year \n", + "1270-VO-MULTI-PAMGEN 1270-VO-MULTI-PAMGEN-VMF00218 Ethiopia 2021 0 \n", + " 1270-VO-MULTI-PAMGEN-VMF00232 Gambia, The 2019 0 \n", + "1274-VO-KE-KAMAU 1274-VO-KE-KAMAU-VMF00246 Kenya 2006 0 \n", + " 2007 0 \n", + " 2013 0 \n", + " 2014 0 \n", + " 2019 0 \n", + " 2020 0 \n", + " 2021 0 \n", + "1280-VO-ZA-MUNHENGA 1280-VO-ZA-MUNHENGA-VMF00222 South Africa 2021 0 \n", + " 2022 0 \n", + "1281-VO-CM-CHRISTOPHE 1281-VO-CM-CHRISTOPHE-VMF00227 Cameroon 2020 0 \n", + "1323-VO-GM-NGWA 1323-VO-GM-NGWA-VMF00235 Gambia, The 2005 24 \n", + " 2014 0 \n", + " 2021 0 \n", + " 1323-VO-GM-NGWA-VMF00242 Gambia, The 2019 265 \n", + "1329-VO-GA-CHRISTOPHE 1329-VO-GA-CHRISTOPHE-VMF00228 Gabon 2020 0 \n", + "bergey-2019 bergey-2019 Uganda 2015 0 \n", + "campos-2021 campos-2021 Angola 2010 0 \n", + " Benin 2014 0 \n", + " Cameroon 2003 0 \n", + " 2011 0 \n", + " Comoros, The Union of the 2011 0 \n", + " Equatorial Guinea 2002 0 \n", + " Gabon 2018 0 \n", + " Guinea-Bissau 2009 0 \n", + " Madagascar 2018 0 \n", + " Mali 2002 0 \n", + " 2004 0 \n", + " 2006 0 \n", + " 2010 0 \n", + " 2012 0 \n", + " Sao Tome and Principe 1998 0 \n", + " 2017 0 \n", + " Tanzania 2012 0 \n", + " Zambia 2015 0 \n", + "\n", + "taxon merus \\\n", + "study_id sample_set country year \n", + "1270-VO-MULTI-PAMGEN 1270-VO-MULTI-PAMGEN-VMF00218 Ethiopia 2021 0 \n", + " 1270-VO-MULTI-PAMGEN-VMF00232 Gambia, The 2019 0 \n", + "1274-VO-KE-KAMAU 1274-VO-KE-KAMAU-VMF00246 Kenya 2006 0 \n", + " 2007 0 \n", + " 2013 0 \n", + " 2014 0 \n", + " 2019 0 \n", + " 2020 0 \n", + " 2021 0 
\n", + "1280-VO-ZA-MUNHENGA 1280-VO-ZA-MUNHENGA-VMF00222 South Africa 2021 0 \n", + " 2022 1 \n", + "1281-VO-CM-CHRISTOPHE 1281-VO-CM-CHRISTOPHE-VMF00227 Cameroon 2020 0 \n", + "1323-VO-GM-NGWA 1323-VO-GM-NGWA-VMF00235 Gambia, The 2005 0 \n", + " 2014 0 \n", + " 2021 0 \n", + " 1323-VO-GM-NGWA-VMF00242 Gambia, The 2019 0 \n", + "1329-VO-GA-CHRISTOPHE 1329-VO-GA-CHRISTOPHE-VMF00228 Gabon 2020 0 \n", + "bergey-2019 bergey-2019 Uganda 2015 0 \n", + "campos-2021 campos-2021 Angola 2010 0 \n", + " Benin 2014 0 \n", + " Cameroon 2003 0 \n", + " 2011 0 \n", + " Comoros, The Union of the 2011 0 \n", + " Equatorial Guinea 2002 0 \n", + " Gabon 2018 0 \n", + " Guinea-Bissau 2009 0 \n", + " Madagascar 2018 0 \n", + " Mali 2002 0 \n", + " 2004 0 \n", + " 2006 0 \n", + " 2010 0 \n", + " 2012 0 \n", + " Sao Tome and Principe 1998 0 \n", + " 2017 0 \n", + " Tanzania 2012 0 \n", + " Zambia 2015 0 \n", + "\n", + "taxon quadriannulatus \\\n", + "study_id sample_set country year \n", + "1270-VO-MULTI-PAMGEN 1270-VO-MULTI-PAMGEN-VMF00218 Ethiopia 2021 0 \n", + " 1270-VO-MULTI-PAMGEN-VMF00232 Gambia, The 2019 0 \n", + "1274-VO-KE-KAMAU 1274-VO-KE-KAMAU-VMF00246 Kenya 2006 0 \n", + " 2007 0 \n", + " 2013 0 \n", + " 2014 0 \n", + " 2019 2 \n", + " 2020 0 \n", + " 2021 1 \n", + "1280-VO-ZA-MUNHENGA 1280-VO-ZA-MUNHENGA-VMF00222 South Africa 2021 0 \n", + " 2022 1 \n", + "1281-VO-CM-CHRISTOPHE 1281-VO-CM-CHRISTOPHE-VMF00227 Cameroon 2020 0 \n", + "1323-VO-GM-NGWA 1323-VO-GM-NGWA-VMF00235 Gambia, The 2005 0 \n", + " 2014 0 \n", + " 2021 0 \n", + " 1323-VO-GM-NGWA-VMF00242 Gambia, The 2019 0 \n", + "1329-VO-GA-CHRISTOPHE 1329-VO-GA-CHRISTOPHE-VMF00228 Gabon 2020 0 \n", + "bergey-2019 bergey-2019 Uganda 2015 0 \n", + "campos-2021 campos-2021 Angola 2010 0 \n", + " Benin 2014 0 \n", + " Cameroon 2003 0 \n", + " 2011 0 \n", + " Comoros, The Union of the 2011 0 \n", + " Equatorial Guinea 2002 0 \n", + " Gabon 2018 0 \n", + " Guinea-Bissau 2009 0 \n", + " Madagascar 2018 0 \n", + " Mali 2002 0 
\n", + " 2004 0 \n", + " 2006 0 \n", + " 2010 0 \n", + " 2012 0 \n", + " Sao Tome and Principe 1998 0 \n", + " 2017 0 \n", + " Tanzania 2012 0 \n", + " Zambia 2015 0 \n", + "\n", + "taxon unassigned \n", + "study_id sample_set country year \n", + "1270-VO-MULTI-PAMGEN 1270-VO-MULTI-PAMGEN-VMF00218 Ethiopia 2021 0 \n", + " 1270-VO-MULTI-PAMGEN-VMF00232 Gambia, The 2019 3 \n", + "1274-VO-KE-KAMAU 1274-VO-KE-KAMAU-VMF00246 Kenya 2006 0 \n", + " 2007 0 \n", + " 2013 0 \n", + " 2014 0 \n", + " 2019 0 \n", + " 2020 0 \n", + " 2021 0 \n", + "1280-VO-ZA-MUNHENGA 1280-VO-ZA-MUNHENGA-VMF00222 South Africa 2021 0 \n", + " 2022 0 \n", + "1281-VO-CM-CHRISTOPHE 1281-VO-CM-CHRISTOPHE-VMF00227 Cameroon 2020 0 \n", + "1323-VO-GM-NGWA 1323-VO-GM-NGWA-VMF00235 Gambia, The 2005 0 \n", + " 2014 0 \n", + " 2021 1 \n", + " 1323-VO-GM-NGWA-VMF00242 Gambia, The 2019 28 \n", + "1329-VO-GA-CHRISTOPHE 1329-VO-GA-CHRISTOPHE-VMF00228 Gabon 2020 1 \n", + "bergey-2019 bergey-2019 Uganda 2015 0 \n", + "campos-2021 campos-2021 Angola 2010 0 \n", + " Benin 2014 0 \n", + " Cameroon 2003 0 \n", + " 2011 0 \n", + " Comoros, The Union of the 2011 0 \n", + " Equatorial Guinea 2002 0 \n", + " Gabon 2018 0 \n", + " Guinea-Bissau 2009 0 \n", + " Madagascar 2018 0 \n", + " Mali 2002 0 \n", + " 2004 0 \n", + " 2006 0 \n", + " 2010 0 \n", + " 2012 0 \n", + " Sao Tome and Principe 1998 0 \n", + " 2017 0 \n", + " Tanzania 2012 0 \n", + " Zambia 2015 0 " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_samples = ag3.sample_metadata(sample_sets=\"3.9\")\n", + "df_summary = df_samples.pivot_table(\n", + " index=[\"study_id\",\"sample_set\", \"country\", \"year\"], \n", + " columns=[\"taxon\"],\n", + " values=\"sample_id\", \n", + " aggfunc=len,\n", + " fill_value=0)\n", + "df_summary" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dLiU0ulIpk8p" + }, + "source": [ + "Note that there can be multiple sampling sites represented within 
the same sample set." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OToX5vhfpk8p" + }, + "source": [ + "## Further reading\n", + "\n", + "We hope this page has provided a useful introduction to the `Ag3.9` data resource. If you would like to start working with these data, please visit the [cloud data access guide](cloud) or the [data download guide](download) or continue browsing the other documentation on this site.\n", + "\n", + "If you have any questions about the data and how to use them, please do get in touch by [starting a new discussion](https://github.com/malariagen/vector-data/discussions/new) on the malariagen/vector-data repository on GitHub." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "celltoolbar": "Tags", + "colab": { + "name": "Ag3.0-intro.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "developer-developer-training-nb-maintenance-mgen-8.7.0", + "language": "python", + "name": "conda-env-developer-developer-training-nb-maintenance-mgen-8.7.0-py" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.11" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From 9595e4361e26bc1f9cd1f003b65d411bd23dd4ad Mon Sep 17 00:00:00 2001 From: ahernank Date: Mon, 8 Apr 2024 10:30:35 -0500 Subject: [PATCH 2/5] update toc --- docs/_toc.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/_toc.yml b/docs/_toc.yml index d38c317..5d76ee9 100644 --- a/docs/_toc.yml +++ b/docs/_toc.yml @@ -12,6 +12,7 @@ parts: - file: ag3/ag3.6 - file: ag3/ag3.7 - file: ag3/ag3.8 + - file: ag3/ag3.9 - file: ag3/cloud - file: ag3/download - file: ag3/methods From 15d39fb9617e70252661c843d5bc3c40c3be64a1 Mon Sep 17 00:00:00 2001 From: ahernank Date: Mon, 8 
Apr 2024 10:31:46 -0500 Subject: [PATCH 3/5] fix typo --- docs/ag3/ag3.9.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/ag3/ag3.9.ipynb b/docs/ag3/ag3.9.ipynb index b509bad..339deff 100644 --- a/docs/ag3/ag3.9.ipynb +++ b/docs/ag3/ag3.9.ipynb @@ -110,7 +110,7 @@ }, "outputs": [], "source": [ - "#!pip install -qq malariagen_data" + "!pip install -qq malariagen_data" ] }, { From 0cec8dc86765c5a259a005d395bf47efe1054b43 Mon Sep 17 00:00:00 2001 From: ahernank Date: Mon, 8 Apr 2024 10:33:00 -0500 Subject: [PATCH 4/5] fix link --- docs/ag3/ag3.9.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/ag3/ag3.9.ipynb b/docs/ag3/ag3.9.ipynb index 339deff..6e27e82 100644 --- a/docs/ag3/ag3.9.ipynb +++ b/docs/ag3/ag3.9.ipynb @@ -52,7 +52,7 @@ "\n", "This release also includes data from two studies openly available in the literature: \n", "- campos-2021 - [_The origin of island populations of the African malaria mosquito, Anopheles coluzzii_](https://doi.org/10.1038/s42003-021-02168-0) & [_Selection of sites for field trials of genetically engineered mosquitoes with gene drive_](https://doi.org/10.1111/eva.13283).\n", - "- [bergey-2019](https://doi.org/10.1111/eva.12878)- _Assessing connectivity despite high diversity in island populations of a malaria mosquito_." + "- bergey-2019 - [_Assessing connectivity despite high diversity in island populations of a malaria mosquito_](https://doi.org/10.1111/eva.12878)." 
] }, { From f43cf6b92f03ce767cc51ad94560a343fb621e76 Mon Sep 17 00:00:00 2001 From: ahernank Date: Mon, 8 Apr 2024 13:23:43 -0500 Subject: [PATCH 5/5] add italics --- docs/ag3/ag3.9.ipynb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/ag3/ag3.9.ipynb b/docs/ag3/ag3.9.ipynb index 6e27e82..66bf07e 100644 --- a/docs/ag3/ag3.9.ipynb +++ b/docs/ag3/ag3.9.ipynb @@ -42,9 +42,9 @@ "source": [ "## Partner studies\n", "\n", - "- [1270-VO-MULTI-PAMGEN (Ethiopia)](https://www.malariagen.net/network/where-we-work/1270-vo-multi-pamgen) - PAMGEN: Genetic interactions between human populations and malaria parasites in different environmental settings across Africa _\n", - "- [1270-VO-MULTI-PAMGEN (The Gambia)](https://www.malariagen.net/network/where-we-work/1270-vo-multi-pamgen) - PAMGEN: Genetic interactions between human populations and malaria parasites in different environmental settings across Africa _\n", - "- [1274-VO-KE-KAMAU](https://www.malariagen.net/network/where-we-work/1274-VO-KE-KAMAU) - _ PAMCA Anopheles genomics programme - Anopheles gambiae and Anopheles arabiensis genetic diversity and association with insecticide resistance in Kenya_\n", + "- [1270-VO-MULTI-PAMGEN (Ethiopia)](https://www.malariagen.net/network/where-we-work/1270-vo-multi-pamgen) - _PAMGEN: Genetic interactions between human populations and malaria parasites in different environmental settings across Africa_\n", + "- [1270-VO-MULTI-PAMGEN (The Gambia)](https://www.malariagen.net/network/where-we-work/1270-vo-multi-pamgen) - _PAMGEN: Genetic interactions between human populations and malaria parasites in different environmental settings across Africa_\n", + "- [1274-VO-KE-KAMAU](https://www.malariagen.net/network/where-we-work/1274-VO-KE-KAMAU) - _PAMCA Anopheles genomics programme - Anopheles gambiae and Anopheles arabiensis genetic diversity and association with insecticide resistance in Kenya_\n", "- 
[1280-VO-ZA-MUNHENGA](https://www.malariagen.net/network/where-we-work/1280-VO-ZA-MUNHENGA) - _PAMCA Anopheles genomics programme - Genetic structuring in the major malaria vector Anopheles arabiensis and implication on vector control in South Africa_\n", "- [1281-VO-CM-CHRISTOPHE](https://www.malariagen.net/network/where-we-work/1281-VO-CM-CHRISTOPHE) - _ANOSPP screening of Anopheles species and Plasmodium presence in malaria vectors in Nigeria_\n", "- [1323-VO-GM-NGWA](https://www.malariagen.net/network/where-we-work/1326-VO-UG-KAYONDO) - _Anopheles gambiae vector surveillance in The Gambia_\n",