Skip to content

Commit

Permalink
add new cohorts
Browse files Browse the repository at this point in the history
  • Loading branch information
ahernank committed Oct 11, 2024
1 parent 4b40807 commit 4d26fd2
Showing 1 changed file with 44 additions and 0 deletions.
44 changes: 44 additions & 0 deletions _posts/2024-10-02-ag3-cohorts-v20240924.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
---
layout: post
title: "Ag3 cohorts analysis version 20240924"
tags: data
---

A new cohorts analysis version `20240924` has been released for the
Ag3 data resource. This is now the default cohorts analysis version
when using the `malariagen_data` [Ag3
API](https://malariagen.github.io/malariagen-data-python/latest/Ag3.html). This
cohorts analysis is available for datasets up to and including Ag3.11.

Please note that the new cohorts analysis may change the values of
sample metadata columns including `taxon`, `admin1_iso`,
`admin1_name`, `admin2_name`, and derived columns beginning `cohorts_`
relative to previous cohorts analysis versions.

To pin this cohorts analysis when accessing data:

{% highlight python %}
import malariagen_data

ag3 = malariagen_data.Ag3(
cohorts_analysis="20240924",
)
{% endhighlight %}

This new version introduces some key changes:

- Samples that were previously assigned as `gcx1`, have been renamed as `bissau`:
- `gcx` stands for `genetic cryptic species`, we use these labels as name placeholders for groups that fall outside our usual taxonomic assignment
- In line with [Caputo et al. (2024)](https://malariagen.github.io/vobs-updates/2024/09/10/caputo.html), which characterises the `gcx1` group, we have updated its proposed name to `Bissau molecular form`
- 291 samples that were previously assigned to the `gcx1` group, are now relabeled as `bissau`.
- 5 samples samples that were previously `unassigned`, are now relabeled as `bissau`.
- these changes also affect cohort names, e.g. `GM-M_gcx1_2019` has now been relabeled to `GM-M_biss_2019`

- 36 samples that were previously `unassigned`, have been renamed as (32) `melas`, (2)`gambiae`, (1) `fontenillei`, (1) `arabiensis`.

- An error on the administrative region 1 metadata has been fixed, affecting 119 samples. Tor these:
- `admin1_iso` has been relabeled from `UG-E` to `KE-04`
- `admin1_name` has been relabeled from `Eastern Region` to `Busia`
- these changes also affect cohort names, e.g. `UG-E_arab_2013` has now been relabeled to `KE-04_arab_2013`

If you need to access the previous version of the cohorts analysis, you can use pin it using the code in [here](https://malariagen.github.io/vobs-updates/2024/07/24/ag3-cohorts-v20240717.html).

0 comments on commit 4d26fd2

Please sign in to comment.