Determine why the count of assemblies on the organisms list does not match the genomes for the taxon on the genomes list #192

Open
NoopDog opened this issue Dec 5, 2024 · 1 comment
Collaborator

NoopDog commented Dec 5, 2024

https://brc-analytics.dev.clevercanary.com/data/organisms

For example, Anopheles gambiae shows an "Assemblies" count of 7 on the organisms page, but only one assembly is listed for it on the genomes page.

Collaborator Author

NoopDog commented Dec 5, 2024

I believe this may be because we are filtering out the extra genomes in the query to get the genomes from NCBI. The query looks like this:

From #157

https://api.ncbi.nlm.nih.gov/datasets/v2/genome/taxon/7165%2C5501/dataset_report?filters.assembly_source=refseq&
filters.has_annotation=true&
filters.exclude_paired_reports=true&
filters.exclude_atypical=true&
filters.assembly_level=scaffold&
filters.assembly_level=chromosome&
filters.assembly_level=complete_genome
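
For debugging, here is a minimal sketch of how the URL above could be assembled in Python, so that individual filters can be dropped to see which one removes assemblies. The names `build_dataset_report_url` and `FILTERS` are hypothetical and not taken from the project code; only the endpoint and the filter values come from the query above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2/genome/taxon"

# Filters copied from the query above. Repeated keys (assembly_level)
# are kept as separate pairs so each one is emitted in order.
FILTERS = [
    ("filters.assembly_source", "refseq"),
    ("filters.has_annotation", "true"),
    ("filters.exclude_paired_reports", "true"),
    ("filters.exclude_atypical", "true"),
    ("filters.assembly_level", "scaffold"),
    ("filters.assembly_level", "chromosome"),
    ("filters.assembly_level", "complete_genome"),
]


def build_dataset_report_url(taxon_ids, filters=FILTERS):
    """Build the dataset_report URL for a comma-separated list of taxon IDs."""
    # The comma between taxon IDs is percent-encoded (%2C), as in the query above.
    taxa = quote(",".join(str(t) for t in taxon_ids), safe="")
    query = urlencode(filters)
    url = f"{BASE_URL}/{taxa}/dataset_report"
    return f"{url}?{query}" if query else url


filtered_url = build_dataset_report_url([7165, 5501])
# Same endpoint with no filters, for comparing the number of reports returned.
unfiltered_url = build_dataset_report_url([7165, 5501], filters=[])
```

Requesting both URLs and comparing the number of reports returned would be one way to check whether the filters account for the difference.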

So @nekrut, should we take the "# Assemblies" value from the taxonomy report (reports -> taxonomy -> counts[0]), e.g.

"counts": [
  {
    "type": "COUNT_TYPE_ASSEMBLY",
    "count": 7
  },

Or should we aggregate the count of assemblies returned when using the extra filters?
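
To make the two options concrete, here is a rough sketch in Python. It assumes the taxonomy report carries a `counts` list shaped like the snippet above, and that each genome dataset report exposes the taxon under `organism.tax_id`; that second field path is an assumption about the response shape. The function names and the sample data are hypothetical, with the count of 7 taken from the snippet and a single placeholder report standing in for the filtered result.

```python
from collections import Counter


def taxonomy_assembly_count(taxonomy):
    """Option 1: read the COUNT_TYPE_ASSEMBLY entry from the taxonomy counts."""
    # Look the entry up by type rather than relying on it being counts[0].
    for entry in taxonomy.get("counts", []):
        if entry.get("type") == "COUNT_TYPE_ASSEMBLY":
            return entry.get("count", 0)
    return 0


def filtered_assembly_counts(dataset_reports):
    """Option 2: count the assemblies actually returned by the filtered query."""
    # Assumes each report has organism.tax_id (hypothetical field path).
    return Counter(report["organism"]["tax_id"] for report in dataset_reports)


# Hypothetical sample data for illustration only.
taxonomy = {"counts": [{"type": "COUNT_TYPE_ASSEMBLY", "count": 7}]}
reports = [{"accession": "EXAMPLE_ACCESSION", "organism": {"tax_id": 7165}}]

unfiltered = taxonomy_assembly_count(taxonomy)
filtered = filtered_assembly_counts(reports)[7165]
print(unfiltered, filtered)  # 7 1
```

With option 2 the organisms page would always agree with the genomes page, since both would be derived from the same filtered result.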
