Support for AnVIL duos_id (#6620) #6668
base: develop
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #6668 +/- ##
===========================================
+ Coverage 85.63% 85.66% +0.02%
===========================================
Files 149 149
Lines 20950 20956 +6
===========================================
+ Hits 17940 17951 +11
+ Misses 3010 3005 -5
View full report in Codecov by Sentry.
677649b to 85b0fd6
89e199b to 9032559
@@ -457,14 +457,16 @@ def _supplementary_bundle(self, bundle_fqid: TDRAnvilBundleFQID) -> TDRAnvilBund
    def _duos_bundle(self, bundle_fqid: TDRAnvilBundleFQID) -> TDRAnvilBundle:
        assert not bundle_fqid.is_batched, bundle_fqid
        duos_info = self.tdr.get_duos(bundle_fqid.source)
        duos_id = None if duos_info is None else one(duos_info['consentGroups'])['datasetIdentifier']
L
f9f87ff to 7b0b646
LGTM ✅
@@ -457,14 +457,20 @@ def _supplementary_bundle(self, bundle_fqid: TDRAnvilBundleFQID) -> TDRAnvilBund
    def _duos_bundle(self, bundle_fqid: TDRAnvilBundleFQID) -> TDRAnvilBundle:
        assert not bundle_fqid.is_batched, bundle_fqid
        duos_info = self.tdr.get_duos(bundle_fqid.source)
As far as I can tell, get_duos() already uses the ID and could return it. If that is not the case, I would like to understand why. If it is, that method should return a tuple of the ID and the info, after asserting that the ID used to look up the info and the ID embedded in it are equal.
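The shape requested in this comment can be sketched as follows. This is a minimal, self-contained illustration and not the actual `get_duos()` from `src/azul/terra.py`: the function name `get_duos_with_id` and the `fetch` callable are hypothetical, and only the keys `consentGroups` and `datasetIdentifier` are taken from the diff above.

```python
from typing import Any, Callable, Mapping, Optional

JSON = Mapping[str, Any]


def get_duos_with_id(duos_id: str,
                     fetch: Callable[[str], Optional[JSON]]
                     ) -> Optional[tuple[str, JSON]]:
    """
    Hypothetical helper: look up DUOS info by ID and return both the ID and
    the info, after checking that the embedded ID matches the lookup ID.
    `fetch` stands in for whatever performs the actual lookup.
    """
    duos_info = fetch(duos_id)
    if duos_info is None:
        return None
    # Exactly one consent group is expected, mirroring the use of `one()`
    # in the diff. Unpacking raises ValueError for zero or several groups.
    (consent_group,) = duos_info['consentGroups']
    embedded_id = consent_group['datasetIdentifier']
    if duos_id != embedded_id:
        raise ValueError('Mismatched identifiers', duos_id, embedded_id)
    return duos_id, duos_info
```

With this shape, a caller such as `_duos_bundle` would not need to dig the identifier out of the response itself.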
        duos_id = None
        description = None
Suggested change:
-        duos_id = None
-        description = None
+        duos_id, description = None, None
3acd295 to 06cb30a
a7594de to a1520bd
Please squash all commits into a single one.
src/azul/terra.py
Outdated
            require(duos_id == consent_group['datasetIdentifier'],
                    duos_id, consent_group['datasetIdentifier'])
Suggested change:
-            require(duos_id == consent_group['datasetIdentifier'],
-                    duos_id, consent_group['datasetIdentifier'])
+            require(duos_id == consent_group['datasetIdentifier'],
+                    "Mismatched identifiers", duos_id, consent_group)
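To illustrate what the suggested arguments change, here is a small sketch. The `require` and `RequirementError` below are simplified, hypothetical stand-ins written for this example; they are assumed to forward their extra arguments to the raised exception and may differ from the helper actually used in `src/azul/terra.py`.

```python
class RequirementError(RuntimeError):
    """Hypothetical stand-in for the exception raised by a failed check."""


def require(condition: bool, *args, exception: type = RequirementError) -> None:
    # Simplified stand-in: raise with whatever arguments the caller supplied
    if not condition:
        raise exception(*args)


def check(duos_id: str, consent_group: dict) -> None:
    # The suggested form: a message first, then the values needed to debug
    require(duos_id == consent_group['datasetIdentifier'],
            'Mismatched identifiers', duos_id, consent_group)
```

Under these assumptions, a failure carries a readable message plus the whole consent group, rather than two bare identifiers with no explanation.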
a1520bd to d3d5723
I'm surprised that there aren't any changes to test_response_anvil.py, and that there isn't an entry for the new field in azul.plugins.metadata.anvil.Plugin._field_mapping. I noticed that there isn't a mapping entry for the description either, nor any coverage of that field in test_response_anvil.py, so I think there is some existing test debt.
Most importantly, does this work? Can you actually see the added field in the service response of your deployment?
Please discuss this with @nadove-ucsc. The three of us should then discuss whether we should remediate the debt in this PR or in a follow-up issue.
@@ -210,6 +210,7 @@ def _non_pivotal_fields_by_entity_type(self) -> dict[str, set[str]]:
            },
            'datasets': {
                'dataset_id',
                'duos_id',
Do you know why description isn't listed here?
This list is in _non_pivotal_fields_by_entity_type. In non-datasets endpoints, the datasets inner entity only contains a small subset of datasets fields.
$ http 'https://service.explore.anvilproject.org/index/activities?size=2' | jq '.hits[].datasets'
[
{
"dataset_id": [
"8a756d17-c54c-3b33-312b-c767552a8516"
],
"title": [
"ANVIL_1000G_high_coverage_2019"
]
}
]
[
{
"dataset_id": [
"8a756d17-c54c-3b33-312b-c767552a8516"
],
"title": [
"ANVIL_1000G_high_coverage_2019"
]
}
]
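The behaviour described in this reply can be pictured with a short sketch. It is an illustration only: the function `inner_entity` and the constant `NON_PIVOTAL_FIELDS` are hypothetical names, and the field set is assembled from the diff hunk (`dataset_id`, `duos_id`) and the response above (`title`), not copied from the plugin.

```python
from typing import Any, Mapping

# Hypothetical stand-in for the mapping returned by
# _non_pivotal_fields_by_entity_type(); contents are assumed.
NON_PIVOTAL_FIELDS: dict[str, set[str]] = {
    'datasets': {'dataset_id', 'duos_id', 'title'},
}


def inner_entity(entity_type: str, entity: Mapping[str, Any]) -> dict[str, Any]:
    """
    Reduce an entity to the fields kept when it appears as an inner entity
    of another endpoint, e.g. `datasets` inside an `activities` hit.
    """
    allowed = NON_PIVOTAL_FIELDS[entity_type]
    return {k: v for k, v in entity.items() if k in allowed}
```

Under that assumption, a field such as `description` that is missing from the set would still be available on the datasets endpoint itself, but not in the `datasets` inner entity of other endpoints.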
@@ -498,6 +498,7 @@ def _duos_types(cls) -> FieldTypes:
        return {
            'document_id': null_str,
            'description': null_str,
            'duos_id': null_str,
We should also change azul.plugins.metadata.anvil.indexer.transform.SingletonTransformer._is_duos to use duos_id instead of description.
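A sketch of the two variants of such a check follows. The real `_is_duos` is not shown in this thread, so the signatures and the null-check logic here are assumptions made for illustration; only the field names `duos_id` and `description` come from the diff.

```python
from typing import Any, Mapping


def is_duos_by_description(dataset: Mapping[str, Any]) -> bool:
    # Assumed shape of the existing check: keyed on the description field
    return dataset.get('description') is not None


def is_duos_by_id(dataset: Mapping[str, Any]) -> bool:
    # Assumed shape of the proposed check: keyed on the new duos_id field
    return dataset.get('duos_id') is not None
```

The two checks differ only for a dataset that carries a `duos_id` but a null description, or the reverse. Whether such datasets occur in practice is not stated in the thread.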
The tests in that file index a primary bundle, not a DUOS bundle.
It is there. 7aeec0b
Having an entry in
The field is tested in
Yes
However the
0b6e556 to ab479b3
bf4cde5 to a1c6657
a1c6657 to 3215614
Yes …
… however due to #6609 both the
I tried a reindex with an updated version of Noa's patch applied …
Index: src/azul/plugins/repository/tdr_anvil/__init__.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/azul/plugins/repository/tdr_anvil/__init__.py b/src/azul/plugins/repository/tdr_anvil/__init__.py
--- a/src/azul/plugins/repository/tdr_anvil/__init__.py (revision 00c13a2241e3c0453e51e70535b971a1df502ea3)
+++ b/src/azul/plugins/repository/tdr_anvil/__init__.py (date 1736384121828)
@@ -279,7 +279,10 @@
FROM {backtick(self._full_table_name(spec, BundleType.duos.value))}
'''))
dataset_row_id = row['datarepo_row_id']
- if dataset_row_id.startswith(prefix):
+ common_prefix = spec.prefix.common
+ assert prefix.startswith(common_prefix), (prefix, spec)
+ partition_prefix = prefix.removeprefix(common_prefix)
+ if dataset_row_id[len(common_prefix):].startswith(partition_prefix):
bundle_uuid = change_version(dataset_row_id,
self.datarepo_row_uuid_version,
self.bundle_uuid_version)
… and the
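The comparison introduced by the patch above can be isolated in a few lines. This is a standalone sketch: the function name `in_partition` and its plain string parameters are hypothetical, whereas the patch itself works with `spec.prefix.common` and `prefix` inside the plugin. It needs Python 3.9 or later for `str.removeprefix`.

```python
def in_partition(row_id: str, common_prefix: str, prefix: str) -> bool:
    """
    Mirror the patched condition: skip as many leading characters of the
    row ID as the common prefix is long, then compare the remainder
    against the partition part of the prefix.
    """
    assert prefix.startswith(common_prefix), (prefix, common_prefix)
    partition_prefix = prefix.removeprefix(common_prefix)
    return row_id[len(common_prefix):].startswith(partition_prefix)
```

With an empty common prefix this reduces to the original `row_id.startswith(prefix)`. With a non-empty common prefix, the leading characters of the row ID are no longer compared against it.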
Connected issues: #6620
Checklist
Author
develop
issues/<GitHub handle of author>/<issue#>-<slug>
1 when the issue title describes a problem, the corresponding PR title is Fix: followed by the issue title
Author (partiality)
p tag to titles of partial commits
partial or completely resolves all connected issues
partial label
Author (chains)
base or this PR is not chained to another PR
chained or is not chained to another PR
Author (reindex, API changes)
r tag to commit title or the changes introduced by this PR will not require reindexing of any deployment
reindex:dev or the changes introduced by it will not require reindexing of dev
reindex:anvildev or the changes introduced by it will not require reindexing of anvildev
reindex:anvilprod or the changes introduced by it will not require reindexing of anvilprod
reindex:prod or the changes introduced by it will not require reindexing of prod
reindex:partial and its description documents the specific reindexing procedure for dev, anvildev, anvilprod and prod or requires a full reindex or carries none of the labels reindex:dev, reindex:anvildev, reindex:anvilprod and reindex:prod
API or this PR does not modify a REST API
a (A) tag to commit title for backwards (in)compatible changes or this PR does not modify a REST API
app.py or this PR does not modify a REST API
Author (upgrading deployments)
make docker_images.json and committed the resulting changes or this PR does not modify azul_docker_images, or any other variables referenced in the definition of that variable
u tag to commit title or this PR does not require upgrading deployments
upgrade or does not require upgrading deployments
deploy:shared or does not modify docker_images.json, and does not require deploying the shared component for any other reason
deploy:gitlab or does not require deploying the gitlab component
deploy:runner or does not require deploying the runner image
Author (hotfixes)
F tag to main commit title or this PR does not include permanent fix for a temporary hotfix
anvilprod and prod) have temporary hotfixes for any of the issues connected to this PR
Author (before every review)
develop, squashed old fixups
make requirements_update or this PR does not modify requirements*.txt, common.mk, Makefile and Dockerfile
R tag to commit title or this PR does not modify requirements*.txt
reqs or does not modify requirements*.txt
make integration_test passes in personal deployment or this PR does not modify functionality that could affect the IT outcome
Peer reviewer (after approval)
System administrator (after approval)
demo or no demo
no demo
no sandbox
N reviews label is accurate
Operator (before pushing merge the commit)
reindex:… labels and r commit title tag
no demo
develop
_select dev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused or this PR is not labeled deploy:shared
_select dev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply or this PR is not labeled deploy:gitlab
_select anvildev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused or this PR is not labeled deploy:shared
_select anvildev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply or this PR is not labeled deploy:gitlab
deploy:gitlab
deploy:gitlab
System administrator
dev.gitlab are complete or this PR is not labeled deploy:gitlab
anvildev.gitlab are complete or this PR is not labeled deploy:gitlab
Operator (before pushing merge the commit)
_select dev.gitlab && make -C terraform/gitlab/runner or this PR is not labeled deploy:runner
_select anvildev.gitlab && make -C terraform/gitlab/runner or this PR is not labeled deploy:runner
sandbox label or PR is labeled no sandbox
dev or PR is labeled no sandbox
anvildev or PR is labeled no sandbox
sandbox deployment or PR is labeled no sandbox
anvilbox deployment or PR is labeled no sandbox
sandbox deployment or PR is labeled no sandbox
anvilbox deployment or PR is labeled no sandbox
sandbox or this PR does not remove catalogs or otherwise causes unreferenced indices in dev
anvilbox or this PR does not remove catalogs or otherwise causes unreferenced indices in anvildev
sandbox or this PR is not labeled reindex:dev
anvilbox or this PR is not labeled reindex:anvildev
sandbox or this PR is not labeled reindex:dev
anvilbox or this PR is not labeled reindex:anvildev
p if the PR is also labeled partial
Operator (chain shortening)
develop or this PR is not labeled base
chained label from the blocked PR or this PR is not labeled base
base
base label from this PR or this PR is not labeled base
Operator (after pushing the merge commit)
dev
anvildev
dev
dev
anvildev
anvildev
_select dev.shared && make -C terraform/shared apply or this PR is not labeled deploy:shared
_select anvildev.shared && make -C terraform/shared apply or this PR is not labeled deploy:shared
dev
anvildev
Operator (reindex)
dev or this PR is neither labeled reindex:partial nor reindex:dev
anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
dev or this PR is neither labeled reindex:partial nor reindex:dev
anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
dev or this PR is neither labeled reindex:partial nor reindex:dev
anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
Operator
deploy:shared, deploy:gitlab, deploy:runner, API, reindex:partial, reindex:anvilprod and reindex:prod labels to the next promotion PRs or this PR carries none of these labels
deploy:shared, deploy:gitlab, deploy:runner, API, reindex:partial, reindex:anvilprod and reindex:prod labels, from the description of this PR to that of the next promotion PRs or this PR carries none of these labels
Shorthand for review comments
L line is too long
W line wrapping is wrong
Q bad quotes
F other formatting problem