Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing possibility to update physics_group_name for a dataset. #87

Open
ArturAkh opened this issue Feb 14, 2023 · 10 comments
Open

Comments

@ArturAkh
Copy link

Dear DBS developers,

As discussed in CMSTalk:

https://cms-talk.web.cern.ch/t/moving-cms-embedding-samples-run-2-ul-to-t1s-global-storage-area/10203

It would be good to have the possibility to update physics_group_name for a dataset, at least in prod/phys03

Thank you very much in advance for considering this issue.

Best regards,

Artur Gottmann

@vkuznet
Copy link
Contributor

vkuznet commented Feb 14, 2023

@d-ylee I hope you can take care of this issue. Here we have two in fact:

  • one is adding update datasets info with additional meta-data phsyics_group_name,
  • and, another is authorization of DBS updates in phys03 DBS instance.

The former is trivial to add to dataset update API. The latter will require some thoughts. For that I suggest to introduce phys03 dbs group in CRIC (the service which keep users group/roles and where we already have dbs group). But we need a separate group/role to avoid people writing access to production DBS. The production DBS write access is only granted to dbs group and we should keep it this way, and for phys03 instance we need to have a new group. Said that, the next step would be to add authorization middleware for phys03 dbs group. Once it is done , the update APIs should be allowed for this specific group. Finally, we need to identify people who may be included in this group and update CRIC.

@d-ylee
Copy link
Contributor

d-ylee commented Mar 3, 2023

@vkuznet Not sure if I am understanding this correctly, but if we update the physics_group_name, would this affect other datasets that also use the same physics group? Or, for the logic to update a dataset's physics_group_name, would it be the following:

  1. Request to update physics_group_name for a dataset via /dataset/ PUT API
  2. Check in PHYSICS_GROUPS table to see if the new physics_group_name is not being used.
  3. If the physics_group_name already exists, get the PHYSICS_GROUP_ID the PHYSICS_GROUPS table and use the value to set the PHYSICS_GROUP_ID in the DATASETS table for that dataset.
  4. If the physics_group_name does not exist, create a new PHYSICS_GROUPS, get the new PHYSICS_GROUP_ID and set this new id to PHYSICS_GROUP_ID for the dataset.
  5. Commit the changes

@vkuznet
Copy link
Contributor

vkuznet commented Mar 4, 2023

Dennis, all steps are fine except #4. I think we should not allow users to create groups in DBS, I think it should be done by DBS admins similar to data-tiers. The reason is simple, user may provide a data which may contain typo, and it leads to a possibility to create undesired groups which later should be fixed. I would suggest that you implement steps 1,2,3,5 only. If users need a new group we may create it manually in DBS (after confirming with appropriate L2 of the group), and later users may inject data into this group. In other words, we'll allow to update the group but not to create a new one.

@d-ylee
Copy link
Contributor

d-ylee commented Mar 9, 2023

Valentin, in this case, if a user provides a physics_group_name as a parameter for the /dataset PUT API (UpdateDatasets), then instead of step 4, the PHYSICS_GROUP_NAME should be updated for the currently assigned PHYSICS_GROUP_ID in the dataset? As a result, all datasets that use the same PHYSICS_GROUP_ID will have a renamed PHYSICS_GROUP_NAME.

@vkuznet
Copy link
Contributor

vkuznet commented Mar 9, 2023

well, I think it is obvious. If user provides physics_group_name we'll get its id and update its id in dataset table. This will basically update the physics_group_name associated with this dataset. But we should not touch all datasets! The update will only happen for that specific dataset.

@d-ylee
Copy link
Contributor

d-ylee commented Mar 10, 2023

While working on this, I noticed that the DBSPutHandler gives HTTP 500 when an error happens in an update:

dbs2go/web/handlers.go

Lines 444 to 448 in 928dc25

if err != nil {
responseMsg(w, r, err, http.StatusInternalServerError)
return
}
}

...while for the DBSPostHandler and DBSGetHandler both return HTTP 400:

dbs2go/web/handlers.go

Lines 569 to 573 in 928dc25

if err != nil {
responseMsg(w, r, err, http.StatusBadRequest)
return
}
}

Should the DBSPutHandler return HTTP 500? Or should DBSPutHandler also give HTTP 400 on an error?

@vkuznet
Copy link
Contributor

vkuznet commented Mar 10, 2023

Dennis, good observation. The 500 refers to issue with server, while 400 to issue with request. Since this case we got error from the API it is 400 error rather 500. Please make proper adjustment in a code.

@d-ylee
Copy link
Contributor

d-ylee commented Mar 10, 2023

@vkuznet I pushed the initial feature to update a dataset via using physics_group_name parameter. Is this what we are looking for?
1563568

d-ylee added a commit that referenced this issue Apr 3, 2023
…me_in_datasets

Initial feature for Issue #87

Allows a dataset's `physics_group_id` to be updated to another, existing physics_group
@d-ylee
Copy link
Contributor

d-ylee commented Apr 3, 2023

@vkuznet For the authorization of DBS updates for only phys03, would this only be fore the datasets update, or for all of the updates. In this case, would a custom authentication be required for just the datasets DBSPutHandler case?

@vkuznet
Copy link
Contributor

vkuznet commented Apr 3, 2023

Dennis, you're mixing different concepts here. The authentication only authenticates users to access a particular resource, while authorization decides what user can do with a resources. Your question is about authorization rather than authentication. As far as I recall the original DBS python code relies on authorization per API basis, and therefore we should use it too, and only authorize update of datasets API with physics_group. You may put in place dedicated authorization for specific API. The authorization should check if user is part of specific group (provided by CMSWEB HTTP headers). For more see cmsauth module and/or CMSWEB HTTP headers, e.g. use httpgo service to display them. The one you are looking for is Cms-Authz-Admin, Cms-Authz-Operator or Cms-Authz-User. The user who will be allowed by authorization should be in some specific group we decide, e.g.group:dbs-phys03-admin. These groups are defined in CMS CRIC service and once we'll create/establish such group we may assign different users to it, and it will allow only users from a specific group to perform certain operation which will be checked by dbs middleware.

@d-ylee d-ylee removed their assignment Jun 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants