
Add "efs-dir" provisioning mode #1497

Open
wants to merge 6 commits into
base: master
from


mpatlasov
Contributor

@mpatlasov mpatlasov commented Nov 8, 2024

This PR was inspired by Jonathan Rainer's PR and Fabio Bertinatto's PR.

Is this a bug fix or adding new feature?

New feature. It was requested in #538, #517, and possibly other issues; it comes up often as a feature people would like.

What is this PR about? / Why do we need it?

Since its creation, the driver has supported Access Point Provisioning as its main means of dynamic provisioning. However, this is problematic for a few reasons. It can cause issues with deleting access points (as reported here), and there is a hard limit of 120 access points per EFS file system, which some use cases deplete very quickly. Furthermore, it requires various AWS IAM permissions that can be complicated to sort out and manage.

This PR adds a new provisioning mode, efs-dir, which creates directories instead of EFS access points. This is achieved with a new interface, Provisioner, which is implemented by AccessPointProvisioner (the original method) and DirectoryProvisioner (the new method). The interface also leaves room for other kinds of provisioning in the future, for example FileSystemProvisioning.

What testing is done?
New code is covered by unit tests, which all pass when I run "go test" locally. New e2e tests cover create/delete operations in "efs-dir" mode; they also pass when run against an OpenShift (OCP) cluster with the new driver installed.

fixes #538

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 8, 2024
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 8, 2024
@mpatlasov
Contributor Author

/test pull-aws-efs-csi-driver-unit

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 26, 2024
@seanzatzdev-amazon
Contributor

Please rebase

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 11, 2024
@mpatlasov
Contributor Author

Update: the last force-push only rebased the PR.

@mpatlasov
Contributor Author

Update: the last force-push fixed the e2e tests for "efs-dir".

@mpatlasov
Contributor Author

/test pull-aws-efs-csi-driver-unit

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 17, 2024
Contributor

@dankova22 dankova22 left a comment


@mpatlasov Thanks for the PR; we are considering supporting this feature. If you could address these comments and rebase, that would be great.

| directoryPerms | | | false | Directory permissions for [Access Point root directory](https://docs.aws.amazon.com/efs/latest/ug/efs-access-points.html#enforce-root-directory-access-point) creation. |
| uid | | | true | POSIX user Id to be applied for [Access Point root directory](https://docs.aws.amazon.com/efs/latest/ug/efs-access-points.html#enforce-root-directory-access-point) creation. |
| gid | | | true | POSIX group Id to be applied for [Access Point root directory](https://docs.aws.amazon.com/efs/latest/ug/efs-access-points.html#enforce-root-directory-access-point) creation. |
| gidRangeStart | | 50000 | true | Start range of the POSIX group Id to be applied for [Access Point root directory](https://docs.aws.amazon.com/efs/latest/ug/efs-access-points.html#enforce-root-directory-access-point) creation. Not used if uid/gid is set. |
Contributor


Can we update the documentation for storage class parameters that are not supported in efs-dir mode to state clearly that they apply only to access point provisioning?

klog.V(5).Infof("Provisioning directory with permissions %s", perms)

provisionedDirectory := path.Join(target, provisionedPath)
err = d.osClient.MkDirAllWithPermsNoOwnership(provisionedDirectory, perms)
Contributor


Can we add support for using static uid/gid from storage class?

I don't think dynamically selecting from a range, as we do for access point provisioning, makes sense here. However, a static uid/gid option for the provisioned directory would help users who do not want permissive directory permissions or to run their pods as root.

if value, ok := volumeParams[AzName]; ok {
azName = value
}
mountTarget, err := localCloud.DescribeMountTargets(ctx, fileSystemId, azName)
Contributor


Could we change the provision function to call DescribeMountTargets only once instead of up to twice?

Part of the advantage of using directory provisioning would be fewer API calls and a lower chance of throttling, so we should try to limit these calls as much as possible.

We should be able to return the result from getMountOptions and reuse it for both mountOptions and volContext.

} else {
// If it is nil then it's safe to try and delete the directory as it should now be empty
klog.V(5).Infof("Deleting temporary directory at '%s'", target)
if err := d.osClient.RemoveAll(target); err != nil {
Contributor


Can we change this to Remove instead of RemoveAll, as an extra safety precaution to ensure we do not delete any unintended data?

Accidental deletion is unlikely because a UUID is appended to target, but this will ensure nothing is deleted by accident if there are any other mounts under TempMountPathPrefix.

@dankova22
Contributor

Hi @mpatlasov, I'm interested to hear more about one of your concerns with the current access point provisioning:

> it can causes issues with user permissions around deleting provisioned directories

Is this referring to issues you have faced deleting access point root dirs outside of k8s, or when deleting volumes with deleteAccessPointRootDir?

Use `buildDriver()` instead of `&Driver{...}`
The controller's CreateVolume and DeleteVolume now call methods of the Provisioner interface. All the heavy-lifting create/delete logic specific to access-point provisioning is now encapsulated in the AccessPointProvisioner struct.
Implement the `DirectoryProvisioner` struct and its `Provision`/`Delete` methods. Add a `delete-provisioned-dir` command-line option to control DeleteVolume behavior in efs-dir mode.
The patch adds new unit tests for `provisioner_ap.go` and `provisioner_dir.go`, and also adds a few unit tests for `controller.go`.
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mpatlasov
Once this PR has been reviewed and has the lgtm label, please assign nckturner for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 8, 2025
@mpatlasov
Contributor Author

Hi @dankova22, thank you for reviewing this PR; it is highly appreciated! I've just rebased it but haven't addressed your comments yet. I will do so soon.

@mpatlasov
Contributor Author

> Hi @mpatlasov Im interested to hear more about one of your concerns with current access point provisioning:
>
> > it can causes issues with user permissions around deleting provisioned directories
>
> Is this referring to issues you have faced deleting access point root dirs outside of k8 or when deleting volumes with deleteAccessPointRootDir?

This phrase in the PR's description was copied from Jonathan Rainer's PR description. I thought it referred to issue #517:

> What happened?
> Access Points are not being deleted all PVCs in the namespace are deleted
> What you expected to happen?
> Access Points needs ro deleted when corresponding PVC is deleted

I need to remove "user permissions around deleting provisioned directories" from the description. Sorry for the confusion.

@mpatlasov
Contributor Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 8, 2025
@k8s-ci-robot
Contributor

@mpatlasov: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-aws-efs-csi-driver-e2e | ad2a406 | link | true | /test pull-aws-efs-csi-driver-e2e |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels: cncf-cla: yes, do-not-merge/hold, size/XXL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Disable Access Point Usage
4 participants