
Issue: GID for group already in use by a different group #509

Open · adombeck opened this issue Sep 2, 2024 · 27 comments · May be fixed by #663
Labels: bug (Something isn't working), high (High importance issue), jira

Comments

adombeck (Contributor) commented Sep 2, 2024

Describe the issue

Reported by @jhaar on #496:

Secondly, once I went through the whole initialization process again for "[email protected]", it still fails - this time this shows up in the logs

authd[900]: 2024/08/26 13:31:57 ERROR GID for group "group1" already in use by group "group2"
authd[900]: 2024/08/26 13:31:57 WARN can't check authentication: failed to update user "[email protected]": GID for group "group1" already in use by a different group

Now that I can't help you with. Yes those are two AD groups (if it matters, we are Hybrid mode, i.e. old Enterprise AD hooked into EntraID) that I am members of - why they show up as the same GID is beyond me - they certainly show no evidence of problems outside of this authd event.

adombeck added the bug (Something isn't working) and jira labels Sep 2, 2024
adombeck (Contributor, Author) commented Sep 2, 2024

I did some initial analysis. The error originates here:

// If a group with the same GID exists, we need to ensure that it's the same group or fail the update otherwise.
if existingGroup.Name != "" && existingGroup.Name != groupContent.Name {
slog.Error(fmt.Sprintf("GID for group %q already in use by group %q", groupContent.Name, existingGroup.Name))
return fmt.Errorf("GID for group %q already in use by a different group", groupContent.Name)
}

We set the GID here:

gidv := users.GenerateID(g.UGID)

and here:

defaultGroup := []users.GroupInfo{{Name: uInfo.Name, GID: &uInfo.UID}}

Both of these GIDs are generated by GenerateID as a number between 65536 and 100000. There is no check that the generated GID is not already in use, so it is to be expected that there will sometimes be a conflict. We could fix this by checking that the generated GID is not in use.
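
(For illustration, here is a minimal sketch of this kind of deterministic, hash-based ID generation; the names and range handling are hypothetical, not the actual authd code. Hashing a name into a small fixed range means two different names can land on the same ID, and nothing below checks whether that ID is already taken.)

package ids

import (
	"crypto/sha256"
	"encoding/binary"
)

const (
	idMin = 65536
	idMax = 100000
)

// generateID deterministically maps a name (e.g. a UGID or group name)
// into the [idMin, idMax) range. Two different names can map to the
// same ID, and nothing here checks whether the ID is already in use.
func generateID(name string) uint32 {
	sum := sha256.Sum256([]byte(name))
	n := binary.BigEndian.Uint32(sum[:4])
	return idMin + n%(idMax-idMin)
}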

I wonder why we store the GID in our database at all. Wouldn't it be enough to store the group name and use groupadd to add the group to /etc/group and let that pick a GID which isn't in use already?

didrocks (Member) commented Sep 2, 2024

There is no check that the generated GID is not already in use

Are you sure? The log GID for group "group1" already in use by group "group2" is exactly this check, which can be done without any race (meaning, when we insert it into the database). This is the error that is reported, to avoid overwriting a group with another one.

I wonder why we store the GID in our database at all. Wouldn't it be enough to store the group name and use groupadd to add the group to /etc/group and let that pick a GID which isn't in use already?

So, adduser and addgroup are used for local users (storing in /etc/passwd, /etc/shadow and /etc/group). Then, they can delegate, with some patches, to other processes to insert into their own databases. However, all "external directories" like LDAP (with samba bindings) or sssd for AD, for instance, own and store their own info as we do. Then, those components (as authd does) answer NSS requests for their respective users and groups.

Also, relying on adduser/addgroup for storing a user and group won't solve one of the use cases for directory handling: shared resources. If you have a disk shared over NFS, for instance, you need to map your UID/GID to the ones on disk for access permissions. This is why you need predictable UIDs and GIDs, ones that are shared across multiple machines and not generated per machine. (In a previous implementation, we simply incremented by one in case of collision, but we decided against that in this one and wanted to see if the collision rate was high.)

Also, we need to predict the UID even before it's created on disk for first ssh user access, which does a pre-flight lookup even before authentication. Returning a different UID would then puzzle sshd and result in various errors on first login.

These are the reasons we didn't go for +1 increments right away.

Out of curiosity, can you share the names of the conflicting groups? We can probably rework the generation algorithm to ensure we end up with fewer conflicts.

adombeck (Contributor, Author) commented Sep 2, 2024

Are you sure? The log GID for group "group1" already in use by group "group2" is exactly this check, which can be done without any race (meaning, when we insert it into the database). This is the error that is reported, to avoid overwriting a group with another one.

What I meant is that GenerateID (or the function calling it) doesn't check if the generated ID is already in use on the system, in which case it should generate another one. The current check results in an unsuccessful login for reasons not caused by the user. We should be able to avoid that.

So, adduser and addgroup are used for local users (storing in /etc/passwd, /etc/shadow and /etc/group). Then, they can delegate, with some patches, to other processes to insert into their own databases. However, all "external directories" like LDAP (with samba bindings) or sssd for AD, for instance, own and store their own info as we do. Then, those components (as authd does) answer NSS requests for their respective users and groups.

But don't we add the group to /etc/group here?

if err := runGPasswd(opts.gpasswdCmd[0], args...); err != nil {

Also, relying on adduser/addgroup for storing a user and group won't solve one of the use cases for directory handling: shared resources. If you have a disk shared over NFS, for instance, you need to map your UID/GID to the ones on disk for access permissions. This is why you need predictable UIDs and GIDs, ones that are shared across multiple machines and not generated per machine.

How does the current implementation get us predictable GIDs across multiple machines? IIUC, we generate the GIDs per authd instance, and we don't support using an authd instance from a different machine. What am I missing?

Also, we need to predict the UID even before it's created on disk for first ssh user access, which does a pre-flight lookup even before authentication. Returning a different UID would then puzzle sshd and result in various errors on first login.

What do you mean by "pre-flight even before authentication"?

Out of curiosity, can you share the names of the conflicting groups? We can probably rework the generation algorithm to ensure we end up with fewer conflicts.

I couldn't reproduce the error myself; it was reported on #496.

didrocks (Member) commented Sep 2, 2024

But don't we add the group to /etc/group here?

No, there we add a remote user to local groups, i.e. groups that are already handled by the local group utilities. There is an option to request NSS to still go through group memberships even if it found a matching one, but this option is not enabled on Ubuntu by default, and it's something we should explore (though we shouldn't do that without distro support).

How does the current implementation get us predictable GIDs across multiple machines? IIUC, we generate the GIDs per authd instance, and we don't support using an authd instance from a different machine. What am I missing?

The implementation computes the GIDs based only on the string passed to it: https://github.com/ubuntu/authd/blob/main/internal/users/manager.go#L418. There is nothing random. The proof is that the tests even assert the generated ID, despite CI running on different machines.

What do you mean by "pre-flight even before authentication"?

sshd needs to check whether a user potentially exists before going on with authentication. Otherwise, it raises a mock authentication (so that a remote attacker doesn't know whether the user exists on the machine) with an unmatchable password.

To do this pre-check, it does an NSS request even before the PAM authentication starts and expects a real passwd entry for the user, with the UID and user name matching the ones that will be returned later, after authentication.

However, on first login we don't have that user information (we don't even know whether the user exists). This is why we need a predictable ID, so that if the authentication later succeeds, we can return matching user information.
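
(To make the pre-flight concrete: it is essentially an NSS passwd lookup before PAM runs. A hedged illustration in Go follows; with cgo enabled, os/user.Lookup goes through the same glibc/NSS path, and the user name below is made up.)

package main

import (
	"fmt"
	"os/user"
)

func main() {
	// Roughly what sshd relies on before starting PAM: an NSS passwd
	// lookup for a user that may not have logged in yet. The UID seen
	// here must match the one returned after authentication succeeds.
	u, err := user.Lookup("jane.doe@example.com") // hypothetical user
	if err != nil {
		fmt.Println("no passwd entry:", err)
		return
	}
	fmt.Println("pre-flight UID:", u.Uid)
}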

adombeck (Contributor, Author) commented Sep 2, 2024

The implementation computes the GIDs based only on the string passed to it: https://github.com/ubuntu/authd/blob/main/internal/users/manager.go#L418. There is nothing random.

Ah, I didn't actually read the implementation and assumed it was generating a random ID 😅

This approach will inherently lead to UID/GID conflicts, though. I don't see a way to both have a predictable UID/GID and avoid conflicts, so we should weigh the benefits of a predictable UID/GID (the shared disk use case you mentioned - is there anything else?) against the drawbacks (unsuccessful logins).

didrocks (Member) commented Sep 2, 2024

I don't see a way to both have a predictable UID/GID and avoid conflicts, so we should weigh the benefits of a predictable UID/GID (the shared disk use case you mentioned - is there anything else?) against the drawbacks (unsuccessful logins).

Indeed, there is none (shared disk and ssh are the two reasons). I think, though, that maybe we can come up with an algorithm that leads to fewer collisions? I wouldn't have expected to see one so fast, and that worries me, so there is probably a better possible implementation (which will require a transition plan).

adombeck (Contributor, Author) commented Sep 2, 2024

I'm not sure I correctly understood the SSH case, but the way I understand it, we should be able to solve it by generating random UIDs ourselves until we generate one that doesn't exist yet on the system, no? That's still prone to a race, because the UID could have been taken by the time the user successfully authenticated and we actually try to create the user with that UID on the system, but that should be a rare enough case that it's fine to fail with a descriptive error message if that happens.
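
(A rough sketch of that retry-until-unique idea, with hypothetical names; isInUse would have to consult both authd's database and the other NSS sources, and the remaining race is the one described above.)

package ids

import (
	"errors"
	"math/rand"
)

// pickUniqueID draws random candidates until it finds one that is not
// currently in use. Sketch only: the chosen ID can still be taken by
// something else before the user record is actually committed.
func pickUniqueID(min, max uint32, isInUse func(uint32) bool) (uint32, error) {
	for i := 0; i < 100; i++ {
		id := min + uint32(rand.Intn(int(max-min)))
		if !isInUse(id) {
			return id, nil
		}
	}
	return 0, errors.New("no free ID found after 100 attempts")
}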

I think, though, that maybe we can come up with an algorithm that leads to fewer collisions?

Given the small range of IDs we generate, I don't think there is any way to get the collision probability down to an acceptable level.

I wouldn’t have expected to see one so fast and that worries me

The fact that a surprisingly small number of elements drawn from a relatively large set is enough to get a high probability of collisions is known as the birthday paradox.
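
(To put rough numbers on it, assuming the ~34,464-wide range mentioned earlier: the standard approximation 1 - exp(-n(n-1)/(2N)) gives about a 13% chance of at least one collision with 100 groups, about 44% with 200, and it crosses 50% at around 220 groups. These are back-of-the-envelope figures, not from the thread; a quick check:)

package main

import (
	"fmt"
	"math"
)

func main() {
	// Size of the generated-ID range discussed earlier in this thread.
	const N = 100000 - 65536
	for _, n := range []float64{50, 100, 200, 500} {
		// Birthday-problem approximation for the probability of at least
		// one collision among n uniformly hashed names.
		p := 1 - math.Exp(-n*(n-1)/(2*N))
		fmt.Printf("%4.0f groups -> ~%.0f%% chance of a collision\n", n, p*100)
	}
}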

3v1n0 (Collaborator) commented Sep 2, 2024

That's still prone to a race, because the UID could have been taken by the time the user successfully authenticated

Yeah, I feel that's a potential issue, but indeed we could potentially hold that UID until we're done, so that it's the one returned by NSS.

martinKacenga commented:

Experiencing the same problem. The logs:
Sep 09 15:41:44 ***** authd[1548]: 2024/09/09 15:41:44 ERROR GID for group "advascope users sg" already in use by group "všichni interní sg"
Sep 09 15:41:44 ***** authd[1548]: 2024/09/09 15:41:44 WARN can't check authentication: failed to update user "******@*****.**": GID for group "advascope users sg" already in use by a different group

jibel added the high (High importance issue) label Sep 17, 2024
adombeck (Contributor, Author) commented:

There is no check that the generated GID is not already in use

Are you sure? The log GID for group "group1" already in use by group "group2" is exactly this check, which can be done without any race (meaning, when we insert it into the database). This is the error that is reported, to avoid overwriting a group with another one.

After taking a closer look, I don't think this check is enough to avoid UID/GID clashes, because we only check if the UID/GID is already stored in our database, but we don't check if it's in use by any other NSS sources on the system, right?
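
(A hedged illustration of what a system-wide check could look like, using Go's os/user package, which resolves through NSS when built with cgo; this is not authd's actual code.)

package main

import (
	"fmt"
	"os/user"
	"strconv"
)

// gidInUse reports whether any NSS source (local files, sssd, authd
// itself, ...) already knows a group with this GID.
func gidInUse(gid uint32) bool {
	_, err := user.LookupGroupId(strconv.FormatUint(uint64(gid), 10))
	return err == nil
}

func main() {
	fmt.Println(gidInUse(0)) // the "root" group exists on virtually every system
}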

mazer-ai commented Oct 9, 2024

Am I understanding this conversation correctly to mean that different Linux cloud instances might end up with different assigned UIDs and GIDs for the same verified AAD user? If so, what happens when the home dir lives on an NFS-mounted file system, where the ACLs are based on the UID and GID? Couldn't the user end up not owning their own home directory?

adombeck (Contributor, Author) commented Oct 9, 2024

Am I understanding this conversation correctly to mean that different Linux cloud instances might end up with different assigned UIDs and GIDs for the same verified AAD user? If so, what happens when the home dir lives on an NFS-mounted file system, where the ACLs are based on the UID and GID? Couldn't the user end up not owning their own home directory?

Yes, that's correct. We're exploring ways to support shared network resources like NFS which rely on UIDs and GIDs for access, but for now that's a known limitation. We plan to add that to the limitations section of the authd wiki.

mazer-ai commented:

Thanks -- that's what I was afraid of... Any suggestions or advice on temporary workarounds? This is a killer for trying to use authd to facilitate centralized logins with a bunch of Linux instances on Azure. The only solution we've come up with so far is including a list of manually maintained adduser commands in the Packer setup when building the VM images. But this is not really sustainable and requires a rebuild each time a new user gets added...

adombeck added a commit that referenced this issue Oct 10, 2024
Removing users from the database allows other users for which the same
UID is generated (which can happen, see #509) to log in and gain access to
the deleted user’s home directory.

UDENG-4658
jhaar commented Oct 11, 2024

If authd randomly generates UIDs & GIDs, why not instead have some function that takes fully qualified AzureAD usernames and groups as inputs and generates UIDs/GIDs from their (numeric) hash? That way all authd instances will always generate the same UIDs/GIDs without needing a central database.

For security/privacy reasons, an improvement would be for each org to set an "org_seed" value which gets prepended to the username/groupname, thus generating the same UID/GID across all installs with the same "org_seed" while being undiscoverable by anyone else. Can't think of an attack vector - but that just means I don't have the imagination ;-)
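
(A rough sketch of what that seeded mapping could look like, purely to make the suggestion concrete; the function name and range handling are hypothetical, and collisions within the range are still possible.)

package ids

import (
	"crypto/sha256"
	"encoding/binary"
)

// seededID hashes an org-wide seed together with the fully qualified
// name, so every install sharing the seed derives the same ID for the
// same user or group, while the mapping stays unpredictable to anyone
// who doesn't know the seed.
func seededID(orgSeed, qualifiedName string, min, max uint32) uint32 {
	sum := sha256.Sum256([]byte(orgSeed + ":" + qualifiedName))
	return min + binary.BigEndian.Uint32(sum[:4])%(max-min)
}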

adombeck (Contributor, Author) commented:

Why not instead have some function that takes fully qualified AzureAD usernames and groups as inputs and generates UIDs/GIDs from their (numeric) hash? That way all authd instances will always generate the same UIDs/GIDs without needing a central database.

That's what authd is currently doing (see the implementation of the generateID function), but it leads to the problem this GitHub issue is about: collisions, because UIDs and GIDs can only be in the range 1 to 2**32, which is too small. Currently, when authd notices a UID/GID collision, it aborts the login attempt. That's clearly not ideal, so we plan to change it in one of the next releases. It's not clear yet how. One approach is to generate random IDs until we generate one that's unique; another approach we're exploring is to store the IDs in the Entra directory.

mazer-ai commented:

I think random IDs are going to be problematic, even if the computation is deterministic. Say Sally and Bob both hash to the same UID, then on the same machine, who gets assigned the next random unique UID will vary depending on the login order, and on two different machines, both could potentially get the same UID assigned since the machines don't communicate that a UID has already been taken...

I think you'd have to scan the entire list of users and generate a complete list of UIDs in order of account creation to be sure that this was completely deterministic and UIDs don't ever change once the account is created... I guess if you have access to that data, you could also just assign UIDs to everyone starting at some fixed number, based on the order in which users were added to the org...

Storing a UID in the Entra directory, if possible (I don't know enough about what's possible with AAD), seems like it would be safer. Then it's on the org to avoid collisions when adding new users. I guess you'd have to do the same for groups...

bill-taut commented:

Agree with @mazer-ai. If you resolve collisions the way you describe, @adombeck, then the same user (or group, re the topic of this ticket) on different machines could have different IDs, which will lead to permission inconsistencies on shared network-mounted drives. Storing IDs in Entra could work.

adombeck (Contributor, Author) commented:

Agree with @mazer-ai. If you resolve collisions the way you describe, @adombeck, then the same user (or group, re the topic of this ticket) on different machines could have different IDs, which will lead to permission inconsistencies on shared network-mounted drives.

We're aware of that. That's why I wrote above that we don't support shared network resources like NFS at the moment. It's now also mentioned in the limitations section of the wiki.

bill-taut commented:

Agree with @mazer-ai. If you resolve collisions the way you describe, @adombeck, then the same user (or group, re the topic of this ticket) on different machines could have different IDs, which will lead to permission inconsistencies on shared network-mounted drives.

We're aware of that. That's why I wrote above that we don't support shared network resources like NFS at the moment. It's now also mentioned in the limitations section of the wiki.

You had said "another approach we're exploring is to store the IDs in the Entra directory." That would support shared network resources. Is this still something you're considering? It's what I was hoping to encourage with my comment. It would be very helpful.

adombeck (Contributor, Author) commented:

Thank you for your input; hearing about actual user needs helps us prioritize! The outcome of the research into storing IDs in the Entra directory is that it would come with some drawbacks. We have not yet decided whether we want to support that, but we will do so within the next few weeks.

bill-taut commented:

That's great. Thanks for your response, @adombeck, and thank you for your work on authd.

arion-ch commented Nov 5, 2024

I'm running into this error after renaming a group in EntraID.

Nov 05 16:31:10 analysis authd[5033]: 2024/11/05 16:31:10 ERROR GID 1499969210 for group "YYYY" already in use by group "XXXX"

The group named XXXX was renamed to YYYY after initially setting up authd and logging in with a user that is a member of the group. Would this be solved somehow by explicitly rebuilding the cache of group names?

adombeck (Contributor, Author) commented Nov 6, 2024

@arion-ch That sounds like an interesting edge case which we probably don't support well currently. Do you mind creating a separate issue for that, including the exact steps to reproduce?

arion-ch commented Nov 6, 2024

@arion-ch That sounds like an interesting edge case which we probably don't support well currently. Do you mind creating a separate issue for that, including the exact steps to reproduce?

Sure, issue #620 created to capture this scenario.

adombeck (Contributor, Author) commented:

The outcome of the research into storing IDs in the Entra directory is that it would come with some drawbacks. We have not yet decided whether we want to support that, but we will do so within the next few weeks.

We decided to not store UIDs and GIDs centrally in the Entra directory. Instead, we will generate random UIDs and GIDs and document how to use Samba and NFS + Kerberos with ID mapping (see #653).

The main reasons for this decision are:

  • The Microsoft Graph API does not support any race-free way to use directory extensions. So if we stored the UID/GID in a directory extension, it would be prone to a race that could allow an attacker to get assigned the same UID as another user.
    • Another approach we explored is to require an admin to run a provisioning tool for each new user/group, which would generate and store the UID/GID. That would hurt UX though.
  • Storing the IDs centrally would be provider-specific, so:
    • It would require additional effort to implement for each provider that supports storing IDs in some way.
    • It would not be possible for every OIDC provider, so to fix this issue we would have to generate random IDs for those providers which don't support central IDs. We prefer that central functionality such as UID/GID generation does not behave differently depending on the provider.

adombeck linked a pull request Dec 2, 2024 that will close this issue
bill-taut commented:

Thank you for the update.

FYI, the use case we expected and hoped for was that we would create the UIDs and store them in Entra outside of authd. We would be responsible for worrying about and resolving any race conditions, etc. Authd would simply be configured to read those values.

This is disappointing, as supporting a network mounted drive is a requirement for us. We'll have to find an alternative to authd (or keep our fingers crossed for the day when you decide to support this, should we not find a viable alternative).

adombeck added a commit that referenced this issue Dec 11, 2024
To solve the issues with UID/GID generation (see #509 and UDENG-4874),
we decided to generate random unique UIDs/GIDs. That means that using
authd with network filesystems will result in permission issues unless
it's correctly configured to map IDs. Since there don't seem to be
any good existing resources on how to do that, we document it ourselves.

Closes #653
UDENG-5414
adombeck (Contributor, Author) commented:

This is disappointing, as supporting a network mounted drive is a requirement for us.

We have now documented how you can use authd with network filesystems (NFS or Samba). The starting point is: https://canonical-authd.readthedocs-hosted.com/en/latest/reference/troubleshooting/#file-ownership-on-shared-network-resources-e-g-nfs-samba
