rfc: distributed coordinator #1078
Conversation
bytes MeshCAKey = 5;
bytes MeshCACert = 6;
I think we need to explain the security of the HA part a bit more. If we allow the Mesh components of the Coordinator to be set directly during automatic recovery, we lose protection against the Kubernetes admin/workload owner, because they can redirect the recovery to themselves and "provision" a coordinator with the values above.
The simplest case to think about is one Coordinator needing to be recovered and the workload owner answering the recovery call from this coordinator. The next time the user verifies the deployment, they see a valid chain of manifests and therefore trust the new MeshCACert. This allows the workload owner to man-in-the-middle the TLS connection from the data owner to the application.
While we have excluded this threat model from our current recovery, I think HA and auto-recovery can be implemented while securing against this threat model as well, e.g. by only allowing recovery of coordinators that have the same hashes. Of course, this breaks the upgrade process, but I think we have to drop coordinator upgrades anyway in this threat model.
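A minimal sketch of the "same hashes only" restriction, assuming a hypothetical serving coordinator that knows its own policy hash and can extract the peer's policy hash from the attested connection (the names and types below are illustrative, not the actual Contrast API):

```go
// Hypothetical check on the serving coordinator before answering a recovery
// request: only hand out mesh CA key material if the recovering peer runs
// under exactly the same policy hash as we do. This intentionally rules out
// recovering into a different (e.g. upgraded) coordinator image.
package recovery

import (
	"bytes"
	"fmt"
)

// RecoveryRequest carries the peer's policy hash as extracted from the
// attestation report validated during the aTLS handshake (illustrative).
type RecoveryRequest struct {
	PeerPolicyHash []byte
}

func authorizeRecovery(ownPolicyHash []byte, req *RecoveryRequest) error {
	if !bytes.Equal(ownPolicyHash, req.PeerPolicyHash) {
		return fmt.Errorf("peer policy hash %x does not match own policy hash %x: refusing to share mesh CA key", req.PeerPolicyHash, ownPolicyHash)
	}
	return nil
}
```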
Good catch, thanks!
We should aim to support recovery from heterogeneous coordinators if they are explicitly allowed by the manifest. An upgrading workload owner could first set a manifest including new coordinator policies, then deploy new coordinators, then remove old coordinators, then remove old coordinator policies from the manifest. A data owner would need to verify not only the current manifest, but also the history of allowed coordinators.
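As a rough illustration of the data owner's side, here is a sketch that walks the manifest history and requires every policy that ever held the coordinator role to be explicitly trusted; the reduced `Manifest` type is a hypothetical placeholder, not the real Contrast structure:

```go
package verify

import "fmt"

// Manifest is a hypothetical, reduced view of a manifest: policy hash -> role.
type Manifest struct {
	Roles map[string]string // policy hash (hex) -> role, e.g. "coordinator"
}

// verifyCoordinatorHistory checks that every policy which ever held the
// coordinator role in any manifest of the history is one the data owner
// explicitly trusts. Otherwise an old manifest could have allowed an
// attacker-controlled coordinator to take custody of the mesh CA key.
func verifyCoordinatorHistory(history []Manifest, trusted map[string]bool) error {
	for i, m := range history {
		for hash, role := range m.Roles {
			if role == "coordinator" && !trusted[hash] {
				return fmt.Errorf("manifest %d allows untrusted coordinator policy %s", i, hash)
			}
		}
	}
	return nil
}
```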
I think what we need are the following invariants:
- (A) A coordinator with current manifest `M` only sends key material to pods that have the `coordinator` role in `M`.
- (B) A coordinator with current manifest `M` uses only key material that it generated locally or that was received from a pod with the `coordinator` role in `M`.
(A) should be covered sufficiently by the current proposal text, and we could modify it to achieve (B) as follows:
1. Load the existing latest transition from the store, keeping the signature around but not checking it yet.
2. Fetch the corresponding manifest, but don't set the state yet.
3. Create a validator from the temporary manifest's reference values.
4. Connect to the serving coordinator and validate its reference values.
5. Check that the serving coordinator's policy corresponds to a `coordinator` role in the temp manifest.
6. Receive the `RecoverResponse`.
7. Verify the signature from (1).
8. Initialize the state with the received seed, keys, certs and the temp manifest.
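A sketch of how a recovering coordinator could implement these steps, with the store, manifest, attestation and RPC types stubbed out as hypothetical interfaces purely to show the ordering of checks:

```go
package recovery

import (
	"context"
	"fmt"
)

// Illustrative stand-ins for the real store, manifest, attestation and RPC types.
type Transition struct {
	ManifestHash string
	Signature    []byte
}

type Manifest interface {
	ReferenceValues() any
	HasCoordinatorRole(policyHash string) bool
}

type RecoverResponse struct {
	Seed, MeshCAKey, MeshCACert []byte
}

type Store interface {
	LatestTransition() (*Transition, error)
	Manifest(hash string) (Manifest, error)
}

type Peer interface {
	PolicyHash() string
	Recover(ctx context.Context) (*RecoverResponse, error)
}

func recoverFromPeer(
	ctx context.Context,
	store Store,
	dial func(ctx context.Context, refValues any) (Peer, error),
	verifySig func(seed []byte, t *Transition) error,
	initState func(resp *RecoverResponse, m Manifest) error,
) error {
	// 1. Load the latest transition; keep the signature, don't check it yet.
	transition, err := store.LatestTransition()
	if err != nil {
		return fmt.Errorf("loading latest transition: %w", err)
	}
	// 2. Fetch the corresponding manifest, but don't set any state yet.
	manifest, err := store.Manifest(transition.ManifestHash)
	if err != nil {
		return fmt.Errorf("loading manifest: %w", err)
	}
	// 3 + 4. Build a validator from the temp manifest's reference values and
	// connect to the serving coordinator, validating it during the handshake.
	peer, err := dial(ctx, manifest.ReferenceValues())
	if err != nil {
		return fmt.Errorf("connecting to serving coordinator: %w", err)
	}
	// 5. The serving coordinator's policy must have the coordinator role.
	if !manifest.HasCoordinatorRole(peer.PolicyHash()) {
		return fmt.Errorf("peer %s has no coordinator role in the temp manifest", peer.PolicyHash())
	}
	// 6. Receive the RecoverResponse.
	resp, err := peer.Recover(ctx)
	if err != nil {
		return fmt.Errorf("receiving recovery response: %w", err)
	}
	// 7. Verify the signature from step 1 using the recovered seed.
	if err := verifySig(resp.Seed, transition); err != nil {
		return fmt.Errorf("verifying transition signature: %w", err)
	}
	// 8. Only now initialize the state with seed, keys, certs and manifest.
	return initState(resp, manifest)
}
```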
I think that is a nice summary and solution. Though I think we could omit the roles from the manifest if we needed to simplify it, since in the eyes of the data owner all code of all components inside the mesh/deployment must be trusted anyway. That is, the workload owner must never be able to impersonate any component of the deployment; otherwise all guarantees of shielding the data owner against the workload owner are broken.
This way, the verification of the endpoint that recovers the coordinator could be simplified: the recovering coordinator is first "initialized" by the running coordinator, like any other workload. Then the application logic notices that it was recovered and requests the locally kept keys of the running coordinator. The endpoint that is called here resides behind mTLS client auth.
The big trade-off here, I think, is that due to the asynchronous/split nature there is more potential for future changes to break the security; on the other hand, the two parts are conceptually quite simple.
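A rough sketch of the key handover half of this split flow, under the assumption that the endpoint simply requires a client certificate issued by the mesh CA and that some check marks the peer certificate as belonging to a coordinator (handler names and the certificate check below are hypothetical):

```go
package handover

import (
	"crypto/tls"
	"crypto/x509"
	"encoding/json"
	"net/http"
)

// keyHandoverHandler serves the locally kept mesh CA key material, but only
// on connections that presented a client certificate (enforced by the TLS
// config below) which identifies the peer as a coordinator.
func keyHandoverHandler(meshCAKey, meshCACert []byte, isCoordinatorCert func(*x509.Certificate) bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.TLS == nil || len(r.TLS.PeerCertificates) == 0 {
			http.Error(w, "client certificate required", http.StatusUnauthorized)
			return
		}
		if !isCoordinatorCert(r.TLS.PeerCertificates[0]) {
			http.Error(w, "not a coordinator", http.StatusForbidden)
			return
		}
		_ = json.NewEncoder(w).Encode(map[string][]byte{
			"meshCAKey":  meshCAKey,
			"meshCACert": meshCACert,
		})
	})
}

// newServer wires the handler into an HTTPS server that demands and verifies
// client certificates against the mesh CA pool.
func newServer(addr string, meshCAPool *x509.CertPool, h http.Handler) *http.Server {
	return &http.Server{
		Addr:    addr,
		Handler: h,
		TLSConfig: &tls.Config{
			ClientAuth: tls.RequireAndVerifyClientCert,
			ClientCAs:  meshCAPool,
		},
	}
}
```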
## Background

The Contrast Coordinator is a stateful service with a backend storage that can't be shared.
Is the coordinator state non-sensitive?
It does not need to be CC-secure, if that's what you mean.
Lines 38 to 40 in d4892d3

The list of state transitions needs to be checked for integrity.
Otherwise, an attacker that can manipulate the transition objects can set arbitrary manifests.
Therefore, we sign each state transition with a key derived from the secret seed.
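For illustration, a minimal sketch of deriving a dedicated signing key from the secret seed with HKDF and protecting a serialized transition with an HMAC; the actual derivation scheme and signature format in Contrast may differ:

```go
package transitions

import (
	"crypto/hmac"
	"crypto/sha256"
	"io"

	"golang.org/x/crypto/hkdf"
)

// deriveTransitionSigningKey derives a dedicated MAC key from the secret seed
// so the key used for transition integrity is domain-separated from other
// seed-derived secrets. (Illustrative; the real derivation may differ.)
func deriveTransitionSigningKey(seed []byte) ([]byte, error) {
	key := make([]byte, 32)
	kdf := hkdf.New(sha256.New, seed, nil, []byte("contrast/transition-signing"))
	if _, err := io.ReadFull(kdf, key); err != nil {
		return nil, err
	}
	return key, nil
}

// signTransition MACs the serialized transition object.
func signTransition(key, transitionBytes []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(transitionBytes)
	return mac.Sum(nil)
}

// verifyTransition checks the MAC in constant time; a failure means the
// transition list was tampered with and must not be trusted.
func verifyTransition(key, transitionBytes, signature []byte) bool {
	return hmac.Equal(signature, signTransition(key, transitionBytes))
}
```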