-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rechecking pending Pods #195
base: master
Are you sure you want to change the base?
Rechecking pending Pods #195
Conversation
d8e231e
to
1edaca7
Compare
@dougbtv and @maiqueb Can you take a look. I did verify that #162 did not re-break:
As you can see, Pod web-1 was missing IP annotation and in Pending phase. I refreshed the Pod, this time it had the IP annotation the first time around, and found a match. So no bad cleanup happened. |
Ok, I verified that it also fixes #182 Shut down a node with a STS and network with full IPs. Force deleted the Pod, so it gets rescheduled. It remains in ContainerCreating state with "Could not allocate IP in range: ip: 10.128.165.32 / - 10.128.165.33" error Then the reconciler runs. It cleans up the stale IP (belonging to the Pod I force deleted from the dead node), despite the new replica with same podRef being in ContainerCreating state aka Pending phase. Now the new replica can create and moves to Running seen in ip-reconciler logs:
verified the IP Pools:
vs new one:
So this fixes both new issue #182, without re-breaking the fix for #162 (skip pods in Pending phase) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this patch; while it seems I'm making plenty of comments, I'm very happy with the quality of the code.
I wonder about the impacts of this retry on the code - especially since the retries are nearly immediate (no backoff is used).
I honestly don't expect this scenario to happen (famous last words ...), but if the cronjob is triggered and there are plenty of pending pods (and staying in pending state for long), we will overload the API.
pkg/reconciler/iploop.go
Outdated
livePodIPs) | ||
_, isFound := livePodIPs[ip] | ||
return isFound || livePod.phase == v1.PodPending | ||
isFound := checkIpOnPod(&livePod, podRef, ip) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, I'd rename the method - not 100% sure what to.
It might be personal, but I'm very used to prefixing methods that return booleans with is
. What do you thing about isIPonPod(...)
?
return isFound || livePod.phase == v1.PodPending | ||
isFound := checkIpOnPod(&livePod, podRef, ip) | ||
|
||
if !isFound && (livePod.phase == v1.PodPending) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could make sense to export this into an helper func.
pkg/reconciler/iploop.go
Outdated
* so re-fetch the Pod 5x | ||
*/ | ||
var podToMatch *podWrapper = &livePod | ||
var retries int = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't 0 the default ?
nit: I personally would prefer if the variables were initialized using :=
. Just more used to it.
pkg/reconciler/iploop.go
Outdated
for retries < storage.PodRefreshRetries { | ||
retries += 1 | ||
podToMatch := rl.refreshPod(livePodRef) | ||
if podToMatch == nil { | ||
logging.Debugf("Cleaning up...") | ||
return false | ||
} else if podToMatch.phase != v1.PodPending { | ||
logging.Debugf("Pending Pod is now in phase: %s", podToMatch.phase) | ||
break | ||
} else { | ||
isFound = checkIpOnPod(podToMatch, podRef, ip) | ||
// Short-circuit - Pending Pod may have IP now | ||
if isFound { | ||
logging.Debugf("Pod now has IP annotation while in Pending") | ||
return true | ||
} | ||
} | ||
} | ||
isFound = checkIpOnPod(podToMatch, podRef, ip) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to use exponential backoff here ?
Also, I think this could be re-written with k8s primitives:
import "k8s.io/apimachinery/pkg/util/wait"
timeout := time.Second
pollInterval := 200*time.Millisecond // 5 * 0.2 = 1
// PollImmediate tries a condition func until it returns true, an error, or the timeout
// is reached.
// ...
waitErr := wait.PollImmediate(pollInterval, timeout, func() (bool, error) {
pod := rl.refreshPod(livePodRef)
if podToMatch == nil {
logging.Debugf("Cleaning up...")
return false, fmt.Errorf("pod is gone")
} else if isPodPending(pod) {
logging.Debugf("Pending Pod is now in phase: %s", podToMatch.phase)
return true, nil
} else {
return isIPonPod(pod, podRef, ip), nil // I'm not entirely sure I did not oversimplify this ...
}
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am hesitant to use a backoff because remember that for #182 when the IP Pool is full, we WANT it to reconcile. In fact, we dont need any retries in that case... because the container will never come up otherwise. It is in Pending state, does not have an IP annotation either, and will match the podRef.
So adding more retries or backoff would just add load or delay the reconciler (and thus blocked Pod) longer.
Any suggestions on how to distinguish between these two scenarios? Or should we just retry once
pkg/reconciler/iploop.go
Outdated
var namespace string | ||
var podName string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't see a reason for declaring the variables here, you can just :=
below.
Or even - once you know len(namespacedName) == 2
- return directly: return namespacedName[0], namespacedName[1]
.
not sure what I wrote is easy to understand, I mean something like:
func splitPodRef(podRef string) (string, string) {
namespacedName := strings.Split(podRef, "/")
if len(namespacedName) != 2 {
logging.Errorf("Failed to split podRef %s", podRef)
return "", ""
}
return namespacedName[0], namespacedName[1]
}
pkg/storage/storage.go
Outdated
@@ -13,7 +13,8 @@ var ( | |||
RequestTimeout = 10 * time.Second | |||
|
|||
// DatastoreRetries defines how many retries are attempted when updating the Pool | |||
DatastoreRetries = 100 | |||
DatastoreRetries = 100 | |||
PodRefreshRetries = 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might not be needed if we end up using wait.PollImmediate
or wait.ExponentialBackoff
.
pod, err := i.clientSet.CoreV1().Pods(namespace).Get(context.TODO(), name, metav1.GetOptions{}) | ||
if err != nil { | ||
return nil, err | ||
} | ||
|
||
return pod, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, maybe I'm missing something, but why don't you just return i.clientSet.CoreV1().Pods(namespace).Get(context.TODO(), name, metav1.GetOptions{})
?
@maiqueb I'm just going to remove all retries altogether. No backoff. That means more uncessary API calls and longer time to reconcile for #182 . In that case the Pod is Pending, has no IP annotation, and matches the podRef... but it will never come up until the old IP is reconciled since IP pool is full. The retry is only for #162 which is a rarer bug to hit (ip-reconciler has to run at the same time as Pods coming up, and Pending Pod has no IP annotation YET - as it is still in process of coming up). So just refetching the individual Pod once should be good enough, as it will make an uncommon scenario even more rare. I tried a few times and when it is hit, the Pod always has the IP on the first GET call. |
1edaca7
to
72eb8ea
Compare
72eb8ea
to
69b4616
Compare
|
||
for retries < storage.PodRefreshRetries { | ||
retries += 1 | ||
podToMatch = rl.refreshPod(livePodRef) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to instead of retrying now just put the pods in pending state to a list, and after the main loop (that handles all pods that are not pending) iterate all the ones that were in pending state ?
I am unsure if I am overthinking this.
Hey @xagent003 any chance we could see some tests that show this in action? With just one removed test, I'm not quite sure on it. Also, maybe some improvements to the reconciliation controller in the meanwhile. |
This fixes #182
while hopefully not re-breaking #162
We shouldn't treat all pending Pods as "alive" and skip the check. The list of all Pods fetch'd earlier may be stale, and as observed in some scenarios, several seconds before the ip-reconciler does the isPodAlive check.
Instead, can we retry a Get on an individual Pod, with the hopes that it has final IP/network annoations? So refetch the Pod, up to 5 times, if it is Pending state and initial IP check fails. After that, just do the IP matching check like before