Issue
The admission-webhook controller has a known issue: it does not properly handle certificate rotation by cert-manager. As a result, pipelines can fail in several ways and Jupyter notebooks can fail to start. The most common failures log 'TLS handshake error' or 'x509: certificate has expired or is not yet valid'.
Workaround
The issue occurs after cert-manager generates a new certificate and pushes it to admission-webhook. The admission-webhook pod(s) continue reading the old certificate from memory instead of reading the new certificate from disk. A permanent fix is in progress. Until then, the workaround is to restart the admission-webhook pod(s): list them, then delete each one by name, substituting its name for <PODNAME>:
kubectl get pods -n kubeflow | grep admission
kubectl delete pod <PODNAME> -n kubeflow
This deletes the existing admission-webhook pod(s), which triggers new pods to be deployed. The new pods load the new certificate issued by cert-manager, which resolves the issue.
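When several admission-webhook pods are running, the two commands above can be combined so that each pod name does not have to be copied by hand. The following is a minimal sketch, not an official tool: the function name restart_matching_pods is hypothetical, and it assumes, as the commands above do, that the affected pods live in the kubeflow namespace and have "admission" in their names.

```shell
# Hypothetical helper: delete every pod in a namespace whose name contains
# a given pattern, so that its controller recreates it.
# Defaults (kubeflow / admission) mirror the manual commands in this article.
restart_matching_pods() {
  local namespace="${1:-kubeflow}"
  local pattern="${2:-admission}"
  local pods

  # "-o name" prints one "pod/<name>" entry per line; grep keeps the matches.
  pods=$(kubectl get pods -n "$namespace" -o name | grep -- "$pattern" || true)

  if [ -z "$pods" ]; then
    echo "no pods matching '$pattern' in namespace '$namespace'" >&2
    return 1
  fi

  # Intentionally unquoted: each matching pod becomes its own argument.
  kubectl delete -n "$namespace" $pods
}
```

After defining the function in a shell session, run restart_matching_pods kubeflow admission, then repeat kubectl get pods -n kubeflow | grep admission to confirm that the replacement pods reach the Running state. Because the match is by name substring, check the list first if other pods in the namespace might also contain the pattern.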