Drift Detection

Understanding and resolving cluster drift — when Kubernetes state doesn't match PodWarden's database

What is Cluster Drift?

Cluster drift occurs when the actual state of your Kubernetes cluster diverges from what PodWarden has on record. This can mean resources created outside PodWarden (via kubectl, Helm, or another tool), stale database records pointing to resources that no longer exist, or infrastructure issues like expiring TLS certificates and broken DNS. PodWarden's drift detection checks for these conditions automatically and surfaces them as system messages.
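Conceptually, a drift check boils down to comparing two name lists: what the database says should exist and what the cluster actually reports. A minimal offline sketch of that comparison (the resource names are made up; this is an illustration, not PodWarden's actual implementation):

```shell
# Two sorted name lists: ingress rules in PodWarden's database vs.
# IngressRoutes actually present in the cluster (sample data).
printf '%s\n' blog grafana shop | sort > db.txt
printf '%s\n' blog legacy-api shop | sort > cluster.txt

# Present in the cluster but not in the database -> "unmanaged" drift
comm -13 db.txt cluster.txt    # legacy-api

# Present in the database but not in the cluster -> "ghost" drift
comm -23 db.txt cluster.txt    # grafana
```

Anything in both lists is in sync; the two leftover sets map directly to the "Unmanaged" and "Ghost" scenarios described below.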

Common Drift Scenarios

"Unmanaged IngressRoute" or "Unmanaged Ingress"

Cause: Someone created a Kubernetes Ingress or Traefik IngressRoute directly (via kubectl, Helm, or another tool) without going through PodWarden.

Resolution options:

  1. Import it — Create a matching ingress rule in PodWarden's Network page to track it
  2. Remove it — Delete the K8s resource if it's no longer needed: kubectl delete ingressroute <name> -n <namespace>
  3. Ignore it — If it's intentionally managed outside PodWarden (e.g., a monitoring stack), the warning can be acknowledged
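One way to list candidates before deciding is a label filter: if PodWarden labels the resources it manages (the label name below is an assumption — check what your install actually applies), anything without that label was created outside it. A sketch, with an offline illustration on sample name/label pairs:

```shell
# Live-cluster version (label name is an assumption; adjust to your setup):
#   kubectl get ingressroutes -A -l '!app.kubernetes.io/managed-by'

# Offline illustration: print names whose managed-by value is not "podwarden".
cat <<'EOF' | awk '$2 != "podwarden" { print $1 }'
blog podwarden
legacy-api none
shop podwarden
EOF
```

The `-l '!<label>'` form is kubectl's exclusion selector: it matches resources that lack the label entirely.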

"Ghost Ingress Rule"

Cause: An ingress rule exists in PodWarden's database marked as active, but the corresponding K8s resource was deleted directly from the cluster.

Resolution: Go to the Network page, find the rule, and either re-deploy it or delete it from PodWarden.

"CrashLoopBackOff"

Cause: A container keeps crashing on startup. Common reasons: missing environment variables, failed database connections, out-of-memory kills, application bugs.

Resolution: Check the pod logs: kubectl logs <pod-name> -n <namespace> --previous. Fix the underlying issue and redeploy.
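Note that CrashLoopBackOff itself is just Kubernetes backing off between restart attempts: the wait starts at 10 seconds, doubles on each crash, and is capped at five minutes, which is why a broken pod seems to restart less and less often. A sketch of the delay schedule:

```shell
# Restart delay schedule for a crash-looping container:
# starts at 10s, doubles each restart, capped at 300s (5 minutes).
delay=10
for restart in 1 2 3 4 5 6; do
  echo "restart $restart: wait ${delay}s"
  delay=$(( delay * 2 ))
  if [ "$delay" -gt 300 ]; then delay=300; fi
done
```

So by the sixth crash the pod is already at the 5-minute cap — don't mistake the long pause for the pod being fixed.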

"TLS Certificate Expiring"

Cause: A TLS certificate is approaching expiry. If using Let's Encrypt via cert-manager, this usually means cert-manager failed to renew.

Resolution:

  1. Check cert-manager logs: kubectl logs -n cert-manager deployment/cert-manager
  2. Check the Certificate resource: kubectl describe certificate <name> -n <namespace>
  3. If the certificate uses DNS-based (DNS-01) validation, ensure the required DNS records are correct
  4. Manual renewal: kubectl delete certificate <name> -n <namespace> to trigger re-issuance
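To see how close a certificate actually is to expiry, you can read its notAfter timestamp — for cert-manager, kubectl get certificate <name> -n <namespace> -o jsonpath='{.status.notAfter}' — and do the date math locally. A sketch with a hard-coded example timestamp standing in for the real value:

```shell
# Example timestamp; in practice read it from the Certificate's status
# (kubectl get certificate ... -o jsonpath='{.status.notAfter}').
not_after="2025-09-30T12:00:00Z"
exp=$(date -u -d "$not_after" +%s)   # GNU date syntax; macOS date differs
now=$(date -u +%s)
echo "days until expiry: $(( (exp - now) / 86400 ))"
```

If the result is under cert-manager's renewal window, renewal should already have been attempted — which points you back to the cert-manager logs above.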

"DNS Mismatch"

Cause: The domain resolves to a different IP than expected. This can happen after changing cloud providers, updating DNS records, or if DDNS stopped updating.

Resolution:

  1. Verify the expected gateway IP in PodWarden's Hosts page
  2. Check your DNS provider — update the A record to point to the correct IP
  3. If using PodWarden DDNS, check Settings → DDNS for errors
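The check itself is a simple comparison: resolve the domain and diff the answer against the gateway IP PodWarden expects. A sketch using placeholder documentation-range IPs; in practice the resolved value would come from dig +short <domain>:

```shell
expected="203.0.113.10"    # gateway IP from PodWarden's Hosts page
resolved="198.51.100.7"    # in practice: $(dig +short app.example.com | head -n1)

if [ "$resolved" = "$expected" ]; then
  echo "DNS OK"
else
  echo "DNS mismatch: expected $expected, got $resolved"
fi
```

Remember that DNS caching can make a freshly corrected record look wrong for up to the record's TTL.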

"Gateway Traefik Unhealthy"

Cause: Traefik (the reverse proxy) on a gateway host has crashed or is failing. All inbound HTTPS traffic through this gateway is affected.

Resolution:

  1. Check Traefik pod status: kubectl get pods -n kube-system | grep traefik
  2. Check logs: kubectl logs -n kube-system deployment/traefik
  3. Common fix: kubectl rollout restart deployment/traefik -n kube-system
  4. If Traefik was patched with ndots:2, verify the patch wasn't reverted

Configuring Drift Detection

Drift detection settings are available under Settings → Drift Detection.

| Setting | Default | Description |
| --- | --- | --- |
| Clean cycles before auto-resolve | 2 | Number of consecutive healthy check cycles before a drift alert is automatically cleared. Increase this to reduce noise from transient conditions. |
| Enabled check categories | ingress, pods, infrastructure | Which categories of drift checks run. Disable a category if it produces false positives in your environment. |

Changes take effect on the next drift detection cycle.
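The clean-cycle counter behaves like the following sketch (a behavioral illustration, not PodWarden's actual implementation; the threshold of 2 matches the default): consecutive healthy cycles increment a counter, any unhealthy cycle resets it, and the alert clears once the counter reaches the threshold.

```shell
# Auto-resolve sketch: clear an alert after N consecutive healthy cycles.
threshold=2
clean=0
status="open"
for check in healthy unhealthy healthy healthy; do
  if [ "$check" = "healthy" ]; then
    clean=$(( clean + 1 ))
  else
    clean=0      # any unhealthy cycle resets the streak
  fi
  if [ "$clean" -ge "$threshold" ]; then
    status="auto-resolved"
  fi
done
echo "$status"   # auto-resolved
```

A flapping condition (healthy, unhealthy, healthy, ...) therefore never auto-resolves, which is exactly why raising the threshold reduces noise from transient conditions.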

See Also