PodWarden User Manual

System Messages

Health check alerts and infrastructure consistency notifications

What you see

URL: /system/messages

The system messages page lists issues found by PodWarden's periodic infrastructure audit. Every 15 minutes, PodWarden automatically checks for inconsistencies between its database and your live infrastructure.

A bell icon in the top navigation bar shows the number of unread messages. The badge color reflects the highest severity:

Color | Meaning
--- | ---
Red (pulsing) | Critical, e.g. cluster unreachable
Red (static) | Error, e.g. node name mismatch preventing canvas visualization
Amber | Warning, e.g. host offline or orphaned
Blue | Informational, e.g. stale assignment
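The badge rule (show the color of the most severe unread message) can be sketched as follows. This is a minimal illustration, not PodWarden's code: the severity names come from the table above, but the `badge_color` function, the numeric ranking, and the color strings are hypothetical.

```python
from typing import Iterable, Optional

# Hypothetical ranking: a higher number means more severe.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}

# Badge color for each severity, following the table above.
BADGE_COLOR = {
    "critical": "red (pulsing)",
    "error": "red (static)",
    "warning": "amber",
    "info": "blue",
}


def badge_color(unread_severities: Iterable[str]) -> Optional[str]:
    """Return the badge color for the most severe unread message, or None if nothing is unread."""
    severities = list(unread_severities)
    if not severities:
        return None
    worst = max(severities, key=lambda s: SEVERITY_RANK[s])
    return BADGE_COLOR[worst]


print(badge_color(["info", "warning", "error"]))  # red (static)
```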

Click the bell to open the system messages page.

Health checks

PodWarden runs these checks automatically:

Node naming

Check | Severity | What it means
--- | --- | ---
Node name mismatch | Error | The K8s node name stored in PodWarden doesn't match the actual node name in your cluster. This prevents deployment-to-node connections from appearing on the canvas.
Node name missing | Warning | A host is assigned to a cluster but has no K8s node name recorded.
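The two node-naming checks can be sketched as a per-host comparison. This is an illustration under stated assumptions, not PodWarden's implementation: the `Host` fields and the `node_naming_issue` function are hypothetical, and the sketch assumes a host is paired with its live node by IP address, which the manual does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    # Hypothetical host record; field names are illustrative only.
    name: str
    ip: str
    cluster: Optional[str]
    k8s_node_name: Optional[str]


def node_naming_issue(host: Host, live_nodes_by_ip: dict[str, str]) -> Optional[str]:
    """Return the node-naming check that fires for this host, or None.

    live_nodes_by_ip maps each live node's IP address to its actual node name.
    Pairing hosts with nodes by IP is an assumption made for this sketch.
    """
    if host.cluster is None:
        return None  # hosts not assigned to a cluster are not checked
    if not host.k8s_node_name:
        return "Node name missing"  # Warning
    actual = live_nodes_by_ip.get(host.ip)
    if actual is not None and actual != host.k8s_node_name:
        return "Node name mismatch"  # Error
    return None
```

A host whose IP has no live node at all is not flagged here; that situation belongs to the cluster membership checks.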

Cluster membership

Check | Severity | What it means
--- | --- | ---
Cluster unreachable | Critical | PodWarden cannot connect to the Kubernetes API for this cluster.
Orphaned host | Warning | A host claims to be in a cluster, but the cluster doesn't have a matching node.
Unknown K8s node | Error | A node exists in your K8s cluster that PodWarden doesn't know about.
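Orphaned hosts and unknown nodes are the two directions of a set difference between what PodWarden expects and what the cluster reports. A minimal sketch, assuming hosts and nodes are matched by node name; the `membership_issues` function and its tuple output are hypothetical.

```python
from typing import Optional


def membership_issues(
    claimed_nodes: set[str],
    live_nodes: Optional[set[str]],
) -> list[tuple[str, str, str]]:
    """Compare expected node names for one cluster with those the API reports.

    claimed_nodes: node names of hosts PodWarden has assigned to the cluster.
    live_nodes: node names returned by the Kubernetes API, or None if the call failed.
    Returns (check, severity, node) tuples.
    """
    if live_nodes is None:
        return [("Cluster unreachable", "critical", "")]
    issues = []
    for node in sorted(claimed_nodes - live_nodes):
        issues.append(("Orphaned host", "warning", node))
    for node in sorted(live_nodes - claimed_nodes):
        issues.append(("Unknown K8s node", "error", node))
    return issues
```

Because this sketch matches by name alone, a node whose stored name is wrong would show up as both an orphaned host and an unknown node; it ignores that overlap for brevity.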

Deployments

Check | Severity | What it means
--- | --- | ---
Stale assignment | Info | A deployment references a cluster that no longer exists.
Unplaceable deployment | Warning | A deployment's placement targets a node that no host matches.

Hosts

Check | Severity | What it means
--- | --- | ---
Host unreachable | Warning | A host hasn't reported stats in over 30 minutes.
Tailscale hostname drift | Info | A Tailscale-discovered host is missing its hostname.
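The "Host unreachable" rule is a staleness test against the 30-minute window in the table. A minimal sketch; the `host_unreachable` function is hypothetical, and treating a host that has never reported stats as unreachable is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(minutes=30)  # threshold from the table above


def host_unreachable(last_stats_at: Optional[datetime], now: datetime) -> bool:
    """True if the host last reported more than 30 minutes ago.

    Assumption: a host with no recorded stats at all also counts as unreachable.
    """
    if last_stats_at is None:
        return True
    return now - last_stats_at > STALE_AFTER


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(host_unreachable(now - timedelta(minutes=31), now))  # True
```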

Ingress Drift

Check | Severity | What it means
--- | --- | ---
Unmanaged IngressRoute | Warning | A Traefik IngressRoute exists in your cluster but isn't tracked by PodWarden. Someone created it directly via kubectl or Helm.
Unmanaged Ingress | Warning | A standard Kubernetes Ingress resource exists but isn't managed by PodWarden.
Unmanaged exposed service | Warning | A NodePort or LoadBalancer service exposes traffic outside the cluster without going through PodWarden's ingress management.
Ghost ingress rule | Error | PodWarden has an ingress rule marked as active, but the corresponding Kubernetes resource doesn't exist. The rule may have been deleted directly from the cluster.
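Ingress drift works in both directions: resources in the cluster that PodWarden doesn't track, and tracked rules with no resource behind them. A minimal sketch, assuming each resource is identified by a hypothetical `(kind, namespace, name)` key and that the caller passes only NodePort and LoadBalancer services in `live`; the `ingress_drift` function is illustrative.

```python
def ingress_drift(
    tracked_active: set[tuple[str, str, str]],
    live: set[tuple[str, str, str]],
) -> list[tuple[str, str, tuple[str, str, str]]]:
    """Diff PodWarden's active ingress rules against resources found in the cluster.

    Keys are hypothetical (kind, namespace, name) tuples, e.g. ("IngressRoute", "default", "web").
    Assumes `live` already excludes ClusterIP services.
    """
    unmanaged_check = {
        "IngressRoute": ("Unmanaged IngressRoute", "warning"),
        "Ingress": ("Unmanaged Ingress", "warning"),
        "Service": ("Unmanaged exposed service", "warning"),
    }
    issues = []
    # Present in the cluster, unknown to PodWarden.
    for key in sorted(live - tracked_active):
        check, severity = unmanaged_check[key[0]]
        issues.append((check, severity, key))
    # Marked active in PodWarden, missing from the cluster.
    for key in sorted(tracked_active - live):
        issues.append(("Ghost ingress rule", "error", key))
    return issues
```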

Pod Health

Check | Severity | What it means
--- | --- | ---
CrashLoopBackOff | Error (managed) / Warning (unmanaged) | A pod keeps crashing and restarting.
Evicted | Error (managed) / Warning (unmanaged) | A pod was evicted, usually due to resource pressure (memory, disk).
ImagePullBackOff | Error (managed) / Warning (unmanaged) | A pod can't pull its container image. Check that the image exists and registry credentials are correct.

Each pod health check is raised as an Error if the pod belongs to a PodWarden-managed deployment, and as a Warning otherwise.
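The pod health classification can be sketched against the standard Kubernetes Pod status fields: evicted pods carry `status.reason == "Evicted"`, while CrashLoopBackOff and ImagePullBackOff appear as `status.containerStatuses[].state.waiting.reason`. How PodWarden actually reads these, and how it decides whether a pod is managed, is not documented here; the `pod_health_issue` function is hypothetical.

```python
from typing import Optional

# Container "waiting" reasons that correspond to checks in the table above.
WAITING_REASONS = {"CrashLoopBackOff", "ImagePullBackOff"}


def pod_health_issue(pod: dict, managed: bool) -> Optional[tuple[str, str]]:
    """Classify a pod, given as a dict shaped like Kubernetes Pod JSON.

    `managed` says whether the pod belongs to a PodWarden deployment; how that is
    determined (for example via labels) is outside this sketch.
    """
    severity = "error" if managed else "warning"
    status = pod.get("status", {})
    # Evicted pods end in phase Failed with status.reason set to "Evicted".
    if status.get("reason") == "Evicted":
        return ("Evicted", severity)
    for cs in status.get("containerStatuses", []):
        reason = cs.get("state", {}).get("waiting", {}).get("reason")
        if reason in WAITING_REASONS:
            return (reason, severity)
    return None
```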

Infrastructure

Check | Severity | What it means
--- | --- | ---
TLS cert expiring (K8s) | Warning (14 days left) / Error (7 days) / Critical (3 days) | A TLS certificate stored as a Kubernetes secret is approaching expiry.
TLS cert expiring (live) | Warning (14 days left) / Error (7 days) | The live TLS certificate served by a domain is approaching expiry.
DNS mismatch | Error | A domain's DNS resolves to a different IP than the gateway host's address. Traffic may not reach your cluster.
Gateway Traefik unhealthy | Critical | Traefik pods on a gateway host are in a failed state. Inbound traffic to all services on this gateway is likely down.
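The TLS expiry thresholds map the time remaining before a certificate's expiry date to a severity. A minimal sketch; the `tls_expiry_severity` function is hypothetical, and whether PodWarden treats each boundary as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def tls_expiry_severity(not_after: datetime, now: datetime, live: bool) -> Optional[str]:
    """Map time left before a certificate's expiry to a severity.

    live=False models the K8s-secret check (14 / 7 / 3 days).
    live=True models the live-endpoint check, which stops at Error (14 / 7 days).
    Assumption: each boundary is inclusive.
    """
    remaining = not_after - now
    if not live and remaining <= timedelta(days=3):
        return "critical"
    if remaining <= timedelta(days=7):
        return "error"
    if remaining <= timedelta(days=14):
        return "warning"
    return None
```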

Using the page

Filtering

  • Severity chips at the top toggle which severity levels are shown
  • Category dropdown filters by check category (node naming, cluster, deployments, hosts, ingress drift, pod health, infrastructure, system apps)
  • Status toggle switches between active issues, resolved issues, or all
  • Click a message row to expand and see full details and diagnostic data

Object links

When you expand a message, clickable links appear that take you directly to the affected object's detail page. For example, a "Host unreachable" message links to that host's detail page, and a "DNS mismatch" message links to the Network page. Links are shown for hosts, clusters, deployments, and network resources when the relevant object exists.

Actions

  • Mark as read — dismisses the notification badge for you (other users still see it as unread)
  • Mark all read — marks all active messages as read
  • Run check (admin only) — triggers an immediate health check instead of waiting for the next cycle
  • Delete (admin only) — permanently removes a message

Suppressing messages

Some alerts are expected and intentional — for example, an "Unmanaged IngressRoute" for a route you created directly via kubectl, or an "Unknown K8s node" for a node you provisioned outside of PodWarden. These are not bugs; they are known deviations you have chosen to accept. Suppressing them keeps your message list focused on issues that actually need attention.

To suppress a message, click the eye-slash icon (Suppress button) on any message row. The message is immediately hidden from the main list and will not trigger email notifications.

To view suppressed messages, enable the "Show suppressed" toggle in the page header. Suppressed messages reappear dimmed with a "Suppressed" badge so they are visually distinct from active issues.

To unsuppress a message, enable the "Show suppressed" toggle, then click the eye icon (Unsuppress button) on the message you want to restore. The message returns to the normal active list.

Suppression is permanent until you explicitly unsuppress — health check cycles do not clear or reset suppression state. Suppressed messages are also excluded from the notification digest, so re-enabling email notifications will not retroactively send alerts for them.
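Suppression acts as a filter on two outputs: the message list (controlled by the "Show suppressed" toggle) and the email digest (never included). A minimal sketch; the `Message` record and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical message record.
    title: str
    severity: str
    suppressed: bool = False


def visible_messages(messages: list[Message], show_suppressed: bool) -> list[Message]:
    """Messages shown in the list; suppressed ones appear only when the toggle is on."""
    return [m for m in messages if show_suppressed or not m.suppressed]


def email_candidates(messages: list[Message]) -> list[Message]:
    """Suppressed messages are always left out of the email digest."""
    return [m for m in messages if not m.suppressed]
```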

Auto-resolution

When a previously detected issue is no longer found during a health check (e.g., you fixed a node name mismatch), PodWarden doesn't resolve it immediately. Instead, it waits for multiple consecutive clean check cycles (default: 2, configurable in Settings → Drift Detection) before automatically marking the message as resolved. This prevents noisy alerts from pods that briefly recover before crashing again. Resolved messages are cleaned up after 7 days.
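The clean-cycle rule amounts to a per-issue counter that any detection resets. A minimal sketch for an issue that is still active; the `Issue` record and `update_issue` function are hypothetical, and `clean_cycles_required` stands in for the "Clean cycles before auto-resolve" setting.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    # Hypothetical per-issue state kept between check cycles.
    key: str
    resolved: bool = False
    clean_streak: int = 0


def update_issue(issue: Issue, detected: bool, clean_cycles_required: int = 2) -> None:
    """Apply one health-check cycle's result to an active issue.

    A detection resets the clean streak, so a pod that recovers for one
    cycle and then crashes again never reaches the threshold.
    """
    if detected:
        issue.clean_streak = 0
        return
    issue.clean_streak += 1
    if issue.clean_streak >= clean_cycles_required:
        issue.resolved = True
```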

Drift Detection Dashboard

URL: /apps/drift-detection

PodWarden provides a dedicated dashboard for drift detection issues, accessible from the navigation menu. The dashboard shows:

  • Summary cards — count of active issues by severity (critical, error, warning, info)
  • Issues table — all active drift issues with severity, category, title, and time since detection
  • Action buttons:
    • Run Check Now — trigger an immediate health check without waiting for the next cycle
    • Clear Resolved — delete all resolved drift messages

Each issue in the table links to the relevant PodWarden page where you can investigate and fix it.

Configuration

Drift detection settings are in Settings → Drift Detection:

Setting | Default | Description
--- | --- | ---
Clean cycles before auto-resolve | 2 | How many consecutive clean check cycles before an issue is automatically resolved. Higher values reduce false resolutions but delay cleanup.
Enabled categories | All enabled | Toggle which drift categories run: ingress, pods, infrastructure. Disabling a category stops its checks but preserves existing messages.

Email notifications

PodWarden can send email alerts when new infrastructure issues are detected. Configure this in Settings → System Config → Notifications.

Setting | Description
--- | ---
Enable email notifications | Master toggle
Recipients | Comma-separated list of email addresses
Minimum severity | Only issues at this severity or above trigger an email

Emails are sent as a digest — one email per health check cycle containing all newly detected issues. Previously known issues do not re-trigger emails. If an issue resolves and later reappears, it is treated as new.
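The digest selection can be sketched as "new this cycle and at or above the minimum severity". A minimal sketch; the `digest` function, the issue keys, and the severity ranking are hypothetical, and suppression filtering is omitted.

```python
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def digest(
    current: dict[str, str],
    previously_active: set[str],
    min_severity: str,
) -> list[str]:
    """Return the issue keys to include in this cycle's email.

    current maps each issue key detected this cycle to its severity.
    previously_active holds keys already active before this cycle; an issue
    that resolved and then reappeared is absent from it, so it counts as new.
    """
    threshold = SEVERITY_RANK[min_severity]
    return sorted(
        key
        for key, severity in current.items()
        if key not in previously_active and SEVERITY_RANK[severity] >= threshold
    )
```

An empty result would mean no email is needed for that cycle.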

Email delivery uses the SMTP settings configured in the same Settings page. SMTP must be configured and working for email notifications to function.