System Messages | PodWarden Hub

What you see

URL: /system/messages

The system messages page shows issues detected by PodWarden's periodic infrastructure audit. Every 15 minutes, PodWarden automatically checks for inconsistencies between its database and your live infrastructure.

A bell icon in the top navigation bar shows the number of unread messages. The badge color reflects the highest severity:

Color	Meaning
Red (pulsing)	Critical issue — e.g. cluster unreachable
Red (static)	Error — e.g. node name mismatch preventing canvas visualization
Amber	Warning — e.g. host offline or orphaned
Blue	Informational — e.g. stale assignment

Click the bell to open the system messages page.
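The badge rule above (count of unread messages, colored by the highest severity among them) can be illustrated with a minimal sketch. The `Severity` enum, `BADGE_STYLE` mapping, and `badge_for` function are hypothetical names for illustration, not PodWarden's API, and the sketch assumes no badge is shown when nothing is unread.

```python
from __future__ import annotations

from enum import IntEnum
from typing import Iterable, Optional


class Severity(IntEnum):
    # Ordered so that max() picks the most severe level.
    INFO = 0
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


# Badge styles taken from the color table above.
BADGE_STYLE = {
    Severity.CRITICAL: "red (pulsing)",
    Severity.ERROR: "red (static)",
    Severity.WARNING: "amber",
    Severity.INFO: "blue",
}


def badge_for(unread: Iterable[Severity]) -> Optional[tuple[int, str]]:
    """Return (unread count, badge style), or None when nothing is unread.

    Returning None for an empty list is an assumption of this sketch.
    """
    levels = list(unread)
    if not levels:
        return None
    return len(levels), BADGE_STYLE[max(levels)]
```

With one informational, one warning, and one error message unread, this sketch yields a count of 3 and a static red badge.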

Health checks

PodWarden runs these checks automatically:

Node naming

Check	Severity	What it means
Node name mismatch	Error	The K8s node name stored in PodWarden doesn't match the actual node name in your cluster. This prevents deployment-to-node connections from appearing on the canvas.
Node name missing	Warning	A host is assigned to a cluster but has no K8s node name recorded.

Cluster membership

Check	Severity	What it means
Cluster unreachable	Critical	PodWarden cannot connect to the Kubernetes API for this cluster.
Orphaned host	Warning	A host claims to be in a cluster, but the cluster doesn't have a matching node.
Unknown K8s node	Error	A node exists in your K8s cluster that PodWarden doesn't know about.

Deployments

Check	Severity	What it means
Stale assignment	Info	A deployment references a cluster that no longer exists.
Unplaceable deployment	Warning	A deployment's placement targets a node that no host matches.

Hosts

Check	Severity	What it means
Host unreachable	Warning	A host hasn't reported stats in over 30 minutes.

Ingress Drift

Check	Severity	What it means
Unmanaged IngressRoute	Warning	A Traefik IngressRoute exists in your cluster but isn't tracked by PodWarden. Someone created it directly via kubectl or Helm.
Unmanaged Ingress	Warning	A standard Kubernetes Ingress resource exists but isn't managed by PodWarden.
Unmanaged exposed service	Warning	A NodePort or LoadBalancer service exposes traffic outside the cluster without going through PodWarden's ingress management.
Ghost ingress rule	Error	PodWarden has an ingress rule marked as active, but the corresponding Kubernetes resource doesn't exist. The rule may have been deleted directly from the cluster.
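Conceptually, the unmanaged and ghost findings are the two directions of a set comparison between what exists in the cluster and what PodWarden tracks as active. The sketch below shows that idea only. The `ingress_drift` function and the `namespace/name` identifiers are hypothetical, and how PodWarden actually matches resources is not described here.

```python
from __future__ import annotations


def ingress_drift(
    cluster_resources: set[str], active_rules: set[str]
) -> tuple[list[str], list[str]]:
    """Compare live ingress resources against rules tracked as active.

    Both sets hold hypothetical "namespace/name" identifiers.
    Returns (unmanaged, ghosts):
      unmanaged: present in the cluster but not tracked (Warning)
      ghosts:    tracked as active but absent from the cluster (Error)
    """
    unmanaged = sorted(cluster_resources - active_rules)
    ghosts = sorted(active_rules - cluster_resources)
    return unmanaged, ghosts


# Example: one route created by hand, one rule whose resource was deleted.
live = {"web/shop", "web/manual-route"}
tracked = {"web/shop", "web/old-api"}
unmanaged, ghosts = ingress_drift(live, tracked)
```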

Pod Health

Check	Severity	What it means
CrashLoopBackOff	Error (managed) / Warning (unmanaged)	A pod keeps crashing and restarting. Error severity if it belongs to a PodWarden-managed deployment.
Evicted	Error (managed) / Warning (unmanaged)	A pod was evicted, usually due to resource pressure (memory, disk).
ImagePullBackOff	Error (managed) / Warning (unmanaged)	A pod can't pull its container image. Check that the image exists and registry credentials are correct.
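The severity rule in this table depends only on whether the pod belongs to a PodWarden-managed deployment. A minimal sketch, with the hypothetical helper `pod_severity` and reason strings copied from the table:

```python
from __future__ import annotations

from typing import Optional

# Pod conditions listed in the table above.
POD_HEALTH_REASONS = {"CrashLoopBackOff", "Evicted", "ImagePullBackOff"}


def pod_severity(reason: str, managed: bool) -> Optional[str]:
    """Return the finding severity, or None if the reason is not checked."""
    if reason not in POD_HEALTH_REASONS:
        return None
    # Managed deployments escalate to error; unmanaged pods stay at warning.
    return "error" if managed else "warning"
```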

Infrastructure

Check	Severity	What it means
TLS cert expiring (K8s)	Warning (14d) / Error (7d) / Critical (3d)	A TLS certificate stored as a Kubernetes secret is approaching expiry.
TLS cert expiring (live)	Warning (14d) / Error (7d)	The live TLS certificate served by a domain is approaching expiry.
DNS mismatch	Error	A domain's DNS resolves to a different IP than the gateway host's address. Traffic may not reach your cluster.
Domain proxied via Cloudflare	Info	The domain's DNS resolves to a Cloudflare proxy IP. PodWarden cannot verify the origin IP from public DNS. This is not a mismatch — it is an expected condition for Cloudflare-proxied domains.
Gateway Traefik unhealthy	Critical	Traefik pods on a gateway host are in a failed state. Inbound traffic to all services on this gateway is likely down.
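The tiered TLS expiry severities can be read as a lookup from time remaining to severity. The sketch below assumes each threshold is inclusive (a certificate with exactly 7 days left counts as Error), which the table does not state. The names `K8S_SECRET_THRESHOLDS`, `LIVE_CERT_THRESHOLDS`, and `cert_severity` are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# (days remaining, severity), most urgent first, from the table above.
K8S_SECRET_THRESHOLDS = [(3, "critical"), (7, "error"), (14, "warning")]
LIVE_CERT_THRESHOLDS = [(7, "error"), (14, "warning")]


def cert_severity(
    not_after: datetime, now: datetime, thresholds: list[tuple[int, str]]
) -> str | None:
    """Return the severity for a certificate expiring at not_after.

    Assumption: thresholds are inclusive. Returns None when the
    certificate has more than the largest threshold remaining.
    """
    remaining = not_after - now
    for days, severity in thresholds:
        if remaining <= timedelta(days=days):
            return severity
    return None


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
two_days_left = now + timedelta(days=2)
```

Under these assumptions, a certificate with two days left is Critical when checked as a Kubernetes secret but only Error when checked live, because the live check has no Critical tier.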

Longhorn storage

These checks require Longhorn to be installed on the cluster. They compare live Longhorn state against the cluster baseline. All findings with a Doctor button can be remediated in one click.

Check	Severity	What it means	Doctor
Longhorn setting drift	Warning	A Longhorn global setting differs from the effective baseline for this cluster (global default or per-cluster override).	Yes — patches the setting to the baseline value
Longhorn volume degraded	Error	A Longhorn volume's robustness is `degraded` — one or more replicas are missing, faulty, or rebuilding.	Yes — initiates replica rebuild (blocked if cluster cannot satisfy replica count)
Orphaned Longhorn replica	Warning	A `replica.longhorn.io` resource references a volume that no longer exists. The replica was not garbage-collected when the volume was deleted.	Yes — deletes the orphaned replica resource
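The setting drift check compares live values against an effective baseline, where a per-cluster override takes precedence over the global default. A minimal sketch of that comparison follows. The function names are hypothetical and the setting values are illustrative, not PodWarden defaults. The resulting mapping resembles the before/after pairs a Doctor diff would show.

```python
from __future__ import annotations


def effective_baseline(
    global_defaults: dict[str, str], cluster_overrides: dict[str, str]
) -> dict[str, str]:
    """Per-cluster overrides win over global defaults."""
    return {**global_defaults, **cluster_overrides}


def setting_drift(
    live: dict[str, str], baseline: dict[str, str]
) -> dict[str, tuple[str | None, str]]:
    """Map each drifted setting to (live value, baseline value)."""
    return {
        name: (live.get(name), expected)
        for name, expected in baseline.items()
        if live.get(name) != expected
    }


# Illustrative values only.
baseline = effective_baseline(
    {"default-replica-count": "3", "storage-over-provisioning-percentage": "100"},
    {"default-replica-count": "2"},  # override for this cluster
)
drift = setting_drift(
    {"default-replica-count": "3", "storage-over-provisioning-percentage": "100"},
    baseline,
)
```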

Storage

Check	Severity	What it means	Doctor
Stuck PVC	Error	A PersistentVolumeClaim has been in `Pending` state for over 30 minutes (provisioning failed or storage class missing) or in `Released` state for over 30 minutes (underlying PV not reclaimed).	Yes — deletes the stuck PVC to allow re-provisioning, or releases the retained PV

Cluster baseline

Check	Severity	What it means	Doctor
k3s argument drift	Warning	A k3s server argument on a control-plane node differs from the effective baseline. PodWarden reads `/etc/rancher/k3s/config.yaml` via SSH to detect this.	Yes — writes the correct value to the config file and restarts k3s (brief disruption)
Baseline namespace missing	Error	A namespace required by the cluster baseline does not exist on the cluster.	Yes — creates the namespace with the required labels

High-availability control plane

Check	Severity	What it means	Doctor
Control-plane promotion failed	Error	A worker→control-plane promotion attempt failed. The host is in a failed provisioning state and the cluster control plane is incomplete.	Yes — three actions: Retry (re-run the bootstrap playbook), Revert (uninstall CP role, reinstall as worker), or Abandon (DB-only: return host to free pool)
Control-plane demotion failed	Error	A control-plane→worker demotion attempt failed mid-pipeline (drain, uninstall, etcd removal, or DB cleanup). The cluster control plane may be in an inconsistent state.	Yes — three actions: Retry demotion (re-run the full pipeline), Force complete (skip on-host work; DB and rejoin only), or Abandon (DB-only: clear the alert; you handle the host manually)

Doctor — one-click remediation

Many findings include a Doctor button. Clicking it opens a modal that shows exactly what change PodWarden will make (a diff table with before/after values), any side effects, and an Apply button. Nothing changes until you explicitly click Apply.

Doctor requires the operator role. See Doctor for the full workflow including conflict handling, audit links, and the list of available recipes.

Using the page

Filtering

Severity chips at the top toggle which severity levels are shown
Category dropdown filters by check category (node naming, cluster, deployments, hosts, ingress drift, pod health, infrastructure, system apps)
Status toggle switches between active issues, resolved issues, or all
Click a message row to expand and see full details and diagnostic data

Object links

When you expand a message, clickable links appear that take you directly to the affected object's detail page. For example, a "Host unreachable" message links to that host's detail page, and a "DNS mismatch" message links to the Network page. Links are shown for hosts, clusters, deployments, and network resources when the relevant object exists.

Actions

Mark as read — dismisses the notification badge for you (other users still see it as unread)
Mark all read — marks all active messages as read
Run check (admin only) — triggers an immediate health check instead of waiting for the next cycle
Delete (admin only) — permanently removes a message

Suppressing messages

Some alerts are expected and intentional — for example, an "Unmanaged IngressRoute" for a route you created directly via kubectl, or an "Unknown K8s node" for a node you provisioned outside of PodWarden. These are not bugs; they are known deviations you have chosen to accept. Suppressing them keeps your message list focused on issues that actually need attention.

To suppress a message, click the eye-slash icon (Suppress button) on any message row. The message is immediately hidden from the main list and will not trigger email notifications.

To view suppressed messages, enable the "Show suppressed" toggle in the page header. Suppressed messages reappear dimmed with a "Suppressed" badge so they are visually distinct from active issues.

To unsuppress a message, enable the "Show suppressed" toggle, then click the eye icon (Unsuppress button) on the message you want to restore. The message returns to the normal active list.

Suppression persists until you explicitly unsuppress the message; health check cycles do not clear or reset suppression state. Suppressed messages are also excluded from the notification digest, so re-enabling email notifications will not retroactively send alerts for them.

Auto-resolution

When a previously detected issue is no longer found during a health check (e.g., you fixed a node name mismatch), PodWarden does not resolve the message immediately. Instead, it waits for multiple consecutive clean check cycles (default: 2, configurable in Settings) before automatically marking the message as resolved. This prevents noisy alerts from pods that briefly recover before crashing again.

Resolved messages are cleaned up after 7 days.
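The clean-cycle rule can be sketched as a per-issue counter that resets whenever the issue is detected again. This is an illustration under stated assumptions, not PodWarden's implementation: `TrackedIssue`, `apply_cycle`, and the issue key format are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TrackedIssue:
    key: str
    clean_cycles: int = 0
    resolved: bool = False


def apply_cycle(
    issues: dict[str, TrackedIssue],
    detected: set[str],
    clean_cycles_required: int = 2,
) -> None:
    """Update tracked issues with the keys detected in one check cycle."""
    for key in detected:
        issue = issues.get(key)
        if issue is None or issue.resolved:
            # First sighting, or a resolved issue reappearing: start fresh.
            issues[key] = TrackedIssue(key)
        else:
            # Still failing: any progress toward resolution is discarded.
            issue.clean_cycles = 0
    for key, issue in issues.items():
        if key in detected or issue.resolved:
            continue
        issue.clean_cycles += 1
        if issue.clean_cycles >= clean_cycles_required:
            issue.resolved = True


# A flapping pod: detected, clean once, detected again, then clean twice.
issues: dict[str, TrackedIssue] = {}
key = "pod/web:CrashLoopBackOff"
history = []
for cycle in [{key}, set(), {key}, set(), set()]:
    apply_cycle(issues, cycle)
    history.append((issues[key].clean_cycles, issues[key].resolved))
```

In this run the single clean cycle in the middle does not resolve the issue. Only the two consecutive clean cycles at the end do.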

Drift Detection Dashboard

URL: /apps/drift-detection

PodWarden provides a dedicated dashboard for drift detection issues, accessible from the navigation menu. The dashboard shows:

Summary cards — count of active issues by severity (critical, error, warning, info)
Issues table — all active drift issues with severity, category, title, and time since detection
Action buttons:
- Run Check Now — trigger an immediate health check without waiting for the next cycle
- Clear Resolved — delete all resolved drift messages

Each issue in the table links to the relevant PodWarden page where you can investigate and fix it.

Configuration

Drift detection settings are in Settings → Drift Detection:

Setting	Default	Description
Clean cycles before auto-resolve	2	How many consecutive clean check cycles before an issue is automatically resolved. Higher values reduce false resolutions but delay cleanup.
Enabled categories	All enabled	Toggle which drift categories run: ingress, pods, infrastructure. Disabling a category stops its checks but preserves existing messages.

Email notifications

PodWarden can send email alerts when new infrastructure issues are detected. Configure this in Settings → System Config → Notifications.

Setting	Description
Enable email notifications	Master toggle
Recipients	Comma-separated list of email addresses
Minimum severity	Only issues at this severity or above trigger an email

Emails are sent as a digest — one email per health check cycle containing all newly detected issues. Previously known issues do not re-trigger emails. If an issue resolves and later reappears, it is treated as new.
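The digest rules (newly detected only, at or above the minimum severity, and not suppressed) amount to a filter over one cycle's findings. A minimal sketch, assuming hypothetical names `Message`, `SEVERITY_RANK`, and `build_digest`, and assuming that `known_active` holds the keys of issues that were already active before this cycle:

```python
from __future__ import annotations

from dataclasses import dataclass

SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


@dataclass(frozen=True)
class Message:
    key: str
    severity: str
    suppressed: bool = False


def build_digest(
    detected: list[Message], known_active: set[str], min_severity: str
) -> list[Message]:
    """Select the messages that belong in this cycle's digest email."""
    floor = SEVERITY_RANK[min_severity]
    return [
        m
        for m in detected
        if m.key not in known_active  # previously known issues do not re-trigger
        and not m.suppressed  # suppressed messages are excluded
        and SEVERITY_RANK[m.severity] >= floor
    ]


cycle = [
    Message("dns-mismatch:shop.example.com", "error"),
    Message("stale-assignment:deploy-7", "info"),
    Message("unmanaged-ingressroute:web/manual", "warning", suppressed=True),
    Message("cluster-unreachable:prod", "critical"),
]
digest = build_digest(cycle, known_active={"cluster-unreachable:prod"}, min_severity="warning")
```

Because a resolved issue is no longer in the active set, its key would pass the first condition again if it reappeared, which matches the "treated as new" behavior described above.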

Email delivery uses the SMTP settings configured on the same Settings page. SMTP must be configured and working for email notifications to function.