ClusterMind deploys an AI agent inside your cluster that proactively detects issues, diagnoses root causes, and sends fix PRs — all surfaced in Slack.
Born from Production
Our diagnostic engine wasn't designed in a vacuum. Each of the 34+ checks traces back to a real incident on a production cluster.
A node went NotReady and nobody noticed for 17 hours. The PostgreSQL StatefulSet pod was stuck on the dead node the entire time — because StatefulSet pods don't auto-reschedule like Deployments.
Three nodes cordoned during an upgrade. Cluster showed 99% CPU request utilization — triggering panic. Actual usage? 16%. ClusterMind distinguishes requests from real usage.
A Longhorn volume entered an infinite attach/detach loop, reaching generation count 991. Each cycle risked data corruption. ClusterMind flags volumes with generation >50.
A backup restore accidentally set a Longhorn instance-manager to 12 CPU cores on a single node, silently blocking all volume attachments. Generic monitoring missed it entirely.
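Two of the checks described above are simple enough to sketch in a few lines. This is a minimal illustration, not ClusterMind's actual implementation: the function names, the 0.9 pressure threshold, and the result labels are hypothetical. Only the generation limit of 50 comes from the text.

```python
# Illustrative sketch only: names, labels, and the 0.9 threshold are assumptions.

GENERATION_THRESHOLD = 50  # from the text: volumes with generation > 50 are flagged


def classify_cpu_pressure(requested_cores, used_cores, allocatable_cores, threshold=0.9):
    """Separate 'requests look full' from 'the CPUs are actually busy'."""
    if allocatable_cores <= 0:
        raise ValueError("allocatable_cores must be positive")
    request_ratio = requested_cores / allocatable_cores
    usage_ratio = used_cores / allocatable_cores
    if usage_ratio >= threshold:
        return "real_pressure"   # measured usage is high
    if request_ratio >= threshold:
        return "requests_only"   # reservations are high, real usage is not
    return "ok"


def flapping_volumes(volumes):
    """Return names of volumes whose generation count exceeds the threshold."""
    return [v["name"] for v in volumes if v["generation"] > GENERATION_THRESHOLD]


if __name__ == "__main__":
    # Numbers mirror the incidents above: 99% requested, 16% used, generation 991.
    print(classify_cpu_pressure(99, 16, 100))
    print(flapping_volumes([{"name": "pg-data", "generation": 991},
                            {"name": "cache", "generation": 3}]))
```

Keeping requests and measured usage as separate ratios is what lets a check report "capacity is reserved but idle" instead of raising a false alarm.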
How It Works
Deploy once, diagnose forever. No agents phoning home with your data.
$ helm install clustermind
A lightweight StatefulSet with a Redis sidecar deploys inside your cluster. Your API key, your infrastructure. Your secrets, logs, and data never leave.
34+ checks / every 6h
The AI runs comprehensive diagnostics across nodes, pods, storage, certificates, ArgoCD sync, and AlertManager alerts. Findings fall into four severity tiers, with zero false positives on healthy clusters.
#incidents → Slack
Alerts land in Slack with root cause analysis and remediation steps. Ask follow-up questions in threads. Or let ClusterMind send a fix PR to your GitOps repo.
Capabilities
Not another dashboard. An AI operations engineer that lives in your cluster and reports through Slack.
34+ checks across 4 severity tiers, covering nodes, pods, Longhorn storage, certificates, ArgoCD, and AlertManager. Runs every 6 hours by default; the interval is configurable.
~$0.03 per diagnostic run
Alerts with severity, root cause, and kubectl commands. Ask follow-up questions in threads. No context switching to dashboards or terminals.
Response in < 30s
Connects to your GitHub repos. Creates fix branches, opens PRs, monitors CI, watches ArgoCD rollouts. Multi-repo support out of the box.
PRs with full CI validation
Two-bot RBAC architecture. Read-only by default. Dangerous operations require human approval via the dashboard. Protected namespaces can't be touched.
kube-system, argocd, cert-manager protected
Bring Your Own Anthropic API Key. The agent runs inside your cluster. We receive only minimal metadata. Your secrets, logs, and data never leave your infrastructure.
You pay Anthropic directly — ~$5/mo
Track AI costs per diagnostic run. See estimated downtime avoided. A shadow invoice shows what you'd pay on Enterprise. Prometheus metrics for everything.
Shadow invoice: $4,653/mo value for $0
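The API cost figures quoted above can be sanity-checked with quick arithmetic. The sketch below assumes a 30-day month and counts scheduled runs only, using the ~$0.03 per run and 6-hour interval from the text. The function name is hypothetical.

```python
# Back-of-the-envelope estimate; assumes a 30-day month and scheduled runs only.

HOURS_PER_DAY = 24


def monthly_diagnostic_cost(cost_per_run=0.03, interval_hours=6, days=30):
    """Estimate monthly API spend from the per-run cost and the run interval."""
    runs_per_day = HOURS_PER_DAY / interval_hours
    return round(cost_per_run * runs_per_day * days, 2)


if __name__ == "__main__":
    print(monthly_diagnostic_cost())  # 3.6
```

Four runs a day for 30 days is 120 runs, or about $3.60, which is in the same range as the ~$5/mo quoted. The page does not break down the remainder; follow-up questions in Slack threads would plausibly add to it.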
Pricing
Start free during Friends & Family. We show you exactly what you're saving so you can decide if it's worth it.
BYOK — $49-$149 per successful fix
Discounted per-fix pricing
100 free fixes + volume discounts
FAQ
The read-only bot can run get, describe, logs, and top, but cannot modify anything. Write operations (scale, delete, drain) require a separate privileged bot that only executes after human approval via the dashboard.
Permanently blocked: kubectl delete namespace, kubectl delete --all, any operation on protected namespaces (kube-system, argocd, cert-manager, longhorn-system, monitoring), and shell injection patterns. These cannot be bypassed, even with approval.
kubectl, ClusterMind works.
helm install command. The first diagnostic runs immediately.
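The permission model described in the first two answers above could be sketched as a small command classifier. This is a hedged illustration, not ClusterMind's actual guard: the function names, the metacharacter list, and the default-deny handling of unlisted verbs are assumptions, and "any operation on protected namespaces" is read here as any write operation. The verb lists and namespace names come from the text.

```python
import shlex

# Verb and namespace lists come from the FAQ; everything else is an assumption.
READ_ONLY_VERBS = {"get", "describe", "logs", "top"}
APPROVAL_VERBS = {"scale", "delete", "drain"}
PROTECTED_NAMESPACES = {"kube-system", "argocd", "cert-manager",
                        "longhorn-system", "monitoring"}
SHELL_METACHARS = set(";|&`$<>\n")  # hypothetical injection heuristic


def _namespace(args):
    """Extract the value of -n / --namespace, if present (simplified parsing)."""
    for i, arg in enumerate(args):
        if arg in ("-n", "--namespace") and i + 1 < len(args):
            return args[i + 1]
        if arg.startswith("--namespace="):
            return arg.split("=", 1)[1]
    return None


def classify(command):
    """Return 'allow', 'needs_approval', or 'blocked' for a kubectl command string."""
    if any(ch in SHELL_METACHARS for ch in command):
        return "blocked"
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return "blocked"
    if len(tokens) < 2 or tokens[0] != "kubectl":
        return "blocked"
    verb, args = tokens[1], tokens[2:]
    if verb in READ_ONLY_VERBS:
        return "allow"
    if _namespace(args) in PROTECTED_NAMESPACES:
        return "blocked"  # assumption: writes to protected namespaces never run
    if verb == "delete" and ("--all" in args or
                             (args and args[0] in {"namespace", "namespaces", "ns"})):
        return "blocked"
    if verb in APPROVAL_VERBS:
        return "needs_approval"
    return "blocked"  # assumption: unlisted verbs are denied by default


if __name__ == "__main__":
    for cmd in ("kubectl get pods -n kube-system",
                "kubectl scale deployment web --replicas=3 -n shop",
                "kubectl delete namespace shop"):
        print(cmd, "->", classify(cmd))
```

The hard blocks are checked before the approval path, so no approval can turn a blocked command into an allowed one.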
Get Started
Join the Friends & Family preview. Full product, zero cost. We'll show you the value before we ever charge.