Friends & Family Preview

Stop babysitting your Kubernetes clusters.

ClusterMind deploys an AI agent inside your cluster that proactively detects issues, diagnoses root causes, and sends fix PRs — all surfaced in Slack.

clustermind — diagnostic run

Every check has a scar behind it.

Our diagnostic engine wasn't designed in a vacuum. Each of the 34+ checks was born from a real production incident on real clusters.

CRITICAL

The 17-Hour Blind Spot

A node went NotReady and nobody noticed for 17 hours. The PostgreSQL StatefulSet pod was stuck on the dead node the entire time, because StatefulSet pods aren't automatically rescheduled the way Deployment pods are.

Undetected for 17 hours · ClusterMind catches in < 6 min
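
A check for this failure mode could look roughly like the sketch below. It is illustrative only, not ClusterMind's actual implementation: it assumes nodes and pods have already been fetched as plain dictionaries shaped like Kubernetes API objects, and the function names are hypothetical.

```python
def node_is_ready(node: dict) -> bool:
    """Return True if the node reports a Ready condition with status "True"."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False  # no Ready condition reported: treat the node as not ready


def find_stranded_statefulset_pods(nodes: list[dict], pods: list[dict]) -> list[str]:
    """List "namespace/name" for StatefulSet-owned pods bound to NotReady nodes."""
    not_ready = {n["metadata"]["name"] for n in nodes if not node_is_ready(n)}
    stranded = []
    for pod in pods:
        meta = pod["metadata"]
        owners = meta.get("ownerReferences", [])
        owned_by_sts = any(o.get("kind") == "StatefulSet" for o in owners)
        if owned_by_sts and pod.get("spec", {}).get("nodeName") in not_ready:
            stranded.append(f'{meta["namespace"]}/{meta["name"]}')
    return stranded
```

Pods created by a Deployment are owned by a ReplicaSet, so they are skipped here; only pods owned directly by a StatefulSet are reported.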

HIGH

The Phantom CPU Crisis

Three nodes cordoned during an upgrade. Cluster showed 99% CPU request utilization — triggering panic. Actual usage? 16%. ClusterMind distinguishes requests from real usage.

Reported 99% CPU · Reality 16% CPU
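
The two figures measure different things: request utilization compares the CPU that pods have reserved against allocatable capacity, while real usage compares what containers actually consume. The sketch below uses made-up millicore values chosen only to reproduce the 99% and 16% figures; the function name and inputs are hypothetical.

```python
def cpu_utilization(requested_m: int, used_m: int, allocatable_m: int) -> tuple[float, float]:
    """Return (request %, usage %) of allocatable CPU; all inputs are in millicores."""
    if allocatable_m <= 0:
        raise ValueError("allocatable_m must be positive")
    return (100 * requested_m / allocatable_m, 100 * used_m / allocatable_m)


# Hypothetical numbers: 15,840m reserved and 2,560m consumed out of 16,000m allocatable.
request_pct, usage_pct = cpu_utilization(requested_m=15_840, used_m=2_560, allocatable_m=16_000)
print(f"requests: {request_pct:.0f}%  actual usage: {usage_pct:.0f}%")
```

A high request percentage means the scheduler has little room left to place new pods, not that the CPUs are busy.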

CRITICAL

Generation 991

A Longhorn volume entered an infinite attach/detach loop, reaching generation count 991. Each cycle risked data corruption. ClusterMind flags volumes with generation >50.

Volume generation 991 · Alert threshold > 50
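
A threshold check of this kind is simple to sketch. The example below assumes the generation count is read from each volume object's `metadata.generation` field and that volumes arrive as plain dictionaries; both are assumptions for illustration, and the function name is hypothetical.

```python
GENERATION_THRESHOLD = 50  # the alert threshold quoted above


def flag_looping_volumes(volumes: list[dict],
                         threshold: int = GENERATION_THRESHOLD) -> list[tuple[str, int]]:
    """Return (name, generation) for volumes whose generation exceeds the threshold."""
    flagged = []
    for vol in volumes:
        meta = vol.get("metadata", {})
        generation = meta.get("generation", 0)
        if generation > threshold:
            flagged.append((meta.get("name", "<unknown>"), generation))
    # Highest generation first, so the worst offender leads the alert.
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```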

HIGH

The 12-Core Ghost

A backup restore accidentally set a Longhorn instance-manager's CPU request to 12 cores on a single node, silently blocking all volume attachments. Generic monitoring missed it entirely.

CPU request 12,000m · Expected < 500m
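
Comparing a request against the expected ceiling means normalizing Kubernetes CPU quantities first, since `12` and `12000m` are the same amount. This is a minimal sketch, assuming pods are plain dictionaries shaped like Kubernetes API objects and that the pods of interest share an `instance-manager` name prefix; the function names, the prefix, and the 500m default are illustrative assumptions.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "12", "0.5", or "250m" to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def oversized_cpu_requests(pods: list[dict], name_prefix: str = "instance-manager",
                           limit_m: int = 500) -> list[tuple[str, int]]:
    """Return (pod name, requested millicores) for matching pods at or above limit_m."""
    flagged = []
    for pod in pods:
        name = pod["metadata"]["name"]
        if not name.startswith(name_prefix):
            continue
        # Sum the CPU requests of every container in the pod.
        total_m = sum(
            cpu_to_millicores(c.get("resources", {}).get("requests", {}).get("cpu", "0"))
            for c in pod.get("spec", {}).get("containers", [])
        )
        if total_m >= limit_m:
            flagged.append((name, total_m))
    return flagged
```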

Three steps to calmer clusters.

Deploy once, diagnose forever. No agents phoning home with your data.

01

Deploy

$ helm install clustermind

A lightweight StatefulSet with a Redis sidecar deploys inside your cluster. Your API key, your infrastructure. Data never leaves.

02

Detect

34+ checks / every 6h

AI runs comprehensive diagnostics: nodes, pods, storage, certificates, ArgoCD sync, AlertManager alerts. Four severity tiers, zero false positives on healthy clusters.

03

Resolve

#incidents → Slack

Alerts land in Slack with root cause analysis and remediation steps. Ask follow-up questions in threads. Or let ClusterMind send a fix PR to your GitOps repo.

Built for the 3am page.

Not another dashboard. An AI operations engineer that lives in your cluster and reports through Slack.

Proactive Diagnostics

34+ checks across 4 severity tiers. Nodes, pods, Longhorn storage, certificates, ArgoCD, AlertManager. Runs every 6 hours, configurable.

~$0.03 per diagnostic run
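
For a rough sense of what the schedule costs, the quoted per-run figure can be multiplied out. This sketch assumes a 30-day month and counts scheduled diagnostic runs only; the function name is hypothetical.

```python
def monthly_run_cost(cost_per_run: float = 0.03, interval_hours: float = 6,
                     days: int = 30) -> float:
    """Estimate the monthly cost of scheduled diagnostic runs, in dollars."""
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    runs = (24 / interval_hours) * days  # 4 runs per day at the default interval
    return round(runs * cost_per_run, 2)


print(monthly_run_cost())  # scheduled runs at the default 6-hour interval
```

Shortening the interval scales the estimate linearly, so hourly runs would cost six times as much as the default.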

Slack-Native

Alerts with severity, root cause, and kubectl commands. Ask follow-up questions in threads. No context switching to dashboards or terminals.

Response in < 30s

GitOps Auto-Remediation

Connects to your GitHub repos. Creates fix branches, opens PRs, monitors CI, watches ArgoCD rollouts. Multi-repo support out of the box.

PRs with full CI validation

Human-in-the-Loop Safety

Two-bot RBAC architecture. Read-only by default. Dangerous operations require human approval via dashboard. Protected namespaces can't be touched.

kube-system, argocd, cert-manager protected

BYOK — Your Data Stays

Bring Your Own Anthropic API Key. The agent runs inside your cluster. We receive only minimal metadata. Your secrets, logs, and data never leave your infrastructure.

You pay Anthropic directly — ~$5/mo

Full Cost Visibility

Track AI costs per diagnostic run. See estimated downtime avoided. Shadow invoice shows what you'd pay on Enterprise. Prometheus metrics for everything.

Shadow invoice: $4,653/mo value for $0

See the value before you pay.

Start free during Friends & Family. We show you exactly what you're saving so you can decide if it's worth it.

F&F

Friends & Family preview is live. Full product, zero cost. We'll show you a shadow invoice so you know the value you're getting.

Starter
$0 platform

BYOK — $49-$149 per successful fix

  • Bring your own Anthropic key
  • 34+ diagnostic checks
  • Slack alerts & threads
  • GitOps auto-remediation
  • Approval dashboard
  • Pay only for proven fixes
Join Waitlist
Enterprise
$5,000 /mo

100 free fixes + volume discounts

  • Everything in Growth
  • Dedicated support engineer
  • Custom runbooks
  • SSO / SAML
  • SLA: 2min MTTA, 15min MTTR
  • Compliance audit trail
Contact Sales

Common questions.

Security & Privacy

Your cluster data goes nowhere. The agent runs inside your Kubernetes cluster, so your logs, secrets, and cluster state never leave your infrastructure. We receive only minimal metadata (incident counts, heartbeats) for the dashboard. This is architecturally enforced, not just a policy.

The agent uses read-only RBAC by default. It can run get, describe, logs, and top, but cannot modify anything. Write operations (scale, delete, drain) require a separate privileged bot that only executes after human approval via the dashboard.

Hard-blocked operations include: kubectl delete namespace, kubectl delete --all, any operation on protected namespaces (kube-system, argocd, cert-manager, longhorn-system, monitoring), and shell injection patterns. These cannot be bypassed, even with approval.

Your Anthropic API key is stored encrypted in your cluster's Kubernetes Secrets. We never see or handle it. SSH keys for GitOps repositories are encrypted at rest in PostgreSQL. All authentication uses Slack OAuth with JWT RS256 signing.

Compliance certification is on the roadmap for Enterprise launch (Q2 2026). The BYOK architecture inherently reduces our compliance surface: since cluster data stays in your infrastructure, our platform handles only authentication and metadata.
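
The hard-blocked operations described in this section could be enforced by a guard along these lines. This is a simplified sketch, not ClusterMind's actual implementation: the function name and the metacharacter list are assumptions, it only recognizes the `-n`, `--namespace`, and `--namespace=` forms, and it deliberately errs on the side of blocking.

```python
import shlex

PROTECTED_NAMESPACES = {"kube-system", "argocd", "cert-manager", "longhorn-system", "monitoring"}
SHELL_METACHARACTERS = (";", "|", "&", "`", "$(", ">", "<")  # illustrative, not exhaustive


def is_hard_blocked(command: str) -> tuple[bool, str]:
    """Return (blocked, reason) for a proposed kubectl command line."""
    if any(token in command for token in SHELL_METACHARACTERS):
        return True, "shell metacharacters are not allowed"
    try:
        args = shlex.split(command)
    except ValueError:
        return True, "command could not be parsed"
    if not args or args[0] != "kubectl":
        return True, "only kubectl commands are accepted"
    if "delete" in args:
        if "--all" in args:
            return True, "kubectl delete --all is blocked"
        # First non-flag argument after "delete" is taken as the resource type.
        rest = args[args.index("delete") + 1:]
        resource = next((a for a in rest if not a.startswith("-")), "")
        if resource.split("/")[0] in {"namespace", "namespaces", "ns"}:
            return True, "deleting namespaces is blocked"
    for i, arg in enumerate(args):
        namespace = None
        if arg in ("-n", "--namespace") and i + 1 < len(args):
            namespace = args[i + 1]
        elif arg.startswith("--namespace="):
            namespace = arg.split("=", 1)[1]
        if namespace in PROTECTED_NAMESPACES:
            return True, f"namespace {namespace!r} is protected"
    return False, ""
```

Because the check runs before any approval step, a blocked command is rejected regardless of who approves it.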

General

ClusterMind supports Kubernetes 1.25 and above on any distribution: EKS, GKE, AKS, k3s, RKE2, kind, OrbStack. If kubectl can reach your cluster, ClusterMind works.

Setup takes about 5 minutes. Add the Slack app to your workspace, click "Connect Cluster" in the dashboard, and run the helm install command. The first diagnostic runs immediately.

About $5/month in Anthropic API costs for a typical cluster running diagnostics every 6 hours. Each run costs ~$0.03 using Claude Sonnet. You pay Anthropic directly via your own API key. During Friends & Family, the ClusterMind platform itself is free.

Longhorn, ArgoCD, cert-manager, and AlertManager each get purpose-built checks: Longhorn volume health, instance-manager errors, and degraded replicas; ArgoCD sync status and application health; cert-manager certificate expiry; AlertManager firing alerts. These aren't generic; they're informed by real production incidents.

Conventional monitoring tools alert on metric thresholds. ClusterMind reasons about your cluster. It understands that a StatefulSet pod stuck on a NotReady node won't auto-reschedule. It knows that 99% CPU requests with 16% actual usage is a false alarm, not a crisis. And it doesn't just alert: it opens fix PRs against your GitOps repo.

Your cluster shouldn't need a babysitter.

Join the Friends & Family preview. Full product, zero cost. We'll show you the value before we ever charge.

Want a live demo? Book a call →