Kubernetes needs a real --force
For ten million years, developers have built and used tools with --force options. Kubernetes's message is clear: those days are over.
You can learn a lot watching Claude agentically debug problems in Kubernetes. Some technologies are very hard to debug, and Kubernetes is one of them. I consider myself a senior engineer, and figuring out why things don't get removed, deleted, or terminated has always been difficult for me. I don't understand what poor design decision made it have to be this way, but Claude can be a useful tool for solving these problems. When something does NOT get removed or deleted, Claude follows this pattern:
1. Audit the entire server to figure out what's holding things up.
2. Guess at what could be a dependency.
3. Try to delete the thing.
4. If that fails, remove the associated “finalizers” and try again, or restart the process.
5. If it succeeds, you're done.
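The delete, clear-finalizers, retry steps in that pattern can be scripted. What follows is a minimal sketch, not a recommended tool: `delete_or_unstick` and the `KUBECTL` override are hypothetical names of mine, and it assumes that a delete which fails or runs past 30 seconds is stuck on finalizers. Clearing finalizers skips whatever cleanup they guard, so treat that step as a last resort.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try a normal delete first, and only if it stalls,
# clear the object's finalizers and delete again.
# KUBECTL can be overridden (e.g. with a stub); it defaults to kubectl.
KUBECTL=${KUBECTL:-kubectl}

# Usage: delete_or_unstick <kind> <name> [namespace]
delete_or_unstick() {
  local kind=$1 name=$2 ns=${3:-}
  local ns_args=()
  if [ -n "$ns" ]; then ns_args=(-n "$ns"); fi

  # Ordinary delete, but give up after 30s instead of hanging on finalizers.
  if $KUBECTL delete "$kind" "$name" "${ns_args[@]}" --ignore-not-found --timeout=30s; then
    echo "deleted $kind/$name"
    return 0
  fi

  # Simplification: assume any failure here means "stuck on finalizers".
  # Clearing them skips whatever cleanup they guard.
  echo "delete of $kind/$name did not complete; clearing finalizers" >&2
  $KUBECTL patch "$kind" "$name" "${ns_args[@]}" --type=merge \
    -p '{"metadata":{"finalizers":null}}' || return 1

  # With no finalizers left the pending delete completes; this confirms it.
  $KUBECTL delete "$kind" "$name" "${ns_args[@]}" --ignore-not-found --timeout=30s
}
```

For example, `delete_or_unstick crd challenges.acme.cert-manager.io` would walk the CRD from this post through those steps (CRDs are cluster-scoped, so no namespace is passed).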
Removing a chart dependency on cert-manager
I had declared `cert-manager` as an optional dependency in a Helm chart. This worked well when I was just running everything as one app on Kubernetes within that chart, but it apparently violated some Kubernetes conventions. When I went to migrate `cert-manager` out of the chart, I couldn't.
My first step was simply to uninstall the chart and delete the namespace. Ideally, Helm would handle this with a `--purge` command. But everything in Kubernetes is more difficult. Uninstalling the chart and deleting the namespace didn't work, because the dependency had installed CRDs (CustomResourceDefinitions, which define custom resource types). CRDs are cluster-scoped rather than installed in the namespace, but they're “maintained by Helm.” Those CRDs weren't cleaned up. Cleaning them up is documented, but it's cute watching agentic AI figure this out the hard way.
● Bash(kubectl get crd certificaterequests.cert-manager.io -o yaml | grep -A5 'metadata:' | grep -E …
'labels:|annotations:|name:' | head -10)
⎿ annotations:
meta.helm.sh/release-name: ACME
f:annotations:
f:meta.helm.sh/release-name: {}
● The issue is clear now. The CRDs are owned by the "ACME" Helm release, which is causing the conflict when you try to install cert-manager in its own namespace.
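To check ownership for every cert-manager CRD at once instead of grepping one YAML dump, a sketch like the following reads the same `meta.helm.sh/release-name` annotation with kubectl's JSONPath output. The function name and the `KUBECTL` override are my own, added so the logic can be exercised with a stub; it assumes `kubectl` points at the affected cluster.

```shell
#!/usr/bin/env bash
# Sketch: list cert-manager CRDs and the Helm release (if any) that owns them.
# KUBECTL can be overridden (e.g. with a stub); it defaults to kubectl.
KUBECTL=${KUBECTL:-kubectl}

cert_manager_crd_owners() {
  # One line per CRD: <name><TAB><meta.helm.sh/release-name annotation>.
  # Dots inside the annotation key are escaped so JSONPath reads it as one key;
  # a missing annotation prints as an empty string.
  $KUBECTL get crd -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.meta\.helm\.sh/release-name}{"\n"}{end}' |
    # Keep only the cert-manager API groups (cert-manager.io, acme.cert-manager.io).
    awk -F'\t' '$1 ~ /cert-manager\.io$/ {
      owner = ($2 == "") ? "(not Helm-managed)" : $2
      print $1 "\t" owner
    }'
}
```

A CRD that still names the old release (ACME in the transcript) is the kind that conflicts when you install cert-manager as its own release, which matches what Claude concluded.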
In fact, one of the things Claude tries is simple:
kubectl get crd | grep cert-manager
kubectl delete crd certificaterequests.cert-manager.io certificates.cert-manager.io challenges.acme.cert-manager.io clusterissuers.cert-manager.io issuers.cert-manager.io orders.acme.cert-manager.io --ignore-not-found
That command just hangs. This is where I would be in over my head. Claude tries this next:
kubectl get challenges.acme.cert-manager.io --all-namespaces
At this point, I'll be honest: I don't even know what this command does. News to me, cert-manager represents each ACME challenge for a certificate as a custom resource (defined by one of those CRDs), so you can `get` it like anything else. From here, Claude knows it needs to delete the challenge:
kubectl delete challenges.acme.cert-manager.io -n coworkunion coworkunion-prometheus-tls-cert-1-1841491659-3150586538
Will that work? Of course not, because it's Kubernetes, and it's too stupid to include a real `--force` flag. (`kubectl delete --force` exists, but it only bypasses graceful termination; it does nothing about finalizers.) Next up, Claude knows it needs to remove the finalizers. Finalizers act like hooks: each one is an entry in the object's `metadata.finalizers` list that some controller must remove, after finishing its cleanup, before the resource can actually be deleted. How does it do that? Maybe there is a convenient flag like `kubectl delete --ignore-finalizers` that would function like `--force` but be more fine-grained? Of course not, it's Kubernetes. Removing the finalizers on the command line looks like this:
kubectl get crd challenges.acme.cert-manager.io -o json | jq '.metadata.finalizers = null' | kubectl replace --raw /apis/apiextensions.k8s.io/v1/customresourcedefinitions/challenges.acme.cert-manager.io -f -
Of course, any human who wrote that on the command line would be insane, so let's break it down:
kubectl get crd challenges.acme.cert-manager.io -o json
Get everything about the CRD as JSON. For reference, this shows a finalizer, `customresourcecleanup.apiextensions.k8s.io`. That finalizer makes the API server delete every remaining instance of the custom resource (here, the challenges) before the CRD itself goes away, so a single stuck challenge holds up the whole CRD. It's probably what the documentation is referring to when it says:
“If the namespace has been marked for deletion without deleting the cert-manager installation first, the namespace may become stuck in a terminating state. This is typically due to the fact that the APIService resource still exists however the webhook is no longer running so is no longer reachable. To resolve this, ensure you have run the above commands correctly, and if you're still experiencing issues then run:”
jq '.metadata.finalizers = null'
Remove just the finalizers value from the metadata, effectively deleting the hook, and return the whole document as JSON.
kubectl replace --raw /apis/apiextensions.k8s.io/v1/customresourcedefinitions/challenges.acme.cert-manager.io -f -
This is a command like `patch`, except `replace` sends the entire object back to that raw API path, while `patch` sends only the difference.
Why did it write it that way? No idea. It's probably equivalent to this:
kubectl patch crd challenges.acme.cert-manager.io -p '{"metadata":{"finalizers":null}}' --type=merge
Which is what I did after I got sick of watching it show off. That removed the finalizer, which made it possible to delete the last CRD.
But you can't deny that it's amusing and entertaining to see how it solves these problems without reading the docs.
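When more than one challenge is stuck, the same merge patch can be applied to every instance of the custom resource across namespaces. A sketch, assuming cert-manager's controller is already gone (so nothing will ever clear these finalizers on its own) and that deleting the challenges is really what you want; `clear_all_finalizers`, `RESOURCE`, and `KUBECTL` are my own names.

```shell
#!/usr/bin/env bash
# Sketch: clear finalizers on every instance of one custom resource type,
# in every namespace. Assumes the controller that owned them is gone.
KUBECTL=${KUBECTL:-kubectl}
RESOURCE=${RESOURCE:-challenges.acme.cert-manager.io}

clear_all_finalizers() {
  local ns name
  # One "<namespace> <name>" pair per line for every instance.
  $KUBECTL get "$RESOURCE" --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
    while read -r ns name; do
      [ -n "$name" ] || continue
      # Same merge patch as the CRD fix, applied to each object.
      $KUBECTL patch "$RESOURCE" "$name" -n "$ns" --type=merge \
        -p '{"metadata":{"finalizers":null}}'
    done
}
```

Once the instances are gone, the CRD's `customresourcecleanup.apiextensions.k8s.io` finalizer has nothing left to wait for, so a pending CRD delete should complete without patching the CRD itself.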
Addendum for the tribal
I just want to reiterate:
I knew about the docs before I ran the LLM. This was only to see how it would go about figuring this out as a learning exercise. That note was always in there.
This is not abnormal. Almost every piece of software I've used is “caveat emptor.” You can remove an RPM while ignoring its normally blocking hooks with
rpm -e --noscripts
You can remove a Debian package that is in an inconsistent state with
dpkg --force-remove-reinstreq
where normally dpkg would want to reinstall the package, assuming that a reinstallation will allow an uninstallation. But sometimes the package needs to be removed in order for an installation to work, and so the option is there!
This bug isn't rare. It's common. It's in the *Frequently* Asked Questions for cert-manager.
This is not about security. It’s about convenience. I can remove something by manually removing finalizers and dependent resources. If I can do it, there should be a command to do what I’m doing automatically. It’s not complex. The argument here is whether or not it should be convenient.
Take care. Hope you enjoyed the adventure.