Today, I asked it to run gt restack (Graphite CLI) and resolve conflicts. The agent resolved the submodule conflict correctly, but then proceeded to run git push --force-with-lease --no-verify without asking for permission - directly violating my rules.
The agent's defense was reasonable ("force push is expected after a rebase"), but that's exactly why I want to be asked first. The whole point of the rule is to maintain human oversight on destructive operations.
I'm curious:
Has anyone else experienced AI agents ignoring explicit safety rules? How are you handling guardrails for potentially destructive operations? Is there a more reliable way to enforce these boundaries?
The irony is that the agent acknowledged the rule violation in its apology, which means it "knew" the rule existed but chose to proceed anyway. This feels like a trust issue that could have much worse consequences in other scenarios.
loading...