Ask HN: How are you handling non-probabilistic security for LLM agents?

I've been experimenting with autonomous agents that have shell and database access. The standard approach seems to be "put safety guardrails in the system prompt", but that feels like a house of cards honestly. If a model is stochastic, its adherence to security instructions is also stochastic.

I'm looking into building a hard "Action Authorization Boundary" (AAB) that sits outside the agent's context window entirely. The idea is to intercept each tool call, normalize it into an intent, and evaluate that intent against a deterministic YAML policy before execution.
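
As a rough illustration of that flow (not the actual AAB design), here is a minimal sketch of a default-deny gate. The `Intent` fields, the rule schema, and the `POLICY` contents are all assumptions made up for this example, and the policy is written as the dict a YAML file might parse into so the snippet stays stdlib-only.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Hypothetical policy, shown as the dict a YAML file might parse into.
POLICY = {
    "default": "deny",
    "rules": [
        {"tool": "db", "action": "write", "resource": "*", "effect": "deny"},
        {"tool": "db", "action": "read", "resource": "public.*", "effect": "allow"},
    ],
}


@dataclass(frozen=True)
class Intent:
    tool: str
    action: str
    resource: str


def normalize(tool_call: dict) -> Intent:
    """Reduce a raw tool call to a (tool, action, resource) triple."""
    return Intent(
        tool=str(tool_call["tool"]).strip().lower(),
        action=str(tool_call["action"]).strip().lower(),
        resource=str(tool_call["resource"]).strip(),
    )


def authorize(tool_call: dict, policy: dict = POLICY) -> bool:
    """First matching rule wins; unmatched or malformed calls are denied."""
    try:
        intent = normalize(tool_call)
    except (KeyError, TypeError):
        return False  # fail closed on anything that cannot be normalized
    for rule in policy["rules"]:
        if (
            rule["tool"] == intent.tool
            and rule["action"] == intent.action
            and fnmatchcase(intent.resource, rule["resource"])
        ):
            return rule["effect"] == "allow"
    return policy["default"] == "allow"
```

The gate never sees the prompt or the model's reasoning, only the tool call itself, so the decision is a pure function of the call and the policy.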

A few questions for those building in this space:

Canonicalization: How do you handle the messiness of LLM tool outputs? If the representation isn't perfectly canonical, policy bypasses seem trivial.
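
To make the canonicalization concern concrete, here is a minimal sketch of one possible normalization pass: Unicode NFKC folding, whitespace trimming, sorted keys, and rejection of types with ambiguous encodings. It is an assumption about what a canonical form could look like, not the CAR spec; the function names are hypothetical.

```python
import hashlib
import json
import unicodedata


def _canon(value):
    """Recursively normalize a value; raise on anything ambiguous."""
    if isinstance(value, str):
        return unicodedata.normalize("NFKC", value).strip()
    if isinstance(value, dict):
        out = {_canon(str(k)): _canon(v) for k, v in value.items()}
        if len(out) != len(value):
            # Two distinct keys collapsed into one after normalization.
            raise ValueError("key collision after normalization")
        return out
    if isinstance(value, (list, tuple)):
        return [_canon(v) for v in value]
    if value is None or isinstance(value, (bool, int)):
        return value
    # Floats and unknown types are rejected rather than guessed at.
    raise TypeError(f"unsupported type: {type(value).__name__}")


def canonical_bytes(tool_call: dict) -> bytes:
    """Serialize with sorted keys and fixed separators."""
    text = json.dumps(
        _canon(tool_call),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=True,
    )
    return text.encode("ascii")


def fingerprint(tool_call: dict) -> str:
    """Stable hash of the canonical form, usable as a policy or audit key."""
    return hashlib.sha256(canonical_bytes(tool_call)).hexdigest()
```

This only covers structural noise (key order, whitespace, look-alike Unicode). Semantic equivalence, such as two different shell commands with the same effect, is a separate and harder problem that a pass like this does not address.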

Stateful Intent: How do you handle sequences that are individually safe but collectively risky? For example, an agent reading a sensitive DB (safe) and then making a POST request to an external API (dangerous exfiltration).
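
One common way to model the read-then-exfiltrate case is session-level taint tracking: once a session touches a sensitive source, external sinks are blocked for the rest of that session. The sketch below is a minimal version under that assumption; the source and sink labels and the `Session` class are hypothetical, and it covers only the sequence rule, not the per-call policy.

```python
from dataclasses import dataclass, field

# Hypothetical labels for this example.
SENSITIVE_SOURCES = {"db:customers", "db:payments"}
EXTERNAL_SINKS = {"http:post", "email:send"}


@dataclass
class Session:
    # Sensitive sources this session has already read from.
    tainted_by: set = field(default_factory=set)


def authorize(session: Session, action: str, target: str) -> bool:
    """Allow each step unless it sends data out of a tainted session."""
    if action == "read" and target in SENSITIVE_SOURCES:
        session.tainted_by.add(target)  # remember the read, but allow it
        return True
    if action in EXTERNAL_SINKS and session.tainted_by:
        return False  # individually safe, collectively risky
    return True
```

The same POST is allowed before the sensitive read and denied after it, so the decision depends on session history rather than on the call alone. The obvious cost is coarseness: any sensitive read blocks all external sinks until the session is reset.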

Latency: Does moving the "gate" outside the model-loop add too much overhead for real-time agentic workflows?

I’ve been working on a CAR (Canonical Action Representation) spec to solve this, but I’m curious whether I'm overthinking it or whether there’s an existing "firewall for agents" standard I'm missing.