Prompt injection is annoying enough that most (all??) apps so far are mostly just ignoring that it exists and hoping a solution will come along before their customer base grows enough to actually care about security. There are answers!

But first! Breathe deeply and repeat after me: “it’s impossible to reliably detect prompt injection attacks, and it probably always will be”. Breathe deeply again, and accept this. Good, now we’re ready to move on.

How do we make a secure agent?

Simon Wilison has been the leading voice here, with his initial Lethal Trifecta and recently aggregating some papers that build on it. In these ideas, there’s a Venn diagram with 3 circles:

The more recent paper broadened Simon’s “Ability to communicate externally” (i.e. exfiltrate) to include anything that changes state.

MCP Colors 101

In my work, I’ve decided that Simon’s diagram can...

Latest Post

MCP Colors: Systematically deal with prompt injection risk

MCP Colors 101