MCP Colors: Systematically deal with prompt injection risk
Prompt injection is annoying enough that most (all??) apps so far just ignore it and hope a solution comes along before their customer base grows enough to actually care about security. There are answers!
But first! Breathe deeply and repeat after me: “it’s impossible to reliably detect prompt injection attacks, and it probably always will be”. Breathe deeply again, and accept this. Good, now we’re ready to move on.
How do we make a secure agent?
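Simon Willison has been the leading voice here, first with his Lethal Trifecta and more recently by aggregating some papers that build on it. In these ideas, there’s a Venn diagram with 3 circles: access to private data, exposure to untrusted content, and the ability to communicate externally.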

The more recent paper broadened Simon’s “Ability to communicate externally” (i.e. exfiltrate) to include anything that changes state.
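One way to make the three circles concrete is to treat each one as a flag on an agent session and check whether all three are set at once. This is only a minimal sketch, not something taken from Simon’s posts or the papers: the `Capability` names and the `is_lethal` helper are hypothetical, and the third flag assumes the broadened “anything that changes state” reading.

```python
from enum import Flag, auto


class Capability(Flag):
    """Hypothetical labels for the three circles of the trifecta."""
    PRIVATE_DATA = auto()       # can read private or sensitive data
    UNTRUSTED_CONTENT = auto()  # is exposed to content an attacker may control
    CHANGES_STATE = auto()      # can communicate externally or otherwise change state


# The dangerous region is the overlap of all three circles.
LETHAL = (
    Capability.PRIVATE_DATA
    | Capability.UNTRUSTED_CONTENT
    | Capability.CHANGES_STATE
)


def is_lethal(session: Capability) -> bool:
    """Return True only when the session holds all three capabilities."""
    return (session & LETHAL) == LETHAL


# A session that reads private, possibly attacker-written content but cannot act.
read_only = Capability.PRIVATE_DATA | Capability.UNTRUSTED_CONTENT
# Adding any state-changing ability completes the trifecta.
full = read_only | Capability.CHANGES_STATE
```

Here `is_lethal(read_only)` is false and `is_lethal(full)` is true. The check does not try to detect an injection; it only asks whether an injection, if one happened, could both reach private data and do something with it.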
MCP Colors 101
In my work, I’ve decided that Simon’s diagram can...