Item 43680550

mlenhard • 7 days ago

One of the biggest issues I see, briefly discussed here, is how one MCP server tool's output can affect other tools later in the same message thread. To prevent this, there really needs to be sandboxing between tools. Invariant labs did this with tool descriptions [1], but I also achieved the same via MCP resource attachments[2]. It's a pretty major flaw exacerbated by the type of privilege and systems people are giving MCP servers access to.

This isn't necessarily the fault of the spec itself, but how most clients have implemented it allows for some pretty major prompt injections.

[1] https://invariantlabs.ai/blog/mcp-security-notification-tool... [2] https://www.bernardiq.com/blog/resource-poisoning/

cyanydeez • 7 days ago

Isn't this basically a lot of hand waving that ends up being isomorphic to SQL injection?

Thats what we're talking about? A bunch of systems cobbled together where one could SQL inject at any point and there's basically zero observability?

2 replies

seanhunter • 7 days ago

Yes, and the people involved in all this stuff have also reinvented SQL injection in a different way in the prompt interface, since it's impossible[1] for the model to tell what parts of the prompt are trustworthy and what parts are tainted by user input, no matter what delimeters etc you try to use. This is because what the model sees is just a bunch of token numbers. You'd need to change how the encoding and decoding steps work and change how models are trained to introduce something akin to the placeholders that solve the sql injection problem.

Therefore it's possible to prompt inject and tool inject. So you could for example prompt inject to get a model to call your tool which then does an injection to get the user to run some untrustworthy code of your own devising.

[1] See the excellent series by Simon Willison on this https://simonwillison.net/series/prompt-injection/

mlenhard • 7 days ago

Yeah, you aren't far off with SQL injection comparison. That being said it's not really a fault of the MCP spec, more so with current client implementations of it.