I read through several of the top-level pages, then the SQLite one, but still had no idea what was meant by "context": it's a highly ambiguous word, and it's never given a concrete definition, an example, or a scope of the capability it's meant to imply.
After reading the Python server tutorial, it looks like there is some tool calling going on, in the old terminology. That makes more sense. But none of the examples seem to indicate what the protocol actually is: whether it's a RAG sort of thing, whether I need to prompt, etc.
It would be nice to provide a bit more concrete info about capabilities and what the purpose is before getting into call diagrams. What do the arrows represent? That's more important to know than the order in which a host talks to a server talks to a remote resource.
I think this is something that I really want and want to build a server for, but it's unclear to me how much more time I will have to invest before getting the basic information about it!
Thank you. That’s good feedback.
The gist of it is: you have an LLM application such as Claude Desktop. You want to have it interact (read or write) with some system you have. MCP solves this.
For example, you can give the application the database schema as a “resource”, effectively saying: here is a bunch of text, do whatever you want with it during my chat with the LLM. Or you can give the application a tool such as “query my database”. Now the model itself can decide when it wants to query (usually because you said: hey, tell me what’s in the accounts table, or something similar).
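A minimal sketch of that resource/tool split, assuming the decorator-style API of the MCP Python SDK (the server name, the schema:// URI, and the run_readonly_query helper are all illustrative, not part of the SDK):

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("my-database")

    # A resource: passive context the application can attach to the chat.
    @mcp.resource("schema://main")
    def get_schema() -> str:
        """Return the database schema as plain text."""
        return open("schema.sql").read()

    # A tool: an action the model itself can decide to invoke mid-conversation.
    @mcp.tool()
    def query(sql: str) -> str:
        """Run a SQL query against my database and return the rows as text."""
        return run_readonly_query(sql)  # hypothetical helper

    if __name__ == "__main__":
        mcp.run()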
It’s “bring the things you care about” to any LLM application with an MCP client.
Or, in short: it's (an attempt to create) a standard protocol for plugging tools into LLM apps via the good ol' tools/function-calling mechanism.
It's not introducing new capabilities, just solving the NxM problem (N apps each needing a bespoke integration with every one of M tools), hopefully leading to more tools being written.
(At least that's how I understand this. Am I far off?)
We definitely hope this will solve the NxM problem.
On tools specifically, we went back and forth about whether the other primitives of MCP ultimately just reduce to tool use, but concluded that separate concepts of "prompts" and "resources" are extremely useful to express different _intentions_ for server functionality. They all have a part to play!
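For example, a prompt is user-initiated: it surfaces in the UI as something the person explicitly picks (like a slash command), rather than something the model invokes on its own. A sketch in the Python SDK's decorator style (names here are illustrative):

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("my-database")

    # A prompt expresses a user intention: a pre-built template the human
    # selects, as opposed to a tool the model calls autonomously.
    @mcp.prompt()
    def analyze_table(table: str) -> str:
        """Prompt template for inspecting one table."""
        return f"Please describe the structure and likely purpose of the {table} table."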
I think this is where the real question is for me. When I read about MCP, the topmost question in my mind is "Why isn't this just tool calling?" I had difficulty finding an answer to this. Below, you have someone else asking "Why not just use GraphQL?" And so on.
It would probably be helpful for many of your readers if you had a focused document that addressed specifically that motivating question, together with illustrated examples. What does MCP provide, and what does it intend to solve, that a tool calling interface or RPC protocol can't?
You can find more information on some design questions like these in https://spec.modelcontextprotocol.io/specification, which is a much more "implementors" focused guide than the user documentation at https://modelcontextprotocol.io
Seems more accurate to state this reshapes the NxM problem rather than solving it.
Yeah, even I don't understand how exactly it solves the NxM problem (which, as I understand it, translates to having M different prompts for N different LLMs; correct me if I'm wrong, please).
Does it give a standard way to approve changes? I wouldn't want to give an LLM access to my database unless I can approve the changes it applies.
It seems to support your ask, as much as a protocol can. Having read all the docs and looked through some code, my mental model is:
- A host never talks to a server directly, only via a client (which presumably represents the human). The host has or is the LLM app.
- A server only supplies context data (read-only), in the form of a tool call, a direct resource URL, or a pre-populated prompt. It can call back to a client directly, for example to request something from the host's LLM.
- A client sits in the middle, representing the human in the loop. It manages the requests bidirectionally.
It seems mostly modeled around security boundaries rather than just AI capability domains. The client is always in the loop; the host and server do not communicate directly. But when I look at the filesystem server, I don't see any indication of a difference between a tool that is just reading and one that is making changes:
https://github.com/modelcontextprotocol/servers/blob/main/sr...
How can an add-on that works with arbitrary "servers" tell the difference between these two tools? Without being able to tell them apart, you can't really build a generic way to ask for confirmation in the application that is using the server (see the sketch after the listings below)...
{
name: "create_directory",
description:
"Create a new directory or ensure a directory exists. Can create multiple " +
"nested directories in one operation. If the directory already exists, " +
"this operation will succeed silently. Perfect for setting up directory " +
"structures for projects or ensuring required paths exist. Only works within allowed directories.",
inputSchema: zodToJsonSchema(CreateDirectoryArgsSchema) as ToolInput,
},
{
name: "list_directory",
description:
"Get a detailed listing of all files and directories in a specified path. " +
"Results clearly distinguish between files and directories with [FILE] and [DIR] " +
"prefixes. This tool is essential for understanding directory structure and " +
"finding specific files within a directory. Only works within allowed directories.",
inputSchema: zodToJsonSchema(ListDirectoryArgsSchema) as ToolInput,
},
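Given identical-looking metadata, the only generic safeguard I can see is blanket confirmation on the client side. Roughly (a hypothetical wrapper, assuming the MCP Python client's ClientSession):

    # Hypothetical client-side wrapper: since tool metadata carries no
    # read/write distinction, confirm every call with the human.
    async def call_tool_with_confirmation(session, name: str, arguments: dict):
        print(f"The model wants to call {name!r} with {arguments}")
        if input("Allow? [y/N] ").strip().lower() != "y":
            return {"error": "denied by user"}
        return await session.call_tool(name, arguments=arguments)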
Great work on the protocol!! I am looking for some examples of creating my own custom client with the Anthropic API leveraging MCP, but I could not find any. I pretty much want to understand how Claude Desktop integrates with an MCP server alongside the Anthropic API. Can you provide some pointers about the integration? e.g.
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    # [mcp_server]=...  ## etc.? ...
)
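From poking at the SDKs, my guess is that there is no MCP-specific parameter on the Messages API at all; the host fetches tool definitions over MCP and forwards them as ordinary tools. An untested sketch, assuming the mcp client package (my_server.py and the prompt text are placeholders):

    import asyncio
    import anthropic
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    async def main() -> None:
        # Spawn the MCP server as a subprocess and open a session to it.
        params = StdioServerParameters(command="python", args=["my_server.py"])
        async with stdio_client(params) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()

                # Translate MCP tool definitions into the Messages API format.
                listed = await session.list_tools()
                tools = [
                    {"name": t.name, "description": t.description, "input_schema": t.inputSchema}
                    for t in listed.tools
                ]

                client = anthropic.Anthropic()
                response = client.messages.create(
                    model="claude-3-5-sonnet-20241022",
                    max_tokens=1024,
                    tools=tools,
                    messages=[{"role": "user", "content": "What's in the accounts table?"}],
                )

                # If the model requested a tool, run it via MCP and feed the
                # result back in a follow-up messages.create call.
                for block in response.content:
                    if block.type == "tool_use":
                        result = await session.call_tool(block.name, arguments=block.input)
                        print(result)

    asyncio.run(main())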
At first glance it seems to be a proposed standard interface and protocol for describing and offering an external system to the function-calling facility of an LLM.
> had no idea what was meant by "context" as it's a highly ambiguous word and is never mentioned with any concrete definition
(forgive me if you know this and are asking a different question, but:)
I don't know how familiar you are with LLMs, but "context" used in that context generally has the pretty clear meaning of "the blob of text you give in between (the text of) the system prompt and (the text of) the user prompt"[1], which acts as context for the user's request (hence the name). Very often this is the conversation history in chatbot-style LLMs, but it can include stuff like the content of text files you're working with, or search/function results.
[1] If you want to be pedantic, technically each instance of "text" should say "tokens" there, and the maximum "context" length includes the length of both prompts.
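In API terms, that blob is literally just more tokens in the request. An illustrative messages payload, where the file contents and the prior turns are the context for the final question:

    # "Context" here is just text placed in the request alongside the ask.
    messages = [
        {"role": "user", "content": "Here is schema.sql:\n" + open("schema.sql").read()},
        {"role": "assistant", "content": "Got it. What would you like to know?"},
        {"role": "user", "content": "Which table stores account balances?"},
    ]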