MCP Support

vLLora provides full support for Model Context Protocol (MCP) servers, enabling seamless integration with external tools over HTTP, SSE, and WebSocket connections. When your model requests a tool call, vLLora automatically executes the MCP tool call on your behalf and returns the results to the model, allowing your AI models to dynamically access external data sources, APIs, databases, and tools during conversations.

What is MCP?

Model Context Protocol (MCP) is an open standard that enables AI models to seamlessly communicate with external systems. It allows models to dynamically process contextual data, ensuring efficient, adaptive, and scalable interactions. MCP simplifies request orchestration across distributed AI systems, enhancing interoperability and context-awareness.

With native tool integrations, MCP connects AI models to APIs, databases, local files, automation tools, and remote services through a standardized protocol. Developers can effortlessly integrate MCP with IDEs, business workflows, and cloud platforms, while retaining the flexibility to switch between LLM providers. This enables the creation of intelligent, multi-modal workflows where AI securely interacts with real-world data and tools.

For more details, visit the Model Context Protocol official page and explore Anthropic MCP documentation.

Using MCP with vLLora

vLLora supports two ways to use MCP servers:

  1. Configure MCP servers in settings - Set up MCP servers through the vLLora UI and use them in Chat
  2. Send MCP servers in request body - Include MCP server configuration directly in your chat completions API request

Method 1: Configure MCP Servers in Settings

You can configure MCP servers through the vLLora settings. Once configured, these servers will be available for use in the chat interface.

  1. Navigate to the Settings section in the vLLora UI
  2. Add your MCP server configuration

MCP Configuration in Settings

  3. Use the configured servers in your chat conversations

MCP Tools selection in Chat

Settings Configuration

MCP servers configured in settings are persistent and available across all your projects. This is ideal for frequently used MCP servers.
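
While the exact settings UI may differ between vLLora versions, a settings entry typically mirrors the request-body schema described under Method 2 below. A minimal sketch:

{
  "mcp_servers": [
    {
      "type": "http",
      "server_url": "https://mcp.deepwiki.com/mcp",
      "headers": {},
      "env": null
    }
  ]
}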

Method 2: Send MCP Servers in Request Body

You can include MCP server configuration directly in your chat completions request body. This method gives you full control over which MCP servers to use for each request.

Request Format

Add an mcp_servers array to your chat completions request body:

{
  "model": "openai/gpt-4o-mini",
  "messages": [
    {
      "role": "user",
      "content": "use deepwiki and get information about java"
    }
  ],
  "stream": true,
  "mcp_servers": [
    {
      "type": "http",
      "server_url": "https://mcp.deepwiki.com/mcp",
      "headers": {},
      "env": null
    }
  ]
}
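
If you are not using an SDK, the same body can be sent with any HTTP client. A minimal sketch using Python's requests library, assuming vLLora's default local address (http://localhost:9090, as used in the Python example later on this page):

import requests

# POST the request body directly to vLLora's OpenAI-compatible endpoint.
payload = {
    "model": "openai/gpt-4o-mini",
    "messages": [
        {"role": "user", "content": "use deepwiki and get information about java"}
    ],
    "stream": False,  # non-streaming keeps this sketch simple
    "mcp_servers": [
        {
            "type": "http",
            "server_url": "https://mcp.deepwiki.com/mcp",
            "headers": {},
            "env": None,
        }
    ],
}

resp = requests.post("http://localhost:9090/v1/chat/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])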

MCP Server Configuration

Each MCP server in the mcp_servers array supports the following configuration:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| type | string | Yes | Connection type for the MCP server. Must be one of: "ws" (WebSocket), "http", or "sse" (Server-Sent Events) |
| server_url | string | Yes | URL for the MCP server connection. Supports WebSocket (wss://), HTTP (https://), and SSE (https://) endpoints |
| headers | object | No | Custom HTTP headers to send with requests to the MCP server (default: {}) |
| env | object/null | No | Environment variables for the MCP server (default: null) |
| filter | array | No | Optional filter that limits which tools/resources are available from this server. Each item should have a name field (and optionally a description). The name field supports regex patterns |
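
For illustration, a server that requires authentication could be configured with a bearer token in headers. The server URL and token below are hypothetical:

{
  "type": "sse",
  "server_url": "https://example.com/mcp/sse",
  "headers": {
    "Authorization": "Bearer YOUR_API_TOKEN"
  },
  "env": null
}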

Complete Example

Here's a complete example using multiple MCP servers:

{
  "model": "openai/gpt-4o-mini",
  "messages": [
    {
      "role": "user",
      "content": "use deepwiki and get information about java"
    }
  ],
  "stream": true,
  "mcp_servers": [
    {
      "type": "http",
      "server_url": "https://mcp.deepwiki.com/mcp",
      "headers": {},
      "env": null
    },
    {
      "type": "http",
      "server_url": "https://remote.mcpservers.org/edgeone-pages/mcp",
      "headers": {},
      "env": null
    }
  ]
}

Using Filters

You can optionally filter which tools or resources are available from an MCP server by including a filter array:

{
  "mcp_servers": [
    {
      "filter": [
        {
          "name": "read_wiki_structure"
        },
        {
          "name": "read_wiki_contents"
        },
        {
          "name": "ask_question"
        }
      ],
      "type": "http",
      "server_url": "https://mcp.deepwiki.com/mcp",
      "headers": {},
      "env": null
    }
  ]
}

When filter is specified, only the tools/resources matching the filter criteria will be available to the model.
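
Because the name field supports regex patterns, a single filter entry can match several related tools. For example, this filter would match both read_wiki_structure and read_wiki_contents:

{
  "mcp_servers": [
    {
      "filter": [
        {
          "name": "read_wiki_.*"
        }
      ],
      "type": "http",
      "server_url": "https://mcp.deepwiki.com/mcp",
      "headers": {},
      "env": null
    }
  ]
}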

How MCP Tool Execution Works

When you include MCP servers in your request, vLLora:

  1. Connects to the MCP server - Establishes a connection using the specified transport type (HTTP, SSE, or WebSocket)
  2. Discovers available tools - Retrieves the list of tools and resources exposed by the MCP server
  3. Makes tools available to the model - The model can see and request these tools during the conversation
  4. Executes tool calls automatically - When the model requests a tool call, vLLora executes it on the MCP server and returns the results
  5. Traces all interactions - All MCP tool calls, their parameters, and results are captured in vLLora's tracing system

This means you don't need to handle tool execution yourself—vLLora manages the entire MCP workflow, from connection to execution to result delivery.
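
To make the flow concrete, here is a runnable toy simulation of the loop described above. It is not vLLora's code: the "model" is scripted and the "MCP server" is a stubbed tool registry standing in for the real connection and discovery steps.

# Toy simulation of the MCP workflow vLLora performs server-side.
# Steps 1-2: connect to the server and discover its tools (stubbed here).
TOOLS = {"ask_question": lambda args: f"stub answer to {args['question']!r}"}

def call_model(messages):
    # Step 3: the model sees the tools and may request a call.
    # This scripted stand-in requests one tool call, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"content": None,
                "tool_calls": [("ask_question", {"question": "what is java"})]}
    return {"content": "Java is a programming language.", "tool_calls": []}

def handle_request(messages):
    while True:
        reply = call_model(messages)
        if not reply["tool_calls"]:
            return reply["content"]          # final answer for the client
        for name, args in reply["tool_calls"]:
            result = TOOLS[name](args)       # Step 4: execute on the MCP server
            print(f"trace: {name}({args}) -> {result}")  # Step 5: trace the call
            messages.append({"role": "tool", "content": result})

print(handle_request([{"role": "user", "content": "use deepwiki"}]))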

Code Examples

Python (OpenAI SDK)
from openai import OpenAI

# Point the OpenAI SDK at your local vLLora endpoint.
client = OpenAI(
    base_url="http://localhost:9090/v1",
    api_key="no_key",
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "use deepwiki and get information about java"
        }
    ],
    stream=True,
    # mcp_servers is a vLLora extension to the OpenAI API, so pass it via extra_body.
    extra_body={
        "mcp_servers": [
            {
                "type": "http",
                "server_url": "https://mcp.deepwiki.com/mcp",
            }
        ]
    },
)

# Consume the stream; vLLora executes any MCP tool calls server-side,
# so the chunks deliver the model's final answer.
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)