MCP Server and Client in Spring AI: Decoupling Tools from AI Hosts

Author: Nino, Senior Tech Editor

Building Large Language Model (LLM) applications with Spring Boot has become significantly easier with the advent of Spring AI. However, as applications scale, a common architectural bottleneck emerges: tool coupling. When you register tools directly within your AI host application using @Bean and @Tool annotations, you create a monolithic dependency that hinders agility and scalability.

In this guide, we explore the Model Context Protocol (MCP), a standardized way to separate tool logic from the AI orchestration layer. By using n1n.ai as your high-performance API gateway for models like OpenAI o3 and Claude 3.5 Sonnet, combined with a decoupled MCP architecture, you can build enterprise-grade AI systems that are both flexible and robust.

The Problem: The Tool-Coupling Monolith

Most developers start their Spring AI journey by embedding tools directly into the chat service. While this works for prototypes, it introduces several critical issues in production:

  1. Deployment Coupling: Any update to a tool's logic (e.g., changing a database query in an OrderTool) requires a full redeploy of the AI service, even if the LLM logic remains unchanged.
  2. Lack of Reusability: If multiple AI applications (e.g., a customer support bot and an internal analytics tool) need the same "Inventory Search" tool, you are forced to copy-paste code or manage complex shared libraries.
  3. Trust and Security Boundaries: A bug in a tool can potentially crash the main AI service. Furthermore, tools often require specific permissions that the AI host shouldn't necessarily possess.
  4. Static Inventory: Tools are typically fixed at startup. Adding a new capability usually requires a restart, preventing dynamic runtime updates.
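For contrast, the tightly coupled baseline usually looks like the sketch below: a hypothetical OrderTool compiled directly into the AI host (names and wiring here are illustrative, not prescribed by Spring AI):

```java
@Configuration
class CoupledChatConfig {

    // The tool class lives inside the AI host itself:
    // any change to its logic means redeploying the whole service.
    @Bean
    ChatClient chatClient(ChatModel chatModel) {
        return ChatClient.builder(chatModel)
                .defaultTools(new OrderTool()) // @Tool-annotated object, fixed at startup
                .build();
    }
}
```

Every drawback listed above follows from this one line: the tool inventory is baked into the host at build time.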

Enter Model Context Protocol (MCP)

MCP is an open standard that allows AI models to interact with external tools and data sources through a standardized interface. Instead of the AI host "owning" the tools, it acts as an MCP Client that connects to one or more MCP Servers.
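Under the hood, tool discovery is a plain JSON-RPC 2.0 exchange defined by the MCP specification. A tools/list request and an abridged response look roughly like this (the tool shown is illustrative):

```json
{ "jsonrpc": "2.0", "id": 1, "method": "tools/list" }

{ "jsonrpc": "2.0", "id": 1,
  "result": { "tools": [ {
      "name": "getOrderStatus",
      "description": "Get the current status and details of an order by its ID",
      "inputSchema": {
        "type": "object",
        "properties": { "orderId": { "type": "string" } },
        "required": ["orderId"]
      } } ] } }
```

Because both sides speak this wire format, the server and client can be written in different languages and deployed independently.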

When integrated with an aggregator like n1n.ai, which provides unified access to the world's most powerful models, MCP allows you to swap both the "brain" (the model) and the "hands" (the tools) without rewriting your core application logic.

Architecture Overview

Our implementation consists of two independent Spring Boot services:

  • MCP Tool Server (Port 8080): Hosts the actual business logic. It exposes @Tool-annotated methods over Streamable HTTP.
  • AI Chat Service (Port 8081): The user-facing gateway. It uses Spring AI's ChatClient and acts as an MCP Client to dynamically discover tools from the server.

Step 1: Building the MCP Tool Server

First, we need the specialized starter for the MCP server. In your pom.xml:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-server-webmvc</artifactId>
</dependency>

Now, define your tool. Notice how we use the @Tool annotation, which Spring AI leverages to generate the JSON Schema required by models like DeepSeek-V3 or GPT-4o.

@Service
public class OrderTool {

    @Tool(description = "Get the current status and details of an order by its ID")
    public Map<String, Object> getOrderStatus(
            @ToolParam(description = "The unique order identifier, e.g. ORD-12345")
            String orderId) {

        // Mock logic - in production, this would call a DB or another API
        return Map.of(
                "orderId", orderId,
                "status", "SHIPPED",
                "estimatedDelivery", "2023-10-25"
        );
    }
}
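Annotating the method is not quite enough on its own: the MCP server starter discovers tools through a ToolCallbackProvider bean. A minimal registration for the OrderTool above could look like this:

```java
@Configuration
public class ToolConfig {

    // Expose the @Tool-annotated methods of OrderTool to the MCP server,
    // which advertises them to connected clients via tools/list.
    @Bean
    public ToolCallbackProvider orderTools(OrderTool orderTool) {
        return MethodToolCallbackProvider.builder()
                .toolObjects(orderTool)
                .build();
    }
}
```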

Configure the server in application.properties to use the STREAMABLE protocol, which allows for persistent sessions:

spring.ai.mcp.server.name=order-tool-server
spring.ai.mcp.server.version=1.0.0
spring.ai.mcp.server.protocol=STREAMABLE
server.port=8080

Step 2: Implementing the AI Chat Service (MCP Client)

The client service needs to connect to the server and the LLM provider. We recommend using n1n.ai to access OpenAI or Anthropic models with lower latency and higher reliability.

Add the dependencies:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-client</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

Configure the client to point to the Tool Server:

spring.ai.mcp.client.toolcallback.enabled=true
spring.ai.mcp.client.connections.tool-server.url=http://localhost:8080/mcp
spring.ai.mcp.client.connections.tool-server.transport=STREAMABLE_HTTP
# Use n1n.ai endpoint for the LLM
spring.ai.openai.base-url=https://api.n1n.ai/v1
spring.ai.openai.api-key=${N1N_API_KEY}

Finally, wire the SyncMcpToolCallbackProvider into your ChatClient. This is where the magic happens: the client will automatically fetch the tool definitions from the server.

@Configuration
public class ChatConfig {

    @Bean
    ChatClient chatClient(ChatModel chatModel,
                          SyncMcpToolCallbackProvider toolCallbackProvider) {
        return ChatClient.builder(chatModel)
                .defaultToolCallbacks(toolCallbackProvider)
                .build();
    }
}
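The AI Chat Service also needs a user-facing endpoint; this guide assumes a POST /api/chat route. One possible sketch (the controller and its request/response records are assumptions, not part of any starter):

```java
@RestController
@RequestMapping("/api")
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    record ChatRequest(String message) {}
    record ChatResponse(String reply) {}

    // Any MCP-discovered tools are invoked transparently during this call.
    @PostMapping("/chat")
    public ChatResponse chat(@RequestBody ChatRequest request) {
        String reply = chatClient.prompt()
                .user(request.message())
                .call()
                .content();
        return new ChatResponse(reply);
    }
}
```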

Pro Tip: Dynamic Tool Discovery

One of the greatest advantages of this setup is dynamic discovery. Because the tool list is fetched from the server over the MCP connection rather than compiled into the client, you can add new tools to your Tool Server without restarting your AI Chat Service.

| Feature    | Internal Tools (@Bean) | MCP Tools (Decoupled)  |
|------------|------------------------|------------------------|
| Scaling    | Scales with AI service | Independent scaling    |
| Updates    | Requires restart       | Hot-swappable          |
| Language   | Java only              | Language agnostic      |
| Visibility | Opaque                 | Structured logs/traces |

Testing the Implementation

Once both services are running, you can test the flow using a simple cURL request to the AI Chat Service:

curl -X POST http://localhost:8081/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message":"Where is my order ORD-999?"}'

The Execution Flow:

  1. The AI Chat Service receives the prompt.
  2. It queries the MCP Tool Server for available tools.
  3. It sends the prompt + tool definitions to the LLM (via n1n.ai).
  4. The LLM (e.g., OpenAI o3) returns a tool call request for getOrderStatus.
  5. The AI Chat Service executes the tool call against the MCP Tool Server.
  6. The result is sent back to the LLM to generate the final natural language response.
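On the wire, step 4 is an OpenAI-style tool call in the assistant message (abridged; exact field names vary by provider):

```json
{
  "role": "assistant",
  "tool_calls": [{
    "id": "call_abc123",
    "type": "function",
    "function": {
      "name": "getOrderStatus",
      "arguments": "{\"orderId\": \"ORD-999\"}"
    }
  }]
}
```

The AI Chat Service parses this, forwards the call to the MCP Tool Server, and feeds the tool's JSON result back into the conversation.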

Advanced: Stateless vs. Stateful

By default, the server uses STREAMABLE mode, which maintains session affinity. However, for high-availability production environments using Kubernetes, you might prefer STATELESS mode:

spring.ai.mcp.server.protocol=STATELESS

In stateless mode, every request is self-contained, allowing your load balancer to distribute traffic across multiple tool server instances without relying on sticky sessions.

Conclusion

Decoupling your tools from your AI host using the Model Context Protocol is a prerequisite for building maintainable, enterprise-scale AI applications. It allows for cleaner code, faster deployment cycles, and better resource utilization. When combined with the high-speed LLM APIs provided by n1n.ai, you have a foundation capable of supporting the most demanding RAG and Agentic workflows.

Get a free API key at n1n.ai.