MCP Server and Client in Spring AI: Decoupling Tools from AI Hosts
Author: Nino, Senior Tech Editor
Building Large Language Model (LLM) applications with Spring Boot has become significantly easier with the advent of Spring AI. However, as applications scale, a common architectural bottleneck emerges: tool coupling. When you register tools directly within your AI host application using @Bean and @Tool annotations, you create a monolithic dependency that hinders agility and scalability.
In this guide, we explore the Model Context Protocol (MCP), a revolutionary approach to separating tool logic from the AI orchestration layer. By using n1n.ai as your high-performance API gateway for models like OpenAI o3 and Claude 3.5 Sonnet, combined with a decoupled MCP architecture, you can build enterprise-grade AI systems that are both flexible and robust.
The Problem: The Tool-Coupling Monolith
Most developers start their Spring AI journey by embedding tools directly into the chat service. While this works for prototypes, it introduces several critical issues in production:
- Deployment Coupling: Any update to a tool's logic (e.g., changing a database query in an OrderTool) requires a full redeploy of the AI service, even if the LLM logic remains unchanged.
- Lack of Reusability: If multiple AI applications (e.g., a customer support bot and an internal analytics tool) need the same "Inventory Search" tool, you are forced to copy-paste code or manage complex shared libraries.
- Trust and Security Boundaries: A bug in a tool can potentially crash the main AI service. Furthermore, tools often require specific permissions that the AI host shouldn't necessarily possess.
- Static Inventory: Tools are typically fixed at startup. Adding a new capability usually requires a restart, preventing dynamic runtime updates.
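To make the coupling concrete, here is a minimal sketch of the anti-pattern described above (the class and method names are illustrative, not from a real project): a tool implemented and registered directly inside the AI host application.

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Anti-pattern: the tool lives inside the AI host application.
// Any change to findStock() forces a redeploy of the whole chat service.
@Configuration
public class CoupledToolConfig {

    static class InventoryTool {
        @Tool(description = "Search current stock for a product SKU")
        public int findStock(@ToolParam(description = "Product SKU") String sku) {
            // Direct data access from inside the AI host
            return 42; // placeholder value
        }
    }

    @Bean
    ChatClient chatClient(ChatModel chatModel) {
        // The tool object is compiled into, and deployed with, this service
        return ChatClient.builder(chatModel)
                .defaultTools(new InventoryTool())
                .build();
    }
}
```

Every drawback in the list above follows from this layout: the tool's inventory is fixed at startup, its failures are the host's failures, and reusing it elsewhere means copying the class.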
Enter Model Context Protocol (MCP)
MCP is an open standard that allows AI models to interact with external tools and data sources through a standardized interface. Instead of the AI host "owning" the tools, it acts as an MCP Client that connects to one or more MCP Servers.
When integrated with an aggregator like n1n.ai, which provides unified access to the world's most powerful models, MCP allows you to swap both the "brain" (the model) and the "hands" (the tools) without rewriting your core application logic.
Architecture Overview
Our implementation consists of two independent Spring Boot services:
- MCP Tool Server (Port 8080): Hosts the actual business logic. It exposes tools annotated with @Tool over Streamable HTTP.
- AI Chat Service (Port 8081): The user-facing gateway. It uses Spring AI's ChatClient and acts as an MCP Client to dynamically discover tools from the server.
Step 1: Building the MCP Tool Server
First, we need the specialized starter for the MCP server. In your pom.xml:
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-mcp-server-webmvc</artifactId>
</dependency>
Now, define your tool. Notice how we use the @Tool annotation, which Spring AI leverages to generate the JSON Schema required by models like DeepSeek-V3 or GPT-4o.
import java.util.Map;

import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;
import org.springframework.stereotype.Service;

@Service
public class OrderTool {

    @Tool(description = "Get the current status and details of an order by its ID")
    public Map<String, Object> getOrderStatus(
            @ToolParam(description = "The unique order identifier, e.g. ORD-12345")
            String orderId) {
        // Mock logic - in production, this would call a DB or another API
        return Map.of(
                "orderId", orderId,
                "status", "SHIPPED",
                "estimatedDelivery", "2023-10-25"
        );
    }
}
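Depending on your Spring AI version, @Tool-annotated beans may not be picked up by the MCP server automatically; a common pattern is to expose them explicitly through a ToolCallbackProvider bean, which is what the server starter publishes over MCP. A minimal sketch (the configuration class name is arbitrary):

```java
import org.springframework.ai.tool.ToolCallbackProvider;
import org.springframework.ai.tool.method.MethodToolCallbackProvider;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ToolServerConfig {

    // Publishes every @Tool method on OrderTool to the MCP server,
    // which generates the corresponding JSON Schema for connecting clients.
    @Bean
    ToolCallbackProvider orderTools(OrderTool orderTool) {
        return MethodToolCallbackProvider.builder()
                .toolObjects(orderTool)
                .build();
    }
}
```

With this bean in place, the server advertises getOrderStatus in its tools/list response, and any MCP client can discover it.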
Configure the server in application.properties to use the STREAMABLE protocol, which allows for persistent sessions:
spring.ai.mcp.server.name=order-tool-server
spring.ai.mcp.server.version=1.0.0
spring.ai.mcp.server.protocol=STREAMABLE
server.port=8080
Step 2: Implementing the AI Chat Service (MCP Client)
The client service needs to connect to the server and the LLM provider. We recommend using n1n.ai to access OpenAI or Anthropic models with lower latency and higher reliability.
Add the dependencies:
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-mcp-client</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
Configure the client to point to the Tool Server:
spring.ai.mcp.client.toolcallback.enabled=true
spring.ai.mcp.client.connections.tool-server.url=http://localhost:8080/mcp
spring.ai.mcp.client.connections.tool-server.transport=STREAMABLE_HTTP
# Use n1n.ai endpoint for the LLM
spring.ai.openai.base-url=https://api.n1n.ai/v1
spring.ai.openai.api-key=${N1N_API_KEY}
Finally, wire the SyncMcpToolCallbackProvider into your ChatClient. This is where the magic happens: the client will automatically fetch the tool definitions from the server.
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.mcp.SyncMcpToolCallbackProvider;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatConfig {

    @Bean
    ChatClient chatClient(ChatModel chatModel,
                          SyncMcpToolCallbackProvider toolCallbackProvider) {
        // defaultToolCallbacks registers every tool the MCP client discovered
        return ChatClient.builder(chatModel)
                .defaultToolCallbacks(toolCallbackProvider)
                .build();
    }
}
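The cURL test later in this guide assumes a small REST endpoint in front of the ChatClient, which the article does not show. Here is a hypothetical minimal controller (the /api/chat path and the request/response shapes are assumptions, chosen to match the test request):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/chat")
public class ChatController {

    record ChatRequest(String message) {}
    record ChatResponse(String reply) {}

    private final ChatClient chatClient;

    public ChatController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    // The ChatClient handles the MCP tool-call round trips transparently
    @PostMapping
    public ChatResponse chat(@RequestBody ChatRequest request) {
        String reply = chatClient.prompt()
                .user(request.message())
                .call()
                .content();
        return new ChatResponse(reply);
    }
}
```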
Pro Tip: Dynamic Tool Discovery
One of the greatest advantages of this setup is dynamic discovery. Because tool definitions are fetched from the server at runtime rather than compiled into the client, you can add new tools to your server without restarting your AI Chat Service.
| Feature | Internal Tools (@Bean) | MCP Tools (Decoupled) |
|---|---|---|
| Scaling | Scales with AI Service | Independent Scaling |
| Updates | Requires Restart | Hot-swappable |
| Language | Java Only | Language Agnostic |
| Visibility | Opaque | Structured Logs/Traces |
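If you want to inspect the live tool inventory yourself (for diagnostics or a tooling dashboard), the underlying MCP Java SDK client can list it directly. A sketch, assuming the McpSyncClient instances created by the Spring AI MCP client auto-configuration are injected as beans:

```java
import java.util.List;

import io.modelcontextprotocol.client.McpSyncClient;
import io.modelcontextprotocol.spec.McpSchema;

public class ToolInventoryInspector {

    private final List<McpSyncClient> mcpClients;

    public ToolInventoryInspector(List<McpSyncClient> mcpClients) {
        this.mcpClients = mcpClients;
    }

    // Prints the tools currently advertised by each connected MCP server
    public void printTools() {
        for (McpSyncClient client : mcpClients) {
            McpSchema.ListToolsResult result = client.listTools();
            result.tools().forEach(tool ->
                    System.out.println(tool.name() + ": " + tool.description()));
        }
    }
}
```

Calling printTools() after deploying a new tool to the server should show the addition without the chat service ever restarting.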
Testing the Implementation
Once both services are running, you can test the flow using a simple cURL request to the AI Chat Service:
curl -X POST http://localhost:8081/api/chat \
-H "Content-Type: application/json" \
-d '{"message":"Where is my order ORD-999?"}'
The Execution Flow:
1. The AI Chat Service receives the prompt.
2. It queries the MCP Tool Server for available tools.
3. It sends the prompt + tool definitions to the LLM (via n1n.ai).
4. The LLM (e.g., OpenAI o3) returns a tool call request for getOrderStatus.
5. The AI Chat Service executes the tool call against the MCP Tool Server.
6. The result is sent back to the LLM to generate the final natural language response.
Advanced: Stateless vs. Stateful
By default, the server uses STREAMABLE mode, which maintains session affinity. However, for high-availability production environments using Kubernetes, you might prefer STATELESS mode:
spring.ai.mcp.server.protocol=STATELESS
In stateless mode, every request is self-contained, allowing your load balancer to distribute traffic across multiple tool server instances without any need for sticky sessions.
Conclusion
Decoupling your tools from your AI host using the Model Context Protocol is a prerequisite for building maintainable, enterprise-scale AI applications. It allows for cleaner code, faster deployment cycles, and better resource utilization. When combined with the high-speed LLM APIs provided by n1n.ai, you have a foundation capable of supporting the most demanding RAG and Agentic workflows.
Get a free API key at n1n.ai.