Master the Claude API for Streaming and Tool Use
Author: Nino, Senior Tech Editor
The landscape of Large Language Models (LLMs) has shifted dramatically with the release of Claude 3.5 Sonnet and the robust Anthropic Messages API. For developers, the challenge is no longer just about getting a response—it is about building low-latency, agentic, and reliable systems. By leveraging n1n.ai, developers can seamlessly integrate these powerful models into their stack with a single unified interface. This guide explores the advanced implementation of Claude's core features: System Prompts, Streaming, and Tool Use (Function Calling).
1. Setting Up Your Claude Environment
To begin building, you need the official Anthropic SDK. However, for production environments where you might need to switch between Claude, GPT-4o, or DeepSeek-V3, using an aggregator like n1n.ai is highly recommended to avoid vendor lock-in and manage multiple API keys through one dashboard.
First, install the library:
npm install @anthropic-ai/sdk
Initialize your client. If you are using n1n.ai to access Claude, you would simply update the base URL and use your n1n.ai API key:
import Anthropic from '@anthropic-ai/sdk'
const client = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
// Optional: baseURL: 'https://api.n1n.ai/v1'
})
2. The Power of System Prompts
Unlike older models where instructions were mixed with user content, Claude uses a dedicated system parameter. This separation ensures that the model adheres strictly to its persona and constraints, even during long conversations. This is particularly vital for technical tasks like code review or data extraction.
Pro Tip: Use XML Tags
Claude is specifically trained to recognize structure within XML tags. When providing complex instructions or context, wrap them in tags like <instructions> or <context>.
const response = await client.messages.create({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 2048,
system: `You are a senior TypeScript engineer.
<constraints>
- Focus on security and performance.
- Always return valid JSON.
- Severity levels must be: 'error', 'warning', or 'info'.
</constraints>`,
messages: [{ role: 'user', content: 'Review this code: const x = eval(input);' }],
})
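Because the system prompt demands valid JSON with fixed severity levels, the reply text still needs to be parsed and validated before you trust it. The sketch below is a hypothetical helper (not part of the SDK) that assumes the model returned a JSON array of issues, possibly wrapped in a markdown code fence:

```typescript
type Severity = 'error' | 'warning' | 'info'

interface ReviewIssue {
  severity: Severity
  message: string
}

// Hypothetical helper: extract and validate the JSON review from Claude's reply.
function parseReview(text: string): ReviewIssue[] {
  // Strip an optional ```json fence before parsing
  const cleaned = text.replace(/^```(?:json)?\s*/m, '').replace(/```\s*$/m, '')
  const issues = JSON.parse(cleaned) as ReviewIssue[]
  const valid: Severity[] = ['error', 'warning', 'info']
  for (const issue of issues) {
    if (!valid.includes(issue.severity)) {
      throw new Error(`Unexpected severity: ${issue.severity}`)
    }
  }
  return issues
}
```

In production you would call this on the text block of `response.content` and treat a parse failure as a signal to retry with a corrective message.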
3. Implementing Real-Time Streaming
For modern web applications, waiting for a full 1,000-word response to finish generating creates a poor user experience. Streaming lets you pipe the model's output to the frontend as it is produced, significantly reducing the "Time to First Token" (TTFT).
Anthropic's SDK provides a high-level helper for streaming that handles event parsing automatically. Here is how you can implement a streaming endpoint in a Node.js environment:
const stream = client.messages.stream({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 4096,
messages: [
{ role: 'user', content: 'Write a detailed technical architecture for a RAG system.' },
],
})
// Iterate over chunks as they arrive
for await (const event of stream) {
if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
// In a real app, send this to the client via WebSockets or SSE
process.stdout.write(event.delta.text)
}
}
const finalMessage = await stream.getFinalMessage()
console.log('\nStream completed. Output tokens:', finalMessage.usage.output_tokens)
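The comment above mentions forwarding chunks over WebSockets or SSE. As a minimal sketch, the helper below (hypothetical, not part of the SDK) turns a text-delta event into a standard Server-Sent Events frame; in an Express handler you would `res.write()` each non-null frame:

```typescript
// Format a streamed text delta as an SSE frame ("data: ...\n\n").
// Non-text events (message_start, ping, etc.) are ignored.
function toSseFrame(event: {
  type: string
  delta?: { type?: string; text?: string }
}): string | null {
  if (
    event.type === 'content_block_delta' &&
    event.delta?.type === 'text_delta' &&
    event.delta.text !== undefined
  ) {
    // JSON-encode so newlines inside the chunk cannot break the frame
    return `data: ${JSON.stringify({ text: event.delta.text })}\n\n`
  }
  return null
}
```

Keeping the formatting logic pure like this makes it easy to unit-test independently of the network layer.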
4. Tool Use: Building Agentic Workflows
Tool use (also known as Function Calling) is what transforms an LLM from a chatbot into an agent. It allows Claude to interact with external APIs, databases, or local scripts.
The Two-Step Loop
- Model Call: You define the tools available. Claude decides if it needs to use one.
- Execution: Your code executes the tool and sends the result back to Claude.
const tools = [
{
name: 'fetch_user_data',
description: 'Retrieve user details from the database by ID',
input_schema: {
type: 'object' as const,
properties: {
userId: { type: 'string', description: 'The UUID of the user' },
},
required: ['userId'],
},
},
]
const response = await client.messages.create({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 1024,
tools: tools,
messages: [{ role: 'user', content: 'Get details for user 123-abc' }],
})
if (response.stop_reason === 'tool_use') {
const toolCall = response.content.find((c) => c.type === 'tool_use')
if (toolCall) {
// Execute your local logic here (myDatabase stands in for your own data layer)
const { userId } = toolCall.input as { userId: string }
const result = await myDatabase.getUser(userId)
// Send the result back to Claude
const finalResponse = await client.messages.create({
model: 'claude-3-5-sonnet-20241022',
max_tokens: 1024,
tools: tools, // resend the tool definitions along with the tool result
messages: [
{ role: 'user', content: 'Get details for user 123-abc' },
{ role: 'assistant', content: response.content },
{
role: 'user',
content: [
{
type: 'tool_result',
tool_use_id: toolCall.id,
content: JSON.stringify(result),
},
],
},
],
})
}
}
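The single round-trip above generalizes to a loop: keep calling the model until it stops requesting tools. The sketch below injects `callModel` and `runTool` as parameters (hypothetical wrappers around `client.messages.create` and your own tool implementations) so the loop logic stays self-contained and testable:

```typescript
interface ModelTurn {
  stop_reason: string
  content: any[]
}

// Generalized agent loop: execute every requested tool, feed the results
// back, and repeat until the model stops with something other than tool_use.
async function agentLoop(
  callModel: (messages: any[]) => Promise<ModelTurn>,
  runTool: (name: string, input: any) => Promise<unknown>,
  messages: any[],
  maxTurns = 5, // safety cap so a confused model cannot loop forever
): Promise<ModelTurn> {
  let turn = await callModel(messages)
  for (let i = 0; i < maxTurns && turn.stop_reason === 'tool_use'; i++) {
    const calls = turn.content.filter((c) => c.type === 'tool_use')
    const results = []
    for (const call of calls) {
      results.push({
        type: 'tool_result',
        tool_use_id: call.id,
        content: JSON.stringify(await runTool(call.name, call.input)),
      })
    }
    messages = [
      ...messages,
      { role: 'assistant', content: turn.content },
      { role: 'user', content: results },
    ]
    turn = await callModel(messages)
  }
  return turn
}
```

The `maxTurns` cap is a practical safeguard: agentic loops should always have an upper bound on model calls.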
5. Model Selection and Cost Benchmarks
Choosing the right model is critical for balancing performance and budget. Below is a comparison of the Claude 3 family available via n1n.ai.
| Model | Latency | Reasoning | Use Case |
|---|---|---|---|
| Claude 3.5 Sonnet | Moderate | Exceptional | Coding, complex RAG, agentic loops |
| Claude 3 Haiku | Ultra-Low | Good | Classification, simple extraction, high-volume |
| Claude 3 Opus | High | Maximum | Deep research, complex math, multi-step logic |
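As a rough illustration of routing on this table, a small helper could map task types to model IDs. The snapshot names used here are the ones current at the time of writing; adjust them for your account and provider:

```typescript
type Task = 'classification' | 'coding' | 'deep-research'

// Pick the cheapest model in the table that satisfies the task's reasoning needs.
function pickModel(task: Task): string {
  switch (task) {
    case 'classification':
      return 'claude-3-haiku-20240307' // ultra-low latency, high volume
    case 'coding':
      return 'claude-3-5-sonnet-20241022' // exceptional reasoning at moderate cost
    case 'deep-research':
      return 'claude-3-opus-20240229' // maximum reasoning depth
  }
}
```

A gateway like n1n.ai can also perform this kind of routing for you at the infrastructure level.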
6. Optimization and Best Practices
Prompt Caching
Claude now supports Prompt Caching, which can reduce costs by up to 90% for long-context applications like legal document analysis or codebase chat. By marking specific blocks of text as static, you avoid paying for them repeatedly in a conversation.
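A sketch of what this looks like in the request body, assuming prompt caching is enabled for your account and model: the large static block carries a `cache_control` marker, while the user turn stays uncached. `longContractText` is a placeholder for your document:

```typescript
const longContractText = '...full contract text here...' // placeholder

// Request payload: everything up to and including the cache_control block
// is cached, so later turns in the conversation reuse it at reduced cost.
const cachedRequest = {
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  system: [
    { type: 'text', text: 'You are a contract analysis assistant.' },
    {
      type: 'text',
      text: longContractText,
      cache_control: { type: 'ephemeral' as const },
    },
  ],
  messages: [{ role: 'user' as const, content: 'Summarize the termination clauses.' }],
}
// Pass to the API: await client.messages.create(cachedRequest)
```

Only the static prefix should be marked; anything after the marker (such as the changing user turns) is billed normally.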
Temperature and Top-P
For technical tasks like code generation, keep temperature low (e.g., 0.2). For creative writing or brainstorming, increase it to 0.8.
Error Handling
Always implement exponential backoff for rate limits. If you are using n1n.ai, many of these routing and fallback mechanisms can be handled at the gateway level, ensuring your application remains resilient even if a specific provider's endpoint experiences downtime.
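A minimal backoff wrapper might look like the sketch below. The status codes checked (429 for rate limits, 529 for overload) match Anthropic's documented errors; the base delay and retry count are illustrative and should be tuned for your traffic:

```typescript
// Retry a request with exponential backoff plus a little jitter.
// Doubles the delay each attempt; rethrows non-retryable errors immediately.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (err: any) {
      const retryable = err?.status === 429 || err?.status === 529
      if (!retryable || attempt >= maxRetries) throw err
      const delay = baseMs * 2 ** attempt + Math.random() * 100 // jitter avoids thundering herds
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
}
```

Usage: `await withBackoff(() => client.messages.create(params))`.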
Conclusion
Building with Claude 3.5 Sonnet requires a shift toward structured, tool-oriented design. By mastering system prompts, implementing streaming for UX, and utilizing tool use for automation, you can build applications that were impossible just a year ago.
Ready to scale your AI infrastructure? Get a free API key at n1n.ai.