Building a Lightweight Research Agent with Gemma 2, Ollama, and OpenAI Agents SDK

Author: Nino, Senior Tech Editor

The evolution of Large Language Models (LLMs) has moved rapidly from simple chat interfaces to autonomous agents capable of using tools, browsing the web, and executing complex workflows. While cloud-based models like GPT-4o or Claude 3.5 Sonnet are the standard for high-reasoning tasks, the rise of powerful open-weights models like Gemma 2 has made it possible to build sophisticated, private, and cost-effective agents locally.

In this tutorial, we will explore how to bridge the gap between a local LLM and a functional 'Tool-Using Agent' by combining Ollama, OpenAI's Python tooling (the `openai` client and the Agents SDK built on it), and Tavily's search API. The examples call Tavily through its Python client rather than through the Model Context Protocol (MCP). When your local resources are stretched or you need production-grade reliability, platforms like n1n.ai provide the infrastructure to scale these agentic workflows.

The Architecture of a Modern Agent

Traditional LLM applications follow a linear 'Prompt-Response' pattern. Agents, however, run in a loop: the model reasons about the next step, acts by calling a tool, observes the result, and repeats until it can answer. To build a research agent, we need four core components:

  1. The Brain (LLM): Gemma 2 (9B or 27B), served via Ollama.
  2. The Framework: OpenAI's Agents SDK (or Swarm) to handle state and tool orchestration.
  3. The Tools: Tavily for high-quality, AI-optimized web searching.
  4. The Bridge: An OpenAI-compatible API layer to allow the SDK to talk to local models.

Step 1: Setting Up the Local Brain with Ollama

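Ollama is one of the most widely used tools for running models locally. For an agentic workflow, Gemma 2 is a good choice thanks to its strong instruction following, which tool calling depends on. Note, however, that Gemma 2 has no dedicated function-calling format, so whether Ollama accepts tool definitions for it depends on the chat template the model is served with.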

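First, install Ollama. Start the server if it is not already running (the desktop app and the Linux installer usually run it as a background service, and `ollama serve` keeps the terminal busy), then pull the model from a second terminal:

ollama serve
ollama pull gemma2:9b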

By default, Ollama serves an OpenAI-compatible API at http://localhost:11434/v1. This matters because OpenAI client libraries, including the Agents SDK, can talk to the local model once you change the base URL; the rest of the request code stays the same.
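Before wiring up the agent, it can help to confirm that the server is reachable and the model has been pulled. The sketch below uses only the Python standard library and queries the `/models` route of Ollama's OpenAI-compatible API. It assumes the returned model IDs use the same `name:tag` form as `ollama pull` (for example `gemma2:9b`); the helper names are illustrative and not part of Ollama.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434/v1"


def list_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /models response body."""
    return [m["id"] for m in payload.get("data", [])]


def fetch_models(base_url: str = OLLAMA_BASE_URL, timeout: float = 5.0) -> dict:
    """GET {base_url}/models and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return json.load(resp)


def model_is_available(model: str, base_url: str = OLLAMA_BASE_URL) -> bool:
    """Return True if the server answers and lists `model` (e.g. 'gemma2:9b')."""
    try:
        return model in list_model_ids(fetch_models(base_url))
    except OSError:
        # URLError, connection refused, and timeouts all subclass OSError:
        # treat them as "server not reachable".
        return False
```

`model_is_available("gemma2:9b")` returns `False` both when the server is down and when the model has not been pulled yet, so a failed check points you back to the two commands above.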

Step 2: Integrating the OpenAI Agents SDK

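The OpenAI Agents SDK is designed to simplify handoffs between specialized agents, and it provides a clean abstraction for defining tools ('functions') that the model can invoke. It can target any OpenAI-compatible Chat Completions endpoint, which is how it reaches a local Ollama server. To keep every step of the tool-calling loop visible, the code in this tutorial uses the lower-level `openai` client directly.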

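To install the dependencies (`openai-agents` is the Agents SDK package itself; the code below calls the lower-level `openai` client):

pip install openai openai-agents tavily-python python-dotenv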

Step 3: Implementing the Search Tool (Tavily)

Agents are only as good as the information they can access. A local LLM's knowledge is frozen at its training cutoff, so current facts have to come from a tool. Tavily provides a search API purpose-built for LLMs that returns cleaned-up text snippets in JSON rather than raw HTML.

Here is how we define a search tool for our agent:

import os

from dotenv import load_dotenv
from tavily import TavilyClient

load_dotenv()  # read TAVILY_API_KEY from a local .env file, if one exists

tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

def web_search(query: str) -> str:
    """Search the web for the latest information on a topic."""
    print(f"[Tool] Searching for: {query}...")
    response = tavily.search(query=query, search_depth="advanced")
    # Flatten the results into plain text the model can read.
    return "\n".join(
        f"Source: {r['url']}\nContent: {r['content']}" for r in response["results"]
    )
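The JSON schema passed in `tools` in Step 4 mirrors the Python signature of `web_search`. Rather than maintaining both by hand, a small helper can derive the schema from the function itself. This is a hypothetical helper, not part of the `openai` or `tavily` packages; it assumes parameters are annotated with `str`, `int`, `float`, or `bool` and falls back to `"string"` for anything else.

```python
import inspect

# Assumption: tools only take simple scalar parameters, as web_search does.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_schema(fn) -> dict:
    """Build an OpenAI-style tool definition from a function's signature."""
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        # Parameters without a default value must be supplied by the model.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }
```

Calling `function_schema(web_search)` yields a `tools` entry whose description comes from the docstring, so renaming a parameter or editing the docstring updates what the model sees automatically.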

Step 4: Orchestrating the Agent

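Now we connect the pieces. We point the OpenAI client at the local Ollama instance and send a request that exposes web_search as a tool. This single call only lets the model request a search; executing the tool and returning its output to the model is a separate step.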

from openai import OpenAI

# Point to local Ollama
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama" # Required but ignored by Ollama
)

# Define the agent logic
def run_research_agent(user_prompt):
    messages = [
        {"role": "system", "content": "You are a helpful research assistant. Use the web_search tool to find facts."},
        {"role": "user", "content": user_prompt}
    ]

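    # Ollama only accepts `tools` for models whose chat template supports
    # tool calling; if your gemma2 build lacks that support, the server
    # returns an error and you need a model whose template supports tools.
    response = client.chat.completions.create(
        model="gemma2:9b",
        messages=messages,
        temperature=0,  # low randomness keeps tool-call JSON more consistent
        tools=[{
            "type": "function",
            "function": {
                "name": "web_search",
                "description": "Search the web for up-to-date information on a topic.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "The search query"}
                    },
                    "required": ["query"]
                }
            }
        }]
    )
    # The reply either answers directly or lists requested calls in
    # response.choices[0].message.tool_calls; nothing here executes them yet.
    return response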
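The request above stops after one round trip. To close the reason-act loop described in the architecture section, the program has to run whatever tool the model asks for, append the result as a `tool` message, and call the model again. The sketch below does that with any client exposing `chat.completions.create`; `run_tool_loop`, `registry`, and `max_steps` are illustrative names, and it still assumes the served model supports tool calls.

```python
import json


def run_tool_loop(client, model, messages, tools, registry, max_steps=5):
    """Call the model, run requested tools, and repeat until it answers.

    `registry` maps tool names to Python callables, e.g. {"web_search": web_search}.
    """
    messages = list(messages)  # work on a copy; the caller's list is untouched
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools, temperature=0
        )
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content  # no tool requested: this is the final answer
        # Record the assistant's tool request so the model sees it next turn.
        messages.append({
            "role": "assistant",
            "content": msg.content or "",
            "tool_calls": [
                {"id": c.id, "type": "function",
                 "function": {"name": c.function.name,
                              "arguments": c.function.arguments}}
                for c in msg.tool_calls
            ],
        })
        # Act: run each requested tool and feed its output back as a tool message.
        for call in msg.tool_calls:
            fn = registry.get(call.function.name)
            if fn is None:
                result = f"Error: unknown tool {call.function.name!r}"
            else:
                try:
                    args = json.loads(call.function.arguments or "{}")
                    result = str(fn(**args))
                except Exception as exc:  # report bad arguments or tool failures to the model
                    result = f"Error: {exc}"
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": result}
            )
    raise RuntimeError(f"No final answer after {max_steps} steps")
```

With the client defined in this step, a call could look like `run_tool_loop(client, "gemma2:9b", messages, tools, {"web_search": web_search})`, reusing the message list and tool definition from `run_research_agent`. Taking the client as a parameter also lets you test the loop with a stub instead of a live server.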

Pro Tip: Handling Context Limits and Latency

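When running locally, you might encounter performance bottlenecks. Gemma 2 (both 9B and 27B) has a context window of 8,192 tokens, and Ollama may load the model with a smaller default context unless you raise its `num_ctx` setting. If your search results exceed that limit, Ollama typically truncates the prompt, and the agent can lose track of its instructions or of the question itself.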
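One simple guard is to cap the search output before it reaches the model. The sketch below trims each result and stops adding results once an overall character budget is reached. The defaults are assumptions: at a rough four characters per English token, 6,000 characters is on the order of 1,500 tokens, which leaves room for instructions and the answer; tune both limits for the context size your server actually uses. `fit_to_budget` is a hypothetical helper that expects the `url`/`content` result dicts used in `web_search`.

```python
def fit_to_budget(results, max_chars=6000, per_result_chars=1500):
    """Format search results, keeping the combined text under max_chars."""
    chunks, used = [], 0
    for r in results:
        # Trim each result so one long page cannot crowd out the others.
        chunk = f"Source: {r['url']}\nContent: {r['content'][:per_result_chars]}"
        if used + len(chunk) > max_chars:
            break  # keep ranking order; stop before exceeding the budget
        chunks.append(chunk)
        used += len(chunk) + 1  # +1 for the newline that joins chunks
    return "\n".join(chunks)
```

Inside `web_search`, returning `fit_to_budget(response["results"])` instead of joining every result keeps the tool output at a predictable size.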

Optimization Strategies:

  • Summarization: Before feeding search results back to the agent, use a secondary call to summarize the text.
  • Hybrid Execution: Use local models for sensitive data and n1n.ai for complex reasoning tasks that require larger context windows or faster inference speeds.
  • Temperature Control: Set temperature to 0 for tool calling to make the generated JSON arguments more consistent (it reduces, but does not eliminate, malformed calls).
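The summarization strategy can reuse the same Ollama client with a second, tool-free request. This is a minimal sketch; the prompt wording and the `max_words` limit are assumptions, and a 9B model will not always respect a word limit exactly.

```python
def summarize_results(client, text, model="gemma2:9b", max_words=150):
    """Compress raw search results with a separate, tool-free model call."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # favor stable, repeatable summaries
        messages=[
            {"role": "system",
             "content": f"Summarize the search results in at most {max_words} words. "
                        "Keep every source URL that supports a fact."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content
```

For example, `summarize_results(client, web_search(query))` shrinks the tool output before it is appended to the agent's conversation, at the cost of one extra local model call.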

Moving Beyond Local: The n1n.ai Advantage

While building locally is great for development, production environments often require higher availability and lower latency than a local GPU can provide. n1n.ai acts as a high-speed aggregator, allowing you to switch between open-weights models (like Llama 3.1 or DeepSeek) and flagship models (like GPT-4o) with a single API key.

By routing your agentic workflows through n1n.ai, you benefit from:

  1. Unified API: Moving from local Ollama to cloud-hosted Llama 3 means changing the base URL, API key, and model name rather than your application code.
  2. Cost Efficiency: Access the world's most powerful models at the lowest possible price point.
  3. Reliability: Automatic failover and high-concurrency support.
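Assuming the hosted provider also exposes an OpenAI-compatible endpoint, switching backends mostly comes down to configuration. The sketch below reads the base URL, API key, and model name from environment variables and defaults to the local Ollama setup. The variable names `LLM_BASE_URL`, `LLM_API_KEY`, and `LLM_MODEL` are hypothetical, and a hosted provider's real base URL and model identifiers come from its own documentation.

```python
import os

# Hypothetical variable names; the defaults point at the local Ollama server.
DEFAULTS = {
    "LLM_BASE_URL": "http://localhost:11434/v1",
    "LLM_API_KEY": "ollama",
    "LLM_MODEL": "gemma2:9b",
}


def backend_settings(environ=None) -> dict:
    """Resolve base URL, API key, and model name from the environment."""
    env = os.environ if environ is None else environ

    def pick(key):
        # Fall back to the local default when a variable is unset or empty.
        return env.get(key) or DEFAULTS[key]

    return {
        "base_url": pick("LLM_BASE_URL"),
        "api_key": pick("LLM_API_KEY"),
        "model": pick("LLM_MODEL"),
    }
```

Usage: `s = backend_settings()`, then `OpenAI(base_url=s["base_url"], api_key=s["api_key"])`, passing `s["model"]` as the `model` argument; the agent code itself stays the same across backends.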

Conclusion

Transforming a local LLM into a tool-using agent is the first step toward building truly autonomous AI systems. By leveraging Gemma 2, Ollama, and the OpenAI SDK, developers can prototype sophisticated research tools on their own hardware. However, for scaling and enterprise-grade performance, integrating a robust API provider is essential.

Get a free API key at n1n.ai