Mastering Gemini Batch API and Webhooks for LINE Bot Restaurant Analysis

When developing Large Language Model (LLM) applications, developers frequently encounter scenarios that require processing massive datasets. Whether it is analyzing hundreds of customer reviews, classifying thousands of documents, or generating batch translations, using traditional synchronous APIs often leads to significant bottlenecks. These include hitting strict Rate Limits, facing network timeouts, and incurring high computational costs.

To address these challenges, Google introduced the Gemini Batch API and Webhook API. These tools allow for asynchronous processing, which is not only more stable but also significantly cheaper. In this tutorial, we will explore how to integrate these APIs into a LINE Bot Restaurant Analysis Assistant. By leveraging the infrastructure provided by n1n.ai, developers can seamlessly manage multiple LLM providers, including Gemini, to ensure their applications remain scalable and cost-effective.

The Synchronous API Bottleneck

In a typical real-time LLM interaction, the client sends a request and waits for a response. This works for simple chat interfaces but fails in data-heavy tasks. For instance, if a user wants to analyze the top 10 restaurants in an area, each with dozens of reviews, a synchronous call might take 30+ seconds. Most messaging platforms, like LINE, require a response within a 3-second window. Failing this results in a timeout error. Furthermore, real-time APIs are priced at a premium.

This is where n1n.ai becomes essential. As a premier LLM API aggregator, n1n.ai provides the stability and high-speed routing needed to handle the initial user interaction before offloading heavy tasks to the Gemini Batch API.

Understanding Gemini Batch and Webhook APIs

Gemini Batch API: This allows you to bundle multiple requests into a single JSONL file. Once uploaded, Gemini processes these requests in the background. The key advantages are that it does not consume your real-time Rate Limit quota and typically costs 50% less than real-time calls.
Webhook API: Instead of constantly polling the server to check if a batch job is finished (which wastes resources), the Webhook API sends an HTTP POST callback to your server the moment the task is completed.

Architectural Workflow

The following diagram illustrates the optimized flow for our Restaurant Analysis Assistant:

graph TD
    A[User Sends Location] -->|Location Message| B[Google Maps Grounding Search]
    B -->|Plain Text Restaurant List| C[Gemini-1.5-Flash Extracts Top 3 Restaurants]
    C -->|Dynamically Generates Quick Reply| D[LINE Bot Replies with 3 Customized Analysis Buttons]
    D -->|User Clicks Specific Analysis| E[FastAPI Background Task]
    E -->|Immediate Reply ACK| F[LINE Chat Message]
    E -->|Package JSONL and Upload| G[Gemini Batch API Submission]
    G -->|Computation Complete Webhook Callback| H[Proactively Pushes Deep Report to User]

Step 1: Structured Extraction for Dynamic UI

When the user sends a location, we first get a list of restaurants via Google Maps. We then use gemini-1.5-flash to extract just the names in a structured JSON format to create LINE Quick Reply buttons.

import ast
import json
import re

# Extract restaurant names for Quick Reply (the UI later keeps only the first three)
names = []
if place_type == "restaurant":
    try:
        extract_prompt = f"Please extract all restaurant names from the following text and return them in a JSON array format (e.g., [\"Restaurant A\", \"Restaurant B\"]). Output the JSON array directly without markdown tags.\n\n{result}"
        extract_res = client.models.generate_content(
            model="gemini-1.5-flash",
            contents=extract_prompt
        )
        extract_text = extract_res.text.strip() if extract_res.text else ""

        try:
            names = json.loads(extract_text)
        except json.JSONDecodeError:
            # Fallback: the model may wrap the array in markdown tags or extra prose
            array_match = re.search(r"\[(.*?)\]", extract_text, re.DOTALL)
            if array_match:
                names = ast.literal_eval(f"[{array_match.group(1)}]")

        # Guard against a non-list JSON value, then normalize to clean strings
        if not isinstance(names, list):
            names = []
        names = [str(n).strip() for n in names if n]
    except Exception as e_extract:
        logger.error(f"Extraction failed: {e_extract}")
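The parsing fallback is the part most likely to misbehave, so it can help to isolate it in a pure function that is testable without calling the model. The following is a minimal sketch under that assumption; `parse_name_array` is a hypothetical helper name, not part of the original bot.

```python
import ast
import json
import re


def parse_name_array(text: str) -> list[str]:
    """Parse a JSON array of names, tolerating markdown tags or extra prose."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = []
        # Fallback: grab the first bracketed span and evaluate it as a literal
        match = re.search(r"\[(.*?)\]", text, re.DOTALL)
        if match:
            try:
                parsed = ast.literal_eval(f"[{match.group(1)}]")
            except (ValueError, SyntaxError):
                parsed = []
    if not isinstance(parsed, list):
        return []  # e.g. the model returned an object instead of an array
    return [str(n).strip() for n in parsed if n]
```

Note that the non-greedy pattern stops at the first closing bracket, so a restaurant name that itself contains "]" would be cut short. For this use case that is probably an acceptable trade-off.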

Step 2: Handling LINE API Constraints

LINE has a strict 20-character limit for Quick Reply button labels. If the restaurant name is too long, the API will return a 400 error. We implement truncation logic to handle this safely.

quick_reply = None
if place_type == "restaurant" and result.get("status") == "success":
    restaurant_names = result.get("restaurant_names", [])
    if restaurant_names:
        buttons = []
        for name in restaurant_names[:3]:
            clean_label = name
            # LINE label limit is 20 characters including prefix
            if len(clean_label) > 10:
                clean_label = clean_label[:9] + "…"
            buttons.append(
                QuickReplyButton(
                    action=PostbackAction(
                        label=f"🍴 Analyze {clean_label}",
                        data=json.dumps({
                            "action": "specific_foodie_deep_analysis",
                            "restaurant_name": name
                        }),
                        display_text=f"🔍 Deep analysis for {name}"
                    )
                )
            )
        quick_reply = QuickReply(items=buttons)
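The hard-coded 10 in the snippet above only works for this exact prefix. As a hedged alternative, the name budget can be derived from the prefix length, so that changing the prefix does not silently break the limit. `build_label` and `LINE_LABEL_LIMIT` are hypothetical names introduced for illustration. The sketch counts Python code points, and how LINE itself counts emoji toward the limit is not verified here.

```python
LINE_LABEL_LIMIT = 20  # the Quick Reply label limit described above


def build_label(name: str, prefix: str = "🍴 Analyze ", limit: int = LINE_LABEL_LIMIT) -> str:
    """Return prefix + name, shortening the name with an ellipsis to fit the limit."""
    budget = limit - len(prefix)  # characters left for the restaurant name
    if budget < 1:
        raise ValueError("prefix leaves no room for the restaurant name")
    if len(name) > budget:
        # Reserve one character of the budget for the ellipsis
        name = name[: budget - 1] + "…"
    return prefix + name
```

The full, untruncated name still travels in the postback `data`, so shortening the label loses nothing on the analysis side.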

Step 3: Asynchronous Task Dispatching with FastAPI

To prevent the 3-second timeout, we use FastAPI's background tasks. The bot acknowledges the request immediately, then proceeds with the heavy lifting in the background.

Interaction Type	Latency	Solution
Initial User Click	< 500ms	Immediate `reply_message` ACK
Data Gathering	2-5s	Background Task (asyncio)
Gemini Batch Job	1-5 mins	Batch API + Webhook
Result Delivery	< 1s	LINE `push_message`
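The acknowledge-first pattern can be illustrated without any web framework. The sketch below uses plain `asyncio` as a stand-in for FastAPI's background tasks. It is not the bot's actual handler: `run_deep_analysis` and `handle_postback` are hypothetical names, and the sleep merely simulates the slow data gathering and batch submission.

```python
import asyncio


async def run_deep_analysis(restaurant_name: str, results: list[str]) -> None:
    """Stand-in for the slow work: data gathering plus batch submission."""
    await asyncio.sleep(0.05)  # simulates slow I/O
    results.append(f"report for {restaurant_name}")


async def handle_postback(restaurant_name: str, results: list[str], tasks: set) -> str:
    """Schedule the slow work and return the acknowledgement right away."""
    task = asyncio.create_task(run_deep_analysis(restaurant_name, results))
    tasks.add(task)  # keep a reference so the task is not garbage-collected
    task.add_done_callback(tasks.discard)
    return f"Analyzing {restaurant_name}, the report will follow shortly."


async def main() -> tuple[str, list[str], list[str]]:
    results: list[str] = []
    tasks: set = set()
    ack = await handle_postback("Sushi Ko", results, tasks)
    seen_at_ack = list(results)  # snapshot taken before the background work runs
    await asyncio.gather(*tasks)  # in a real server the event loop keeps running instead
    return ack, seen_at_ack, results
```

Running `asyncio.run(main())` shows that the acknowledgement is available while the result list is still empty, which is the property that keeps the reply inside the time window.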

Step 4: Submitting the Batch Job

A Batch Job requires a .jsonl file where each line is a valid API request.

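Example request.jsonl entry (pretty-printed here for readability; in the actual file, each request object must sit on a single line):

{
  "request": {
    "contents": [{ "parts": [{ "text": "Analyze the sentiment of these reviews: ..." }] }]
  }
}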
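Since every request has to occupy exactly one line of the file, it is safer to generate the JSONL with `json.dumps` than to write it by hand. The sketch below is one way to do that. `build_batch_jsonl` is a hypothetical helper, and the `key` field is an assumption added so results can be matched back to their requests; check the current Batch API documentation for the exact schema.

```python
import json


def build_batch_jsonl(prompts: dict[str, str]) -> str:
    """Serialize one request per line; json.dumps escapes newlines inside strings."""
    lines = []
    for key, prompt in prompts.items():
        record = {
            "key": key,  # assumption: identifier for matching results to requests
            "request": {"contents": [{"parts": [{"text": prompt}]}]},
        }
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"
```

Because newlines inside a prompt are escaped as `\n` in the output, multi-line review text cannot accidentally split a request across two lines.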

You upload this file using the Gemini SDK and specify a webhook_uri. When the job status moves from JOB_STATE_PENDING to JOB_STATE_SUCCEEDED, Google will ping your endpoint.
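On the receiving side, the webhook endpoint mainly has to decide whether a callback refers to a job we are tracking and whether that job has finished. The sketch below isolates that decision in a pure function. The payload keys `name` and `state` are hypothetical, as is the `resolve_callback` helper, and the failure states are assumed to follow the same `JOB_STATE_*` naming as the two states mentioned above. Adapt all of these to the real callback body.

```python
from typing import Optional

# Assumption: job name -> LINE user ID, recorded when the batch job was submitted
PendingJobs = dict[str, str]


def resolve_callback(payload: dict, pending: PendingJobs) -> Optional[tuple[str, str]]:
    """Return (user_id, outcome) when a tracked job reaches a terminal state, else None."""
    job_name = payload.get("name")  # hypothetical payload key
    state = payload.get("state")  # hypothetical payload key
    if job_name not in pending:
        return None  # unknown job, or a duplicate delivery that was already handled
    if state == "JOB_STATE_SUCCEEDED":
        return pending.pop(job_name), "succeeded"
    if state in {"JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}:  # assumed state names
        return pending.pop(job_name), "failed"
    return None  # still pending or running
```

The caller would then fetch the results and send them with `push_message` when the outcome is "succeeded", or send an apology message otherwise. Removing the entry on the first terminal callback means a repeated delivery does not push the report twice.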

Pro Tips for Implementation

Deduplication: Always check if a user already has a pending batch job. This prevents resource exhaustion and redundant costs.
Small Batches for Speed: While the Batch API can handle thousands of rows, using smaller batches (e.g., 1-5 complex tasks) can sometimes result in faster scheduling by the Google backend.
Monitoring: Use a tool like n1n.ai to monitor your overall token usage and success rates across different models to find the optimal balance between performance and cost.
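The deduplication tip can be sketched as a small guard that is checked before a job is submitted and released once the result has been delivered. `PendingJobGuard` is a hypothetical name, and the in-memory set is an assumption that only holds for a single process; a deployment with several workers would need a shared store instead.

```python
class PendingJobGuard:
    """In-memory record of which users currently have a batch job in flight."""

    def __init__(self) -> None:
        self._pending: set[str] = set()

    def try_reserve(self, user_id: str) -> bool:
        """Reserve a slot for the user; return False if a job is already pending."""
        if user_id in self._pending:
            return False
        self._pending.add(user_id)
        return True

    def release(self, user_id: str) -> None:
        """Free the slot once the report is delivered or the job has failed."""
        self._pending.discard(user_id)


guard = PendingJobGuard()
first = guard.try_reserve("U123")   # reserved, go ahead and submit
second = guard.try_reserve("U123")  # rejected while the first job is pending
guard.release("U123")
third = guard.try_reserve("U123")   # allowed again after release
```

When `try_reserve` returns False, the bot can simply reply that an analysis is already running instead of submitting a second job.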

Conclusion

By integrating the Gemini Batch API and Webhooks, we've transformed a potentially slow and expensive feature into a robust, high-performance service. The combination of FastAPI's asynchronous capabilities and the cost-saving nature of batch processing makes this architecture ideal for enterprise-level AI tools.


Source: https://dev.to/evanlin/gemini-api-hands-on-6im