Designing a 3-Tier Memory System for a Local AI Agent: STM, MTM, and LTM

Author: Nino, Senior Tech Editor

Building a local AI agent that truly 'remembers' is one of the most significant challenges in modern LLM development. While high-performance models like those available via n1n.ai offer incredible reasoning capabilities, the persistence of user facts across sessions often falls short due to poorly designed memory pipelines. This tutorial explores the architecture of a 3-Tier Memory System—Short-Term Memory (STM), Mid-Term Memory (MTM), and Long-Term Memory (LTM)—inspired by human cognitive processes.

The Failure of Naive Memory Systems

In our initial implementation of 'Androi,' a 20B-parameter local LLM agent, we used a direct-to-LTM approach. Every time a user shared a fact (e.g., "My name is Namhyun" or "My hobby is hiking"), the agent extracted the key-value pair and stored it permanently.

However, out of 53 end-to-end tests, 8 failed consistently. The root causes were revealing:

  1. Semantic Dilution: The system was saving transient data like weather|Seoul or time|8AM. When the LTM exceeded 30 entries, semantic filtering kicked in. These 'junk' entries diluted the vector space, causing the agent to ignore critical data like height or salary during BMI calculations.
  2. The CJK Key Length Trap: A simple filter, len(key) < 2, was designed to drop single-character English noise. However, in Korean, 'Height' is '키' (1 character). This meant vital physical data was never saved, while 'Weight' ('몸무게', 3 characters) was.
  3. Stale Data Persistence: When a user corrected a hobby, the agent often recalled the old value because the semantic similarity between 'hiking' and 'cycling' was too high, leading to retrieval conflicts.
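The CJK trap above is easy to reproduce. A minimal sketch (the filter name is our own, purely illustrative) shows why a character-length filter silently drops valid Korean keys:

```python
# Naive noise filter: drop keys shorter than 2 characters.
# Works for English noise, but breaks for CJK languages where
# a single character can be a complete, meaningful word.
def passes_length_filter(key: str) -> bool:
    return len(key) >= 2

# English keys behave as intended.
assert passes_length_filter("height")       # kept
assert not passes_length_filter("a")        # dropped as noise

# Korean keys break the assumption: 'Height' is one character,
# 'Weight' is three, so height is never saved while weight is.
assert not passes_length_filter("키")        # 'Height' -- wrongly dropped
assert passes_length_filter("몸무게")        # 'Weight' -- kept
```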

To solve this, we looked toward the human brain's hippocampus-to-cortex consolidation model.

The 3-Tier Architecture: STM, MTM, and LTM

To build a reliable agent, we implemented a tiered hierarchy that mimics the Ebbinghaus Forgetting Curve and emotional significance weighting. This architecture ensures that only high-value, frequently accessed, or immutable data reaches permanent storage.

1. Short-Term Memory (STM)

STM represents the immediate conversation context (the sliding window). It is volatile and cleared once the session ends. Its primary role is to handle the immediate flow of logic and tool calls.
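A minimal sketch of such a volatile sliding window (the class name and the window size are our own, not taken from the original implementation):

```python
from collections import deque

class ShortTermMemory:
    """Volatile sliding window over the most recent conversation turns."""

    def __init__(self, max_turns: int = 10):
        # deque with maxlen silently evicts the oldest turn on overflow
        self.turns = deque(maxlen=max_turns)

    def add_turn(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})

    def context(self) -> list:
        """Return the current window for the next LLM call."""
        return list(self.turns)

    def clear(self) -> None:
        """Called when the session ends -- STM is never persisted."""
        self.turns.clear()

stm = ShortTermMemory(max_turns=3)
for i in range(5):
    stm.add_turn("user", f"message {i}")
# Only the 3 most recent turns survive; older turns were evicted.
```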

2. Mid-Term Memory (MTM) - The Hippocampus

MTM acts as a buffer or a 'promotion queue.' When the agent extracts a fact, it doesn't go to LTM immediately. Instead, it stays in MTM.

  • Decay Logic: Facts in MTM have a TTL (Time-to-Live). If they aren't accessed again, they decay.
  • Promotion Logic: If a fact is referenced multiple times across different turns, it is 'consolidated' into LTM.
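The decay and promotion rules above can be sketched as a small TTL-keyed store. The one-hour TTL matches the snippet later in this article; the access-count threshold of 2 is an assumption for illustration:

```python
import time

class MidTermMemory:
    """Promotion queue: facts decay unless referenced again before the TTL."""

    def __init__(self, ttl_seconds: float = 3600, promote_after: int = 2):
        self.ttl = ttl_seconds
        self.promote_after = promote_after
        self._store = {}  # key -> (value, expires_at, access_count)

    def save(self, key, value):
        self._store[key] = (value, time.time() + self.ttl, 1)

    def touch(self, key):
        """Re-reference a fact; returns True when it should be promoted to LTM."""
        entry = self._store.get(key)
        if entry is None or entry[1] < time.time():
            self._store.pop(key, None)  # decayed: drop silently
            return False
        value, _, count = entry
        count += 1
        # Each access refreshes the TTL, mimicking spaced reinforcement.
        self._store[key] = (value, time.time() + self.ttl, count)
        return count >= self.promote_after
```

Calling `touch("hobby")` after a `save("hobby", "hiking")` returns True, signalling consolidation into LTM; a key that was never saved (or has expired) returns False.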

3. Long-Term Memory (LTM) - The Cortex

LTM is the permanent vector store. We use a hybrid retrieval approach here. For small datasets (under 30 entries), we inject the entire memory set into the system prompt. For larger sets, we use semantic matching with a cosine similarity threshold of 0.3 or higher.
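The hybrid rule can be sketched as follows. The 30-entry cutoff and 0.3 threshold come from the text above; the embedding vectors are placeholders for whatever encoder the agent uses:

```python
import math

FULL_INJECT_LIMIT = 30   # below this, inject every fact into the prompt
SIM_THRESHOLD = 0.3      # cosine similarity cutoff for semantic retrieval

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, memory):
    """memory: list of (fact_text, embedding) pairs."""
    if len(memory) < FULL_INJECT_LIMIT:
        # Small store: skip search entirely and inject every fact verbatim.
        return [fact for fact, _ in memory]
    # Large store: keep only facts above the similarity threshold.
    return [fact for fact, vec in memory
            if cosine(query_vec, vec) >= SIM_THRESHOLD]
```

The full-injection path is what sidesteps semantic dilution for small memories: nothing can be "missed" because nothing is filtered.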

To maximize the performance of such agents, using a low-latency API gateway like n1n.ai is crucial for real-time memory extraction and reasoning.

Implementation: Priority-Based Classification

We moved away from 'extract and save all' to a priority-based extraction logic. We categorized user information into three distinct buckets:

Priority | Criteria                  | Examples                  | Storage Logic
HIGH     | Immutable core identity   | Name, Birthday, Allergies | Direct to LTM
MID      | Mutable preferences/state | Hobbies, Job, Salary      | MTM → Promotion Queue
LOW      | Transient/contextual info | Weather, News, Timestamps | Discard after session

Code Snippet: The Promotion Logic

def process_memory(extracted_fact):
    # classify_priority() is an LLM call returning HIGH / MID / LOW
    priority = classify_priority(extracted_fact)

    if priority == "HIGH":
        # Immutable core identity: write straight to the vector store
        ltm.save(extracted_fact)
    elif priority == "MID":
        if mtm.exists(extracted_fact.key):
            # Seen again within the TTL window: consolidate into LTM
            ltm.save(extracted_fact)
            mtm.delete(extracted_fact.key)
        else:
            # First sighting: park in MTM with a one-hour TTL
            mtm.save(extracted_fact, ttl=3600)
    else:
        pass  # LOW priority: transient junk, discard
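To make the flow concrete, here is a self-contained harness. The dict-backed store and keyword classifier are stand-ins for the real vector store and LLM call, purely for demonstration:

```python
from dataclasses import dataclass

@dataclass
class Fact:
    key: str
    value: str

class DictStore:
    """Stand-in for both the MTM buffer and the LTM vector store."""
    def __init__(self):
        self.data = {}
    def save(self, fact, ttl=None):
        self.data[fact.key] = fact.value
    def exists(self, key):
        return key in self.data
    def delete(self, key):
        self.data.pop(key, None)

ltm, mtm = DictStore(), DictStore()

def classify_priority(fact):
    # Keyword stand-in for the LLM classifier described above.
    high = {"name", "birthday", "allergy"}
    low = {"weather", "news", "timestamp"}
    if fact.key in high:
        return "HIGH"
    if fact.key in low:
        return "LOW"
    return "MID"

def process_memory(fact):
    priority = classify_priority(fact)
    if priority == "HIGH":
        ltm.save(fact)
    elif priority == "MID":
        if mtm.exists(fact.key):
            ltm.save(fact)          # second sighting: promote
            mtm.delete(fact.key)
        else:
            mtm.save(fact, ttl=3600)
    # LOW priority facts are simply discarded

process_memory(Fact("hobby", "hiking"))   # first sighting -> MTM
process_memory(Fact("hobby", "hiking"))   # second sighting -> promoted
```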

Overcoming the Semantic Matching Blind Spot

One of the most interesting findings was that LLMs often fail to find relevant data via semantic search for mathematical tasks. For example, a query 'Calculate my BMI' has low cosine similarity with the string 'weight: 72kg'.

The Fix:

  • Increase the auto_retrieve threshold from 15 to 30 entries.
  • Use 'Multi-Chain Guidance.' We instructed the agent to first 'Recall all physical attributes' before performing calculations. This forced the agent to pull 'height' and 'weight' into the context window explicitly.
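The guidance step can be as simple as prepending a recall instruction to the task prompt. The exact wording here is our illustration, not the production prompt:

```python
def build_guided_prompt(task: str) -> str:
    # Force an explicit recall step before the calculation so that
    # 'height' and 'weight' enter the context window even when their
    # cosine similarity to the task string is low.
    return (
        "Step 1: Recall all physical attributes stored about the user "
        "(height, weight, age).\n"
        f"Step 2: Using those recalled values, {task}\n"
        "Show the recalled values before computing."
    )

prompt = build_guided_prompt("calculate my BMI.")
```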

Results and Performance Benchmarks

By implementing this 3-tier system and refining our tool descriptions, we achieved a 100% pass rate across 53 test categories.

Test Category | Initial Pass Rate | Final Pass Rate | Key Improvement
Memory CRUD   | 80%               | 100%            | MTM/LTM promotion logic
Calc + Memory | 60%               | 100%            | Semantic threshold increase
Multi-Chain   | 71%               | 100%            | System prompt guidance

For developers looking to replicate these results, we recommend testing your agent's reasoning capabilities using the high-speed endpoints at n1n.ai. The platform provides access to models like Claude 3.5 Sonnet and DeepSeek-V3, which are excellent for the complex 'Priority Classification' step required in this architecture.

Pro Tips for Local Agent Memory

  1. Beware of the len() filter: If you are building a multilingual agent (supporting Korean, Chinese, or Japanese), never use character length as a noise filter. Use an LLM-based 'is_useful' check instead.
  2. Tool Naming Matters: We found that changing a tool name from schedule_task to create_task solved several hallucination issues where the agent confused the scheduler with the calendar.
  3. Negative Constraints: Add "Do NOT use web_search for personal contacts" to your tool descriptions to prevent the agent from leaking local data to external search engines.
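Tip 1 can be sketched as follows: replace the character-length check with a model call. The `llm_complete` callable here is hypothetical; any completion API with a string-in, string-out interface would fit:

```python
def is_useful(key: str, value: str, llm_complete) -> bool:
    """Ask the model whether a fact is worth remembering, instead of
    filtering on character length (which breaks for CJK keys)."""
    prompt = (
        "Answer YES or NO only. Is this a durable personal fact worth "
        f"storing long-term?\nkey: {key}\nvalue: {value}"
    )
    return llm_complete(prompt).strip().upper().startswith("YES")

# Stub model for demonstration: keeps identity facts, drops transient ones.
def fake_llm(prompt: str) -> str:
    return "NO" if "weather" in prompt else "YES"

assert is_useful("키", "175cm", fake_llm)         # kept despite len == 1
assert not is_useful("weather", "rain", fake_llm)
```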

Building a memory system isn't just about storage; it's about filtration. By modeling your AI's memory after human consolidation, you create an agent that is both efficient and deeply personalized.

Get a free API key at n1n.ai