Designing a 3-Tier Memory System for a Local AI Agent: STM, MTM, and LTM
Author: Nino, Senior Tech Editor
Building a local AI agent that truly 'remembers' is one of the most significant challenges in modern LLM development. While high-performance models like those available via n1n.ai offer incredible reasoning capabilities, the persistence of user facts across sessions often falls short due to poorly designed memory pipelines. This tutorial explores the architecture of a 3-Tier Memory System—Short-Term Memory (STM), Mid-Term Memory (MTM), and Long-Term Memory (LTM)—inspired by human cognitive processes.
The Failure of Naive Memory Systems
In our initial implementation of 'Androi,' a 20B-parameter local LLM agent, we used a direct-to-LTM approach. Every time a user shared a fact (e.g., "My name is Namhyun" or "My hobby is hiking"), the agent extracted the key-value pair and stored it permanently.
However, out of 53 end-to-end tests, 8 failed consistently. The root causes were revealing:
- Semantic Dilution: The system was saving transient data like `weather|Seoul` or `time|8AM`. When the LTM exceeded 30 entries, semantic filtering kicked in, and these 'junk' entries diluted the vector space, causing the agent to ignore critical data like height or salary during BMI calculations.
- The CJK Key Length Trap: A simple filter, `len(key) < 2`, was designed to drop English noise. However, in Korean, 'Height' is '키' (1 character), so vital physical data was never saved, while 'Weight' ('몸무게', 3 characters) was.
- Stale Data Persistence: When a user corrected a hobby, the agent often recalled the old value because the semantic similarity between 'hiking' and 'cycling' was too high, leading to retrieval conflicts.
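The CJK trap is easy to reproduce. A minimal sketch of the flawed length filter, showing why it silently drops single-character Korean keys while passing English ones:

```python
# The original noise filter: drop any key shorter than 2 characters.
# This works for English debris but fails for CJK languages, where a
# single character can carry a whole word.

def naive_is_noise(key: str) -> bool:
    return len(key) < 2

# 'Height' in Korean is one character, so a vital fact is discarded,
# while 'Weight' (3 characters) and English keys pass through.
assert naive_is_noise("키")            # True  -> wrongly dropped
assert not naive_is_noise("몸무게")     # False -> saved
assert not naive_is_noise("height")    # False -> saved
```

As the pro tips below suggest, an LLM-based usefulness check avoids this class of bug entirely.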
To solve this, we looked toward the human brain's hippocampus-to-cortex consolidation model.
The 3-Tier Architecture: STM, MTM, and LTM
To build a reliable agent, we implemented a tiered hierarchy that mimics the Ebbinghaus Forgetting Curve and emotional significance weighting. This architecture ensures that only high-value, frequently accessed, or immutable data reaches permanent storage.
1. Short-Term Memory (STM)
STM represents the immediate conversation context (the sliding window). It is volatile and cleared once the session ends. Its primary role is to handle the immediate flow of logic and tool calls.
2. Mid-Term Memory (MTM) - The Hippocampus
MTM acts as a buffer or a 'promotion queue.' When the agent extracts a fact, it doesn't go to LTM immediately. Instead, it stays in MTM.
- Decay Logic: Facts in MTM have a TTL (Time-to-Live). If they aren't accessed again, they decay.
- Promotion Logic: If a fact is referenced multiple times across different turns, it is 'consolidated' into LTM.
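The decay-and-promotion behavior above can be sketched with a small in-memory buffer. This is a simplified illustration, not the production implementation; the `MidTermMemory` class name and its expiry bookkeeping are assumptions made for the example:

```python
import time

class MidTermMemory:
    """Sketch of an MTM buffer: facts expire after a TTL unless they are
    referenced again before the deadline (in which case the caller
    promotes them to LTM and removes them from here)."""

    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def save(self, key, value):
        self._store[key] = (value, time.time() + self.ttl)

    def exists(self, key) -> bool:
        entry = self._store.get(key)
        if entry is None:
            return False
        _value, expiry = entry
        if time.time() > expiry:   # past the TTL: the fact has decayed
            del self._store[key]
            return False
        return True

    def delete(self, key):
        self._store.pop(key, None)

# A fact that is never referenced again simply decays:
mtm = MidTermMemory(ttl_seconds=0.01)
mtm.save("hobby", "hiking")
assert mtm.exists("hobby")
time.sleep(0.05)
assert not mtm.exists("hobby")  # decayed, never reached LTM
```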
3. Long-Term Memory (LTM) - The Cortex
LTM is the permanent vector store. We use a hybrid retrieval approach here. For small datasets (under 30 entries), we inject the entire memory set into the system prompt. For larger sets, we use semantic matching with a cosine similarity threshold of 0.3 or higher.
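The hybrid retrieval policy can be expressed in a few lines. The vectors here are stand-ins for real embeddings; only the 30-entry cutoff and the 0.3 cosine threshold come from the text above:

```python
import math

def cosine_similarity(a, b):
    """Plain cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, ltm_entries, max_inject=30, threshold=0.3):
    """Hybrid policy: small stores are injected wholesale into the system
    prompt; larger stores fall back to semantic matching."""
    if len(ltm_entries) <= max_inject:
        return [fact for fact, _vec in ltm_entries]
    return [fact for fact, vec in ltm_entries
            if cosine_similarity(query_vec, vec) >= threshold]
```

The full-injection path for small stores sidesteps the semantic blind spot discussed later: when everything fits in the prompt, no retrieval step can miss.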
To maximize the performance of such agents, using a low-latency API gateway like n1n.ai is crucial for real-time memory extraction and reasoning.
Implementation: Priority-Based Classification
We moved away from 'extract and save all' to a priority-based extraction logic. We categorized user information into three distinct buckets:
| Priority | Criteria | Examples | Storage Logic |
|---|---|---|---|
| HIGH | Immutable core identity | Name, Birthday, Allergies | Direct to LTM |
| MID | Mutable preferences/state | Hobbies, Job, Salary | MTM → Promotion Queue |
| LOW | Transient/Contextual info | Weather, News, Timestamps | Discard after session |
Code Snippet: The Promotion Logic
```python
def process_memory(extracted_fact):
    priority = classify_priority(extracted_fact)  # Uses an LLM to categorize
    if priority == "HIGH":
        # Immutable core identity: write straight to long-term memory
        ltm.save(extracted_fact)
    elif priority == "MID":
        if mtm.exists(extracted_fact.key):
            # Seen again across turns: consolidate into LTM
            ltm.save(extracted_fact)
            mtm.delete(extracted_fact.key)
        else:
            # First sighting: buffer in MTM with a one-hour TTL
            mtm.save(extracted_fact, ttl=3600)
    else:
        pass  # Ignore low-priority junk
```
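The snippet calls `classify_priority()`, which in the real system is an LLM call. A deterministic keyword stand-in makes the three buckets from the table concrete; the `Fact` type and the key sets are illustrative assumptions, not the actual classifier:

```python
from dataclasses import dataclass

@dataclass
class Fact:
    key: str
    value: str

# Bucket criteria from the table; the concrete key lists are examples only.
HIGH_KEYS = {"name", "birthday", "allergies"}   # immutable core identity
MID_KEYS = {"hobby", "job", "salary"}           # mutable preferences/state

def classify_priority(fact: Fact) -> str:
    key = fact.key.lower()
    if key in HIGH_KEYS:
        return "HIGH"
    if key in MID_KEYS:
        return "MID"
    return "LOW"   # weather, news, timestamps, etc. -> discarded
```

In production, the LLM-based version handles paraphrases and multilingual keys that a fixed lookup table cannot.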
Overcoming the Semantic Matching Blind Spot
One of the most interesting findings was that LLMs often fail to find relevant data via semantic search for mathematical tasks. For example, a query 'Calculate my BMI' has low cosine similarity with the string 'weight: 72kg'.
The Fix:
- Increase the `auto_retrieve` threshold from 15 to 30 entries.
- Use 'Multi-Chain Guidance': we instructed the agent to first 'Recall all physical attributes' before performing calculations. This forced the agent to pull 'height' and 'weight' into the context window explicitly.
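The Multi-Chain Guidance idea, reduced to its essence: recall a whole attribute class into context first, then compute. The key names (`height_cm`, `weight_kg`) and the plain-dict LTM are assumptions for this sketch; the real flow runs the recall step through the model:

```python
def answer_bmi(ltm: dict) -> float:
    # Step 1: explicit recall of all physical attributes, rather than
    # hoping semantic search links 'Calculate my BMI' to 'weight: 72kg'.
    physical = {k: v for k, v in ltm.items()
                if k in ("height_cm", "weight_kg")}
    # Step 2: with the facts forced into context, the calculation is trivial.
    height_m = physical["height_cm"] / 100
    return round(physical["weight_kg"] / height_m ** 2, 1)

bmi = answer_bmi({"height_cm": 180, "weight_kg": 72, "hobby": "hiking"})
# bmi == 22.2
```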
Results and Performance Benchmarks
By implementing this 3-tier system and refining our tool descriptions, we achieved a 100% pass rate across 53 test categories.
| Test Category | Initial Pass Rate | Final Pass Rate | Key Improvement |
|---|---|---|---|
| Memory CRUD | 80% | 100% | MTM/LTM Promotion logic |
| Calc + Memory | 60% | 100% | Semantic threshold increase |
| Multi-Chain | 71% | 100% | System prompt guidance |
For developers looking to replicate these results, we recommend testing your agent's reasoning capabilities using the high-speed endpoints at n1n.ai. The platform provides access to models like Claude 3.5 Sonnet and DeepSeek-V3, which are excellent for the complex 'Priority Classification' step required in this architecture.
Pro Tips for Local Agent Memory
- Beware of the `len()` filter: If you are building a multilingual agent (supporting Korean, Chinese, or Japanese), never use character length as a noise filter. Use an LLM-based 'is_useful' check instead.
- Tool Naming Matters: We found that changing a tool name from `schedule_task` to `create_task` solved several hallucination issues where the agent confused the scheduler with the calendar.
- Negative Constraints: Add "Do NOT use web_search for personal contacts" to your tool descriptions to prevent the agent from leaking local data to external search engines.
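The last two tips in practice: a tool registry where the unambiguous name avoids the scheduler-vs-calendar confusion, and the negative constraint lives directly in the description the model sees. Names and wording here are illustrative, not a specific framework's schema:

```python
# Hypothetical tool registry: descriptions are what the agent reads when
# deciding which tool to call, so constraints belong in them verbatim.
TOOLS = {
    "create_task": {
        "description": "Create a to-do task. Not for calendar events.",
    },
    "web_search": {
        "description": (
            "Search the public web. "
            "Do NOT use web_search for personal contacts; "
            "look them up in local memory instead."
        ),
    },
}
```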
Building a memory system isn't just about storage; it's about filtration. By modeling your AI's memory after human consolidation, you create an agent that is both efficient and deeply personalized.
Get a free API key at n1n.ai