Under the Hood: How the /compact Command Optimizes Claude Code Context
Author: Nino, Senior Tech Editor
As LLM-powered CLI tools like Claude Code become central to developer workflows, managing context windows and API costs is a critical challenge. Claude Code stores your entire conversation history locally in ~/.claude/projects/ as JSONL files. Without intervention, every subsequent message you send transmits the full history to Anthropic's servers, leading to mounting latency and higher token consumption. To solve this, Claude Code provides the /compact command.
In this guide, we will perform a forensic analysis of what happens when you run /compact. We will use proxy tools to intercept traffic and Python scripts to measure token reduction. If you are looking for high-performance access to models like Claude 3.5 Sonnet or DeepSeek-V3 with enterprise-grade stability, consider using n1n.ai.
The Setup: Tools for Deep Inspection
To see what's happening under the hood, we need to intercept the encrypted HTTPS traffic between the Claude CLI and the Anthropic API. We will use mitmproxy for interception, jq for JSON processing, and tiktoken for token estimation.
First, install the necessary tools:
brew install mitmproxy
brew install jq
pip install tiktoken
Initialize mitmproxy to generate the necessary SSL certificates:
mitmproxy
# Press q to exit
The certificates are generated at ~/.mitmproxy/. We must configure Claude Code to trust this local certificate authority so it doesn't reject the proxied connection.
Step 1: Establishing a Baseline
Create a fresh directory for our experiment and configure the environment variables to route traffic through the proxy:
mkdir ~/compact-experiment && cd ~/compact-experiment
export HTTPS_PROXY=http://127.0.0.1:8080
export HTTP_PROXY=http://127.0.0.1:8080
export NODE_EXTRA_CA_CERTS=~/.mitmproxy/mitmproxy-ca-cert.pem
claude
In a separate terminal, start mitmproxy, press f, and set a filter on api.anthropic.com (e.g. the domain filter ~d api.anthropic.com) to isolate the relevant traffic.
To build a realistic context, I ran 10 sequential prompts to build a Python FastAPI server. This included adding Pydantic validation, rate limiting with slowapi, unit tests, exception handling, and middleware. By the 10th prompt, the conversation history was substantial.
Step 2: Analyzing Local Storage (Pre-Compact)
Claude Code maintains its state locally: the JSONL history lives in your home directory. Let's inspect the file's size and structure before compaction.
ls ~/.claude/projects/
cp ~/.claude/projects/<your-project-id>/*.jsonl ~/compact-experiment/pre-compact.jsonl
# Check line count and size
wc -l pre-compact.jsonl
ls -lh pre-compact.jsonl
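Since each line of the JSONL file is a standalone JSON object, a short Python sketch can tally the entry types; the filename below assumes the copy made above:

```python
# Tally entry types (human, assistant, etc.) in a Claude Code history file,
# where each line is an independent JSON object.
import json
from collections import Counter

def tally_types(path):
    counts = Counter()
    with open(path) as f:
        for line in f:
            if line.strip():
                counts[json.loads(line).get("type", "unknown")] += 1
    return counts

# e.g. print(tally_types("pre-compact.jsonl"))
```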
A typical user message in this file looks like this:
{
  "uuid": "msg_01XK...",
  "type": "human",
  "message": {
    "role": "user",
    "content": "Add rate limiting using slowapi..."
  },
  "timestamp": "2025-01-17T10:23:45.123Z",
  "sessionId": "sess_abc123..."
}
At this stage, the pre-compact.jsonl file was approximately 38 KB. However, the actual request payload sent to the API is larger because it includes the system prompt and formatting metadata.
Step 3: Measuring the API Payload
Using a Python script, we can calculate the token count of the last request captured by mitmproxy. Anthropic uses a proprietary tokenizer, but tiktoken's cl100k_base encoding gives a reasonable approximation for modern models such as Claude 3.5 Sonnet or GPT-4o.
# count_tokens.py
import json, os, sys
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
filepath = sys.argv[1]
with open(filepath) as f:
    messages = json.load(f).get("messages", [])

def text_of(content):
    # Message content may be a plain string or a list of content blocks
    if isinstance(content, str):
        return content
    return "".join(b.get("text", "") for b in content if isinstance(b, dict))

tokens = sum(len(enc.encode(text_of(m.get("content", "")))) for m in messages)
print(f"Messages: {len(messages)}")
print(f"Tokens: {tokens}")
print(f"Bytes: {os.path.getsize(filepath)}")
Running this on the pre-compact request yielded:
- Messages: 21
- Tokens: 14,280
- Bytes: 41,847
For developers managing high-frequency requests, these token counts translate directly to cost and latency. Platforms like n1n.ai help manage these costs by providing aggregated access to various providers, ensuring you always get the best rate for your token usage.
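The growth is worse than linear: because every turn resends the whole history, input tokens accumulate quadratically over a session. A back-of-the-envelope sketch (the per-token price here is a placeholder, not a current published rate):

```python
# Back-of-the-envelope: input cost when every turn resends the full history.
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000  # hypothetical $3 per million tokens

def cumulative_input_tokens(tokens_per_turn):
    """Turn N resends turns 1..N, so input tokens grow quadratically."""
    history = 0
    sent = 0
    for t in tokens_per_turn:
        history += t     # this turn's tokens join the history
        sent += history  # the whole history is sent as input
    return sent

sent = cumulative_input_tokens([1400] * 10)  # ten turns of ~1,400 tokens each
print(sent, "input tokens,", round(sent * PRICE_PER_INPUT_TOKEN, 3), "USD")
```

Ten modest turns already resend tens of thousands of input tokens, which is exactly the pressure /compact relieves.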
Step 4: Executing the /compact Command
Now, run the command in the Claude Code CLI:
/compact
After a few seconds, Claude confirms the compaction. Let's look at the updated local JSONL file.
cp ~/.claude/projects/<your-project-id>/*.jsonl ~/compact-experiment/post-compact.jsonl
Interestingly, the local file increased in size (from 38 KB to 41 KB). This confirms that compaction does not delete history from your local machine. Instead, it appends a compact_boundary entry.
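If compaction is strictly append-only, the pre-compact snapshot should be a byte-for-byte prefix of the post-compact file. A quick check, using the snapshot filenames from this experiment:

```python
# If /compact only appends, the old snapshot must be a prefix of the new file.
def is_prefix(old_path, new_path):
    with open(old_path, "rb") as old, open(new_path, "rb") as new:
        return new.read().startswith(old.read())

# e.g. is_prefix("pre-compact.jsonl", "post-compact.jsonl")
```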
Step 5: The Anatomy of a Compact Boundary
Using grep and jq, we can inspect the boundary entry:
grep -i "compact_boundary" post-compact.jsonl | jq .
Key fields in this JSON object include:
- subtype: "compact_boundary" — tells the CLI where the summarized history begins.
- compactMetadata.preTokens: 37418 — the total tokens processed before this checkpoint.
- logicalParentUuid — links this boundary to the last full message.
Step 6: Post-Compact Efficiency Gains
To see the real benefit, I sent one more message: "Add a /metrics endpoint." I then analyzed the new request payload captured in mitmproxy.
| Metric | Pre-Compact | Post-Compact | Reduction |
|---|---|---|---|
| Messages | 21 | 3 | 86% |
| Tokens | 14,280 | 1,820 | 87.3% |
| Payload Size | 41.8 KB | 6.2 KB | 85.1% |
By running /compact, the system prompt now contains a concise summary of the previous 21 messages. Only the new messages following the boundary are sent as individual entries in the messages array.
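The selection logic can be sketched as follows; note that the summary field name is an assumption, since the CLI's internals are not documented:

```python
# Sketch: choose which history entries become the next API payload,
# honoring the most recent compact boundary.
def build_payload(entries):
    """entries: parsed JSONL history, oldest first."""
    boundary = None
    for i, e in enumerate(entries):
        if e.get("subtype") == "compact_boundary":
            boundary = i  # keep the most recent boundary
    if boundary is None:
        summary, tail = None, entries
    else:
        summary = entries[boundary].get("summary")  # assumed field name
        tail = entries[boundary + 1:]
    messages = [e["message"] for e in tail if "message" in e]
    return {"system_summary": summary, "messages": messages}
```

Everything before the boundary collapses into the summary; only the tail survives as discrete messages.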
Technical Implications and Pro Tips
1. Context Loss (The Trade-off)
Compaction is "lossy." While it preserves the state of the code and the general logic of the conversation, it loses granular details. If you asked the model to "remember that variable name I suggested in message #3," it might fail after a /compact if that specific detail wasn't deemed important enough for the summary.
2. When to Compact
- Task Switching: If you finish the "Authentication" module and start the "Database" module, run /compact.
- Latency Spikes: If the model takes too long to respond, the context window might be saturated.
- Debugging: Avoid compacting while actively debugging a specific error, as the model needs the full traceback and previous attempts to find the root cause.
3. Enterprise Scalability
For large-scale teams using Claude Code, the accumulation of context can lead to significant overhead. Using an aggregator like n1n.ai allows you to monitor usage across different models (like Claude 3.5 Sonnet and DeepSeek-V3) to ensure that your context management strategies are actually saving money.
Summary
Claude Code's /compact command is a powerful tool for context management. It leverages a local-first storage strategy to keep your full history safe while using summarized "checkpoints" to keep API interactions fast and affordable. By reducing payloads by ~85%, it effectively resets the "gravity" of a long conversation, allowing for smoother development cycles.
Get a free API key at n1n.ai