Building an Explainable Graph RAG System with SAGE

Author: Nino, Senior Tech Editor

The evolution of Retrieval-Augmented Generation (RAG) has moved beyond simple vector similarity. While traditional RAG treats data as isolated 'chunks' in a flat vector space, real-world information is inherently interconnected. This is where SAGE (Structure Aware Graph Expansion) enters the frame, offering a robust methodology for building explainable, multi-hop retrieval systems. By integrating high-performance LLM backbones from n1n.ai, developers can now implement these complex graph structures with unprecedented speed and reliability.

The Limitations of Flat Retrieval

Standard RAG architectures often suffer from 'semantic gaps.' In a flat retrieval model, the system calculates the distance between a query embedding and document embeddings. If a query requires connecting two disparate pieces of information that do not share direct semantic overlap—such as a band's formation date and a specific album's theme—the retriever may fail to bridge the gap.

For instance, if you ask, "Which rock opera was released by the band formed in 1965?", a flat retriever might find the band's history (mentioning 1965) or the album 'The Wall' (mentioning rock opera), but it rarely ranks both high enough to synthesize an answer. SAGE solves this by building a structural graph offline, capturing the 'connective tissue' between heterogeneous data points.

Core Architecture: The SAGE Framework

SAGE (Structure Aware Graph Expansion) focuses on creating an auditable retrieval substrate. Unlike black-box embedding models, SAGE makes every retrieval step deterministic and inspectable. This is critical for enterprise applications where explainability is not just a feature, but a regulatory requirement.

1. Offline Graph Construction

The process begins by processing heterogeneous data chunks. In our example, we use the legendary band Pink Floyd as a dataset, consisting of 8 distinct nodes:

  • Band Identity: Pink Floyd (London, 1965).
  • Discography: Dark Side of the Moon, The Wall, Wish You Were Here.
  • Personnel: David Gilmour, Roger Waters, Syd Barrett.
  • Solo Projects: David Gilmour’s 'About Face'.

SAGE performs an O(N^2) similarity analysis across all pairs. Using models like Claude 3.5 Sonnet or GPT-4o via n1n.ai, we can generate high-dimensional embeddings that capture the nuanced relationships between these entities.
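The pairwise pass can be sketched as follows. This is a minimal illustration using toy 3-dimensional vectors in place of real model embeddings (node names like "pf-band" are illustrative, not an actual SAGE schema):

```python
from itertools import combinations
import math

# Toy vectors standing in for real high-dimensional embeddings,
# which would come from an embedding model in practice.
nodes = {
    "pf-band":    [0.9, 0.1, 0.2],
    "album-wall": [0.8, 0.3, 0.1],
    "gilmour":    [0.7, 0.6, 0.2],
    "about-face": [0.1, 0.9, 0.4],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# O(N^2) pass: score every unordered pair of nodes as a candidate edge.
edges = {
    (u, v): cosine(nodes[u], nodes[v])
    for u, v in combinations(nodes, 2)
}

for (u, v), weight in sorted(edges.items(), key=lambda e: -e[1]):
    print(f"{u} -- {v}: {weight:.3f}")
```

The full pairwise pass is what the pruning step below then filters; with N nodes it produces N(N-1)/2 candidate edges.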

2. The 95th Percentile Pruning Rule

A common pitfall in Graph RAG is the 'hairball' effect—where every node is connected to every other node, leading to noise. SAGE implements a rigorous pruning strategy based on statistical thresholds.

Specifically, the system calculates the cosine similarity for every possible edge. To ensure high precision, only edges that fall into the top 5% of similarity scores (the 95th percentile) are retained. This ensures that the resulting graph reflects only the strongest semantic neighborhoods. For example, David Gilmour and Roger Waters will have a surviving edge due to their high shared context, while a random connection between a city and a solo album might be pruned if the similarity is < 0.85.
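The percentile cutoff can be sketched with a handful of hypothetical edge scores (the node pairs and values below are made up for illustration; note that with only six toy edges, a 95th-percentile cutoff retains just the single strongest pair):

```python
import math

# Hypothetical candidate edges: (node, node) -> cosine similarity.
edge_scores = {
    ("gilmour", "waters"):      0.93,
    ("pf-band", "album-wall"):  0.88,
    ("pf-band", "gilmour"):     0.72,
    ("album-wall", "waters"):   0.55,
    ("london", "about-face"):   0.41,
    ("london", "barrett"):      0.30,
}

# Compute the 95th-percentile threshold: only the top 5% of scores survive.
scores = sorted(edge_scores.values())
cutoff_index = math.ceil(0.95 * len(scores)) - 1
threshold = scores[cutoff_index]

kept = {pair: w for pair, w in edge_scores.items() if w >= threshold}
print(f"threshold={threshold}, kept={list(kept)}")
```

On a real corpus with thousands of candidate edges, the same threshold keeps a meaningful top slice rather than a single pair.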

Data Representation with JSON-LD

SAGE utilizes JSON-LD (JSON for Linked Data) as its primary export format. This choice is strategic. JSON-LD allows the graph to be interoperable with existing Knowledge Graph standards and web-scale Linked Data systems.

{
  "@context": {
    "schema": "https://schema.org/",
    "sage": "urn:sage:ontology:",
    "isPartOf": { "@type": "@id" }
  },
  "@type": "sage:KnowledgeGraph",
  "sage:framework": "SAGE (Structure Aware Graph Expansion)",
  "@graph": [
    {
      "@type": "CreativeWork",
      "@id": "urn:sage:chunk:pf-band",
      "name": "Pink Floyd Band Info",
      "description": "Pink Floyd is a British rock band formed in London in 1965.",
      "mentions": { "London": "CITY", "Pink Floyd": "BAND", "1965": "DATE" }
    },
    {
      "@type": "Relationship",
      "source": { "@id": "urn:sage:chunk:pf-band" },
      "target": { "@id": "urn:sage:chunk:pf-album-wall" },
      "relationshipType": "DOC_DOC",
      "weight": 0.85
    }
  ]
}

By using this format, the graph becomes more than just a list of vectors; it becomes a structured map that AI agents can traverse logically.
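One way to make the export traversable is to parse the `@graph` array into an adjacency map. The sketch below uses Python's standard `json` module on an abbreviated version of the document above:

```python
import json

# Abbreviated JSON-LD export: one content node and one DOC_DOC edge.
doc = json.loads("""
{
  "@graph": [
    { "@type": "CreativeWork",
      "@id": "urn:sage:chunk:pf-band",
      "name": "Pink Floyd Band Info" },
    { "@type": "Relationship",
      "source": { "@id": "urn:sage:chunk:pf-band" },
      "target": { "@id": "urn:sage:chunk:pf-album-wall" },
      "relationshipType": "DOC_DOC",
      "weight": 0.85 }
  ]
}
""")

adjacency = {}
for item in doc["@graph"]:
    if item.get("@type") == "Relationship":
        src = item["source"]["@id"]
        dst = item["target"]["@id"]
        # Store the edge in both directions so a walk can start from either node.
        adjacency.setdefault(src, []).append((dst, item["weight"]))
        adjacency.setdefault(dst, []).append((src, item["weight"]))

print(adjacency["urn:sage:chunk:pf-band"])
# → [('urn:sage:chunk:pf-album-wall', 0.85)]
```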

The Multi-Hop Retrieval Process

When a complex query enters the system, SAGE follows a two-stage execution path:

  1. Seed Selection: The retriever identifies the 'Seed Node.' In our query about the band formed in 1965, the system matches the 'Pink Floyd Band Info' node because it explicitly contains the year 1965.
  2. Graph Expansion: Instead of stopping at the seed, the retriever 'walks' along the edges defined in the JSON-LD structure. It follows the DOC_DOC relationship to find 'The Wall.' Because 'The Wall' is tagged as a 'rock opera,' the system now has all the components required to answer the query.

This 'walk' is what enables multi-hop reasoning. The LLM acts as the reasoning engine that decides which edges to follow. Using the unified API at n1n.ai, you can leverage models with large context windows to process these expanded graph contexts without hitting token limits.
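The two-stage path can be sketched end to end. This toy version substitutes simple term overlap for embedding-based seed selection (in a real deployment, stage 1 would use vector similarity); node texts and edge weights are illustrative:

```python
import re

# Toy node texts and pruned edges standing in for the real SAGE graph.
nodes = {
    "pf-band":    "Pink Floyd is a British rock band formed in London in 1965.",
    "album-wall": "The Wall is a rock opera by Pink Floyd.",
    "about-face": "About Face is David Gilmour's solo album.",
}
edges = {
    "pf-band":    [("album-wall", 0.85)],
    "album-wall": [("pf-band", 0.85)],
}

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def seed_node(query):
    # Stage 1 (Seed Selection): pick the node sharing the most terms with the query.
    q = tokens(query)
    return max(nodes, key=lambda n: len(q & tokens(nodes[n])))

def expand(seed, min_weight=0.5):
    # Stage 2 (Graph Expansion): walk one hop along surviving edges
    # to gather the extra context the seed alone cannot provide.
    context = [nodes[seed]]
    for neighbor, weight in edges.get(seed, []):
        if weight >= min_weight:
            context.append(nodes[neighbor])
    return context

query = "Which rock opera was released by the band formed in 1965?"
context = expand(seed_node(query))
print(context)
```

Here the seed ("formed in 1965") alone cannot answer the question; the one-hop walk pulls in the "rock opera" node, and both texts are then handed to the LLM for synthesis.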

Pro Tip: Scaling SAGE for Production

While O(N^2) is feasible for small datasets, scaling to millions of documents requires optimization. We recommend using an Approximate Nearest Neighbor (ANN) search to pre-filter potential edges before calculating the exact similarity for the 95th percentile pruning. This hybrid approach maintains the structural integrity of SAGE while ensuring the offline build process remains performant.
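The pre-filtering idea can be sketched with a crude random-projection hash (a simplified stand-in for a production ANN index such as HNSW or FAISS, which this example does not use): points are bucketed by which side of a few random hyperplanes they fall on, and exact similarity is computed only within buckets.

```python
import math
import random

random.seed(0)
DIM, N_NODES, N_PLANES = 8, 50, 4

# Toy embeddings for 50 documents.
nodes = {f"doc{i}": [random.gauss(0, 1) for _ in range(DIM)] for i in range(N_NODES)}

# One random hyperplane per hash bit (a crude locality-sensitive hash).
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PLANES)]

def bucket(vec):
    # Hash = sign pattern of the vector against each hyperplane.
    return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in planes)

buckets = {}
for name, vec in nodes.items():
    buckets.setdefault(bucket(vec), []).append(name)

# Exact cosine is only needed within buckets, not across all N^2 pairs.
candidate_pairs = [
    (u, v)
    for members in buckets.values()
    for i, u in enumerate(members)
    for v in members[i + 1:]
]
print(len(candidate_pairs), "candidate pairs instead of", N_NODES * (N_NODES - 1) // 2)
```

The 95th-percentile pruning then runs only over the surviving candidate pairs, keeping the offline build tractable at scale.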

Furthermore, by exposing the retrieval engine as a callable tool (using frameworks like Tools4AI), your AI agents can decide dynamically whether they need to expandNode or if the seedNode provides sufficient information. This agency is the hallmark of advanced RAG implementations.

Conclusion

SAGE transforms RAG from a simple search-and-summarize task into a sophisticated knowledge discovery process. By leveraging structural bridges and statistical pruning, it eliminates the 'black box' nature of traditional vector search. To start building your own Graph RAG systems with the world's most powerful models, integrate your workflow with n1n.ai.

Get a free API key at n1n.ai