Building a Global Directory of 28,577 MCP Servers: Engineering Discovery at Scale

The rise of the Model Context Protocol (MCP) has fundamentally changed how we connect LLMs to local and remote data sources. However, as the ecosystem expands, a significant problem has emerged: discovery. Finding a reliable MCP server often involves sifting through outdated GitHub repositories or hand-curated lists that only scratch the surface. To solve this, I built a comprehensive directory of 28,577 verified MCP servers. This project wasn't just about data collection; it was an exercise in scaling a technical index while keeping infrastructure costs near zero. To power these types of high-utility AI applications, developers increasingly rely on high-speed, stable aggregators like n1n.ai to ensure their agents remain responsive.

The Problem: Fragmentation in the MCP Ecosystem

MCP server discovery is currently in a state of chaos. Most developers find servers through "awesome-list" repositories on GitHub, which are frequently unmaintained. Many aggregators only show a fraction of the available servers, often filtering them based on manual curation. Furthermore, there is no standardized way to browse by functionality. If you need a PostgreSQL-MCP wrapper, you might find a dozen options but have no way of knowing which one actually implements the protocol correctly or has been updated recently.

I wanted a flat, searchable index of every public MCP server, categorized by intent. The goal was simple: provide a single source of truth that separates the signal from the noise. To achieve this, I built a scanner that pulls candidates from GitHub code search, the GitHub API, and several publishing aggregators. This data is then deduplicated by owner/name and passed through a rigorous classification pipeline.
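
The deduplication step can be sketched roughly as follows. This is a minimal illustration, not the scanner's actual code: the Candidate shape, the source labels, and the repoKey helper are all assumptions made for the example. It normalizes each GitHub URL to a lowercase owner/name key and remembers which sources reported the repo.

```typescript
// Hypothetical shape of a candidate as it arrives from any one source.
interface Candidate {
  repoUrl: string; // e.g. "https://github.com/Owner/Name.git"
  source: string;  // e.g. "code-search", "api", "aggregator" (labels are assumptions)
}

interface MergedCandidate {
  key: string;       // normalized "owner/name"
  sources: string[]; // every source that reported this repo
}

// Reduce a GitHub URL to a lowercase "owner/name" key, or null if it does not parse.
function repoKey(repoUrl: string): string | null {
  const match = repoUrl.trim().match(/github\.com[/:]([^/\s]+)\/([^/\s#?]+)/i);
  if (!match) return null;
  const owner = match[1].toLowerCase();
  const name = match[2].replace(/\.git$/i, "").toLowerCase();
  return owner + "/" + name;
}

// Merge candidates that point at the same repository, keeping the list of sources.
function dedupeCandidates(candidates: Candidate[]): MergedCandidate[] {
  const merged = new Map<string, MergedCandidate>();
  for (const c of candidates) {
    const key = repoKey(c.repoUrl);
    if (key === null) continue; // skip anything that is not a GitHub repo URL
    const existing = merged.get(key);
    if (existing) {
      if (existing.sources.indexOf(c.source) === -1) existing.sources.push(c.source);
    } else {
      merged.set(key, { key: key, sources: [c.source] });
    }
  }
  return Array.from(merged.values());
}
```

Keeping the list of sources per repo is a design choice for the sketch: it costs little at this stage and makes a "seen in multiple places" signal available later.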

The Classification Pipeline: Gemini vs. Claude

With over 30,000 candidates, manual classification was impractical. Every candidate repo is run through a classifier that performs two tasks:

Verification: Confirms the repository actually implements MCP rather than just mentioning it in a README.
Categorization: Assigns one of 95 specific categories and extracts a concise description.
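
One way to handle the classifier's output is to ask for JSON and validate it strictly before storing anything. The sketch below assumes a response shape (isMcpServer, category, description) and a tiny stand-in category list; neither comes from the original pipeline, and the fallback to "Other" for unrecognized categories is likewise an assumption.

```typescript
interface Classification {
  isMcpServer: boolean;
  category: string;
  description: string;
}

// Stand-in for the real 95-category list, which is not enumerated here.
const KNOWN_CATEGORIES = ["Databases", "Developer Tools", "Other"];

// Parse and validate one raw model response; return null if it is unusable.
function parseClassification(
  raw: string,
  categories: string[] = KNOWN_CATEGORIES
): Classification | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // the model returned something that is not JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const rec = data as Record<string, unknown>;
  const isMcp = rec["isMcpServer"];
  const category = rec["category"];
  const description = rec["description"];
  if (typeof isMcp !== "boolean") return null;
  if (typeof category !== "string" || typeof description !== "string") return null;
  // Assumption: categories outside the known list are filed under "Other".
  const safeCategory = categories.indexOf(category) !== -1 ? category : "Other";
  return { isMcpServer: isMcp, category: safeCategory, description: description.trim() };
}
```

Rejecting malformed responses outright, rather than trying to repair them, keeps bad rows out of the index; rejected repos can simply be queued for another pass.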

For the classifier, I chose Gemini Flash via OpenRouter. While I performed spot checks using Claude 3.5 Sonnet and GPT-4o, the agreement rate was high enough that Gemini Flash's speed and cost-efficiency made it the clear winner for bulk processing. The total spend for classifying the entire pool was approximately $8. This efficiency is critical when building developer tools; similarly, using n1n.ai allows developers to access multiple high-end models like OpenAI o3 or DeepSeek-V3 through a single, cost-effective interface.
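
A spot check like this boils down to an agreement rate between the bulk classifier and a reference model on a shared sample. The following is a minimal sketch under assumptions: the Label shape and the rule that both the verification flag and the category must match are illustrative choices, not the criteria actually used.

```typescript
interface Label {
  isMcpServer: boolean;
  category: string;
}

// Fraction of repos labeled by both models where the two labels match exactly.
// Repos missing from the bulk results are ignored rather than counted as disagreements.
function agreementRate(bulk: Map<string, Label>, reference: Map<string, Label>): number {
  let compared = 0;
  let agreed = 0;
  reference.forEach((refLabel, key) => {
    const bulkLabel = bulk.get(key);
    if (!bulkLabel) return;
    compared++;
    if (
      bulkLabel.isMcpServer === refLabel.isMcpServer &&
      bulkLabel.category === refLabel.category
    ) {
      agreed++;
    }
  });
  return compared === 0 ? 0 : agreed / compared;
}
```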

Architecture: Solving the Static Asset Limit

When deploying a site with 28,000+ individual pages, standard deployment strategies often fail. I considered three architectural approaches:

Strategy	Pros	Cons
Pure Static	Fastest load times, $0 hosting.	Cloudflare Workers Free Tier has a 20,000 static asset limit. I would hit the cap instantly.
Pure SSR	No build-time limits, dynamic data.	Every hit triggers a worker invocation, increasing costs and latency.
Hybrid (Chosen)	Best of both worlds.	Requires complex configuration with Astro and Cloudflare D1.

I opted for a hybrid approach using Astro Hybrid Mode and the @astrojs/cloudflare adapter. I prerendered the top 100 high-traffic pages (Home, Browse, Search, Categories) into static HTML. The 28,000+ per-server pages are dynamic. On the first request, they query Cloudflare D1 (a serverless SQLite database at the edge). The rendered HTML is then cached by Cloudflare’s CDN for 24 hours. Subsequent visits to popular server pages are as fast as static assets, but I stay well under the asset limit.
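
The render-once, serve-from-cache behavior for the per-server pages follows a read-through pattern. The sketch below illustrates that pattern with an in-memory Map standing in for the edge cache and a synchronous render callback standing in for the D1 query plus template; both are simplifications for the example, and createPageCache is a hypothetical name rather than anything from Astro or Cloudflare.

```typescript
interface CacheEntry {
  html: string;
  expiresAt: number; // epoch milliseconds
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns a function that serves cached HTML while it is fresh and re-renders otherwise.
function createPageCache(ttlMs: number = DAY_MS) {
  const entries = new Map<string, CacheEntry>();
  return function getOrRender(
    path: string,
    render: (path: string) => string, // stand-in for "query the database, render the page"
    now: number = Date.now()
  ): string {
    const hit = entries.get(path);
    if (hit && hit.expiresAt > now) return hit.html; // fresh: no render, no database query
    const html = render(path);
    entries.set(path, { html: html, expiresAt: now + ttlMs });
    return html;
  };
}
```

Only pages that are actually requested ever get rendered, which is why the long tail of rarely visited servers costs nothing until someone asks for one.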

Overcoming Cloudflare D1 Constraints

Implementing Cloudflare D1 presented its own set of challenges. During the initial data export, I attempted to batch INSERT statements with 500 rows each. This immediately triggered a SQLITE_TOOBIG error.

Pro Tip: Cloudflare D1 caps a single SQL statement at 100KB, a limit that is not prominently featured in the main documentation. To fix this, I reduced the batch size to 50 rows per INSERT. This increased the total number of statements to around 570 (28,577 rows at 50 per statement), which the Wrangler CLI handled without issue.
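
A fixed row count works when rows are roughly uniform, but a size check makes the export robust to the occasional long description. The sketch below splits on whichever limit is hit first, row count or statement bytes. The table and column names in the usage are hypothetical, and buildInsertBatches is an illustrative helper, not part of Wrangler or D1.

```typescript
type Row = Array<string | number | null>;

const MAX_STATEMENT_BYTES = 100000; // the roughly 100KB per-statement cap

// UTF-8 byte length without relying on platform-specific encoders.
function utf8Length(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code < 0x80) bytes += 1;
    else if (code < 0x800) bytes += 2;
    else if (code >= 0xd800 && code <= 0xdbff && i + 1 < s.length) { bytes += 4; i++; } // surrogate pair
    else bytes += 3;
  }
  return bytes;
}

// Render one value as a SQL literal; single quotes are doubled.
function sqlLiteral(v: string | number | null): string {
  if (v === null) return "NULL";
  if (typeof v === "number") return String(v);
  return "'" + v.replace(/'/g, "''") + "'";
}

// Build multi-row INSERT statements, starting a new one when either limit would be exceeded.
// Table and column names are assumed to be trusted identifiers and are not escaped.
function buildInsertBatches(
  table: string,
  columns: string[],
  rows: Row[],
  maxRows: number = 50,
  maxBytes: number = MAX_STATEMENT_BYTES
): string[] {
  const prefix = "INSERT INTO " + table + " (" + columns.join(", ") + ") VALUES ";
  const baseSize = utf8Length(prefix) + 1; // +1 for the trailing semicolon
  const statements: string[] = [];
  let tuples: string[] = [];
  let size = baseSize;

  const flush = () => {
    if (tuples.length === 0) return;
    statements.push(prefix + tuples.join(", ") + ";");
    tuples = [];
    size = baseSize;
  };

  for (const row of rows) {
    const tuple = "(" + row.map(sqlLiteral).join(", ") + ")";
    const tupleSize = utf8Length(tuple) + 2; // +2 for the ", " separator (slight overestimate)
    if (tuples.length > 0 && (tuples.length >= maxRows || size + tupleSize > maxBytes)) flush();
    tuples.push(tuple); // a single oversized row is still emitted alone and would need separate handling
    size += tupleSize;
  }
  flush();
  return statements;
}
```

With uniform small rows this reduces to plain 50-row batches, so 28,577 rows would produce 572 statements, in line with the figure of around 570.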

The .assetsignore Troubleshooting

A particularly frustrating hurdle involved the Astro Cloudflare adapter. By default, it outputs both the _worker.js (the SSR entry point) and the prerendered HTML into the same /dist directory. Wrangler then attempts to upload _worker.js as a static asset. This not only consumes an asset slot but often fails due to size restrictions.

The solution is to create a .assetsignore file in the public/ directory with the following entries:

_worker.js
_routes.json

This ensures that Cloudflare treats the worker as a functional script rather than a static file, a fix that saved hours of debugging.

Results and Performance Metrics

The directory, hosted at safemcp.info, launched with 28,577 verified servers across 95 categories. Within the first 24 hours, the launch post reached #1 on the MCP subreddit.

One interesting finding was that roughly 20% of servers fell into an "Other" category. Because the MCP ecosystem is so new, developers are building highly niche tools that don't fit traditional buckets. Instead of hiding this, I display it openly to showcase the diversity of the protocol.

When building agents that utilize these servers, the bottleneck is often the API latency of the underlying LLM. A low-latency provider such as n1n.ai can help keep calls to Claude 3.5 Sonnet or other models fast, complementing the speed of the edge-hosted directory.

Security and Discovery Signals

Each server in the directory is assigned a "Discovery Score" from 0-100 based on metadata (GitHub stars, description quality, and presence across multiple sources). It is important to note that this is a signal for discovery, not a comprehensive security audit. Before connecting an MCP server to your local environment or sensitive data, always review the source code.
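
To make the idea of a metadata-based score concrete, here is one possible way to combine the three signals into a 0-100 number. The weights, the log scaling of stars, and the use of description length as a proxy for description quality are all invented for illustration; the directory's actual formula is not described here.

```typescript
// Hypothetical inputs; field names are assumptions for this sketch.
interface ServerMeta {
  stars: number;       // GitHub stars
  description: string; // extracted description
  sourceCount: number; // how many sources listed the repo
}

function clamp01(x: number): number {
  return Math.max(0, Math.min(1, x));
}

// Combine the signals into an integer from 0 to 100 using illustrative weights.
function discoveryScore(meta: ServerMeta): number {
  // Log scale so the first few stars matter more; saturates near 10,000 stars.
  const starSignal = clamp01(Math.log10(1 + Math.max(0, meta.stars)) / 4);
  // Crude proxy for description quality: length, saturating at 120 characters.
  const descSignal = clamp01(meta.description.trim().length / 120);
  // One source scores 0, three or more score 1.
  const sourceSignal = clamp01((meta.sourceCount - 1) / 2);
  return Math.round(100 * (0.5 * starSignal + 0.25 * descSignal + 0.25 * sourceSignal));
}
```

Nothing in a score like this looks at what the server's code does, which is why it can only ever rank visibility, not safety.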

This project demonstrates that with the right combination of edge computing and LLM-assisted classification, we can bring order to even the most fragmented technical ecosystems.


Source: https://dev.to/safemcp/how-i-built-a-directory-of-28577-mcp-servers-32m8