Scalable RAG with .NET and Relational Databases - Part 2

Efficient storage and retrieval algorithms are critical to the performance of databases that store embeddings, because they directly impact how quickly and accurately queries can be executed. In the context of a Retrieval-Augmented Generation (RAG) system, where embeddings are used to find relevant data, these algorithms determine the speed and quality of nearest neighbor searches.

For example, embeddings often exist as high-dimensional vectors. Performing brute-force comparisons between these vectors is computationally expensive, especially as the dataset grows to millions or billions of records. Storage strategies, such as partitioning and indexing, combined with retrieval algorithms like Approximate Nearest Neighbor (ANN) searches, ensure that the system remains performant.
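To make the cost concrete, here is a minimal brute-force sketch (the function names and toy vectors are illustrative, not from the series' code). Every query must be compared against every stored vector, so each lookup costs O(n * d) for n vectors of dimension d, which is exactly what breaks down at millions of records:

```python
import math

def l2_distance(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_nearest(query, vectors, k=1):
    # Compare the query against every stored vector: O(n * d) per query.
    # Returns the indices of the k closest vectors.
    ranked = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]

vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.1]]
print(brute_force_nearest([1.0, 1.0], vectors, k=2))  # -> [1, 3]
```

ANN indexes exist precisely to avoid this exhaustive scan, trading a small amount of accuracy for sub-linear query time.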

In our scenario, we leverage PostgreSQL with the pgvector extension, together with DiskANN-style indexing from the pgvectorscale extension, to optimize how embeddings are stored, indexed, and retrieved. This ensures scalability without sacrificing query speed, even when handling large-scale datasets.

Approximate Nearest Neighbors Oh Yeah (ANNOY)

ANNOY is an open-source library developed by Spotify for performing approximate nearest neighbor searches. It is designed for applications where fast retrieval is more important than exact results. ANNOY builds a tree structure to partition the embedding space and performs searches efficiently by traversing these trees.

Key features of ANNOY include:

  • Approximation: Trades off exact accuracy for faster query times.

  • Disk-Based Indexing: Supports large datasets by storing indices on disk, which minimizes memory usage.

  • Multi-Tree Structure: Utilizes multiple random projection trees to improve accuracy during searches.

While ANNOY is highly efficient for specific use cases, it is better suited for static datasets, as the tree structure needs to be rebuilt when the dataset changes. In our RAG system, ANNOY could be considered for offline or static embedding searches, but for dynamic and large-scale systems, DiskANN provides better scalability and flexibility.
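The tree idea behind ANNOY can be sketched in a few lines of Python. This is an illustrative single-tree toy, and every name in it is invented for the sketch: real ANNOY is a C++ library that splits on random hyperplanes (equivalent to the closer-of-two-pivots rule used here) and builds many trees to improve recall:

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_tree(points, ids, leaf_size=4, rng=None):
    # Recursively partition the ids: pick two random pivot points and send
    # each item to the side of the closer pivot (ANNOY's hyperplane split).
    rng = rng or random.Random(0)
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    a, b = rng.sample(ids, 2)
    left, right = [], []
    for i in ids:
        side = left if dist(points[i], points[a]) <= dist(points[i], points[b]) else right
        side.append(i)
    if not left or not right:  # degenerate split: stop here
        return {"leaf": ids}
    return {"a": a, "b": b,
            "left": build_tree(points, left, leaf_size, rng),
            "right": build_tree(points, right, leaf_size, rng)}

def query_tree(node, points, q):
    # Follow one branch down to a leaf, then scan the leaf exactly.
    # A single tree can miss the true neighbor; ANNOY merges candidates
    # from many trees to compensate.
    while "leaf" not in node:
        if dist(q, points[node["a"]]) <= dist(q, points[node["b"]]):
            node = node["left"]
        else:
            node = node["right"]
    return min(node["leaf"], key=lambda i: dist(points[i], q))

points = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.3], [5.0, 5.0],
          [5.2, 5.0], [5.0, 5.3], [9.0, 0.0], [0.0, 9.0]]
tree = build_tree(points, list(range(len(points))), leaf_size=2)
print(query_tree(tree, points, points[3]))  # -> 3
```

Note how the query only visits one leaf instead of all points, which is where the speedup (and the approximation) comes from.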

DiskANN

DiskANN is a state-of-the-art algorithm designed for high-speed approximate nearest neighbor (ANN) searches on massive datasets. Unlike in-memory ANN algorithms, DiskANN optimizes the use of both memory and disk to handle datasets that exceed available RAM.

Key Features of DiskANN:

  1. Memory Efficiency: Keeps frequently accessed data in RAM while storing less-used data on disk.

  2. Scalability: Capable of managing billions of high-dimensional embeddings without requiring extensive memory resources.

  3. Fast Query Performance: Achieves low latency by intelligently balancing disk I/O and in-memory operations.

How DiskANN Works:

DiskANN builds a graph structure on disk, which represents the relationships between embeddings. When a query is made, the algorithm:

  1. Traverses the graph to find a starting point.

  2. Expands the search to nearby nodes, prioritizing paths likely to yield accurate results.

  3. Uses caching and other optimizations to reduce disk read times.
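The traversal above can be sketched as a best-first ("beam") search over a proximity graph. This toy is only a sketch of the search step, with all names invented for illustration; real DiskANN adds the disk-resident graph layout, compressed vectors, and a carefully pruned graph that this omits:

```python
import heapq
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(points, graph, entry, query, beam_width=4, k=1):
    # Best-first traversal of a proximity graph: keep a small beam of the
    # closest candidates seen so far, expand their neighbors, and stop when
    # the closest unexplored candidate cannot improve a full beam.
    visited = {entry}
    beam = [(dist(points[entry], query), entry)]
    frontier = list(beam)  # min-heap of (distance, node)
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(beam) >= beam_width and d > beam[-1][0]:
            break  # nothing closer left to explore
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                cand = (dist(points[nb], query), nb)
                heapq.heappush(frontier, cand)
                beam.append(cand)
        beam = sorted(beam)[:beam_width]
    return [n for _, n in beam[:k]]

# Six points on a line, linked to their nearby neighbors.
points = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
graph = {0: [1, 2], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [3, 4]}
print(greedy_search(points, graph, 0, [5.0]))  # -> [5]
```

The search hops from the entry point toward the query through a handful of graph edges rather than scanning all vectors; in DiskANN each hop corresponds to a small, cache-friendly disk read.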

In our RAG scenario, DiskANN-style indexing integrates with PostgreSQL through the pgvectorscale extension (built on top of pgvector) to provide high-performance vector searches. This combination ensures that even with large-scale data, our system can handle real-time queries efficiently.

Indexing in PostgreSQL with Embeddings

To ensure high performance when querying embeddings, it is essential to create and optimize indices in the database. Here’s how you can set up indexing in our example:

Adding an Index for Vector Searches

PostgreSQL’s pgvector extension provides support for Approximate Nearest Neighbor (ANN) indexing through its hnsw and ivfflat index types (there is no generic "ann" index type), and pgvectorscale adds a DiskANN-based diskann type. Add an HNSW index to optimize vector searches:

CREATE INDEX embeddings_vector_idx ON embeddings USING hnsw (embeddings vector_l2_ops);

This index accelerates similarity searches for the distance metric named in its operator class: vector_l2_ops serves the Euclidean <-> operator used below, while vector_cosine_ops would serve cosine distance instead.

Nothing changes in how we search; we have only changed how indexing works, and the query itself remains the same.

SELECT id, article_id, text, embeddings FROM embeddings ORDER BY embeddings <-> '[0.1, 0.2, 0.3, ...]' LIMIT 5;
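For reference, pgvector's <-> operator computes Euclidean (L2) distance, while <=> computes cosine distance (1 minus cosine similarity). A small Python sketch (function names are ours, for illustration) of what ORDER BY is actually ranking by:

```python
import math

def l2_distance(a, b):
    # What pgvector's "<->" operator computes: Euclidean (L2) distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # What pgvector's "<=>" operator computes: 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(l2_distance([1.0, 0.0], [0.0, 1.0]))      # sqrt(2), about 1.414
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 1.0
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # same direction -> 0.0
```

For normalized embeddings (as most embedding models produce), ranking by L2 distance and by cosine distance yields the same order, so either operator works as long as it matches the index's operator class.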

Benchmarks

Below are the benchmarks from my MacBook Air M1 (8GB) with default settings. The results are impressive for an 8GB machine, showing that an RDBMS can deliver strong vector search performance with very limited resources. This suggests that a specialized vector database is not always necessary, while you keep all the features of an RDBMS.