Scalable RAG with .NET and Relational Databases - Part 1

A RAG-Based AI System to Handle 100M Embeddings on a Single Node Using .NET and Postgres

Retrieval-Augmented Generation (RAG) enhances Large Language Model (LLM) outputs by integrating trusted data sources. This article explores why .NET paired with a relational database like PostgreSQL is a practical and effective alternative to Python-based LLM orchestrators such as LlamaIndex and LangChain combined with dedicated vector databases, and in some cases even delivers better outcomes.

Why use RAG with .NET and Postgres?

RAG systems transform how LLMs such as OpenAI's GPT models, Llama, Phi, and Gemini deliver results by incorporating relevant, reliable data. They are particularly impactful in industries such as Legal, Insurance, and Media, where text-based data is critical.

In this series, we demonstrate how to build a scalable RAG system with ASP.NET that manages 1 million Wikipedia articles and retrieves relevant information from the 100 million embeddings extracted from them.

If you are an individual or a company that has already invested in platforms built on .NET and wants to embrace Large Language Models, RAG, and similar tools, we aim to show that switching to Python is not mandatory. While learning a new language should not be discouraged, it is not always practical to set up new Python-based teams and processes due to economic, time, and other constraints.

Even though we are using Postgres here, it's common for .NET-based projects to use SQL Server as the RDBMS. We are using Postgres because its fully open-source nature and third-party extensibility currently put it in a better position than SQL Server for this workload. SQL Server does have its own counterpart for storing embeddings and performing other vector-based operations, which I might explore another day. Since both Postgres and SQL Server are fundamentally relational database engines, there is no reason SQL Server cannot catch up with Postgres in features and extensibility; it is a matter of time and of Microsoft's commitment to its roadmap.

Semantic Kernel

Semantic Kernel (SK) is an open-source framework developed by Microsoft that combines Large Language Models (LLMs), AI orchestration, and traditional programming to create intelligent and highly adaptable applications. It empowers .NET developers to integrate advanced AI capabilities seamlessly into their software, enabling features such as natural language understanding, reasoning, and autonomous decision-making.

dotnet add package Microsoft.SemanticKernel

Setting Up the ASP.NET Environment

I am using the standard ASP.NET Core (.NET 9) Web API project structure, with Controllers handling the APIs, but you are free to use your own structure. For demonstration purposes, we will start from the sample WeatherForecast project found in most .NET IDEs such as Rider and Visual Studio. We are using OpenAI for this example, but you are free to use any of your favorite LLMs or SLMs.

builder.Services.AddControllers();

builder.Services.AddKernel();
builder.Services.AddOpenApi();
builder.Services.AddOpenAIChatCompletion("gpt-4o", builder.Configuration["OpenAI:ApiKey"] ?? string.Empty);
// Embedding generation needs a dedicated embedding model (gpt-4o is a chat model, not an embedding model).
// The model's output dimension must match the VECTOR(384) column defined later;
// text-embedding-3-small emits 1536 dimensions by default but supports requesting fewer.
// Note: AddOpenAITextEmbeddingGeneration is marked experimental in current Semantic Kernel versions
// and may require suppressing the SKEXP0010 diagnostic.
builder.Services.AddOpenAITextEmbeddingGeneration("text-embedding-3-small", builder.Configuration["OpenAI:ApiKey"] ?? string.Empty);
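
For completeness, the rest of Program.cs follows the standard minimal-hosting pattern; a minimal sketch (assuming the default Web API template) might look like this:

var app = builder.Build();

app.MapOpenApi();     // exposes the OpenAPI document registered via AddOpenApi()
app.MapControllers(); // routes the attribute-based controllers

app.Run();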

Setting Up the Database Environment

What Are Vector Databases?

Vector databases store embeddings (numerical representations of data), making it easy to find similar items using mathematical calculations. They are commonly used for AI/ML applications where high-dimensional data needs quick access. Examples include Pinecone and Milvus.

Installing pgvector

Installing PostgreSQL with the pgvector extension allows efficient storage and querying of embeddings.

Steps to Install and Verify:

  1. Ensure PostgreSQL is installed and running.

  2. Install the pgvector extension via your operating system's package manager or build it from source.

  3. Enable the extension in your database (note that the extension is registered under the name vector):
    CREATE EXTENSION IF NOT EXISTS vector;

  4. Confirm the extension is available by querying:

    SELECT * FROM pg_available_extensions WHERE name = 'vector';

  5. Test vector operations by creating a sample table and executing a query to verify functionality, as sketched below.
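
A throwaway sketch of such a test might look like this (the items table is scratch data, not part of our schema):

-- Scratch table to confirm vector storage and distance queries work
CREATE TABLE items (id SERIAL PRIMARY KEY, embedding VECTOR(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');

-- Nearest neighbour by cosine distance (the <=> operator comes from pgvector)
SELECT id, embedding <=> '[1,2,3]' AS cosine_distance
FROM items
ORDER BY embedding <=> '[1,2,3]'
LIMIT 1;

DROP TABLE items;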

Creating the Database Structure

To efficiently manage articles and their corresponding embeddings, we define two tables: articles and embeddings.

Articles Table

The articles table stores the metadata and information about the articles:

CREATE TABLE IF NOT EXISTS articles (
    id SERIAL PRIMARY KEY,
    title TEXT,
    meta JSONB,
    tenant_id TEXT
);

Embeddings Table

The embeddings table links embeddings to their respective articles:

CREATE TABLE IF NOT EXISTS embeddings (
    id SERIAL PRIMARY KEY,
    article_id INT,
    text TEXT,
    embeddings VECTOR(384),
    sequence_id INT,
    page_id INT,
    tenant_id TEXT
);

Using NpgsqlConnection and Dapper

To use PostgreSQL and Dapper in your .NET application, follow these steps.

Install Required Packages

Install the following NuGet packages to set up your application:

Install-Package Npgsql
Install-Package Dapper
Install-Package Pgvector.Dapper

Connecting to the Database

In my example, I am using Dapper, but you can use any of your favorite data-access libraries that support pgvector.

using Npgsql;
using Dapper;
using Pgvector.Dapper;
using System.Data;

var dataSourceBuilder = new NpgsqlDataSourceBuilder(configuration.GetConnectionString("Postgres"));
dataSourceBuilder.UseVector(); // Register the pgvector type mapping with Npgsql

_dataSource = dataSourceBuilder.Build();

// Register Dapper's custom type handler for Pgvector.Vector
SqlMapper.AddTypeHandler(new VectorTypeHandler());

var connection = await _dataSource.OpenConnectionAsync();
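
NpgsqlDataSource is thread-safe and intended to be long-lived, so one reasonable option (an assumption on my part, not shown above) is to build it once and register it as a singleton in Program.cs:

// Sketch: build the pgvector-aware data source once and share it via DI.
builder.Services.AddSingleton(_ =>
{
    var dsBuilder = new NpgsqlDataSourceBuilder(builder.Configuration.GetConnectionString("Postgres"));
    dsBuilder.UseVector(); // register the pgvector type mapping with Npgsql
    return dsBuilder.Build();
});

// Register Dapper's Vector type handler once at startup.
SqlMapper.AddTypeHandler(new VectorTypeHandler());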

Embeddings and Chunk Size

Chunk size is a concept you should be aware of when dealing with RAG systems. Say you have an article of 10,000 characters. Most LLMs cannot work with such a large input directly because of their token limits. Even if we could get around the token limit, an embedding created from an entire large article would not be very accurate, as it would contain a lot of noise (noise refers to random or unpredictable fluctuations in data that disrupt the ability to identify target patterns or relationships).

Because of this, we will break the large article into multiple chunks. There are many chunking strategies available now (like Fixed-size Chunking, Content Aware Chunking, Recursive Chunking, etc.), but we will use the simplest one since we are just starting. Depending on your input type, you are free to come up with your own strategy.

Our Simple Strategy: accumulate up to a maximum of 500 characters per chunk and split at an appropriate boundary, such as a full stop (or "!", "?", ";"), a line feed, or a tab.

// Requires: using System.Text.RegularExpressions;
private static List<string> SplitByMultipleDelimiters(string text)
{
    if (string.IsNullOrEmpty(text))
    {
        throw new ArgumentException("Text cannot be null or empty.", nameof(text));
    }

    // Delimiters treated as sentence/segment boundaries
    var delimiters = new[]
    {
        "\n", "\r", "\t", ".", "!", "?", ";"
    };

    // Character limit for each chunk
    const int charLimit = 500;

    // Build a regex pattern from the delimiters
    var pattern = string.Join("|", Array.ConvertAll(delimiters, Regex.Escape));
    var parts = Regex.Split(text, pattern);

    var chunks = new List<string>();
    var currentChunk = "";

    foreach (var part in parts)
    {
        var trimmedPart = part.Trim();

        if (currentChunk.Length + trimmedPart.Length <= charLimit)
        {
            // Keep appending segments while the chunk stays under the limit
            currentChunk += (currentChunk.Length == 0 ? "" : " ") + trimmedPart;
        }
        else
        {
            if (!string.IsNullOrWhiteSpace(currentChunk))
            {
                chunks.Add(currentChunk.Trim());
            }

            currentChunk = trimmedPart;
        }
    }

    // Flush the final chunk
    if (!string.IsNullOrWhiteSpace(currentChunk))
    {
        chunks.Add(currentChunk.Trim());
    }

    return chunks;
}

Ingest Data

First, create a service that takes ITextEmbeddingGenerationService as a dependency. This injection is possible thanks to the AddOpenAITextEmbeddingGeneration registration we added earlier.

public IngestDataService(ITextEmbeddingGenerationService textEmbeddingGenerationService)
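
Filling that out a little, the service could hold the injected embedding generator together with the NpgsqlDataSource from the previous section (a sketch; the field names are my assumptions):

public class IngestDataService
{
    private readonly ITextEmbeddingGenerationService _textEmbeddingGenerationService;
    private readonly NpgsqlDataSource _dataSource;

    public IngestDataService(ITextEmbeddingGenerationService textEmbeddingGenerationService, NpgsqlDataSource dataSource)
    {
        _textEmbeddingGenerationService = textEmbeddingGenerationService;
        _dataSource = dataSource;
    }

    // The ingestion code below would live in a method of this service,
    // e.g. public async Task IngestAsync(string articleName, string data, string tenantId)
}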

Now, the code below inserts the article and its chunk embeddings into the database.
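
The batch insert also assumes a simple EmbeddingRecord POCO, which the original snippet does not show; a minimal sketch whose property names match the @-parameters of the insert query could be:

// Assumed shape of the batch-insert record (requires: using Pgvector;)
public class EmbeddingRecord
{
    public int ArticleId { get; set; }
    public string Text { get; set; } = string.Empty;
    public Vector Embeddings { get; set; } = null!; // mapped to the vector column by VectorTypeHandler
    public int SequenceId { get; set; }
    public int PageId { get; set; }
    public string TenantId { get; set; } = string.Empty;
}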


var articleId = await connection.ExecuteScalarAsync<int>(
    "INSERT INTO articles (title, tenant_id) VALUES (@Title, @TenantId) RETURNING id",
    new { Title = articleName, TenantId = tenantId });

// Prepare a strongly-typed list for batch insert
var batchInsertData = new List<EmbeddingRecord>();

var splits = SplitByMultipleDelimiters(data);

for (var i = 0; i < splits.Count; i++)
{
    var embeddings = await _textEmbeddingGenerationService.GenerateEmbeddingsAsync([splits[i]]);
    batchInsertData.Add(new EmbeddingRecord
    {
        ArticleId = articleId,
        Text = splits[i],
        Embeddings = new Vector(embeddings[0]), // wrap in Pgvector.Vector so the registered type handler maps it to the vector column
        SequenceId = i, // zero-based position of the chunk within the article
        PageId = 1, //todo
        TenantId = tenantId
    });
}

// Execute batch insert
const string insertQuery = """INSERT INTO embeddings (article_id, text, embeddings, sequence_id, page_id, tenant_id) 
                              VALUES (@ArticleId, @Text, @Embeddings, @SequenceId, @PageId, @TenantId)""";

await connection.ExecuteAsync(insertQuery, batchInsertData);

Now it's time to go to Postgres and check whether the articles and embeddings tables were populated. We will repeat this ingestion step many times to import 100 million embeddings; in the next part, I plan to explain those strategies, but for now a few articles are enough for testing.
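
A quick sanity check from psql could be something like:

-- Rough checks that ingestion populated both tables
SELECT count(*) FROM articles;
SELECT count(*) FROM embeddings;
SELECT article_id, sequence_id, left(text, 60) AS preview
FROM embeddings
ORDER BY article_id, sequence_id
LIMIT 5;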

Asking the RAG

Implementing the answering engine is actually straightforward. Take the question in through an API endpoint, then generate the embedding of the question. We don't have to worry about chunking here, as we assume questions are nowhere near as long as the articles.

Pass the embedding to an SQL query for similarity search.


var embeddings = await _embeddingGenerationService.GenerateEmbeddingsAsync([yourQuestionText]);
var queryVector = new Vector(embeddings[0]); // Pgvector.Vector built from the question embedding

const string sql = """SELECT text FROM embeddings ORDER BY embeddings <=> @InputVector LIMIT 1;""";
var ragOutput = await connection!.QueryAsync<string>(sql, new { InputVector = queryVector });
var fullRag = string.Join("\n", ragOutput); // the retrieved context passed to the LLM

var semanticFunctionRag = _semanticKernel.CreateFunctionFromPrompt(fullRag + "\nUsing the above details, answer the question below:\n" + yourQuestionText);
var output = (await _semanticKernel.InvokeAsync(semanticFunctionRag)).GetValue<string>() + "\n" + fullRag;
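
To expose this as the API endpoint mentioned above, a minimal controller wrapper might look like the sketch below; the RagController, AskRagService, and AskAsync names are illustrative assumptions, not part of the original code:

// Hypothetical controller that wires the question into the RAG flow above.
[ApiController]
[Route("api/rag")]
public class RagController : ControllerBase
{
    private readonly AskRagService _askRagService; // service containing the retrieval + prompt code

    public RagController(AskRagService askRagService)
    {
        _askRagService = askRagService;
    }

    [HttpPost("ask")]
    public async Task<ActionResult<string>> Ask([FromBody] string question)
    {
        // Embeds the question, runs the similarity search, and asks the LLM.
        var answer = await _askRagService.AskAsync(question);
        return Ok(answer);
    }
}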

Now let's check what result we get back from our API.

Conclusion

With the foundational setup of the RAG system completed, including the database structure, connectors, and data ingestion strategy, you are ready to proceed to the next phase. In Part 2, we will focus on advanced topics such as implementing DiskANN with the pgvectorscale extension, optimizing query performance, and integrating scalability improvements.

Currently, we have ingested 1 million records into Postgres with pgvectorscale, and we were able to get similarity search results back within 40 ms on just an M1-based MacBook Air. However, this is not sufficient for us, and we will discuss further scaling factors in the next part.

Stay tuned for detailed insights and practical implementations in the continuation of this series.