Building a RAG-Powered Chatbot

Building a Production-Ready RAG System - Technical Deep Dive

Ben Kissi

Fullstack AI Engineer

AI · RAG · Nuxt · Supabase · Python · FastAPI · Celery · OpenAI

23 Dec 2025

19 min read

Introduction

In the era of AI-powered applications, Retrieval-Augmented Generation (RAG) has emerged as a powerful pattern for building intelligent systems that can answer questions based on specific knowledge bases. Bot42 is a production-ready RAG system that demonstrates how to architect and implement a scalable document processing and chat platform using modern web technologies.

Bot42 combines the best of multiple technology stacks: Nuxt 4 for a full-stack TypeScript frontend, FastAPI and Celery for robust Python-based document processing, Supabase for database and vector storage, and OpenAI for embeddings and chat completion. The system is designed to handle document uploads, automatic processing, embedding generation, and intelligent chat interactions with source citations.

This article provides a comprehensive technical deep dive into Bot42's architecture, exploring three core components: the frontend and server routes, the document processor, and the embeddable widget. We'll examine the implementation details, architectural decisions, and patterns that make this system production-ready.

Full-Stack TypeScript: Bot42 leverages Nuxt 4's full-stack capabilities, allowing you to write both frontend and backend code in TypeScript within a single codebase. This reduces context switching and improves type safety across the entire application.

Section 1: Frontend and Server Routes

1.1 Architecture Overview

Bot42's frontend is built on Nuxt 4, a full-stack Vue.js framework that provides server-side rendering (SSR), API routes, and seamless integration with Supabase for authentication and data management. The architecture leverages Nuxt's file-based routing system, where files in the server/api/ directory automatically become API endpoints.

// bot42-app/nuxt.config.ts
export default defineNuxtConfig({
  modules: ["@nuxt/ui", "@nuxtjs/supabase", "@pinia/nuxt", "nuxt-workers"],
  supabase: {
    url: process.env.NUXT_PUBLIC_SUPABASE_URL,
    key: process.env.NUXT_PUBLIC_SUPABASE_KEY,
    redirect: true,
  },
});

The application uses Supabase for authentication, which provides Row Level Security (RLS) policies that automatically filter data based on the authenticated user's permissions. This eliminates the need for manual authorization checks in most API routes.

RLS Benefits: Row Level Security policies in Supabase automatically enforce data access rules at the database level. This means you don't need to remember to add authorization checks in every API route - the database handles it automatically based on the authenticated user's context.

1.2 Chat API Implementation

Chat functionality is available in both the dashboard and the widget, but serves different purposes:

  • Dashboard Chat (bot42-app/server/api/chat/app.ts): Research-oriented chat for authenticated users with full reference/citation support. Designed for in-depth document analysis and research workflows.
  • Widget Chat (bot42-app/server/api/widget/chat.post.ts): Public-facing, support-oriented chat with CORS support. Optimized for customer support and quick question-answering.

Both routes delegate to the same Supabase Edge Function for RAG processing and handle streaming and non-streaming modes:

Authentication & Authorization

export default defineEventHandler(async (event) => {
  const supabase = await serverSupabaseClient(event);
  const { widget_id } = await readBody(event);
  const {
    data: { user },
  } = await supabase.auth.getUser();

  if (!user) {
    throw createError({
      statusCode: 401,
      message: "Unauthorized: User not authenticated",
    });
  }

  // Widget access verification if widget_id is provided
  if (widget_id) {
    const { data: widget } = await supabase
      .from("widgets")
      .select("id, organization, is_active")
      .eq("widget_id", widget_id)
      .single();

    // Verify user has access to widget's organization
    // ...
  }
});

Streaming Mode Implementation

The Nuxt API route calls the Supabase Edge Function which handles the RAG pipeline. The streaming mode uses Server-Sent Events (SSE) to deliver real-time responses:

// Streaming mode
const response = await callSupabaseFunction(
  mm,
  finalEmbedding,
  widget_id,
  true
);

const transformStream = new TransformStream();
const writer = transformStream.writable.getWriter();
const encoder = new TextEncoder();

const reader = response.body.getReader();
const decoder = new TextDecoder();

const metaData = [];
let references: any[] = [];
let fullResponse = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  const text = decoder.decode(value, { stream: true });

  // Note: assumes each decoded chunk contains whole newline-delimited JSON lines
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    const obj = JSON.parse(line);

    if (obj.type === "metadata") {
      // Handle metadata and references
      if (obj.value && Array.isArray(obj.value)) {
        metaData.push(...obj.value);
      }
      if (obj.references && Array.isArray(obj.references)) {
        references = obj.references;
      }
    } else {
      // Stream text chunks, escaping the delta so it embeds safely in JSON
      const textValue = obj.value || "";
      fullResponse += textValue;
      const escapedText = JSON.stringify(textValue).slice(1, -1);
      await writer.write(
        encoder.encode(
          `data: {"type":"text-delta","id":"${messageId}","delta":"${escapedText}"}\n\n`
        )
      );
    }
  }
}

// Save bot response with metadata
if (conversation_id && fullResponse) {
  const serviceSupabase = serverSupabaseServiceRole(event);
  const metadataToSave =
    metaData.length > 0 || references.length > 0
      ? {
          metadata: metaData,
          references: references,
        }
      : null;

  await serviceSupabase.from("messages").insert({
    conversation_id,
    sender: "bot",
    message: fullResponse,
    ...(metadataToSave && { metadata: metadataToSave }),
  });
}

Key Features:

  • Dual Mode Support: Handles both streaming (real-time) and generation (complete response) modes
  • Metadata Persistence: Stores references and citations in the metadata JSONB column for later retrieval
  • Service Role Client: Uses serverSupabaseServiceRole to bypass RLS when saving messages
  • Error Handling: Comprehensive error handling with proper HTTP status codes

Service Role Client: The serverSupabaseServiceRole client bypasses Row Level Security policies. Only use this when absolutely necessary (like saving messages on behalf of users) and ensure you implement your own authorization checks. Never expose service role credentials to the client.
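
At its core, the streaming handler is a small protocol translation: the Edge Function emits newline-delimited JSON, and the route re-emits each text chunk as an SSE `data:` event. The translation can be sketched in isolation (Python purely as illustration; `message_id` mirrors the route's `messageId`, and serializing the whole payload with `json.dumps` handles the escaping that the route does manually):

```python
import json

def ndjson_to_sse(ndjson_text: str, message_id: str) -> tuple[list[str], str]:
    """Translate newline-delimited JSON chunks into SSE text-delta events,
    returning (sse_events, accumulated_response)."""
    events: list[str] = []
    full_response = ""
    for line in ndjson_text.split("\n"):
        if not line.strip():
            continue
        obj = json.loads(line)
        if obj.get("type") == "metadata":
            continue  # the real route collects these for persistence
        delta = obj.get("value", "")
        full_response += delta
        # Serializing the whole payload handles quote/newline escaping for us
        payload = json.dumps({"type": "text-delta", "id": message_id, "delta": delta})
        events.append(f"data: {payload}\n\n")
    return events, full_response


chunks = '{"value":"Hello"}\n{"value":" world"}\n{"type":"metadata","value":[]}'
events, full = ndjson_to_sse(chunks, "msg-1")
# full == "Hello world"; events holds two SSE "data:" frames
```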

1.3 Dashboard Pages Structure

The dashboard is organized into several key pages:

Folder Management (bot42-app/app/pages/dashboard/folders/[id].vue)

  • Document upload with drag-and-drop
  • Real-time processing progress tracking
  • Document list with status badges
  • Auto-refresh during processing (5-second intervals)

Chat Interface (bot42-app/app/pages/dashboard/chat/index.vue)

  • Research-oriented: Designed for in-depth document analysis and research workflows
  • Multi-widget support - select which widget's knowledge base to query
  • Conversation management (create, edit, delete)
  • Message history with metadata restoration
  • Full reference/citation support: Interactive popovers showing source documents, page numbers, and section content
  • Metadata panel: Sidebar displaying all references with document names, folders, and content previews
  • Uses the same Supabase Edge Function as the widget for RAG processing

Metadata Restoration: When loading conversation history, the system automatically restores metadata and references from the last bot message. This ensures citations remain available even after page refreshes, providing a seamless user experience.

The chat interface demonstrates sophisticated state management:

// Load conversation history with metadata restoration
async function loadConversationHistory() {
  const { data: msgs } = await supabase
    .from("messages")
    .select("id, sender, message, created_at, metadata")
    .eq("conversation_id", conversationId.value)
    .order("created_at", { ascending: true });

  // Restore metadata and references from last bot message
  const lastBotMessage = [...msgs]
    .reverse()
    .find((msg: any) => msg.sender === "bot" && msg.metadata);

  if (lastBotMessage && lastBotMessage.metadata) {
    const metadataObj = lastBotMessage.metadata;
    if (metadataObj.metadata && Array.isArray(metadataObj.metadata)) {
      metadataList.value = metadataObj.metadata;
    }
    if (metadataObj.references && Array.isArray(metadataObj.references)) {
      const refMap: Record<number, any> = {};
      metadataObj.references.forEach((ref: any) => {
        if (ref.number) {
          refMap[ref.number] = ref;
        }
      });
      referencesMap.value = refMap;
    }
  }
}

1.4 Server-Side Rendering and API Routes

Nuxt's server routes provide a clean way to handle backend logic without a separate API server. The routes automatically have access to:

  • Request/Response objects: Via H3 event handlers
  • Supabase clients: Both user-scoped and service role clients
  • Runtime config: Environment variables and secrets
  • Type safety: Full TypeScript support

Widget API Routes (bot42-app/server/api/widget/chat.post.ts)

The widget API routes handle public-facing, support-oriented chat requests with CORS support. This is separate from the dashboard chat route (which is research-oriented) but uses the same Supabase Edge Function for RAG processing. The widget route is optimized for customer support scenarios:

export default defineEventHandler(async (event) => {
  const origin = getHeader(event, "origin");
  const { widget_id, conversation_id, messages } = await readBody(event);

  // Verify widget exists and is active
  const { data: widget } = await supabase
    .from("widgets")
    .select("id, organization, is_active, allowed_domains")
    .eq("widget_id", widget_id)
    .single();

  // CORS validation
  const allowed =
    widget.allowed_domains?.includes(origin) ||
    widget.allowed_domains?.length === 0;

  if (!allowed) {
    setResponseStatus(event, 403);
    return { error: "Origin not allowed" };
  }

  // Process chat request...
});

Section 2: Document Processor

2.1 Python Processor Architecture

The document processor is built on FastAPI and Celery, providing a scalable, asynchronous processing pipeline. The architecture separates concerns into distinct services:

bot42-processor/
├── src/
│   ├── api.py                    # FastAPI application
│   ├── celery_worker.py          # Celery configuration
│   ├── tasks.py                  # Celery task definitions
│   ├── services/
│   │   ├── document_processor.py
│   │   └── embedding_service.py
│   └── routes/v1/
│       └── controller.py         # API endpoints

Chat Handling: Chat functionality is handled by the Supabase Edge Function (bot42-app/supabase/functions/chat/index.ts), not the FastAPI processor. The processor focuses solely on document processing and embedding generation, while chat requests are routed through Supabase for better integration with the database and vector search capabilities.

FastAPI Application Setup

# bot42-processor/src/api.py
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from src.routes.v1.router import router as v1_router

app = FastAPI(title="Bot42 Processor", version="1.0.0")

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

app.include_router(v1_router)

The application uses Docker Compose for orchestration, running:

  • FastAPI (port 8000)
  • Celery Worker
  • RabbitMQ (message broker)
  • Redis (result backend)
  • Flower (task monitoring)

Docker Compose Benefits: Using Docker Compose simplifies local development and ensures consistency across environments. All services (FastAPI, Celery, RabbitMQ, Redis, Flower) can be started with a single docker-compose up command, making it easy for developers to get started quickly.

2.2 Document Processing Pipeline

The document processing pipeline is triggered automatically when a file is uploaded to Supabase Storage. Here's the complete flow:

1. Storage Trigger

When a file is uploaded, a PostgreSQL trigger fires:

-- bot42-app/supabase/migrations/20250706215025_documents.sql
create trigger on_file_upload
after insert on storage.objects
for each row
when (new.bucket_id = 'organization-files')
execute procedure private.handle_storage_update();

2. Trigger Function

The trigger function creates a document record and calls the processing endpoint:

create or replace function private.handle_storage_update()
returns trigger as $$
declare
  document_id bigint;
  secret_key text;
begin
  -- Create document record
  insert into documents (organization_id, name, storage_object_id, created_by)
  values (org_id, filename, new.id, new.owner)
  returning id into document_id;

  -- Get API key from Vault
  select decrypted_secret into secret_key
  from vault.decrypted_secrets
  where name = 'INTERNAL_FUNCTION_KEY';

  -- Trigger processing via HTTP POST
  perform net.http_post(
    url := 'http://bot42-processor:8000/api/v1/process',
    headers := jsonb_build_object('apikey', secret_key),
    body := jsonb_build_object('document_id', document_id)
  );

  return null;
end;
$$;

3. Document Processing Service

The DocumentProcessingService handles the core processing logic:

# bot42-processor/src/services/document_processor.py
class DocumentProcessingService:
    def __init__(self, supabase_client: Client):
        self.supabase = supabase_client
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=CHUNK_SIZE,  # 500 characters
            chunk_overlap=CHUNK_OVERLAP,  # 100 characters
        )

    async def process_document(self, document_id: int) -> dict:
        # 1. Get document info from view
        result = self.supabase.table("documents_with_storage_path") \
            .select("*") \
            .eq("id", document_id) \
            .single() \
            .execute()

        # 2. Download file from storage
        file_data = self.supabase.storage \
            .from_("organization-files") \
            .download(document["storage_object_path"])

        # 3. Parse document based on file type
        ext = storage_path.split(".")[-1].lower()
        docs = await self._parse_document(file_data, ext, document_id, folder_id, storage_path)

        # 4. Split into chunks
        chunks = self.text_splitter.split_documents(docs)

        # 5. Bulk insert sections
        sections_data = [{
            "document_id": document_id,
            "content": chunk.page_content,
            "metadata": chunk.metadata or {},
            "folder_id": folder_id,
        } for chunk in chunks]

        self.supabase.table("document_sections").insert(sections_data).execute()

        return {"success": True, "sections": len(chunks)}

PDF Parsing

For PDF files, the service uses pypdf:

if ext == "pdf":
    from pypdf import PdfReader
    pdf_file = io.BytesIO(file_data)
    pdf_reader = PdfReader(pdf_file)

    for page_num, page in enumerate(pdf_reader.pages):
        text = page.extract_text()
        if text.strip():
            docs.append(Document(
                page_content=text,
                metadata={
                    "document_id": document_id,
                    "folder_id": folder_id,
                    "source": storage_path,
                    "format": "pdf",
                    "page": page_num + 1,
                }
            ))

Chunking Strategy

The RecursiveCharacterTextSplitter uses a hierarchical approach:

  1. Split by paragraphs
  2. Split by sentences
  3. Split by words
  4. Respects chunk size (500 chars) and overlap (100 chars)

This ensures semantic coherence while maintaining consistent chunk sizes for embedding generation.

Chunk Size Tuning: The default chunk size of 500 characters with 100-character overlap works well for most documents. However, you may want to adjust these values based on your document types:
- Smaller chunks (300-400 chars): Better for technical documentation with code snippets
- Larger chunks (800-1000 chars): Better for narrative content like articles or books
- Overlap: Maintains context between chunks - 20% overlap is a good rule of thumb
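
The interaction of chunk size and overlap is easy to see with a simplified fixed-window splitter. This is a toy stand-in for RecursiveCharacterTextSplitter (which additionally prefers paragraph and sentence boundaries), not the library's implementation:

```python
def split_with_overlap(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Naive fixed-window splitter: each chunk starts where the previous one
    ended minus `overlap` characters, so adjacent chunks share context."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks


doc = "x" * 1200
chunks = split_with_overlap(doc, chunk_size=500, overlap=100)
# 1200 chars -> windows starting at 0, 400, 800: lengths 500, 500, 400
```

With the defaults, each chunk repeats the last 100 characters of its predecessor, which is what keeps a sentence that straddles a boundary retrievable from at least one chunk.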

2.3 Embedding Generation

The embedding service generates vector representations using OpenAI's text-embedding-3-small model:

# bot42-processor/src/services/embedding_service.py
class EmbeddingService:
    def __init__(self, supabase_client: Client):
        self.supabase = supabase_client
        self.embeddings = OpenAIEmbeddings(
            model=EMBEDDING_MODEL,  # "text-embedding-3-small"
            api_key=OPENAI_API_KEY,
        )

    async def generate_embeddings(
        self,
        ids: List[int],
        table: str,
        content_column: str,
        embedding_column: str
    ) -> dict:
        # Fetch rows without embeddings
        result = self.supabase.table(table) \
            .select(f"id, {content_column}") \
            .in_("id", ids) \
            .is_(embedding_column, None) \
            .execute()

        rows = result.data or []
        success_count = 0
        error_count = 0

        # Process each row
        for row in rows:
            try:
                # Generate embedding
                vector = await self.embeddings.aembed_query(row[content_column])

                # Update row with embedding
                self.supabase.table(table) \
                    .update({embedding_column: vector}) \
                    .eq("id", row["id"]) \
                    .execute()

                success_count += 1
            except Exception as e:
                error_count += 1
                # Log error but continue processing

        return {
            "processed": len(rows),
            "success": success_count,
            "errors": error_count,
        }

Performance Considerations:

  • Async Processing: Uses aembed_query for non-blocking API calls
  • Batch Processing: Processes multiple sections in a single task
  • Error Resilience: Individual failures don't stop the entire batch
  • Idempotency: Only processes rows without existing embeddings

Embedding Performance: OpenAI's text-embedding-3-small model provides a good balance between cost and quality. For better quality (at higher cost), consider text-embedding-3-large. The async aembed_query method allows processing multiple embeddings concurrently, significantly improving throughput for large documents.
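
The per-row loop above still awaits each embedding sequentially; one way to exploit the async API is to fan the calls out with asyncio.gather behind a semaphore. A minimal sketch, not the service's actual code: `embed_fn` stands in for `aembed_query`, and the concurrency bound is an assumed rate-limit guard:

```python
import asyncio

async def embed_rows(rows, embed_fn, max_concurrency: int = 5):
    """Embed rows concurrently, bounding in-flight requests with a semaphore.
    Individual failures are recorded as None rather than aborting the batch."""
    sem = asyncio.Semaphore(max_concurrency)
    results = {}

    async def worker(row):
        async with sem:
            try:
                results[row["id"]] = await embed_fn(row["content"])
            except Exception:
                results[row["id"]] = None  # mark failure, keep going

    await asyncio.gather(*(worker(r) for r in rows))
    return results


async def fake_embed(text: str):
    await asyncio.sleep(0)  # stand-in for the OpenAI call
    return [float(len(text))]

rows = [{"id": 1, "content": "hello"}, {"id": 2, "content": "hi"}]
vectors = asyncio.run(embed_rows(rows, fake_embed))
# vectors == {1: [5.0], 2: [2.0]}
```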

2.4 Celery Task Management

Celery tasks provide asynchronous processing with progress tracking:

# bot42-processor/src/tasks.py
import asyncio

@celery_app.task(name="process_document", bind=True)
def process_document_task(self, document_id: int):
    # Progress: 0% - Starting
    self.update_state(
        state='PROGRESS',
        meta={'step': 'Starting', 'progress': 0}
    )

    processor = DocumentProcessingService(supabase_client)

    # Progress: 20% - Downloading
    self.update_state(state='PROGRESS', meta={'step': 'Downloading', 'progress': 20})

    # Progress: 40% - Parsing
    self.update_state(state='PROGRESS', meta={'step': 'Parsing', 'progress': 40})

    # Process the document (bridge from the sync Celery task into the async service)
    result = asyncio.run(processor.process_document(document_id))

    # Progress: 60% - Chunking complete
    self.update_state(state='PROGRESS', meta={'step': 'Chunked', 'progress': 60})

    # Get section IDs
    sections_result = supabase_client.table("document_sections") \
        .select("id") \
        .eq("document_id", document_id) \
        .is_("embedding", None) \
        .execute()

    section_ids = [row["id"] for row in sections_result.data]

    # Progress: 70% - Triggering embeddings
    self.update_state(state='PROGRESS', meta={'step': 'Generating embeddings', 'progress': 70})

    # Trigger embedding generation
    embed_task = generate_embeddings_task.apply_async(
        args=[section_ids, "document_sections", "content", "embedding"]
    )

    # Progress: 100% - Complete
    self.update_state(state='PROGRESS', meta={'step': 'Complete', 'progress': 100})

    return {"status": "COMPLETED", "sections": result["sections"]}

Task Features:

  • Progress Tracking: Updates state at key milestones (0%, 20%, 40%, 60%, 70%, 100%)
  • Task Cancellation: Supports cancellation via cancel_document_task
  • Error Handling: Comprehensive exception handling with proper state updates
  • Monitoring: Flower dashboard provides real-time task monitoring

Task Cancellation: When cancelling a document processing task, ensure you handle cleanup properly. The cancel_document_task deletes the document record, which cascades to sections due to foreign key constraints. However, any in-flight API calls to OpenAI will still complete - consider implementing a cancellation token mechanism for long-running operations.
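
The cancellation-token idea from the note above can be approximated with a cooperative flag that the task polls between pipeline steps. This is a sketch of the pattern, not the actual cancel_document_task; the step names are hypothetical:

```python
import threading

class CancellationToken:
    """Cooperative cancellation: long-running work polls is_cancelled
    between steps and stops cleanly instead of being killed mid-flight."""
    def __init__(self):
        self._event = threading.Event()

    def cancel(self):
        self._event.set()

    @property
    def is_cancelled(self) -> bool:
        return self._event.is_set()


def run_pipeline(steps, token: CancellationToken):
    """Run (name, fn) steps in order, checking the token before each one."""
    completed = []
    for name, fn in steps:
        if token.is_cancelled:
            return {"status": "CANCELLED", "completed": completed}
        fn()
        completed.append(name)
    return {"status": "COMPLETED", "completed": completed}


token = CancellationToken()
steps = [("download", lambda: None),
         ("parse", lambda: token.cancel()),  # simulate a cancel arriving mid-run
         ("embed", lambda: None)]
result = run_pipeline(steps, token)
# "embed" never runs: status is CANCELLED with ["download", "parse"] completed
```

In a Celery context the flag would live somewhere both the worker and the cancel endpoint can reach (e.g. a Redis key checked between progress milestones), since a plain in-process Event does not cross process boundaries.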

2.5 Database Integration

Vector Search Implementation

Supabase uses PostgreSQL's pgvector extension for vector similarity search. The system includes a custom RPC function for folder-filtered search:

-- bot42-app/supabase/migrations/20250928100000_add_folder_id_to_document_sections.sql
CREATE OR REPLACE FUNCTION match_document_sections_with_folders(
  query_embedding VECTOR,
  match_threshold FLOAT,
  match_count INT DEFAULT 5,
  filter_folder_ids BIGINT[] DEFAULT NULL
)
RETURNS TABLE (
  id BIGINT,
  content TEXT,
  document_id BIGINT,
  folder_id BIGINT,
  metadata JSONB,
  similarity FLOAT
)
LANGUAGE plpgsql AS $$
BEGIN
  RETURN QUERY
  SELECT
    document_sections.id,
    document_sections.content,
    document_sections.document_id,
    document_sections.folder_id,
    document_sections.metadata,
    1 - (document_sections.embedding <=> query_embedding) AS similarity
  FROM document_sections
  WHERE
    1 - (document_sections.embedding <=> query_embedding) > match_threshold
    AND (filter_folder_ids IS NULL OR document_sections.folder_id = ANY(filter_folder_ids))
  ORDER BY similarity DESC
  LIMIT match_count;
END;
$$;

Key Features:

  • Cosine Similarity: Uses <=> operator for cosine distance
  • Folder Filtering: Optional folder-based access control
  • Threshold Filtering: Only returns results above similarity threshold
  • Metadata Preservation: Returns full metadata for citation generation

Similarity Threshold: The default threshold of 0.2 may seem low, but it's appropriate for semantic search. Lower thresholds (0.1-0.2) capture more diverse results, while higher thresholds (0.3-0.4) return only highly similar content. Adjust based on your use case - lower for exploratory queries, higher for precise fact retrieval.
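
Since pgvector's `<=>` operator returns cosine distance, the `1 - (embedding <=> query_embedding)` expression in the RPC is cosine similarity. The same filter/sort/limit logic, mirrored in plain Python (without the folder filtering) to make the threshold behavior concrete:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Equivalent of 1 - cosine_distance: 1.0 for identical direction,
    0.0 for orthogonal vectors, -1.0 for opposite direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def match_sections(query, sections, threshold=0.2, count=5):
    """Mirror of the RPC: score, filter by threshold, sort desc, limit."""
    scored = [(cosine_similarity(query, emb), sid) for sid, emb in sections]
    scored = [(s, sid) for s, sid in scored if s > threshold]
    scored.sort(reverse=True)
    return scored[:count]


sections = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [1.0, 1.0])]
matches = match_sections([1.0, 0.0], sections)
# section 2 (orthogonal, similarity 0.0) is filtered out at threshold 0.2
```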

Section 3: Widget

3.1 Widget Architecture

The Bot42 widget is a self-contained Vue 3 application that can be embedded on any website. It uses Shadow DOM for style isolation and Pinia for state management.

Widget vs Dashboard Chat: The widget chat is designed for public-facing, support-oriented use cases - quick question-answering for customers. The dashboard chat is research-oriented with full citation support, metadata panels, and detailed reference information - ideal for internal teams analyzing documents in depth.

Initialization

// bot42-widget-new/src/main.ts
function initializeWidget() {
  if (document.getElementById("bot42-widget-container")) {
    return; // Already initialized
  }

  const hostElement = document.createElement("div");
  hostElement.id = "bot42-widget-container";
  document.body.appendChild(hostElement);

  const shadowRoot = hostElement.attachShadow({ mode: "open" });
  const appContainer = document.createElement("div");
  shadowRoot.appendChild(appContainer);

  // Inject styles into Shadow DOM
  const styleElement = document.createElement("style");
  styleElement.textContent = tailwindStyles.replace(/:root/g, ":host");
  shadowRoot.appendChild(styleElement);

  const pinia = createPinia();
  pinia.use(persistedStatePlugin);

  const app = createApp(App);
  app.use(pinia);
  app.use(router);
  app.use(ui);
  app.mount(appContainer);
}

Key Features:

  • Shadow DOM Isolation: Prevents CSS conflicts with host page
  • Style Injection: Inlines Tailwind CSS into Shadow DOM
  • Persistent State: Uses Pinia with persisted state plugin for localStorage
  • Single Instance: Prevents multiple widget initializations

Shadow DOM Limitations: While Shadow DOM provides excellent style isolation, it also has some limitations:
- Styles must be explicitly injected (can't rely on external stylesheets)
- Some third-party libraries may not work correctly inside Shadow DOM
- Event propagation can be tricky - use `composed: true` for events that need to bubble out

3.2 Widget Chat Implementation

The widget chat interface (bot42-widget-new/src/pages/ChatView.vue) is designed for public-facing, support-oriented use cases. Unlike the dashboard chat, it focuses on quick question-answering and customer support rather than detailed research:

const chat = new Chat({
  messages: chatMessages.value,
  transport: makeTransport(),
  onData: ({ data, type }) => {
    if (type === "data-metadata") {
      metadataList.value = Array.isArray(data) ? data : [];
    }
  },
});

function makeTransport() {
  return new DefaultChatTransport({
    api: `http://localhost:3000/api/widget/chat?widget_id=${selectedWidgetId.value}`,
    body: {
      widget_id: selectedWidgetId.value,
      conversation_id: app.conversationId,
    },
  });
}

Conversation Management

The widget supports multiple conversations with history:

// Load previous messages on mount
onMounted(async () => {
  if (app.conversationId) {
    const response = await apiClient.get(
      `/widget/messages?conversation_id=${app.conversationId}`
    );

    if (response.messages && response.messages.length > 0) {
      chatMessages.value = response.messages.map((msg) => ({
        id: msg.id,
        role: msg.sender === "bot" ? "assistant" : "user",
        parts: [{ type: "text", text: msg.message }],
      }));

      chat.messages = chatMessages.value;
    }
  }
});

State Management

The widget uses Pinia stores for persistent state:

// bot42-widget-new/src/stores/app.ts
export const useAppStore = defineStore("app", {
  state: () => ({
    visitorId: null as string | null,
    conversationId: null as string | null,
    userId: null as string | null,
  }),

  actions: {
    ensureIds() {
      // Load from localStorage or generate new IDs
      this.visitorId = getVisitorId();
      this.conversationId = getConversationId();
      this.userId = getUserId();
    },
  },

  persist: true, // Persists to localStorage
});

3.3 Embedding and Deployment

Widget Script Generation

The widget is built as a single JavaScript bundle that can be embedded:

<!-- Embedding script -->
<script src="https://cdn.example.com/widget-bot42.js"></script>
<script>
  window.Bot42Widget.init({
    widget_id: "your-widget-id",
    api_url: "https://api.example.com",
  });
</script>

Configuration Injection

The widget configuration is injected via the script tag:

// bot42-widget-new/src/utils/widgetConfig.ts
export function loadWidgetConfig() {
  const script = document.querySelector("script[data-widget-config]");
  if (script) {
    const config = JSON.parse(
      script.getAttribute("data-widget-config") || "{}"
    );
    window.__BOT42_CONFIG__ = config;
  }
}

CORS Handling

The widget API routes handle CORS for cross-origin requests:

// bot42-app/server/api/widget/chat.post.ts
const corsHeaders = {
  "Access-Control-Allow-Origin": origin || "*",
  "Access-Control-Allow-Methods": "POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type",
};

setResponseHeaders(event, corsHeaders);

// Short-circuit CORS preflight requests
if (event.method === "OPTIONS") {
  return null;
}

CORS Security: Always validate the origin header against your widget's allowed_domains configuration. Never use "*" as the origin in production - this allows any website to make requests to your API. The widget should only accept requests from domains you've explicitly authorized.

3.4 Widget Features

Multi-Conversation Support

Users can switch between conversations, with each conversation maintaining its own message history and metadata.

Reference/Citation Display

The widget displays citations inline, allowing users to see source documents:

// Reference component fetches section content on hover
const fetchSectionContent = async (sectionId: number) => {
  const { data } = await supabase
    .from("document_sections")
    .select("content")
    .eq("id", sectionId)
    .single();

  sectionContent.value = data?.content || null;
};

Customizable UI

The widget UI is built with Nuxt UI components, providing a consistent, modern interface that can be customized via CSS variables.

Widget Customization: The widget can be customized by passing configuration options during initialization. You can override colors, fonts, and other UI elements by setting CSS custom properties in the Shadow DOM. This allows each website embedding the widget to match their brand while maintaining the core functionality.

Architecture Diagrams

System Overview

A bird's-eye view: the Nuxt app serves the dashboard and the widget API routes, both of which delegate RAG chat to a Supabase Edge Function; the FastAPI/Celery processor ingests documents and generates embeddings against Supabase (Postgres with pgvector, plus Storage); OpenAI provides embeddings and chat completions.

Document Processing Flow

In outline: file upload to Supabase Storage → PostgreSQL trigger → FastAPI /api/v1/process endpoint → Celery task → download, parse, and chunk → insert into document_sections → embedding generation task → sections become searchable via pgvector.

Chat Flow

The chat flow is the same for both dashboard and widget, with the main differences being:

  • Authentication: Dashboard uses user authentication, widget uses widget-based authentication
  • Use Case: Dashboard is research-oriented with full citations, widget is support-oriented with simplified UI

In outline: user message → Nuxt API route (user authentication or widget/CORS verification) → Supabase Edge Function (query embedding, vector search, OpenAI completion) → streamed response with references, persisted to the messages table.

Future Plans

Bot42 is continuously evolving to meet the growing demands of document processing and AI-powered chat. Here's a roadmap of planned features and enhancements:

OCR and Image Processing

One of the most requested features is Optical Character Recognition (OCR) support for scanned documents and images. This will enable Bot42 to process:

  • Scanned PDFs: Extract text from image-based PDFs that currently can't be processed
  • Image Files: Support for PNG, JPG, and other image formats containing text
  • Handwritten Documents: Advanced OCR models for handwritten text recognition
  • Multi-page Documents: Batch processing of image-based documents

Implementation Approach:

  • Integration with OCR services like Tesseract, Google Cloud Vision API, or AWS Textract
  • Pre-processing pipeline for image enhancement (deskewing, noise reduction)
  • Post-processing for OCR accuracy improvement
  • Hybrid approach: OCR for images, existing parsing for text-based documents

OCR Strategy: For production use, consider using cloud-based OCR services (Google Cloud Vision, AWS Textract) for better accuracy, especially for complex layouts and handwritten text. For cost-sensitive applications, Tesseract provides a good open-source alternative.

Multi-Modal Support

Expanding beyond text to support various content types:

  • Audio Transcription: Process audio files and podcasts
  • Video Processing: Extract transcripts from video files
  • Tables and Charts: Better extraction and understanding of tabular data
  • Code Snippets: Enhanced parsing and indexing of code documentation

Advanced Chunking Strategies

Moving beyond simple character-based chunking:

  • Semantic Chunking: Use embeddings to identify semantic boundaries
  • Topic-Based Chunking: Group content by topics or themes
  • Adaptive Chunking: Dynamically adjust chunk sizes based on content type
  • Hierarchical Chunking: Maintain parent-child relationships between chunks

Enhanced Search and Retrieval

Improving the RAG retrieval pipeline:

  • Hybrid Search: Combine vector similarity with keyword matching (BM25)
  • Re-ranking: Use cross-encoders to re-rank retrieved results
  • Query Expansion: Automatically expand user queries with synonyms and related terms
  • Multi-query Generation: Generate multiple query variations for better retrieval
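Hybrid search is commonly implemented with Reciprocal Rank Fusion (RRF), which merges the vector-similarity ranking and the BM25 ranking without requiring their raw scores to be on comparable scales. A minimal sketch (the doc IDs are placeholders and k=60 is the conventional constant from the RRF literature, not a Bot42 setting):

```python
def rrf_fuse(vector_ranked, keyword_ranked, k=60):
    """Reciprocal Rank Fusion over two ranked lists of document IDs.
    Each document scores sum(1 / (k + rank)) across the lists it
    appears in, so items ranked highly by either retriever rise."""
    scores = {}
    for ranked in (vector_ranked, keyword_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A cross-encoder re-ranker would then re-score only the top fused candidates, keeping the expensive model off the full corpus.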

Fine-Tuned Embedding Models

Custom embedding models trained on domain-specific data:

  • Domain Adaptation: Fine-tune embeddings on your specific document corpus
  • Multi-lingual Support: Better support for non-English documents
  • Specialized Models: Industry-specific models (legal, medical, technical)

Enhanced Citation and References

Improving how sources are displayed and accessed:

  • Document Previews: Show document snippets in citation popovers
  • Page-Level Citations: Link directly to specific pages in source documents
  • Citation Confidence Scores: Display how relevant each source is
  • Source Highlighting: Highlight the exact text used from each source

Analytics and Monitoring

Better insights into system usage and performance:

  • Usage Analytics: Track query patterns, popular documents, user engagement
  • Performance Metrics: Monitor embedding generation times, search latency
  • Quality Metrics: Track answer quality, citation accuracy
  • Cost Tracking: Monitor API usage and costs (OpenAI, OCR services)

Advanced Widget Features

Enhancing the embeddable widget:

  • Custom Branding: More customization options for colors, fonts, layout
  • Multi-language Support: Widget UI in multiple languages
  • Voice Input: Speech-to-text for chat input
  • File Upload: Allow users to upload documents directly through the widget
  • Conversation Sharing: Share conversations via links

Security and Compliance

Enterprise-grade security features:

  • End-to-End Encryption: Encrypt sensitive documents at rest and in transit
  • Audit Logging: Comprehensive audit trails for compliance
  • Data Residency: Support for region-specific data storage
  • Access Controls: More granular permission systems
  • GDPR Compliance: Data deletion, export, and privacy controls

Performance Optimizations

Scaling for larger deployments:

  • Caching Layer: Redis-based caching for frequent queries
  • CDN Integration: Serve widget assets via CDN
  • Database Optimization: Query optimization, indexing strategies
  • Horizontal Scaling: Better support for multi-region deployments
  • Batch Processing: Optimize embedding generation for large document sets
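As a sketch of the caching idea, the class below keys cached answers by a hash of the normalized query text and expires entries after a TTL. It is an in-process stand-in illustrating the pattern that would map onto Redis (SETEX/GET) in production, not production code itself:

```python
import hashlib
import time

class QueryCache:
    """TTL cache for chat/search results, keyed by a hash of the
    normalized query so trivially different phrasings ("Hello" vs
    " hello ") share one entry."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}

    def _key(self, query: str) -> str:
        return hashlib.sha256(query.strip().lower().encode()).hexdigest()

    def get(self, query: str):
        entry = self._store.get(self._key(query))
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[self._key(query)]  # lazy eviction
            return None
        return value

    def set(self, query: str, value) -> None:
        self._store[self._key(query)] = (value, time.monotonic() + self.ttl)
```

With Redis the same keys become `SETEX <hash> <ttl> <payload>`, which also lets multiple Celery workers share the cache.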

Integration Ecosystem

Expanding integrations with other tools:

  • Slack Integration: Chat with documents directly in Slack
  • Microsoft Teams: Native Teams integration
  • API Webhooks: Real-time notifications for document processing
  • Zapier/Make: No-code automation integrations
  • REST API: Comprehensive API for custom integrations

Prioritization: The roadmap is prioritized based on user feedback and business needs. OCR support is currently the highest priority, followed by enhanced search capabilities and multi-modal support. Features are developed incrementally and released as they become production-ready.

Conclusion

Bot42 demonstrates a production-ready RAG system architecture that combines the strengths of modern web frameworks, Python-based processing, and vector databases. The system's key strengths include:

Architecture Decisions:

  • Separation of Concerns: Frontend, API, and processing are cleanly separated
  • Asynchronous Processing: Celery enables scalable, non-blocking document processing
  • Real-time Updates: SSE streaming provides responsive chat experiences
  • Security: RLS policies and API key authentication protect data access

Scalability Considerations:

  • Horizontal Scaling: Celery workers can be scaled independently
  • Vector Search: PostgreSQL's pgvector provides efficient similarity search
  • Caching: Redis caches task results and reduces database load
  • Monitoring: Flower provides visibility into task execution

Future Enhancements:

  • Multi-modal support (images, audio)
  • Advanced chunking strategies (semantic chunking)
  • Fine-tuned embedding models
  • Enhanced citation UI with document previews
  • Analytics and usage tracking

App Walkthrough

Try it out here

About the author

Full-stack developer passionate about building scalable, high-performance web applications. Experienced in building AI-powered applications and data science, and enjoys hands-on tinkering with Arduino and emerging technologies.

Ben Kissi
