Special thanks to Inference.AI for the amazing program — “Machine Learning Engineer in the Generative AI Era.”

Through this 10-week journey, I gained a deep understanding of modern AI development — from LLM architecture, RAG systems, fine-tuning, and alignment, to voice and multimodal agents.

This RAG SaaS project, fully deployed on Google Cloud with Docker, is the result of that learning journey.

Grateful to all the mentors and the Inference.AI community for their inspiration and guidance. 🚀

SoRag — a reusable RAG infrastructure for multi-tenant SaaS platforms, powering public and private AI copilots.

SoRag Topology

Three coordinated services power the platform: the admin portal, the chat client, and the Python retrieval/indexing engine.

  • Admin (.NET Core): User management, permissions, and knowledge base CRUD.
  • Chat Client (React): Frontline UI where users ask questions and receive grounded answers.
  • Retrieval & Indexing (Python): Runs the ingestion jobs that the Admin portal dispatches asynchronously, and exposes the retrieval APIs that serve the React app.
  • Chat UI: https://github.com/vercel/ai-chatbot
  • Hosting: Deployed with Docker on Google Cloud.

[Figure: SoRag topology diagram]

Platform Problem

Modern companies need domain-specific AI assistants — both for customers (public) and for internal teams (private). But most AI copilots today are isolated prototypes: each built from scratch, with fragile retrieval pipelines and no reusable foundation. Every new AI use case starts over — wasting time, data, and compute.

Two Sides of the Problem
Public AI

Customer-Facing

Who: Clients or visitors on company websites — e.g., an insurance firm, law office, or clinic.

Problem: Customers want quick, trustworthy answers about products, services, or policies.

Impact: Without automation, human agents handle repetitive FAQs and form submissions.

Need: An AI chatbot powered by a public knowledge base that provides accurate, branded responses 24/7 — reducing workload while improving engagement.

Private AI

Internal-Facing

Who: Internal teams — support, operations, sales, and maintenance staff.

Problem: They waste time searching across documents, CRMs, and ticketing systems.

Impact: Lost productivity, repeated work, and inconsistent responses to customers.

Need: A secure, tenant-isolated AI copilot that retrieves data from private knowledge bases — handling orders, appointments, and internal workflows safely.

[Figures: Shop flow diagram, Clinic flow diagram, Law flow diagram]

The Result

Enterprises end up maintaining two disconnected systems — one for public users, one for internal teams — duplicating infrastructure, data, and maintenance.

My Solution

That is why I built SoRag — a SaaS RAG infrastructure that unifies both public and private AI experiences. It turns retrieval pipelines, LangGraph agent orchestration, and hybrid dense + sparse search into reusable building blocks.

With one multi-tenant codebase, SoRag powers both public AI assistants that engage customers through knowledge-driven chat and private AI copilots that empower teams internally — all orchestrated through the same supervisor graph.

Live Showcase — Public AI Examples
  • Shop: Public RAG + Ordering. Customers ask for product insights, receive grounded answers, and trigger orders through the integration agent.
  • Clinic: Appointment Concierge. Patients inquire about symptoms, insurance, or practitioners and confirm appointment slots using the same orchestration logic.
  • Law: Legal Knowledge Concierge. Clients explore case strategy, compliance, or engagement terms while the agent drafts intakes and schedules consultations.
  • Future: Your Next Vertical. Finance, field service, logistics: plug in a new domain model while reusing ingestion, retrieval, and evaluation pipelines.
Summary

SoRag supports both sides of the enterprise AI spectrum.

  • Public AI → Engages customers through public knowledge bases (for example: Shop, Clinic, Law).
  • Private AI → Empowers employees through secure tenant-isolated copilots (for example: Support, Operations, Legal, Finance).

Both run on the same retrieval-augmented graph orchestration layer: one codebase, two audiences, extensible to new verticals.

Reusable by Design

Collections, knowledge base versioning, run tracking, and evaluation are shared services. Swapping the domain means pointing those services at a new bounded context.

1. Shared Knowledge Lifecycle

Upload, chunk, embed, and version sources once. Whether the payload is product specs, clinic policies, or case briefs, the KbSourceVersion pipeline stays identical.
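
The chunk-and-version part of this lifecycle could look roughly like the sketch below. It is an illustration under assumptions, not SoRag's actual code: `SourceVersion` is a hypothetical stand-in for a KbSourceVersion record, `chunk_text` and `build_version` are made-up helpers, the chunk sizes are arbitrary, and the embedding step is left out.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceVersion:
    """Hypothetical stand-in for a KbSourceVersion record."""
    source_id: str
    content_hash: str
    chunks: tuple[str, ...]


def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def build_version(source_id: str, text: str) -> SourceVersion:
    """Hash the raw content so identical uploads map to the same version."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return SourceVersion(source_id, digest, tuple(chunk_text(text)))
```

Because the version is keyed on a content hash, the same function works whether the text is a product spec, a clinic policy, or a case brief; only the payload changes.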

2. Agent Orchestration

The supervisor graph coordinates retrieval plus tool usage. Only the downstream API connector changes per domain (ShopService, ClinicService, or PracticeService).
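
One way to picture the connector swap is the plain-Python sketch below. It deliberately does not use LangGraph, and everything in it is assumed for illustration: the `DomainConnector` protocol, the `act` method, the `supervise` function, and the toy bodies of `ShopService` and `ClinicService` are hypothetical stand-ins for the real services.

```python
from typing import Callable, Optional, Protocol


class DomainConnector(Protocol):
    """Hypothetical interface every downstream connector would satisfy."""
    def act(self, intent: str, payload: dict) -> str: ...


class ShopService:
    # Toy stand-in: a real connector would call the shop's ordering API.
    def act(self, intent: str, payload: dict) -> str:
        return f"shop:{intent}:{payload.get('item', '?')}"


class ClinicService:
    # Toy stand-in: a real connector would call the clinic's booking API.
    def act(self, intent: str, payload: dict) -> str:
        return f"clinic:{intent}:{payload.get('slot', '?')}"


CONNECTORS: dict[str, DomainConnector] = {
    "shop": ShopService(),
    "clinic": ClinicService(),
}


def supervise(
    domain: str,
    question: str,
    intent: Optional[str],
    payload: dict,
    retrieve: Callable[[str], list],
) -> dict:
    """Retrieve context first, then route any action to the domain connector."""
    context = retrieve(question)
    action = CONNECTORS[domain].act(intent, payload) if intent else None
    return {"context": context, "action": action}
```

Adding a new vertical in this sketch means registering one more entry in `CONNECTORS`; the retrieval step and the supervising function stay untouched.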

3. Observability and Evaluation

ParseRun, RetrievalRun, and EvaluationRun records capture every decision, enabling A/B tests for orders, bookings, or matter recommendations alike.
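
As a rough sketch of how a run record and an A/B split might fit together: the `RetrievalRun` fields below are guesses rather than SoRag's real schema, and `assign_variant` and `tracked_retrieve` are hypothetical helpers that bucket users by a hash of their ID.

```python
import hashlib
import time
from dataclasses import dataclass, field


@dataclass
class RetrievalRun:
    """Assumed shape of a retrieval run record; the real schema may differ."""
    tenant_id: str
    query: str
    variant: str
    latency_ms: float = 0.0
    hit_ids: list = field(default_factory=list)


def assign_variant(user_id: str, variants=("A", "B")) -> str:
    """Deterministically bucket a user so they always see the same variant."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return variants[digest[0] % len(variants)]


def tracked_retrieve(tenant_id: str, user_id: str, query: str, retrievers: dict) -> RetrievalRun:
    """Run the retriever for the user's variant and record what happened."""
    variant = assign_variant(user_id, tuple(sorted(retrievers)))
    start = time.perf_counter()
    hits = retrievers[variant](query)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return RetrievalRun(tenant_id, query, variant, elapsed_ms, list(hits))
```

Storing the variant on every run is what makes later comparisons possible: quality, latency, and cost can be grouped by variant regardless of whether the run served an order, a booking, or a matter recommendation.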

4. Memory and Personalization

Long-term memories retain past purchases, visit notes, or case history, governed by tenant-specific privacy policies.
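
A tenant-specific privacy policy could take many forms. The sketch below assumes the simplest one, a per-tenant retention window; the `Memory` record, the `RETENTION` table, its values, and the `recall` function are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Memory:
    tenant_id: str
    user_id: str
    text: str
    created_at: datetime


# Hypothetical retention windows; real policies would come from tenant config.
RETENTION = {
    "shop": timedelta(days=365),
    "clinic": timedelta(days=30),
}


def recall(memories, tenant_id: str, user_id: str, now: datetime) -> list:
    """Return only this tenant's memories for this user that are still in policy."""
    # Tenants without a configured policy get a zero-length window (deny by default).
    limit = RETENTION.get(tenant_id, timedelta(0))
    return [
        m.text
        for m in memories
        if m.tenant_id == tenant_id
        and m.user_id == user_id
        and now - m.created_at <= limit
    ]
```

For example, with `now = datetime(2025, 1, 1, tzinfo=timezone.utc)`, a 200-day-old shop purchase would still be recalled, while a clinic visit note of the same age would be filtered out.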

SoRag at a Glance

What it delivers
  • Reusable RAG backbone with LangGraph multi-agent orchestration and FastAPI tool adapters.
  • Hybrid retrieval (pgvector dense + BM25 sparse) with run telemetry for parse/chunk/embed/upsert stages.
  • Evaluation pipeline ready for A/B tests on query quality, latency, and cost.
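
The page does not say how the dense and sparse result lists are merged. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as SoRag's actual method. The function name and the constant `k=60` are illustrative, and the two input lists stand in for ranked IDs from a pgvector query and a BM25 query.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k: int = 60) -> list:
    """Fuse several ranked ID lists: each list adds 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["a", "b", "c"]   # e.g. ranked IDs from a vector similarity query
sparse_hits = ["b", "d", "a"]  # e.g. ranked IDs from a BM25 keyword query
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

RRF only needs ranks, not raw scores, so it avoids having to normalize cosine distances against BM25 scores. Documents that appear in both lists, like `a` and `b` here, rise above documents found by only one retriever.
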
How to demo it
  • Start with the problem statement, then walk through the shop, clinic, and law firm flows.
  • Use the diagrams on the Gallery page to explain supervisors, pipelines, and integrations.
  • Close with the evaluation + long-term memory roadmap from Next Steps.

Infrastructure Capabilities

Reusable building blocks that make vertical pivots fast.

Vector Knowledge

PostgreSQL + pgvector store every domain's facts with version control.

Pipeline Automation

Ingestion, chunking, embedding, and sparse indexing as shared services.

Graph Agents

Conversation supervisors orchestrate retrieval and domain-specific actions.

Tenant Guardrails

Identity, permissions, and auditing powered by ABP multi-tenancy.
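
On the Python retrieval side, tenant isolation would plausibly come down to filtering by tenant before ranking. The sketch below is an in-memory illustration under that assumption: `Chunk`, `cosine`, and `tenant_search` are hypothetical, and in the real system the filter would more likely be part of the PostgreSQL query.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    embedding: tuple


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tenant_search(chunks, tenant_id: str, query_embedding, top_k: int = 3) -> list:
    """Filter to the caller's tenant first, then rank by similarity."""
    scoped = [c for c in chunks if c.tenant_id == tenant_id]
    scoped.sort(key=lambda c: cosine(c.embedding, query_embedding), reverse=True)
    return [c.text for c in scoped[:top_k]]
```

Applying the tenant filter before ranking means another tenant's chunk can never reach the top-k, however similar it is to the query.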