Through this 10-week journey, I gained a deep understanding of modern AI development — from LLM architecture, RAG systems, fine-tuning, and alignment, to voice and multimodal agents.
This RAG SaaS project, fully deployed on Google Cloud with Docker, is the result of that learning journey.
Grateful to all the mentors and the Inference.AI community for their inspiration and guidance. 🚀
Three coordinated services power the platform: the admin portal, the chat client, and the Python retrieval/indexing engine.
Modern companies need domain-specific AI assistants — both for customers (public) and for internal teams (private). But most AI copilots today are isolated prototypes: each built from scratch, with fragile retrieval pipelines and no reusable foundation. Every new AI use case starts over — wasting time, data, and compute.
Customer-Facing
Who: Clients or visitors on company websites — e.g., an insurance firm, law office, or clinic.
Problem: Customers want quick, trustworthy answers about products, services, or policies.
Impact: Without automation, human agents handle repetitive FAQs and form submissions.
Need: An AI chatbot powered by a public knowledge base that provides accurate, branded responses 24/7 — reducing workload while improving engagement.
Internal-Facing
Who: Internal teams — support, operations, sales, and maintenance staff.
Problem: They waste time searching across documents, CRMs, and ticketing systems.
Impact: Lost productivity, repeated work, and inconsistent responses to customers.
Need: A secure, tenant-isolated AI copilot that retrieves data from private knowledge bases — handling orders, appointments, and internal workflows safely.
Enterprises end up maintaining two disconnected systems — one for public users, one for internal teams — duplicating infrastructure, data, and maintenance.
That is why I built SoRag — a SaaS RAG infrastructure that unifies both public and private AI experiences. It turns retrieval pipelines, LangGraph agent orchestration, and hybrid dense + sparse search into reusable building blocks.
With one multi-tenant codebase, SoRag powers both public AI assistants that engage customers through knowledge-driven chat and private AI copilots that empower teams internally — all orchestrated through the same supervisor graph.
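To make "hybrid dense + sparse search" concrete, here is a minimal sketch of one common way to merge the two result lists: reciprocal rank fusion (RRF). The source does not say which fusion method SoRag uses, so RRF, the function name, the constant `k=60`, and the sample document ids are all assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1), so documents ranked well by both retrievers rise.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a dense (embedding) and a sparse (keyword) retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Rank-based fusion avoids having to normalize cosine similarities against keyword scores, which live on different scales.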
Vertical | Use case | Description |
---|---|---|
Shop | Public RAG + Ordering | Customers ask for product insights, receive grounded answers, and trigger orders through the integration agent. |
Clinic | Appointment Concierge | Patients inquire about symptoms, insurance, or practitioners and confirm appointment slots using the same orchestration logic. |
Law | Legal Knowledge Concierge | Clients explore case strategy, compliance, or engagement terms while the agent drafts intakes and schedules consultations. |
Future | Your Next Vertical | Finance, field service, logistics — plug in a new domain model while reusing ingestion, retrieval, and evaluation pipelines. |
SoRag supports both sides of the enterprise AI spectrum.
Public assistants and private copilots run on the same retrieval-augmented graph orchestration layer: one codebase, two audiences, and room to add new domains.
Collections, knowledge base versioning, run tracking, and evaluation are shared services. Swapping the domain means pointing those services at a new bounded context.
Upload, chunk, embed, and version sources once. Whether the payload is product specs, clinic policies, or case briefs, the KbSourceVersion pipeline stays identical.
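The idea of "version once, reuse everywhere" can be sketched as follows. Only the name `KbSourceVersion` comes from the text above; its fields, the `ingest` and `chunk_text` helpers, the in-memory `store`, and the chunk sizes are hypothetical, and the embedding step is left out so the example runs without a model.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class KbSourceVersion:
    # Field names are illustrative, not SoRag's actual schema.
    source_id: str
    version: int
    content_hash: str
    chunks: list = field(default_factory=list)


def chunk_text(text, size=200, overlap=40):
    """Split text into fixed-size character windows that overlap."""
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def ingest(store, source_id, text):
    """Create a new version only when the uploaded content has changed."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    history = store.setdefault(source_id, [])
    if history and history[-1].content_hash == digest:
        return history[-1]  # unchanged upload: reuse the latest version
    version = KbSourceVersion(source_id, len(history) + 1, digest, chunk_text(text))
    history.append(version)
    return version


store = {}
policy_v1 = "Appointments can be cancelled up to 24 hours ahead. " * 10
policy_v2 = "Appointments can be cancelled up to 48 hours ahead. " * 10

v1 = ingest(store, "clinic-policy", policy_v1)
v1_again = ingest(store, "clinic-policy", policy_v1)  # same content, no new version
v2 = ingest(store, "clinic-policy", policy_v2)
print(v1.version, v1_again.version, v2.version, len(v2.chunks))
```

Nothing in the sketch depends on whether the text is a product spec, a clinic policy, or a case brief, which is the property the pipeline relies on.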
The supervisor graph coordinates retrieval plus tool usage. We just switch the downstream API connector (ShopService vs. ClinicService vs. PracticeService).
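A rough sketch of the connector swap, written in plain Python rather than LangGraph so it makes no claims about that library's API. The class names `ShopService`, `ClinicService`, and `PracticeService` come from the text; the `execute` method, the `DomainConnector` protocol, `supervisor_step`, and the action names are assumptions made for illustration.

```python
from typing import Protocol


class DomainConnector(Protocol):
    """Hypothetical interface every downstream service is assumed to share."""

    def execute(self, action: str, payload: dict) -> dict: ...


class ShopService:
    def execute(self, action, payload):
        return {"domain": "shop", "action": action, "payload": payload}


class ClinicService:
    def execute(self, action, payload):
        return {"domain": "clinic", "action": action, "payload": payload}


class PracticeService:
    def execute(self, action, payload):
        return {"domain": "law", "action": action, "payload": payload}


CONNECTORS = {
    "shop": ShopService(),
    "clinic": ClinicService(),
    "law": PracticeService(),
}


def supervisor_step(domain, question, retrieve, action=None, payload=None):
    """One simplified turn: retrieve context, then optionally call the domain tool."""
    result = {"context": retrieve(question)}
    if action is not None:
        result["tool_result"] = CONNECTORS[domain].execute(action, payload or {})
    return result


def fake_retrieve(question):
    # Stand-in for the shared retrieval layer.
    return [f"passage relevant to: {question}"]


out = supervisor_step(
    "clinic", "Do you accept my insurance?", fake_retrieve,
    action="book_slot", payload={"slot": "09:00"},
)
print(out["tool_result"])
```

The retrieval call and the control flow are identical for every domain; only the entry looked up in `CONNECTORS` changes.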
ParseRun, RetrievalRun, and EvaluationRun capture every decision, enabling A/B tests for orders, bookings, or matter recommendations alike.
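As an illustration of how recorded runs could support A/B comparisons, the sketch below assigns each conversation to a stable variant and aggregates outcomes per variant. Only the name `RetrievalRun` appears in the text; its fields, `assign_variant`, `conversion_by_variant`, and the sample data are hypothetical.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


def assign_variant(tenant_id, conversation_id, variants=("A", "B")):
    """Deterministically bucket a conversation so it always sees the same variant."""
    key = f"{tenant_id}:{conversation_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


@dataclass
class RetrievalRun:
    # Illustrative fields, not SoRag's actual schema.
    tenant_id: str
    conversation_id: str
    variant: str
    converted: bool  # e.g. an order placed or an appointment booked


def conversion_by_variant(runs):
    """Return the share of converted runs for each variant."""
    totals = defaultdict(lambda: [0, 0])  # variant -> [conversions, runs]
    for run in runs:
        totals[run.variant][0] += int(run.converted)
        totals[run.variant][1] += 1
    return {variant: conv / n for variant, (conv, n) in totals.items()}


runs = [
    RetrievalRun("shop-1", "c1", "A", True),
    RetrievalRun("shop-1", "c2", "A", False),
    RetrievalRun("shop-1", "c3", "B", True),
    RetrievalRun("shop-1", "c4", "B", True),
]
rates = conversion_by_variant(runs)
print(rates)
```

Hashing the tenant and conversation ids keeps assignment stable across requests without storing extra state.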
Long-term memories retain past purchases, visit notes, or case history, governed by tenant-specific privacy policies.
Reusable building blocks that make vertical pivots fast.
PostgreSQL + pgvector store every domain's facts with version control.
Ingestion, chunking, embedding, and sparse indexing as shared services.
Conversation supervisors orchestrate retrieval and domain-specific actions.
Identity, permissions, and auditing powered by ABP multi-tenancy.