Roadmap & Next Steps

Guiding the platform toward an enterprise-grade RAG stack

Version 2 Focus
Traceable source lifecycles, experiment-ready instrumentation, integrated RAG evaluation.

All upgrades align with the layered architecture and bounded contexts described on the Architecture page.

Current Foundation (v1.x)

RAG Pipeline
  • Document ingest, chunking, embedding, and pgvector storage
  • Hybrid (dense + sparse) retrieval exposed through the Supervisor graph
  • Chat experience with conversation memory and tool orchestration
  • Observability baseline via indexing events and background jobs
Platform & Tenancy
  • ABP modules for identity, permissioning, and tenant boundaries
  • SaaS-ready deployment with PostgreSQL, Redis, and background workers
  • Rich admin client backed by CQRS application services
  • AI provider adapters for OpenAI and Claude
Why V2 Matters
  • Current source updates overwrite embeddings with limited auditability
  • Run diagnostics spread across logs instead of structured tables
  • No automated evaluation loop for relevance or regression checks
  • Experiments require manual toggles and lack result comparison

Version 2 Spotlight

Five pillars unlock tenant safety, reproducibility, experimentation, long-term memory, and human oversight.

Tenant-Safe Middleware
Authorization and boundary hardening
  • Harden the React chat client middleware to enforce tenant isolation, subdomain validation, and cross-origin blocking.
  • Add Python FastAPI middleware that maps identities to permitted knowledge bases before executing retrieval jobs (see the sketch after this list).
  • Design the authorization schema (tenants, roles, KB grants) and apply it consistently across ingestion and retrieval flows.
  • Extend multi-agent and supervisor logic with guardrail checks so tool actions inherit the same authorization context.
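
A minimal sketch of the FastAPI piece, assuming identity headers injected by an upstream auth layer and an in-memory stand-in for the tenants/roles/KB-grants schema; every name here is illustrative, not the platform's actual API:

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

# Hypothetical grant table: (tenant, user) -> permitted knowledge base ids.
# The real lookup would query the authorization schema (tenants, roles, KB grants).
KB_GRANTS: dict[tuple[str, str], set[str]] = {
    ("acme", "alice"): {"kb-handbook", "kb-support"},
}

def resolve_permitted_kbs(tenant_id: str, user_id: str) -> set[str]:
    return KB_GRANTS.get((tenant_id, user_id), set())

@app.middleware("http")
async def kb_authorization(request: Request, call_next):
    # Identity headers are assumed to be set by the upstream auth layer.
    tenant_id = request.headers.get("X-Tenant-Id")
    user_id = request.headers.get("X-User-Id")
    if not tenant_id or not user_id:
        return JSONResponse({"detail": "missing tenant context"}, status_code=401)
    # Resolve grants once, so every retrieval handler filters up front.
    request.state.permitted_kbs = resolve_permitted_kbs(tenant_id, user_id)
    return await call_next(request)

@app.get("/retrieve")
async def retrieve(request: Request, kb_id: str, q: str):
    if kb_id not in request.state.permitted_kbs:
        return JSONResponse({"detail": "knowledge base not granted"}, status_code=403)
    return {"kb": kb_id, "query": q}  # the retrieval job would run here
```

Resolving grants once in middleware keeps every retrieval handler behind the same authorization context, which is the guardrail the supervisor and tool actions inherit.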

Outcome: a multi-tenant RAG surface that respects domain boundaries while keeping agents extensible.

Source Versioning
Immutable knowledge base history
  • Introduce immutable KbSourceVersion snapshots; KbSource retains a pointer to the active version (sketched after this list).
  • Persist every chunk, embedding, and collection item with source_version_id for full lineage.
  • Workspace-level retention policy (default keep 5 versions) with manual pinning for critical releases.
  • Collections become first-class: IdxVectorCollection and IdxSparseCollection track strategy, provider, and version scope.
  • Document attributes move to JSONB metadata; future schema definitions via DocPropertySchema.
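
As a rough shape for the aggregates above (field names are assumptions, not the final schema), a frozen dataclass captures the immutability rule while pinning and retention stay queryable:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4

@dataclass(frozen=True)  # frozen: a snapshot never changes after it is written
class KbSourceVersion:
    source_id: UUID
    version_no: int
    metadata: dict                     # JSONB document attributes
    pinned: bool = False               # pinned versions survive retention cleanup
    id: UUID = field(default_factory=uuid4)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class KbSource:
    id: UUID
    active_version_id: UUID | None = None  # pointer to the live snapshot
    retention_keep: int = 5                # workspace default: keep five versions

def versions_to_prune(versions: list[KbSourceVersion], keep: int) -> list[KbSourceVersion]:
    """Oldest unpinned versions outside the retention window."""
    newest_first = sorted(versions, key=lambda v: v.version_no, reverse=True)
    return [v for v in newest_first[keep:] if not v.pinned]
```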

Outcome: reproducible rebuilds, rollback capabilities, and side-by-side source comparisons.

Run Tracking
Structured observability for A/B testing
Indexing Runs
  • ParseRun logs parsers, token counts, and errors per source version.
  • ChunkRun stores strategy, window size, and overlap decisions.
  • EmbeddingRun captures model, latency, cost, and chunk outcomes.
  • SparseRun records BM25/SPLADE index inserts by variant.
  • UpsertRun links batches pushed into FAISS, pgvector, or Elastic collections.
Retrieval Runs
  • RetrievalRun represents each query with filters, latency, and status.
  • ComponentRecallRun aggregates dense and sparse scores for diagnostics.
  • FusionResultRun stores the hybrid ordering with its fusion parameters (see the sketch after this list).
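
To make the retrieval-side records concrete, here is a sketch of how ComponentRecallRun rows could feed a FusionResultRun, using reciprocal rank fusion as a stand-in for whatever fusion method the platform actually runs (an assumption, as are the field names):

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4

@dataclass
class ComponentRecallRun:
    component: str                  # "dense" or "sparse"
    ranked_chunk_ids: list[str]     # ordering produced by that component

@dataclass
class FusionResultRun:
    fused_chunk_ids: list[str]
    params: dict

@dataclass
class RetrievalRun:
    query: str
    filters: dict
    components: list[ComponentRecallRun] = field(default_factory=list)
    fusion: FusionResultRun | None = None
    id: UUID = field(default_factory=uuid4)

def fuse(run: RetrievalRun, k: int = 60) -> None:
    """Reciprocal rank fusion over the per-component orderings."""
    scores: dict[str, float] = {}
    for comp in run.components:
        for rank, chunk_id in enumerate(comp.ranked_chunk_ids, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    ordered = sorted(scores, key=scores.get, reverse=True)
    run.fusion = FusionResultRun(fused_chunk_ids=ordered,
                                 params={"method": "rrf", "k": k})
```

Because component orderings and fusion parameters are persisted separately, an A/B comparison can replay fusion with different parameters without re-querying the indexes.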

Outcome: experiment toggles backed by data, quick root-cause analysis, and per-tenant analytics.

Long-Term Memory
Persistent cross-session intelligence
  • Introduce MemoryStream aggregates per tenant/user to organize episodic, semantic, and profile memories.
  • Persist normalized memory facts in MemoryFact (structured data) and MemoryEmbedding (vectorized recall).
  • Schedule MemoryCondenseRun jobs that distill recent conversations into durable facts with decay policies (sketched after this list).
  • Surface MemorySnapshot versions that align with KbSourceVersion to keep knowledge and memory in sync.
  • Expose memory controls in the admin UI for retention, redaction, and tenant-level privacy rules.
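
A condensed sketch of the MemoryCondenseRun idea, with `summarize` standing in for an LLM call and an illustrative 90-day decay window (names and fields are assumptions):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable
from uuid import UUID

@dataclass
class MemoryFact:
    stream_id: UUID
    kind: str                      # "episodic" | "semantic" | "profile"
    text: str
    expires_at: datetime | None    # None = durable, no decay

def condense(stream_id: UUID, turns: list[str],
             summarize: Callable[[list[str]], list[str]]) -> list[MemoryFact]:
    """One MemoryCondenseRun step: distill recent turns into decaying facts."""
    now = datetime.now(timezone.utc)
    return [
        MemoryFact(
            stream_id=stream_id,
            kind="semantic",
            text=fact_text,
            expires_at=now + timedelta(days=90),  # illustrative decay policy
        )
        for fact_text in summarize(turns)
    ]
```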

Outcome: richer personalization, fewer repeated questions, and auditable cross-session recall.

Human-in-the-Loop
Safe human intervention
  • Use LangGraph interrupts to pause agent workflows whenever confidence drops or a compliance policy triggers (see the sketch after this list).
  • Route escalations to reviewers who can edit responses, approve actions, or annotate gaps for retraining.
  • Log reviewer feedback alongside RetrievalRun records to enrich evaluation datasets.
  • Expose admin tooling for queue management, SLAs, and audit-ready decision trails.
  • Feed curated corrections back into versioned sources to continuously improve hit rates.
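
A minimal sketch of the interrupt pattern; `interrupt` and `Command` are LangGraph's actual primitives, while the graph shape, node names, and confidence threshold are illustrative:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Command, interrupt

class State(TypedDict):
    answer: str
    confidence: float

def draft_answer(state: State) -> State:
    # Stand-in for the real agent node; imagine a model call here.
    return {"answer": "drafted response", "confidence": 0.4}

def review_gate(state: State) -> State:
    if state["confidence"] < 0.6:  # threshold is illustrative
        # Pause the run; a reviewer resumes it with an approved or edited answer.
        decision = interrupt({"answer": state["answer"], "reason": "low confidence"})
        return {"answer": decision["answer"], "confidence": state["confidence"]}
    return state

builder = StateGraph(State)
builder.add_node("draft", draft_answer)
builder.add_node("gate", review_gate)
builder.add_edge(START, "draft")
builder.add_edge("draft", "gate")
builder.add_edge("gate", END)
graph = builder.compile(checkpointer=MemorySaver())  # interrupts need a checkpointer

config = {"configurable": {"thread_id": "demo"}}
graph.invoke({"answer": "", "confidence": 0.0}, config)           # pauses at the gate
graph.invoke(Command(resume={"answer": "reviewer-edited text"}), config)
```

Logging the resume payload next to the matching RetrievalRun is what turns these pauses into evaluation data.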

Outcome: trustworthy workflows where humans catch high-risk answers, reinforce model learning, and satisfy regulatory requirements.

Domain Model Updates

New Tables
  • KbSourceVersion: Immutable snapshot of a knowledge source with JSONB metadata, retention flags, and audit columns.
  • IdxVectorCollection: Dense index configuration (model, dimension, distance metric) scoped per tenant.
  • IdxSparseCollection: Parallel sparse index strategy (BM25, SPLADE, Elastic) with connection details and parameters.
  • ParseRun / ChunkRun / EmbeddingRun / SparseRun / UpsertRun: Telemetry aggregates for each stage of the indexing pipeline with status, metrics, and ownership.
  • RetrievalRun: Captures a retrieval attempt end-to-end, linking query context to the active source version and collections.
  • ComponentRecallRun: Dense and sparse retrieval scores before fusion, enabling diagnostics and A/B comparison.
  • FusionResultRun: Final ranked results with fusion weights and attribution for debugging hybrid logic.
  • EvaluationRun: Stores evaluation dataset references, metrics, prompts, and decision outcomes.
  • MemoryStream / MemoryFact / MemoryEmbedding / MemorySnapshot: Tiered memory aggregates that capture episodic, semantic, and vectorized context with snapshot history.
Extended Tables
  • KbChunk, IdxVectorCollectionItem, and IdxSparseCollectionItem gain source_version_id and collection_id columns (see the lineage sketch after this list).
  • KbSource retains active_version_id plus retention settings per tenant.
  • RetrievalConversation links to MemoryStream for context injection and auditing.
  • Existing background job tables reference the new run identifiers for traceability.
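
To illustrate how the new columns tie together, a hypothetical lineage query; the table and column names are guessed from the entities above, and the platform's real naming may differ:

```python
import psycopg  # psycopg 3; assumes the PostgreSQL schema sketched above

ACTIVE_CHUNKS_SQL = """
SELECT c.id, c.content, v.version_no
FROM kb_chunk AS c
JOIN kb_source_version AS v ON v.id = c.source_version_id
WHERE v.source_id = %(source_id)s
  AND v.id = (SELECT active_version_id FROM kb_source WHERE id = %(source_id)s)
"""

def active_chunks(conn: psycopg.Connection, source_id: str) -> list[tuple]:
    """Fetch only the chunks that belong to a source's active version."""
    with conn.cursor() as cur:
        cur.execute(ACTIVE_CHUNKS_SQL, {"source_id": source_id})
        return cur.fetchall()
```

Swapping active_version_id is then the whole rollback story; a side-by-side comparison is the same query with an explicit version id instead of the active pointer.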

Implementation Roadmap

Phase 1: Versioning Foundation (Sprints 1-2)

Add KbSourceVersion aggregate, migration scripts, retention policies, and UI support to select active versions.

Deliverables: schema migrations, repository/service updates, admin tools for manual version pinning.

Phase 2: Run Instrumentation (Sprints 2-3)

Emit structured events from indexing and retrieval supervisors into the new run tables with correlation ids.

Deliverables: background job telemetry, A/B experiment toggles, dashboards for run statistics.

Phase 3: Evaluation & A/B Testing (Sprint 4)

Integrate evaluation datasets, orchestrate EvaluationRun, and surface comparison reports in the admin UI.

Deliverables: regression alerts, exportable reports, automated checks in deployment pipelines.
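
A minimal sketch of what an EvaluationRun could compute, assuming a curated dataset of queries with labeled relevant chunks and a `retrieve` callable standing in for the platform's retrieval entry point:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    query: str
    relevant_chunk_ids: set[str]    # curated ground truth

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    if not relevant:
        return 0.0
    hits = sum(1 for chunk_id in retrieved[:k] if chunk_id in relevant)
    return hits / len(relevant)

def evaluation_run(cases: list[EvalCase],
                   retrieve: Callable[[str], list[str]], k: int = 10) -> dict:
    """Score one retrieval variant against a labeled dataset (one EvaluationRun)."""
    scores = [recall_at_k(retrieve(c.query), c.relevant_chunk_ids, k) for c in cases]
    return {"metric": f"recall@{k}", "mean": sum(scores) / len(scores)}
```

Running the same dataset against two retrieval variants and alerting when the metric delta crosses a threshold yields the regression checks named in the deliverables.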

Phase 4: Long-Term Memory (Sprints 4-5)

Design memory schemas, condensation jobs, privacy controls, and retrieval hooks across agents.

Deliverables: memory management UI, retention/expiration policies, automated A/B validation of memory quality.

Risks & Dependencies

  • Schema migrations must be forward-only; use ABP data seed contributors to backfill existing sources safely.
  • Version retention increases storage; plan tiered policies and background cleanup jobs.
  • Run tables can grow quickly; enforce partitioning and configurable TTLs per tenant.
  • Evaluation datasets require curated ground truth; integrate dataset management into admin tooling.
  • Long-term memory introduces privacy and compliance obligations; implement redaction workflows and tenant isolation.