Roadmap & Next Steps

Guiding the platform toward an enterprise-grade RAG stack

Version 2 Focus
Traceable source lifecycles, experiment-ready instrumentation, integrated RAG evaluation.

All upgrades align with the layered architecture and bounded contexts described on the Architecture page.

Current Foundation (v1.x)

RAG Pipeline
  • Document ingest, chunking, embedding, and pgvector storage
  • Hybrid (dense + sparse) retrieval exposed through the Supervisor graph
  • Chat experience with conversation memory and tool orchestration
  • Observability baseline via indexing events and background jobs
Platform & Tenancy
  • ABP modules for identity, permissioning, and tenant boundaries
  • SaaS-ready deployment with PostgreSQL, Redis, and background workers
  • Rich admin client backed by CQRS application services
  • AI provider adapters for OpenAI and Claude
Why V2 Matters
  • Current source updates overwrite embeddings with limited auditability
  • Run diagnostics spread across logs instead of structured tables
  • No automated evaluation loop for relevance or regression checks
  • Experiments require manual toggles and lack result comparison

Version 2 Spotlight

Five pillars unlock tenant safety, reproducibility, experimentation, long-term memory, and human oversight.

Tenant-Safe Middleware
Authorization and boundary hardening
  • Harden the React chat client middleware to enforce tenant isolation, subdomain validation, and cross-origin blocking.
  • Add Python FastAPI middleware that maps identities to permitted knowledge bases before executing retrieval jobs (see the sketch after this list).
  • Design the authorization schema (tenants, roles, KB grants) and apply it consistently across ingestion and retrieval flows.
  • Extend multi-agent and supervisor logic with guardrail checks so tool actions inherit the same authorization context.
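
A minimal sketch of the FastAPI piece, assuming identity headers injected by an upstream auth layer and an in-memory stand-in for the tenants/roles/KB-grants schema; every name here is illustrative, not the platform's actual API:

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

# Hypothetical grant table: (tenant, user) -> permitted knowledge base ids.
# The real lookup would query the authorization schema (tenants, roles, KB grants).
KB_GRANTS: dict[tuple[str, str], set[str]] = {
    ("acme", "alice"): {"kb-handbook", "kb-support"},
}

def resolve_permitted_kbs(tenant_id: str, user_id: str) -> set[str]:
    return KB_GRANTS.get((tenant_id, user_id), set())

@app.middleware("http")
async def kb_authorization(request: Request, call_next):
    # Identity headers are assumed to be set by the upstream auth layer.
    tenant_id = request.headers.get("X-Tenant-Id")
    user_id = request.headers.get("X-User-Id")
    if not tenant_id or not user_id:
        return JSONResponse({"detail": "missing tenant context"}, status_code=401)
    # Resolve grants once, so every retrieval handler filters up front.
    request.state.permitted_kbs = resolve_permitted_kbs(tenant_id, user_id)
    return await call_next(request)

@app.get("/retrieve")
async def retrieve(request: Request, kb_id: str, q: str):
    if kb_id not in request.state.permitted_kbs:
        return JSONResponse({"detail": "knowledge base not granted"}, status_code=403)
    return {"kb": kb_id, "query": q}  # the retrieval job would run here
```

Resolving grants once in middleware keeps every retrieval handler behind the same authorization context, which is the guardrail the supervisor and tool actions inherit.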

Outcome: a multi-tenant RAG surface that respects domain boundaries while keeping agents extensible.

Source Versioning
Immutable knowledge base history
  • Introduce immutable KbSourceVersion snapshots; KbSource retains a pointer to the active version (sketched after this list).
  • Persist every chunk, embedding, and collection item with source_version_id for full lineage.
  • Workspace-level retention policy (default keep 5 versions) with manual pinning for critical releases.
  • Collections become first-class: IdxVectorCollection and IdxSparseCollection track strategy, provider, and version scope.
  • Document attributes move to JSONB metadata; future schema definitions via DocPropertySchema.
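
As a rough shape for the aggregates above (field names are assumptions, not the final schema), a frozen dataclass captures the immutability rule while pinning and retention stay queryable:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4

@dataclass(frozen=True)  # frozen: a snapshot never changes after it is written
class KbSourceVersion:
    source_id: UUID
    version_no: int
    metadata: dict                     # JSONB document attributes
    pinned: bool = False               # pinned versions survive retention cleanup
    id: UUID = field(default_factory=uuid4)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class KbSource:
    id: UUID
    active_version_id: UUID | None = None  # pointer to the live snapshot
    retention_keep: int = 5                # workspace default: keep five versions

def versions_to_prune(versions: list[KbSourceVersion], keep: int) -> list[KbSourceVersion]:
    """Oldest unpinned versions outside the retention window."""
    newest_first = sorted(versions, key=lambda v: v.version_no, reverse=True)
    return [v for v in newest_first[keep:] if not v.pinned]
```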

Outcome: reproducible rebuilds, rollback capabilities, and side-by-side source comparisons.

Run Tracking
Structured observability for A/B testing
Indexing Runs
  • ParseRun logs parsers, token counts, and errors per source version.
  • ChunkRun stores strategy, window size, and overlap decisions.
  • EmbeddingRun captures model, latency, cost, and chunk outcomes.
  • SparseRun records BM25/SPLADE index inserts by variant.
  • UpsertRun links batches pushed into FAISS, pgvector, or Elastic collections.
Retrieval Runs
  • RetrievalRun represents each query with filters, latency, and status.
  • ComponentRecallRun aggregates dense and sparse scores for diagnostics.
  • FusionResultRun stores the hybrid ordering with its fusion parameters (see the sketch after this list).
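
To make the retrieval-side records concrete, here is a sketch of how ComponentRecallRun rows could feed a FusionResultRun, using reciprocal rank fusion as a stand-in for whatever fusion method the platform actually runs (an assumption, as are the field names):

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4

@dataclass
class ComponentRecallRun:
    component: str                  # "dense" or "sparse"
    ranked_chunk_ids: list[str]     # ordering produced by that component

@dataclass
class FusionResultRun:
    fused_chunk_ids: list[str]
    params: dict

@dataclass
class RetrievalRun:
    query: str
    filters: dict
    components: list[ComponentRecallRun] = field(default_factory=list)
    fusion: FusionResultRun | None = None
    id: UUID = field(default_factory=uuid4)

def fuse(run: RetrievalRun, k: int = 60) -> None:
    """Reciprocal rank fusion over the per-component orderings."""
    scores: dict[str, float] = {}
    for comp in run.components:
        for rank, chunk_id in enumerate(comp.ranked_chunk_ids, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    ordered = sorted(scores, key=scores.get, reverse=True)
    run.fusion = FusionResultRun(fused_chunk_ids=ordered,
                                 params={"method": "rrf", "k": k})
```

Because component orderings and fusion parameters are persisted separately, an A/B comparison can replay fusion with different parameters without re-querying the indexes.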

Outcome: experiment toggles backed by data, quick root-cause analysis, and per-tenant analytics.

Long-Term Memory
Persistent cross-session intelligence
  • Introduce MemoryStream aggregates per tenant/user to organize episodic, semantic, and profile memories.
  • Persist normalized memory facts in MemoryFact (structured data) and MemoryEmbedding (vectorized recall).
  • Schedule MemoryCondenseRun jobs that distill recent conversations into durable facts with decay policies (sketched after this list).
  • Surface MemorySnapshot versions that align with KbSourceVersion to keep knowledge and memory in sync.
  • Expose memory controls in the admin UI for retention, redaction, and tenant-level privacy rules.
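
A condensed sketch of the MemoryCondenseRun idea, with `summarize` standing in for an LLM call and an illustrative 90-day decay window (names and fields are assumptions):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable
from uuid import UUID

@dataclass
class MemoryFact:
    stream_id: UUID
    kind: str                      # "episodic" | "semantic" | "profile"
    text: str
    expires_at: datetime | None    # None = durable, no decay

def condense(stream_id: UUID, turns: list[str],
             summarize: Callable[[list[str]], list[str]]) -> list[MemoryFact]:
    """One MemoryCondenseRun step: distill recent turns into decaying facts."""
    now = datetime.now(timezone.utc)
    return [
        MemoryFact(
            stream_id=stream_id,
            kind="semantic",
            text=fact_text,
            expires_at=now + timedelta(days=90),  # illustrative decay policy
        )
        for fact_text in summarize(turns)
    ]
```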

Outcome: richer personalization, fewer repeated questions, and auditable cross-session recall.

Human-in-the-Loop
Safe human intervention
  • Use LangGraph interrupts to pause agent workflows whenever confidence drops or a compliance policy triggers (see the sketch after this list).
  • Route escalations to reviewers who can edit responses, approve actions, or annotate gaps for retraining.
  • Log reviewer feedback alongside RetrievalRun records to enrich evaluation datasets.
  • Expose admin tooling for queue management, SLAs, and audit-ready decision trails.
  • Feed curated corrections back into versioned sources to continuously improve hit rates.
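
A minimal sketch of the interrupt pattern; `interrupt` and `Command` are LangGraph's actual primitives, while the graph shape, node names, and confidence threshold are illustrative:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Command, interrupt

class State(TypedDict):
    answer: str
    confidence: float

def draft_answer(state: State) -> State:
    # Stand-in for the real agent node; imagine a model call here.
    return {"answer": "drafted response", "confidence": 0.4}

def review_gate(state: State) -> State:
    if state["confidence"] < 0.6:  # threshold is illustrative
        # Pause the run; a reviewer resumes it with an approved or edited answer.
        decision = interrupt({"answer": state["answer"], "reason": "low confidence"})
        return {"answer": decision["answer"], "confidence": state["confidence"]}
    return state

builder = StateGraph(State)
builder.add_node("draft", draft_answer)
builder.add_node("gate", review_gate)
builder.add_edge(START, "draft")
builder.add_edge("draft", "gate")
builder.add_edge("gate", END)
graph = builder.compile(checkpointer=MemorySaver())  # interrupts need a checkpointer

config = {"configurable": {"thread_id": "demo"}}
graph.invoke({"answer": "", "confidence": 0.0}, config)           # pauses at the gate
graph.invoke(Command(resume={"answer": "reviewer-edited text"}), config)
```

Logging the resume payload next to the matching RetrievalRun is what turns these pauses into evaluation data.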

Outcome: trustworthy workflows where humans catch high-risk answers, reinforce model learning, and satisfy regulatory requirements.

Domain Model Updates

New Tables
  • KbSourceVersion: Immutable snapshot of a knowledge source with JSONB metadata, retention flags, and audit columns.
  • IdxVectorCollection: Dense index configuration (model, dimension, distance metric) scoped per tenant.
  • IdxSparseCollection: Parallel sparse index strategy (BM25, SPLADE, Elastic) with connection details and parameters.
  • ParseRun / ChunkRun / EmbeddingRun / SparseRun / UpsertRun: Telemetry aggregates for each stage of the indexing pipeline with status, metrics, and ownership.
  • RetrievalRun: Captures a retrieval attempt end-to-end, linking query context to the active source version and collections.
  • ComponentRecallRun: Dense and sparse retrieval scores before fusion, enabling diagnostics and A/B comparison.
  • FusionResultRun: Final ranked results with fusion weights and attribution for debugging hybrid logic.
  • EvaluationRun: Stores evaluation dataset references, metrics, prompts, and decision outcomes.
  • MemoryStream / MemoryFact / MemoryEmbedding / MemorySnapshot: Tiered memory aggregates that capture episodic, semantic, and vectorized context with snapshot history.
Extended Tables
  • KbChunk, IdxVectorCollectionItem, and IdxSparseCollectionItem gain source_version_id and collection_id columns (see the lineage sketch after this list).
  • KbSource retains active_version_id plus retention settings per tenant.
  • RetrievalConversation links to MemoryStream for context injection and auditing.
  • Existing background job tables reference the new run identifiers for traceability.
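
To illustrate how the new columns tie together, a hypothetical lineage query; the table and column names are guessed from the entities above, and the platform's real naming may differ:

```python
import psycopg  # psycopg 3; assumes the PostgreSQL schema sketched above

ACTIVE_CHUNKS_SQL = """
SELECT c.id, c.content, v.version_no
FROM kb_chunk AS c
JOIN kb_source_version AS v ON v.id = c.source_version_id
WHERE v.source_id = %(source_id)s
  AND v.id = (SELECT active_version_id FROM kb_source WHERE id = %(source_id)s)
"""

def active_chunks(conn: psycopg.Connection, source_id: str) -> list[tuple]:
    """Fetch only the chunks that belong to a source's active version."""
    with conn.cursor() as cur:
        cur.execute(ACTIVE_CHUNKS_SQL, {"source_id": source_id})
        return cur.fetchall()
```

Swapping active_version_id is then the whole rollback story; a side-by-side comparison is the same query with an explicit version id instead of the active pointer.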

Implementation Roadmap

Phase 1: Versioning Foundation (Sprints 1-2)

Add KbSourceVersion aggregate, migration scripts, retention policies, and UI support to select active versions.

Deliverables: schema migrations, repository/service updates, admin tools for manual version pinning.

Phase 2: Run Instrumentation (Sprints 2-3)

Emit structured events from indexing and retrieval supervisors into the new run tables with correlation ids.

Deliverables: background job telemetry, A/B experiment toggles, dashboards for run statistics.

Phase 3: Evaluation & A/B Testing (Sprint 4)

Integrate evaluation datasets, orchestrate EvaluationRun, and surface comparison reports in the admin UI.

Deliverables: regression alerts, exportable reports, automated checks in deployment pipelines.
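
A minimal sketch of what an EvaluationRun could compute, assuming a curated dataset of queries with labeled relevant chunks and a `retrieve` callable standing in for the platform's retrieval entry point:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    query: str
    relevant_chunk_ids: set[str]    # curated ground truth

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    if not relevant:
        return 0.0
    hits = sum(1 for chunk_id in retrieved[:k] if chunk_id in relevant)
    return hits / len(relevant)

def evaluation_run(cases: list[EvalCase],
                   retrieve: Callable[[str], list[str]], k: int = 10) -> dict:
    """Score one retrieval variant against a labeled dataset (one EvaluationRun)."""
    scores = [recall_at_k(retrieve(c.query), c.relevant_chunk_ids, k) for c in cases]
    return {"metric": f"recall@{k}", "mean": sum(scores) / len(scores)}
```

Running the same dataset against two retrieval variants and alerting when the metric delta crosses a threshold yields the regression checks named in the deliverables.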

Phase 4: Long-Term Memory (Sprints 4-5)

Design memory schemas, condensation jobs, privacy controls, and retrieval hooks across agents.

Deliverables: memory management UI, retention/expiration policies, automated A/B validation of memory quality.

Risks & Dependencies

  • Schema migrations must be forward-only; use ABP data seed contributors to backfill existing sources safely.
  • Version retention increases storage; plan tiered policies and background cleanup jobs.
  • Run tables can grow quickly; enforce partitioning and configurable TTLs per tenant.
  • Evaluation datasets require curated ground truth; integrate dataset management into admin tooling.
  • Long-term memory introduces privacy and compliance obligations; implement redaction workflows and tenant isolation.