Indexing Workflow - Document Processing Pipeline
A linear LangGraph pipeline that takes documents from raw files to vector-search-ready data. It runs six stages in order (Task Init, Parse, Chunk, Embed, Upsert, Commit), backed by 59 component files.
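To make the shape of the pipeline concrete, here is a minimal sketch in plain Python rather than LangGraph itself: six stage functions run in order over a shared state dict, and the first exception stops the run and marks it as failed. The stage names mirror the diagram; the `run_pipeline` helper, the state keys, and the stage bodies are hypothetical placeholders, not the project's actual code. The diagram draws error edges from the first five stages only, while this sketch guards all six for simplicity.

```python
from typing import Callable

State = dict  # shared state handed from stage to stage (assumed shape)


def task_init(state: State) -> State:
    # Load task metadata (placeholder: just count the input files).
    return {**state, "metadata": {"num_files": len(state["paths"])}}


def parse(state: State) -> State:
    # Extract content; a real parser would dispatch on file type.
    if not state["paths"]:
        raise ValueError("no documents to parse")
    return {**state, "texts": [f"content of {p}" for p in state["paths"]]}


def chunk(state: State) -> State:
    # Fixed-size chunking with a hypothetical size of 10 characters.
    size = 10
    chunks = [t[i:i + size] for t in state["texts"] for i in range(0, len(t), size)]
    return {**state, "chunks": chunks}


def embed(state: State) -> State:
    # Stand-in embedding: a one-dimensional vector holding the chunk length.
    return {**state, "vectors": [[float(len(c))] for c in state["chunks"]]}


def upsert(state: State) -> State:
    # Stand-in store: record how many vectors would be written.
    return {**state, "stored": len(state["vectors"])}


def commit(state: State) -> State:
    # Finalize: mark the task as ready for search.
    return {**state, "status": "search_ready"}


STAGES: list[Callable[[State], State]] = [task_init, parse, chunk, embed, upsert, commit]


def run_pipeline(paths: list[str]) -> State:
    """Run the stages in order; stop at the first failure."""
    state: State = {"paths": paths, "status": "running"}
    for stage in STAGES:
        try:
            state = stage(state)
        except Exception as exc:
            # Corresponds to the dotted "error" edges leading to the Failed node.
            return {**state, "status": "failed",
                    "failed_stage": stage.__name__, "error": str(exc)}
    return state
```

Calling `run_pipeline(["a.pdf"])` walks all six stages, while `run_pipeline([])` stops at `parse` and returns a state whose `status` is `"failed"`. In a LangGraph version, each function would presumably become a node and the `try`/`except` would be replaced by conditional routing to a failure node.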
graph TB
%% Style definitions
classDef workflow fill:#667eea,stroke:#fff,stroke-width:3px,color:#fff
classDef node fill:#764ba2,stroke:#fff,stroke-width:2px,color:#fff
classDef factory fill:#f093fb,stroke:#333,stroke-width:2px,color:#333
classDef storage fill:#ff6b6b,stroke:#333,stroke-width:2px,color:#333
classDef startEnd fill:#4CAF50,stroke:#fff,stroke-width:2px,color:#fff
%% Entry and Exit
START(("📄 Documents")):::startEnd
END(("🔍 Search Ready")):::startEnd
%% Main Pipeline
WORKFLOW["🏗️ INDEXING WORKFLOW<br/>Linear LangGraph Pipeline"]:::workflow
%% Processing Nodes
TASK_INIT["📋 Task Init<br/>Load metadata"]:::node
PARSE["📝 Parse<br/>Extract content<br/>4 file types"]:::node
CHUNK["✂️ Chunk<br/>Split text<br/>4 strategies"]:::node
EMBED["🧠 Embed<br/>Generate vectors<br/>Dense + Sparse"]:::node
UPSERT["⬆️ Upsert<br/>Store in DBs<br/>5 systems"]:::node
COMMIT["✅ Commit<br/>Finalize & cleanup"]:::node
%% Component Factories
subgraph COMPONENTS["Processing Components (59 files)"]
PARSERS["📝 Parsers<br/>PDF, Image, Text, Doc"]:::factory
CHUNKERS["✂️ Chunkers<br/>Fixed, Semantic, Page, Smart"]:::factory
EMBEDDERS["🧠 Embedders<br/>BGE, OpenAI, BM25"]:::factory
UPSERTERS["⬆️ Upserters<br/>FAISS, Qdrant, Pinecone, ES"]:::factory
end
%% Main flow
START --> WORKFLOW
WORKFLOW --> TASK_INIT
%% Linear pipeline with error handling
TASK_INIT --> PARSE
PARSE --> CHUNK
CHUNK --> EMBED
EMBED --> UPSERT
UPSERT --> COMMIT
COMMIT --> END
%% Component usage
PARSE ==> PARSERS
CHUNK ==> CHUNKERS
EMBED ==> EMBEDDERS
UPSERT ==> UPSERTERS
%% Error paths
TASK_INIT -.->|error| ERROR["❌ Failed"]:::startEnd
PARSE -.->|error| ERROR
CHUNK -.->|error| ERROR
EMBED -.->|error| ERROR
UPSERT -.->|error| ERROR
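
The Processing Components subgraph in the diagram implies that each stage picks one of several interchangeable implementations. One common way to wire that up is a name-to-callable registry; the sketch below shows it for chunkers only. The `CHUNKERS` registry, the `register_chunker` decorator, `get_chunker`, and the two toy strategies are assumptions made for illustration, and the project's own factories may be organized differently.

```python
from typing import Callable

Chunker = Callable[[str], list[str]]

# Hypothetical registry mapping a strategy name to its implementation.
CHUNKERS: dict[str, Chunker] = {}


def register_chunker(name: str) -> Callable[[Chunker], Chunker]:
    """Decorator that adds a chunking function to the registry under `name`."""
    def decorator(fn: Chunker) -> Chunker:
        CHUNKERS[name] = fn
        return fn
    return decorator


@register_chunker("fixed")
def fixed_chunker(text: str, size: int = 20) -> list[str]:
    # Split into consecutive windows of `size` characters.
    return [text[i:i + size] for i in range(0, len(text), size)]


@register_chunker("page")
def page_chunker(text: str) -> list[str]:
    # Treat form-feed characters as page breaks and drop empty pages.
    return [page for page in text.split("\f") if page]


def get_chunker(name: str) -> Chunker:
    """Look up a chunker by name, failing loudly on unknown strategies."""
    try:
        return CHUNKERS[name]
    except KeyError:
        raise ValueError(
            f"unknown chunker {name!r}; available: {sorted(CHUNKERS)}"
        ) from None
```

A stage would then call something like `get_chunker("fixed")(text)`, so adding a new strategy means registering one more function rather than editing the stage itself. The same pattern would apply to parsers, embedders, and upserters.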