Grounding pipeline

How SLAtech avoids hallucination

A 12-stage RAG-based grounding pipeline plus a structured citation system. Every response is grounded in tenant content rather than in a fine-tuned model. Visitors see citation snippets per source. Responses are confidence-scored, logged to an audit trail, and fed into continuous evaluation. This page pairs with the architecture overview, the eval scoreboard, and the AI ethics statement.

1. Tenant content ingestion

Documents (PDF, DOCX, scraped HTML, FAQ pairs, manually authored articles) are chunked into 200-500 token segments with a 50-token overlap. Each chunk gets metadata: ClientId (tenant partition), sourceUrl, chunkIndex, lastUpdated.

Technical: The chunking algorithm respects document structure (paragraph boundaries) when possible. The overlap prevents context loss across chunk boundaries.
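The overlap idea can be sketched in a few lines of Python. This is a minimal illustration, not the production chunker: it assumes whitespace-separated words as a stand-in for model tokens, it ignores paragraph boundaries, and the names Chunk, chunk_tokens, and chunk_document are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    # Metadata fields mirror the ones listed above; the Python names are illustrative.
    client_id: str
    source_url: str
    chunk_index: int
    last_updated: str
    text: str


def chunk_tokens(tokens: list[str], max_tokens: int = 500, overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of at most max_tokens; each window repeats
    the last `overlap` tokens of the previous one."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    step = max_tokens - overlap
    windows: list[list[str]] = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the document
        start += step
    return windows


def chunk_document(text: str, client_id: str, source_url: str, last_updated: str,
                   max_tokens: int = 500, overlap: int = 50) -> list[Chunk]:
    # Assumption: whitespace splitting stands in for a real tokenizer.
    tokens = text.split()
    return [
        Chunk(client_id, source_url, index, last_updated, " ".join(window))
        for index, window in enumerate(chunk_tokens(tokens, max_tokens, overlap))
    ]
```

With max_tokens=500 and overlap=50, consecutive windows start 450 tokens apart, so a sentence that straddles a boundary appears whole in at least one chunk as long as it is shorter than the overlap.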

2. Embedding generation

OpenAI text-embedding-3-small (1536 dimensions) converts each chunk into a semantic vector. Embeddings are stored in Qdrant with the chunk's metadata.

Technical: The same embedding model is used for queries, which keeps cosine-similarity comparisons semantically valid.
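For reference, cosine similarity between two embedding vectors can be written as below. This is a plain-Python sketch for illustration only; in the pipeline described here the comparison runs inside Qdrant, and the function name is hypothetical. The dimension check reflects why query and chunk vectors must come from the same model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        # Vectors from different embedding models may differ in size,
        # and even equal-sized ones would not share a semantic space.
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)
```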

3. Query embedding

When a visitor submits a message, the bot embeds the query using the same OpenAI model. The vector search is filtered by ClientId before retrieval, so cross-tenant contamination is structurally impossible.

Technical: The repository pattern enforces the ClientId filter at compile time through a static analyzer rule (SLATECH001).
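The shape of a tenant-scoped repository can be sketched as follows. Python has no direct counterpart to a compile-time analyzer rule, so this sketch only shows the runtime analogue: the sole read path requires a tenant identifier and applies the filter before anything else sees the data. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredChunk:
    client_id: str
    source_url: str
    text: str


class TenantScopedRepository:
    """In-memory stand-in for a vector store; every read is tenant-filtered."""

    def __init__(self, chunks: list[StoredChunk]) -> None:
        self._chunks = list(chunks)

    def for_tenant(self, client_id: str) -> list[StoredChunk]:
        # There is deliberately no method that reads without a client_id.
        if not client_id:
            raise ValueError("client_id is required for every read")
        return [chunk for chunk in self._chunks if chunk.client_id == client_id]
```

Because similarity scoring would only ever run on the list returned by for_tenant, chunks from another tenant never enter the candidate set.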

4. Top-K retrieval

Qdrant returns the top-K (default 10) chunks with the highest cosine similarity to the query. The default ScoreThreshold = 0.5 filters out low-relevance chunks.

Technical: TopK is clamped to [1, 20] in per-tenant configuration. Queries whose results all fall below the threshold route to a "no relevant content" fallback instead of a hallucinated answer.
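The clamp, threshold, and fallback described above can be sketched over a list of already-scored hits. The function names and route labels are hypothetical, and treating the threshold as inclusive (score >= 0.5 passes) is an assumption.

```python
def clamp_top_k(requested: int, low: int = 1, high: int = 20) -> int:
    """Keep a tenant-configured TopK inside the allowed range."""
    return max(low, min(high, requested))


def select_hits(scored: list[tuple[str, float]], top_k: int = 10,
                score_threshold: float = 0.5) -> list[tuple[str, float]]:
    """Drop hits below the threshold, then keep the best top_k by score."""
    kept = [(chunk_id, score) for chunk_id, score in scored if score >= score_threshold]
    kept.sort(key=lambda hit: hit[1], reverse=True)
    return kept[:clamp_top_k(top_k)]


def retrieve_or_fallback(scored: list[tuple[str, float]], top_k: int = 10,
                         score_threshold: float = 0.5) -> tuple[str, list[tuple[str, float]]]:
    """Return a route label plus the hits; an empty result takes the fallback route."""
    hits = select_hits(scored, top_k, score_threshold)
    if not hits:
        return ("no_relevant_content", [])
    return ("answer_from_context", hits)
```

For example, retrieve_or_fallback([("b", 0.4)]) takes the fallback route because the only hit scores below 0.5.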

5. Context assembly

Retrieved chunks, the system prompt, and conversation history are passed to the LLM. The system prompt explicitly instructs the LLM: "Answer only from provided context. If the answer is not in context, say so."

Technical: A token budget is enforced (default 4000 tokens of context); if the budget is exceeded, the lowest-scoring chunks are dropped first.
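A budget check of this kind might look like the sketch below. It assumes each chunk carries a precomputed token count and that the budget covers retrieved chunks only; ScoredChunk and fit_to_budget are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    text: str
    score: float
    token_count: int  # assumed to be computed upstream by the tokenizer


def fit_to_budget(chunks: list[ScoredChunk], budget: int = 4000) -> list[ScoredChunk]:
    """Drop the lowest-scoring chunks until the total token count fits the budget."""
    kept = sorted(chunks, key=lambda chunk: chunk.score, reverse=True)
    total = sum(chunk.token_count for chunk in kept)
    while kept and total > budget:
        dropped = kept.pop()  # the last element has the lowest score
        total -= dropped.token_count
    return kept
```

With chunks of 2000, 1500, and 1000 tokens scored 0.9, 0.8, and 0.6, the total of 4500 exceeds the default budget, so the 0.6 chunk is dropped and 3500 tokens remain.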

6. LLM generation

GPT-4o-mini (default) or a tenant-configured LLM generates the response, grounded in the retrieved context. The default temperature is 0.3 for grounded customer-facing answers.

Technical: A per-tenant LLM provider abstraction allows swapping between OpenAI, Anthropic, and Cohere without customer-side migration.
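One way to picture a per-tenant provider abstraction is a small interface plus a router, as sketched here. Everything in this block is hypothetical: the LlmProvider interface, the StubProvider class (which stands in for real vendor clients and makes no network calls), and the TenantLlmRouter name.

```python
from typing import Protocol

DEFAULT_TEMPERATURE = 0.3  # default stated in the text above


class LlmProvider(Protocol):
    def generate(self, system_prompt: str, context: str, question: str,
                 temperature: float) -> str: ...


class StubProvider:
    """Placeholder provider; a real one would call the vendor's API here."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate(self, system_prompt: str, context: str, question: str,
                 temperature: float = DEFAULT_TEMPERATURE) -> str:
        return f"[{self.name} t={temperature}] {question}"


class TenantLlmRouter:
    """Resolve the provider for a tenant, falling back to the platform default."""

    def __init__(self, default: LlmProvider) -> None:
        self._default = default
        self._overrides: dict[str, LlmProvider] = {}

    def configure(self, client_id: str, provider: LlmProvider) -> None:
        self._overrides[client_id] = provider

    def provider_for(self, client_id: str) -> LlmProvider:
        return self._overrides.get(client_id, self._default)
```

Calling code depends only on provider_for and generate, so changing a tenant's provider is a configuration change rather than a change to the calling code.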

7. Citation extraction

The response includes a structured citation list: { sourceUrl, snippet, score } per retrieved chunk. The snippet is the actual quoted text, not just a URL.

Technical: The BuildSnippet helper in the QueryRequest record extracts a relevant 200-character span from the chunk.
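The text does not say how BuildSnippet picks the relevant span, so the sketch below uses one plausible heuristic as an assumption: center the 200-character window on the earliest occurrence of any query term, and fall back to the start of the chunk when no term is found. The function names are illustrative and are not the actual helper.

```python
def build_snippet(chunk_text: str, query: str, max_chars: int = 200) -> str:
    """Return a span of at most max_chars characters near the first query-term match."""
    lowered = chunk_text.lower()
    positions = [lowered.find(term) for term in query.lower().split()]
    positions = [position for position in positions if position >= 0]
    anchor = min(positions) if positions else 0
    # Center the window on the match, then keep it inside the text bounds.
    start = max(0, anchor - max_chars // 2)
    start = min(start, max(0, len(chunk_text) - max_chars))
    return chunk_text[start:start + max_chars]


def build_citation(source_url: str, chunk_text: str, score: float, query: str) -> dict:
    """Assemble one entry of the citation list in the shape described above."""
    return {
        "sourceUrl": source_url,
        "snippet": build_snippet(chunk_text, query),
        "score": score,
    }
```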

8. SSE streaming with a sources-early event

The transport is Server-Sent Events. The first event, sources-early, emits citation metadata before LLM streaming starts. The widget renders an "according to" hover-card while the answer is still streaming.

Technical: This reduces perceived latency by ~70% compared with a synchronous response. It also lets AI scrapers extract grounded quotes from the response.
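The event ordering can be sketched with a generator that yields frames in the standard SSE wire format (an event: line, a data: line, then a blank line). Only the sources-early event name comes from the text above; the token and done event names and the payload shapes are assumptions for illustration.

```python
import json
from typing import Iterable, Iterator


def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame with a named event and a JSON payload."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_response(citations: list[dict], tokens: Iterable[str]) -> Iterator[str]:
    # Citations go out first, so the widget can render sources before any text arrives.
    yield sse_event("sources-early", {"citations": citations})
    for token in tokens:
        yield sse_event("token", {"text": token})  # hypothetical event name
    yield sse_event("done", {})  # hypothetical event name
```

A web framework would write each yielded string to a response with the text/event-stream content type; the client receives the citation list before the first answer token.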

9. LLM-as-Judge confidence scoring

Every response is scored by a secondary LLM call on three axes: factuality, hallucination, and confidence. Scores surface in the admin Inbox.

Technical: A confidence score below 0.5 usually triggers the human-handoff fallback instead of a guessed answer.

10. Human-handoff fallback

When confidence is low OR the query is identified as high-risk (clinical advice, legal position, regulatory question), the bot routes to a "human will follow up" pattern. The visitor receives an acknowledgement and a follow-up channel.

Technical: A per-vertical risk classifier is tuned for each industry. Med routes ALL diagnosis-adjacent queries to a human; Legal routes ALL substantive legal questions to a human.
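The handoff decision from steps 9 and 10 reduces to a small predicate, sketched here under assumptions: the judge scores arrive as floats in [0, 1], the risk classifier emits a category label (or None), and the label strings and names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.5
# Assumed labels for the high-risk categories named in the text.
HIGH_RISK_CATEGORIES = {"clinical_advice", "legal_position", "regulatory_question"}


@dataclass
class JudgeScores:
    factuality: float
    hallucination: float
    confidence: float


def should_hand_off(scores: JudgeScores, risk_category: Optional[str] = None,
                    threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """Hand off when confidence is below the threshold OR the query is high-risk."""
    low_confidence = scores.confidence < threshold
    high_risk = risk_category in HIGH_RISK_CATEGORIES
    return low_confidence or high_risk
```

Note that a high-risk label forces a handoff even when the judge is confident, which matches the "routes ALL" behaviour described for the Med and Legal verticals.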

11. Per-response audit trail

Every response is logged with full context: input query, retrieved chunks with scores, system prompt, LLM model used, generated response, citation snippets, and confidence scores.

Technical: Audit logs are retained for 13 months. Per-tenant audit logs are exportable on the Enterprise tier.
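An audit record with the fields listed above could be modelled as below. The field names, the JSON-lines serialization, and the added client_id field (implied by per-tenant export, but not listed explicitly) are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AuditRecord:
    client_id: str                 # assumption: needed for per-tenant export
    query: str
    retrieved_chunks: list[dict]   # each entry holds chunk text and its score
    system_prompt: str
    model: str
    response: str
    citations: list[dict]
    judge_scores: dict


def to_log_line(record: AuditRecord) -> str:
    """Serialize one record as a single JSON line, keeping non-ASCII text readable."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)
```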

12. Continuous eval feedback

The eval harness runs nightly against a sealed 200-question test set per vertical. Hallucination scores are tracked over time. Regressions of ≥3 points trigger a manual triage alert.

Technical: The eval methodology is open-source, so buyers can run it against their own SLAtech tenant. The scoreboard is published at /ru/eval/.
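The regression rule can be sketched as a comparison between two nightly runs. The text does not state the score scale or direction, so this sketch assumes scores in points on a 0-100 scale and exposes the direction as a parameter; the function name is hypothetical.

```python
def is_regression(previous: float, current: float, threshold_points: float = 3.0,
                  higher_is_better: bool = True) -> bool:
    """True when the score moved in the wrong direction by at least threshold_points."""
    change_for_worse = (previous - current) if higher_is_better else (current - previous)
    return change_for_worse >= threshold_points
```

For a metric where lower is better, such as a hallucination rate, is_regression(5.0, 8.5, higher_is_better=False) flags the 3.5-point rise.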

Verify grounding on your tenant

The eval harness and methodology are open-source. Run them against your SLAtech tenant to verify per-response factuality.