Multi-tenant data isolation
Every tenant gets a strict ClientId partition key applied at every storage layer. SQL Server tables (SLAtech.Db) and Qdrant collections (the vector store) are keyed by ClientId, with row-level and collection-level filters enforced at the query layer. The repository pattern rejects any query that omits ClientId at compile time via a static analyzer rule (SLATECH001). Sentry projects, audit logs and blob containers all maintain the same partition discipline. Cross-tenant data access is a structural impossibility, not a runtime check.
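The idea of a repository that cannot be queried without a tenant key can be sketched as follows. This is a minimal Python illustration, not the SLAtech.Db implementation: TenantScope and DocumentRepository are hypothetical names, the store is an in-memory list, and the check happens when the scope is constructed rather than at compile time as the SLATECH001 analyzer rule does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantScope:
    """Carries the tenant key. Hypothetical type used only for this sketch."""
    client_id: str

    def __post_init__(self) -> None:
        # An empty key is rejected before any query can be built.
        if not self.client_id:
            raise ValueError("client_id is required for every query")


class DocumentRepository:
    """In-memory stand-in for a tenant-partitioned table."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def find(self, scope: TenantScope, **filters) -> list[dict]:
        # The tenant filter is applied first; callers have no way to skip it.
        scoped = [r for r in self._rows if r["client_id"] == scope.client_id]
        return [r for r in scoped if all(r.get(k) == v for k, v in filters.items())]


rows = [
    {"client_id": "acme", "title": "Returns policy"},
    {"client_id": "globex", "title": "Returns policy"},
]
repo = DocumentRepository(rows)
acme_docs = repo.find(TenantScope("acme"), title="Returns policy")
```

Because find takes the scope as a required positional argument, a call that forgets the tenant fails immediately instead of returning rows from every tenant.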
RAG retrieval pipeline
Ingest → chunk → embed → upsert: documents (PDF, DOCX, scraped HTML, FAQ pairs) are chunked at 200-500 tokens with a 50-token overlap. Chunks are embedded via OpenAI text-embedding-3-small (1536 dimensions) and upserted into Qdrant with metadata (ClientId, sourceUrl, chunkIndex). Query path: the question is embedded, then a top-K cosine-similarity search runs filtered by ClientId, with a default ScoreThreshold of 0.5. Retrieved chunks pass to the LLM with structured citation metadata so the response carries a Snippet per source. Reconciliation runs nightly as a Worker BackgroundService.
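A rough sketch of the two mechanical steps above, overlapping chunking and tenant-filtered top-K search with a score threshold. It assumes tokens are already available as a list and uses a plain in-memory list of points in place of Qdrant; the function names and the 300-token default are illustrative choices, not taken from the pipeline itself.

```python
import math


def chunk_tokens(tokens, size=300, overlap=50):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(points, query_vec, client_id, top_k=5, score_threshold=0.5):
    """Top-K cosine search restricted to one tenant, dropping weak matches."""
    scored = [
        (cosine(p["vector"], query_vec), p)
        for p in points
        if p["client_id"] == client_id  # tenant filter comes before scoring
    ]
    hits = [(score, p) for score, p in scored if score >= score_threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:top_k]


chunks = chunk_tokens(list(range(700)), size=300, overlap=50)
points = [
    {"client_id": "acme", "chunkIndex": 0, "vector": [1.0, 0.0]},
    {"client_id": "acme", "chunkIndex": 1, "vector": [0.0, 1.0]},
    {"client_id": "globex", "chunkIndex": 0, "vector": [1.0, 0.0]},
]
hits = search(points, [1.0, 0.0], client_id="acme")
```

In this toy data the second acme chunk is orthogonal to the query and falls below the threshold, and the globex chunk is never scored at all.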
LLM provider abstraction
All LLM calls go through ILlmProvider, a thin abstraction implemented by OpenAiQueryService, AnthropicQueryService (Enterprise tier) and CohereRerankerService (optional re-ranking). Providers are selected per tenant via configuration; failover routes automatically on rate-limit or 5xx responses. Token usage and cost are reported per call in the response payload so that tenants can monitor spend in real time. Switching providers doesn't require customer-side migration because the abstraction insulates application code.
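The failover behaviour can be illustrated with a small sketch. It assumes a provider raises an error carrying the HTTP status, and that 429 stands for the rate-limit case; LlmProvider, LlmResult, ProviderError and both example providers are hypothetical stand-ins, and the per-token price is an arbitrary placeholder rather than a real rate.

```python
from dataclasses import dataclass
from typing import Protocol


class ProviderError(Exception):
    """Raised by a provider when the upstream API returns an error status."""

    def __init__(self, status: int) -> None:
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


@dataclass
class LlmResult:
    text: str
    provider: str
    total_tokens: int
    cost_usd: float


class LlmProvider(Protocol):
    name: str

    def ask(self, prompt: str) -> LlmResult: ...


def is_retryable(status: int) -> bool:
    # Rate limiting (429) and server errors (5xx) are worth trying elsewhere.
    return status == 429 or 500 <= status <= 599


def ask_with_failover(providers, prompt):
    """Try providers in the configured order and return the first success."""
    last_error = None
    for provider in providers:
        try:
            return provider.ask(prompt)
        except ProviderError as err:
            if not is_retryable(err.status):
                raise  # e.g. a 400 would fail the same way on any provider
            last_error = err
    raise RuntimeError("all providers failed") from last_error


class RateLimitedProvider:
    name = "primary"

    def ask(self, prompt):
        raise ProviderError(429)


class EchoProvider:
    name = "fallback"

    def ask(self, prompt):
        tokens = len(prompt.split())
        # Placeholder price per token, for illustration only.
        return LlmResult(f"echo: {prompt}", self.name, tokens, tokens * 0.000001)


result = ask_with_failover([RateLimitedProvider(), EchoProvider()], "What is the refund window?")
```

The caller only sees an LlmResult, including which provider answered and what the call cost, which is what keeps application code unaware of the switch.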
Channel adapter pattern
Each channel (web widget, Telegram, WhatsApp Business, Instagram DM, email) implements IChannelAdapter, a common contract that translates channel-native message envelopes to a unified Conversation domain model. The adapter pattern lets the core query pipeline run channel-agnostic; per-channel formatting (Telegram inline keyboards, WhatsApp template messages) lives within the adapter. Adding a channel doesn't touch the query pipeline.
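A compact sketch of the adapter contract, assuming each adapter needs just two operations: translate an inbound envelope and format an outbound reply. The class names, the ConversationMessage fields and the simplified envelope shapes are assumptions for illustration; the email envelope in particular is invented, while the Telegram fields follow the general shape of a Bot API update.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ConversationMessage:
    """Channel-neutral message handed to the query pipeline (hypothetical shape)."""
    channel: str
    conversation_id: str
    text: str


class ChannelAdapter(Protocol):
    def to_message(self, envelope: dict) -> ConversationMessage: ...

    def format_reply(self, answer: str) -> dict: ...


class TelegramAdapter:
    def to_message(self, envelope):
        msg = envelope["message"]
        return ConversationMessage("telegram", str(msg["chat"]["id"]), msg["text"])

    def format_reply(self, answer):
        # Channel-specific formatting stays here: one inline keyboard button.
        button = {"text": "Talk to a human", "callback_data": "handoff"}
        return {"text": answer, "reply_markup": {"inline_keyboard": [[button]]}}


class EmailAdapter:
    def to_message(self, envelope):
        return ConversationMessage("email", envelope["from"], envelope["body"].strip())

    def format_reply(self, answer):
        return {"subject": "Re: your question", "body": answer}


def handle(adapter, envelope, answer_fn):
    """Core pipeline entry point: it never inspects channel-specific fields."""
    message = adapter.to_message(envelope)
    return adapter.format_reply(answer_fn(message))


def shout(message):
    # Stand-in for the real query pipeline.
    return message.text.upper()


telegram_reply = handle(TelegramAdapter(), {"message": {"chat": {"id": 42}, "text": "hi"}}, shout)
email_reply = handle(EmailAdapter(), {"from": "a@example.com", "body": " hello \n"}, shout)
```

Supporting another channel in this sketch means writing one more class with the same two methods; handle stays unchanged.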
Streaming response pipeline
SSE-based streaming via the /v1/query/ask-stream endpoint. The first event is sources-early, which emits citation metadata before LLM streaming starts (so widgets render "according to" hover-cards while the answer is still streaming). Subsequent events are token chunks; the final event is done, with aggregate metadata (total tokens, total cost, conversation log ID). Cuts perceived latency by ~70% vs a synchronous response.
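The event ordering can be sketched as a generator that yields Server-Sent Events frames. The sources-early and done event names come from the description above; the token event name, the JSON field names and the price_per_token parameter are assumptions made for this example.

```python
import json


def sse_event(name, payload):
    """Serialise one Server-Sent Events frame: event name, JSON data, blank line."""
    return f"event: {name}\ndata: {json.dumps(payload)}\n\n"


def ask_stream(sources, tokens, log_id, price_per_token=0.0):
    """Yield frames in order: citations first, then tokens, then a summary."""
    # Citations go out before any answer text so the widget can render them early.
    yield sse_event("sources-early", {"sources": sources})
    count = 0
    for tok in tokens:
        count += 1
        yield sse_event("token", {"text": tok})
    yield sse_event("done", {
        "total_tokens": count,
        "total_cost": count * price_per_token,
        "conversation_log_id": log_id,
    })


sources = [{"sourceUrl": "https://example.com/faq", "snippet": "Refunds within 30 days"}]
frames = list(ask_stream(sources, ["Refunds ", "take ", "30 days."], "log-1"))
```

A client reading this stream can show the hover-cards as soon as the first frame arrives and append each token frame to the answer as it comes in.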
Sub-processor data flow
Customer query → Kestrel ingress (Azure West Europe) → SLAtech.Api → embedding call to OpenAI (US, SCC 2021/914) → vector search in Qdrant (Azure West Europe) → LLM call to OpenAI or Anthropic with retrieved chunks → response to Kestrel → SSE back to widget. Sentry receives sanitised error envelopes (PII scrubbed pre-emission). SendGrid handles transactional email (US, SCC 2021/914). Cloudflare provides WAF and CDN edge globally. Full sub-processor list at /en/sub-processors/.
Deployment topology
Azure App Service (Linux) for SLAtech.Api, SLAtech.Web, SLAtech.AdminUI, SLAtech.Business + 8 vertical hubs. Azure SQL Database for the relational store with daily backup and 24-hour point-in-time recovery. Qdrant on an Azure VM with per-tenant collections. Azure Cache for Redis for session state and the rate-limit token bucket. Azure Storage for document blobs. Cloudflare in front for WAF / DDoS mitigation / CDN. GitHub Actions workflows trigger production deploys on push to the production branch.
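For readers unfamiliar with the rate-limit token bucket mentioned above, here is a minimal single-process sketch of the algorithm. It keeps state in memory and takes the current time as an argument so it stays deterministic; how the bucket is actually stored in Redis, and the capacity and refill values, are not described in this document and are assumptions here.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # a new bucket starts full
        self.updated_at = now

    def allow(self, now, cost=1.0):
        """Return True and spend `cost` tokens if the request fits, else False."""
        if now > self.updated_at:
            elapsed = now - self.updated_at
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
            self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=2, refill_per_second=1.0)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
```

The first two requests use up the burst capacity, the third is rejected, and the fourth succeeds once a second has passed and one token has been refilled.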
Disaster recovery posture
RTO 4 hours, RPO 1 hour. Daily Azure SQL backups with 35-day retention; point-in-time recovery within the last 24 hours. Qdrant snapshotted nightly to Azure Storage. Multi-region failover within the EU (West Europe primary, North Europe failover). DR runbook tested quarterly with a simulated region failure. Status page at status.slatech.ai surfaces real-time uptime + last 90-day incident log.
Eval pipeline
Per-vertical eval harness runs nightly against a sealed 200-question test set (held out of training/tuning loops). LLM-as-Judge scores factuality, hallucination and confidence per response. Aggregate per-vertical scores surface to the public scoreboard at /en/eval/. Score regressions of ≥3 points trigger a manual triage alert. The eval harness itself is open-source and downloadable as a repro template, so buyers can run it against their own SLAtech tenant.
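The regression rule can be sketched as a comparison between two nightly runs. This assumes aggregate scores are numbers on a shared scale and that a regression means the score dropped by at least the threshold; the function name, vertical names and score values are made up for the example.

```python
def find_regressions(previous, current, threshold=3.0):
    """Return {vertical: drop} for every vertical whose score fell by >= threshold."""
    regressions = {}
    for vertical, old_score in previous.items():
        new_score = current.get(vertical)
        if new_score is None:
            continue  # vertical missing from tonight's run; nothing to compare
        drop = old_score - new_score
        if drop >= threshold:
            regressions[vertical] = drop
    return regressions


# Illustrative scores only, not real eval results.
last_night = {"dental": 91.0, "legal": 88.0, "retail": 85.0}
tonight = {"dental": 87.5, "legal": 86.0, "retail": 86.0}
alerts = find_regressions(last_night, tonight)
```

Here only the dental vertical crosses the 3-point line; the 2-point drop in legal and the improvement in retail raise no alert.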
Observability stack
Sentry per backend service with PII scrubbing pre-emission. Synthetic transaction monitoring at 5-minute cadence covering 12 critical user journeys. OpenTelemetry-instrumented spans for query pipeline timings. Per-tenant audit log exportable on the Enterprise tier. Real-time uptime dashboard at status.slatech.ai surfaces 12 metric groups including p95 query latency, retrieval recall and channel-specific error rates.