Phase 1 of 6
Scoping & Latency Constraints
Define the channels, time-to-first-token budget, language coverage, PII surface, and regulatory footprint that will govern every architectural decision for the conversational AI stack.
Channels & Interaction Surface
Identify channels the assistant must serve
Why This Matters
Channel selection materially changes the latency envelope, the PII surface, and the hallucination risk profile. Voice channels demand sub-800ms time-to-first-token to preserve conversational cadence, while in-app chat tolerates 1–2 seconds before users perceive lag. Bank of America's Erica runs across mobile, voice, and web with a 48-second average interaction — a design point only reachable when channel-specific budgets are set explicitly. The most common mistake is treating every channel as a single assistant; the latency and compliance envelope differ by channel, not by use case.
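The per-channel budgets described above can be captured as an explicit table that gates model and serving choices. This is a minimal sketch; the channel names and millisecond values are illustrative assumptions drawn from the guidance in this section, not a normative SLA table.

```python
# Illustrative per-channel p95 TTFT budgets in milliseconds (assumed values).
TTFT_BUDGET_MS = {
    "ivr_voice": 800,      # voice needs sub-800ms to preserve cadence
    "mobile_chat": 1000,   # in-app chat tolerates ~1s before perceived lag
    "web_chat": 2000,      # web users accept 1-2s
    "email_async": 30000,  # async channels are latency-insensitive
}

def within_budget(channel: str, observed_p95_ms: float) -> bool:
    """Return True if the observed p95 TTFT meets the channel's budget."""
    return observed_p95_ms <= TTFT_BUDGET_MS[channel]
```

Making the budget table a first-class artifact forces the "latency envelope differs by channel" decision to be explicit rather than implicit in a single shared assistant.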
Note prompts
+ Which channels share enough context to justify a single assistant versus channel-specific tuning?
+ Have we inventoried the p95 latency budget for each channel before selecting a model?
+ Who owns the channel-by-channel handoff to a human agent when the assistant cannot resolve?
Required
Confirm every surface on which the chatbot or virtual assistant will answer customer queries.
Select all that apply
Mobile app in-session chat
Online banking web chat
SMS / messaging (iMessage, WhatsApp Business)
IVR / voice assistant (phone channel)
Smart speaker / voice (Alexa, Google Assistant)
Branch kiosk / ATM conversational UI
Agent-assist copilot (internal use)
Email / async case intake
Define time-to-first-token (TTFT) SLA by channel
Why This Matters
Perceived latency in conversational AI is dominated by time-to-first-token, not end-to-end generation time — users judge the assistant as "fast" based on how quickly it starts responding. TensorRT-LLM on H100 delivers ~100ms TTFT at 64 concurrent requests, and vLLM v0.6.0 cut TTFT by 5× versus prior releases — the serving substrate now matters as much as the model size. A cache-hit path through semantic caching returns in 5–20ms, while a cache miss requires the full LLM inference — so the effective TTFT is the blended average and hinges on cache hit rate.
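The blended-average point above is simple arithmetic worth writing down: effective TTFT is the cache-hit-rate-weighted average of the cache-hit and cache-miss paths. A minimal sketch, with example numbers that are assumptions, not measurements:

```python
def blended_ttft_ms(hit_rate: float, cache_ms: float, llm_ms: float) -> float:
    """Effective TTFT: hit-rate-weighted average of cache-hit and LLM paths."""
    return hit_rate * cache_ms + (1.0 - hit_rate) * llm_ms

# Example (assumed figures): 70% semantic-cache hit rate, 15ms cache path,
# 400ms full-inference path -> 0.7*15 + 0.3*400 = 130.5ms effective TTFT.
```

The practical consequence is that raising cache hit rate can buy more perceived speed than swapping models, which is why the cache-hit and cache-miss paths should be measured separately.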
Note prompts
+ What is our current p95 TTFT by channel, and where is the hot spot — retrieval, serving, or network?
+ Have we measured TTFT separately for cache-hit versus cache-miss paths?
+ What does the assistant do when TTFT is breached — keep streaming, time out, or hand off?
Required
Select the TTFT budget your conversational stack must hold at p95 under peak load.
Single choice
< 200ms TTFT (voice / IVR — aggressive)
< 500ms TTFT (premium mobile experience)
< 1s TTFT (standard in-app chat)
< 2s TTFT (web / async — acceptable)
Tiered by channel (mixed SLA)
Trinidy: Cloud-routed LLM inference consumes 100–300ms of network round-trip before a single token is produced — often half of the perceived latency budget. Trinidy collocates the serving tier with the semantic cache and RAG retriever, keeping TTFT predictable even during traffic spikes.
Define end-to-end response completion SLA
Required
Specify the p95 full-response latency target distinct from TTFT.
Single choice
< 2 seconds (simple FAQ / account lookup)
< 5 seconds (RAG with citations)
< 10 seconds (multi-turn agentic workflow)
< 30 seconds (complex research / dispute intake)
Not currently measured end-to-end
Specify language and dialect coverage
Why This Matters
Wells Fargo has publicly reported that Spanish accounts for more than 80% of Fargo's non-English usage — language coverage is not a nice-to-have, it is a primary product decision. Hallucination rates and safety-tuning quality differ materially across languages in frontier models, and many guardrail evaluations are English-only. Shipping a chatbot that is fluent in English and unreliable in Spanish creates a measurable fair-lending exposure in addition to a CX problem.
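One way to operationalize "evaluate per language, not only in English" is to hold every supported language to the same shipping bar. A minimal sketch; the function name, score scale, and 0.90 threshold are assumptions for illustration:

```python
def coverage_gaps(scores_by_lang: dict[str, float],
                  threshold: float = 0.90) -> list[str]:
    """Languages whose guardrail/hallucination eval pass rate (0-1 scale)
    falls below the shipping bar, sorted for stable reporting."""
    return sorted(lang for lang, score in scores_by_lang.items()
                  if score < threshold)
```

Running the same eval suite per language surfaces exactly the English-fluent, Spanish-unreliable gap the paragraph warns about, before it becomes a fair-lending exposure.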
Note prompts
+ What is our customer base's language distribution, and does our assistant match it?
+ Do we evaluate hallucination and guardrail performance per language, or only in English?
+ Is our RAG corpus available in every supported language, or is non-English a translation-only surface?
Required
Confirm language support, with particular attention to Spanish and other high-volume non-English segments.
Select all that apply
English (US)
Spanish (US / Latin America)
French (Canadian)
Mandarin / Cantonese
Portuguese (Brazilian)
Tagalog
Korean
Vietnamese
Arabic
Other regional languages
Map the PII surface entering LLM context
Why This Matters
The CFPB June 2023 Issue Spotlight specifically named account numbers, transaction histories, SSNs, beneficiary designations, and health-related financial data (HSAs/FSAs) as categories of sensitive PII that financial chatbots routinely put into LLM context — every one of which triggers GLBA Safeguards Rule and CFPA obligations. Cloud-routed LLM inference means that context becomes a third-party processor relationship, not just an engineering choice. Wells Fargo's solution — voice input locally transcribed, SLM scrubs PII, only anonymized text reaches the external LLM — is the architectural pattern that allowed 11.5× interaction growth without compliance exposure.
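The scrub-before-send pattern described above can be sketched as a local redaction layer that anonymizes text before anything leaves the perimeter. Production deployments use an SLM or NER model rather than regexes; the patterns below are deliberately naive assumptions to show the shape of the layer, not Wells Fargo's implementation.

```python
import re

# Illustrative redaction patterns (assumed, not production-grade):
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ACCOUNT": re.compile(r"\b\d{9,17}\b"),  # naive account/routing match
}

def scrub(text: str) -> str:
    """Replace recognized PII spans with bracketed labels before the text
    is allowed to reach an external LLM."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

The design point is that only the scrubbed string crosses the third-party boundary, so the GLBA service-provider analysis applies to anonymized text rather than raw customer dialogue.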
Note prompts
+ Which PII categories reach our LLM context today, and which reach a third-party LLM provider?
+ Do we have a local PII scrubbing / anonymization layer, or does raw customer text hit the LLM directly?
+ Have we mapped the LLM provider relationship against GLBA service-provider requirements?
Required
Inventory every PII category that may be placed into the LLM context window.
Select all that apply
Account numbers / routing numbers
Social Security Number / TIN
Transaction history
Beneficiary designations
Health-related financial data (HSA / FSA)
Biometric identifiers (voiceprint)
Geolocation / device fingerprint
Free-text messages containing any of the above
No PII ever reaches LLM context (scrubbed upstream)
Confirm data residency and cross-border constraints
Required
Map conversational context and retrieval corpora to jurisdictional constraints before architecture is finalized.
Select all that apply
US-only deployment (GLBA scope)
EU GDPR — data must remain in EU
UK GDPR — UK residency required
Canada PIPEDA / provincial rules
State-level biometric laws (Illinois BIPA, Texas CUBI)
CCPA / CPRA (California residents)
Colorado AI Act residency preference
Cross-border permitted under SCCs / DPAs
Trinidy: GLBA Safeguards, CCPA/CPRA, BIPA (for voice), and EU GDPR all press against cloud-hosted LLM serving. Trinidy keeps the RAG index, PII scrubbing, and audit logging entirely within the institution's perimeter — no cross-border flow of customer dialogue for any interaction.
Define scope of consumer-facing statutory rights handling
Why This Matters
The CFPB August 2024 guidance explicitly stated that AI chatbot errors that fail to recognize a consumer's invocation of statutory rights — Reg E dispute notices being the canonical example — may constitute UDAAP violations, with no "AI error" defense available. Reg E starts a regulatory clock (10 business days to investigate, 45 days to resolve); a chatbot that confidently answers a dispute question without triggering the Reg E process has created a regulatory liability, not an ops issue. Statutory-rights recognition must be a first-class routing decision, not a side effect of intent classification.
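Treating statutory-rights recognition as a first-class routing decision can be sketched as a hard gate in front of the answer path: if a protected notice is detected, the assistant escalates to a compliant workflow instead of generating a reply. Keyword matching below stands in for a production intent classifier, and the intent names and trigger phrases are assumptions:

```python
# Assumed statutory intents and naive trigger phrases (illustrative only):
STATUTORY_INTENTS = {
    "reg_e_dispute": ("unauthorized charge", "didn't make this transaction"),
    "fcra_dispute": ("credit report is wrong", "dispute my credit report"),
    "cease_communication": ("stop contacting me",),
}

def route(utterance: str) -> str:
    """Escalate (and log) any statutory-rights invocation before the LLM
    gets a chance to answer; everything else goes to the normal path."""
    text = utterance.lower()
    for intent, triggers in STATUTORY_INTENTS.items():
        if any(t in text for t in triggers):
            return f"escalate:{intent}"  # hand off to compliant workflow
    return "answer"
```

The gate runs before generation, so a confident-but-wrong answer to a Reg E dispute is structurally impossible rather than merely discouraged by prompting.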
Note prompts
+ Does our intent classifier have first-class intents for every protected statutory notice a consumer might give?
+ When a statutory right is invoked, does the assistant hand off to a compliant workflow rather than attempting to answer?
+ Do we log the moment a statutory-rights intent was detected for regulatory audit?
Required
Specify how the assistant recognizes and routes consumer invocations of statutory rights (Reg E dispute, Reg Z billing error, FCRA, etc.).
Select all that apply
Reg E EFT error notices (12 CFR 1005.11 — 60-day clock)
Reg Z billing error notices
FCRA dispute intents
SCRA / MLA protected-status invocations
Bankruptcy / cease-communication requests
Death of accountholder notifications
Unauthorized transaction reports
Assistant does not currently route statutory rights
Specify deployment topology for the serving plane
Required
Select the physical/logical deployment target for the LLM serving tier and the RAG retriever.
Single choice
On-premises serving (vLLM / TensorRT-LLM on owned GPUs)
Private cloud / VPC in-region
Hybrid: on-prem RAG + cloud LLM for non-PII path
Public cloud API (OpenAI / Anthropic / Gemini)
Multi-model routing across on-prem + cloud
Trinidy: For PII residency and sub-second TTFT, cloud-API-only serving is economically and regulatorily fragile at Wells Fargo / BofA scale. Trinidy provides on-prem vLLM / TensorRT-LLM serving with the semantic cache and RAG index collocated — the entire hot path stays inside the institution's perimeter.