Phase 1 of 6
Scoping & Latency
Define the channels, time-to-first-token budget, seat topology, and language coverage that will govern every RAG and LLM decision downstream.
Channels & Conversation Surface
Identify channels in scope for agent assist
Why This Matters
Voice, chat, and video have materially different latency envelopes, transcription dependencies, and compliance obligations — voice triggers state two-party consent laws and MiFID II recordkeeping, while chat is text-native and avoids voice biometric exposure (Illinois BIPA). Bundling channels into one copilot without scoping each explicitly is how teams discover six months in that their voice pipeline cannot reuse the chat RAG stack because ASR latency alone consumes the entire TTFT budget.
Note prompts
+ Which channels are live today vs. planned in the next 12 months?
+ Do we own the transcription stack end-to-end, or is it outsourced to the CCaaS vendor?
+ Is our video banking channel in scope for the same knowledge base as voice?
Required
Confirm which customer-facing channels the copilot must support.
Select all that apply
Inbound voice (phone / IVR-routed)
Outbound voice (collections / retention)
Live web chat
In-app mobile chat
SMS / RCS
Secure messaging inside online banking
Video banking (screen share + face-to-face)
Email triage + draft-reply
Branch / in-person agent terminal
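The per-channel differences above can be captured as a scoping table the architecture review can test against. A minimal sketch in Python; every field name, budget, and flag here is an illustrative placeholder, not a Trinidy or CCaaS specification:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChannelScope:
    name: str
    ttft_budget_ms: int      # P95 time-to-first-token target for this channel
    needs_asr: bool          # voice/video channels depend on streaming transcription
    two_party_consent: bool  # voice recording triggers state consent laws

# Illustrative scoping table: budgets and flags are placeholders to be
# replaced with the institution's own measurements and legal review.
CHANNELS = [
    ChannelScope("inbound_voice", ttft_budget_ms=500,  needs_asr=True,  two_party_consent=True),
    ChannelScope("web_chat",      ttft_budget_ms=1000, needs_asr=False, two_party_consent=False),
    ChannelScope("email_triage",  ttft_budget_ms=2000, needs_asr=False, two_party_consent=False),
]

def asr_dependent(channels):
    """Channels whose TTFT budget must also absorb ASR latency."""
    return [c.name for c in channels if c.needs_asr]
```

Scoping each channel this explicitly is what prevents the voice pipeline from silently inheriting a chat-sized TTFT budget.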
Define time-to-first-token (TTFT) latency budget
Why This Matters
Sub-500ms TTFT is the threshold at which a suggestion lands before the agent finishes the customer's sentence — anything slower forces the agent either to wait through an awkward pause or to talk over a stale suggestion that no longer matches the conversation turn. Industry telemetry from 2024–2025 shows 68% of financial-services agent-assist deployments stuck above a 2-second P95, which is why most of them are used for after-call wrap rather than in-conversation assist. The TTFT budget is a first-order architectural decision: infrastructure chosen after the SLA is fixed has an order of magnitude less leverage than setting the SLA correctly in the first place.
Note prompts
+ What is our current P95 TTFT in our pilot deployment, and which stage dominates?
+ Have we measured agent abandonment of suggestions as a function of TTFT?
+ Is our TTFT target the same across voice, chat, and video, or tiered by channel?
Required
Select the P95 TTFT the agent copilot must hold during a live conversation.
Single choice
< 500ms TTFT (invisible to the conversation — target)
500ms – 1s (acceptable for chat, awkward on voice)
1s – 2s (agent must talk over the pause)
> 2s (breaks agent flow — 68% of FSI deployments today)
Tiered by channel (sub-500ms voice, 1–2s email)
Trinidy: Trinidy's optimized RAG pipeline — embedding 20ms, ANN retrieval 80ms, rerank 50ms, prompt build 50ms, first token 250ms — completes within 450ms on-prem. Cloud-routed LLM calls alone consume 200–800ms of network and queue time before the model starts generating.
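The stage budget quoted above can be checked arithmetically. A minimal sketch using the figures from the note (the numbers are the document's illustrative stage budgets, not measured guarantees):

```python
# Per-stage P95 latency budget (ms) for an on-prem RAG pipeline,
# using the stage figures quoted in the note above.
STAGES_MS = {
    "embedding": 20,
    "ann_retrieval": 80,
    "rerank": 50,
    "prompt_build": 50,
    "first_token": 250,
}

def total_ttft_ms(stages):
    """Sum of all stage budgets: the pipeline's end-to-end TTFT."""
    return sum(stages.values())

def fits_budget(stages, budget_ms=500):
    """True if the summed stage budget holds the P95 TTFT target."""
    return total_ttft_ms(stages) <= budget_ms

# A cloud-routed LLM call adds 200-800 ms of network/queue time before
# generation starts, which by itself exceeds the remaining headroom.
CLOUD_OVERHEAD_MS = (200, 800)
```

The stages sum to 450ms, leaving only 50ms of headroom under a 500ms target, which is why even the low end of the cloud overhead range breaks the budget.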
Quantify daily call volume and concurrency
Why This Matters
Daily call volume misleads capacity planning — peak concurrent conversations determine the GPU fleet, because each active agent holds a streaming LLM session. Bank of America's Erica handled 676M interactions in 2024, with concurrency peaks far above the load implied by the daily average. Sizing the fleet to the daily average produces a peak-hour queue that blows through the TTFT budget before any model is at fault.
Note prompts
+ What is our peak concurrent call count today, and how does it scale over the next 24 months?
+ Are we sizing the LLM fleet on daily volume or measured peak concurrency?
+ What is our fallback behavior when concurrency exceeds provisioned capacity?
Required
Capacity planning anchor for the LLM serving fleet — peak concurrency drives GPU count, not daily volume.
Single choice
< 10k calls/day (regional / credit union)
10k – 100k calls/day (mid-size bank)
100k – 1M calls/day (super-regional / specialty issuer)
> 1M calls/day (top-10 retail bank tier)
Mixed voice + chat, not currently aggregated
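Because the answer above is collected as daily volume, capacity planning still needs a peak-concurrency estimate. A minimal sketch via Little's law; the peak-hour share and handle time are placeholder assumptions to replace with measured telemetry:

```python
def peak_concurrency(calls_per_day, peak_hour_share=0.10, avg_handle_time_s=300):
    """Estimate peak simultaneous conversations from daily volume.

    peak_hour_share:   fraction of daily calls landing in the busiest hour
    avg_handle_time_s: average conversation duration in seconds
    Both defaults are illustrative assumptions, not industry constants.
    """
    calls_in_peak_hour = calls_per_day * peak_hour_share
    # Little's law: concurrency = arrival rate (calls/s) * average duration (s)
    return calls_in_peak_hour / 3600 * avg_handle_time_s
```

Under these assumptions, 100k calls/day implies roughly 833 concurrent conversations at peak, versus about 347 if the same volume is spread flat across 24 hours — the gap the "Why This Matters" note warns about.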
Specify concurrent agent seat count
Why This Matters
Seat count multiplied by average streaming-session duration determines the number of concurrent LLM streams the inference fleet must support — and LLM serving is concurrency-bound far more than throughput-bound. A 10,000-seat contact center at 80% utilization typically holds 4,000–6,000 simultaneous streaming LLM sessions during peak hour, which maps directly to GPU count. Undersizing for concurrency is the single most common cause of production TTFT regressions.
Note prompts
+ What is our measured peak concurrent streaming-session count today?
+ Is our GPU fleet sized for peak concurrency or a rolling average?
+ Do we have headroom for seasonal peaks (tax season, holiday retail)?
Required
Number of simultaneously active agent seats that must hold sub-500ms TTFT under peak load.
Single choice
< 500 seats
500 – 2,500 seats
2,500 – 10,000 seats
> 10,000 seats (multi-site enterprise contact center)
Hybrid human + AI chat deflection (variable seats)
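The seat arithmetic above (10,000 seats at 80% utilization holding 4,000–6,000 streams) can be sketched directly. Utilization, streaming share, per-GPU stream capacity, and headroom below are illustrative assumptions, not vendor benchmarks:

```python
import math

def gpu_fleet_size(seats, utilization=0.8, streaming_share=0.625,
                   streams_per_gpu=32, headroom=1.2):
    """Size the inference fleet from peak concurrent streaming sessions.

    utilization:     fraction of seats active at peak
    streaming_share: fraction of active seats holding a live LLM stream
    streams_per_gpu: concurrent streams one GPU serves at target TTFT
    headroom:        seasonal-peak multiplier (tax season, holiday retail)
    All four parameters are illustrative placeholders.
    """
    concurrent_streams = seats * utilization * streaming_share
    return math.ceil(concurrent_streams * headroom / streams_per_gpu)
```

With these defaults, 10,000 seats yields 5,000 concurrent streams (inside the 4,000–6,000 range quoted above), and the fleet is sized for that peak plus headroom rather than a rolling average.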
Define language and dialect coverage
Why This Matters
Wells Fargo's Fargo upgrade to Gemini 2.0 Flash drove Spanish-version adoption to 80% — the Spanish-speaking segment is the single largest non-English bloc in US retail banking and is frequently underserved by English-first copilots that translate on the fly. Language coverage also cascades into PII masking (entity extractors are language-specific), compliance-tagged response libraries (Reg E disclosures must be delivered in the language of the conversation), and embedding models (multilingual embeddings sacrifice some retrieval accuracy vs. language-specific ones).
Note prompts
+ What percentage of our inbound volume is non-English, and is our copilot viable in those languages today?
+ Do we have compliance-approved Reg E / TILA disclosures in every supported language?
+ Is our PII masking model language-aware, or are we leaking PII in non-English transcripts?
Required
Which languages the copilot must support in RAG retrieval, LLM generation, and PII masking.
Select all that apply
English (US)
Spanish (US — ~20% of retail banking inbound)
French (Canadian — OSFI regulated institutions)
Mandarin / Cantonese
Tagalog
Vietnamese
Portuguese (Brazilian)
Arabic
Multi-language single-model (mBERT / XLM-R / multilingual LLM)
English-only today, multi-language planned
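Because entity extractors are language-specific, the PII-masking layer needs explicit per-language dispatch rather than a silent fallback. A minimal sketch; the model identifiers are hypothetical placeholders, not real model names:

```python
# Map conversation language to a language-specific PII extractor.
# Model identifiers here are hypothetical placeholders.
PII_MODELS = {
    "en": "pii-ner-en",
    "es": "pii-ner-es",
    "fr": "pii-ner-fr",
}

def select_pii_model(lang, fallback="multilingual-pii-ner"):
    """Return the PII extractor for a conversation language.

    An English-only deployment that silently degrades to a multilingual
    fallback on Spanish traffic is exactly the leak path the note
    prompts above warn about, so the fallback is an explicit choice.
    """
    model = PII_MODELS.get(lang)
    if model is None:
        # Multilingual fallbacks typically trade recall for coverage;
        # surface this in monitoring rather than hiding it.
        return fallback
    return model
```

The same dispatch pattern applies to compliance-tagged response libraries: a Reg E disclosure must be selected in the conversation's language, never machine-translated on the fly.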
Confirm data residency and cross-border constraints
Required
Map conversation, customer, and 1033 open-banking data to jurisdictional constraints before architecture is finalized.
Select all that apply
US — GLBA requires in-perimeter processing of customer financial information
Canada — OSFI / PIPEDA data residency
EU — GDPR residency + MiFID II recordkeeping
UK — UK GDPR + FCA recordkeeping
India — RBI localization
Brazil — LGPD
Cross-border permitted under SCCs for non-PII metadata only
No customer conversation data may leave on-prem perimeter
Trinidy: Conversation transcripts containing SSNs, account numbers, and Section 1033 open-banking payloads cannot transit public LLM APIs without triggering GLBA obligations and potential CFPB scrutiny. Trinidy keeps ASR, RAG retrieval, and LLM inference entirely inside the institution's perimeter — no customer conversation data leaves the network boundary.
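Residency constraints like these can be enforced as a routing guard evaluated before any payload leaves the perimeter. A minimal sketch of the strictest profile above ("no customer conversation data may leave the on-prem perimeter"); the rules are illustrative, not legal guidance:

```python
def may_route_offsite(payload_class, destination):
    """Gate off-perimeter routing by payload class.

    payload_class: 'customer_pii', 'transcript', or 'non_pii_metadata'
    destination:   'on_prem', 'in_region_vpc', or 'public_api'
    Illustrative rules for the strictest profile: conversation data and
    customer PII stay in-perimeter; only non-PII metadata may cross
    under SCCs.
    """
    if destination == "on_prem":
        return True
    # Everything leaving the perimeter must be non-PII metadata.
    return payload_class == "non_pii_metadata"
```

Encoding the policy as code makes the jurisdiction mapping testable during architecture review instead of being rediscovered in a compliance audit.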
Define deployment topology for inference
Required
Select the physical / logical deployment target for the LLM serving fleet.
Single choice
On-premises in the institution's data center (required for top-tier banks)
Private cloud / VPC in-region
CCaaS-embedded (NICE / Genesys / Five9 / LivePerson — vendor-hosted)
Public cloud LLM API (OpenAI / Anthropic / Bedrock)
Hybrid: on-prem inference + cloud training / evaluation
Trinidy: For sub-500ms TTFT plus GLBA-compliant residency, public cloud LLM APIs are marginal on both latency and regulation. Trinidy runs the full agentic RAG + LLM stack on-prem, on GPU or CPU targets, with deterministic, egress-free inference.
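The topology choice interacts directly with the TTFT budget, since a cloud-routed call pays 200–800ms of network and queue time before the first token. A minimal sketch of that screen; only the public-API range comes from the earlier note, and the other overhead figures are illustrative guesses:

```python
# Added network/queue overhead (ms) before generation starts, per topology.
# Only the public-API figure reflects the 200-800ms range quoted earlier
# (taken here as its midpoint); the rest are illustrative guesses.
TOPOLOGY_OVERHEAD_MS = {
    "on_prem": 5,
    "in_region_vpc": 30,
    "ccaas_embedded": 60,
    "public_llm_api": 500,
}

def topologies_within_budget(pipeline_ms, budget_ms=500):
    """Return topologies whose added overhead still fits the P95 TTFT
    budget given the pipeline's own end-to-end latency."""
    return [name for name, overhead in TOPOLOGY_OVERHEAD_MS.items()
            if pipeline_ms + overhead <= budget_ms]
```

With a 450ms pipeline, only the lowest-overhead topologies survive a 500ms budget, which is the quantitative version of the residency-plus-latency argument above.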
Scope agent workflow integration surface
Required
Which systems the copilot must read from and write into during a live call.
Select all that apply
Core banking system (account balance, transactions)
CRM (Salesforce Financial Services Cloud / MS Dynamics)
Case management / ticketing (ServiceNow / Pega)
Dispute management (Reg E workflow)
Fraud case system
Loan origination / LOS
Wealth / brokerage platform
CCaaS desktop (Genesys / NICE / Five9)
1033 open-banking data aggregator