Phase 1 of 6
Scoping & Latency
Define the channels, time-to-first-token budget, seat topology, and language coverage that will govern every RAG and LLM decision downstream.
Channels & Conversation Surface
Identify channels in scope for agent assist
Why This Matters
Voice, chat, and video have materially different latency envelopes, transcription dependencies, and compliance obligations — voice triggers state two-party consent laws and MiFID II recordkeeping, while chat is text-native and avoids voice biometric exposure (Illinois BIPA). Bundling channels into one copilot without scoping each explicitly is how teams discover six months in that their voice pipeline cannot reuse the chat RAG stack because ASR latency alone consumes the entire TTFT budget.
Note prompts:
+ Which channels are live today vs. planned in the next 12 months?
+ Do we own the transcription stack end-to-end or is it outsourced to the CCaaS vendor?
+ Is our video banking channel in scope for the same knowledge base as voice?
Confirm which customer-facing channels the copilot must support.
Select all that apply
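A minimal scoping sketch, assuming placeholder budgets and compliance tags (none of the numbers or tags below are measured values or legal determinations): one explicit entry per surface forces the channel-by-channel decision this item asks for, instead of bundling channels into one copilot.

```python
# Channel scoping matrix. All budgets and compliance tags below are
# illustrative placeholders, not measured values or legal determinations.
CHANNELS = {
    "voice": {
        "ttft_budget_ms": 500,             # must also absorb streaming ASR latency
        "transcription": "streaming_asr",
        "compliance": ["two-party consent", "MiFID II recordkeeping"],
    },
    "chat": {
        "ttft_budget_ms": 1000,            # text-native: no ASR stage in the path
        "transcription": None,
        "compliance": [],                  # avoids voice biometric exposure (BIPA)
    },
    "video": {
        "ttft_budget_ms": 500,
        "transcription": "streaming_asr",
        "compliance": ["two-party consent", "MiFID II recordkeeping"],
    },
}

def scoped(channel: str) -> dict:
    """Fail fast on any surface the team never explicitly scoped."""
    if channel not in CHANNELS:
        raise ValueError(f"channel {channel!r} was never scoped in Phase 1")
    return CHANNELS[channel]
```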
Define time-to-first-token (TTFT) latency budget
Why This Matters
Sub-500ms TTFT is the threshold at which a suggestion lands before the agent finishes the customer's sentence — anything slower forces the agent to wait through an awkward pause or talk over a stale suggestion that no longer matches the conversation turn. Industry telemetry from 2024–2025 shows 68% of financial services agent-assist deployments stuck above a 2-second P95, which is why most of them are used for after-call wrap rather than in-conversation. Setting the TTFT budget is a first-order architectural decision — infrastructure choices made after the SLA is fixed have an order of magnitude less leverage than choices that set it correctly.
Note prompts:
+ What is our current P95 TTFT in our pilot deployment, and which stage dominates?
+ Have we measured agent abandonment of suggestions as a function of TTFT?
+ Is our TTFT target the same across voice, chat, and video, or tiered by channel?
Select the P95 TTFT the agent copilot must hold during a live conversation.
Single choice
Trinidy — Trinidy's optimized RAG pipeline — embedding 20ms, ANN retrieval 80ms, rerank 50ms, prompt build 50ms, first token 250ms — completes in under 450ms on-prem. Cloud-routed LLM calls alone consume 200–800ms of network and queue time before the model starts generating.
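The arithmetic behind that pipeline claim, as a sketch: summing the stage figures quoted above against the 500ms threshold named earlier in this phase. The 200ms overhead used below is the low end of the cloud range cited above.

```python
# Stage latencies (ms) taken from the on-prem pipeline described above;
# the 200-800ms figure is the cloud network/queue overhead cited there.
ON_PREM_STAGES_MS = {
    "embedding": 20,
    "ann_retrieval": 80,
    "rerank": 50,
    "prompt_build": 50,
    "first_token": 250,
}

def ttft_ms(stages: dict, extra_overhead_ms: int = 0) -> int:
    # TTFT is the serial sum of every stage before the first token appears.
    return sum(stages.values()) + extra_overhead_ms

BUDGET_MS = 500
print(ttft_ms(ON_PREM_STAGES_MS))        # 450 -- inside a 500ms budget
print(ttft_ms(ON_PREM_STAGES_MS, 200))   # 650 -- cloud best case already over
```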
Quantify daily call volume and concurrency
Why This Matters
Daily call volume is the wrong capacity-planning anchor: peak concurrent conversations determine the GPU fleet, because each active agent holds a streaming LLM session. Bank of America's Erica handled 676M interactions in 2024, with concurrency peaks far above the load implied by the daily average. Sizing the fleet to the daily average produces a peak-hour queue that blows through the TTFT budget before any model is at fault.
Note prompts:
+ What is our peak concurrent call count today and how does it scale in the next 24 months?
+ Are we sizing the LLM fleet on daily volume or measured peak concurrency?
+ What is our fallback behavior when concurrency exceeds provisioned capacity?
Capacity planning anchor for the LLM serving fleet: peak concurrency drives GPU count, not daily volume.
Single choice
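A back-of-envelope sketch of why the daily average misleads, using Little's Law (concurrent sessions equal arrival rate times average handle time); every input below is an illustrative placeholder, not a measured figure.

```python
# Little's Law: concurrent sessions = arrival rate x average handle time.
# All inputs are illustrative placeholders -- substitute measured values.
daily_calls = 200_000
peak_hour_share = 0.12          # fraction of daily volume in the busiest hour
avg_handle_time_s = 420         # 7-minute average handle time

peak_concurrent = (daily_calls * peak_hour_share / 3600) * avg_handle_time_s
avg_concurrent = (daily_calls / 86_400) * avg_handle_time_s

print(round(peak_concurrent))   # ~2800 streaming sessions at peak hour
print(round(avg_concurrent))    # ~972 -- daily-average sizing misses ~2/3 of peak
```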
Specify concurrent agent seat count
Why This Matters
Seat count multiplied by average streaming-session duration determines the number of concurrent LLM streams the inference fleet must support — and LLM serving is concurrency-bound far more than throughput-bound. A 10,000-seat contact center at 80% utilization typically holds 4,000–6,000 simultaneous streaming LLM sessions during peak hour, which maps directly to GPU count. Undersizing for concurrency is the single most common cause of production TTFT regressions.
Note prompts:
+ What is our measured peak concurrent streaming session count today?
+ Is our GPU fleet sized for peak concurrency or rolling average?
+ Do we have headroom for seasonal peaks (tax season, holiday retail)?
Number of simultaneously active agent seats that must hold sub-500ms TTFT under peak load.
Single choice
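A sizing sketch under stated assumptions: streams per seat, per-GPU concurrency, and headroom are placeholders to replace with load-test results; only the 10,000-seat, 80%-utilization anchor and the resulting stream count (inside the 4,000-6,000 range above) come from the figures in this item.

```python
import math

# Seats -> concurrent LLM streams -> GPU count. Every ratio below is an
# assumption to replace with your own load-test results.
seats = 10_000
peak_utilization = 0.80     # share of seats in an active conversation at peak
streams_per_seat = 0.65     # share of active calls holding a live LLM stream

concurrent_streams = int(seats * peak_utilization * streams_per_seat)  # 5200

streams_per_gpu = 32        # measured per-GPU concurrency at the target TTFT
headroom = 1.25             # seasonal buffer (tax season, holiday retail)

gpus = math.ceil(concurrent_streams * headroom / streams_per_gpu)
print(concurrent_streams, gpus)   # 5200 streams -> 204 GPUs
```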
Define language and dialect coverage
Why This Matters
Wells Fargo's Fargo upgrade to Gemini 2.0 Flash drove Spanish-version adoption to 80% — the Spanish-speaking segment is the single largest non-English bloc in US retail banking and is frequently underserved by English-first copilots that translate on the fly. Language coverage also cascades into PII masking (entity extractors are language-specific), compliance-tagged response libraries (Reg E disclosures must be delivered in the language of the conversation), and embedding models (multilingual embeddings sacrifice some retrieval accuracy vs. language-specific ones).
Note prompts:
+ What percentage of our inbound volume is non-English, and is our copilot viable in those languages today?
+ Do we have compliance-approved Reg E / TILA disclosures in every supported language?
+ Is our PII masking model language-aware, or are we leaking PII in non-English transcripts?
Which languages the copilot must support in RAG retrieval, LLM generation, and PII masking.
Select all that apply
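One way the PII-masking cascade becomes concrete: a hypothetical router that refuses to process a transcript when no language-specific extractor exists. The model identifiers and the run_extractor helper are placeholders, not a real library API.

```python
# Hypothetical language-aware PII masking router. Model names and the
# run_extractor helper are placeholders, not a real library API.
PII_EXTRACTORS = {
    "en": "pii-ner-en",
    "es": "pii-ner-es",
}

def run_extractor(model: str, text: str) -> str:
    raise NotImplementedError("stand-in for the actual NER + redaction step")

def mask_pii(transcript: str, lang: str) -> str:
    model = PII_EXTRACTORS.get(lang)
    if model is None:
        # Refuse rather than silently applying the English extractor to a
        # non-English transcript and leaking account numbers or SSNs.
        raise ValueError(f"no compliance-approved PII extractor for {lang!r}")
    return run_extractor(model, transcript)
```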
Confirm data residency and cross-border constraints
Map conversation, customer, and 1033 open-banking data to jurisdictional constraints before architecture is finalized.
Select all that apply
Trinidy — Conversation transcripts containing SSN, account numbers, and 1033 open-banking payloads cannot transit public LLM APIs without triggering GLBA and potential CFPB scrutiny. Trinidy keeps ASR, RAG retrieval, and LLM inference entirely inside the institution's perimeter — no customer conversation data leaves the network boundary.
Define deployment topology for inference
Select the physical / logical deployment target for the LLM serving fleet.
Single choice
Trinidy — For sub-500ms TTFT plus GLBA-compliant residency, public cloud LLM APIs are physically and regulatorily marginal. Trinidy runs the full agentic RAG + LLM stack on-prem with GPU or CPU targets and deterministic egress-free inference.
Scope agent workflow integration surface
Which systems the copilot must read from and write into during a live call.
Select all that apply
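A hypothetical contract that pins down what "read from and write into" means at the code level; the system roles, method names, and signatures below are illustrative assumptions, not an existing vendor API.

```python
# Hypothetical contract for the copilot's integration surface. System
# roles and signatures are illustrative, not an existing vendor API.
from typing import Protocol

class ReadSurface(Protocol):
    def customer_context(self, customer_id: str) -> dict: ...    # e.g. CRM profile
    def knowledge_search(self, query: str) -> list: ...          # e.g. KB articles

class WriteSurface(Protocol):
    def post_wrap_up(self, call_id: str, summary: str) -> None: ...   # CRM notes
    def open_case(self, call_id: str, disposition: str) -> None: ...  # ticketing
```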