Phase 1 of 6
Scoping & Latency
Define the advisor workflows in scope, the time-to-first-token budget, and the firm-wide rollout posture that will govern every subsequent architectural decision.
Advisor Workflows in Scope
Identify copilot use cases in scope
Why This Matters
The four workflows Morgan Stanley, UBS, JPMorgan, and Wells Fargo all deploy in production — meeting prep, research query, client outreach, and portfolio alerts — have materially different latency, grounding, and supervisory profiles and cannot share a single prompt template or retrieval configuration. Meeting prep is a long-form synthesis task (3–5s TTFT acceptable, heavy reranking) while research query is an interactive lookup (sub-1s TTFT critical, citation enforcement strict). Forcing a single one-size-fits-all copilot onto all four is the most common rollout failure mode.
Note prompts
+ Which two workflows will we launch first, and are they governed by the same WSPs and supervisory review path?
+ Do we have a documented prohibited-use-case list as required under FINRA 2025 GenAI governance guidance?
+ Which workflows generate business communications subject to SEC Rule 17a-4 retention?
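The per-workflow differences above can be made concrete as a configuration sketch. The profile names and values below are illustrative assumptions for planning discussion, not production settings or vendor benchmarks:

```python
# Illustrative per-workflow copilot profiles. Values are assumptions for
# discussion, not firm benchmarks. Each workflow gets its own latency
# budget, retrieval depth, and supervisory tier rather than one shared
# prompt template or retrieval configuration.
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkflowProfile:
    ttft_budget_ms: int      # P95 time-to-first-token budget
    rerank_top_k: int        # reranking depth (heavier for synthesis)
    strict_citations: bool   # enforce citation grounding on every claim
    review: str              # supervisory path before client delivery

PROFILES = {
    "meeting_prep":     WorkflowProfile(5000, 50, False, "advisor_in_loop"),
    "research_query":   WorkflowProfile(1000, 10, True,  "post_hoc_sampling"),
    "client_outreach":  WorkflowProfile(3000, 20, False, "advisor_in_loop"),
    "portfolio_alerts": WorkflowProfile(3000, 20, True,  "tiered"),
}
```

Separating the profiles this way makes the scoping decision explicit: launching two workflows means committing to two distinct latency and review configurations, not one.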
Required
Confirm which advisor workflows your copilot must support at launch.
Select all that apply
Meeting prep automation (client briefings, agenda synthesis)
Research query / natural language search across proprietary corpus
Personalized client outreach drafting (email / call scripts)
Proactive portfolio alerts and synthesis (drift, concentration, tax-loss)
Compliance-ready note generation post-meeting
Investment policy statement (IPS) drafting and review
Regulatory / product explainers (529, Roth conversion, SMA)
Trade rationale generation for suitability documentation
CRM enrichment (auto-populate contact notes, next-best-action)
Define time-to-first-token (TTFT) SLA
Why This Matters
TTFT is the metric advisors actually feel — total completion latency matters less than how fast the first token streams. Morgan Stanley's publicly reported production benchmark sits around 1.2s P95 with optimized RAG (20ms embedding + 80ms ANN retrieval + 50ms rerank + 50ms prompt build + 250ms first token, roughly 450ms on the typical path, with tail latency pushing P95 toward 1.2s). Every additional 500ms of TTFT measurably degrades adoption, the metric that anchors Morgan Stanley's reported 98% advisor adoption rate.
Note prompts
+ What is our measured P95 TTFT today across the top 5 query types, and where are the hot spots?
+ Have we instrumented retrieval, reranking, and first-token latency separately or only end-to-end?
+ Do we have semantic caching in place, and what is our measured cache hit rate on repeat queries?
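The stage breakdown above can be turned into a simple budget check. The per-stage numbers come from the benchmark figures quoted in this section; the 2x P95 inflation factor is an assumption you should replace with your own measured tail ratio:

```python
# Sanity-check a TTFT budget against per-stage latencies. Stage values
# are the benchmark breakdown quoted above; the 2x P95 inflation factor
# is an assumption, not a measured figure.
STAGES_MS = {
    "embedding": 20,
    "ann_retrieval": 80,
    "rerank": 50,
    "prompt_build": 50,
    "first_token": 250,
}

def ttft_headroom(budget_ms: int, p95_factor: float = 2.0) -> float:
    """Remaining ms under budget after estimated P95 inflation.

    Negative means the budget is already blown before any gateway
    or network latency is added on top.
    """
    typical = sum(STAGES_MS.values())        # ~450 ms typical path
    return budget_ms - typical * p95_factor
```

Under these assumptions a 2s budget leaves roughly 1.1s of headroom, while a sub-1s budget leaves about 100ms — which is why 200–500ms of added gateway latency can break the aggressive tier outright.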
Required
Select the P95 TTFT budget that advisor-facing copilot responses must hold.
Single choice
< 1s TTFT (aggressive — interactive research lookup)
1–2s TTFT (Morgan Stanley production benchmark)
2–3s TTFT (standard copilot / meeting prep)
3–5s TTFT (long-form synthesis / deep research agent)
Tiered by workflow
Trinidy: Cloud LLM APIs introduce 200–500ms of network and gateway latency before the first token, and that variance compounds under load. Trinidy runs the RAG retrieval and generation graph inside the firm perimeter, so TTFT stays predictable at 1–2s even at peak advisor concurrency.
Define advisor population and firm-wide rollout scope
Why This Matters
Concurrent advisor population drives LLM throughput, embedding cache sizing, and GPU footprint in a way that does not scale linearly. Morgan Stanley's 15,000+ advisor deployment achieves 98% adoption, which means peak concurrency approaches 30–40% of the population during market open and meeting-prep windows — dramatically higher than a naive per-user allocation would suggest. Sizing for average load and only discovering true peak concurrency at launch is the most common capacity failure.
Note prompts
+ What is our projected P95 concurrent advisor query rate, not just the steady-state average?
+ Have we modeled meeting-prep concurrency around market open and the first Monday of the month?
+ What is our reserved GPU capacity, and does it assume peak or average concurrency?
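A peak-load sizing sketch follows from the figures above. The adoption and peak-concurrency ratios come from this section; query rate, tokens per query, and per-GPU throughput are assumptions you should replace with your own measured numbers:

```python
# Peak-concurrency GPU sizing sketch. Adoption and peak-concurrency
# ratios come from the deployment figures above; queries/min, tokens
# per query, and per-GPU throughput are planning assumptions only.
import math

def gpus_needed(advisors: int,
                adoption: float = 0.98,
                peak_concurrency: float = 0.35,
                queries_per_min: float = 2.0,
                tokens_per_query: int = 600,
                gpu_tokens_per_sec: int = 2500) -> int:
    """Size GPU count for peak load, not steady-state average."""
    active = advisors * adoption * peak_concurrency
    tokens_per_sec = active * queries_per_min / 60 * tokens_per_query
    return math.ceil(tokens_per_sec / gpu_tokens_per_sec)
```

Under these assumptions a 15,000-advisor firm needs on the order of 40+ GPUs at peak — several times what an average-load calculation would produce, which is exactly the capacity failure the question above is probing for.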
Required
Quantify the advisor population and concurrency envelope the copilot must support.
Single choice
< 500 advisors (pilot / single business unit)
500 – 5,000 advisors (regional rollout)
5,000 – 20,000 advisors (firm-wide, mid-size wirehouse)
> 20,000 advisors (Morgan Stanley / UBS / Merrill scale)
Phased — starting pilot with firm-wide roadmap
Confirm data residency and on-prem inference requirements
Required
Map proprietary research, client PII, and portfolio data to jurisdictional and firm-policy constraints.
Select all that apply
Proprietary research corpus must remain on-premises
Client PII must not reach third-party LLM APIs
Portfolio positions must stay in firm perimeter
EU GDPR — EU client data must remain in EU
UK GDPR — UK residency required
APAC (Singapore MAS, HK SFC) residency constraints
Cross-border permitted under approved vendor DPAs
Reg S-P (Regulation S-P) privacy constraints apply
Trinidy: Proprietary research represents decades of institutional IP, and client PII is subject to Reg S-P and GDPR. Trinidy keeps the entire RAG pipeline — embedding, retrieval, reranking, generation — inside the firm perimeter. No research document, CRM note, or portfolio position ever reaches a third-party API.
Define supervisory review path (FINRA Notice 24-09)
Why This Matters
FINRA Regulatory Notice 24-09 (June 2024) explicitly addresses generative AI in member communications and requires written supervisory procedures (WSPs) covering AI use. AI-generated content distributed to clients is a business communication under FINRA Rule 4511 and SEC Rule 17a-4 and requires supervisory review. "AI said so" is not a valid defense under Reg BI — the advisor and the firm retain suitability responsibility regardless of whether a human or model generated the recommendation.
Note prompts
+ Do our WSPs explicitly address copilot output review, or do they predate 24-09?
+ Who in compliance signs off on the supervisory tier assignment for each copilot workflow?
+ How do we demonstrate to a FINRA examiner that every client-delivered AI output was reviewed?
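The tiered review options in the question below can be sketched as a minimal routing rule. The tier table and workflow names here are illustrative assumptions standing in for whatever the firm's WSPs actually define:

```python
# Minimal sketch of WSP-tiered routing for AI-generated client content.
# The tier assignments and workflow names are illustrative assumptions,
# not a real WSP; the fail-closed default is the design point.
RISK_TIERS = {
    "regulatory_explainer":  "low",
    "client_outreach_draft": "high",
    "trade_rationale":       "high",
}

def route_output(workflow: str) -> str:
    """Return the review path before client delivery.

    Unknown workflows default to high risk, so anything not explicitly
    tiered in the WSPs still gets human review before reaching a client.
    """
    tier = RISK_TIERS.get(workflow, "high")
    return "auto_send_with_sampling" if tier == "low" else "advisor_review_required"
```

The design choice worth noting is the fail-closed default: under Notice 24-09 it is far safer for an untiered workflow to land in human review than to auto-send.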
Required
Specify how AI-generated content reaches the client, and who reviews before delivery.
Single choice
Advisor-in-the-loop — every output reviewed before client delivery
Tiered — low-risk outputs auto-send, high-risk require review
Post-hoc supervisory sampling with WSP-defined thresholds
Client-facing autonomous — no pre-delivery review (not recommended under 24-09)
Not yet designed — open policy question
Specify deployment topology
Required
Select the physical/logical deployment target for the copilot inference plane.
Single choice
On-premises GPU cluster (H100 / A100 in firm data center)
Private cloud / VPC in-region with dedicated tenancy
Hybrid — on-prem retrieval + private-cloud inference
Public cloud with enterprise DPA and zero-retention
Per-region deployment to satisfy data residency
Trinidy: For proprietary research isolation and SEC Rule 17a-4 recordkeeping inside the firm perimeter, public cloud LLM APIs create residency and audit complications. Trinidy supports on-prem GPU inference, private-cloud VPC, and hybrid — embedding index and generation stay local, with cloud reserved for non-sensitive capabilities.