Phase 1 of 6
Scoping & Latency
Define the advisor workflows in scope, the time-to-first-token budget, and the firm-wide rollout posture that will govern every subsequent architectural decision.
Advisor Workflows in Scope
Identify copilot use cases in scope
Why This Matters
The four workflows Morgan Stanley, UBS, JPMorgan, and Wells Fargo all deploy in production — meeting prep, research query, client outreach, and portfolio alerts — have materially different latency, grounding, and supervisory profiles and cannot share a single prompt template or retrieval configuration. Meeting prep is a long-form synthesis task (3–5s TTFT acceptable, heavy reranking), while research query is an interactive lookup (sub-1s TTFT critical, strict citation enforcement). Forcing a single one-size-fits-all copilot onto all four is the most common rollout failure mode.
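These per-workflow differences can be captured as a configuration map rather than a shared template. The sketch below is purely illustrative — every field name and value is a hypothetical assumption, not any firm's actual settings:

```python
# Hypothetical per-workflow copilot profiles. Values illustrate why the four
# workflows cannot share one prompt template or retrieval configuration;
# all numbers and tier names below are assumptions for illustration only.
WORKFLOW_PROFILES = {
    "meeting_prep": {                 # long-form synthesis
        "ttft_budget_ms": 3000,       # 3-5s TTFT acceptable
        "rerank_depth": 50,           # heavy reranking over a wide candidate set
        "citation_enforcement": "soft",
        "supervisory_tier": "pre_delivery_review",
    },
    "research_query": {               # interactive lookup
        "ttft_budget_ms": 1000,       # sub-1s TTFT critical
        "rerank_depth": 10,           # shallow rerank to hold latency
        "citation_enforcement": "strict",
        "supervisory_tier": "post_hoc_sampling",
    },
    "client_outreach": {              # client-facing business communication
        "ttft_budget_ms": 2000,
        "rerank_depth": 20,
        "citation_enforcement": "strict",
        "supervisory_tier": "pre_delivery_review",
    },
    "portfolio_alerts": {             # event-driven notification
        "ttft_budget_ms": 1500,
        "rerank_depth": 10,
        "citation_enforcement": "strict",
        "supervisory_tier": "post_hoc_sampling",
    },
}

def profile_for(workflow: str) -> dict:
    """Look up the latency/retrieval/supervision profile for one workflow."""
    return WORKFLOW_PROFILES[workflow]
```

Separating profiles this way lets each workflow's latency budget, rerank depth, and supervisory tier be tuned and audited independently.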
Note prompts
- Which two workflows will we launch first, and are they governed by the same WSPs and supervisory review path?
- Do we have a documented prohibited-use-case list as required under FINRA 2025 GenAI governance guidance?
- Which workflows generate business communications subject to SEC Rule 17a-4 retention?

Confirm which advisor workflows your copilot must support at launch.
Select all that apply
Define time-to-first-token (TTFT) SLA
Why This Matters
TTFT is the metric advisors actually feel — total completion latency matters less than how fast the first token streams. Morgan Stanley's publicly reported production benchmark sits around 1.2s P95 with optimized RAG (20ms embedding + 80ms ANN retrieval + 50ms rerank + 50ms prompt build + 250ms first token). Every additional 500ms of TTFT measurably degrades adoption, the metric that anchors Morgan Stanley's reported 98% advisor adoption rate.
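The stage-level budget arithmetic above can be sketched as follows, using the illustrative stage latencies quoted in the text; the SLA value and function names are assumptions for illustration:

```python
# Stage-by-stage TTFT budget using the illustrative latencies quoted above.
STAGE_MS = {
    "embedding": 20,       # query embedding
    "ann_retrieval": 80,   # approximate nearest-neighbor lookup
    "rerank": 50,          # reranking of retrieved candidates
    "prompt_build": 50,    # context assembly and prompt templating
    "first_token": 250,    # model time to first streamed token
}

def ttft_headroom_ms(p95_sla_ms: int, stages: dict = STAGE_MS) -> int:
    """Headroom between the summed optimized-path budget and the P95 SLA.

    Positive headroom absorbs network hops, queueing, and load variance;
    negative headroom means the SLA is unachievable on this pipeline.
    """
    return p95_sla_ms - sum(stages.values())

# The optimized path sums to 450 ms, leaving 750 ms of headroom
# against a 1.2 s P95 SLA for network, gateway, and load variance.
```

Instrumenting each stage separately (as the note prompts below ask) is what makes this headroom calculation possible at all.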
Note prompts
- What is our measured P95 TTFT today across the top 5 query types, and where are the hot spots?
- Have we instrumented retrieval, reranking, and first-token latency separately or only end-to-end?
- Do we have semantic caching in place, and what is our measured cache hit rate on repeat queries?

Select the P95 TTFT budget that advisor-facing copilot responses must hold.
Single choice
Trinidy — Cloud LLM APIs introduce 200–500ms of network and gateway latency before the first token, and that variance compounds under load. Trinidy runs the RAG retrieval and generation graph inside the firm perimeter — TTFT stays predictable at 1–2s even at peak advisor concurrency.
Define advisor population and firm-wide rollout scope
Why This Matters
Concurrent advisor population drives LLM throughput, embedding cache sizing, and GPU footprint in a way that does not scale linearly. Morgan Stanley's 15,000+ advisor deployment achieves 98% adoption, which means peak concurrency approaches 30–40% of the population during market open and meeting-prep windows — dramatically higher than a naive per-user allocation would suggest. Sizing for average load and discovering concurrency at launch is the most common capacity failure.
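The sizing logic above reduces to a back-of-envelope calculation. In the sketch below, the 35% peak factor and per-user query rate are illustrative assumptions drawn from the ranges in the text, not measured figures:

```python
import math

def peak_concurrent_advisors(population: int, adoption: float, peak_factor: float) -> int:
    """Advisors active simultaneously in the peak window.

    peak_factor is the share of adopted users concurrent during market open
    or meeting-prep windows (30-40% per the text), not a per-user average.
    """
    return math.ceil(population * adoption * peak_factor)

def peak_qps(concurrent_users: int, queries_per_user_per_min: float) -> float:
    """Rough aggregate query rate at peak; the per-user rate is an assumption."""
    return concurrent_users * queries_per_user_per_min / 60.0

# 15,000 advisors at 98% adoption with a 35% peak factor yields
# 5,145 concurrent users at market open -- the number GPU capacity
# must be reserved against, not the steady-state average.
```

Sizing against `peak_concurrent_advisors` rather than average load is precisely the capacity failure the text warns about.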
Note prompts
- What is our projected P95 concurrent advisor query rate, not just the steady-state average?
- Have we modeled meeting-prep concurrency around market open and the first Monday of the month?
- What is our reserved GPU capacity, and does it assume peak or average concurrency?

Quantify the advisor population and concurrency envelope the copilot must support.
Single choice
Confirm data residency and on-prem inference requirements
Map proprietary research, client PII, and portfolio data to jurisdictional and firm-policy constraints.
Select all that apply
Trinidy — Proprietary research represents decades of institutional IP, and client PII is subject to Reg S-P and GDPR. Trinidy keeps the entire RAG pipeline — embedding, retrieval, reranking, generation — inside the firm perimeter. No research document, CRM note, or portfolio position ever reaches a third-party API.
Define supervisory review path (FINRA Notice 24-09)
Why This Matters
FINRA Regulatory Notice 24-09 (June 2024) explicitly addresses generative AI in member communications and requires written supervisory procedures (WSPs) covering AI use. AI-generated content distributed to clients is a business communication under FINRA Rule 4511 and SEC Rule 17a-4 and requires supervisory review. "AI said so" is not a valid defense under Reg BI — the advisor and the firm retain suitability responsibility regardless of whether a human or model generated the recommendation.
Note prompts
- Do our WSPs explicitly address copilot output review, or do they predate 24-09?
- Who in compliance signs off on the supervisory tier assignment for each copilot workflow?
- How do we demonstrate to a FINRA examiner that every client-delivered AI output was reviewed?

Specify how AI-generated content reaches the client, and who reviews before delivery.
Single choice
Specify deployment topology
Select the physical/logical deployment target for the copilot inference plane.
Single choice
Trinidy — For proprietary research isolation and SEC Rule 17a-4 recordkeeping inside the firm perimeter, public cloud LLM APIs create residency and audit complications. Trinidy supports on-prem GPU inference, private-cloud VPC, and hybrid — embedding index and generation stay local, with cloud reserved for non-sensitive capabilities.