Phase 1 of 6
Scoping & Signal Profile
Define the consuming strategy, signal horizon, latency envelope, and capacity ceiling before a single model is trained — these constraints govern every downstream decision.
Strategy & Signal Consumer
Identify consuming strategy archetype(s)
Why This Matters
Strategy archetype determines every other scoping decision — an event-driven desk reacting to an 8-K needs sub-second classification over structured filings, while a long-only factor overlay can tolerate 15-minute batch sentiment averaged across thousands of sources. The most common program failure is building one "sentiment platform" that pretends to serve a long-only quant book and an HFT desk simultaneously, and ends up serving neither well. Pick the strategy first, then design the signal pipeline around its horizon.
Note prompts
+ Are we building for one strategy or several, and do they share enough horizon to justify a shared signal?
+ Who owns the P&L for the consuming strategy — is there a single decision-maker for model tradeoffs?
+ Is the sentiment signal the primary alpha source or a factor augmentation on an existing strategy?
Select every strategy family that will consume this sentiment signal in production.
Select all that apply
Define signal latency envelope
Why This Matters
HFT-frequency signals have estimated half-lives under 0.02 seconds; news-wire signals on large-caps decay within ~1 trading day; small/mid-cap signals can persist for a week. Your latency budget must be tight enough that your strategy can trade inside the decay window, but no tighter — over-engineering for HFT when the consuming strategy holds for a week is wasted infrastructure. Cloud API latency alone (1–3s per call) can consume the entire budget on an event-driven desk.
Note prompts
+ What is the measured alpha half-life of the signal class we are targeting?
+ Have we actually measured cloud API p99 latency during FOMC / earnings peaks, not just the average?
+ How much of our latency budget is consumed by network round-trip vs. actual inference?
Select the end-to-end latency budget from raw text arrival to tradable signal.
Single choice
Trinidy — Cloud LLM APIs typically add 1–3s of round-trip latency — enough on its own to erode most of the alpha on a news-wire event. Trinidy co-locates ingestion, NLP inference, and signal scoring so the entire pipeline fits inside a sub-5s budget, with a sub-second path available for event-driven desks.
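The latency-versus-decay tradeoff above can be sketched numerically. Assuming simple exponential decay with the half-lives cited (an illustration, not a measured decay curve — always measure your own), the fraction of alpha that survives a given pipeline latency is:

```python
def alpha_remaining(latency_s: float, half_life_s: float) -> float:
    """Fraction of signal alpha left after latency_s seconds,
    assuming simple exponential decay (illustrative model only)."""
    return 0.5 ** (latency_s / half_life_s)

# News-wire signal on a large-cap: ~1 trading-day half-life
# (23,400 trading seconds). A 3s cloud-API round-trip barely matters:
print(alpha_remaining(3.0, 23_400))   # ≈ 0.9999

# Event-driven signal with a hypothetical 0.5s half-life: the same
# 3s round-trip consumes the signal before it can be traded.
print(alpha_remaining(3.0, 0.5))      # ≈ 0.016
```

The asymmetry is the point of this question: a 3-second budget is free for a multi-day signal and fatal for a sub-second one, so the budget must be set per consuming strategy, not per platform.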
Define signal horizon / decay profile
Why This Matters
Horizon is the single most important modeling decision because it determines the label you train against. A FinBERT model trained on next-day returns learns a fundamentally different relationship than one trained on 5-minute returns on the same text. Training-label horizon mismatches with production use are a silent and extremely common source of "why did this work in backtest but not live" failures — the model is predicting the wrong thing.
Note prompts
+ What exact holding-period return do we use as the training label, and does it match the consuming strategy?
+ Have we measured the decay curve on our own signal, or are we using published half-life estimates?
+ Do we have a documented retraining trigger when the measured decay curve shifts?
Select the expected holding period the sentiment signal is designed to predict.
Single choice
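The horizon-matching rule above can be made concrete: the training label should be the forward return over exactly the holding period the consuming strategy trades. A minimal pandas sketch (the bar data and 5-minute frequency are hypothetical):

```python
import pandas as pd

def forward_return_label(prices: pd.Series, bars_ahead: int) -> pd.Series:
    """Forward return over exactly `bars_ahead` bars. The label horizon
    must match the consuming strategy's holding period, or the model
    learns to predict the wrong thing."""
    return prices.shift(-bars_ahead) / prices - 1.0

bars = pd.date_range("2024-01-02 09:30", periods=5, freq="5min")
px = pd.Series([100.0, 100.5, 101.0, 100.8, 101.2], index=bars)

# One-bar (5-minute) label for a 5-minute strategy. Training the same
# text against next-day returns would produce a different model.
labels = forward_return_label(px, 1)
print(labels)
```

The trailing bars correctly come out as NaN rather than a look-ahead fill; dropping them before training avoids a subtle end-of-sample bias.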
Estimate capacity / strategy size
Why This Matters
Sentiment signals are particularly vulnerable to capacity compression because many funds consume the same vendor feeds (RavenPack, Bloomberg). When many funds use identical NLP signals, cross-fund correlation rises, alpha decays faster, and 2022-style quant crowding events become a liquidity risk. Anchoring capacity at scoping time forces the conversation about whether proprietary fine-tuning is actually necessary or whether vendor scores are sufficient — a $10B book cannot rely on crowded vendor feeds.
Note prompts
+ Have we measured our signal correlation with RavenPack / Bloomberg baseline scores?
+ What is our liquidity-adjusted capacity estimate under current market depth?
+ Are we explicitly differentiated from the crowded consensus feed, or replicating it?
Quantify the AUM the signal is designed to support before alpha is capacity-constrained.
Single choice
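As a back-of-envelope anchor for this question, capacity can be bounded by the AUM at which the book's daily trading would exceed a participation cap on average daily volume. A deliberately crude sketch — all parameters are illustrative assumptions, and this is not a market-impact model:

```python
def capacity_estimate(adv_usd: float, n_names: int,
                      max_participation: float, daily_turnover: float) -> float:
    """Largest AUM whose daily trading stays under `max_participation`
    of ADV, for an equal-weighted book. Back-of-envelope only; a real
    estimate needs per-name liquidity and an impact model."""
    dollars_tradable_per_name = adv_usd * max_participation
    aum_fraction_traded_per_name = daily_turnover / n_names
    return dollars_tradable_per_name / aum_fraction_traded_per_name

# 500 names, $50M ADV each, 1% participation cap, 20% daily book turnover
print(f"${capacity_estimate(50e6, 500, 0.01, 0.20):,.0f}")  # $1,250,000,000
```

Even this crude bound makes the scoping tension explicit: faster-decaying signals force higher turnover, which directly shrinks the capacity ceiling.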
Select coverage universe
Define the tradable universe the signal must cover.
Select all that apply
Define deployment environment
Select the primary deployment target for inference and signal generation.
Single choice
Trinidy — Cloud is acceptable from a latency standpoint for most sentiment use cases, but proprietary signal IP, query-pattern leakage to cloud LLM providers, and EU AI Act audit obligations push many firms to on-prem or VPC deployment. Trinidy supports both — the same inference fabric runs in the institution's own data center, a private VPC, or a hybrid of both.
Define signal-IP protection posture
Why This Matters
Sentiment strategies derive alpha from model uniqueness — the exact text that causes your firm to buy vs. hold is a trade secret. Sending that text to a hosted LLM endpoint creates a telemetry channel where query patterns, timing, and content can in principle be aggregated by the vendor. Firms with material signal IP increasingly treat this as a meaningful competitive intelligence leak rather than a theoretical one.
Note prompts
+ Have we documented what a cloud LLM vendor could learn about our strategy from our query patterns?
+ Are queries, responses, and metadata logged at the vendor — and do their retention terms align with our IP posture?
+ Would we be comfortable if a competitor saw our exact query log for the last 30 days?
Document the threat model for proprietary signal IP leaking through infrastructure choices.
Trinidy — Cloud NLP APIs can theoretically observe query patterns and infer strategy logic — which text is queried, at what time, in what order. On-premises inference on Trinidy eliminates this side channel entirely; no query metadata ever leaves the firm's perimeter.
Confirm MNPI / information-barrier posture
Why This Matters
Sentiment models trained on or inferring from expert-network transcripts, channel checks, or corporate-relationship data can cross into insider-trading territory even when each individual datum looks public. The SEC's 2024 DraftKings action ($200K fine for a 30-minute social media slip) and its examinations of mosaic theory at alternative data vendors have raised the bar on source documentation. Every data source must be provenance-reviewed before it is wired to a model.
Note prompts
+ Do we have a documented mosaic-theory review on every alternative data source we consume?
+ Are expert-network transcripts clearly segregated from systematic signal inputs?
+ Who in legal or compliance signs off on new data source onboarding?
Document the information-source review that prevents MNPI from entering the signal pipeline.