Phase 1 of 6
Scoping & Real-Time Detection Latency
Define the RAN surface, detection latency budget, alarm tolerance, and regulatory envelope that will govern every architectural decision in the anomaly-detection pipeline.
RAN Surface & Technology Scope
Identify RAN technologies in scope for anomaly detection
Why This Matters
Each generation exposes a different KPI surface (TS 28.552 for NR is not the same as TS 32.425 for LTE) and a different remediation vocabulary, and a one-size-fits-all model across them loses signal on both ends. O-RAN deployments additionally expose E2 and O1 interfaces that non-O-RAN vendors do not, which changes what telemetry is actually available at sub-second granularity. The most common scoping error is training a single anomaly head on mixed 4G/5G data and discovering too late that NR-specific anomalies (beam failure, numerology mismatch) were never representable in the feature set.
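To make the shared-head vs. dedicated-head question concrete, a minimal sketch below contrasts overlapping and technology-exclusive KPI sets. The feature names are illustrative placeholders, not a reference schema from TS 28.552 or TS 32.425.

```python
# Illustrative sketch only: KPI names and the per-technology split are
# assumptions, not a reference schema from TS 28.552 / TS 32.425.
from dataclasses import dataclass, field

@dataclass
class TechProfile:
    name: str
    kpi_spec: str                       # governing 3GPP measurement spec
    features: set = field(default_factory=set)

LTE = TechProfile("LTE", "TS 32.425",
                  {"prb_util_dl", "rrc_setup_sr", "erab_drop_rate"})
NR = TechProfile("NR", "TS 28.552",
                 {"prb_util_dl", "rrc_setup_sr",
                  "beam_failure_rate", "ssb_rsrp_p5"})  # NR-only signals

def shared_features(a, b):
    """KPIs with enough overlap to justify a shared detection head."""
    return a.features & b.features

def exclusive_features(a, b):
    """KPIs that need a dedicated per-technology head, or they are lost."""
    return a.features ^ b.features

print(shared_features(LTE, NR))     # candidates for a shared head
print(exclusive_features(LTE, NR))  # e.g. beam_failure_rate must not be dropped
```

Running the exclusive-feature check at scoping time is what surfaces the "beam failure was never representable" failure mode before any training run.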
Note prompts
+ Have we inventoried every RAN technology live in the network plus what is being added in the next 12 months?
+ Which technologies share enough KPI overlap to justify a shared detection head vs. dedicated sub-models?
+ Do we have the O-RAN E2/O1 interfaces exposed in the segments we need to cover, or only vendor-proprietary OAM?
Required
Confirm which radio technologies and generations the anomaly model must cover.
Select all that apply
5G NR Standalone (3GPP Release 16/17+)
5G NR Non-Standalone (anchored to LTE)
LTE / LTE-Advanced (4G)
Open RAN (O-RAN Alliance split — CU / DU / RU)
Massive MIMO / beamforming cells
Small cells / DAS / indoor
Private 5G / enterprise network slices
Fixed Wireless Access (FWA) sectors
Define end-to-end anomaly detection latency SLA
Why This Matters
The O-RAN architecture formally distinguishes the Near-RT RIC (10ms–1s control loop, xApps) from the Non-RT RIC (>1s, rApps), and anomaly workloads must pick a side — a model built for one cannot be retrofitted for the other without a full feature and inference redesign. Sub-100ms detection is what actually prevents subscriber-visible degradation; above that, you are optimizing post-mortem analytics rather than real-time protection. Every millisecond budgeted for network egress is a millisecond unavailable for feature assembly, inference, and action dispatch.
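A back-of-envelope budget decomposition makes that trade explicit. The stage timings below are assumed placeholders; substitute measured p99 values from your own pipeline.

```python
# Minimal budget-decomposition sketch; all stage figures are placeholders.
BUDGET_MS = 100.0                 # sub-100ms "before subscriber impact" tier

p99_stages_ms = {
    "kpi_egress":       8.0,      # OAM/E2 transport to the inference node
    "feature_assembly": 25.0,     # window joins, normalisation, feature store
    "inference":        12.0,     # model forward pass
    "action_dispatch":  5.0,      # policy check + control-plane call
}

spent = sum(p99_stages_ms.values())
headroom = BUDGET_MS - spent
print(f"p99 spend: {spent:.0f} ms, headroom: {headroom:.0f} ms")

# Every millisecond of egress comes straight out of this headroom: a
# cloud-routed 50-200 ms round-trip exhausts the budget before inference runs.
assert headroom >= 0, "pipeline cannot meet the sub-100ms tier"
```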
Note prompts
+ Which RIC tier (Near-RT xApp vs. Non-RT rApp) is the architectural home for each detection use case we care about?
+ What is our current p99 KPI-to-action latency today, and where are the hot spots — OAM collection, feature store, or inference?
+ Have we stress-tested the pipeline at peak-hour cell load and during handover storms, not just average conditions?
Required
Select the p99 latency budget from KPI arrival to classified anomaly output.
Single choice
< 1ms per TTI (intra-frame, sub-frame remediation)
< 10ms (near-real-time, O-RAN Near-RT RIC envelope)
< 100ms (fast remediation before subscriber impact)
< 1 second (O-RAN Near-RT RIC upper bound)
> 1 second (Non-RT RIC / rApp — planning / policy)
Trinidy: Cloud-routed inference adds 50–200ms of backhaul round-trip before a single KPI is scored — exceeding the full TTI and sub-100ms detection budget on its own. Trinidy runs the full anomaly + classification pipeline on-node at the RAN site, so the budget is preserved end-to-end.
Establish acceptable false positive rate (alarm fatigue ceiling)
Why This Matters
Anomaly models that fire constantly train the NOC to ignore them — the alarm-fatigue ceiling is typically <2% FPR before Tier-1 ticket throughput collapses under noise. Nokia has publicly reported ~60% reductions in Tier-1 alarm volumes vs. threshold-based monitoring using ML anomaly detection, which is only achievable when FPR is held below that ceiling. Once auto-remediation is introduced, the FPR tolerance drops another order of magnitude because each false positive now executes a physical change (power, tilt, handover threshold) on the network.
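The ceiling becomes vivid once FPR is multiplied out against fleet size and scoring cadence. A rough sketch, with hypothetical sector count and a per-minute scoring interval:

```python
# Back-of-envelope alarm-volume check; fleet size and cadence are assumptions.
sectors = 30_000                 # scored sectors in the fleet
scores_per_day = 24 * 60         # one anomaly score per sector per minute
normal_fraction = 0.999          # almost all evaluations are non-anomalous

def daily_false_alarms(fpr):
    return sectors * scores_per_day * normal_fraction * fpr

for fpr in (0.05, 0.02, 0.005):
    print(f"FPR {fpr:.1%}: ~{daily_false_alarms(fpr):,.0f} false alarms/day")
# Even 0.5% per-evaluation FPR yields ~216,000 raw false alarms/day at this
# cadence, which is why FPR budgets are normally set after dedup/suppression,
# and why closed-loop actions need a tier an order of magnitude tighter.
```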
Note prompts
+ What is the measured Tier-1 alarm-to-close rate today, and how much of it is automation-suppressible noise?
+ Are we targeting analyst-gated or closed-loop auto-remediation, and have we set different FPR budgets for each?
+ How is the NOC compensated or measured on investigation rate — is there a structural incentive to lower FPR?
Required
Set the false positive budget the NOC and automated remediation loop can absorb.
Single choice
< 0.5% FPR (closed-loop auto-remediation)
0.5% – 2% FPR (standard NOC tolerance)
2% – 5% FPR (analyst-gated workflows only)
> 5% FPR (research / monitoring mode)
Not yet measured at sector granularity
Define SLA breach and outage cost exposure
Why This Matters
Framing anomaly detection as a revenue / penalty protection function changes how the program trades off accuracy against latency — without a dollar-denominated ceiling, ML teams tend to optimize recall in ways the business cannot fund. FCC Part 4 outage reporting (NORS) and DIRS disaster reporting make some incidents regulatorily expensive independent of direct revenue loss, and an AML-style "cost of a missed detection" model should capture both. A miss that avoids NORS reporting is an order of magnitude cheaper than one that triggers a public filing.
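One way to make the dollar-denominated ceiling concrete is an expected-loss sketch in that style. Every input below is a hypothetical figure, not a benchmark.

```python
# Hypothetical cost-of-miss model; all figures are assumptions to be replaced
# with real SLA, churn, and regulatory data.
def expected_miss_cost(p_escalation, revenue_loss, sla_penalty,
                       p_reportable, regulatory_cost):
    """Dollar-denominated ceiling for one missed detection."""
    direct = revenue_loss + sla_penalty
    regulatory = p_reportable * regulatory_cost   # filing, remediation, PR
    return p_escalation * (direct + regulatory)

# A miss that stays below the reporting threshold vs. one that files:
print(expected_miss_cost(0.10, 250_000, 100_000, 0.0, 0.0))        # $35,000
print(expected_miss_cost(0.10, 250_000, 100_000, 1.0, 3_000_000))  # $335,000
```

The order-of-magnitude gap between the two outputs is the point: the reporting envelope, not raw revenue, dominates the cost of a miss.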
Note prompts
+ What was our total SLA-breach liability and NORS-reportable incident count last year?
+ Who owns the P&L line for churn attributable to network-quality degradation?
+ Is network reliability an executive-tracked KPI alongside subscriber growth?
Required
Quantify the revenue / regulatory cost of a detection miss that escalates to an outage.
Single choice
< $100K per major incident (small footprint)
$100K – $1M per major incident
$1M – $10M per major incident (Tier-1 MNO scale)
> $10M (national / regulatory exposure)
Not currently measured at the incident level
Map FCC Part 4 / NORS and DIRS reporting obligations
Why This Matters
FCC Part 4 (47 CFR Part 4) requires carriers to report outages meeting duration and user-count thresholds through NORS, and DIRS captures disaster-driven reporting during activated events — the anomaly detection model is often the first line of evidence on whether and when a reportable condition started. A model that misses the onset of a reportable outage does not only cost revenue; it creates a regulatory timing gap that is visible in the filing itself. Scoping which sectors and paths are in the reporting envelope is a first-order decision, not an operations afterthought.
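As a sketch of what a NORS-grade onset signal enables, the check below assumes the commonly cited Part 4 wireless trigger (an outage of at least 30 minutes potentially affecting at least 900,000 user-minutes); verify the exact thresholds for each service class against 47 CFR Part 4.

```python
# Assumed Part 4 wireless trigger; confirm thresholds per service class.
from datetime import datetime, timedelta

USER_MINUTES_THRESHOLD = 900_000
MIN_DURATION = timedelta(minutes=30)

def potentially_reportable(onset, now, affected_users):
    """True once an outage crosses the duration and user-minutes thresholds."""
    duration = now - onset
    user_minutes = affected_users * (duration.total_seconds() / 60)
    return duration >= MIN_DURATION and user_minutes >= USER_MINUTES_THRESHOLD

onset = datetime(2024, 6, 1, 14, 0)   # the model-emitted, timestamped onset
print(potentially_reportable(onset, datetime(2024, 6, 1, 14, 45), 40_000))
# True: 45 min x 40,000 users = 1.8M user-minutes
```

The detection pipeline's job here is the timestamped onset: a model that only alerts post-hoc cannot anchor this calculation for the filing.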
Note prompts
+ Have we mapped which cells carry 911 / public safety routing and therefore mandatory Part 4 exposure?
+ Does our detection pipeline emit a NORS-grade timestamped onset signal, or only post-hoc alerting?
+ How is the DIRS activation signal integrated into our on-call and suppression logic?
Required
Confirm which cells, regions, and service classes carry FCC outage reporting exposure.
Select all that apply
US commercial mobile wireless (Part 4 / NORS)
US public safety / 911 paths (mandatory Part 4)
DIRS disaster reporting footprint (weather / regional)
Non-US — ETSI / national regulator equivalents
Enterprise / private network — no public reporting
Mixed — footprint spans reporting jurisdictions
Confirm data residency and sovereignty constraints
Why This Matters
RAN telemetry carries coverage, utilization, and subscriber-location inferences that most national regulators treat as sovereign or strategically sensitive, independent of personal-data residency rules. Streaming this data to a shared cloud for ML training or inference creates both a residency exposure and a commercial-intelligence leak. The residency constraint is effectively a deployment-topology constraint — it is usually cheaper to decide once, at scoping time, than to re-platform later.
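A deployment-topology gate can encode that decision once, at scoping time. A minimal sketch, with hypothetical jurisdiction and location names, applying the same map to both the training and inference planes:

```python
# Illustrative residency gate; jurisdictions and locations are placeholders.
# Key property: training and inference are checked against the SAME map,
# so the two planes cannot silently diverge.
RESIDENCY = {
    "EU": {"eu-central", "eu-west"},
    "UK": {"uk-south"},
    "CN": {"cn-onprem"},            # on-shore only
}

def placement_allowed(jurisdiction, location):
    return location in RESIDENCY.get(jurisdiction, set())

for plane, loc in (("inference", "eu-central"), ("training", "us-east")):
    print(plane, "in", loc, "for EU telemetry:",
          placement_allowed("EU", loc))   # training in us-east -> False
```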
Note prompts
+ Have we mapped every jurisdiction in our footprint and the residency rules that apply to RAN telemetry?
+ Are training data and inference runtime held to the same residency boundary, or different ones?
+ For managed-service vendors in scope, do we have documented evidence of in-region processing?
Required
Map RAN telemetry and model artifacts to jurisdictional residency requirements.
Select all that apply
EU GDPR — RAN telemetry must remain in EU
UK data residency required
US — GSMA NESAS / FedRAMP where managed service applies
India, Brazil, or other national telecom residency rules
China PIPL / MIIT — on-shore-only RAN data
Customer-owned private 5G — customer tenancy boundary
Cross-border permitted under approved vendors / SCCs
Trinidy: RAN telemetry is commercially sensitive and, in many jurisdictions, sovereign data. Trinidy keeps KPI collection, feature engineering, inference, and model training fully within the operator's own perimeter — no cross-border RAN data flow to a shared cloud.
Define closed-loop remediation policy guardrails
Why This Matters
Closed-loop automation is where anomaly detection converts from analytics into operations, but it is also where a mis-scoped model can physically degrade the network. Actions like tilt changes and cell resets are high-blast-radius and should sit behind tight policy guardrails even when auto-applied. ETSI GS ZSM and O-RAN formalize the concept of scoped, reversible intents — the guardrail design is as important as the model quality.
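A guardrail policy can be expressed as a small, versionable structure: per-action auto/gated flags, pre-approved ranges, and a single kill-switch. The sketch below is illustrative only; action names and bounds are assumptions.

```python
# Minimal guardrail sketch in the "scoped, reversible intent" spirit of
# ETSI GS ZSM; action names, bounds, and the kill-switch flag are assumed.
GUARDRAILS = {
    "power_adjust_db":       {"auto": True,  "min": -3.0, "max": 3.0},
    "handover_threshold_db": {"auto": True,  "min": -2.0, "max": 2.0},
    "antenna_tilt_deg":      {"auto": False},  # analyst-gated: high blast radius
    "cell_reset":            {"auto": False},
}
KILL_SWITCH = False   # one flag blocks all automated actions fleet-wide

def permit(action, value=None):
    rule = GUARDRAILS.get(action)
    if rule is None or KILL_SWITCH or not rule["auto"]:
        return False   # unknown, globally halted, or analyst-gated
    if value is not None and not (rule["min"] <= value <= rule["max"]):
        return False   # outside the pre-approved envelope
    return True

assert permit("power_adjust_db", 1.5)
assert not permit("power_adjust_db", 5.0)   # exceeds approved range
assert not permit("antenna_tilt_deg", 1.0)  # gated regardless of magnitude
```

Keeping this structure in version control gives the "where is that policy versioned?" prompt below a concrete answer.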
Note prompts
+ Is every auto-applied action reversible within the same control loop, and is the reversal tested?
+ Which actions are policy-gated by time window, region, or cell class, and where is that policy versioned?
+ Do we have a hard kill-switch that reverts all automated actions in one command?
Required
Specify which remediation actions the model is permitted to trigger automatically vs. analyst-gated.
Select all that apply
Power adjustment within pre-approved range
Handover threshold tuning
PCI / neighbour list updates
Antenna tilt / beam configuration changes
Cell blacklist / load-balancing redirect
Automated cell reset / restart
Ticket generation only — no automated action
Specify deployment topology for the inference plane
Why This Matters
Deployment topology directly determines the achievable latency floor — cell-site edge gives you the full TTI budget, far-edge aggregation takes a few milliseconds off, and cloud deployment typically exhausts a sub-100ms budget before a single feature is computed. O-RAN formalizes the choice with the Near-RT RIC (xApp) and Non-RT RIC (rApp) tiers, and the decision also locks in which vendors and which SMO you are coupled to. Hybrid topologies (on-site inference + central training) are usually the most flexible but require careful feature-parity discipline.
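That feature-parity discipline can be enforced mechanically, for instance by fingerprinting the feature definitions at training time and refusing to serve on a mismatch. A minimal sketch, with hypothetical feature definitions:

```python
# Feature-parity check for hybrid (on-site inference + central training)
# topologies; feature names and window definitions are illustrative.
import hashlib
import json

def feature_fingerprint(feature_defs):
    """Stable hash of the feature definitions, pinned at training time."""
    canon = json.dumps(feature_defs, sort_keys=True)
    return hashlib.sha256(canon.encode()).hexdigest()[:16]

training_defs  = {"prb_util_dl": "mean_60s", "beam_failure_rate": "sum_60s"}
inference_defs = {"prb_util_dl": "mean_60s", "beam_failure_rate": "sum_30s"}

if feature_fingerprint(training_defs) != feature_fingerprint(inference_defs):
    raise RuntimeError("feature drift between training and inference planes")
```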
Note prompts
+ Is the detection pipeline deployed at the tightest latency tier we actually need, or tighter?
+ For each vendor-managed option, have we measured the actual end-to-end latency rather than the headline vendor number?
+ Is our training topology consistent with our inference topology, or do they pull from different data estates?
Required
Select the physical and logical location of the anomaly detection pipeline.
Single choice
On-site at BBU / DU (cell-site edge)
Regional aggregation (far-edge / C-RAN hub)
Near-RT RIC (O-RAN — within 1s control loop)
Non-RT RIC / SMO (operator cloud or on-prem)
Vendor-managed cloud (Ericsson IAP / Nokia MantaRay / Samsung SMO)
Hybrid: on-site inference + central training
Trinidy: For sub-100ms anomaly detection and sovereignty constraints, cloud inference is physically and regulatorily incompatible. Trinidy is the on-site inference substrate colocated with BBUs / DUs — GPU and FPGA on the same fabric, with centralized fleet management.