Phase 1 of 6
Scoping & EHR Integration
Define the document types, EHR integration surface, latency envelope, and HIPAA / HTI-2 boundaries that govern every downstream decision.
Document Types & Clinical Scope
Identify clinical document types in scope for extraction
Why This Matters
Over 80% of EHR data sits in free text (Frontiers in Physics, NLP in Healthcare 2024), and document type drives nearly every downstream design choice — negation and section parsing differ materially between a pathology synoptic report and an ED H&P, and AI-scribe output from Nuance DAX Copilot introduces a new documentation distribution that legacy clinical NLP pipelines were not trained on. Attempting to ship a single generic model across all document types is the most common scoping mistake.
Note prompts
+ Which document types drive the highest downstream value — coding, CDI, quality reporting, or prior-auth?
+ What share of our incoming documentation is now ambient-scribe-generated, and is that growing monthly?
+ Do we have document-type metadata reliably tagged at ingestion, or will the pipeline need to classify it?
Select all unstructured document types the NLP pipeline must process.
Select all that apply
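If document-type metadata is tagged at ingestion, routing can key off it directly; if not, a classifier becomes a pipeline dependency. A minimal sketch of that routing decision, assuming documents arrive with an optional LOINC document-type code in their metadata (the code-to-type mapping below is illustrative, not a vetted production list):

```python
# Route incoming documents by type so each gets a type-appropriate
# pipeline (section parsing and negation handling differ per type).
# LOINC document-ontology codes here are illustrative examples only.
LOINC_TO_DOCTYPE = {
    "11506-3": "progress_note",
    "18842-5": "discharge_summary",
    "34117-2": "history_and_physical",
    "11526-1": "pathology_report",
}

def route_document(doc: dict) -> str:
    """Use ingestion metadata when present; otherwise flag the document
    for the upstream classifier the pipeline will need to provide."""
    code = doc.get("loinc_type_code")
    if code in LOINC_TO_DOCTYPE:
        return LOINC_TO_DOCTYPE[code]
    return "needs_classification"
```

The share of documents falling into `needs_classification` on real ingestion traffic is itself a useful scoping metric: a high share means the classifier is on the critical path, not an optional add-on.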
Define primary extraction targets (clinical entities)
Why This Matters
The extraction target set is also the terminology binding set — "diagnoses" alone can mean SNOMED CT (clinical semantics), ICD-10-CM (billing), or both, and mis-scoping this leads to a pipeline that is technically accurate but commercially useless. GatorTron achieves F1 = 0.9122–0.9367 on SDoH concept extraction (PMC / npj Digital Medicine 2023–2025), so even non-billing extractions can be held to a high bar when scoped properly.
Note prompts
+ Which entity classes tie directly to revenue (CDI, HCC risk adjustment) vs. analytics-only use?
+ Do we need to produce both a SNOMED CT and an ICD-10-CM code for each diagnosis, or only one?
+ Who owns terminology maintenance (RxNorm, LOINC, ICD-10) for the extraction pipeline?
Select the clinical entity classes the pipeline must extract with billing-grade accuracy.
Select all that apply
Specify EHR integration pattern
Why This Matters
CMS-0057-F (the CMS Interoperability and Prior Authorization final rule) and ONC HTI-1 (2023) have cemented HL7 FHIR R4 with US Core as the regulatory default integration surface for new clinical data workflows, and ONC HTI-2 (2024–2025) extends transparency and decision-support oversight to the extracted-entity layer. A pipeline that writes back via a legacy HL7 v2 MDM message without structured FHIR equivalence may satisfy the EHR vendor but fails the HTI-2 transparency audit.
Note prompts
+ Do we have FHIR R4 write permissions on the target EHR, or only read?
+ Are our extracted Condition / MedicationStatement / Observation resources US Core-conformant?
+ Is the integration path going to be the same for our pilot site and for our health-system-wide rollout?
Select the integration surface into which extracted entities will be written.
Single choice
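To make the FHIR write-back surface concrete, here is a sketch of the shape of a Condition resource carrying both a SNOMED CT and an ICD-10-CM coding in `Condition.code`, as the dual-binding scoping above implies. The profile URL and field values are illustrative; conformance against the actual US Core version in use must be validated, not assumed:

```python
# Build a FHIR R4 Condition resource for write-back. Dual coding lives
# in Condition.code.coding; the US Core profile claim in meta.profile
# is what an HTI-2 / conformance audit will check against.
def to_fhir_condition(patient_id: str, snomed: str, icd10: str, display: str) -> dict:
    return {
        "resourceType": "Condition",
        "meta": {"profile": [
            "http://hl7.org/fhir/us/core/StructureDefinition/us-core-condition-problems-health-concerns"
        ]},
        "clinicalStatus": {"coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/condition-clinical",
            "code": "active"}]},
        "category": [{"coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/condition-category",
            "code": "problem-list-item"}]}],
        "code": {
            "coding": [
                {"system": "http://snomed.info/sct", "code": snomed, "display": display},
                {"system": "http://hl7.org/fhir/sid/icd-10-cm", "code": icd10},
            ],
            "text": display,
        },
        "subject": {"reference": f"Patient/{patient_id}"},
    }
```

A legacy HL7 v2 MDM path has no natural slot for this structured dual coding, which is exactly the FHIR-equivalence gap the HTI-2 transparency point above describes.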
Define end-to-end extraction latency SLA
Why This Matters
The latency decision maps directly to deployment architecture — sub-5-second SLAs on 2000–5000 token notes force GPU inference colocated with the EHR, while nightly batch tolerates shared CPU capacity. Mis-scoping this is the most common reason NLP pilots stall: the demo runs fine on a 400-token sample, then a real ambient-scribe transcript at 3× the length blows the SLA on the first production call.
Note prompts
+ What is the p95 token length of our real documents, not our demo samples?
+ Which downstream workflows are latency-sensitive (point-of-care, prior-auth) vs. latency-tolerant (quality reporting)?
+ Have we stress-tested at 2× peak document volume to find the throughput cliff?
Select the latency envelope the pipeline must hold per document.
Single choice
Trinidy — Cloud-routed LLM inference on a 3000-token discharge summary can consume 10–30 seconds round-trip under load, and every second is a second PHI is in flight. NEXUS OS runs the full pipeline — NER, entity linking, LLM extraction — on-node, keeping p95 under 5 seconds even on long documents.
Confirm HIPAA PHI handling boundary
Why This Matters
Clinical notes are 100% PHI by default under HIPAA 45 CFR 164.514, and OCR HIPAA enforcement has accelerated meaningfully in 2024–2025 — routing clinical text through a cloud NLP endpoint places that endpoint and its operator under BAA and audit scope. Fine-tuned model weights trained on PHI are increasingly treated as PHI-derivative artifacts, which constrains where those weights can be stored and who can access them. Scoping the PHI boundary after architecture is set is where most healthcare NLP programs discover their pilot does not survive HIPAA review.
Note prompts
+ Does our inference path keep PHI inside our existing HIPAA perimeter, or does it extend the perimeter to a new vendor?
+ Are our fine-tuned model weights classified as PHI-derivative, and stored accordingly?
+ Do our inference logs capture note text, and if so what is our retention and access policy?
Map the PHI perimeter for the NLP pipeline — training, inference, logs, and model artifacts.
Select all that apply
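Mapping the perimeter can start as a simple inventory: every place note text or a PHI-derived artifact can live, whether it sits inside the existing HIPAA perimeter, and whether a BAA covers it if not. A sketch of that audit, with illustrative component names rather than a standard checklist:

```python
# Inventory of pipeline artifacts vs. the HIPAA perimeter. Anything
# outside the perimeter without a BAA on file is a violation to
# resolve before architecture is locked. Entries are illustrative.
PIPELINE_ARTIFACTS = {
    "raw_notes":          {"inside_perimeter": True,  "baa": False},
    "inference_endpoint": {"inside_perimeter": False, "baa": True},
    "inference_logs":     {"inside_perimeter": True,  "baa": False},
    "finetuned_weights":  {"inside_perimeter": True,  "baa": False},  # treated as PHI-derivative
}

def perimeter_violations(artifacts: dict) -> list[str]:
    """Return artifacts that hold PHI outside the perimeter with no BAA."""
    return [name for name, a in artifacts.items()
            if not a["inside_perimeter"] and not a["baa"]]
```

Running this inventory during scoping, rather than at HIPAA review, is the cheap version of the late discovery the section above warns about.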
Confirm ONC HTI-2 transparency obligations
Why This Matters
ONC HTI-1 (2023) and HTI-2 (2024–2025) impose transparency and Intervention Risk Management (IRM) duties on any AI-driven Predictive DSI embedded in certified health IT, and those duties extend to the extraction pipeline that feeds the DSI — not just to the final scoring model. A pipeline that produces the diagnosis feature consumed by a sepsis predictor inherits DSI source-attribute documentation obligations. Enforcement is active: the ONC DSI guidance explicitly names extraction pipelines as in-scope when their outputs shape clinical decisions.
Note prompts
+ Is our extraction pipeline feeding any certified DSI today, or will it after integration?
+ Do we have source-attribute documentation (training data, validation, known limits) ready for HTI-2 transparency?
+ Who in compliance owns the IRM documentation for the extraction layer?
Identify applicable ONC HTI-2 (2024–2025) decision-support intervention (DSI) transparency obligations.
Select all that apply
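Source-attribute readiness can be tracked the same way: enumerate what compliance needs to see, and surface what is still missing before a review. The attribute names below paraphrase common DSI documentation themes (intended use, training data, validation, known limits, an accountable owner); they are not the official certification attribute list, which the compliance owner should substitute in:

```python
# Track source-attribute documentation for the extraction layer, since
# an extraction pipeline feeding a certified DSI inherits documentation
# obligations. Attribute names here are illustrative placeholders.
REQUIRED_ATTRIBUTES = [
    "intended_use",
    "training_data_description",
    "validation_process_and_metrics",
    "known_limitations",
    "irm_owner",
]

def missing_attributes(docs: dict) -> list[str]:
    """Return required attributes with no documentation on file."""
    return [a for a in REQUIRED_ATTRIBUTES if not docs.get(a)]
```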
Specify deployment topology
Select the physical / logical deployment target for the NLP inference plane.
Single choice
Trinidy — HIPAA-sovereign inference and HTI-2 auditability are both easier when the NLP pipeline never leaves the institution. NEXUS OS is the on-premises substrate for SLM inference — GPU accelerators colocated with the EHR, no PHI egress, and full decision provenance out of the box.
Confirm EU / cross-border deployment constraints
Why This Matters
For any deployment touching EU patients, the EU AI Act (Regulation 2024/1689) classifies clinical NLP that drives medical decisions as high-risk, which triggers conformity assessment, logging, human oversight, and post-market monitoring obligations that are materially stricter than HIPAA. GDPR Article 9 separately treats all clinical text as special category data, which constrains the lawful bases for processing and imposes data-minimization duties on the extraction pipeline itself.
Note prompts
+ Do we process any EU patient records today or in the next 24 months?
+ Have we classified the pipeline under the AI Act risk tiers, or are we assuming HIPAA coverage is sufficient?
+ Is data residency enforced at the inference runtime, or only at the primary data store?
If deploying outside the US, identify applicable EU AI Act and GDPR obligations.
Select all that apply