Phase 1 of 6
Scoping & EHR Integration
Define the document types, EHR integration surface, latency envelope, and HIPAA / HTI-2 boundaries that govern every downstream decision.
Document Types & Clinical Scope
Identify clinical document types in scope for extraction
Why This Matters
Over 80% of EHR data sits in free text (Frontiers in Physics, NLP in Healthcare 2024), and document type drives nearly every downstream design choice: negation and section parsing differ materially between a pathology synoptic report and an ED H&P, and AI-scribe output from Nuance DAX Copilot introduces a documentation distribution that legacy clinical NLP pipelines were not trained on. Attempting to ship a single generic model across all document types is the most common scoping mistake.
Note prompts
- Which document types drive the highest downstream value — coding, CDI, quality reporting, or prior-auth?
- What share of our incoming documentation is now ambient-scribe-generated, and is that growing monthly?
- Do we have document-type metadata reliably tagged at ingestion, or will the pipeline need to classify it?
Required
Select all unstructured document types the NLP pipeline must process.
Select all that apply
Progress notes / SOAP notes
History & Physical (H&P)
Discharge summaries
Operative / procedure notes
Pathology reports
Radiology narrative reports
Emergency department notes
Consult notes
AI-scribe / ambient documentation output (Nuance DAX, Abridge, Suki)
Nursing notes and flowsheet free-text
Telephone encounter notes
Scanned / OCR documents (outside records)
Define primary extraction targets (clinical entities)
Why This Matters
The extraction target set is also the terminology binding set — "diagnoses" alone can mean SNOMED CT (clinical semantics), ICD-10-CM (billing), or both, and mis-scoping this leads to a pipeline that is technically accurate but commercially useless. GatorTron achieves F1 = 0.9122–0.9367 on SDoH concept extraction (PMC / npj Digital Medicine 2023–2025), so even non-billing extractions can be held to a high bar when scoped properly.
Note prompts
- Which entity classes tie directly to revenue (CDI, HCC risk adjustment) vs. analytics-only use?
- Do we need to produce both a SNOMED CT and an ICD-10-CM code for each diagnosis, or only one?
- Who owns terminology maintenance (RxNorm, LOINC, ICD-10) for the extraction pipeline?
Required
Select the clinical entity classes the pipeline must extract with billing-grade accuracy.
Select all that apply
Diagnoses (ICD-10-CM / SNOMED CT)
Procedures (CPT / HCPCS / ICD-10-PCS)
Medications (RxNorm / NDC)
Lab tests and results (LOINC)
Allergies and adverse reactions
Signs, symptoms, and findings
Anatomic sites
Social determinants of health (SDOH)
Smoking / alcohol / substance use status
Family history
Problem list items for reconciliation
Prior authorization-relevant evidence
Specify EHR integration pattern
Why This Matters
CMS-0057-F (the CMS Interoperability and Prior Authorization final rule) and ONC HTI-1 (2023) have cemented HL7 FHIR R4 with US Core as the regulatory default integration surface for new clinical data workflows, and ONC HTI-2 (2024–2025) extends transparency and decision-support oversight to the extracted-entity layer. A pipeline that writes back via a legacy HL7 v2 MDM message without structured FHIR equivalence may satisfy the EHR vendor but fails the HTI-2 transparency audit.
Note prompts
- Do we have FHIR R4 write permissions on the target EHR, or only read?
- Are our extracted Condition / MedicationStatement / Observation resources US Core-conformant?
- Is the integration path going to be the same for our pilot site and for our health-system-wide rollout?
Required
Select the integration surface into which extracted entities will be written.
Single choice
HL7 FHIR R4 (US Core profiles) — read and write via FHIR API
Epic App Orchard / Vendor Services API
Oracle Health (Cerner) Millennium API
HL7 v2 messaging (ADT / MDM / ORU)
Direct database integration (Clarity / HealtheIntent data warehouse)
CDA / C-CDA document ingestion only (no write-back)
Flat-file / batch SFTP export
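The FHIR R4 write-back surface can be sketched as a minimal Condition resource. The profile URL and code-system URIs below are the real US Core and terminology identifiers, but this is an illustrative shape only: a fully US Core-conformant resource needs additional required elements (category, clinicalStatus bindings, and so on), and `make_condition` is a hypothetical helper, not an EHR vendor API:

```python
def make_condition(patient_id: str, icd10: str, snomed: str, note_id: str) -> dict:
    """Minimal FHIR R4 Condition sketch for NLP write-back (illustrative)."""
    return {
        "resourceType": "Condition",
        "meta": {"profile": [
            "http://hl7.org/fhir/us/core/StructureDefinition/"
            "us-core-condition-problems-health-concerns"
        ]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "code": {"coding": [
            # Both bindings travel together: SNOMED for semantics, ICD-10-CM for billing.
            {"system": "http://snomed.info/sct", "code": snomed},
            {"system": "http://hl7.org/fhir/sid/icd-10-cm", "code": icd10},
        ]},
        # Trace the assertion back to its source note -- this is the hook
        # that HTI-2 transparency audits of the extraction layer rely on.
        "evidence": [{"detail": [{"reference": f"DocumentReference/{note_id}"}]}],
    }
```

Keeping the `evidence` back-reference at write time is far cheaper than reconstructing provenance during an audit.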
Define end-to-end extraction latency SLA
Why This Matters
The latency decision maps directly to deployment architecture — sub-5-second SLAs on 2000–5000 token notes force GPU inference colocated with the EHR, while nightly batch tolerates shared CPU capacity. Mis-scoping this is the most common reason NLP pilots stall: the demo runs fine on a 400-token sample, then a real ambient-scribe transcript at 3× the length blows the SLA on the first production call.
Note prompts
- What is the p95 token length of our real documents, not our demo samples?
- Which downstream workflows are latency-sensitive (point-of-care, prior-auth) vs. latency-tolerant (quality reporting)?
- Have we stress-tested at 2× peak document volume to find the throughput cliff?
Required
Select the latency envelope the pipeline must hold per document.
Single choice
Real-time (< 1 second) — point-of-care decision support
Near-real-time (1–5 seconds) — ambient scribe finalize, prior-auth
Interactive (5–30 seconds) — coder / CDI workflow
Asynchronous (minutes) — batch coding and quality reporting
Nightly batch — registry and population health
Tiered by document type
Trinidy: Cloud-routed LLM inference on a 3000-token discharge summary can consume 10–30 seconds round-trip under load, and every second is a second PHI is in flight. NEXUS OS runs the full pipeline — NER, entity linking, LLM extraction — on-node, keeping p95 under 5 seconds even on long documents.
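Measuring the p95 token length of real documents (the first note prompt above) is a one-function exercise. A sketch using the nearest-rank percentile over whitespace tokens; whitespace splitting undercounts the subword tokens an LLM tokenizer actually produces, so treat the result as a lower bound when sizing the SLA:

```python
import math

def p95_token_length(documents: list[str]) -> int:
    """p95 whitespace-token length via the nearest-rank method.

    A rough sizing tool, not a tokenizer: subword tokenization will
    yield strictly more tokens per document than this count.
    """
    if not documents:
        raise ValueError("no documents")
    lengths = sorted(len(doc.split()) for doc in documents)
    rank = max(1, math.ceil(0.95 * len(lengths)))  # nearest-rank index (1-based)
    return lengths[rank - 1]
```

Run it on a representative month of production documents per document type; a per-type breakdown is what justifies (or rules out) the "tiered by document type" SLA option.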
Confirm HIPAA PHI handling boundary
Why This Matters
Clinical notes are 100% PHI by default under HIPAA 45 CFR 164.514, and OCR HIPAA enforcement has accelerated meaningfully in 2024–2025 — routing clinical text through a cloud NLP endpoint places that endpoint and its operator under BAA and audit scope. Fine-tuned model weights trained on PHI are increasingly treated as PHI-derivative artifacts, which constrains where those weights can be stored and who can access them. Scoping the PHI boundary after architecture is set is where most healthcare NLP programs discover their pilot does not survive HIPAA review.
Note prompts
- Does our inference path keep PHI inside our existing HIPAA perimeter, or does it extend the perimeter to a new vendor?
- Are our fine-tuned model weights classified as PHI-derivative, and stored accordingly?
- Do our inference logs capture note text, and if so what is our retention and access policy?
Required
Map the PHI perimeter for the NLP pipeline — training, inference, logs, and model artifacts.
Select all that apply
PHI remains on-premises for inference (HIPAA 45 CFR 164 covered)
De-identified (45 CFR 164.514 Safe Harbor) before any egress
Limited Data Set under DUA for training
BAA in place with every cloud vendor touching PHI
Model weights considered PHI-adjacent (fine-tuned on notes)
Inference logs contain PHI — retention and audit in scope
Cross-border processing prohibited (US-only)
Air-gapped deployment (no outbound network)
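Wherever text does cross the perimeter, a pre-egress redaction step is the architectural seam to get right early. The sketch below is a toy illustration of where that step sits in the pipeline, nothing more: Safe Harbor under 45 CFR 164.514(b) requires removing eighteen identifier classes, and these two regex patterns cover only phone numbers and MRN-style identifiers:

```python
import re

# Toy pre-egress redaction pass (illustrative only). Real Safe Harbor
# de-identification covers 18 identifier classes -- names, dates, geo
# subdivisions, and more -- and typically needs a dedicated de-id engine.
PHI_PATTERNS = [
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\bMRN[:#]?\s*\d+\b", re.IGNORECASE), "[MRN]"),
]

def redact_before_egress(text: str) -> str:
    """Apply redaction patterns before any text leaves the HIPAA perimeter."""
    for pattern, token in PHI_PATTERNS:
        text = pattern.sub(token, text)
    return text
```

The point of isolating this as one function is auditability: the HIPAA review can then ask exactly one question, "what calls `redact_before_egress`, and what bypasses it?"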
Confirm ONC HTI-2 transparency obligations
Why This Matters
ONC HTI-1 (2023) and HTI-2 (2024–2025) impose transparency and Intervention Risk Management (IRM) duties on any AI-driven Predictive DSI embedded in certified health IT, and those duties extend to the extraction pipeline that feeds the DSI — not just to the final scoring model. A pipeline that produces the diagnosis feature consumed by a sepsis predictor inherits DSI source-attribute documentation obligations. Enforcement is active: the ONC DSI guidance explicitly names extraction pipelines as in-scope when their outputs shape clinical decisions.
Note prompts
- Is our extraction pipeline feeding any certified DSI today, or will it after integration?
- Do we have source-attribute documentation (training data, validation, known limits) ready for HTI-2 transparency?
- Who in compliance owns the IRM documentation for the extraction layer?
Required
Identify applicable ONC HTI-2 (2024–2025) decision-support intervention (DSI) transparency obligations.
Select all that apply
Pipeline drives a Predictive DSI under ONC HTI-1 §170.315(b)(11)
Extracted entities feed a certified DSI workflow
Used for risk adjustment / HCC capture (CMS oversight)
Used for prior-authorization decisioning (CMS-0057-F)
Used for quality measure calculation (CMS eCQM)
Source attributes and DSI documentation required and maintained
Intervention Risk Management (IRM) practices documented
Not applicable — extraction is analytics-only
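Source-attribute documentation survives audits better as a structured, versioned record than as a slide deck. A sketch with hypothetical field names, covering a small subset of what the (b)(11) certification criterion actually enumerates; the authoritative attribute list is ONC's, not this class:

```python
from dataclasses import dataclass, asdict

@dataclass
class SourceAttributes:
    """Illustrative subset of DSI source-attribute documentation.

    Field names here are hypothetical shorthand; the full attribute list
    for Predictive DSIs is defined in ONC's (b)(11) criterion.
    """
    intended_use: str
    training_data_description: str
    validation_summary: str
    known_limitations: list[str]
    last_reviewed: str  # ISO date of the most recent IRM review

    def audit_record(self) -> dict:
        # Flatten to a plain dict for export into the compliance system.
        return asdict(self)
```

Attaching a record like this to each model version makes the "who owns the IRM documentation" prompt answerable with a file diff rather than a meeting.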
Specify deployment topology
Required
Select the physical / logical deployment target for the NLP inference plane.
Single choice
On-premises, colocated with EHR (HIPAA-sovereign)
Private cloud / VPC in-region under BAA
Hybrid: on-prem inference + cloud fine-tuning on de-identified data
Managed cloud (AWS HealthLake / Azure Text Analytics for Health / Google Healthcare API) under BAA
Air-gapped research enclave
Trinidy: HIPAA-sovereign inference and HTI-2 auditability are both easier when the NLP pipeline never leaves the institution. NEXUS OS is the on-premises substrate for SLM inference — GPU accelerators colocated with the EHR, no PHI egress, and full decision provenance out of the box.
Confirm EU / cross-border deployment constraints
Why This Matters
For any deployment touching EU patients, the EU AI Act (Regulation 2024/1689) classifies clinical NLP that drives medical decisions as high-risk, which triggers conformity assessment, logging, human oversight, and post-market monitoring obligations that are materially stricter than HIPAA. GDPR Article 9 separately treats all clinical text as special category data, which constrains the lawful bases for processing and imposes data-minimization duties on the extraction pipeline itself.
Note prompts
- Do we process any EU patient records today or in the next 24 months?
- Have we classified the pipeline under the AI Act risk tiers, or are we assuming HIPAA coverage is sufficient?
- Is data residency enforced at the inference runtime, or only at the primary data store?
Recommended
If deploying outside the US, identify applicable EU AI Act and GDPR obligations.
Select all that apply
EU AI Act (Regulation 2024/1689) — high-risk medical AI obligations
GDPR Article 9 — special category health data
Data residency in EU required
UK GDPR residency required
Not applicable — US-only deployment