N1 Healthcare Engineering Roadmap
Launch-Focused Strategy: Ship Stable Platform by January 1st, 2026
≥95%
Extraction Accuracy Target
<10%
CHR Hallucinations Target
🎯 Launch Strategy: Achieve ≥95% accurate biomarker extraction with accurate charting/visualization, CHR generation with <10% hallucinations, and Stripe billing on AWS EKS. Ship to production by January 1st, 2026 with high quality. Add diagnosis, genetics, and imaging parsers post-launch based on user feedback.
Continuous Deployment Philosophy: We ship to production often. A task is only considered COMPLETE when it's deployed to production and running stably. Every feature goes live as soon as it's ready.
🚨 Production Priority #1: Production must ALWAYS be bug-free. Production bugs take absolute priority over everything else: roadmap tasks, new features, everything. When a production bug is reported, all squads pivot to fix it immediately.
AWS EKS
Kubernetes + Helm
ArgoCD (GitOps)
Microservices
Redis Streams
Python 3.12+
FastAPI
LangChain/Langroid
PostgreSQL
Current → Future Architecture
EKS Microservices Architecture
API Gateway & Orchestration
API Backend (FastAPI)
Orchestrator Service
LiteLLM Proxy
Firebase Auth Service
Data Extraction Services (7 Specialized Parsers)
Router Service (MinerU)
Simple Biomarker Parser (Open Source Frontier)
Mixed Biomarker Parser (Open Source Frontier)
Complex Biomarker Parser (Commercial)
Diagnosis Parser (Commercial)
Procedure Parser (Commercial)
Genetics Parser (Commercial)
Medical Imaging Parser (Gemini 2.5 Pro + MONAI)
Data Processing & Enrichment
Grouping Service (Biomarker Grouping)
Enrichment Workers
Validation Service
Unit Standardization
Reference Range Checker
Visualization & Charting
Charting Service (Biomarker Trends & Visualization)
CHR Generation Services
CHR Template Service
Functional Workflow
Langroid Workflow
Sequential Workflow
Infrastructure & Data Layer
Redis Streams (Queue)
PostgreSQL (RDS)
S3 Storage
Secrets Manager
Datadog (APM)
Langfuse (LLM Observability)
Current State: Already running on GKE with Helm charts, External Secrets Operator, and automated deployments. EKS Helmfile charts are ready for migration.
End-to-End Data Flow
PDF Upload (Patient Records)
↓
Orchestrator (Queue Monitor)
↓
Router Service (Classification)
↓
7 Parser Services (Biomarkers/Clinical/Imaging)
↓
Post-Processing (Validation)
↓
CHR Generation (Reports)
Processing Stages Detail
Stage 2: Post-Processing
- Unit standardization (SI units)
- Reference range validation
- Biomarker canonicalization (LLM-based)
- Data quality checks
Stage 3: Enrichment
- Health area classification
- Temporal analysis
- Abnormal value detection
Stage 4: CHR Generation
- Data aggregation & grouping
- AI-powered narrative generation
- Evidence-based recommendations
- PDF compilation (Typst/LaTeX)
System Architecture & Data Flow
Complete Visual Overview
Bubble.io Frontend
User Interface
Uploads PDF
↓
Firebase Auth
Authentication
JWT + RBAC
LiteLLM Keys
↓
API Backend
FastAPI + PostgreSQL
stream:document_routing
↓
🎯 Orchestrator Service (Central Brain)
Responsibilities:
• Monitors ALL Redis queue lengths
• Tracks LLM rate limits (per user/global)
• Dispatches jobs intelligently
• Sends real-time updates to Bubble
• Error notifications (parsing failures, etc.)
• Auto-retry with backoff
↓
Router Service
MinerU Document Analysis
stream:router_queue
↓
🧪 Biomarker Parser Services (3 queues)
Simple Biomarker
Structured Tables
Open Source Frontier Models
biomarker_simple
Mixed Biomarker
Hybrid Formats
Open Source Frontier Models
biomarker_mixed
Complex Biomarker
Dense Lab Reports
Commercial Models
biomarker_complex
+
🏥 Clinical Data Parser Services (3 queues)
Diagnosis Parser
Canonicalization
Commercial Models
diagnosis_queue
Procedure Parser
CPT Codes
Commercial Models
procedure_queue
Genetics Parser
Variants, Genes
Commercial Models
genetics_queue
+
🔬 Medical Imaging Parser Service (1 queue)
Medical Imaging Parser
MRI, CT, X-Ray Analysis
Gemini 2.5 Pro / Claude 4.5 Sonnet / GPT-5.1
+ NVIDIA MONAI
imaging_queue
↓
⚙️ Post-Processing Pipeline
Grouping Service
Biomarker Grouping
group_medical_records
Unit Standardization
SI Conversion
Reference Range
Validation
Data Validation
Quality Checks
Enrichment
Canonicalization
↓
Biomarker Charting Service
Trends, Timelines, Visualization
100% Accuracy
↓
CHR Generation Services
Functional | Langroid | Sequential
<10% Hallucinations Target
stream:chr_generation
↓
Redis Streams
Message Queue
Langfuse
LLM Observability
↓
Bubble Webhooks
Real-time Updates
Processing Complete
Errors
Progress
Key Architecture Highlights - Launch Configuration:
- Orchestrator – Central brain monitoring 3 parser queues, tracking rate limits, and sending Bubble notifications (Built by Arun)
- 3 Parser Services for Launch:
- 🧪 Simple Biomarker Parser - NEW service (open source frontier models)
- 🧪 Mixed Biomarker Parser - NEW service (open source frontier models)
- 🏥 Complex Parser - REFACTORED monolith (handles complex biomarkers + diagnosis + procedures + genetics)
- Post-Launch: Extract diagnosis/procedure/genetics into separate services based on user demand
- Grouping Service – Dedicated Celery worker for biomarker canonicalization and grouping (group_medical_records queue)
- Charting Service – Dedicated microservice for 100% accurate biomarker visualization (Built by API Squad)
- Billing Service – Stripe + LiteLLM virtual key management (Built by Arun)
- 4-way Router – Simple biomarkers → queue 1, Mixed → queue 2, Complex/Clinical → queue 3, Invalid → reject
- Each service has its own Redis queue → Independent scaling, fault isolation
- Firebase Auth – Provides per-user LiteLLM API keys for cost tracking
- Real-time Bubble updates – Orchestrator sends progress, errors, completion webhooks
Priority I: Data Processing
CRITICAL
Accuracy: 100%
1. Unit Standardization Service
Implement comprehensive SI unit conversion system for all biomarkers
- Create unit conversion library (mg/dL → mmol/L, etc.)
- Build biomarker-specific conversion rules database
- Integrate with enrichment-biomarkers service
- Add validation for converted values
2 weeks
enrichment-biomarkers
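The conversion rules above can be sketched as a lookup table keyed by biomarker and unit pair. The factors shown are standard published SI conversion factors; the function name and table layout are illustrative sketches, not the actual service API.

```python
# Minimal sketch of biomarker-specific SI unit conversion.
# Factors are standard conversion factors; names are illustrative.

CONVERSIONS = {
    # (biomarker, from_unit, to_unit): multiplicative factor
    ("glucose", "mg/dL", "mmol/L"): 0.0555,
    ("cholesterol_total", "mg/dL", "mmol/L"): 0.02586,
    ("creatinine", "mg/dL", "umol/L"): 88.42,
}

def to_si(biomarker: str, value: float, unit: str, target_unit: str) -> float:
    """Convert a measured value to the target SI unit, or raise if no rule exists."""
    if unit == target_unit:
        return value
    try:
        factor = CONVERSIONS[(biomarker, unit, target_unit)]
    except KeyError:
        raise ValueError(f"No conversion rule for {biomarker}: {unit} -> {target_unit}")
    return round(value * factor, 4)
```

Unknown pairs raise rather than guess, which feeds the validation step for converted values listed above.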
2. Reference Range Checker
Build intelligent reference range validation with age/gender awareness
- Create reference range database (age/gender-specific)
- Implement validation logic in parser-sequential
- Add out_of_range flag automation
- Build alerting for critical values
1.5 weeks
parser-sequential
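The out_of_range flag automation above reduces to a lookup plus comparison. The hemoglobin ranges below are typical adult values used purely for illustration; real ranges must come from the curated age/gender-specific database, and a production table would also key on age bands.

```python
# Sketch of sex-aware reference range validation (illustrative ranges only).
from dataclasses import dataclass

@dataclass
class RefRange:
    low: float
    high: float

# (biomarker, sex) -> range; a real table would also key on age bands.
REF_RANGES = {
    ("hemoglobin", "male"): RefRange(13.5, 17.5),
    ("hemoglobin", "female"): RefRange(12.0, 15.5),
}

def flag_out_of_range(biomarker: str, value: float, sex: str):
    """Return 'low', 'high', 'normal', or None when no range is known."""
    r = REF_RANGES.get((biomarker, sex))
    if r is None:
        return None  # missing ranges are handled by the Reference Range Setter
    if value < r.low:
        return "low"
    if value > r.high:
        return "high"
    return "normal"
```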
3. Reference Range Setter
Auto-populate missing reference ranges using LLM + medical knowledge
- Integrate medical reference database
- Build LLM-based range prediction
- Add confidence scoring
- Create manual override UI hooks
1.5 weeks
enrichment-biomarkers
4. Data Validation Pipeline
Comprehensive validation before data storage
- Schema validation (Pydantic models)
- Business rule validation
- Data consistency checks
- Duplicate detection
1 week
api-backend
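The four checks above run in order: schema, business rules, consistency, then duplicate detection. A sketch with stdlib dataclasses standing in for the Pydantic models the services actually use; field names are illustrative.

```python
# Validation pipeline sketch: schema/business-rule checks plus duplicate
# detection before storage. Real services use Pydantic models instead.
from dataclasses import dataclass

@dataclass(frozen=True)
class BiomarkerRecord:
    name: str
    value: float
    unit: str
    measured_on: str  # ISO date

def validate_batch(records):
    """Return (valid_records, errors); errors pair each record with a reason."""
    valid, errors, seen = [], [], set()
    for r in records:
        if not r.name or not r.unit:
            errors.append((r, "missing required field"))
            continue
        if r.value < 0:
            errors.append((r, "negative value"))
            continue
        key = (r.name, r.unit, r.measured_on, r.value)
        if key in seen:
            errors.append((r, "duplicate"))
            continue
        seen.add(key)
        valid.append(r)
    return valid, errors
```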
5. Summary Creation Service
Auto-generate processing summaries for user feedback
- Build summary aggregation logic
- Create statistics calculation (# biomarkers, diagnoses, etc.)
- Generate quality report (coverage, errors)
- Integrate with UI notification system
1 week
api-backend
6. UI Validation Hooks
API endpoints for real-time validation feedback to Bubble frontend
- Create validation webhook endpoints
- Build error message formatting
- Add progress tracking endpoints
- Implement retry mechanisms
1 week
api-backend
Priority II: Quality Validation & Testing
CRITICAL
Continuous Testing
14. Ground Truth Dataset Creation
Build comprehensive ground truth dataset for all parsers
- Curate 100+ diverse medical records
- Manually annotate expected outputs
- Create test fixtures and snapshots
- Version control test data
2 weeks
integration-tests
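One way the ≥95% gate could be scored against this dataset is exact-match precision/recall over (biomarker, value, unit) triples. The matching rules (value tolerance, unit normalization before comparison) are a team decision; this sketch only shows the metric shape.

```python
# Extraction accuracy sketch: exact-match precision/recall/F1 over
# (biomarker, value, unit) triples against annotated ground truth.

def extraction_scores(predicted, expected):
    pred, exp = set(predicted), set(expected)
    tp = len(pred & exp)  # triples the parser got exactly right
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(exp) if exp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```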
15. Routing Test Suite
Comprehensive tests for Router service classification
- Test simple vs complex classification
- Test Pro model routing decisions
- Validate cost optimization metrics
- Add performance benchmarks
1 week
parser-router
16. Parser Type-Specific Tests
Dedicated test suites for Simple/Mixed/Image parsers
- Simple parser: structured table tests
- Mixed parser: hybrid format tests
- Image parser: OCR accuracy tests
- Cross-parser consistency tests
2 weeks
integration-tests
17. Diagnosis/Procedure Tests
Validate diagnosis and procedure extraction accuracy
- Test diagnosis canonicalization accuracy
- Test CPT code extraction
- Validate timeline extraction
- Test edge cases (multiple diagnoses, etc.)
1.5 weeks
integration-tests
18. Grouping Test Suite
Test biomarker canonicalization and grouping logic
- Test canonical name matching
- Validate LLM-based grouping decisions
- Test edge cases (similar but different biomarkers)
- Measure grouping accuracy against ground truth
1.5 weeks
enrichment-biomarkers
19. Post-Processing Test Suite
Validate all post-processing transformations
- Test unit conversions (accuracy to 0.01%)
- Test reference range validation
- Test summary generation
- Validate data integrity after processing
1.5 weeks
integration-tests
20. Real-Time Validation Framework
Monitor and validate production data quality in real-time
- Build validation middleware
- Create alerting for anomalies
- Add data quality dashboards
- Integrate with Datadog/CloudWatch
2 weeks
api-backend
21. E2E Test Enhancement
Expand Playwright E2E tests for full workflow coverage
- Add CHR generation E2E tests
- Test error recovery flows
- Add performance tests
- Enhance AI failure analysis
1.5 weeks
e2e-tests
Priority III: CHR Enhancements
HIGH
22. JSON Base Structure Decoupling
Decouple report generation from output format (JSON-first approach)
- Design canonical JSON schema for CHR data
- Refactor all workflows to output JSON first
- Build format converters (JSON → PDF, DOCX, HTML)
- Enable API access to structured report data
2 weeks
workflow-functional, workflow-generative-*
23. Hallucination Critique Agent
Build LLM critique agent to detect and flag hallucinations
- Design critique prompt templates
- Build fact-checking against source data
- Add confidence scoring
- Create review queue for low-confidence sections
2 weeks
workflow-functional
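The critique agent itself is LLM-based, but the "fact-checking against source data" step above has a deterministic core: every numeric claim in a generated sentence should correspond to a value present in the source biomarkers. A crude regex sketch of that traceability check, for illustration only:

```python
# Crude traceability check: numbers claimed in generated text must match
# some source biomarker value within tolerance. The real critique agent
# is LLM-based; this only illustrates the deterministic backstop.
import re

def unsupported_numbers(sentence: str, source_values, tol: float = 1e-6):
    """Return numbers mentioned in the text that match no source value."""
    claimed = [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", sentence)]
    return [c for c in claimed if not any(abs(c - v) <= tol for v in source_values)]
```

Sentences with unsupported numbers would be routed to the low-confidence review queue rather than silently shipped.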
24. Timeline Critique Agent
Validate temporal accuracy in generated reports
- Build timeline extraction from source data
- Compare generated timelines to source
- Detect temporal inconsistencies
- Flag chronological errors
1.5 weeks
workflow-functional
25. Additional Critique Agents
Build additional quality assurance agents (TBD based on findings)
- Medical terminology consistency checker
- Reference citation validator
- Recommendation appropriateness checker
- Readability and clarity analyzer
2 weeks
workflow-functional
26. Functional Workflow Evaluation
Comprehensive testing for LangGraph functional workflow
- Test stateful generation correctness
- Validate biomarker analysis accuracy
- Test narrative coherence
- Measure generation quality scores
1.5 weeks
workflow-functional
27. Sequential Workflow Evaluation
Test sequential CHR generation quality
- Test outline generation quality
- Validate disease/genetics consolidation
- Test LaTeX compilation reliability
- Measure against ground truth reports
1 week
workflow-generative-sequential
28. Langroid Workflow Evaluation
Validate advanced Langroid CHR generation
- Test chapter planning accuracy
- Validate medical literature integration
- Test evidence-based recommendations
- Measure expert review scores
1.5 weeks
workflow-generative-langroid
29. Routing/Extraction Test Suite
Test CHR data routing and extraction
- Test data aggregation from multiple sources
- Validate record joining logic
- Test CSV vs API data loading
- Ensure data completeness
1 week
workflow-functional
30. Grouping Test Suite (Biomarkers Only)
Test biomarker grouping for CHR generation
- Test LLM-based smart grouping
- Validate health area classification
- Test temporal grouping (same biomarker over time)
- Ensure narrative coherence with groups
1 week
workflow-functional
31. Validation Error Storage
Store and track validation errors for continuous improvement
- Design error storage schema
- Build error categorization system
- Create error analytics dashboard
- Enable error-driven retraining
1 week
api-backend
Priority IV: DevOps & Infrastructure
CRITICAL
EKS Migration
32. EKS Cluster Setup
Deploy production-ready EKS clusters (dev, staging, prod)
- Deploy EKS clusters using Terraform
- Configure node groups (spot + on-demand)
- Setup VPC, subnets, security groups
- Install cluster addons (CSI drivers, metrics server)
1.5 weeks
n1-infrastructure
33. Helm Chart Migration
Deploy all services to EKS using Helmfile
- Test all 14 Helm charts in dev environment
- Configure AWS-specific resources (IRSA for service accounts)
- Setup Cloudflare Tunnel ingress configuration
- Deploy to staging for validation
- Production rollout with blue-green strategy
2 weeks
n1-helm-charts
34. Secrets Management (AWS Secrets Manager)
Configure External Secrets Operator for AWS
- Install External Secrets Operator
- Migrate secrets to AWS Secrets Manager
- Configure SecretStore and ExternalSecret resources
- Test secret rotation
1 week
n1-infrastructure
35. Redis Streams Setup
Deploy Redis with persistence for message queuing
- Deploy Redis StatefulSet with PVC
- Configure AOF persistence
- Setup Redis Sentinel for HA (optional)
- Migrate Celery to Redis Streams
1 week
n1-helm-charts
36. Process Orchestrator Service (Central Brain) - ARUN
Build intelligent orchestrator to manage entire workflow
- Monitor ALL Redis queue lengths (3 parser queues: simple, mixed, complex/clinical)
- Track LLM rate limits (per-user and global via LiteLLM)
- Intelligent job dispatching with priority and backpressure
- Real-time Bubble webhook notifications (progress, errors, completion)
- Error handling: parsing failures, timeout detection, auto-retry with backoff
- Workflow state machine (Preparing → Routing → Parsing → Processing → Enriching → CHR → Complete)
- Dead letter queue (DLQ) management for failed jobs
- Metrics dashboard (queue depths, processing times, error rates)
- Owner: Arun (Week 2)
1 week
New: orchestrator-service (Arun)
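The "intelligent job dispatching with priority and backpressure" bullet above can be reduced to a small decision function: pull from the deepest queue that still has LLM rate-limit headroom, and dispatch nothing when every queue is blocked. The queue names match the launch configuration; the scheduling policy itself is a sketch, not Arun's actual implementation.

```python
# Orchestrator dispatch sketch: choose the deepest eligible queue,
# or None when backpressure applies (no capacity or no pending jobs).

def pick_next_queue(queue_depths, rate_remaining, min_tokens=1):
    """queue_depths: {queue: pending jobs}; rate_remaining: {queue: budget left}."""
    eligible = [
        (depth, q) for q, depth in queue_depths.items()
        if depth > 0 and rate_remaining.get(q, 0) >= min_tokens
    ]
    if not eligible:
        return None  # wait for rate-limit headroom or new jobs
    depth, q = max(eligible)  # deepest queue wins; ties break by name
    return q
```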
37. RDS PostgreSQL Migration
Migrate from Cloud SQL to AWS RDS
- Deploy RDS PostgreSQL with Terraform
- Setup read replicas for HA
- Migrate data from Cloud SQL (DMS or pg_dump)
- Update connection strings in all services
1.5 weeks
n1-infrastructure
38. S3 Storage Migration
Migrate from GCS to S3 for object storage
- Create S3 buckets with lifecycle policies
- Migrate existing PDFs/reports from GCS
- Update storage clients in all services
- Configure CloudFront for CDN (optional)
1 week
All services
39. Continuous Deployment Pipeline (GitHub Actions + ArgoCD)
Automated, frequent deployments to production with GitOps
- CI Pipeline (GitHub Actions):
- Build Docker images on every commit
- Push to ECR (Elastic Container Registry)
- Run automated tests and security scans
- Update Helm chart values with new image tags
- CD Pipeline (ArgoCD - explore):
- Evaluate ArgoCD for GitOps-based deployments
- Auto-sync from Git repo to EKS clusters
- Visual deployment status and health checks
- Automatic rollback on failures
- Canary deployments for zero-downtime releases
- Automated smoke tests before production promotion
- Ship to production multiple times per day
2 weeks
All repos
40. Monitoring & Observability
Comprehensive monitoring for EKS workloads with Datadog
- Deploy Datadog agent across all EKS nodes
- Configure APM (Application Performance Monitoring)
- Setup Langfuse for LLM observability
- Create dashboards for queues, parsers, and CHR workflows
- Build alerting (PagerDuty/Slack integration)
- Setup log aggregation and analysis
1.5 weeks
n1-infrastructure
41. Auto-Scaling Configuration
HPA and Cluster Autoscaler setup
- Configure Horizontal Pod Autoscaler (HPA)
- Setup Cluster Autoscaler for node scaling
- Define resource requests/limits
- Test scaling under load
1 week
n1-helm-charts
42. Disaster Recovery & Backups
Setup backup and DR procedures
- Configure RDS automated backups
- Setup S3 versioning and replication
- Document disaster recovery procedures
- Test recovery scenarios
1 week
n1-infrastructure
Priority V: API & Backend
HIGH
43. Firebase Authentication Service - ARUN
Firebase-based authentication/authorization microservice
- Integrate Firebase Authentication for user auth (JWT tokens)
- Build user permissions management system (RBAC)
- Implement per-user LiteLLM API key provisioning
- Add machine-to-machine authentication for Bubble.io
- Create middleware for FastAPI token validation
- Build admin API for user/permission management
- Owner: Arun (Week 3)
1 week
New: auth-service (Arun)
44. Bubble Integration Middleware
Dedicated service for Bubble.io frontend integration
- Build Bubble-specific API endpoints
- Add request/response transformers
- Implement webhook handlers
- Add rate limiting and validation
1.5 weeks
api-backend
45. Service Integration Layer
Build integration layer for inter-service communication
- Standardize service-to-service API contracts
- Build service discovery mechanism
- Add circuit breakers for resilience
- Implement request tracing (distributed tracing)
2 weeks
All services
46. Rate Limiting & Throttling
Protect APIs from abuse and ensure fair usage
- Implement rate limiting middleware
- Add user-based quotas
- Build throttling for expensive operations
- Add rate limit headers in responses
1 week
api-backend
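A common shape for the rate limiting middleware above is a per-user token bucket. The clock is injected so behaviour is deterministic and testable; the capacity and refill numbers are placeholders, not agreed quotas.

```python
# Token-bucket sketch for per-user rate limiting. Capacity/refill values
# are placeholders; the clock is passed in for deterministic behaviour.

class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.tokens = capacity
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then try to spend `cost` tokens."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Expensive operations (e.g. CHR generation) can be throttled by charging a higher `cost` against the same bucket.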
47. Biomarker Charting Service LAUNCH CRITICAL
Dedicated microservice for perfect biomarker visualization and charting
- Temporal Trend Charts: Line graphs showing biomarker changes over time
- Reference Range Visualization: Normal/abnormal zones with color coding
- Multi-Biomarker Comparison: Side-by-side trend analysis
- Interactive Charts: Zoom, pan, tooltip with exact values
- Export Capabilities: PNG, SVG, PDF for CHR embedding
- Accuracy Validation: 100% accurate data points, correct scales, proper units
- Chart Types: Line charts, bar charts, heatmaps for comprehensive visualization
- Build using Plotly/D3.js or similar charting library
- RESTful API for chart generation requests
- Integrate with CHR generation workflows
2 weeks
New: charting-service
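Before any Plotly/D3 rendering, the service's accuracy-critical job is turning raw measurements into an ordered series annotated against the reference range, so the renderer can colour normal vs abnormal points. A sketch of that data-preparation step; field names are illustrative.

```python
# Chart-series preparation sketch: sort measurements by date and flag
# each point against the reference range before rendering.

def chart_series(measurements, low: float, high: float):
    """measurements: [(iso_date, value)] in any order -> sorted annotated points."""
    points = []
    for date, value in sorted(measurements):
        points.append({
            "date": date,
            "value": value,
            "in_range": low <= value <= high,
        })
    return points
```

Keeping this step pure and separately testable is what makes the "100% accurate data points" target verifiable independently of the charting library.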
48. Billing System (Stripe + LiteLLM) - ARUN LAUNCH CRITICAL
Billing infrastructure with LiteLLM budgets and component-based cost tracking
- LiteLLM Virtual Keys: Per-user tokens with budget limits based on top-up
- Stripe Top-Up System: One-time payments ($50/$100/$250/$500/$1000)
- Component Cost Tracking: Track and display cost per service (parser, CHR, charting)
- Spend Dashboard: Show line items - e.g., "Parser: $0.50, CHR: $1.20"
- Real-Time Balance: Display current credit balance in user dashboard
- Low Balance Alerts: Email notifications at 20% and 5% remaining
- Budget Sync: Update LiteLLM max_budget when user tops up
- Cost Attribution: Pull spend data from LiteLLM tracking API
- Admin Analytics: Revenue tracking, user spend patterns
- $10 Free Credits: All new users get $10 on signup
- Owner: Arun (Week 3-4)
1 week
New: billing-service (Arun)
💳 Billing & Pricing Strategy
LAUNCH CRITICAL
Internal Model: 30% markup on LLM costs (not shown to users). Users top up their N1 account; we manage budgets via per-user LiteLLM virtual keys.
Spend Breakdown Dashboard LAUNCH CRITICAL
What Users See: Cost per Component
- Line Items Example:
- Document XYZ_Lab_Results.pdf
- - Parser (Simple): $0.50
- - Post-Processing: $0.10
- - Charting: $0.05
- - CHR Generation (Functional): $1.20
- Total: $1.85
- Cost Breakdown by Service Type (not by model or markup)
- Cost per Document: Show aggregated cost for each processed PDF
- Time-Series Visualization: Daily/weekly/monthly spend trends
- Export CSV: Download detailed billing records
LiteLLM Budget Management (Backend)
Internal: Per-User Virtual Keys
- Virtual Key Creation: Generate unique LiteLLM token per user on signup
- Budget Assignment: Set max_budget based on account top-up amount
- Cost Tracking: LiteLLM tracks spend per virtual key in real-time
- Auto-Cutoff: When budget exhausted, API calls blocked until top-up
- Markup Application: 30% markup applied internally (users see final component cost)
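The markup arithmetic above is simple but worth pinning down: raw per-component LLM spend (pulled from LiteLLM) is marked up 30% before it appears as a user-facing line item. Component names and rounding behaviour below are illustrative assumptions.

```python
# Pricing sketch: apply the internal 30% markup to raw per-component
# LLM costs and produce user-facing line items rounded to cents.

MARKUP = 0.30

def user_line_items(raw_costs: dict[str, float]) -> dict[str, float]:
    """raw_costs: {component: raw LLM spend} -> marked-up line items + total."""
    items = {c: round(cost * (1 + MARKUP), 2) for c, cost in raw_costs.items()}
    items["total"] = round(sum(items.values()), 2)
    return items
```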
Account Top-Up System
Prepaid Credits via Stripe
- Top-Up Amounts: $50, $100, $250, $500, $1000 (or custom)
- Stripe Integration: One-time payments via Stripe Checkout
- Credit Balance: Display current balance in user dashboard
- Budget Sync: Update LiteLLM virtual key budget on top-up
- Low Balance Alerts: Email notifications at 20% and 5% remaining
- Auto Top-Up: Optional recurring charges when balance hits threshold
Billing Implementation Timeline:
- Week 3-4: Arun & Jasper build Stripe integration + LiteLLM virtual key system
- Week 4: Spend breakdown dashboard with component-based line items
- Launch Day: All users get $10 free credits, top-up system operational
⚠️ Critical Billing Features for Launch:
- LiteLLM per-user virtual keys with budget limits
- Stripe top-up integration ($50/$100/$250/$500/$1000)
- Spend dashboard showing cost per component (parser, CHR, charting, etc.)
- Low balance alerts (email at 20% and 5% remaining)
- $10 free credits for all new users on signup
🚀 Launch Prioritization: What to Build Now vs Later
FOCUS ON LAUNCH
🎯 Launch Strategy: Build the platform with ≥95% accurate biomarker extraction as the #1 priority: core biomarker parsers (simple, mixed, complex) + validation + stable EKS infrastructure. Ship to production by January 1st, 2026 with high accuracy and <10% CHR hallucinations, then add advanced parsers.
🏗️ Infrastructure (Must Have - 9 tasks)
Can't launch without these:
- ✅ Task 32: EKS Cluster Setup
- ✅ Task 33: Helm Chart Migration
- ✅ Task 34: Secrets Management
- ✅ Task 35: Redis Streams Setup
- ✅ Task 36: Orchestrator Service
- ✅ Task 37: RDS PostgreSQL Migration
- ✅ Task 38: S3 Storage Migration
- ✅ Task 39: CI/CD Pipeline
- ✅ Task 40: Monitoring & Observability
🧪 Launch Testing (Must Have - 5 tasks)
🎯 Validate ≥95% accuracy before launch:
- ✅ Task 14: Ground Truth Dataset (100+ verified biomarker cases)
- ✅ Task 15: Routing Test Suite (ensure correct parser selection)
- ✅ Task 16: Biomarker Parser Tests (validate extraction accuracy ≥95%)
- ✅ Task 18: Grouping Test Suite (verify canonicalization)
- ✅ Task 19: Post-Processing Test Suite (unit conversion to 0.01%)
✅ Success Criteria: ≥95% accuracy against the ground truth dataset with <10% hallucinations.
🔌 API & Integration (Must Have - 3 tasks)
Connect Bubble frontend:
- ✅ Task 43: Firebase Auth Service
- ✅ Task 44: Bubble Integration Middleware
- ✅ Task 48: Billing System - Arun & Jasper
📊 CHR Generation (Must Have - 2 tasks)
🎯 Launch-critical: accurate reports with minimal hallucinations:
- ✅ Task 47: Charting Service - 100% accurate biomarker visualization (see above)
- ✅ Use existing Functional/Langroid/Sequential workflow
- ✅ <10% hallucinations: Minimize unsupported claims
- ✅ Fact-checking layer: Validate claims against source biomarkers
- ✅ Accurate temporal tracking of biomarker changes
- ✅ Proper citation of data sources in reports
- ✅ Focus on making one workflow production-stable
⚠️ Target: ≥95% extraction accuracy + 100% charting accuracy + <10% hallucinations. Statements should be traceable to source data.
🎯 Launch Success Criteria:
- ≥95% biomarker extraction accuracy validated against ground truth dataset
- 100% biomarker charting accuracy - trends, timelines, and reference ranges
- <10% CHR hallucinations - most statements traceable to source data
- All 3 biomarker parsers (simple, mixed, complex) production-ready
- Unit standardization accurate to 0.01%
- <5% parsing failures on valid medical documents
- 0 critical bugs at launch - all critical issues resolved
- Stable EKS infrastructure with monitoring
- Stripe billing system operational
- Timeline: January 1st, 2026 - 5 weeks of focused execution
⚙️ Advanced Processing (Post-Launch - 2 tasks)
Nice-to-have optimizations:
- ⏳ Task 3: Reference Range Setter (auto-populate)
- ⏳ Task 5: Summary Creation Service
🤖 CHR Enhancements (Post-Launch - 10 tasks)
Quality improvements after launch:
- ⏳ Task 22: JSON Base Structure Decoupling
- ⏳ Task 23: Hallucination Critique Agent
- ⏳ Task 24: Timeline Critique Agent
- ⏳ Task 25: Additional Critique Agents
- ⏳ Tasks 26-31: CHR Workflow Evaluations
📈 Advanced Testing (Post-Launch - 3 tasks)
Incremental quality improvements:
- ⏳ Task 17: Diagnosis/Procedure Tests
- ⏳ Task 20: Real-Time Validation Framework
- ⏳ Task 21: E2E Test Enhancement
🛡️ Infrastructure Optimization (Post-Launch - 2 tasks)
Scale and resilience:
- ⏳ Task 41: Auto-Scaling Configuration
- ⏳ Task 42: Disaster Recovery & Backups
🔌 API Enhancements (Post-Launch - 2 tasks)
Advanced features:
- ⏳ Task 45: Service Integration Layer
- ⏳ Task 46: Rate Limiting & Throttling
Post-Launch Strategy: 22 enhancement tasks that add advanced parsers (diagnosis, genetics, imaging), quality critique agents, and optimization features. Build incrementally based on user feedback.
5-Week Launch Timeline (Focused)
Week 1: Infrastructure Foundation & Parser Split Begins
- DevOps: EKS cluster setup, Helm charts, Secrets (Tasks 32-34)
- DevOps: Redis setup with 3 queues (simple, mixed, complex/clinical)
- Data Extraction: Begin parser split - extract Simple & Mixed parsers (Task 8)
- Quality Validation: Ground truth dataset creation - 100+ biomarker cases (Task 14)
- API: Begin Charting Service development (Task 47) - LAUNCH CRITICAL
- Pre/Post-Processing: Unit standardization service design (Task 1)
- CHR: Validate existing workflow-functional for <10% hallucinations
- Front-End: Design billing UI mockups
Week 2: Orchestrator + Router + Parsers
- Arun: Build Orchestrator Service - queue monitoring, webhooks, rate limits (Task 36)
- Data Extraction: Complete parser split + 4-way router (Tasks 8, 9)
- Data Extraction: Helm charts for 3 parser services
- API: Complete Charting Service (Task 47) - 100% visualization accuracy
- Pre/Post-Processing: Unit standardization + reference range checker (Tasks 1-2)
- Quality Validation: Routing test suite (Task 15)
- DevOps: Deploy parsers + orchestrator to dev EKS
- CHR: Integrate charting service with workflow-functional
- Front-End: Begin Bubble webhook integration
Week 3: Authentication & Billing Begin
- Arun: Firebase Authentication + Begin Billing System (Tasks 43, 48)
- Data Extraction: Biomarker parser optimization + "Nothing Extracted" fix (Tasks 13, 7)
- Pre/Post-Processing: UI validation hooks + data validation pipeline (Tasks 6, 4)
- Quality Validation: Biomarker parser tests + enrichment tests (Tasks 16, 18-19)
- API: Bubble integration middleware (Task 44)
- DevOps: RDS migration + S3 setup (Tasks 37-38)
- CHR: CHR quality validation - hallucination testing
- Front-End: Progress tracking UI + validation feedback components
Week 4: Billing + CI/CD + Integration
- Arun: Complete Billing - LiteLLM virtual keys + spend tracking (Task 48)
- API: Service integration layer + rate limiting (Tasks 45-46)
- DevOps: CI/CD pipeline (GitHub Actions → ECR) + monitoring (Tasks 39-40)
- Data Extraction: Performance optimization + edge case handling
- Pre/Post-Processing: Reference range setter (Task 3)
- Quality Validation: Real-time validation framework (Task 20)
- CHR: Final CHR testing + report quality validation
- Front-End: Billing UI + top-up flow + spend dashboard
Week 5: Bug Fixes, Testing & Launch
- 🐛 ALL SQUADS: Fix all critical bugs - 0 critical bugs for launch
- Quality Validation: E2E tests + accuracy validation (≥95% biomarker extraction)
- Quality Validation: Charting accuracy (100%), CHR hallucinations (<10%)
- Arun: Final billing system testing + LiteLLM budget sync validation
- API: Load testing + API performance optimization
- DevOps: Production deployment + health checks + auto-scaling
- Data Extraction: Final parser validation + error handling
- CHR: Production CHR generation testing
- Front-End: User acceptance testing + UI polish
- All Squads: Launch readiness review + go/no-go decision
- 🚀 LAUNCH DAY - January 1st, 2026!
Parallel Execution: All 7 squads work concurrently with clear ownership. Arun builds orchestrator (Week 2) + billing (Weeks 3-4). API squad delivers charting service. Data Extraction splits parsers. Quality Validation ensures 95% accuracy. Week 5 focuses on bug fixes and launch validation.
Launch-First Mindset: Ship biomarker extraction + CHR generation + Stripe billing in 5 weeks. Then iterate with diagnosis, genetics, imaging parsers and quality enhancements based on real user feedback.
⚠️ Production Bug Protocol: All production bugs take absolute priority. When production breaks, all squads stop current work and pivot to fix it immediately. Production must always be stable and bug-free.
Post-Launch Timeline (Weeks 6-10)
Iterative Enhancement Based on User Feedback
Week 6: Advanced Parsers Foundation
- Data Extraction: Begin diagnosis parser development (Task 8 - diagnosis split)
- Data Extraction: Begin procedure parser development (Task 8 - procedure split)
- Pre/Post-Processing: Reference range setter implementation (Task 3)
- CHR: JSON base structure decoupling (Task 22)
- Quality Validation: Diagnosis/Procedure test suite creation (Task 17)
Week 7: Genetics & Imaging Parsers
- Data Extraction: Genetics parser development (Task 8 - genetics split)
- Data Extraction: Medical imaging parser (Task 8 - imaging split with MONAI)
- Data Extraction: Diagnosis parser optimization (Task 11)
- Data Extraction: Procedure parser optimization (Task 12)
- CHR: Hallucination critique agent (Task 23)
Week 8: CHR Quality & Testing
- CHR: Timeline critique agent (Task 24)
- CHR: Additional critique agents (Task 25)
- CHR: Functional workflow evaluation (Task 26)
- CHR: Sequential workflow evaluation (Task 27)
- Quality Validation: Real-time validation framework (Task 20)
- Quality Validation: E2E test enhancement (Task 21)
Week 9: Infrastructure Optimization
- Data Extraction: Token pre-calculation service (Task 10)
- Pre/Post-Processing: Summary creation service (Task 5)
- API: Service integration layer (Task 45)
- API: Rate limiting & throttling (Task 46)
- DevOps: Auto-scaling configuration (Task 41)
- DevOps: Disaster recovery & backups (Task 42)
Week 10: Final Polish & Testing
- CHR: Langroid workflow evaluation (Task 28)
- CHR: Routing/extraction test suite (Task 29)
- CHR: Grouping test suite (Task 30)
- CHR: Validation error storage (Task 31)
- All Squads: Performance optimization based on production metrics
- All Squads: User feedback integration & bug fixes
Parallel Execution Continues: All 7 squads continue working in parallel. Data Extraction leads on new parsers, CHR enhances quality, Quality Validation validates everything, and API/DevOps optimize infrastructure.
Iterative Enhancement Philosophy: Post-launch features are built based on real user feedback and production metrics. Squads maintain full flexibility to reprioritize based on what users actually need most.
User-Driven Roadmap: After launch, we let data guide us. If users need genetics more than diagnosis, we pivot. If medical imaging becomes critical, we accelerate it. The post-launch timeline is a starting point, not a rigid plan.
Squad Structure & Resource Allocation
7 Specialized Squads for Parallel Execution
Squad 2: Pre/Post-Processing
Focus: Data transformation, validation, enrichment, standardization
Tasks: 1, 2, 3, 4, 5, 6, 7 (7 tasks)
Services: enrichment-biomarkers, validation-service
Key Responsibilities:
- Unit standardization (SI conversion)
- Reference range checking & setting
- Data validation pipeline
- Summary creation
- UI validation hooks
Squad 3: Quality Validation
Focus: Test suite development, ground truth validation, quality assurance
Tasks: 14, 15, 16, 17, 18, 19, 20, 21 (8 tasks)
Services: integration-tests, e2e-tests
Key Responsibilities:
- Ground truth dataset curation
- Parser-specific test suites
- Real-time validation framework
- E2E test enhancement
Squad 4: API
Focus: API development, authentication, billing, charting service
Tasks: 43, 44, 45, 46, 47, 48 (6 tasks)
Services: api-backend, auth-service (Firebase), charting-service, billing-service
Key Responsibilities:
- Charting Service (Task 47 - LAUNCH CRITICAL): Biomarker visualization, trends, timelines
- Firebase authentication & RBAC
- Bubble integration middleware
- Stripe billing system (Task 48)
- Service integration layer
- Rate limiting & throttling
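Rate limiting can be sketched as a token bucket per API key. This is an in-process illustration with hypothetical names only; a limiter running behind multiple replicas would keep its counters in a shared store such as Redis:

```python
import time

class TokenBucket:
    """Per-key token-bucket limiter (in-memory sketch; hypothetical API)."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens refilled per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A FastAPI middleware would hold one bucket per API key and return HTTP 429 when `allow()` is false.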
Squad 5: CHR
Focus: Report generation, critique agents, workflow evaluations
Tasks: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 (10 tasks - Post-Launch)
Services: workflow-functional, workflow-generative-langroid, workflow-generative-sequential
Key Responsibilities:
- JSON base structure decoupling
- Hallucination critique agents
- Timeline validation
- Workflow evaluations
- <10% hallucination target
Squad 6: Front-End
Focus: User interface, UX design, frontend integration
Responsibilities:
- Bubble.io frontend development
- Webhook integration for real-time updates
- Validation feedback UI
- Progress tracking components
- Error display and recovery flows
- Data quality dashboards
- User testing and feedback collection
Squad 7: DevOps
Focus: Infrastructure, deployment, monitoring, orchestration
Tasks: 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42 (11 tasks)
Services: n1-infrastructure, n1-helm-charts, orchestrator-service
Key Responsibilities:
- EKS cluster setup & migration
- Helm chart deployment
- CI/CD pipeline (GitHub Actions + ArgoCD)
- Orchestrator Service (Task 36): Queue monitoring, rate limits, webhooks
- Monitoring & observability (Datadog)
- Auto-scaling & disaster recovery
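One small piece of the orchestrator's webhook duty, retry pacing, can be sketched as capped exponential backoff (the base delay and cap below are placeholder values, not the real configuration):

```python
def backoff_schedule(attempts: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Delays (seconds) before each webhook redelivery attempt.
    base/cap are illustrative defaults, not the orchestrator's real config."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

Production retry loops usually add random jitter on top of these delays so many failed deliveries don't retry in lockstep.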
Squad Allocation Strategy: 7 specialized squads work in parallel with clear ownership. API squad handles charting service (launch-critical), DevOps builds orchestrator, Quality Validation ensures 95% accuracy, and all squads collaborate for January 1st, 2026 launch.
Key Risks & Mitigation Strategies
Risk: EKS Migration Complexity
Impact: Potential downtime during migration
Mitigation:
- Parallel run GKE + EKS during migration
- Blue-green deployment strategy
- Comprehensive rollback procedures
- Gradual traffic shifting (10% → 50% → 100%)
Risk: Data Accuracy Regression
Impact: New parser changes may reduce accuracy
Mitigation:
- Comprehensive test suite with ground truth (Task 14)
- A/B testing new parsers vs existing
- Real-time validation monitoring (Task 20)
- Automated regression testing in CI
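A regression gate needs a concrete accuracy metric. One possible definition, assuming exact-name matching and a relative value tolerance (the real Task 14 suite may score differently, e.g. unit-aware or per-field):

```python
def extraction_accuracy(predicted: dict[str, float],
                        ground_truth: dict[str, float],
                        rel_tol: float = 1e-4) -> float:
    """Fraction of ground-truth biomarkers extracted with the right value.
    Hypothetical metric: exact name match plus a relative value tolerance."""
    if not ground_truth:
        return 1.0
    correct = sum(
        1 for name, truth in ground_truth.items()
        if name in predicted
        and abs(predicted[name] - truth) <= rel_tol * max(abs(truth), 1.0)
    )
    return correct / len(ground_truth)
```

A CI job would then assert `extraction_accuracy(pred, truth) >= 0.95` over the ground-truth corpus and fail the build on any regression.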
Risk: LLM Cost Escalation
Impact: Increased usage may spike costs
Mitigation:
- Token pre-calculation service (Task 10)
- Smart routing to minimize Pro model usage (40-70% reduction already achieved)
- Caching for repeated queries
- Cost monitoring dashboards
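Token pre-calculation plus a cost-aware routing decision can be sketched as below. The 4-characters-per-token heuristic and the per-million-token prices are placeholders; a real implementation would use the model's own tokenizer and the current price sheet:

```python
# Cost-aware routing sketch. The 4-chars-per-token heuristic and the
# per-million-token prices below are placeholders, not real figures.
PRICE_PER_M_TOKENS = {"flash": 0.30, "pro": 5.00}  # USD, illustrative

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def estimate_cost(text: str, model: str) -> float:
    return estimate_tokens(text) * PRICE_PER_M_TOKENS[model] / 1_000_000

def route_model(text: str, budget_usd: float) -> str:
    """Use the expensive model only when its estimated cost fits the budget."""
    return "pro" if estimate_cost(text, "pro") <= budget_usd else "flash"
```

Pre-calculating cost per document also lets the cost dashboards alert before an expensive batch runs, not after.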
Risk: Timeline Slippage
Impact: 8-week timeline may be aggressive
Mitigation:
- Prioritize critical path items (EKS, parsers, testing)
- Weekly sprint reviews and adjustments
- Buffer time for unexpected issues (add 2 weeks)
- Defer non-critical items (API versioning, documentation)
Risk: Squad Coordination Overhead
Impact: 7 squads need tight coordination
Mitigation:
- Daily standups (async via Slack)
- Clear API contracts between services
- Shared documentation (Notion/Confluence)
- Weekly architecture sync meetings
Risk: CHR Hallucinations
Impact: Generated reports may contain inaccuracies
Mitigation:
- Critique agents for validation (Tasks 23-25)
- Human-in-the-loop review for critical sections
- Fact-checking against source data
- Medical expert review process
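Fact-checking against source data reduces to: every claim in the generated report must cite biomarker IDs that exist in the extraction output. Below is a crude stand-in for the critique agents of Tasks 23-25, using a hypothetical claim schema:

```python
def unsupported_claims(claims: list[dict], source_ids: set[str]) -> list[dict]:
    """Claims with no citations, or citing biomarker IDs absent from the
    source extraction. Hypothetical schema: {"text": ..., "cites": [...]}."""
    return [
        c for c in claims
        if not c.get("cites") or not set(c["cites"]) <= source_ids
    ]

def hallucination_rate(claims: list[dict], source_ids: set[str]) -> float:
    return len(unsupported_claims(claims, source_ids)) / len(claims) if claims else 0.0
```

Anything this check flags would route to human-in-the-loop review; the real critique agents add semantic checks (does the cited value actually support the statement?), which an ID-level filter cannot catch.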
Success Metrics & KPIs
≥95%
Biomarker Extraction Accuracy
100%
Biomarker Charting Accuracy
<0.01%
Unit Conversion Error Rate
<5%
Parsing Failure Rate (valid docs)
⚠️ Non-Negotiable Launch Requirements:
- ≥95% biomarker extraction accuracy (validated against ground truth dataset)
- 100% charting accuracy: trends, timelines, and reference ranges must be flawless
- <5% parsing failures on valid medical documents
- Unit conversion accuracy to 0.01% or better
100%
Fact Traceability to Source
<10min
Report Generation Time
⚠️ CHR Launch Requirements:
- <10% hallucinations - at least 90% of statements backed by source biomarker data
- High fact traceability - cite sources for key claims
- Accurate biomarker charts embedded in reports
- ≥95% accuracy on temporal tracking and trend analysis
<500ms
API Response Time (P95)
-30%
LLM Cost Reduction (via routing)
100+
Ground Truth Test Cases
100%
Critical Path Test Coverage
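The P95 latency target above can be measured with the nearest-rank percentile method (one of several common definitions):

```python
import math

def p95(latencies_ms: list[float]) -> float:
    """95th percentile of request latencies by the nearest-rank method."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return ordered[rank - 1]
```

In practice Datadog APM reports this from traces; the function only pins down what "P95 < 500ms" means.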
Immediate Next Steps (Week 1)
Monday: Kick-off meeting with all squads. Review roadmap and assign tasks.
Monday: Setup CI/CD pipeline for continuous deployment to production
Monday-Tuesday: Setup AWS accounts, EKS cluster provisioning (Squad 7)
Tuesday: Begin ground truth dataset curation (Squad 3)
Wednesday: Start unit standardization service development (Squad 2)
Wednesday: Begin simple parser extraction (Squad 1)
Thursday: Helm charts testing in dev EKS environment (Squad 7)
Friday: Week 1 sprint review + first production deployment celebration!
⚡ Deployment Cadence: Starting Week 1, we deploy to production daily. Every completed feature goes live immediately. No waiting for "release windows" - we ship continuously!
🔥 Production-First Mindset: Production stability is non-negotiable. Any production bug immediately becomes Priority 0 - all hands on deck until it's resolved. We maintain a bug-free production environment at all costs.
Questions to Address Before Starting
Architecture Decisions
- Adopt n1-k8s Redis Streams architecture or continue with Celery?
- Single EKS cluster or multiple (dev/staging/prod)?
- Keep GKE as fallback or full migration to EKS?
- ArgoCD for GitOps CD or stick with GitHub Actions-only pipeline?
CHR Strategy
- Which CHR workflow to prioritize (Functional/Langroid/Sequential)?
- Typst vs LaTeX for PDF generation going forward?
- JSON-first output for all workflows?
Quality Standards
- Define acceptable accuracy thresholds for each data type
- Manual review process for low-confidence extractions?
- Who validates ground truth datasets?
Resource Allocation
- Can we allocate 12-15 engineers for 8 weeks?
- Budget for AWS infrastructure costs?
- LLM API budget for increased usage?
Ready to Start: Current codebase is well-structured, microservices are defined, Helm charts are ready, and Terraform infrastructure exists. We're in a strong position to execute this roadmap.
Executive Summary
Launch-First Strategy: Ship biomarker extraction + CHR generation + Stripe billing on AWS EKS by January 1st, 2026. Then iterate with advanced parsers (diagnosis, genetics, imaging) and quality enhancements based on user feedback.
Phase 1: Launch (Jan 1, 2026) - ≥95% Extraction Accuracy + 100% Charting + <10% Hallucinations
Core Biomarker Processing
- ✅ ≥95% accurate extraction - 3 biomarker parsers
- ✅ 100% charting accuracy - trends, timelines, ranges
- ✅ Unit standardization & validation (0.01% accuracy)
- ✅ Reference range checking (age/gender-specific)
- ✅ Intelligent routing & orchestration
- ✅ UI validation hooks for Bubble
- ✅ Stripe billing integration
Infrastructure & Testing
- ✅ AWS EKS with Helm deployment
- ✅ CI/CD + ArgoCD pipeline
- ✅ Firebase authentication
- ✅ Ground truth validation (≥95% target)
- ✅ <10% CHR hallucinations
- ✅ <5% parsing failures
- ✅ Comprehensive test coverage
- ✅ Datadog monitoring
Phase 2: Post-Launch Enhancements
Advanced Parsers
- ⏳ Diagnosis canonicalization
- ⏳ Procedure extraction (CPT codes)
- ⏳ Genetics parser
- ⏳ Medical imaging (MRI, CT, X-Ray)
Quality Enhancements
- ⏳ Hallucination critique agents
- ⏳ Timeline validation
- ⏳ JSON-first CHR architecture
- ⏳ Auto-scaling & disaster recovery
Strong Foundation: Already running on GKE with Helm, EKS Helmfile charts ready, comprehensive E2E tests in place, and 40-70% LLM cost reduction achieved through intelligent routing. We launch from strength, not from zero.
Let's build the future of personalized healthcare together.