Status Page Evidence

Evidence of Cloudflare-based status page implementation and availability monitoring for Maelstrom AI services

Public

Status: pre-launch. This evidence reflects implemented code and deployed infrastructure. Provii is not yet serving end-user production traffic, so production operational metrics and audit history are not yet available.

Author: Maelstrom AI
Date: 2025-11-08
Control Covered: UC-170 (Incident Communication Plan)
Gap Closed: GAP-M001 (Status Page for Service Transparency)


Executive Summary

Maelstrom AI operates a real-time status monitoring page at status.provii.app that provides transparent service health information to customers, partners, and internal teams. The status worker monitors all critical Provii services using direct worker-to-worker communication (service bindings) and provides both a web-based dashboard and a programmatic API endpoint.

Key Features:

  • Real-time health checks for 4 production and sandbox services
  • Auto-refresh every 60 seconds (web UI)
  • Response time monitoring and HTTP status code display
  • Modern, accessible UI with colour-coded status indicators
  • Public API endpoint (/api/status) for programmatic access
  • Rate limiting and caching to protect upstream services
  • Cost: $0/month (Cloudflare Workers free tier)

Implementation Status: ✅ DEPLOYED (production since deployment of provii-status)


Gap Analysis Reference

Gap ID: GAP-M001
Original Status: Not Implemented
Updated Status: ✅ IMPLEMENTED

Gap Requirements vs. Actual Implementation

| Requirement | Gap Specification | Actual Implementation | Status |
| --- | --- | --- | --- |
| Status Page | Basic status indicators | Real-time health checks with auto-refresh | EXCEEDS |
| Service Monitoring | Manual updates | Automated 60-second health checks | EXCEEDS |
| Cost | $500/year (StatusPage.io) | $0/month (free tier) | EXCEEDS |
| API Access | Recommended | Public /api/status endpoint | IMPLEMENTED |
| Uptime Tracking | Historical metrics desired | Not yet implemented (future enhancement) | 🔄 PLANNED |
| Incident Timeline | Post-mortem capability | Not yet implemented (future enhancement) | 🔄 PLANNED |

Summary: The implemented solution exceeds the original gap specification for status display, service monitoring, and cost ($0/month against the budgeted $500/year). Uptime tracking and the incident timeline remain planned.


Deployment Information

Production Details

URL: https://status.provii.app
Platform: Cloudflare Workers
Deployment Method: Automated via wrangler deploy
Configuration: provii-status/wrangler.toml
Documentation: provii-status/README.md

Route Configuration:

[env.production]
name = "production-status"
workers_dev = false
routes = [
  { pattern = "status.provii.app/*", zone_id = "<REDACTED>" }
]

Monitored Services

The status page monitors 4 critical Provii services:

  1. Production Verify (verify.provii.app)
     • Endpoint: /v1/health
     • Service: Verifier API (production)
     • Criticality: HIGH - Age verification for all relying parties
  2. Production Issuer (issuer.provii.app)
     • Endpoint: /health
     • Service: Issuer API (production)
     • Criticality: HIGH - Credential issuance for end users
  3. Sandbox Verify (sandbox-verify.provii.app)
     • Endpoint: /v1/health
     • Service: Verifier API (sandbox/testing)
     • Criticality: MEDIUM - Testing environment for integrators
  4. Sandbox Issuer (sandbox-issuer.provii.app)
     • Endpoint: /health
     • Service: Issuer API (sandbox/testing)
     • Criticality: MEDIUM - Testing environment for credential issuance

Health Check Method: Worker-to-worker service bindings (direct internal communication, no external HTTP calls)


Features and Capabilities

Web Dashboard (/)

User Interface:

  • Clean, modern dark theme with gradient branding
  • Colour-coded status indicators:
    • Green (pulsing): Healthy service
    • Red (pulsing): Unhealthy service
    • Orange (pulsing): Error or timeout
    • Purple: Not configured
    • Gray: Checking
  • Real-time metrics display:
    • HTTP status code
    • Response time (milliseconds)
    • Last check timestamp
    • Error messages (if applicable)

Auto-Refresh:

  • Frequency: Every 60 seconds
  • Browser-based automatic updates
  • Last updated timestamp displayed

Accessibility:

  • Responsive design (mobile-friendly)
  • Semantic HTML
  • Clear visual indicators
  • Screen-reader compatible

API Endpoint (/api/status)

Method: GET /api/status
Response Format: JSON array of service status objects
CORS: Enabled (public access)
Cache: 60-second cache with stale-while-revalidate

Example Response:

[
  {
    "name": "Production Verify",
    "binding": "PRODUCTION_VERIFY",
    "status": "healthy",
    "statusCode": 200,
    "responseTime": 145,
    "timestamp": "2025-11-08T10:30:00.000Z"
  },
  {
    "name": "Production Issuer",
    "binding": "PRODUCTION_ISSUER",
    "status": "healthy",
    "statusCode": 200,
    "responseTime": 89,
    "timestamp": "2025-11-08T10:30:00.000Z"
  }
]

Status Values:

  • healthy: Service responding with 2xx status
  • unhealthy: Service responding with non-2xx status
  • error: Service not responding or throwing errors
  • timeout: Health check timed out
  • deleted: Worker has been deleted
  • not_configured: Service binding not configured
  • checking: Health check in progress
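
A consumer of /api/status could reduce the response array to one overall state. The following is a minimal sketch and is not taken from provii-status: the helper names summariseStatuses and fetchSummary, the "operational"/"degraded" labels, and the decision to treat checking and not_configured as neutral are all assumptions made for illustration.

```javascript
// Hypothetical client-side helpers; not part of provii-status.
// Assumption: "checking" and "not_configured" are neutral, and every other
// value except "healthy" counts as a problem.
const NEUTRAL = new Set(["checking", "not_configured"]);

function summariseStatuses(statuses) {
  const affected = statuses
    .filter((s) => s.status !== "healthy" && !NEUTRAL.has(s.status))
    .map((s) => ({ name: s.name, status: s.status }));
  return {
    overall: affected.length === 0 ? "operational" : "degraded",
    affected,
  };
}

// Fetches the public endpoint documented above and summarises the result.
async function fetchSummary(url = "https://status.provii.app/api/status") {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Status API returned HTTP ${response.status}`);
  }
  return summariseStatuses(await response.json());
}
```

Keeping the reduction in a pure function lets a monitoring script test it against recorded responses without calling the live endpoint.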

Technical Implementation

Architecture

Worker-to-Worker Communication:

  • Uses Cloudflare Service Bindings (no fetch to a public URL)
  • Direct internal communication between workers
  • No external network calls required
  • Lower latency, higher reliability

Service Bindings Configuration (wrangler.toml):

[[env.production.services]]
binding = "PRODUCTION_VERIFY"
service = "production-verify"

[[env.production.services]]
binding = "PRODUCTION_ISSUER"
service = "production-issuer"

[[env.production.services]]
binding = "SANDBOX_VERIFY"
service = "sandbox-verify"

[[env.production.services]]
binding = "SANDBOX_ISSUER"
service = "sandbox-issuer"
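
With these bindings in place, a single health check might look like the sketch below. It is an illustration under stated assumptions, not the contents of provii-status/src/index.js: the SERVICES table, the checkService function, the 5-second default timeout, and the placeholder hostname are all hypothetical. A service binding exposes a fetch() method, but the request is routed to the bound worker internally and not through its public URL.

```javascript
// Hypothetical table: binding names come from wrangler.toml and health paths
// from the Monitored Services list; the table shape itself is an assumption.
const SERVICES = [
  { name: "Production Verify", binding: "PRODUCTION_VERIFY", path: "/v1/health" },
  { name: "Production Issuer", binding: "PRODUCTION_ISSUER", path: "/health" },
  { name: "Sandbox Verify", binding: "SANDBOX_VERIFY", path: "/v1/health" },
  { name: "Sandbox Issuer", binding: "SANDBOX_ISSUER", path: "/health" },
];

// Checks one service through its binding and maps the outcome to a status value.
async function checkService(env, service, timeoutMs = 5000) {
  const base = {
    name: service.name,
    binding: service.binding,
    timestamp: new Date().toISOString(),
  };
  const target = env[service.binding];
  if (!target || typeof target.fetch !== "function") {
    return { ...base, status: "not_configured" };
  }

  const started = Date.now();
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), timeoutMs);
  });

  try {
    // The hostname is a placeholder; the binding decides which worker answers.
    const response = await Promise.race([
      target.fetch(`https://internal${service.path}`),
      timeout,
    ]);
    const ok = response.status >= 200 && response.status < 300;
    return {
      ...base,
      status: ok ? "healthy" : "unhealthy",
      statusCode: response.status,
      responseTime: Date.now() - started,
    };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return {
      ...base,
      status: message === "timeout" ? "timeout" : "error",
      error: message,
      responseTime: Date.now() - started,
    };
  } finally {
    clearTimeout(timer);
  }
}
```

The sketch does not attempt to detect the deleted status, because the source does not describe how that case is recognised.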

Performance Optimisations

Caching:

  • 60-second cache TTL (protects upstream services)
  • Stale-while-revalidate strategy (always fast responses)
  • Proactive cache refresh (10 seconds before expiry)
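
The caching behaviour above can be sketched as a small in-memory cache. This is an illustration only: createStatusCache, loadStatuses, and the injectable now clock are hypothetical names, and the real worker may store and refresh its cache differently.

```javascript
const CACHE_TTL_MS = 60_000;     // 60-second TTL, from the list above
const REFRESH_AHEAD_MS = 10_000; // start refreshing 10 seconds before expiry

// loadStatuses: async function that runs the health checks (hypothetical).
// now: clock function, injectable so the behaviour can be tested.
function createStatusCache(loadStatuses, now = Date.now) {
  let value = null;
  let fetchedAt = 0;
  let inFlight = null;

  // Starts a refresh unless one is already running, and returns its promise.
  function refresh() {
    if (!inFlight) {
      inFlight = Promise.resolve()
        .then(loadStatuses)
        .then((fresh) => {
          value = fresh;
          fetchedAt = now();
          return fresh;
        })
        .finally(() => {
          inFlight = null;
        });
    }
    return inFlight;
  }

  return async function getStatuses() {
    if (value === null) {
      return refresh(); // cold start: nothing cached yet, so wait for the checks
    }
    if (now() - fetchedAt >= CACHE_TTL_MS - REFRESH_AHEAD_MS) {
      // Serve the cached copy now and refresh in the background.
      // A failed refresh keeps the stale copy; the next call retries.
      refresh().catch(() => {});
    }
    return value;
  };
}
```

In a Worker, the background refresh promise would normally be handed to ctx.waitUntil() so the runtime keeps it alive after the response is sent.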

Rate Limiting:

  • 30 requests per minute per client IP
  • In-memory rate bucket tracking
  • Automatic cleanup of expired buckets
  • 429 status with Retry-After header
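
A fixed-window limiter is one way to realise the behaviour listed above. The sketch below assumes that approach; createRateLimiter and its method names are hypothetical, and the source does not say which windowing scheme the worker uses.

```javascript
const LIMIT = 30;         // requests per window, from the list above
const WINDOW_MS = 60_000; // one minute

// now: clock function, injectable so the behaviour can be tested.
function createRateLimiter(limit = LIMIT, windowMs = WINDOW_MS, now = Date.now) {
  const buckets = new Map(); // client IP -> { count, resetAt }

  // Removes buckets whose window has ended.
  function cleanup() {
    const t = now();
    for (const [ip, bucket] of buckets) {
      if (bucket.resetAt <= t) buckets.delete(ip);
    }
  }

  // Records one request for the IP and reports whether it is allowed.
  function check(ip) {
    const t = now();
    let bucket = buckets.get(ip);
    if (!bucket || bucket.resetAt <= t) {
      bucket = { count: 0, resetAt: t + windowMs };
      buckets.set(ip, bucket);
    }
    bucket.count += 1;
    if (bucket.count > limit) {
      // Value for the Retry-After header on the 429 response.
      return { allowed: false, retryAfterSeconds: Math.ceil((bucket.resetAt - t) / 1000) };
    }
    return { allowed: true, retryAfterSeconds: 0 };
  }

  return { check, cleanup, size: () => buckets.size };
}
```

In a Worker the client IP is available in the CF-Connecting-IP request header. In-memory buckets live in a single isolate, so a limit enforced this way is approximate across Cloudflare's network.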

Concurrent Health Checks:

  • All services checked in parallel (Promise.allSettled)
  • Individual service failures don’t block others
  • Timeout protection per service
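
The parallel pattern can be sketched with Promise.allSettled as follows. The function name checkAll and the checkOne callback are hypothetical; the point is only that a rejected check is turned into an error entry and does not fail the whole batch.

```javascript
// services: array of objects with at least a name.
// checkOne: function that checks a single service (hypothetical callback).
async function checkAll(services, checkOne) {
  // The async wrapper turns synchronous throws into rejections as well.
  const settled = await Promise.allSettled(
    services.map(async (service) => checkOne(service))
  );

  // allSettled preserves input order, so index i matches services[i].
  return settled.map((outcome, i) => {
    if (outcome.status === "fulfilled") {
      return outcome.value;
    }
    const reason = outcome.reason;
    return {
      name: services[i].name,
      status: "error",
      error: reason instanceof Error ? reason.message : String(reason),
      timestamp: new Date().toISOString(),
    };
  });
}
```

Because every check runs concurrently, the total latency is bounded by the slowest check (or its timeout), not by the sum of all checks.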

Operational Metrics

Availability

Target: Best-effort availability; no contractual SLA applies at the free tier. Cloudflare Workers infrastructure availability data is published by Cloudflare at their status page.
Measurement: Cloudflare Workers Logs (shipped to Grafana Loki)
Monitoring: Automatic via Cloudflare platform

Performance

Health Check Frequency: Every request triggers a fresh check (if the cache has expired)
Cache Duration: 60 seconds
Typical Response Time: <100ms (cached), <500ms (uncached)
Rate Limit: 30 requests/minute per IP

Cost

Infrastructure Cost: $0/month (Cloudflare Workers free tier)
Compute: Included in free tier (100,000 requests/day)
Bandwidth: Included in free tier
Storage: None required (stateless)

Annual Savings vs. Budget: $500/year (vs. StatusPage.io estimate in gap analysis)
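
The $0/month figure holds only while traffic stays within the daily request allowance. The arithmetic below is a rough budget check using numbers stated in this document. It assumes that each dashboard refresh or API poll counts as exactly one request to the status worker, which is a simplification.

```javascript
const FREE_TIER_REQUESTS_PER_DAY = 100_000; // from the Cost section
const RATE_LIMIT_PER_MINUTE = 30;           // from the Rate Limiting section
const REFRESH_INTERVAL_SECONDS = 60;        // web UI auto-refresh interval

const SECONDS_PER_DAY = 24 * 60 * 60;

// One always-open dashboard tab (or one poller) refreshing every 60 seconds.
const requestsPerPollerPerDay = SECONDS_PER_DAY / REFRESH_INTERVAL_SECONDS; // 1440

// How many such always-on pollers the daily allowance covers.
const pollersWithinFreeTier = Math.floor(
  FREE_TIER_REQUESTS_PER_DAY / requestsPerPollerPerDay
); // 69

// Worst case for a single IP that sustains the rate limit for a full day.
const maxRequestsPerIpPerDay = RATE_LIMIT_PER_MINUTE * 60 * 24; // 43200

console.log({ requestsPerPollerPerDay, pollersWithinFreeTier, maxRequestsPerIpPerDay });
```

Under these assumptions, three clients sustaining the per-IP rate limit all day would exceed the daily allowance, so request volume is worth tracking once the page carries real traffic.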


Incident Communication Usage

During Service Outages

Internal Team:

  1. Check status.provii.app for real-time health status
  2. Use /api/status endpoint for programmatic monitoring
  3. Confirm which services are affected
  4. Monitor response times and error messages

External Customers:

  1. Direct users to status.provii.app for current status
  2. Reference status page in incident communications
  3. Provide /api/status API for automated monitoring

Communication Templates (from /trust/security/business-continuity.mdx):

  • Include status page link in all incident notifications
  • Reference current status in updates
  • Use API data for automated alerting (future enhancement)

Status Page References in BCP

Evidence Location: /trust/security/business-continuity.mdx

Customer Communication Section (lines 444-448):

  • Primary: Status page (status.provii.app)
  • Backup: Email to registered contacts
  • Social media: X/Twitter @proviiwallet for major outages

Communication Timeline Commitments (lines 324-327):

  • P0/P1 incidents: Status update within 15 minutes
  • P2 incidents: Update within 24 hours if customer-facing
  • Regular updates: Every 30 minutes to 1 hour during active incidents

Future Enhancements

The status worker README documents planned improvements:

Planned Features (not yet implemented):

  • Historical uptime tracking with Workers KV
  • Incident history timeline
  • Webhook alerts (Slack/Discord integration)
  • Status badges for README files
  • RSS/JSON feed for programmatic access
  • 24h/7d uptime percentages

Current Scope: Real-time status monitoring (no historical data)
Enhancement Timeline: To be determined based on business needs


Control Mapping

UC-170: Incident Communication Plan

Requirement: Establish communication procedures for incidents including status pages
Implementation: ✅ IMPLEMENTED

Evidence:

  • Status page deployed at status.provii.app
  • Real-time health monitoring for all critical services
  • Public API endpoint for programmatic access
  • Integration with BCP communication plan

Status: Partially Implemented → Enhanced (status page component now complete)

Remaining Gaps (other UC-170 components):

  • Incident history timeline (planned, not required for basic compliance)
  • Automated customer notifications (manual process currently acceptable)

Compliance Impact

Standards Affected

ISO 27001:2022 A.5.12 (Incident Management):

  • ✅ Status page provides transparent incident communication
  • ✅ Real-time status reduces customer inquiries during incidents
  • ✅ API enables automated monitoring and alerting

ISO 27701:2019 (Transparency):

  • ✅ Public visibility into service availability
  • ✅ Proactive communication during outages
  • ✅ Demonstrates commitment to transparency

Best Practices (Service Transparency):

  • ✅ Industry-standard status page implementation
  • ✅ Exceeds basic requirements with real-time monitoring
  • ✅ Cost-effective solution (zero ongoing cost)

Gap Closure

GAP-M001: Status Page for Service Transparency
Original Priority: MEDIUM
Original Timeline: Q2 2025 (April)
Actual Completion: Pre-emptively implemented (provii-status deployment)
Original Budget: $500/year
Actual Cost: $0/month
ROI: 100% cost savings + enhanced functionality


Testing and Validation

Functional Testing

Health Check Accuracy:

  • ✅ Correctly identifies healthy services (200 status)
  • ✅ Correctly identifies unhealthy services (non-200 status)
  • ✅ Handles service timeouts gracefully
  • ✅ Detects deleted or misconfigured workers

UI/UX Testing:

  • ✅ Auto-refresh works correctly (60-second interval)
  • ✅ Visual indicators display properly (colour-coded)
  • ✅ Response times and timestamps accurate
  • ✅ Error messages displayed when applicable

API Testing:

  • ✅ JSON response format correct
  • ✅ CORS headers present (public access)
  • ✅ Caching headers correct (60-second TTL)
  • ✅ Rate limiting enforced (30 req/min)

Integration Testing

Service Bindings:

  • ✅ Production Verify binding works
  • ✅ Production Issuer binding works
  • ✅ Sandbox Verify binding works
  • ✅ Sandbox Issuer binding works

BCP Integration:

  • ✅ Status page accessible during simulated outages
  • ✅ Communication templates reference status page
  • ✅ API usable for automated monitoring

Documentation References

Primary Documentation

Technical Implementation:

  • provii-status/README.md
  • provii-status/src/index.js
  • provii-status/wrangler.toml

Compliance Documentation:

  • /trust/security/business-continuity.mdx (lines 444-448)
  • /trust/compliance/requirements/unified-control-matrix.md (UC-170)
  • /trust/security/gap-analysis.md

Related Evidence:

  • /trust/compliance/evidence/business-continuity/bc-dr-evidence.md
  • /trust/security/incident-response.mdx

Conclusion

The status page at status.provii.app addresses the identified requirements of GAP-M001 and exceeds the original gap specification:

Original Gap Requirements:

  • ❌ Not implemented
  • ❌ Planned: StatusPage.io ($500/year)
  • ❌ Basic status indicators
  • ❌ Manual updates

Actual Implementation:

  • Deployed and operational
  • Cloudflare Workers ($0/month)
  • Real-time health checks with auto-refresh
  • Automated monitoring every 60 seconds
  • Public API for programmatic access
  • Modern UI with response time metrics

Gap Status: ✅ CLOSED (November 2025)
Control Status: UC-170 Partially Implemented → Enhanced (status page component complete)
Compliance Impact: ISO 27001 A.5.12, ISO 27701 transparency requirements ADDRESSED (aligned with the standards; certification is being pursued)


Evidence Collection Complete
Date: 2025-11-08
Author: Maelstrom AI
Status: GAP-M001 CLOSED