
Availability & Resilience

This page describes how Breeze remains available under normal load, recovers from component failures, and responds to incidents.

High availability

Component | Resilience
MongoDB Atlas | 3-node replica set with automatic failover. Loss of any single node is transparent to the application.
Azure App Service (backend) | Containerized, restartable. Auto-scale is configured to add and remove instances based on load.
Azure Blob Storage | Azure-managed redundancy within Norway East.
Azure Cache for Redis | Azure-managed cache used for token validation and rate-limit enforcement. Treated as a critical-path dependency; cache-tier resilience is provided by the managed service.
Vercel (frontend) | Multi-region CDN; static assets served from edge.

There are no single points of failure within the database or storage tiers. The backend application can be redeployed without data loss.
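
The single-node tolerance of the 3-node replica set follows from majority-quorum arithmetic: a primary can be elected only while a majority of voting members is reachable. The sketch below works through that arithmetic. The function names are illustrative and are not part of any MongoDB or Breeze API.

```typescript
// Votes needed to elect a primary in a replica set with `members` voting nodes.
function majority(members: number): number {
  if (!Number.isInteger(members) || members < 1) {
    throw new RangeError("members must be a positive integer");
  }
  return Math.floor(members / 2) + 1;
}

// Voting members that can be lost while a majority is still reachable.
function faultTolerance(members: number): number {
  return members - majority(members);
}

console.log(majority(3));       // 2
console.log(faultTolerance(3)); // 1
```

With three members, two votes form a majority, so one node can fail and the remaining two can still elect a primary.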

Backups and point-in-time recovery

The MongoDB Atlas cluster is configured with continuous backup and point-in-time restore. Backups are:

  • Encrypted at rest with the same key material as the primary cluster.
  • Stored in the same region (Norway) as the primary, in line with the data-residency commitment.
  • Restorable to any point within the configured retention window (per the Atlas backup configuration on the cluster).
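
As a rough illustration of what "within the configured retention window" means for a restore request, the sketch below checks whether a requested restore timestamp falls inside the window. The 7-day value is a placeholder assumption, not the actual Breeze setting (the real value comes from the Atlas backup configuration), and `isRestorable` is a hypothetical helper rather than an Atlas API.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Placeholder only: the real window is whatever the cluster's backup
// configuration specifies.
const ASSUMED_RETENTION_DAYS = 7;

// True when `target` lies between the oldest restorable instant and `now`.
function isRestorable(
  target: Date,
  now: Date,
  retentionDays: number = ASSUMED_RETENTION_DAYS,
): boolean {
  const newest = now.getTime();
  const oldest = newest - retentionDays * MS_PER_DAY;
  const requested = target.getTime();
  return requested >= oldest && requested <= newest;
}

const now = new Date("2024-06-10T12:00:00Z");
console.log(isRestorable(new Date("2024-06-05T08:30:00Z"), now)); // true
console.log(isRestorable(new Date("2024-05-01T00:00:00Z"), now)); // false
```

A timestamp in the future is also rejected, since a restore can only target a point that has already been captured.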

Restoration is performed by the Sotera operations team in response to a verified incident or a customer support request requiring data recovery.

Monitoring

The following monitoring is in place:

  • MongoDB Atlas alerts on cluster events: failover, scaling actions, replica-set state changes, slow queries, storage thresholds, and replication lag.
  • Azure Monitor on the App Service: CPU, memory, request latency, HTTP error rates, and platform health.
  • Sentry on the application: error spikes, regressions in key flows, and unhandled exceptions.
  • Uptime monitoring of public endpoints.

Alerts are routed to the engineering on-call rotation.
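
Threshold-based alerts of the kind listed above can be pictured as a comparison of sampled metrics against configured limits. The following is a minimal sketch of that idea only: the metric names and limits are invented for illustration, and in practice the evaluation is done by the monitoring services themselves (Atlas, Azure Monitor, Sentry), not by application code like this.

```typescript
interface MetricSample {
  name: string;
  value: number;
}

interface Threshold {
  metric: string;
  max: number; // alert when the sampled value exceeds this limit
}

// Returns the names of metrics whose sampled value exceeds its limit.
// Metrics without a configured threshold are ignored.
function breaches(samples: MetricSample[], thresholds: Threshold[]): string[] {
  const limits = new Map<string, number>();
  for (const t of thresholds) {
    limits.set(t.metric, t.max);
  }
  return samples
    .filter((s) => {
      const max = limits.get(s.name);
      return max !== undefined && s.value > max;
    })
    .map((s) => s.name);
}

// Hypothetical limits and samples, for illustration only.
const exampleThresholds: Threshold[] = [
  { metric: "replication_lag_seconds", max: 10 },
  { metric: "cpu_percent", max: 85 },
];
const exampleSamples: MetricSample[] = [
  { name: "replication_lag_seconds", value: 42 },
  { name: "cpu_percent", value: 60 },
];

console.log(breaches(exampleSamples, exampleThresholds)); // [ 'replication_lag_seconds' ]
```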

Incident response

Sotera operates an internal incident-response process that covers detection, triage, containment, eradication, recovery, and post-incident review. Incidents that may have affected customer data trigger a notification to the responsible customer or partner contact in line with the applicable Data Processing Agreement and, where required, GDPR Article 33 obligations.

For details on a specific incident commitment or breach-notification timeframe, refer to your DPA with Sotera or with your partner.

Change management

Production changes go through:

  1. Code review by at least one engineer other than the author.
  2. Automated checks on every pull request (TypeScript type checking, build validation).
  3. Deployment via CI/CD — manual ad-hoc deploys are not part of the normal flow.
  4. Staged rollout through a pre-production environment before reaching production.
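
The four steps above can be read as a set of gates that a change must pass before it reaches production. The sketch below encodes them as a single check, purely to make the rules concrete. The `ChangeRecord` shape and the `gateFailures` function are hypothetical and do not describe Sotera's actual CI/CD tooling.

```typescript
interface ChangeRecord {
  author: string;
  approvers: string[];
  checks: { typeCheck: boolean; build: boolean };
  deployedVia: "ci" | "manual";
  passedPreProduction: boolean;
}

// Returns the list of unmet gates; an empty list means the change may proceed.
function gateFailures(change: ChangeRecord): string[] {
  const failures: string[] = [];

  // Step 1: at least one reviewer who is not the author.
  const independent = change.approvers.filter((a) => a !== change.author);
  if (independent.length < 1) {
    failures.push("needs review by an engineer other than the author");
  }

  // Step 2: automated pull-request checks.
  if (!change.checks.typeCheck) failures.push("TypeScript type check failed");
  if (!change.checks.build) failures.push("build validation failed");

  // Step 3: deployment through the pipeline only.
  if (change.deployedVia !== "ci") failures.push("must be deployed via CI/CD");

  // Step 4: staged rollout through pre-production.
  if (!change.passedPreProduction) {
    failures.push("must pass the pre-production environment first");
  }

  return failures;
}

const selfApproved: ChangeRecord = {
  author: "alice",
  approvers: ["alice"],
  checks: { typeCheck: true, build: true },
  deployedVia: "ci",
  passedPreProduction: true,
};
console.log(gateFailures(selfApproved));
```

In this example the author's own approval does not count, so the change is held back by the review gate even though every other gate passes.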