Availability & Resilience
This page describes how Breeze remains available under normal load, recovers from component failures, and responds to incidents.
High availability
| Component | Resilience |
|---|---|
| MongoDB Atlas | 3-node replica set with automatic failover. Loss of any single node is transparent to the application. |
| Azure App Service (backend) | Containerized, restartable. Auto-scale is configured to add and remove instances based on load. |
| Azure Blob Storage | Azure-managed redundancy within Norway East. |
| Azure Cache for Redis | Azure-managed cache used for token validation and rate-limit enforcement. Treated as a critical-path dependency; cache-tier resilience is provided by the managed service. |
| Vercel (frontend) | Multi-region CDN; static assets served from edge. |
There are no single points of failure within the database or storage tiers. The backend application can be redeployed without data loss.
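The single-node tolerance of the 3-node replica set follows from majority voting: a primary can be elected only while a strict majority of voting members is reachable. A minimal sketch of that arithmetic is below; the helper names are illustrative and not part of any Breeze or MongoDB API.

```typescript
// Majority-quorum arithmetic for a replica set (illustrative helper names).

/** Votes needed to elect a primary: a strict majority of voting members. */
function majority(votingMembers: number): number {
  return Math.floor(votingMembers / 2) + 1;
}

/** Members that can be lost while a majority still remains. */
function faultTolerance(votingMembers: number): number {
  return votingMembers - majority(votingMembers);
}

console.log(majority(3)); // 2
console.log(faultTolerance(3)); // 1
```

With three members, two votes form a majority, so one node can fail and the remaining two can still elect a primary.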
Backups and point-in-time recovery
The MongoDB Atlas cluster is configured with continuous backup and point-in-time restore. Backups are:
- Encrypted at rest with the same key material as the primary cluster.
- Stored in the same region (Norway) as the primary, in line with the data-residency commitment.
- Restorable to any point within the configured retention window (per the Atlas backup configuration on the cluster).
Restoration is performed by the Sotera operations team in response to a verified incident or a customer support request requiring data recovery.
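As an illustration of what "any point within the configured retention window" means in practice, the sketch below checks whether a requested restore timestamp is eligible. The function name and the 7-day value are assumptions for the example only; the actual window is whatever is set in the Atlas backup configuration.

```typescript
// Checks whether a requested restore point falls inside a retention window.
// Hypothetical helper; the real window comes from the Atlas backup settings.

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function isWithinRetention(
  target: Date,
  now: Date,
  retentionDays: number
): boolean {
  const oldestRestorable = now.getTime() - retentionDays * MS_PER_DAY;
  const requested = target.getTime();
  // The restore point must be neither older than the window nor in the future.
  return requested >= oldestRestorable && requested <= now.getTime();
}

// Example with an assumed 7-day window (not the documented value).
const now = new Date("2024-06-10T00:00:00Z");
console.log(isWithinRetention(new Date("2024-06-05T12:00:00Z"), now, 7)); // true
console.log(isWithinRetention(new Date("2024-06-01T00:00:00Z"), now, 7)); // false
```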
Monitoring
The following monitoring is in place:
- MongoDB Atlas alerts on cluster events: failover, scaling actions, replica-set state changes, slow queries, storage thresholds, and replication lag.
- Azure Monitor on the App Service: CPU, memory, request latency, HTTP error rates, and platform health.
- Sentry on the application: error spikes, regressions in key flows, and unhandled exceptions.
- Uptime monitoring of public endpoints.
Alerts are routed to the engineering on-call rotation.
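To make the HTTP error-rate signal concrete, the sketch below shows one common way such an alert condition can be evaluated over a time window. It is not the Azure Monitor rule in use; the type, function names, 5% threshold, and minimum request count are all assumptions chosen for illustration.

```typescript
// Illustrative error-rate alert condition (not the production alert rule).

interface RequestWindow {
  total: number; // requests observed in the window
  serverErrors: number; // responses with a 5xx status
}

function errorRate(w: RequestWindow): number {
  return w.total === 0 ? 0 : w.serverErrors / w.total;
}

/**
 * Fires only when the window has enough traffic to be meaningful
 * and the error rate exceeds the threshold.
 */
function shouldAlert(
  w: RequestWindow,
  threshold: number,
  minRequests: number
): boolean {
  return w.total >= minRequests && errorRate(w) > threshold;
}

// Assumed values: 5% threshold, at least 100 requests in the window.
console.log(shouldAlert({ total: 1000, serverErrors: 60 }, 0.05, 100)); // true
console.log(shouldAlert({ total: 10, serverErrors: 5 }, 0.05, 100)); // false
```

The minimum request count keeps a handful of failures during a quiet period from paging the on-call engineer.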
Incident response
Sotera operates an internal incident-response process that covers detection, triage, containment, eradication, recovery, and post-incident review. Incidents that may have affected customer data trigger a notification to the responsible customer or partner contact in line with the applicable Data Processing Agreement and, where required, GDPR Article 33 obligations.
For details on specific incident-response commitments or breach-notification timeframes, refer to your DPA with Sotera or with your partner.
Change management
Production changes go through:
- Code review by at least one engineer other than the author.
- Automated checks on every pull request (TypeScript type checking, build validation).
- Deployment via CI/CD; manual ad-hoc deploys are not part of the normal flow.
- Staged rollout through a pre-production environment before reaching production.
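The four gates above can be read as a single precondition on a production release. The sketch below expresses them as a check that lists what is still missing; the `ChangeRecord` shape and the function name are hypothetical and do not describe Sotera's actual CI/CD tooling.

```typescript
// Hypothetical model of the release gates listed above.

interface ChangeRecord {
  author: string;
  approvers: string[];
  checksPassed: boolean; // type checking and build validation on the PR
  deployedVia: "ci" | "manual";
  verifiedInPreProduction: boolean;
}

/** Returns the unmet gates; an empty array means the change may proceed. */
function productionBlockers(change: ChangeRecord): string[] {
  const blockers: string[] = [];
  if (!change.approvers.some((a) => a !== change.author)) {
    blockers.push("review by an engineer other than the author");
  }
  if (!change.checksPassed) {
    blockers.push("automated pull-request checks");
  }
  if (change.deployedVia !== "ci") {
    blockers.push("deployment via CI/CD");
  }
  if (!change.verifiedInPreProduction) {
    blockers.push("pre-production rollout");
  }
  return blockers;
}
```

For example, a change approved only by its own author returns a single blocker naming the missing independent review.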