Availability and Recovery
Plan availability and recovery by matching your environment to the appropriate deployment topology, high availability baseline, Global Cluster Disaster Recovery path, and backup or restore procedure.
For detailed sizing, backup, restore, and disaster recovery procedures, follow the linked topic-specific pages. The guidance below focuses on planning choices and support boundaries.
Deployment Topology
Choose a topology by scenario rather than from a single production-minimum table.
For installation planning, see Plan and Prerequisites.
High Availability Baseline
For production-like environments, use 3 control plane nodes as the HA baseline. A 5 control plane node topology can improve scale and reliability for larger environments, but it is not a universal hard requirement for every production deployment.
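The difference between 3 and 5 control plane nodes can be illustrated with majority-quorum arithmetic. The sketch below assumes each control plane node hosts one etcd voting member (a stacked layout, which this page does not state), and the function names are illustrative only.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("members must be >= 1")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """Number of members that can be lost while a majority remains."""
    return members - quorum(members)


for n in (1, 3, 5):
    print(f"{n} control plane node(s): quorum={quorum(n)}, "
          f"tolerates {failure_tolerance(n)} failure(s)")
```

Under that assumption, 3 nodes tolerate the loss of one member and 5 nodes tolerate the loss of two, which is why 3 serves as the baseline and 5 is an optional improvement.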
Infra nodes or custom role nodes are useful for isolating platform components or high-load components, but they are not a universal requirement unless a sizing tier or component document says so. For Extra Large global cluster sizing, follow the sizing guidance for dedicated infra nodes.
Global Cluster Disaster Recovery
Global Cluster Disaster Recovery protects the platform management entry point and global control-plane services when the Primary global cluster becomes unavailable.
For 4.3, Global DR has the following scope:
- It uses Primary and Standby global clusters.
- It relies on real-time synchronization of resource state stored in the Primary global cluster etcd, except excluded namespaces.
- It restores the platform entry point and global control-plane services by switching DNS or VIP access to the Standby cluster.
- Primary and Standby should follow the validated path of aligned versions, patches, component versions, and key configuration.
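The alignment requirement in the last item can be checked mechanically once both clusters' versions are collected. The following is a minimal sketch, assuming each cluster's versions have already been gathered into a flat dictionary; the keys, version strings, and the `find_misaligned` helper are hypothetical and not part of the product.

```python
from typing import Optional


def find_misaligned(
    primary: dict[str, str], standby: dict[str, str]
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return entries whose values differ, or exist on only one side."""
    diffs = {}
    for key in sorted(primary.keys() | standby.keys()):
        p, s = primary.get(key), standby.get(key)
        if p != s:
            diffs[key] = (p, s)
    return diffs


# Hypothetical inventories; every key and value here is made up.
primary = {"platform": "4.3.0", "patch": "p1", "component-a": "1.2.0"}
standby = {"platform": "4.3.0", "patch": "p1", "component-a": "1.1.9"}

print(find_misaligned(primary, standby))
```

An empty result means the two inventories match. Any entry returned shows the (Primary, Standby) pair that should be reconciled before relying on the Standby cluster.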
Do not treat Global DR as full platform data DR, application data DR, automatic failover, or an SLA-backed RPO/RTO commitment. Global DR does not cover registry data, chartmuseum data, other component data, application data, or resources excluded from etcd synchronization.
For the procedure and supported scenarios, see Global Cluster Disaster Recovery.
Backup And Restore Paths
Recovery is assembled per scenario. Each mechanism protects a specific data domain and has its own procedure, prerequisites, and limitations.
Application backup can protect namespaces, Kubernetes resources, and persistent volume data according to the backup configuration. It does not support every storage or application data pattern. For example, hostPath PersistentVolumes are not supported by the documented application backup path, and database workloads should follow data-service-specific backup guidance.
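Before relying on application backup, it can help to list the PersistentVolumes that use hostPath. The sketch below assumes the PersistentVolume manifests are already parsed into dictionaries (for example, the `items` array of `kubectl get pv -o json`); the helper name and the sample volumes are hypothetical.

```python
def hostpath_volumes(pv_manifests: list[dict]) -> list[str]:
    """Return names of PersistentVolumes whose spec uses a hostPath source."""
    names = []
    for pv in pv_manifests:
        if pv.get("kind") != "PersistentVolume":
            continue
        if "hostPath" in pv.get("spec", {}):
            names.append(pv.get("metadata", {}).get("name", "<unnamed>"))
    return names


# Sample manifests for illustration only.
pvs = [
    {"kind": "PersistentVolume", "metadata": {"name": "pv-local-logs"},
     "spec": {"capacity": {"storage": "10Gi"},
              "hostPath": {"path": "/data/logs"}}},
    {"kind": "PersistentVolume", "metadata": {"name": "pv-nfs-app"},
     "spec": {"capacity": {"storage": "50Gi"},
              "nfs": {"server": "nfs.example.internal", "path": "/exports/app"}}},
]

print(hostpath_volumes(pvs))
```

Volumes reported by such a check need a different protection path, because the documented application backup path does not support them.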
Recovery Checklist
Use this checklist to choose follow-up work. It does not replace the linked procedures:
- Choose Single Cluster, Multi-Cluster, or Single Node during installation planning.
- Size the global cluster and workload clusters from the scalability guidance.
- Decide whether Global Cluster Disaster Recovery is required before installing Core.
- Configure backups for etcd, registry data, monitoring data, logging data, and applications according to the data domains you need to recover.
- Run regular recovery checks and failover drills where your operational process requires them.
- Verify platform access, global services, connected cluster access, and component-level recovery after failover or restore.
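The backup item in the checklist amounts to comparing the data domains you need to recover with the domains that already have a configured backup. A minimal sketch of that comparison follows; the domain labels reuse the names from the checklist, and the function and sample sets are hypothetical.

```python
def missing_backups(required: set[str], configured: set[str]) -> list[str]:
    """Return required data domains that have no configured backup."""
    return sorted(required - configured)


# Example planning input; adjust both sets to your environment.
required = {"etcd", "registry data", "applications"}
configured = {"etcd"}

print(missing_backups(required, configured))
```

Each domain returned still needs its own backup procedure from the linked topic-specific pages.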
For the next documentation path, see Learn More.