🔎 Understanding Oracle RAC Voting Disk and Split-Brain Resolution
Oracle RAC (Real Application Clusters) ensures high availability and data integrity even when multiple nodes access the same database. A crucial part of this is the Voting Disk, which works closely with the Cluster Synchronization Services (CSS) daemon.
In this blog, we will explore:
- What is a Voting Disk?
- Function of CSS and Voting Disk
- What is Split-Brain?
- How Oracle resolves Split-Brain
- Real-world example with resolution
📀 What is a Voting Disk in Oracle RAC?
The Voting Disk is a shared disk file used by Oracle Clusterware to monitor and manage cluster node membership.
- It stores heartbeat information of all nodes.
- Determines which nodes are active and should remain in the cluster.
- Nodes failing to update are considered unhealthy and may be evicted.
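To make the heartbeat idea concrete, here is a minimal sketch of a voting disk as one timestamp slot per node. The class `VotingDiskModel`, its methods, and the 30-second threshold are all hypothetical and chosen for illustration. They are not Oracle's on-disk format or a documented default.

```python
from dataclasses import dataclass, field


@dataclass
class VotingDiskModel:
    """Toy model of a voting disk: one heartbeat timestamp per node."""

    timeout_s: float  # illustrative staleness threshold, not an Oracle default
    last_heartbeat: dict[str, float] = field(default_factory=dict)

    def write_heartbeat(self, node: str, now: float) -> None:
        # Each node overwrites its own slot with the current time.
        self.last_heartbeat[node] = now

    def stale_nodes(self, now: float) -> list[str]:
        # Nodes whose last update is older than the threshold are eviction candidates.
        return sorted(
            node
            for node, seen in self.last_heartbeat.items()
            if now - seen > self.timeout_s
        )


disk = VotingDiskModel(timeout_s=30.0)
for node in ("node1", "node2", "node3"):
    disk.write_heartbeat(node, now=0.0)

# node3 stops updating; the other two keep writing.
disk.write_heartbeat("node1", now=25.0)
disk.write_heartbeat("node2", now=25.0)

print(disk.stale_nodes(now=40.0))  # ['node3']
```

In this model a node is only flagged once its slot is older than the threshold, so a single late write does not make it an eviction candidate.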
🔄 Function of CSS with Voting Disk
The Cluster Synchronization Services (CSS) process ensures synchronization and node membership integrity in the cluster.
- Each node's `CSSD` daemon writes heartbeat data to the voting disk.
- If a node cannot write its heartbeat or loses connectivity, it risks eviction.
- CSS uses the voting disk to decide which nodes form the majority (quorum).
- Only the nodes forming quorum survive and continue running.
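The quorum rule above can be sketched as a strict-majority check: a node stays in the cluster only while it can reach more than half of the configured voting disks. The function name `has_disk_majority` is hypothetical, and the sketch deliberately ignores timing and retries.

```python
def has_disk_majority(accessible: int, total: int) -> bool:
    """Return True when a node reaches strictly more than half of the voting disks."""
    if total < 1 or not 0 <= accessible <= total:
        raise ValueError("expected 0 <= accessible <= total and total >= 1")
    # Strict majority: exactly half is not enough.
    return 2 * accessible > total


print(has_disk_majority(2, 3))  # True
print(has_disk_majority(1, 3))  # False
print(has_disk_majority(2, 4))  # False
```

Note the last case: with four disks, reaching two of them is exactly half and does not count as a majority.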
⚔️ What is Split-Brain?
Split-Brain occurs when two or more nodes lose communication but continue operating, which can lead to data corruption if both try accessing shared storage independently.
Oracle RAC uses voting disks and quorum logic to ensure only one side of a partitioned cluster survives.
🦐 Split-Brain Resolution Using Voting Disk
When a communication failure occurs between nodes:
- CSS on each node uses voting disk access to determine quorum.
- The partition with the majority of votes stays active.
- The other nodes are evicted to protect data integrity.
🔍 Example: 3-Node RAC Cluster Split-Brain
Assume we have a 3-node RAC setup:
- Node1, Node2, Node3
- 3 Voting Disks shared among nodes
Failure Scenario: Node3 becomes isolated due to interconnect failure.
Outcome: Node1 and Node2 maintain communication and access the majority of voting disks.
- Node1 & Node2: Form majority (2 of 3). Remain active.
- Node3: Cannot communicate. Lacks quorum. Gets evicted.
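The scenario above can be replayed with a small simulation. This is a sketch under simplifying assumptions, not Clusterware's actual algorithm: the largest sub-cluster survives, and a tie is broken by the lowest node name. Real tie-breaking depends on the Clusterware version and is not modelled here. The function `resolve_split` is hypothetical.

```python
def resolve_split(partitions: list[set[str]]) -> tuple[set[str], set[str]]:
    """Pick the surviving sub-cluster; every node outside it is evicted."""
    # Largest partition first; on a tie, the one holding the lowest node name
    # (an illustrative assumption, see the lead-in).
    survivors = min(partitions, key=lambda p: (-len(p), min(p)))
    evicted = set().union(*(p for p in partitions if p is not survivors))
    return survivors, evicted


# Node3 loses the interconnect; Node1 and Node2 still see each other.
partitions = [{"Node1", "Node2"}, {"Node3"}]
survivors, evicted = resolve_split(partitions)

print(sorted(survivors))  # ['Node1', 'Node2']
print(sorted(evicted))    # ['Node3']
```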
Decision Summary Table
Component | Purpose |
---|---|
Voting Disk | Tracks heartbeats, helps determine active cluster nodes |
CSS Daemon | Handles node membership and eviction |
Split-Brain | Resolved using quorum logic via voting disks |
🛡 Best Practices
- ✅ Use an odd number of voting disks (3 or 5): a node must reach more than half of them to stay in the cluster, so an even count adds no extra failure tolerance.
- ⚡ Ensure voting disks are placed on highly available shared storage (e.g., ASM).
- 📅 Regularly monitor cluster health and evictions using `crsctl query css votedisk` and `crsctl status res -t`.
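The odd-number recommendation follows from simple arithmetic. Assuming a node needs a strict majority of voting disks, the number of disk failures a configuration can absorb is `(total - 1) // 2`. The helper name `tolerated_disk_failures` below is hypothetical.

```python
def tolerated_disk_failures(total: int) -> int:
    """Disk losses that still leave a strict majority of voting disks reachable."""
    if total < 1:
        raise ValueError("total must be at least 1")
    return (total - 1) // 2


for total in range(1, 6):
    print(f"{total} voting disk(s): tolerates {tolerated_disk_failures(total)} failure(s)")
```

Going from 3 to 4 disks leaves the tolerated failure count at 1, while going to 5 raises it to 2, which is why odd counts are the usual choice.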
🔬 Conclusion
The collaboration between the CSS daemon and voting disk is fundamental to Oracle RAC's high availability design.
- Voting Disks record heartbeats and help determine node membership.
- CSS ensures cluster consistency, evicting unhealthy or split nodes.
- Split-Brain is automatically resolved using majority logic to prevent data corruption.
Maintain a healthy voting disk configuration and let Oracle handle the rest!