🧠 Understanding Oracle RAC Voting Disk and Split-Brain with Real-World Example

Oracle RAC (Real Application Clusters) ensures high availability and data integrity even when multiple nodes access the same database. A crucial part of this is the Voting Disk, which works closely with the Cluster Synchronization Services (CSS) daemon.

In this blog, we will explore:

  • What is a Voting Disk?
  • Function of CSS and Voting Disk
  • What is Split-Brain?
  • How Oracle resolves Split-Brain
  • Real-world example with resolution

📀 What is a Voting Disk in Oracle RAC?

The Voting Disk is a shared disk file used by Oracle Clusterware to monitor and manage cluster node membership.

  • It stores disk heartbeat information for every node.
  • CSS uses it to determine which nodes are active and should remain in the cluster.
  • A node that stops updating its heartbeat within the disk timeout (disktimeout, 200 seconds by default) is considered unhealthy and may be evicted.

🔄 Function of CSS with Voting Disk

The Cluster Synchronization Services (CSS) process ensures synchronization and node membership integrity in the cluster.

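  • The CSS daemon (ocssd) on each node writes a disk heartbeat to the voting disks and sends a network heartbeat to the other nodes over the private interconnect.
  • If a node misses network heartbeats or cannot write its disk heartbeat in time, it risks eviction.
  • A node must be able to access more than half of the configured voting disks; a node that cannot is evicted.
  • Only the nodes that retain quorum survive and continue running.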
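
The majority rule for voting disk access can be sketched in a few lines of Python. This is only an illustrative model, not Oracle code: the function name has_voting_disk_majority and its parameters are hypothetical, and the only rule it encodes is that a node needs access to more than half of the configured voting disks.

```python
def has_voting_disk_majority(accessible_disks: int, total_disks: int) -> bool:
    """Return True when a node can reach a strict majority of voting disks.

    Hypothetical helper for illustration; Oracle Clusterware performs this
    check internally inside the CSS daemon.
    """
    if total_disks <= 0:
        raise ValueError("total_disks must be positive")
    if not 0 <= accessible_disks <= total_disks:
        raise ValueError("accessible_disks must be between 0 and total_disks")
    # "More than half" means a strict majority, so 2 of 4 is not enough.
    return 2 * accessible_disks > total_disks


if __name__ == "__main__":
    for accessible in range(4):
        verdict = "stays" if has_voting_disk_majority(accessible, 3) else "is evicted"
        print(f"node sees {accessible} of 3 voting disks -> {verdict}")
```

With three voting disks, a node that loses one disk still holds a majority, while a node that loses two does not.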

⚔️ What is Split-Brain?

Split-Brain occurs when groups of nodes lose communication with each other over the interconnect but each group keeps operating as if it were the whole cluster. If both sides then write to shared storage independently, the data can be corrupted.

Oracle RAC uses voting disks and quorum logic to ensure only one side of a partitioned cluster survives.

🦐 Split-Brain Resolution Using Voting Disk

When a communication failure occurs between nodes:

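  • CSS on each node uses the voting disks to learn which nodes it can no longer reach over the interconnect and which sub-clusters have formed.
  • The sub-cluster with the most nodes stays active. If the sub-clusters are the same size, a tie-breaker applies (classically, the sub-cluster containing the lowest-numbered node survives).
  • The nodes in the other sub-clusters are evicted to protect data integrity.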
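
The survivor selection can be modeled roughly as follows. This is a simplified sketch under stated assumptions, not Oracle's implementation: the function surviving_subcluster is hypothetical, nodes are represented by their node numbers, and the tie-breaker shown (the sub-cluster holding the lowest node number wins) is the classic rule; newer Clusterware releases can weigh additional factors.

```python
def surviving_subcluster(subclusters: list[set[int]]) -> set[int]:
    """Pick the sub-cluster that survives a network partition.

    Simplified model: the largest sub-cluster wins; on a size tie, the
    sub-cluster containing the lowest node number wins.
    """
    if not subclusters or any(not group for group in subclusters):
        raise ValueError("need at least one non-empty sub-cluster")
    # Sort key: bigger is better, then a lower minimum node number is better.
    return max(subclusters, key=lambda group: (len(group), -min(group)))


if __name__ == "__main__":
    print(surviving_subcluster([{1, 2}, {3}]))   # larger side survives
    print(surviving_subcluster([{1}, {2}]))      # tie: lowest node number
```

Every node outside the returned set would be evicted in this model.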

🔍 Example: 3-Node RAC Cluster Split-Brain

Assume we have a 3-node RAC setup:

  • Node1, Node2, Node3
  • 3 Voting Disks shared among nodes

Failure Scenario: Node3 becomes isolated due to interconnect failure.

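Outcome: Node1 and Node2 still communicate with each other and form the larger sub-cluster. Node3 may still be able to reach the voting disks, but it is alone in its sub-cluster.

  • Node1 & Node2: Form the majority (2 of 3 nodes). Remain active.
  • Node3: Cannot communicate with the other nodes. Lacks quorum. Gets evicted.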
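
The 3-node scenario can be replayed with a small simulation. It is a sketch for illustration only: the partitions function and the working_links list are hypothetical names, and the model simply groups nodes by the interconnect links that still work and keeps the largest group.

```python
def partitions(nodes, links):
    """Group nodes into sub-clusters joined by working interconnect links."""
    remaining = set(nodes)
    groups = []
    while remaining:
        start = remaining.pop()
        group = {start}
        frontier = [start]
        while frontier:
            current = frontier.pop()
            for a, b in links:
                if current in (a, b):
                    neighbor = b if current == a else a
                    if neighbor in remaining:
                        remaining.remove(neighbor)
                        group.add(neighbor)
                        frontier.append(neighbor)
        groups.append(group)
    return groups


nodes = ["Node1", "Node2", "Node3"]
# Node3's interconnect is down, so only the Node1 <-> Node2 link still works.
working_links = [("Node1", "Node2")]

groups = partitions(nodes, working_links)
survivors = max(groups, key=len)   # the larger sub-cluster stays up
evicted = set(nodes) - survivors

print("survivors:", sorted(survivors))
print("evicted:", sorted(evicted))
```

Running it reports Node1 and Node2 as survivors and Node3 as evicted, matching the outcome described above.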

Decision Summary Table

Component    | Purpose
Voting Disk  | Tracks heartbeats, helps determine active cluster nodes
CSS Daemon   | Handles node membership and eviction
Split-Brain  | Resolved using quorum logic via voting disks

🛡 Best Practices

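  • ✅ Use an odd number of voting disks (3 or 5). A node needs access to more than half of them, so an even count adds no extra failure tolerance over the next lower odd count.
  • ⚡ Place voting disks on highly available shared storage (e.g., ASM).
  • 📅 Regularly check voting disk status with crsctl query css votedisk and cluster resource state with crsctl status res -t, and review the Clusterware alert log for node evictions.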
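
The reasoning behind the odd-number recommendation comes down to simple arithmetic, sketched here in Python. The function name tolerated_disk_failures is hypothetical; the only assumption is that a node must keep access to a strict majority of the voting disks.

```python
def tolerated_disk_failures(total_disks: int) -> int:
    """How many voting disks can fail while a strict majority remains."""
    if total_disks <= 0:
        raise ValueError("total_disks must be positive")
    # A majority needs floor(n/2) + 1 disks, so (n - 1) // 2 may be lost.
    return (total_disks - 1) // 2


if __name__ == "__main__":
    for count in range(1, 6):
        print(f"{count} voting disk(s): survives loss of "
              f"{tolerated_disk_failures(count)}")
```

In this model, four voting disks tolerate only one failure, the same as three, which is why the extra even disk buys nothing.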

🔬 Conclusion

The collaboration between the CSS daemon and voting disk is fundamental to Oracle RAC's high availability design.

  • Voting Disks record the heartbeats that CSS uses to decide node membership.
  • CSS ensures cluster consistency, evicting unhealthy or split nodes.
  • Split-Brain is automatically resolved using majority logic to prevent data corruption.

Maintain a healthy voting disk configuration and let Oracle handle the rest!
