In a RAC environment, Oracle adds a link-local IP address from the 169.254.0.0/16 subnet as an alias on the same network interface that carries the private IP address. This address is the HAIP (Highly Available IP), and it is used for private interconnect communication among the cluster nodes. During a recent upgrade from Oracle 10.2.0.4 to 126.96.36.199 on AIX 6.1 that we were involved in, we found it impossible to start both instances of a two-node cluster at the same time. On drilling down, we found that the HAIP resource was offline on the first node.
./crsctl stat res ora.cluster_interconnect.haip -init
STATE=OFFLINE on pdc-rac-node1
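This check is easy to script when it has to be repeated on each node. What follows is a minimal sketch, not part of our original procedure: haip_state is a hypothetical helper that reads crsctl output on standard input and prints only the state word.

```shell
# haip_state: hypothetical helper. Reads "crsctl stat res ... -init" output
# on stdin and prints the first word of the STATE line (ONLINE, OFFLINE, ...).
haip_state() {
    awk -F= '$1 == "STATE" { split($2, w, " "); print w[1]; exit }'
}

# Intended use, run from the Grid home bin directory as above:
#   ./crsctl stat res ora.cluster_interconnect.haip -init | haip_state
```

Anything other than ONLINE on a node is a reason to look at the cluster log on that node.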
We also identified the following repeating error in the cluster log:
/usr/oracle/GRID_HOME/bin/orarootagent.bin(6357518)]CRS-5027:Agent failed to initialize ARP devices required for starting the HAIP resource
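To see how often the error repeats, a simple count is enough. This is a rough sketch: count_crs5027 is a hypothetical helper, and the log path is left to the caller because it depends on your Grid home and host name.

```shell
# count_crs5027: hypothetical helper. Prints how many lines of the given
# log file mention CRS-5027.
count_crs5027() {
    # grep exits non-zero when nothing matches; "|| true" keeps that from
    # aborting a script that runs under "set -e". The count is still printed.
    grep -c 'CRS-5027' "$1" || true
}

# Intended use (substitute the path of your own cluster log):
#   count_crs5027 /path/to/cluster/log
```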
After a series of troubleshooting steps, we found that, for some reason, the private interface on both nodes in the cluster did not actually have the private IP address bound. The address showed up on the interface when we ran ifconfig -a, but when we examined the same interface in smitty, the Network Address field was blank and the interface state read DOWN!
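The same mismatch can be checked from the command line by comparing the live address with the one stored in the ODM, which is what smitty displays. This is only a sketch under assumptions: en1 stands in for the private interface, live_addr and compare_addr are hypothetical helpers, and we did not use this script ourselves.

```shell
# live_addr: hypothetical helper. Reads "ifconfig <iface>" output on stdin
# and prints the first IPv4 address it finds.
live_addr() {
    awk '$1 == "inet" { print $2; exit }'
}

# compare_addr: hypothetical helper. Takes the live address and the ODM
# address and reports whether they agree.
compare_addr() {
    live=$1
    odm=$2
    if [ -z "$odm" ]; then
        echo "MISMATCH: live=$live, ODM netaddr is blank"
    elif [ "$live" != "$odm" ]; then
        echo "MISMATCH: live=$live, ODM=$odm"
    else
        echo "OK: $live"
    fi
}

# Intended use on AIX (en1 is an assumed interface name):
#   compare_addr "$(ifconfig en1 | live_addr)" \
#                "$(lsattr -El en1 -a netaddr -F value)"
```

A blank ODM value alongside a live address is the situation we ran into.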
We took three steps in smitty to work around this issue:
- Manually applied the private IP address
- Added the HAIP as an alias on the interface
- Rebooted the server
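We did all of this through the smitty menus. For anyone who prefers the command line, the steps roughly correspond to chdev calls. The sketch below is our assumption of the equivalent commands, not something we ran: print_fix_commands is a hypothetical helper, and the interface name, addresses, and netmasks are placeholders. It only prints the commands so they can be reviewed before anyone runs them as root.

```shell
# print_fix_commands: hypothetical helper. Prints (does not run) the AIX
# commands that roughly match the three smitty steps.
# Arguments: interface, private IP, private netmask, HAIP address, HAIP netmask.
print_fix_commands() {
    iface=$1
    priv_ip=$2
    priv_mask=$3
    haip_ip=$4
    haip_mask=$5
    # Step 1: store the private address in the ODM and bring the interface up.
    echo "chdev -l $iface -a netaddr=$priv_ip -a netmask=$priv_mask -a state=up"
    # Step 2: add the HAIP address as an IPv4 alias on the same interface.
    echo "chdev -l $iface -a alias4=$haip_ip,$haip_mask"
    # Step 3: reboot the server.
    echo "shutdown -Fr"
}

# Example with placeholder values only:
#   print_fix_commands en1 10.1.1.1 255.255.255.0 169.254.10.10 255.255.0.0
```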
Once this was done, we were able to bring the HAIP online and start the instances on both nodes. We also set an environment variable that Oracle sent us, but I am not sure it contributed much. Hope this helps someone.