Upgrade/Install of Grid Infrastructure to 22.214.171.124 is failing
By Martin | September 29th, 2010 | Category: 11gR2, Linux, Unix
I have run into a rather difficult issue when trying to install or upgrade to Grid Infrastructure 126.96.36.199 on Linux. We made the following attempts:
- We installed 188.8.131.52 so as to keep the cluster files (OCR/voting disk) outside of ASM, then upgraded to 184.108.40.206. After that, we installed all the latest patches (PSU 220.127.116.11.2). At first we considered running RDBMS 18.104.22.168 in an 11.2 Grid Infrastructure cluster. However, this mixed setup has quite a few issues. For most of them, patches or workarounds exist (see Note 948456.1). During failover testing, though, we found that the remote database instance does not survive a failure of the public LAN on the local node. This is fixed in 22.214.171.124, which motivated us to upgrade.
- We tried to upgrade our GI 126.96.36.199.2 installation to 188.8.131.52. Everything went fine until rootupgrade.sh failed on the last (second) node: the second node could not join the cluster. It is important to know that 184.108.40.206 introduces the new HAIP (Highly Available IP) feature, which lets Grid Infrastructure use multiple private interconnect interfaces (each with a different private interconnect network IP) for failover and load balancing. For this feature, Oracle uses multicast on network “220.127.116.11”. This is stated in the updated version of the README for Oracle Database 11.2: http://download.oracle.com/docs/cd/E11882_01/readmes.112/e17129/toc.htm#CHDIEHCH
ACFS-9309: ADVM/ACFS installation correctness verified.
Failed to start Oracle Clusterware stack
Failed to start Cluster Synchorinisation Service in clustered mode at /opt/oracle/grid11/18.104.22.168/grid/crs/install/crsconfig_lib.pm line 1016.
/opt/oracle/grid11/22.214.171.124/grid/perl/bin/perl -I/opt/oracle/grid11/126.96.36.199/grid/perl/lib -I/opt/oracle/grid11/188.8.131.52/grid/crs/install /opt/oracle/grid11/184.108.40.206/grid/crs/install/rootcrs.pl execution failed
- The next attempt was to remove everything and install 220.127.116.11 directly, with the cluster files stored inside ASM. This worked fine, and we then applied the latest PSU 18.104.22.168.2. After that, we tried the upgrade to 22.214.171.124 again. The same problem occurred at rootupgrade.sh on the second node.
- Then we decided to remove everything again and try a direct installation of 126.96.36.199 without any upgrade. This also failed, at root.sh on the last node.
Thanks to hints from several colleagues on the OTN forum (http://forums.oracle.com/forums/messageview.jspa?messageID=5393319&stqc=true), we realized that the problem might be related to the multicast setup. We verified that multicast works using the script from MetaLink Note 413783.1. However, this script does not take into account that we want multicast traffic to go via the private interconnect interface instead of the public LAN.
I realized that it might be necessary to add a network route to our system to direct multicast to the private interconnect network:
e.g.: /sbin/route add -net 224.0.0.0 netmask 240.0.0.0 dev bond1
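The netmask 240.0.0.0 in the route command is a /4 prefix. Paired with network 224.0.0.0, it matches the whole IPv4 multicast range, 224.0.0.0 through 239.255.255.255, so every multicast group is sent out through the named device. The following is only a small sketch of that arithmetic. The class and method names are made up for illustration, and it handles IPv4 literals only.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative helper (hypothetical names): checks whether a route entry
// given as network + netmask would match a destination address.
final class RouteCoverage {

    // Convert a dotted IPv4 literal into a 32-bit integer.
    static int toInt(String dotted) {
        try {
            byte[] b = InetAddress.getByName(dotted).getAddress();
            return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
                 | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot parse " + dotted, e);
        }
    }

    // A route matches when address and network agree on all bits set in the mask.
    static boolean covers(String network, String netmask, String address) {
        int mask = toInt(netmask);
        return (toInt(address) & mask) == (toInt(network) & mask);
    }

    public static void main(String[] args) {
        // Top of the multicast range is covered by the route.
        System.out.println(covers("224.0.0.0", "240.0.0.0", "239.255.255.255"));
        // A unicast interconnect address is not.
        System.out.println(covers("224.0.0.0", "240.0.0.0", "10.128.128.1"));
    }
}
```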
After doing that, the multicast test script from MOS succeeded.
Starting the receiver:
[grid11@Node2 ]$ java MultiCastTestReceive
Starting the sender:
[grid11@Node1 ]$ java MultiCastTestSend
Sent 10 bytes to /184.108.40.206
Message received by the receiver:
Received data from: /10.128.128.1:13139 with length: 10
Please note that in this case 10.128.128.1 is the private interconnect IP. Although it is currently just a strong suspicion, I am fairly confident that this was the problem. We will consider retrying the upgrade if time allows and will report the results.
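The MOS test script leaves the choice of interface to the routing table. A complementary check is to bind the multicast test explicitly to the interconnect interface, so the result does not depend on which route happens to exist. The following is only a rough sketch of that idea using the standard java.net classes. It is not the MOS script. The class name, argument order and payload are my own assumptions, and the interface name (for example bond1), group address and port must be supplied on the command line.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

// Sketch only (hypothetical class): multicast send/receive pinned to one interface.
final class InterfaceMulticastProbe {

    // Parse an address literal and reject anything that is not a multicast group.
    static InetAddress parseGroup(String literal) {
        try {
            InetAddress addr = InetAddress.getByName(literal);
            if (!addr.isMulticastAddress()) {
                throw new IllegalArgumentException(literal + " is not a multicast address");
            }
            return addr;
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot parse " + literal, e);
        }
    }

    // Usage: java InterfaceMulticastProbe send|receive <interface> <group> <port>
    public static void main(String[] args) throws Exception {
        if (args.length != 4 || !(args[0].equals("send") || args[0].equals("receive"))) {
            System.err.println("usage: send|receive <interface> <group> <port>");
            return;
        }
        NetworkInterface nif = NetworkInterface.getByName(args[1]);
        if (nif == null) {
            throw new IllegalArgumentException("no such interface: " + args[1]);
        }
        InetAddress group = parseGroup(args[2]);
        int port = Integer.parseInt(args[3]);

        if (args[0].equals("receive")) {
            try (MulticastSocket socket = new MulticastSocket(port)) {
                // Join the group on the chosen interface only.
                socket.joinGroup(new InetSocketAddress(group, port), nif);
                byte[] buf = new byte[256];
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet); // blocks until a datagram arrives
                System.out.println("Received data from: " + packet.getSocketAddress()
                        + " with length: " + packet.getLength());
            }
        } else {
            try (MulticastSocket socket = new MulticastSocket()) {
                // Force outgoing multicast through the chosen interface.
                socket.setNetworkInterface(nif);
                byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
                socket.send(new DatagramPacket(data, data.length, group, port));
                System.out.println("Sent " + data.length + " bytes to " + group);
            }
        }
    }
}
```

Start the receiver on one node first, then the sender on the other, both with the same interface, group and port arguments. If the receiver prints the sender's private interconnect address, multicast works over that interface.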
Unfortunately, Oracle support was of no help at all with this problem.