Upgrade/Install of Grid Infrastructure to 11.2.0.2 is failing
By Martin | September 29th, 2010 | Category: 11gR2, Linux, Unix
I ran into a rather difficult issue when trying to install or upgrade to Grid Infrastructure 11.2.0.2 on Linux. We made the following attempts:
- We installed 11.1.0.6 to keep the cluster files (OCR/Votedisk) outside of ASM, then upgraded to 11.2.0.1 and applied the latest patches (PSU 11.2.0.1.2). Initially we considered running RDBMS 11.1.0.7 in an 11.2 Grid Infrastructure cluster. This mixed setup has quite a few issues, although patches or workarounds exist for most of them (see Note 948456.1). During failover testing, however, we found that the remote database instance does not survive a failure of the public LAN on the local node. This is fixed in 11.2.0.2, which motivated us to upgrade.
- We tried to upgrade our GI 11.2.0.1.2 installation to 11.2.0.2. Everything went fine until rootupgrade.sh failed on the last (second) node: the second node could not join the cluster. It is important to know that 11.2.0.2 introduces the new HAIP feature (Highly Available IP), which lets Grid Infrastructure use multiple private interconnect interfaces (each with a different private interconnect network IP) for failover and load balancing. For this feature, Oracle uses multicast on network “230.0.1.0”. This is stated in the updated version of the README for Oracle Database 11.2: http://download.oracle.com/docs/cd/E11882_01/readmes.112/e17129/toc.htm#CHDIEHCH
rootupgrade.sh reported:
    ACFS-9309: ADVM/ACFS installation correctness verified.
    Failed to start Oracle Clusterware stack
    Failed to start Cluster Synchorinisation Service in clustered mode at /opt/oracle/grid11/11.2.0.2/grid/crs/install/crsconfig_lib.pm line 1016.
    /opt/oracle/grid11/11.2.0.2/grid/perl/bin/perl -I/opt/oracle/grid11/11.2.0.2/grid/perl/lib -I/opt/oracle/grid11/11.2.0.2/grid/crs/install /opt/oracle/grid11/11.2.0.2/grid/crs/install/rootcrs.pl execution failed
- Next, we removed everything and installed 11.2.0.1 directly, with the cluster files stored inside ASM. This worked fine, and we then applied the latest PSU (11.2.0.1.2). After that we tried the upgrade to 11.2.0.2 again. The same problem occurred at rootupgrade.sh on the second node.
- Then we decided to remove everything again and try a direct installation of 11.2.0.2 without any upgrade. This also failed, at root.sh on the last node.
With hints from several colleagues on the OTN forum (http://forums.oracle.com/forums/messageview.jspa?messageID=5393319&stqc=true), we realized that the problem might be related to the multicast setup. We verified that multicast works with the script from MetaLink Note 413783.1. However, this script does not take into account that we want multicast over the private interconnect interface rather than the public LAN.
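To check multicast on one specific interface regardless of the routing table, a test program can pin the socket to that interface itself. The following is only a minimal sketch of that idea, not the script from the MetaLink note: the class name PinnedMulticastTest, the port 42424 and the payload are assumptions made for illustration, and only the group 230.0.1.0 comes from the README. Note that this only helps for testing; it does not change which interface Grid Infrastructure itself uses.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

public class PinnedMulticastTest {
    // Group taken from the README; the port is an arbitrary assumption.
    static final String GROUP = "230.0.1.0";
    static final int PORT = 42424;

    // True if the given IP literal is a multicast (class D) address.
    public static boolean isMulticastGroup(String address) {
        try {
            return InetAddress.getByName(address).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    static NetworkInterface lookup(String ifName) throws Exception {
        NetworkInterface nic = NetworkInterface.getByName(ifName);
        if (nic == null) {
            throw new IllegalArgumentException("no such interface: " + ifName);
        }
        return nic;
    }

    // Send one datagram to the group through the named interface.
    static void send(String ifName) throws Exception {
        byte[] payload = "hello-haip".getBytes(StandardCharsets.US_ASCII);
        try (MulticastSocket socket = new MulticastSocket()) {
            // Select the outgoing interface explicitly instead of relying on a route.
            socket.setNetworkInterface(lookup(ifName));
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(GROUP), PORT));
            System.out.println("Sent " + payload.length + " bytes to /" + GROUP
                    + " via " + ifName);
        }
    }

    // Join the group on the named interface and wait for one datagram.
    static void receive(String ifName) throws Exception {
        NetworkInterface nic = lookup(ifName);
        InetSocketAddress group = new InetSocketAddress(InetAddress.getByName(GROUP), PORT);
        try (MulticastSocket socket = new MulticastSocket(PORT)) {
            socket.joinGroup(group, nic);
            byte[] buffer = new byte[1500];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            socket.receive(packet); // blocks until a datagram arrives
            System.out.println("Received data from: " + packet.getSocketAddress()
                    + " with length: " + packet.getLength());
            socket.leaveGroup(group, nic);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: java PinnedMulticastTest send|receive <interface>");
            return;
        }
        if ("send".equals(args[0])) {
            send(args[1]);
        } else {
            receive(args[1]);
        }
    }
}
```

With this sketch, one would start "java PinnedMulticastTest receive bond1" on one node and "java PinnedMulticastTest send bond1" on the other, where bond1 stands for the private interconnect interface.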
I realized that it might be necessary to add a network route to our system to direct multicast to the private interconnect network:
e.g.: /sbin/route add -net 224.0.0.0 netmask 240.0.0.0 dev bond1
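The route above covers the whole IPv4 multicast range, so it also matches the HAIP group 230.0.1.0: 230 AND 240 gives 224 in the first octet, and the remaining octets are masked to zero. As a small illustration of that arithmetic (the class and method names are made up for this example), the check can be written as:

```java
public class RouteCoverage {
    // Convert a dotted-quad IPv4 literal into a 32-bit integer.
    public static int toInt(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + dotted);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("bad octet in: " + dotted);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // True if addr falls inside the network described by net and mask.
    public static boolean covers(String net, String mask, String addr) {
        int m = toInt(mask);
        return (toInt(addr) & m) == (toInt(net) & m);
    }

    public static void main(String[] args) {
        // Multicast group used by HAIP according to the README.
        System.out.println(covers("224.0.0.0", "240.0.0.0", "230.0.1.0"));    // true
        // Unicast address of the private interconnect: not matched by this route.
        System.out.println(covers("224.0.0.0", "240.0.0.0", "10.128.128.1")); // false
    }
}
```

A narrower route for the 230.0.1.0 group alone might also be enough, but that is not something we tested.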
After adding this route, the multicast test script from MOS succeeded.
starting receiver:
[grid11@Node2 ]$ java MultiCastTestReceive
starting sender:
[grid11@Node1 ]$ java MultiCastTestSend
Sent 10 bytes to /230.0.1.0
message received by receiver:
Received data from: /10.128.128.1:13139 with length: 10
Please note that in this case 10.128.128.1 is the private interconnect IP. Although it is currently just a strong suspicion, I am fairly confident that this was the problem. We will retry the upgrade if time allows and report the results.
Unfortunately, Oracle support was of no help at all with this problem.
