Upgrade/Install of Grid Infrastructure to 188.8.131.52 is failing
By Martin | September 29th, 2010 | Category: 11gR2, Linux, Unix | 13 comments
I have come across a rather difficult issue when trying to install or upgrade to Grid Infrastructure 184.108.40.206 on Linux. We made the following attempts:
- Installation of 220.127.116.11 to have the cluster files (OCR/voting disk) outside of ASM, then upgrade to 18.104.22.168. After that, we installed all the latest patches (PSU 22.214.171.124.2). Initially we considered running RDBMS 126.96.36.199 in an 11.2 Grid Infrastructure cluster. However, there are quite a few issues with this mixed setup. For most of them, patches or workarounds exist (see Note 948456.1). During failover testing, however, we found that the remote database instance does not survive a failure of the public LAN on the local node. This is fixed in 188.8.131.52, which motivated us to upgrade.
- We tried to upgrade our GI 184.108.40.206.2 installation to 220.127.116.11. Everything went fine until rootupgrade.sh failed on the last (second) node, which could not join the cluster. It is important to know that 18.104.22.168 introduces the new HAIP (Highly Available IP) feature, which lets Grid Infrastructure use multiple private interconnect interfaces (each with a different private interconnect network IP) for failover and load balancing. For this feature, Oracle uses multicast on network “22.214.171.124”. This is stated in the updated version of the README for Oracle Database 11.2: http://download.oracle.com/docs/cd/E11882_01/readmes.112/e17129/toc.htm#CHDIEHCH
ACFS-9309: ADVM/ACFS installation correctness verified.
Failed to start Oracle Clusterware stack
Failed to start Cluster Synchorinisation Service in clustered mode at /opt/oracle/grid11/126.96.36.199/grid/crs/install/crsconfig_lib.pm line 1016.
/opt/oracle/grid11/188.8.131.52/grid/perl/bin/perl -I/opt/oracle/grid11/184.108.40.206/grid/perl/lib -I/opt/oracle/grid11/220.127.116.11/grid/crs/install /opt/oracle/grid11/18.104.22.168/grid/crs/install/rootcrs.pl execution failed
- The next attempt was to remove everything and directly install 22.214.171.124 with the cluster files stored inside ASM. This worked fine, and we then patched with the latest PSU 126.96.36.199.2. After that, we tried the upgrade to 188.8.131.52 again. The same problem occurred at rootupgrade.sh on the second node.
- Then we decided to remove everything again and try a direct installation of 184.108.40.206 without any upgrade. This also failed, at root.sh on the last node.
Thanks to hints from several colleagues on the OTN forum (http://forums.oracle.com/forums/messageview.jspa?messageID=5393319&stqc=true), we realized that the problem might be related to the multicast setup. We verified that multicast works using the script from MetaLink Note 413783.1. However, this script does not take into account that we want multicast traffic to go over the private interconnect interface instead of the public LAN.
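To make the interface question concrete, here is a minimal sketch of a multicast receiver that joins its group on one named interface instead of whatever the default route selects. This is not the script from the MetaLink note. The class name, the group 239.1.2.3 and the port 50000 are placeholders chosen for illustration; only the interface name bond1 comes from our setup.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

public class MulticastOnInterface {

    /** Parses an IP literal and rejects anything that is not a multicast address. */
    static InetAddress requireMulticast(String group) {
        try {
            InetAddress addr = InetAddress.getByName(group);
            if (!addr.isMulticastAddress()) {
                throw new IllegalArgumentException(group + " is not a multicast address");
            }
            return addr;
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot parse " + group, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumptions: interface name, group and port are illustrative defaults.
        String ifName = args.length > 0 ? args[0] : "bond1";
        InetAddress group = requireMulticast(args.length > 1 ? args[1] : "239.1.2.3");
        int port = 50000;

        NetworkInterface nic = NetworkInterface.getByName(ifName);
        if (nic == null) {
            System.err.println("No such interface: " + ifName);
            return;
        }

        try (MulticastSocket socket = new MulticastSocket(port)) {
            // Join the group explicitly on the chosen interface.
            socket.joinGroup(new InetSocketAddress(group, port), nic);
            socket.setSoTimeout(30_000); // give up after 30 seconds

            DatagramPacket packet = new DatagramPacket(new byte[1500], 1500);
            try {
                socket.receive(packet);
                System.out.println("Received data from: " + packet.getSocketAddress()
                        + " with length: " + packet.getLength());
            } catch (SocketTimeoutException e) {
                System.out.println("Nothing received on " + ifName + " within 30 seconds");
            }
        }
    }
}
```

A matching sender would presumably call `setNetworkInterface` on its own `MulticastSocket` before sending, so that the outgoing packets also leave through the private interconnect.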
I realized that it might be necessary to add a network route to our system to direct multicast to the private interconnect network:
e.g.: /sbin/route add -net 224.0.0.0 netmask 240.0.0.0 dev bond1
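A route with netmask 240.0.0.0 is meant to cover the whole IPv4 multicast range, 224.0.0.0/4. The following small sketch shows the arithmetic behind that: an address is matched by the route when masking it gives the same result as masking the route's network address. The class name and the sample addresses are made up for illustration, apart from 10.128.128.1, which is our private interconnect IP.

```java
public class MulticastRouteCheck {

    /** Converts a dotted IPv4 literal such as "224.0.0.1" to a 32-bit integer. */
    static int toInt(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 literal: " + dotted);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range in: " + dotted);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** True if ip is matched by a route to network/netmask. */
    static boolean coveredByRoute(String ip, String network, String netmask) {
        int mask = toInt(netmask);
        return (toInt(ip) & mask) == (toInt(network) & mask);
    }

    public static void main(String[] args) {
        // Three multicast samples and one unicast address for contrast.
        String[] samples = {"224.0.0.1", "239.1.2.3", "239.255.255.255", "10.128.128.1"};
        for (String ip : samples) {
            System.out.println(ip + " -> " + coveredByRoute(ip, "224.0.0.0", "240.0.0.0"));
        }
    }
}
```

The multicast samples are matched by the route and so would leave through the device named in it, while the unicast address is not matched and keeps following the normal routing table.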
After doing that, the multicast test script from MOS succeeded.
starting receiver:
[grid11@Node2 ]$ java MultiCastTestReceive
starting sender:
[grid11@Node1 ]$ java MultiCastTestSend
Sent 10 bytes to /18.104.22.168
message received by receiver:
Received data from: /10.128.128.1:13139 with length: 10
Please note that in this case 10.128.128.1 is the private interconnect IP. Although it is currently just a strong suspicion, I am fairly confident that this was the problem. We will consider retrying the upgrade if time allows and will report the results.
Unfortunately, Oracle support was of no help at all with this problem.