Upgrade/Install of Grid Infrastructure to 11.2.0.2 is failing

I have come across quite a difficult issue when trying to install or upgrade to Grid Infrastructure 11.2.0.2 on Linux. We have made the following attempts:

  • Installation of 11.1.0.6 to keep the cluster files (OCR/Votedisk) outside of ASM, followed by an upgrade to 11.2.0.1. After that, we installed the latest patches (PSU 11.2.0.1.2). At first we considered running RDBMS 11.1.0.7 in an 11.2 Grid Infrastructure cluster. However, there are quite a few issues with this mixed setup, although patches or workarounds exist for most of them (see Note 948456.1). During failover testing, we realized that the remote database instance does not survive a failure of the public LAN on the local node. This is fixed in 11.2.0.2, which motivated us to upgrade.
  • We tried to upgrade our GI 11.2.0.1.2 installation to 11.2.0.2. Everything went fine until rootupgrade.sh failed on the last (second) node, which could not join the cluster. It is important to know that 11.2.0.2 brings the new feature HAIP (Highly Available IP), which allows Grid Infrastructure to use multiple private interconnect interfaces (each with a different private interconnect network IP) for failover and load balancing. For this feature, Oracle uses multicast on the address “230.0.1.0”. This is stated in the updated version of the README for Oracle Database 11.2: http://download.oracle.com/docs/cd/E11882_01/readmes.112/e17129/toc.htm#CHDIEHCH The failing run of rootupgrade.sh on the second node ended with:
ACFS-9309: ADVM/ACFS installation correctness verified.
Failed to start Oracle Clusterware stack
Failed to start Cluster Synchorinisation Service in clustered mode at /opt/oracle/grid11/11.2.0.2/grid/crs/install/crsconfig_lib.pm line 1016.
/opt/oracle/grid11/11.2.0.2/grid/perl/bin/perl -I/opt/oracle/grid11/11.2.0.2/grid/perl/lib -I/opt/oracle/grid11/11.2.0.2/grid/crs/install /opt/oracle/grid11/11.2.0.2/grid/crs/install/rootcrs.pl execution failed
  • The next try was to remove everything and directly install 11.2.0.1 with the cluster files stored inside ASM. This worked fine, and we then proceeded to patch with the latest PSU 11.2.0.1.2. Then we made another attempt to upgrade to 11.2.0.2. The same problem occurred again at rootupgrade.sh on the second node.
  • Then we decided to remove everything again and try a direct installation of 11.2.0.2 without any upgrade. This also failed at root.sh on the last node.

With hints from several colleagues on the OTN forum (http://forums.oracle.com/forums/messageview.jspa?messageID=5393319&stqc=true) we realized that the failure might be related to the multicast setup. We verified that multicast works with the script from MetaLink Note 413783.1. However, this script does not take into account that we want multicast traffic to go over the private interconnect interface instead of the public LAN.
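To see which interface the kernel would actually pick for the HAIP multicast address, the routing table can be asked directly. The following is only a sketch: it assumes the iproute2 "ip" command is installed and that its output contains a "dev <interface>" pair. bond1 is our private interconnect interface, and the helper name route_dev is made up for this example.

```shell
#!/bin/sh
# Sketch: report which interface the kernel would use for 230.0.1.0.
# Assumptions: iproute2 "ip" is installed; its output contains "dev <name>".

MCAST_ADDR=230.0.1.0   # multicast address named in the 11.2.0.2 README
EXPECTED_IF=bond1      # private interconnect interface (adjust per system)

# Print the word following "dev" in a line of "ip route get" output ($1).
# Prints nothing if the line has no "dev" keyword.
route_dev() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Ask the kernel for the route; tolerate hosts without "ip" or without a route.
if command -v ip >/dev/null 2>&1; then
    route_line=$(ip route get "$MCAST_ADDR" 2>/dev/null | head -n 1 || true)
else
    route_line=""
fi

actual_if=$(route_dev "$route_line")
if [ -z "$actual_if" ]; then
    echo "could not determine a route for $MCAST_ADDR"
elif [ "$actual_if" = "$EXPECTED_IF" ]; then
    echo "$MCAST_ADDR leaves via $actual_if (private interconnect)"
else
    echo "$MCAST_ADDR leaves via $actual_if, not $EXPECTED_IF"
fi
```

If the reported interface is the public one, a multicast test can succeed while still saying nothing about the interconnect.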

I realized that it might be necessary to add a network route to our system to direct multicast to the private interconnect network:

e.g.: /sbin/route add -net 224.0.0.0 netmask 240.0.0.0 dev bond1

After doing that, the multicast test script from MOS succeeded.
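The route above targets 224.0.0.0 with netmask 240.0.0.0, which is the whole IPv4 multicast block 224.0.0.0/4 (first octet 224 to 239), so it also covers 230.0.1.0. A minimal sketch of that range check follows. The function name is hypothetical, and it only looks at the first octet rather than validating the full address.

```shell
#!/bin/sh
# Sketch: is a dotted-quad IPv4 address inside 224.0.0.0/4?
# Only the first octet is inspected; the rest of the address is not validated.

# Return 0 if the first octet of $1 is 224..239, 1 if it is another number,
# 2 if the first octet is empty or not numeric.
is_ipv4_multicast() {
    first_octet=${1%%.*}
    case $first_octet in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$first_octet" -ge 224 ] && [ "$first_octet" -le 239 ]
}

for addr in 230.0.1.0 224.0.0.0 10.128.128.1; do
    if is_ipv4_multicast "$addr"; then
        echo "$addr is covered by the 224.0.0.0/4 route"
    else
        echo "$addr is not a multicast address"
    fi
done
```

This is why a single route for 224.0.0.0 with netmask 240.0.0.0 is enough, and no separate entry for 230.0.1.0 is needed.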

starting receiver:

[grid11@Node2 ]$ java MultiCastTestReceive

starting sender:

[grid11@Node1 ]$ java MultiCastTestSend
Sent 10 bytes to /230.0.1.0

message received by receiver:

Received data from: /10.128.128.1:13139 with length: 10

Please note that in this case 10.128.128.1 is the private interconnect IP. At this point it is only a strong suspicion, but I am fairly confident that this was the problem. We will retry the upgrade if time allows and report the results.

Unfortunately, Oracle support was of no help at all with this problem.

13 comments

  1. Hi Martin,

    It really strikes me that they have changed way too much in this patchset. They have moved away from patchsets being bug fixes and have introduced a whole raft of new features, HAIP being just one of many.

    Since upgrading to 11.2.0.1 I have had issues with both PSUs, so I was expecting problems with the patchset. I really feel the release quality/testing has gone downhill badly.

    cheers,

    jason.

  2. Hi Jason,

    thank you for visiting.

    I completely agree. Introducing such a feature with such big implications and potential problems with only 10 lines of information in the readme is not acceptable.

    I can feel your pain regarding the GI PSUs and the “auto” feature, which does not work with different Unix accounts for RDBMS and GI.

    It seems to me that the release of 11.2.0.2 was driven mainly by the OOW deadline rather than by finished product testing, especially since it was announced for October and then released in mid-September.

    Ciao,
    Martin

  3. I just saw that Joel Goodman of Oracle University EMEA has posted about his experience of upgrading GI for a standalone server and database. Especially interesting is that the upgrade failed to modify paths in /etc/init.d/init.ohasd and /etc/init.d/ohasd. According to the description of bug 10167269, this only appears in standalone GI installations.
    See the details at http://dbatrain.wordpress.com/2010/10/11/try-11-2-0-2-you-wont-feel-out-of-place/

    Regards,
    Martin

  4. Hi Martin,

    We are experiencing the same issue. Do I need to add 230.0.1.0 to the routing table or 224.0.0.0?

    thank you
    Hari

  5. Hello Hari,

    you should add the whole multicast network address:

    /sbin/route add -net 224.0.0.0 netmask 240.0.0.0 dev bond1

    However, I have not yet had a chance to verify whether this is the solution to our problem. I am currently waiting for an upgrade window for this system. Additionally, I have heard that Oracle is working on a patch to change the address 230.0.1.0 to a different one because there were some compatibility issues with specific switches, but this is not official yet.

    I have received a check script (written in C) to validate if multicast is working on the system before installing or upgrading 11.2.0.2. If you wish, I could send you this script.

    Regards,
    Martin

  6. The script (mcast1) is attached to MetaLink Note 1212703.1.

    Regards,
    Martin

  7. It turned out that a host route is not necessary, but there are issues with the multicast address Oracle has chosen.

    Please see this article for news about multicast:
    http://www.ora-solutions.net/web/2010/10/28/grid-infrastructure-11-2-0-2-multicast-patch-news/

    Regards,
    Martin

  8. Martin
    Thanks very much for your suggestion. Finally we were able to install 11.2.0.2. Our problem turned out to be on the Cisco switch (6500 series): multicast was disabled on the switch. The network configuration consists of 2 RAC nodes connected to two different switches. We enabled multicast on the private interconnect subnet and we didn’t even need to add routing information.

    Steps we took:
    1. Make sure multicast is enabled on private interconnect ethernet devices.
    ifconfig eth4
    ifconfig eth5

    2. Enable multicast on the switches (I will post more details on the Cisco configuration later).

    3. Download and test multicast support using mcast1 (C program) as described in MetaLink Note 1212703.1

    4. Install Grid Infrastructure.
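
Step 1 can also be checked without reading ifconfig output by eye. The sketch below assumes a Linux host with sysfs mounted at /sys and reuses the interface names eth4 and eth5 from the steps above. The helper name is hypothetical, and 0x1000 is the value of IFF_MULTICAST in linux/if.h.

```shell
#!/bin/sh
# Sketch: check the MULTICAST flag of the interconnect NICs via sysfs.
# Assumptions: Linux with /sys mounted; eth4/eth5 are the NIC names from
# the steps above; IFF_MULTICAST is 0x1000 (linux/if.h).

# Succeed if the hex flags value in $1 has the IFF_MULTICAST bit set.
flags_have_multicast() {
    [ $(( $1 & 0x1000 )) -ne 0 ]
}

for nic in eth4 eth5; do
    flags_file=/sys/class/net/$nic/flags
    if [ ! -r "$flags_file" ]; then
        echo "$nic: not present on this host"
    elif flags_have_multicast "$(cat "$flags_file")"; then
        echo "$nic: MULTICAST flag set"
    else
        echo "$nic: MULTICAST flag missing"
    fi
done
```

ifconfig reports the same bit as the word MULTICAST in its flags line. Note that this only confirms the interface setting and says nothing about whether the switch forwards multicast.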

    Thanks
    Hari

  9. Can you please post details on how to enable multicast on the Cisco switches?

    Thanks in advance,

    Miladin

  10. Miladin,

    I will try to find out from our network department. In any case, since multicast works fine for the new multicast IP, we will just wait for the patch before upgrading.

    Best regards,
    Martin

  11. Thanks Martin, that would be great.

    Actually I posted this question for Hari.

    Thanks,

    Miladin

  12. […] on several blogs, including those of Julian Dyke, Miladin Modrakovic and Martin Decker. To learn more, here are 3 interesting documents to read […]
