
Grid and ASM Upgrade from 11.2.0.1 to 11.2.0.3 Hangs – Resolved


Environment:

Operating System – Solaris 10, 64-bit
2-node RAC
More than 10 databases running on the nodes

Scenario:

We ran runcluvfy to verify that both nodes met the prerequisites.

We then used runInstaller to install 11.2.0.3 into a separate home, as Oracle recommends.

All the databases were up and running on both nodes while runInstaller ran.

runInstaller then asked us to execute rootupgrade.sh on both nodes. (We were asked to run this script as root, from a folder the CRS user has write permission on.)

We executed rootupgrade.sh on Node 1; it started well and began the ASM upgrade.

We closely watched the upgrade log created in the “CRS_HOME/cfgtoollogs” folder on the node.

We also monitored $GRID_HOME/log/prac-prd05/alert<node01>.log on the same node.

The logs showed that the ASM upgrade had started. As part of this process, rootupgrade.sh tried to stop CRS on the node and issued the command “crsctl stop crs -f”, which we could see in the logs above.

We waited for an hour, since this RAC hosted many large databases, but nothing progressed, and the following message kept repeating in alert<node1>.log:

[gpnpd(9231)]CRS-2332:Error pushing GPnP profile to "mdns:service:gpnp._tcp.local.://<node2>:64774/agent=gpnpd,cname=<clustername>,host=<node2>,pid=7067/gpnpd h:<node2> c:<clustername>"
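
To put a number on how often the message repeats, rather than eyeballing the log, a small helper along these lines could count the CRS-2332 entries. This is only a sketch: count_crs2332 is a name we made up for illustration (it is not an Oracle tool), and the path in the usage comment simply follows the alert log location mentioned above.

```shell
# Count CRS-2332 entries in a clusterware alert log.
# count_crs2332 is a hypothetical helper name, not an Oracle utility.
count_crs2332() {
    log_file=$1
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps the function's exit status clean.
    grep -c 'CRS-2332' "$log_file" || true
}

# Usage (placeholders as in the text; adjust to your node name):
#   count_crs2332 "$GRID_HOME/log/<node1>/alert<node1>.log"
```

Running it twice a few minutes apart shows whether the count is still growing, which is roughly what we were watching for by hand.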

It looked as though the cluster upgrade had hung. As usual, we checked multiple logs and searched online, and then contacted Oracle.

Solution:

Oracle asked us to proceed with a clean shutdown from outside the upgrade session.

We tried all the options from outside:

Node01(root) # ./crsctl stop crs
CRS-2797: Shutdown is already in progress for 'Node01', waiting for it to complete

Node01(root) # ./crsctl stop has
CRS-2797: Shutdown is already in progress for 'Node01', waiting for it to complete

Node01(root) # ./crsctl stop crs -f
CRS-2797: Shutdown is already in progress for 'Node01', waiting for it to complete

We updated Oracle and got their confirmation to restart only Node01.

We restarted both nodes one at a time and confirmed that crsctl stop and start shut the stack down and brought it up cleanly.
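
One way to confirm that the stack is back after each restart is to inspect the output of crsctl check crs. The sketch below assumes 11.2-style output, where each healthy component is reported on a line ending in "is online"; the CRSCTL variable and the crs_stack_online function are names we introduce for illustration. It is written for bash or ksh.

```shell
# Path to crsctl; set CRSCTL beforehand if it is not on the PATH (assumption).
CRSCTL=${CRSCTL:-crsctl}

# Succeed only if "crsctl check crs" prints something and every line
# reports a component as online. crs_stack_online is a hypothetical name.
crs_stack_online() {
    out=$("$CRSCTL" check crs 2>&1)
    [ -n "$out" ] || return 1
    # Any line that does not say "is online" means a component is down
    # or could not be contacted.
    if printf '%s\n' "$out" | grep -v 'is online' | grep . >/dev/null; then
        return 1
    fi
    return 0
}

# Usage:
#   crs_stack_online && echo "stack is up" || echo "stack is NOT fully up"
```

The check deliberately fails on any unexpected line instead of matching specific message numbers, so it errs on the side of reporting a problem.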

We then executed ./rootupgrade.sh again; it went well and upgraded both nodes.

Note: rootupgrade.sh took 45 minutes to complete on the first node.

On the second node, rootupgrade.sh failed with the message:

‘ROOTCRS_STACK’ checkpoint has failed
Running as user oracrs: $CRS_HOME/bin/cluutil -ckpt -oraclebase /oragridcrs/oracle/orabase -writeckpt -name ROOTCRS_STACK -state FAIL

We stopped the cluster manually and re-executed rootupgrade.sh.

Additional Information:

Oracle also confirmed that there is a bug related to this message:

Bug:9336825 – Repeated error “CRS-2332:Error pushing GPnP profile to "mdns:service:gpnp._tcp.local.://racnode1:16739/agent=gpnpd,cname=crs,host=racnode1,pid=17182/gpnpd h:racnode1 c:crs"” in clusterware alert<nodename>.log, fixed in 11.2.0.2 bundle2, 11.2.0.3

Please have a look at “Things to Consider Before Upgrading to 11.2.0.2 Grid Infrastructure/ASM” (Doc ID 1312225.1).

If you read Bug:9336825 more closely, you will see the following:
“problem can be hit on a single cluster – if one gpnpd is coming up, and another gpnpd on the same cluster restarts in about that time. Problem is hit when restart of gpnpd elsewhere on cluster happens between discovery and profile updates” (as can be the case during a Grid Infrastructure upgrade).
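
Since the fix is listed for 11.2.0.2 bundle2 and 11.2.0.3, it can help to compare a cluster's version string against a target release when planning. The helper below is a minimal sketch of a dotted-version comparison; version_ge is a hypothetical name, and the version strings in the usage comment are examples that you would replace with whatever your cluster reports (for instance from crsctl query crs activeversion). It is written for bash or ksh.

```shell
# version_ge A B: succeed if dotted version A >= B, comparing numerically
# field by field. Hypothetical helper; missing fields are treated as 0.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        fa=${a%%.*}
        fb=${b%%.*}
        # Drop the field just read; when no dot is left, clear the string.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
        fa=${fa:-0}
        fb=${fb:-0}
        if [ "$fa" -gt "$fb" ]; then return 0; fi
        if [ "$fa" -lt "$fb" ]; then return 1; fi
    done
    return 0
}

# Usage (example version strings):
#   version_ge "11.2.0.1.0" "11.2.0.3.0" || echo "older than 11.2.0.3"
```

Note that a plain version comparison cannot tell whether a bundle patch such as 11.2.0.2 bundle2 is applied; that still has to be checked against the patch inventory.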

If required, contact Oracle Support and enable debug mode if they want detailed logging:

export ORA_INSTALL_DEBUG=TRUE
./rootupgrade.sh
export ORA_INSTALL_DEBUG=

