Wednesday 23 November 2011

New Serviceguard 11.20 Series: Halting a Node or the Cluster while Keeping Packages Running (LAD)

Halting a Node or the Cluster while Keeping Packages Running (Live Application Detach)
There may be circumstances in which you want to do maintenance that involves halting a node, or the entire cluster, without halting or failing over the affected packages. Such maintenance might consist of anything short of rebooting the node or nodes, but a likely case is networking changes that will disrupt the heartbeat. New command options in Serviceguard A.11.20 (collectively known as Live Application Detach (LAD)) allow you to do this kind of maintenance while keeping the packages running. The packages are no longer monitored by Serviceguard, but the applications continue to run. Packages in this state are called detached packages. When you have done the necessary maintenance, you can restart the node or cluster, and normal monitoring will resume on the packages.
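For example, halting just one node while leaving its packages running looks roughly like this (a minimal sketch using the node names from the test cluster below; check the cmhaltnode(1m) and cmrunnode(1m) manpages on your A.11.20 nodes for the exact behaviour):
root@vmcluste:/etc/cmcluster> cmhaltnode -d vmcluster1   # halt the node; its packages keep running, detached
root@vmcluste:/etc/cmcluster> cmrunnode vmcluster1       # later: restart the node and re-attach its packages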
There are loads of cases in which you can't use LAD; in fact there are so many that I can't list them here, so I'd recommend taking a look at the list in chapter 7 of the Serviceguard documentation.
OK, so in our example we have a cluster running with only one heartbeat network configured (not meeting the minimum requirements, but this is just a test). Because of changes in our network infrastructure we are going to lose communication on the heartbeat network for an hour, and after that we have to move our heartbeat config to another subnet. We are going to use LAD so a node in the cluster doesn't panic when it loses all its heartbeat communication.
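Before counting on LAD it's worth double-checking that both nodes really are on A.11.20; cmversion simply prints the installed Serviceguard release, which should report A.11.20.xx on every node:
root@vmcluste:/etc/cmcluster> cmversion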
root@vmcluste:/etc/cmcluster> cmviewcl
CLUSTER        STATUS
vmcluster      up

  NODE           STATUS       STATE
  vmcluster1     up           running

    PACKAGE        STATUS       STATE        AUTO_RUN     NODE
    pkg2           up           running      enabled      vmcluster1

  NODE           STATUS       STATE
  vmcluster2     up           running

    PACKAGE        STATUS       STATE        AUTO_RUN     NODE
    pkg1           up           running      enabled      vmcluster2
Our heartbeat config:
root@vmcluste:/etc/cmcluster> cmviewcl -v -f line | grep heartbeat
node:vmcluster1|interface:lan1|ip_address:192.168.99.1|heartbeat=true
node:vmcluster2|interface:lan1|ip_address:192.168.99.2|heartbeat=true
We need to move it to the 192.168.100.0 network once the network guys get link up on the new VLAN they have created for the heartbeat.
So first of all we shut down the cluster, detaching the packages: the cluster will be in the down state and our packages in the detached state. While detached, the packages keep running; they just aren't monitored by Serviceguard.
root@vmcluste:/etc/cmcluster> cmhaltcl -d
Detaching package pkg1.
Detaching package pkg2.
Disabling all packages from starting on nodes to be halted.
Warning: Do not modify or enable packages until the halt operation is completed.
This operation may take some time.
Waiting for nodes to halt ..... done
Successfully detached package pkg1.
Successfully detached package pkg2.
Successfully halted all nodes specified.
Halt operation complete.
root@vmcluste:/etc/cmcluster> cmviewcl
CLUSTER        STATUS
vmcluster      down

  NODE           STATUS       STATE
  vmcluster1     down         unknown

    PACKAGE        STATUS       STATE        AUTO_RUN     NODE
    pkg2           detached     detached     enabled      vmcluster1

  NODE           STATUS       STATE
  vmcluster2     down         unknown

    PACKAGE        STATUS       STATE        AUTO_RUN     NODE
    pkg1           detached     detached     enabled      vmcluster2
If we check, we still have the package virtual IPs up and the package filesystems mounted, and our super app is still running (in this case a ping...).
root@vmcluste:/> bdf | grep data
/dev/vgpkg2/lvdata1
65536 2149 59433 3% /data1pkg2
root@vmcluste:/> netstat -ni
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
lan1 1500 192.168.99.0 192.168.99.1 365950 0 368612 0 0
lan0 1500 11.0.0.0 11.0.0.51 727096 0 567120 0 0
lo0 32808 127.0.0.0 127.0.0.1 13633 0 13633 0 0
lan0:1 1500 192.168.0.0 192.168.0.97 614 0 1205 0 0
lan0:2 1500 11.0.0.0 11.0.0.54 0 0 0 0 0
root@vmcluste:/> ps -ef | grep -i ping
root 11220 1 0 14:40:22 ? 0:02 ping 11.0.0.22
OK, so now we can tell our network people they can start working and kill comms on the HB network...
We are going to do it ourselves in this case. xP
# hpvmnet -S localnet -h
hpvmnet: Halt the vswitch 'localnet'? [n/y]: y
# hpvmnet
Name Number State Mode NamePPA MAC Address IPv4 Address
======== ====== ======= ========= ======== ============== ===============
localnet 1 Down Shared N/A N/A
vmnetpro 3 Up Shared lan0 0x00156004e156 11.0.0.22
#
We can now check on our servers that the heartbeat network is no longer working:
root@vmcluste:/etc/cmcluster> netstat -ni | grep lan1
lan1* 1500 192.168.99.0 192.168.99.2 464719 0 462011 0 0
root@vmcluste:/etc/cmcluster> lanscan | grep lan1
0/0/2/0 0xEECAE1760A59 1 UP lan1 snap1 2 ETHER Yes 119
root@vmcluste:/etc/cmcluster> nwmgr --diag link -A dest=0xEECAE1760A59 -c lan1
lan1 Interface State = DOWN
Probable Cause for State = Cable disconnect
Don't you just miss linkloop?? It still works, but better to get used to the new stuff.
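For nostalgia's sake, the old-style check would be a linkloop against the other node's lan1 MAC (the MAC below is just a placeholder; grab the real one from lanscan on the remote node, and 1 is our local PPA for lan1 from the lanscan output above):
root@vmcluste:/etc/cmcluster> linkloop -i 1 0x<remote_lan1_mac>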
OK, so we can check that everything is still working fine...
root@vmcluste:/> bdf | grep data
/dev/vgpkg2/lvdata1
65536 2149 59433 3% /data1pkg2
root@vmcluste:/> ps -ef | grep -i ping
root 11220 1 0 14:40:22 ? 0:02 ping 11.0.0.22
root@vmcluste:/>
Once the maintenance has finished, we get the network going again:
# hpvmnet -b -S localnet
# hpvmnet
Name Number State Mode NamePPA MAC Address IPv4 Address
======== ====== ======= ========= ======== ============== ===============
localnet 1 Up Shared N/A N/A
vmnetpro 3 Up Shared lan0 0x00156004e156 11.0.0.22
And restart the cluster, re-attaching the packages, just by running cmruncl:
root@vmcluste:/etc/cmcluster> cmruncl
cmruncl: Validating network configuration...
cmruncl: Network validation complete
Re-attaching package pkg1.
Re-attaching package pkg2.
Waiting for cluster to form ..... done
Successfully re-attached package pkg1.
Successfully re-attached package pkg2.
Cluster successfully formed.
Check the syslog files on all nodes in the cluster to verify that no warnings occurred during startup.
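That syslog hint is worth following; on HP-UX the cluster daemon logs to /var/adm/syslog/syslog.log, so something along these lines on each node shows whether the re-attach went through cleanly (the path is the standard one; adjust it if you log elsewhere):
root@vmcluste:/etc/cmcluster> grep -i cmcld /var/adm/syslog/syslog.log | tail -20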
root@vmcluste:/etc/cmcluster> cmviewcl
CLUSTER        STATUS
vmcluster      up

  NODE           STATUS       STATE
  vmcluster1     up           running

    PACKAGE        STATUS       STATE        AUTO_RUN     NODE
    pkg2           up           running      enabled      vmcluster1

  NODE           STATUS       STATE
  vmcluster2     up           running

    PACKAGE        STATUS       STATE        AUTO_RUN     NODE
    pkg1           up           running      enabled      vmcluster2
root@vmcluste:/etc/cmcluster>
To change the heartbeat network configuration without stopping anything, what I'm going to do is send the heartbeat down the 11.0.0.0 production network while I change the config of the 192.168.99.0 HB network I have now.
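If you don't have an up-to-date ASCII file lying around, you can regenerate one from the running cluster before editing (cluster name and file name as used in this example):
root@vmcluste:/etc/cmcluster> cmgetconf -c vmcluster cluster.ascii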
I edit the cluster.ascii:
......................
NODE_NAME vmcluster1
NETWORK_INTERFACE lan0
HEARTBEAT_IP 11.0.0.51 -----> change it from stationary to heartbeat
# NETWORK_INTERFACE lan1 -----> remove our HB lan from the cluster
# HEARTBEAT_IP 192.168.99.1
.................
NODE_NAME vmcluster2
NETWORK_INTERFACE lan0
HEARTBEAT_IP 11.0.0.52 -------> change it from stationary to heartbeat
# NETWORK_INTERFACE lan1
# HEARTBEAT_IP 192.168.99.2 -----> remove our HB lan from the cluster
..................
#SUBNET 192.168.99.0 ---------------> remove subnet
# IP_MONITOR OFF
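Before applying, the edited file can be validated without changing anything; cmcheckconf runs the same verification, so expect it to raise the same minimum-requirements complaint you'll see below:
root@vmcluste:/etc/cmcluster> cmcheckconf -v -k -C cluster.ascii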
We apply the config and it complains a bit, because we are not meeting the minimum required network configuration:
root@vmcluste:/etc/cmcluster> cmapplyconf -v -k -C cluster.ascii
Begin cluster verification...
...........
SUBNET 192.168.99.0 is being removed while cluster is running.
NETWORK_INTERFACE lan1 is being deleted from node vmcluster1 while cluster is running.
NETWORK_INTERFACE lan1 is being deleted from node vmcluster2 while cluster is running.
Minimum network configuration requirements for the cluster have
not been met. Minimum network configuration requirements are:
- 2 or more heartbeat networks OR
- 1 heartbeat network with local switch (HP-UX Only) OR
- 1 heartbeat network using APA with 2 trunk members (HP-UX Only) OR
- 1 heartbeat network using bonding (mode 1) with 2 slaves (Linux Only)
Maximum configured packages parameter is 300.
Configuring 2 package(s).
Modifying configuration on node vmcluster1
Modifying configuration on node vmcluster2
Modifying the cluster configuration for cluster vmcluster
..................
Check on the HB config:
root@vmcluste:/etc/cmcluster> cmviewcl -v -f line | grep heartbeat
node:vmcluster1|interface:lan0|ip_address:11.0.0.51|heartbeat=true
node:vmcluster2|interface:lan0|ip_address:11.0.0.52|heartbeat=true
.............................
NODE_NAME vmcluster1
NETWORK_INTERFACE lan0
HEARTBEAT_IP 11.0.0.51
NETWORK_INTERFACE lan1
HEARTBEAT_IP 192.168.100.1 ----------------------> add new vlan subnet HB network
...........................
NODE_NAME vmcluster2
NETWORK_INTERFACE lan0
HEARTBEAT_IP 11.0.0.52
NETWORK_INTERFACE lan1
HEARTBEAT_IP 192.168.100.2 ----------------------> add new vlan subnet HB network
..........................
SUBNET 192.168.100.0 ---------------> add new subnet
IP_MONITOR OFF
Run cmapplyconf again and check our HB:
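(The command isn't in the original capture, but it's just the same invocation as before against the same cluster.ascii:)
root@vmcluste:/etc/cmcluster> cmapplyconf -v -k -C cluster.ascii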
SUBNET 192.168.100.0 is being added while cluster is running.
NETWORK_INTERFACE lan1 is being added to node vmcluster1 while cluster is running.
NETWORK_INTERFACE lan1 is being added to node vmcluster2 while cluster is running.
root@vmcluste:/etc/cmcluster> cmviewcl -v -f line | grep heartbeat
node:vmcluster1|interface:lan0|ip_address:11.0.0.51|heartbeat=true
node:vmcluster1|interface:lan1|ip_address:192.168.100.1|heartbeat=true
node:vmcluster2|interface:lan0|ip_address:11.0.0.52|heartbeat=true
node:vmcluster2|interface:lan1|ip_address:192.168.100.2|heartbeat=true
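Optionally, now that the dedicated heartbeat is back, the 11.0.0.0 production LAN could be put back to a stationary role with one more online edit and cmapplyconf. This step isn't part of the run above, just the obvious follow-up, and it will of course trigger the minimum-requirements warning again since we'd be back to a single heartbeat network:
NODE_NAME vmcluster1
NETWORK_INTERFACE lan0
STATIONARY_IP 11.0.0.51 -----> back from heartbeat to stationary
(and the same change for vmcluster2), followed by another cmapplyconf -v -k -C cluster.ascii.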
OK: we have survived the loss of our only HB network for an hour, and we have changed the subnet of our HB network without stopping the cluster or the applications running in it. You can do some nice things with the new versions of Serviceguard; it's a pity that 95% of the clusters I see at work are still 11.14/15/16 on 11.11...
