SYSADMIN: Service Guard

Thursday, 24 November 2011

Add a LUN to a VG in an SG cluster (HP-UX 11.11/11.23)

Once the disk has been presented to the host:
# pvcreate /dev/rdsk/cxtxdx
# vgextend /dev/vgname /dev/dsk/cxtxdx
# lvcreate -L <size_in_MB> -n lvname /dev/vgname
# newfs -F vxfs -o largefiles /dev/vgname/rlvname
# cp -p /etc/cmcluster/pkgname/pkg.cntl /etc/cmcluster/pkgname/pkg.cntl.org
# vi /etc/cmcluster/pkgname/pkg.cntl
LV[33]=/dev/vgname/lvname; FS[33]=/oracle/ABN/sapdata20; FSMOUNT_OPT[33]="-o convosync=direct,mincache=direct,delaylog,nodatainlog"
(Add the newly created logical volume to the pkg.cntl file, using the next free array index.)

# vgexport -p -s -v -m /tmp/vgname.map /dev/vgname
# rcp /tmp/vgname.map node2:/tmp/vgname.map
# cp -p /etc/cmcluster/pkgname/pkg.cntl /etc/cmcluster/pkgname/pkg.cntl.node2
# rcp /etc/cmcluster/pkgname/pkg.cntl.node2 node2:/tmp/
(Mount the new filesystem manually on the node where the VG is currently active.)

# mount -o convosync=direct,mincache=direct,delaylog,nodatainlog /dev/vgname/lvname /oracle/ABN/sapdata20

On the adoptive node:
# vgexport /dev/vgname
# mkdir /dev/vgname
# ll /dev/vg*/group
# mknod /dev/vgname/group c 64 0x0X0000
(pick a minor number 0x0X0000 that the ll output shows is not already in use)
# vgimport -p -s -v -m /tmp/vgname.map /dev/vgname
(-p is preview only; if it reports no errors, run the command again without -p)
# vgimport -s -v -m /tmp/vgname.map /dev/vgname
# cp /tmp/pkg.cntl.node2 /etc/cmcluster/pkgname
# cp -p /etc/cmcluster/pkgname/pkg.cntl /etc/cmcluster/pkgname/pkg.cntl.org
# mv /etc/cmcluster/pkgname/pkg.cntl.node2 /etc/cmcluster/pkgname/pkg.cntl
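
Picking the array index for the new LV[n]/FS[n]/FSMOUNT_OPT[n] line by eye is easy to get wrong in a long control file. A minimal sketch that prints the next free index, assuming the control file uses the `LV[n]=` layout shown above; the function name `next_lv_index` is made up for this example:

```sh
#!/bin/sh
# next_lv_index FILE: print one more than the highest LV[n] index found in
# FILE (prints 0 when FILE has no LV[n] entries). Commented-out lines are
# ignored because they do not start with "LV[".
next_lv_index() {
    awk '
        BEGIN { max = -1 }
        /^[ \t]*LV\[[0-9]+\]=/ {
            idx = $0
            sub(/^[ \t]*LV\[/, "", idx)   # drop everything up to the index
            sub(/\].*$/, "", idx)          # drop everything after the index
            if (idx + 0 > max) max = idx + 0
        }
        END { print max + 1 }
    ' "$1"
}
```

Usage would be something like `next_lv_index /etc/cmcluster/pkgname/pkg.cntl` before editing the file.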

Change IP Addresses for service guard cluster

Before changing the IP address of a server, make sure you have all the new addresses to hand (for example the server IP and the package IPs). Only then go ahead with the change.

Take a backup of these directories and files:

1. /etc/cmcluster
2. /etc/hosts
3. /etc/rc.config.d/netconf

Modify these files:

1. /etc/hosts # modify the IP address

2. /etc/rc.config.d/netconf # modify the IP address and subnet

3. /etc/cmcluster/cluster.conf # modify HEARTBEAT_IP

4. /etc/cmcluster/package/cipackage.conf # change SUBNET XX.XX.XX.XX

5. /etc/cmcluster/package/dbpackage.conf # change SUBNET XX.XX.XX.XX

6. /etc/cmcluster/package/cipackage.cntl # change IP[0]=XX.XX.XX.XX
change SUBNET[0]=XX.XX.XX.XX

7. /etc/cmcluster/package/dbpackage.cntl # change IP[0]=XX.XX.XX.XX
change SUBNET[0]=XX.XX.XX.XX

8. rcp cluster.conf into /etc/cmcluster on the other node (same location).

9. rcp the conf and cntl files into /etc/cmcluster/package/ on the other node:
ciVRP.conf, dbVRP.conf, ciVRP.cntl, dbVRP.cntl

10. Restart networking so the new IP takes effect:
/sbin/init.d/net stop
/sbin/init.d/net start

Check every IP address that was changed, and check link-level connectivity with linkloop as well.

11. cmcheckconf -v -C /etc/cmcluster/cluster.conf -P cipackage.conf -P dbpackage.conf
This should complete with no errors; only then go on to the next step.

12. cmapplyconf -v -C /etc/cmcluster/cluster.conf -P cipackage.conf -P dbpackage.conf
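
With this many files to touch, it is easy to leave an old address behind in one of them. A small sketch that lists which files still contain a given address as a whole token (so 10.0.0.1 does not match 10.0.0.10). The function name `files_with_ip` and the example address are hypothetical, and the file list is whatever you edited above:

```sh
#!/bin/sh
# files_with_ip IP FILE...: print the name of every FILE that still contains
# IP as a complete address. Each line is split into tokens made only of
# digits and dots, and a token must equal IP exactly.
files_with_ip() {
    ip=$1; shift
    for f in "$@"; do
        awk -v ip="$ip" '
            {
                n = split($0, tok, /[^0-9.]+/)
                for (i = 1; i <= n; i++) if (tok[i] == ip) { found = 1; exit }
            }
            END { exit(found ? 0 : 1) }
        ' "$f" && printf '%s\n' "$f"
    done
    return 0
}
```

For example, `files_with_ip 10.0.0.1 /etc/hosts /etc/rc.config.d/netconf /etc/cmcluster/cluster.conf` should print nothing once the old address is gone everywhere.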

Scenario questions in HP-UX

Scenario 1: Disks/volume groups that are going to be shared between nodes in a cluster require changes to some of the standard config files that normally manage disks, volumes and filesystems.
Which standard config files are affected, and why?

Answer :

A. /etc/lvmrc - this startup script needs to be modified to NOT activate all volume groups at startup time

B. /etc/fstab - filesystems that will be shared between nodes must NOT be listed in the fstab file.
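
Point B is easy to check mechanically. A minimal sketch that prints any active fstab line whose device belongs to a given volume group; the function name `fstab_entries_for_vg` and the VG name in the usage line are hypothetical:

```sh
#!/bin/sh
# fstab_entries_for_vg FSTAB VG: print every uncommented line of FSTAB whose
# first field (the device) starts with /dev/VG/. No output means the shared
# VG is not listed in the file.
fstab_entries_for_vg() {
    awk -v p="/dev/$2/" '$1 !~ /^#/ && index($1, p) == 1' "$1"
}
```

Something like `fstab_entries_for_vg /etc/fstab vgshared` should print nothing on every node of the cluster.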

Scenario 2: The primary LAN card fails on one of the nodes in the cluster. HP replaces the card, and it keeps its instance number and associated device files. The IP address remains the same. Will the node be able to rejoin the cluster with a simple cmrunnode command?

If not, why not, and which commands must you run or which changes must you make before it can join the cluster?

Answer:
A. No. Serviceguard keeps the MAC address of every configured LAN card in the cluster binary file, and the replacement card has a different MAC address.

B. You must re-run cmapplyconf using the existing cluster ASCII file so that the new MAC address is recorded.

Which daemons control MC/ServiceGuard?

In total, 8 daemons control the Serviceguard configuration:

* /usr/lbin/cmclconfd :ServiceGuard Configuration Daemon
* /usr/lbin/cmcld :ServiceGuard Cluster Daemon
* /usr/lbin/cmlogd :ServiceGuard Syslog Log Daemon
* /usr/lbin/cmlvmd :Cluster Logical Volume Manager Daemon
* /usr/lbin/cmomd :Cluster Object Manager Daemon - logs to /var/opt/cmom/cmomd.log
* /usr/lbin/cmsnmpd :Cluster SNMP subagent (optionally running)
* /usr/lbin/cmsrvassistd :ServiceGuard Service Assistant Daemon
* /usr/lbin/cmtaped :ServiceGuard Shared Tape Daemon

Each of these daemons logs to the /var/adm/syslog/syslog.log file.

cmclconfd - gathers cluster information such as network and volume group info; started from /etc/inetd.conf.
cmcld - determines cluster membership. The Package Manager, Cluster Manager and Network Manager run as parts of cmcld.
cmlogd - used by cmcld to write syslog messages.
cmlvmd - keeps track of volume group info.
cmomd - provides information about the cluster to clients; started from /etc/inetd.conf.
cmsnmpd - produces the MIB for SNMP.
cmsrvassistd - forks and execs scripts for the cluster.
cmtaped - keeps track of shared tape devices.
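
A quick way to see which of these are missing on a node is to compare the process list against the names above. This is only a sketch: `missing_sg_daemons` is a made-up helper, and it deliberately reads process names from stdin so it does not depend on any particular ps syntax. Remember that cmclconfd is started on demand by inetd and cmsnmpd is optional, so seeing them in the output is not necessarily a problem.

```sh
#!/bin/sh
# missing_sg_daemons: read process names on stdin (one per line, with or
# without a leading path) and print the daemons from the list above that
# do not appear in the input.
missing_sg_daemons() {
    awk '
        { name = $1; sub(/^.*\//, "", name); seen[name] = 1 }   # keep basename
        END {
            split("cmclconfd cmcld cmlogd cmlvmd cmomd cmsnmpd cmsrvassistd cmtaped", want, " ")
            for (i = 1; i <= 8; i++) if (!(want[i] in seen)) print want[i]
        }
    '
}
```

On HP-UX the input could probably come from `UNIX95= ps -e -o comm | missing_sg_daemons`; treat that ps invocation as an assumption and check it on your release.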

Wednesday, 23 November 2011

LOCK disk Initialization in MC service Guard

To find a LOCK disk /LOCK VG in cluster:

#grep LOCK /etc/cmcluster/cmclconfig.ascii

FIRST_CLUSTER_LOCK_VG /dev/vgcllock
FIRST_CLUSTER_LOCK_PV /dev/dsk/c2t0d1

(OR)

Run the command below:

# cmviewconf | grep -e "Node name" -e lock

flags: 12 (single cluster lock)
first lock vg name: /dev/vglock
second lock vg name: (not configured)
Node name: node1
first lock pv name: /dev/dsk/c0t4d4
first lock disk interface type: c720
Node name: node2
first lock pv name: /dev/dsk/c0t5d4
first lock disk interface type: c720
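
The grep above also picks up commented-out lines in the ASCII file. A slightly stricter sketch that prints only the active lock entries; the function name `lock_disks` is hypothetical, and the keyword names are the ones shown in the grep output above plus their SECOND_ counterparts:

```sh
#!/bin/sh
# lock_disks FILE: print "KEYWORD value" for every uncommented cluster lock
# VG/PV line in a cluster ASCII file.
lock_disks() {
    awk '$1 ~ /^(FIRST|SECOND)_CLUSTER_LOCK_(VG|PV)$/ { print $1, $2 }' "$1"
}
```

For example: `lock_disks /etc/cmcluster/cmclconfig.ascii`.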

====================================================

If you don't have a vgcfgbackup, how do you initialize the lock disk again in the cluster?
(or)

How to reinitialize the cluster lock disk(s) using cmapplyconf:

• Halt the entire cluster.
# cmhaltcl -f

• Run the following command on all nodes in the cluster to remove the cluster flag from the cluster lock VG(s).
# vgchange -c n <lock_vg>

• Activate the cluster lock VG(s) on one node only:
# vgchange -a y <lock_vg>

• Run cmapplyconf on the node where you activated the cluster lock VG(s). The cluster flag is added back to the VG automatically.
# cmapplyconf -C <cluster_ascii_file>

• Run vgcfgbackup to back up the cluster lock information:
# vgcfgbackup <lock_vg>

• Deactivate the cluster lock VG(s):
# vgchange -a n <lock_vg>

• Run vgcfgbackup on all other cluster nodes as well:
# vgchange -a r <lock_vg>
# vgcfgbackup <lock_vg>
# vgchange -a n <lock_vg>

• Restart the cluster:
# cmruncl

New Service Guard 11.20 Series: Halting a Node or the Cluster while Keeping Packages Running(LAD)

Halting a Node or the Cluster while Keeping Packages Running (Live Application Detach)
There may be circumstances in which you want to do maintenance that involves halting a node, or the entire cluster, without halting or failing over the affected packages. Such maintenance might consist of anything short of rebooting the node or nodes, but a likely case is networking changes that will disrupt the heartbeat. New command options in Serviceguard A.11.20 (collectively known as Live Application Detach (LAD)) allow you to do this kind of maintenance while keeping the packages running. The packages are no longer monitored by Serviceguard, but the applications continue to run. Packages in this state are called detached packages. When you have done the necessary maintenance, you can restart the node or cluster, and normal monitoring will resume on the packages.
There are loads of cases in which you can't use LAD, in fact so many that I can't list them here. I would recommend taking a look at the list in chapter 7 of the Serviceguard documentation.
OK, so in our example we have a cluster running with only one heartbeat network configured (that does not meet the minimum requirements, but this is just a test). Because of changes in our network infrastructure, we are going to lose communication on the heartbeat network for an hour, and after that we have to change our heartbeat config to another subnet. We are going to use LAD so that a node in the cluster doesn't panic when it loses all its heartbeat communications.
root@vmcluste:/etc/cmcluster> cmviewcl
CLUSTER STATUS
vmcluster up
NODE STATUS STATE
vmcluster1 up running
PACKAGE STATUS STATE AUTO_RUN NODE
pkg2 up running enabled vmcluster1
NODE STATUS STATE
vmcluster2 up running
PACKAGE STATUS STATE AUTO_RUN NODE
pkg1 up running enabled vmcluster2
Our heartbeat config
root@vmcluste:/etc/cmcluster> cmviewcl -v -f line | grep heartbeat
node:vmcluster1|interface:lan1|ip_address:192.168.99.1|heartbeat=true
node:vmcluster2|interface:lan1|ip_address:192.168.99.2|heartbeat=true
We need to change it to the 172.27.1.0 network once the network guys get the link back up in the new VLAN they created for the heartbeat.
So first of all we shut down the cluster, detaching the packages. Our cluster will be in the down state and our packages in the detached state; in the detached state the packages keep running, they just aren't monitored by SG.
root@vmcluste:/etc/cmcluster> cmhaltcl -d
Detaching package pkg1.
Detaching package pkg2.
Disabling all packages from starting on nodes to be halted.
Warning: Do not modify or enable packages until the halt operation is completed.
This operation may take some time.
Waiting for nodes to halt ..... done
Successfully detached package pkg1.
Successfully detached package pkg2.
Successfully halted all nodes specified.
Halt operation complete.
root@vmcluste:/etc/cmcluster> cmviewcl
CLUSTER STATUS
vmcluster down
NODE STATUS STATE
vmcluster1 down unknown
PACKAGE STATUS STATE AUTO_RUN NODE
pkg2 detached detached enabled vmcluster1
NODE STATUS STATE
vmcluster2 down unknown
PACKAGE STATUS STATE AUTO_RUN NODE
pkg1 detached detached enabled vmcluster2
If we check, we still have the package virtual IPs up, the package filesystems mounted, and our super app running (in this case a ping..):
root@vmcluste:/> bdf | grep data
/dev/vgpkg2/lvdata1
65536 2149 59433 3% /data1pkg2
root@vmcluste:/> netstat -ni
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
lan1 1500 192.168.99.0 192.168.99.1 365950 0 368612 0 0
lan0 1500 11.0.0.0 11.0.0.51 727096 0 567120 0 0
lo0 32808 127.0.0.0 127.0.0.1 13633 0 13633 0 0
lan0:1 1500 192.168.0.0 192.168.0.97 614 0 1205 0 0
lan0:2 1500 11.0.0.0 11.0.0.54 0 0 0 0 0
root@vmcluste:/> ps -ef | grep -i ping
root 11220 1 0 14:40:22 ? 0:02 ping 11.0.0.22
OK, so now we can tell our network people they can start working and kill comms on the HB network....
We are going to do it ourselves in this case xP.
# hpvmnet -S localnet -h
hpvmnet: Halt the vswitch 'localnet'? [n/y]: y
# hpvmnet
Name Number State Mode NamePPA MAC Address IPv4 Address
======== ====== ======= ========= ======== ============== ===============
localnet 1 Down Shared N/A N/A
vmnetpro 3 Up Shared lan0 0x00156004e156 11.0.0.22
#
we can now check on our servers that we have no heartbeat network working:
root@vmcluste:/etc/cmcluster> netstat -ni | grep lan1
lan1* 1500 192.168.99.0 192.168.99.2 464719 0 462011 0 0
root@vmcluste:/etc/cmcluster> lanscan | grep lan1
0/0/2/0 0xEECAE1760A59 1 UP lan1 snap1 2 ETHER Yes 119
root@vmcluste:/etc/cmcluster> nwmgr --diag link -A dest=0xEECAE1760A59 -c lan1
lan1 Interface State = DOWN
Probable Cause for State = Cable disconnect
Don't you just miss linkloop? It still works, but better get used to the new stuff.
OK, so we can check that everything is still working fine...
root@vmcluste:/> bdf | grep data
/dev/vgpkg2/lvdata1
65536 2149 59433 3% /data1pkg2
root@vmcluste:/> ps -ef | grep -i ping
root 11220 1 0 14:40:22 ? 0:02 ping 11.0.0.22
root@vmcluste:/>
Once the maintenance has finished, we get the network going again:
# hpvmnet -b -S localnet
# hpvmnet
Name Number State Mode NamePPA MAC Address IPv4 Address
======== ====== ======= ========= ======== ============== ===============
localnet 1 Up Shared N/A N/A
vmnetpro 3 Up Shared lan0 0x00156004e156 11.0.0.22
Then we re-attach the packages to the cluster, just by running cmruncl:
root@vmcluste:/etc/cmcluster> cmruncl
cmruncl: Validating network configuration...
cmruncl: Network validation complete
Re-attaching package pkg1.
Re-attaching package pkg2.
Waiting for cluster to form ..... done
Successfully re-attached package pkg1.
Successfully re-attached package pkg2.
Cluster successfully formed.
Check the syslog files on all nodes in the cluster to verify that no warnings occurred during startup.
root@vmcluste:/etc/cmcluster> cmviewcl
CLUSTER STATUS
vmcluster up
NODE STATUS STATE
vmcluster1 up running
PACKAGE STATUS STATE AUTO_RUN NODE
pkg2 up running enabled vmcluster1
NODE STATUS STATE
vmcluster2 up running
PACKAGE STATUS STATE AUTO_RUN NODE
pkg1 up running enabled vmcluster2
root@vmcluste:/etc/cmcluster>
To change the heartbeat network configuration without stopping the cluster, what I'm going to do is send the heartbeat down the 11.0.0.0 production network while I change the config of the 192.168.99.0 HB network I have now.
I edit the cluster ASCII file (cluster.ascii):
......................
NODE_NAME vmcluster1
NETWORK_INTERFACE lan0
HEARTBEAT_IP 11.0.0.51 -----> change it from stationary to heartbeat
# NETWORK_INTERFACE lan1 -----> remove our HB lan from the cluster
# HEARTBEAT_IP 192.168.99.1
.................
NODE_NAME vmcluster2
NETWORK_INTERFACE lan0
HEARTBEAT_IP 11.0.0.52 -------> change it from stationary to heartbeat
# NETWORK_INTERFACE lan1
# HEARTBEAT_IP 192.168.99.2 -----> remove our HB lan from the cluster
..................
#SUBNET 192.168.99.0 ---------------> remove subnet
# IP_MONITOR OFF
We apply the config. It complains a bit, because we are not meeting the minimum required network configuration:
root@vmcluste:/etc/cmcluster> cmapplyconf -v -k -C cluster.ascii
Begin cluster verification...
...........
SUBNET 192.168.99.0 is being removed while cluster is running.
NETWORK_INTERFACE lan1 is being deleted from node vmcluster1 while cluster is running.
NETWORK_INTERFACE lan1 is being deleted from node vmcluster2 while cluster is running.
Minimum network configuration requirements for the cluster have
not been met. Minimum network configuration requirements are:
- 2 or more heartbeat networks OR
- 1 heartbeat network with local switch (HP-UX Only) OR
- 1 heartbeat network using APA with 2 trunk members (HP-UX Only) OR
- 1 heartbeat network using bonding (mode 1) with 2 slaves (Linux Only)
Maximum configured packages parameter is 300.
Configuring 2 package(s).
Modifying configuration on node vmcluster1
Modifying configuration on node vmcluster2
Modifying the cluster configuration for cluster vmcluster
..................
Check on the HB config:
root@vmcluste:/etc/cmcluster> cmviewcl -v -f line | grep heartbeat
node:vmcluster1|interface:lan0|ip_address:11.0.0.51|heartbeat=true
node:vmcluster2|interface:lan0|ip_address:11.0.0.52|heartbeat=true
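Once the new VLAN is up, we edit cluster.ascii again and add lan1 back with the new heartbeat subnet:
.............................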
NODE_NAME vmcluster1
NETWORK_INTERFACE lan0
HEARTBEAT_IP 11.0.0.51
NETWORK_INTERFACE lan1
HEARTBEAT_IP 192.168.100.1 ----------------------> add new vlan subnet HB network
...........................
NODE_NAME vmcluster2
NETWORK_INTERFACE lan0
HEARTBEAT_IP 11.0.0.52
NETWORK_INTERFACE lan1
HEARTBEAT_IP 192.168.100.2 ----------------------> add new vlan subnet HB network
..........................
SUBNET 192.168.100.0 ---------------> add new subnet
IP_MONITOR OFF
We run cmapplyconf again and check our HB:
SUBNET 192.168.100.0 is being added while cluster is running.
NETWORK_INTERFACE lan1 is being added to node vmcluster1 while cluster is running.
NETWORK_INTERFACE lan1 is being added to node vmcluster2 while cluster is running.
root@vmcluste:/etc/cmcluster> cmviewcl -v -f line | grep heartbeat
node:vmcluster1|interface:lan0|ip_address:11.0.0.51|heartbeat=true
node:vmcluster1|interface:lan1|ip_address:192.168.100.1|heartbeat=true
node:vmcluster2|interface:lan0|ip_address:11.0.0.52|heartbeat=true
node:vmcluster2|interface:lan1|ip_address:192.168.100.2|heartbeat=true
OK, we have had a loss of our only HB network for an hour, and we have changed the subnet of our HB network without stopping the cluster or the applications running in it. You can do some nice things with the new versions of Serviceguard. It's a pity that 95% of the clusters I see at work are still 11.14/15/16 on 11.11 ...
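
Since cmapplyconf only warns when the minimum network configuration is not met, it can be handy to count heartbeat interfaces per node after each change. A small sketch that works on the `cmviewcl -v -f line | grep heartbeat` output shown above; the function name `hb_per_node` is made up, and the parsing assumes exactly the `node:...|interface:...|ip_address:...|heartbeat=true` line format from this post:

```sh
#!/bin/sh
# hb_per_node: read heartbeat lines from cmviewcl line-format output on
# stdin and print "node count" for each node, in first-seen order, where
# count is the number of interfaces with heartbeat=true.
hb_per_node() {
    awk -F'|' '
        /heartbeat=true/ {
            node = $1; sub(/^node:/, "", node)
            if (!(node in count)) order[++n] = node
            count[node]++
        }
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    '
}
```

Run as `cmviewcl -v -f line | grep heartbeat | hb_per_node`; any node showing a count of 1 is relying on a single heartbeat network.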

Disable Serviceguard in HP-UX: how to?

If for some reason you want to disable Serviceguard on a system, you can do so by commenting out the following entries in /etc/inetd.conf:

#vi /etc/inetd.conf

hacl-cfg dgram udp wait root /usr/lbin/cmclconfd cmclconfd -p
hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c

Save and exit, then run the command below:

# /usr/sbin/inetd -c # force inetd to re-read inetd.conf

Then disable automatic startup of the cluster services as well:

# vi /etc/rc.config.d/cmcluster

AUTOSTART_CMCLD=0

# cmquerycl -n <nodename> # if the command fails, you have successfully disabled SG
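
If you would rather not edit inetd.conf by hand, the commenting-out can be done with sed. A minimal sketch, assuming the hacl-cfg entries start at the beginning of the line as shown above; `disable_hacl_cfg` is a hypothetical helper that writes to stdout and leaves the input file untouched:

```sh
#!/bin/sh
# disable_hacl_cfg FILE: print FILE with every line that starts with
# "hacl-cfg" prefixed by "#". Lines that are already commented out do not
# start with "hacl-cfg", so they are not commented a second time.
disable_hacl_cfg() {
    sed 's/^hacl-cfg/#&/' "$1"
}
```

One way to use it: `cp -p /etc/inetd.conf /etc/inetd.conf.org`, then `disable_hacl_cfg /etc/inetd.conf.org > /etc/inetd.conf`, then have inetd re-read its configuration as described above.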

Cluster node configuration methods

These are the available cluster node configurations:

1. Active/Active – Traffic intended for the failed node is directed to another existing node or balanced across all the remaining nodes.

2. Active/Passive – A complete standby node is brought online only when the primary node fails.

3. N+1 – A single extra node is brought online in place of the failed primary node. The extra node must be capable of handling any of the services the primary node handled, with or without special additional software.

4. N+M – More than one dedicated standby node is available to handle failovers. This may involve higher cost and maintenance.

5. N-to-1 – The backup node serves only temporarily, until the primary node is brought back online. The running services are then transferred back to the primary node.

6. N-to-N – A combination of Active/Active and N+M clusters.

Tuesday, 22 November 2011

Enable / disable node switching in a Serviceguard cluster: how to?

Consider a 2-node cluster. If the package fails on nodeA, it should start on nodeB automatically. For that, node switching must be enabled on both nodes and the AUTO_RUN option must be enabled as well.
(package name: prodpkg, server names: nodeA and nodeB)
root# cmviewcl -v -p prodpkg

PACKAGE STATUS STATE AUTO_RUN NODE
prodpkg halt down disabled nodeA

Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up disabled NODEA
Alternate up disabled NODEB

Note: Do not enable AUTO_RUN before node switching has been enabled on both nodes.

#cmmodpkg -v -n nodeA -e prodpkg

#cmmodpkg -v -n nodeB -e prodpkg

Package switching is now enabled on both nodes. You can now start the package on the primary node and then enable AUTO_RUN.

root# cmviewcl -v -p prodpkg

PACKAGE STATUS STATE AUTO_RUN NODE
prodpkg halt down disabled nodeA

Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled NODEA
Alternate up enabled NODEB

Start the package on nodeA, then enable AUTO_RUN:

#cmrunpkg -v -n nodeA prodpkg

#cmmodpkg -e prodpkg

root# cmviewcl -v -p prodpkg

PACKAGE STATUS STATE AUTO_RUN NODE
prodpkg halt down enabled nodeA

Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled NODEA
Alternate up enabled NODEB

To disable AUTO_RUN, run the command below:

#cmmodpkg -d prodpkg

To disable package switching on one node:

#cmmodpkg -v -n nodeB -d prodpkg
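
On clusters with more than two nodes it helps to pull the list of nodes with switching disabled straight out of the cmviewcl output. A sketch that parses the Node_Switching_Parameters table in the format shown above (real output may contain further sections, which this ignores); the function name `switching_disabled` is hypothetical:

```sh
#!/bin/sh
# switching_disabled: read `cmviewcl -v -p PKG` output on stdin and print
# the NAME column of every Primary/Alternate row whose SWITCHING column is
# "disabled". Only rows inside the Node_Switching_Parameters table count.
switching_disabled() {
    awk '
        /^[ \t]*Node_Switching_Parameters:/ { in_tab = 1; next }
        in_tab && NF == 0 { in_tab = 0 }      # a blank line ends the table
        in_tab && ($1 == "Primary" || $1 == "Alternate") && $3 == "disabled" { print $4 }
    '
}
```

For example, `cmviewcl -v -p prodpkg | switching_disabled` lists the nodes that still need a `cmmodpkg -v -n <node> -e prodpkg`.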

Adding new node to cluster - Steps for interview point

Steps for adding a new node to a running cluster

1. On all nodes, set up the file /etc/cmcluster/cmclnodelist to include the new node name.

2. Get the most up-to-date ASCII configuration file (cmgetconf).

3. Query all nodes in the cluster, including the new node (cmquerycl).

4. Compare the ASCII files obtained from cmgetconf and cmquerycl.

5. Update the ASCII configuration file obtained from cmquerycl.

6. Check the new ASCII configuration file (cmcheckconf).

7. Compile and distribute the new binary cluster configuration file (cmapplyconf).

8. Start cluster services on the new node (cmrunnode).

9. Check for any problems with cmviewcl and the logfile /var/adm/syslog/syslog.log.

Commands

a.#cmgetconf -v -c clustername /etc/cmcluster/cluster.ascii

b.Edit the /etc/cmcluster/cluster.ascii file and add the new node info

c.#cmquerycl -v -c cluster_name -n node1 -n node2 -C cluster_query.ascii (query output, written to a separate file so it can be compared)

d.cmcheckconf -v -C cluster.ascii

e.cmapplyconf -v -C cluster.ascii

f.cmrunnode -v node2 (new node)
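
Step 4 (comparing the two ASCII files) is noisy with a plain diff because the generated files are mostly comments. A minimal sketch that strips comments and blank lines and squeezes whitespace before diffing. The function names are hypothetical, and `mktemp` is assumed to be available in the form used here, which may not be true on older HP-UX releases:

```sh
#!/bin/sh
# conf_lines FILE: print FILE without comments and blank lines, with runs
# of whitespace squeezed to a single space and line ends trimmed.
conf_lines() {
    sed -e 's/#.*$//' -e 's/[[:space:]][[:space:]]*/ /g' \
        -e 's/^ //' -e 's/ $//' "$1" | grep -v '^$'
}

# conf_diff A B: diff the effective settings of two ASCII files.
# Returns diff's exit status (0 = no differences).
conf_diff() {
    a=$(mktemp) && b=$(mktemp) || return 2
    conf_lines "$1" > "$a"
    conf_lines "$2" > "$b"
    diff "$a" "$b"; rc=$?
    rm -f "$a" "$b"
    return $rc
}
```

Usage would be `conf_diff from_cmgetconf.ascii from_cmquerycl.ascii`, with the two file names standing in for whatever you called the outputs.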

Configure a standby LAN for a node on a running cluster: how to?

Consider a cluster with 2 nodes, nodeA and nodeB, where a standby LAN is being added for nodeB.

>> Log in to the server nodeA

STEP 1: Remove the node from all package configurations
#cmviewcl -v

#cmgetconf -v -p pkgname pkgname.conf_new

Then remove the line below from the conf file:

#vi pkgname.conf_new

NODE_NAME nodeB

Save and exit.

# cmcheckconf -v -P pkgname.conf_new

# cmapplyconf -v -P pkgname.conf_new

#cmviewcl -v -p pkgname -----> verify the node is removed from the pkg config

=================================================

STEP 2: Halt the node nodeB and remove the node from cluster

#cmviewcl -v

#cmhaltnode -v nodeB

#cmgetconf -v -c clustername cluster.ascii

Edit the cluster.ascii file and remove the whole nodeB stanza from it (shown commented out here):

#NODE_NAME nodeB
#NETWORK_INTERFACE lan0
#HEARTBEAT_IP 10.161.1.58
#NETWORK_INTERFACE lan2
#HEARTBEAT_IP 10.160.7.72

Save and exit.

#cmcheckconf -v -C cluster.ascii

#cmapplyconf -v -C cluster.ascii

#cmviewcl -v

==================================================================================================

STEP 3: Add nodeB back to the cluster with the standby LAN.

#cmgetconf -v -c clustername cluster.ascii

Edit cluster.ascii and add nodeB back with the standby LAN (lan3 has no IP line after it, which is what makes it a standby interface):

NODE_NAME nodeB
NETWORK_INTERFACE lan0
HEARTBEAT_IP 10.161.1.58
NETWORK_INTERFACE lan2
HEARTBEAT_IP 10.160.7.72
NETWORK_INTERFACE lan3

Save and exit.

#cmcheckconf -v -C cluster.ascii

#cmapplyconf -v -C cluster.ascii

Then start cluster services on nodeB:

#cmrunnode -v nodeB

#cmviewcl -v

==================================================================================================

STEP 4: Add the node back to every package it was previously configured in.

#cmgetconf -v -p pkgname pkgname.conf_new

Then add the line below to the conf file:

#vi pkgname.conf_new

NODE_NAME nodeB

# cmcheckconf -v -P pkgname.conf_new

# cmapplyconf -v -P pkgname.conf_new

#cmviewcl -v -p pkgname -----> verify the node has been added back to the pkg config
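
Before running cmapplyconf it is worth confirming that the edited cluster.ascii really contains the standby interface you intended. A sketch that lists every NETWORK_INTERFACE with no HEARTBEAT_IP or STATIONARY_IP line after it; the function name `standby_lans` is made up, and the parsing assumes the simple one-keyword-per-line stanza layout shown in STEP 3:

```sh
#!/bin/sh
# standby_lans FILE: print "node interface" for each NETWORK_INTERFACE in a
# cluster ASCII file that is not followed by a HEARTBEAT_IP or
# STATIONARY_IP line before the next interface or node (a standby LAN).
standby_lans() {
    awk '
        function emit() { if (pending != "") print node, pending; pending = "" }
        $1 == "NODE_NAME"         { emit(); node = $2; next }
        $1 == "NETWORK_INTERFACE" { emit(); pending = $2; next }
        $1 == "HEARTBEAT_IP" || $1 == "STATIONARY_IP" { pending = ""; next }
        END { emit() }
    ' "$1"
}
```

With the stanza from STEP 3, `standby_lans cluster.ascii` should report lan3 for nodeB and nothing else.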