Friday 27 June 2014

Before using PowerPath, we wanted a way to check/fix primary fibre (even fiber) storage paths that were active across our interswitch link.

We can go into vCenter, click on Properties for a datastore (or LUN if it's an RDM), and change the active path:

But how do you do this automatically for hundreds of LUNs, or just check whether they're wrong? NetApp has a command you can run on the filer to tell you if traffic is going down the wrong path, but these ESX commands will tell you if the paths are set wrong even when no traffic is going down them (as on servers in a cluster that aren't running the VMs).

It's not simple, but a few key commands will get the info needed:

list all your luns by NAAid:
naa_list=`esxcfg-mpath -l | grep "naa\." | grep "Device: " | cut -d: -f2 | sort -u`

loop through all of them to check the primary path currently assigned, and whether it matches what it should be.

for naa_identifier in `echo $naa_list` ; do
   echo $naa_identifier
   echo do other commands here
done



Find the RunTime Name (VHBA-Controller-Target-LUNID) which is the active path:

# esxcli nmp fixed getpreferred --device  naa.60a98000375334364a2b42436c754239

or

# esxcli nmp fixed getpreferred --device  ${naa_identifier}

which gives:
vmhba1:C0:T3:L200

Is that on the correct initiator (SAN HBA)?

# esxcfg-mpath -L -P  vmhba1:C0:T3:L200
vmhba1:C0:T3:L200 state:active naa.60a98000375334364a2b42436c754239 vmhba1 0 3 200 NMP active san fc.20000024ff03607f:21000024ff03607f fc.500a09808d310651:500a09819d310651


See the part in bold above: that's the unique part of our NAAid, which shows which filer owns the storage.
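
As a rough sketch of that check, the comparison can be scripted by looking for the expected WWN in the path line. The helper name path_uses_wwn is made up for illustration, and the WWN passed to it is just the one from the sample output; which WWN is actually "correct" depends on your own filers and HBAs.

```shell
# Hypothetical helper: succeeds if a path line mentions the expected WWN.
#   $1 = one line of `esxcfg-mpath -L -P <RunTime Name>` output
#   $2 = WWN you expect for the correct path (site-specific assumption)
path_uses_wwn() {
    case "$1" in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Sample line copied from the output above
sample_line="vmhba1:C0:T3:L200 state:active naa.60a98000375334364a2b42436c754239 vmhba1 0 3 200 NMP active san fc.20000024ff03607f:21000024ff03607f fc.500a09808d310651:500a09819d310651"

if path_uses_wwn "$sample_line" "500a09819d310651"; then
    echo "preferred path matches the expected WWN"
else
    echo "preferred path does NOT match the expected WWN"
fi
```

On a real host you would feed it the output of esxcfg-mpath -L -P for the preferred path instead of the sample line.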

If it's right, you're done with that LUN. If it's wrong, you need to get the correct RunTime Name and change the active path to match it. You'll need a few other things first:

lun_ID=`esxcli nmp fixed getpreferred --device ${naa_identifier}  | cut -d: -f4 | tr -d "L" `
naa_identifier (see above)
correct_pripathwwn (this is one thing you have to figure out and set as variables in your script based on your unique identifiers reported by ESX for your SAN HBAs)
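
To see what the lun_ID line does without touching a host, here is the same cut/tr pipeline run against the RunTime Name from the example above. It assumes getpreferred prints just the RunTime Name on one line, as shown earlier.

```shell
# RunTime Name as returned by getpreferred in the example above
preferred="vmhba1:C0:T3:L200"

# Field 4 (colon-separated) is "L200"; strip the "L" to get the bare LUN ID
lun_ID=`echo "$preferred" | cut -d: -f4 | tr -d "L"`
echo "$lun_ID"
```

This prints 200 for the sample path.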

esxcfg-mpath -m | grep -i ${naa_identifier}

vmhba2:C0:T3:L200 vmhba2 fc.2000001b329cdc1d:2100001b329cdc1d fc.500a09808d310651:500a09829d310651 naa.60a98000375334364a2b42436c754239
vmhba1:C0:T3:L200 vmhba1 fc.20000024ff03607f:21000024ff03607f fc.500a09808d310651:500a09819d310651 naa.60a98000375334364a2b42436c754239
vmhba2:C0:T2:L200 vmhba2 fc.2000001b329cdc1d:2100001b329cdc1d fc.500a09808d310651:500a09818d310651 naa.60a98000375334364a2b42436c754239
vmhba1:C0:T2:L200 vmhba1 fc.20000024ff03607f:21000024ff03607f fc.500a09808d310651:500a09828d310651 naa.60a98000375334364a2b42436c754239


See how similar the info above is to the screen grab of what vCenter shows? It's all four paths for our LUN, with the full RunTime Name (VHBA-Controller-Target-LUNID) for each of the paths to the LUN.

With all that, you can get the bit needed to change the active path (the RunTime Name, which is the first field of the matching line) with this command:
correct_pripath=`esxcfg-mpath -m | grep -i ${naa_identifier} | grep -i "vmhba[1-2]:C0:T[0-9]:L${lun_ID}" | grep ${correct_pripathwwn} | awk '{print $1}'`
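
The filtering can be tried offline against the sample esxcfg-mpath -m output from this post. This is only a sketch: the WWN used for correct_pripathwwn is picked from the sample for illustration, and the awk step at the end is there because, as far as I know, setpreferred wants only the RunTime Name for --path and not the whole line.

```shell
# Sample `esxcfg-mpath -m` output copied from this post
sample_paths='vmhba2:C0:T3:L200 vmhba2 fc.2000001b329cdc1d:2100001b329cdc1d fc.500a09808d310651:500a09829d310651 naa.60a98000375334364a2b42436c754239
vmhba1:C0:T3:L200 vmhba1 fc.20000024ff03607f:21000024ff03607f fc.500a09808d310651:500a09819d310651 naa.60a98000375334364a2b42436c754239
vmhba2:C0:T2:L200 vmhba2 fc.2000001b329cdc1d:2100001b329cdc1d fc.500a09808d310651:500a09818d310651 naa.60a98000375334364a2b42436c754239
vmhba1:C0:T2:L200 vmhba1 fc.20000024ff03607f:21000024ff03607f fc.500a09808d310651:500a09828d310651 naa.60a98000375334364a2b42436c754239'

naa_identifier="naa.60a98000375334364a2b42436c754239"
lun_ID=200
correct_pripathwwn="500a09819d310651"   # assumption: set this per site

# Narrow down to this LUN, then to the path with the wanted WWN,
# then keep only the first field (the RunTime Name)
correct_pripath=`echo "$sample_paths" | grep -i "${naa_identifier}" | grep -i "vmhba[1-2]:C0:T[0-9]:L${lun_ID}" | grep "${correct_pripathwwn}" | awk '{print $1}'`
echo "$correct_pripath"
```

With the sample data only one of the four paths carries that WWN, so this prints vmhba1:C0:T3:L200.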

so you can fix the active path with:
# esxcli nmp fixed setpreferred --device ${naa_identifier} --path ${correct_pripath}
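
Put together, the steps might look something like the sketch below. It is not my actual script. It assumes a single correct_pripathwwn for every LUN (in reality it depends on which filer owns the LUN), assumes getpreferred prints only the RunTime Name, and the DRY_RUN switch and decide_action helper are invented for this example. With DRY_RUN=1 it only reports what it would change.

```shell
#!/bin/sh
# Sketch only: ties the commands from this post into one loop.
correct_pripathwwn="500a09819d310651"   # placeholder; site-specific
DRY_RUN=1                               # hypothetical: 1 = report only

# Hypothetical helper: print "ok", "fix" or "unknown" for a
# current/correct RunTime Name pair.
decide_action() {
    if [ -z "$2" ]; then
        echo unknown
    elif [ "$1" = "$2" ]; then
        echo ok
    else
        echo fix
    fi
}

# Only run the loop where the ESX tools actually exist.
if command -v esxcli >/dev/null 2>&1 && command -v esxcfg-mpath >/dev/null 2>&1; then
    naa_list=`esxcfg-mpath -l | grep "naa\." | grep "Device: " | cut -d: -f2 | sort -u`
    for naa_identifier in $naa_list ; do
        current=`esxcli nmp fixed getpreferred --device ${naa_identifier}`
        lun_ID=`echo "$current" | cut -d: -f4 | tr -d "L"`
        # First matching path with the wanted WWN; keep only the RunTime Name
        correct_pripath=`esxcfg-mpath -m | grep -i ${naa_identifier} | grep -i "vmhba[1-2]:C0:T[0-9]:L${lun_ID}" | grep ${correct_pripathwwn} | awk '{print $1}' | head -n 1`
        case `decide_action "$current" "$correct_pripath"` in
            ok)      echo "${naa_identifier}: ${current} is correct" ;;
            unknown) echo "${naa_identifier}: no path matches ${correct_pripathwwn}" ;;
            fix)
                echo "${naa_identifier}: ${current} -> ${correct_pripath}"
                [ "$DRY_RUN" = 1 ] || esxcli nmp fixed setpreferred --device ${naa_identifier} --path ${correct_pripath}
                ;;
        esac
    done
fi
```

Running it in dry-run mode first on one host and comparing against vCenter seems like the safer way to start.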

We do have to run this on each of our hosts, but even if running the script takes 20 minutes per host, imagine how long it would take to manually fix these paths for each of our 200 LUNs on each of our dozen hosts!

Anyone want to put this in a PowerShell script?  I'm a bit embarrassed about putting my final script here as it's messy and not optimised beautifully.

The RDM part of my script looks a little like this:



-KC