Wednesday 12 December 2012

ovftools, VMware Converter & downgrade ESX5 OVF to ESX4

Logged a call with VMware and they came back saying we couldn't downgrade an ESX5 OVF to run on our ESXi4 vSphere environment.  Of course I didn't get their response until the next day, and by then I had it working.  I'm sure they'd say this solution isn't supported, but it works for me.  As it works on copies of the original OVF files, I can't see any harm in giving it a try:

1.  Download & install ovftool then run:

#ovftool virtual-appliance.ovf virtual-appliance.vmx  (to inflate the vmx, vmdk, etc.)



2.  Download and install VMware Converter Standalone (I grabbed the latest version as of this post, which is 5.0.1 build-875114).

(choose the advanced install option)



That's it.  This version of the VM Hardware is 7, which is ideal for our ESX4 environment.  Job Done.

In the process of researching this, I saw how powerful ovftool is, but I couldn't get it working with these OVA files the way some folks have.

Wednesday 20 June 2012

Another BIOS setting for IBM X Series ESX Host

I keep getting these errors:


0x806F050C  Error  Memory device X (DIMM X Status) correctable ECC memory error logging limit reached [Note X = 1-12]


The suggestions above aren't all helpful as it takes a long time for these errors to occur, so moving the memory to another slot to confirm whether the problem is with the DIMM or slot is impractical.

A colleague helpfully remembered a problem on HP hosts that sounded similar. He got me looking and I found this BIOS/IMM setting:

Changing "Normal" mode to "Performance" mode affects the way that the DIMMs are refreshed.  This results in a DIMM temperature message occurring at a 10 degree lower temperature.

This article is not about my X3650, but IBM has verbally confirmed it applies to my server:


Change Thermal Mode setting (preferred method)
  1. Boot the blade into the F1 "System Configuration and Boot Management" screen. Highlight "System Settings." Press Enter and select Memory. Select Thermal Mode and change the setting to "Performance."
  2. Press the Esc key twice to get to "System Configuration and Boot Management" and then select Save Settings and Exit Setup.
  3. Follow the instructions on the next screen to exit the "Setup Utility."
  4. Power the blade off for the changes to take effect and restart.
Changing "Normal" mode to "Performance" mode affects the way that the Dual In-Line Memory Modules (DIMMs) are refreshed. This results in a DIMM temperature warning message occurring at a 10 degree lower temperature. This causes no impact in most industry standard data centers.


Again, I don't have a blade, but I seem to have guessed correctly that they run the same code on the X Series.  Odd that I haven't found much about this online.  It should be in a best practices document for IBM servers, maybe even a vSphere document.  Props to "VTSUkanov" for finding and posting about this on the VMware forums.


OSSV Pre-exec (and Post-exec scripts)

NetApp Management Console:
    Protection:
     Overview (select the Policy like OSSV that you copied from a template)
   Edit
     Nodes and Connections
       Primary Data tab
section:

Backup Script

entries:
Path: c:\temp\ossv_vl112_test.bat
Run As: (left blank)

Oddly, it runs this script twice, once before and once after.  Silly; like every other proper backup product, they should have one field for "pre" and one field for "post".  Very unprofessional of NetApp not to document this better, methinks.  Or does it run four times?

My script echoed the variable DP_BACKUP_STATUS, and it was set to four different things, one for each of the four times the script was run by my DFM OSSV backup job:


DP_BACKUP_STATUS=DP_BEFORE_TRANSFERS
DP_BACKUP_STATUS=DP_AFTER_TRANSFERS
DP_BACKUP_STATUS=DP_AFTER_BACKUP_REGISTRATION
DP_BACKUP_STATUS=DP_BEFORE_PRIMARY_SNAPSHOTS

Maybe it's different when it's scheduled versus run with "Protect Now".

New version of DFM must have changed variables as they used to expect:


DP_BACKUP_STATUS=DP_BEFORE_SNAPSHOTS


and the post will have

DP_BACKUP_STATUS=DP_AFTER_SNAPSHOTS
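
Since DFM calls the one script at every stage, the script itself has to work out whether it's being asked to do the "pre" or the "post" work. Here's a minimal sketch in Python of that dispatch. The mapping of stages to actions is my assumption (quiesce at DP_BEFORE_TRANSFERS, resume at DP_AFTER_TRANSFERS), and the names action_for_stage and STAGE_ACTIONS are made up for illustration, so check them against what your own job echoes:

```python
import os

# Assumed mapping from DFM stage to action; adjust it to suit your backup job.
STAGE_ACTIONS = {
    "DP_BEFORE_TRANSFERS": "quiesce",
    "DP_AFTER_TRANSFERS": "resume",
}


def action_for_stage(status):
    """Return "quiesce", "resume" or "skip" for a DP_BACKUP_STATUS value."""
    return STAGE_ACTIONS.get(status, "skip")


if __name__ == "__main__":
    # DFM sets DP_BACKUP_STATUS in the environment each time it runs the script.
    status = os.environ.get("DP_BACKUP_STATUS", "")
    print("DP_BACKUP_STATUS=%s -> %s" % (status, action_for_stage(status)))
```

Any stage that isn't in the mapping is skipped, so the two extra invocations don't kick off the quiesce a second time.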


Ah, thanks to Marlon on the NetApp forums

c:\DFM_scripts\ssh_ossv_hostname_pre.sh (runs the ssh to quiesce database)

c:\DFM_scripts\

Still, I don't understand what this bit from the OSSV FAQ is on about:

Q: Does the pre/post scripting capability in DFM work with OSSV?
A: Yes, you can use the DFM pre/post script ability to run commands on the host prior
to, or following an OSSV transfer.  The scripts are installed on the DFM server using
a “zip” file.  The “zip” file must contain the script (in PERL), and a XML File named
package.xml.  The package.xml file must include packaging information (version, file
name…) and the privileges needed to run the script.  Once the “zip” file has been
created, it can be imported into DFM and ran either manually or via a schedule set in
DFM.

Limitations, Limitations, Limitations:

There's only one pre-exec/post-exec script field for each backup job, even though it makes sense to back up about 10 OSSV clients per job.  Plus the backup job only runs a script on the DFM server, and obviously it's the OSSV clients that need their databases quiesced.  That means you need to set up ssh and an ssh-key relationship between the DFM server and the OSSV client, and get the DFM script to launch a script remotely on the OSSV client by way of ssh--whew!

I install Cygwin for the ssh, ssh-key and a cygwin shell script to kick off the Linux script on the OSSV client from the DFM script, by way of OSSV in DFM. Again, whew!

Another limitation is that the OSSV backup environment variables only seem to track which stage of the OSSV backup job initiated it.  Nothing I can find about the name of the backup job, the OSSV client currently being backed up, or anything else I can put in my scripts to differentiate.  What we need to avoid is the same script running on all clients, needlessly and repeatedly.

CygWin & Windows Not Playing Nicely

You might use PowerShell to avoid some of this, but does PowerShell do ssh?  I had loads of problems when I copied my Cygwin shell script using Windows (drag/drop in Explorer or from a DOS command) as it changes the line endings (carriage return/line feed).  The same happens if a DOS batch file is edited in vi or copied from the Cygwin shell.  Ouch!  This ate much of my time.  Don't let it get you.
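
If you suspect a script has picked up the wrong line endings, converting it explicitly is quicker than guessing. A small sketch in Python (the function names are mine, not from any tool): to_unix is for the Cygwin/Linux shell scripts and to_dos is for the DOS batch files.

```python
def to_unix(data: bytes) -> bytes:
    """Convert CRLF (and any stray CR) line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_dos(data: bytes) -> bytes:
    """Convert any mix of line endings to CRLF."""
    return to_unix(data).replace(b"\n", b"\r\n")


def convert_file(path: str, converter) -> None:
    """Rewrite a file in place using one of the converters above."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(converter(data))
```

For example, convert_file("ssh_ossv_hostname_pre.sh", to_unix) before Cygwin runs the script. Files are read and written in binary mode so Python doesn't translate the line endings itself.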

Monday 18 June 2012

I'm on slide 28 of 56 of the Networker Overview eLearning course

I'm on slide 28 of 56 of the Networker Overview eLearning course.  I had to take a break to avoid going crazy, so I thought I'd post my thoughts about this format and EMC's use of it.

When you log in to the account with the credits for an eCourse which has been booked, the EMC site has a link under education for you to click on to get to the course.  Weirdly, when I clicked on that it just took me back to the main page of the site again (but it seems I was logged in with a temporary system-generated username, perhaps in a virtual session of some kind).  There was no pointer, tip or explanation, but I guessed I should navigate to the same area again.  This time it started "Saba" in my browser (I only tried Firefox) and gave me access to the course and an accompanying PDF to download.

Ok, the content is worthwhile.  They have information here that is more detailed and exhaustive than what I can find on their forums. The info is slightly more user friendly than the manuals.

But they really missed an opportunity to make this great and the envy of the IT world of training courses!  Come on EMC you're big enough, you have enough money.  You should be able to make these eCourses shine and rock!

What Sux:
The voice actors hired to read out the text of the materials obviously don't know or care anything about the material they are reading.  They sound like robots.  It's not quite as bad as text-to-speech programs where you get a robot voice reading out your words with no clue about inflection or emphasis.  But there isn't much more humanity here.

It's surprisingly like a bunch of PowerPoint slides with someone reading out the text and waiting for you to click next so they can read out the next slide, and so on.  They actually say, "This module covers the topics on this slide.  Please take the time to familiarise yourself with the topics on this slide".  Man, that's cheating.


What They Should Do

They should aim to make these as good as real courses with real instructors.  They should add some personal anecdotes to help get the concept across, add some humour or at least humanity!

They should have someone draw some diagrams on a whiteboard and let us see what he's drawing while hearing him explain it.  Again, aim for the best part of real instructor led courses and see how you can come as close as possible.

They should take some useful questions from the dozens and dozens of courses that have been held already, and interject them into the course.  Questions and answers.  See it as a way to review the material or explain it in a new way.  This takes thought and brains, but it's what would set apart anyone who did courses like this from rubbish like what I'm enduring right now.

VMware has surpassed their mother company EMC with their VMworld presentations, which are available online.  You can hear the presenter, a real human, someone with experience in the product and passion about their topic.  While listening to them you can see the slides they're showing.  In a way it's lower tech as I've not seen any animations, but the bits I've mentioned in this paragraph matter far more than the difference between a Flash animation in the EMC eCourses and the simple PowerPoint + audio used in VMworld presentations.

Friday 15 June 2012

Snapshots, Deduplication and QTrees

Having not been on a NetApp course yet, it's "fun" trying to understand how these three concepts work together:  Snapshots, Deduplication and QTrees (not to mention Volumes and SnapVault).

So here are my notes and a place to write anything I might figure out.

https://kb.netapp.com/support/index?page=content&id=1010363

https://library.netapp.com/ecm/ecm_get_file/ECMM1278402 (PDF)

OSSV FAQ not as easy to find as I'd expect. It's got some good stuff like Windows System State backup (registry, AD, etc.) and excluding files/paths in OSSV Backups and:

Q: Anytime I restore even a single file, I have to perform a full baseline or
reinitialize of my primary file system?
A: No.  If you run a full D/R restore, you need to re-initialize.  If you drag/drop a file,
then it should behave reasonably. In that case, it's just as if the user
created/modified a file.  


Q: Is OSSV 2.2 able to address backing up Operating Systems?
A: OSSV 2.2 can backup Windows 2000/2003 but not the Unix platforms.


Q: How does OSSV actually transfer data from primary to secondary
system?
A: Data is moved via TCP/IP network using TCP port 10566. The communications
protocol is QSM (based on Qtree-SnapMirror). This is not to be confused with NDMP
protocol. NDMP is used by NDMP-based management applications (DFM) for
management and control of the SnapVault primary and secondary systems. The
NDMP TCP port is 10000.


So, how do snapshots keep straight the different hosts' data being SnapVaulted, since the snapshots are at the volume level?  A different snapshot must be created for each host/qtree, but how many SnapVault relationships can exist on one volume at a time?

Just posted a question on the NetApp forums as I'm not having any luck creating OSSV relationships like the diagram above.

Wednesday 13 June 2012

Troubleshooting Checklist

We can't guess which order will get to the solution quickest, but here's a stab at some things to remember:

1)   What does the error say?  (I can jump to conclusions and miss the clues right in front of my eyes.)
      If it says the hostfile is missing an entry, then it might be.

2)   What has changed since things were working?  Look for the culprit to be that thing that was changed 5 minutes ago or last week before trying every single link in the chain. 

3)   Which component could have the root cause?  If other network connections are fine then it's not the entire network (but it might be one port, one module, one switch or one data center that owns the problem).

4)   Draw a picture.  It's all about isolating what is not wrong: when you rule out all but one thing, the one remaining thing is the culprit!  So draw a picture to make clear how the components connect and to see the problem visually.  Remember #1 above, and try not to make assumptions.  That's where we usually miss the cause of the problem--it hides within the sensible and understandable but incorrect assumptions!

5)   Two heads are better than one.  It might just be that explaining and drawing the problem for a colleague, which forces you to explain it simply and think clearly, will lead you to the "Aha!" moment.  Or they might see something you've missed or get lucky where you're unlucky.

This can all be very frustrating.  Take a step back, try to not get angry.


People wouldn't like you when you get angry.

Thursday 31 May 2012

Microsoft Crappy VSS

Mark Bellows on the EMC forums posted some good stuff about troubleshooting some bad backup software, with this great list that happens to show how much M$ VSS sucks too:


VSS snapshot creation may fail after a LUN resynchronization on a computer that is running Windows 7 or Windows Server 2008 R2 - November 25, 2009
http://support.microsoft.com/kb/976099
Backup fails with VSS Event ID 12292 and 11 on Windows Server 2008 and Windows Server 2008 R2 - January 20, 2010
http://support.microsoft.com/kb/2009513
No VSS writers are listed when you run the “vssadmin list writers” command in Windows Server 2008 R2
http://support.microsoft.com/kb/2009550
http://support.microsoft.com/kb/2009533
Windows 2008 R2 64-bit backup failed
http://solutions.emc.com/emcsolutionview.asp?id=esg108377
System State backup using Windows Server Backup fails with error: System writer is not found in the backup
http://support.microsoft.com/kb/2009272
A VSS hardware snapshot database keeps growing with duplicated ...
(959476) - ... VSS) requestor instances with VSS hardware provider to delete snapshots in Windows Server 2008 ... Important Windows Vista and Windows Server 2008 hotfixes ...
http://support.microsoft.com/kb/959476
A snapshot may become corrupted when the Volume Shadow Copy ...
(975688) - Fixes a problem in Windows 7 and in Windows Server 2008 R2 in which a snapshot may become corrupted when the VSS snapshots providers takes more than 10 ...
http://support.microsoft.com/kb/975688
A virtual machine online backup fails in Windows Server 2008 R2 ...
This issue occurs because the Hyper-V Volume Shadow Copy Service (Hyper-V VSS ... Important Windows 7 hotfixes and Windows Server 2008 R2 hotfixes are included in ...
http://support.microsoft.com/kb/2521348
You cannot safely remove volumes after you perform a VSS backup ...
(2487341) - ... files after you perform a Volume Shadow Copy Service (VSS) backup operation in Windows Server 2008 SP2. ... Windows Vista hotfixes and Windows Server 2008 hotfixes ...
http://support.microsoft.com/kb/2487341

Also, please note that not all of these fixes come with Windows Update; they are applied on a case-by-case basis.

Now, from the client side, one thing you can do is to check on the status of VSS from the command line using vssadmin.

Here are a couple of MS KB articles to get you started:

Vssadmin usage

Manage Volume Shadow Copy Service from the Vssadmin Command-Line

How to enable the Volume Shadow Copy service's debug tracing features in Microsoft Windows Server 2003 and Windows 2008

Oh - one other thing you can do from both the server and the client (I would start with the client to see if it is showing any errors) is to render the daemon.raw file.

To do this, make a copy of the /nsr/logs/daemon.raw file - I typically rename the copy to be daemon_YYYYmonthnameDD.raw.

From the command line, navigate to the directory the copied daemon.raw file is in, then run:

nsr_render_log daemon_YYYYmonthnameDD.raw >daemon_YYYYmonthnameDD.log

i.e. nsr_render_log daemon_2012Jan27.raw >daemon_2012Jan27.log
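
If you do this often, the dated file names can be generated rather than typed. A minimal sketch in Python, assuming the daemon_YYYYmonthnameDD pattern above; the function name and the hard-coded month list are mine (hard-coded so the result doesn't depend on the machine's locale):

```python
import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def rendered_names(day):
    """Return the (raw copy, rendered log) file names for a given date."""
    stamp = "%d%s%02d" % (day.year, MONTHS[day.month - 1], day.day)
    return "daemon_%s.raw" % stamp, "daemon_%s.log" % stamp


if __name__ == "__main__":
    raw_name, log_name = rendered_names(datetime.date.today())
    # Print the command to run from the directory holding the copied file.
    print("nsr_render_log %s >%s" % (raw_name, log_name))
```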

You can then open the file through Windows Explorer with Notepad or WordPad and read the contents.  It may take a little time to get to the portion you need, but knowing when the backup started will help.

NetApp OSSV Notes

So, Open Systems SnapVault (OSSV) is NetApp's answer to backing up everything that you can't back up with SMVI (VMware VMs) or SnapManager for Exchange/SQL Server/SharePoint.  Here's a table and notes about the different backup methods I'm finding out about from NetApp.  Quick overview from NetApp here.  Note, the FAQ says that Linux OS backup is not supported.

It's software you install on a physical or virtual machine that does NetApp block level backup of any data (NetApp or otherwise) to a NetApp SnapVault node.

I thought it simply used the NDMP protocol, but it seems it sends the data with QSM (Qtree SnapMirror) and uses NDMP to set up the relationship and manage the backups:


NDMP port (default value is 10000)
FILESERVER port-10555.
QSMSERVER port-10566.
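
Before blaming OSSV itself, it's worth confirming these ports are actually reachable from wherever you're testing. A rough sketch in Python using only the standard socket module; the port numbers are the defaults listed above, while the function names and the 3 second timeout are my own choices:

```python
import socket

# Default ports from the list above; yours may differ if changed at install time.
OSSV_PORTS = {"NDMP": 10000, "FILESERVER": 10555, "QSMSERVER": 10566}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ossv_ports(host, ports=OSSV_PORTS):
    """Return a dict of service name -> True/False for each port."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling check_ossv_ports("backupclient") returns something like {"NDMP": True, ...}. A False only tells you the connection failed; it can't say whether that's a firewall or a stopped service.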

Components:
  • NetApp Host Agent (install this first)
  • Host Agent Plugin (comes with Host Agent)
  • NetApp OSSV

Host Agent

Windows: agentsetup-2-7-win32.exe
Linux: ./agentsetup-2-7-linux.bin

Unpacking files needed for the installation ...
Beginning the installation ...

Starting agent.
        You may now point your browser to
            http://127.0.0.1:4092/welcome
        to configure the agent software.

This is NTAPagent service (which runs /opt/NTAPagent/ntap_agent) on linux 
or ntap_agent.exe on Windoze





As this diagram shows, set a password for the "admin" login on the Host Agent application on your backup client.  This is unhelpfully referred to as the Mgt API Password here.










Install Host Agent Plugin

Windows runs this automatically: 
c:\Program Files\netapp\snapvault\manageability\InstallHostAgentPlugins.exe

But for Linux you need to run:
./ossv/manageability/InstallHostAgentPlugins.sh
before you can get DFM to connect to OSSV on your backup client.

OSSV
License OSSV on your SnapVault servers with the generic licenses available from the NetApp NOW website.

Install OSSV on your backup client
uses port 10000 for NDMP backup between client and NetApp
configure NDMP user/password and SnapVault filers (names separated by commas)

Runs on most OS'es (Windows, Linux, Solaris, etc.) but you can only create luns from Windows OSSV.

Seems to be different versions of OSSV for Windows 2003 and Windows 2008, but same for Host Agent. 

Gotcha: SnapVault relationship wouldn't create on my Linux backup client until:
1.  stop service on linux backup client
service snapvault stop
2. backup and edit /usr/snapvault/config/snapvault.cfg:

changing:

[QSM:Check Access List]
Type= CheckBox
value= TRUE

to:


[QSM:Check Access List]
Type= CheckBox
value= FALSE



3.  Start the service again:
service snapvault start
This was much easier than getting the syntax for the svsetstanza command.  (^;
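
If you have to make the same change on several clients, hand-editing gets old. Here's a sketch in Python that flips the value inside one named stanza and leaves the rest of the file alone. It assumes the file looks like the excerpt above ([stanza] header lines followed by key= value lines); the function name is made up, and you should still stop the service and keep a backup of snapvault.cfg first:

```python
def set_stanza_value(text, stanza, new_value):
    """Return cfg text with the value= line of one [stanza] replaced."""
    out = []
    in_stanza = False
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new stanza header: note whether it is the one we want.
            in_stanza = stripped == "[%s]" % stanza
        elif in_stanza and "=" in line and stripped.lower().startswith("value"):
            key, sep, _ = line.partition("=")
            ending = "\n" if line.endswith("\n") else ""
            line = "%s%s %s%s" % (key, sep, new_value, ending)
        out.append(line)
    return "".join(out)
```

Usage would be along the lines of reading /usr/snapvault/config/snapvault.cfg, calling set_stanza_value(text, "QSM:Check Access List", "FALSE") and writing the result back.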

Installing OSSV on Linux



mv ossv_linux_v3.0.1.tar.gz ossv_linux_v3.0.1.tar
tar -xvf ossv_linux_v3.0.1.tar
cd ossv
[root@linux ossv]# ./install
Installer invoked in /home/kevin/ossv
Using default /tmp as the temporary directory
Expanding distribution file
OSSV
3_0_1_2011FEB17_RC
Have you read and agreed to the terms of the license?
(y = yes, n = no, d = display license) (y n d) [d] : y
Please enter the path where you would like
the SnapVault directory to be created [/usr/snapvault] :
Enter the User Name to connect to this machine
via the NDMP protocol : backups
Please enter the password to connect to this machine
via the NDMP protocol :
Confirm password:
Enter the NDMP listen port [10000] :
Enter the hostname or IP address of the SnapVault secondary
storage system(s) allowed to backup this machine.
Multiple hostnames or IP addresses must be comma seperated.
> : SnapVaultHost
NetApp Host Agent is recommended for managing OSSV.
OSSV could not detect NetApp Host Agent on this system.
If you install this software later, you can install OSSV plugins using
/usr/snapvault/manageability/InstallHostAgentPlugins.sh
Writing ndmp password to password file
checkinstall running
CHOSEN_CLASSES=ossvcore
PKG_BASE=/usr/snapvault
This is a new installation - not an upgrade
Trace Directory = /usr/snapvault/trace
Temp  Directory = /usr/snapvault/tmp
preinstall running
Installing ossvcore
Copying uninstallation scripts
postinstall running
HOST_OS=Linux
Creating Trace Directory /usr/snapvault/trace
NV_UPGRADE=FALSE
Creating database directory /usr/snapvault/db
Installing libraries
libsvdb.so
libsvgui.so
libsvndmp.so
libsvplugin.so
libsv.so
libsvxctl.so
libsvxpm.so
Creating symbolic links
Installing base npk's
Checking for components in /home/kevin/ossv/packages
Installing additional npk's
Install normal package
Successfully installed '/home/kevin/ossv/packages/ossv2300.npk'
Checking for extra components in /home/kevin/ossv/extrapackages
Copying non-installable additional npk's
Setting gathered configuration values
Installing OSSV Services
Will link to '/usr/snapvault/etc/startup.sh'
Starting OSSV Services
Checking install validity
SnapVault home directory: '/usr/snapvault'
SnapVault database directory: '/usr/snapvault/db'
SnapVault temporary directory: '/usr/snapvault/tmp'
SnapVault Database and Temporary directories have 71% space left (47182Mb)
SnapVault service is running
SnapVault listener is running
Snapvault NDMP interface on port 10000:
   Vendor     : Netapp
   Product    : SnapVault
   Version    : 3_0_1_2011FEB17_RC
   Host       : maserati
   Host Id    : CED5936CC94D727724CFDEE23D77C48F
   OS Type    : Linux
   OS Version : 2.6.9-5.ELsmp
   IPV6       : Ndmp Server is responding correctly
   IPV4       : Ndmp Server is responding correctly
Snapvault QSM interface on port 10566:
   IPV6       : QSM Server is responding correctly
   IPV4       : QSM Server is responding correctly
Validating filesystems:
   Mount point / (/dev/mapper/VolGroup00-LogVol00) is suitable for backup
   Mount point /proc (none) is a special mount, unsuitable for backup
   Mount point /sys (none) is a special mount, unsuitable for backup
   Mount point /dev/pts (none) is a special mount, unsuitable for backup
   Mount point /proc/bus/usb (usbfs) is a special mount, unsuitable for backup
   Mount point /boot (/dev/cciss/c0d0p1) is suitable for backup
   Mount point /dev/shm (none) is a special mount, unsuitable for backup
   Mount point /proc/sys/fs/binfmt_misc (none) is a special mount, unsuitable for backup
   Mount point /var/lib/nfs/rpc_pipefs (sunrpc) is a special mount, unsuitable for backup
NetApp Host Agent is installed on this system.
Check Succeeded
Installation appears valid
Installation completed successfully
[root@linux ossv]#




Bits & Bobs:
LREP: allows out-of-band data transfers to keep load off normal channel, can be used to transfer data to portable disk for physical transfer to another location.

Enable OSSV compression on SnapVault if required.


DFM setup:

Go to OSSV tab in NetApp Mgt Console on DFM server
Click "Add" 
have NDMP username and password ready
the password needs to be entered exactly as it is shown by the ndmpd command on the SnapVault server:

ndmpd password backup
where NDMP user is called "backup"

CLI snapvault check:
test206:C:\          SnapVault:/vol/sva_OSSV_test/test206_OSSV_test_test206_C__         Uninitialized  -  Transferring
fes206:SystemState   SnapVault:/vol/sva_OSSV_test/test206_OSSV_test_fes206_SystemState  Uninitialized  -  Transferring
SnapVault> snapvault status "SnapVault:/vol/sva_OSSV_test/test206_OSSV_test_test206_C__"
Snapvault is ON.
Source                Destination                                                State          Lag        Status
test206:C:\            SnapVault:/vol/sva_OSSV_test/fes206_OSSV_test_test206_C__  Uninitialized  -          Transferring
SnapVault>snapvault status -l "destination volume/path"




Wednesday 30 May 2012

NetApp Backup Offerings

I'm just starting to get to know the complicated and varied ways to do backups in the NetApp world.  Here's some notes and a table to try to lay it out concisely:

Product NetApp DFM NetApp SnapManager for SQL/Exchange/SharePoint OSSV
Managed by itself backup clients DFM
Depends on NetApp SnapDrive DFM/SnapVault
SnapVault on NetApp
NetApp Host Agent on backup client
NetApp Host Agent Plugin on backup client
Can Backup: any NetApp storage NetApp Host Agent Windows or Linux, SAN or local disks
Oracle
strengths centrally managed, very quick and saves lots of storage space thanks to storage snapshots saves storage and gives very fast backups of SAN data via storage snapshots Doesn't need to be NetApp storage

Data owners can manage backups

try LREP for migrating data to new location

Can do file/folder level backups

file/folder backup exclusion supported
weaknesses not sure how to run pre-post exec scripts on backup clients for individual backups. doesn't allow DFM to manage
How to run pre-post scripts?
does not backup CIFS or NFS

Tuesday 15 May 2012

vCLI (vSphere PowerShell) Primer

Connect-VIServer -Server vsphere_vCenter
vihostupdate --server HOSTNAME --install --bundle c:\folder\name_of_file.zip

to install drivers, etc. but host needs to be in maint mode

get-vm | Get-CDDrive | select @{'n' = 'Name'; 'e' = { $_.Parent.Name} }, @{'n' = 'IsConnected'; 'e' = {$_.ConnectionState.Connected}}, @{'n' = "ISOFile"; 'e' = {$_.ISOPath}} | where {$_.IsConnected}

write-output "disconnecting Idle vCenter session connections"

(following examples used by ESX5 Deployment Tool)
PowerCLI> Get-DeployRuleSet
PowerCLI> Get-VMHost vh1t.mits.apmn.org | Get-VMHostAttributes

M$ PowerShell:

Thursday 2 February 2012

lun won't release on ESX4

We found this when we noticed the esx host disk latency was through the roof (40,000 milliseconds).
None of the VMs on this host had latency above 24 milliseconds.  Why?  Because the lun wasn't assigned to any VMs.  It was a rogue lun that had been removed but, due to a bug in ESX, wouldn't release.  vCenter showed the lun size as zero and no paths.

Rescanning the luns on the hosts didn't fix the problem either (try a few times).

Come to find out the lun was "All Paths Down" (APD):

/var/log/vmkernel
Feb  1 18:01:02 VHB23 vmkernel: 144:09:20:43.989 cpu10:4265)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60a9800057396d5a4a6f687531535934" - issuing command 0x41027f1ccc40 
Feb  1 18:01:02 VHB23 vmkernel: 144:09:20:43.989 cpu10:4265)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60a9800057396d5a4a6f687531535934" - failed to issue command due to Not found (APD), try again...


verify the lun number via vCenter or with "esxcfg-mpath -l"

VMware says this is fixed with ESX5 but we had no alternative than to VMotion the VMs and reboot the hosts.  That sorted it.

I need a script to send an alert when/if this happens again.

Anyone want to share a vCLI version of this script:

________________________________________________

--draft only for now--
# grep -q exits 0 only when a match is found
if grep -q "APD" /var/log/vmkernel ; then
  sendmail.script mailserver mailaddress -s "investigate All Paths Down error on `uname -n`"
fi
--draft only for now--

schedule via cron
________________________________________________
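
A bare grep for "APD" says something is wrong but not which lun. Here's a sketch in Python that pulls the device IDs out of the warnings, assuming the log lines look like the two vmkernel lines quoted above. The regular expression and function name are mine, and it would need to run somewhere with Python 3 and access to a copy of the log:

```python
import re

# Matches: ... device "naa.xxxx" - failed to issue command due to Not found (APD) ...
APD_PATTERN = re.compile(r'device "(naa\.[0-9a-f]+)".*\(APD\)')


def apd_devices(lines):
    """Return the sorted, de-duplicated device IDs seen in (APD) warnings."""
    found = set()
    for line in lines:
        match = APD_PATTERN.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


if __name__ == "__main__":
    with open("/var/log/vmkernel", errors="replace") as log:
        for device in apd_devices(log):
            print("investigate All Paths Down on", device)
```

The printed device IDs could go into the body of the alert mail, which saves a trip to "esxcfg-mpath -l" to work out which lun is affected.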

Friday 13 January 2012

NetApp vol copy of snapshot for vSphere

Using Data from a NetApp snapshot via vol copy

make sure the destination volume is not being used,
if mapped to VMware delete VMFS and unmap and rescan from vCenter

offline the volume

verify the snapshot needed with snap list command

vol copy start -s snapshot_wanted source_vol_with_snap dest_vol_already_offlined

online the destination volume and you'll see a message from the NetApp CLI warning that the lun has been offlined, as it is a copy and therefore has the same information as the source:

Fri Jan 13 14:51:52 GMT [SnapVaultA: lun.newLocation.offline:warning]: LUN /vol/sv_lun_svcopy/sv1/lun96 has been
taken offline to prevent map conflicts after a copy or move operation.

go to FilerView and change the path and lun number of the lun for the dest. volume.

Now you can online the volume and map it out with the correct, non-duplicate, lun number.

rescan storage on vCenter
add Storage and choose "Assign a new signature" when adding dest. lun into vCenter.
find the VMFS on your dest. lun, which will have its old name but with the prefix "snap-xxxxx", and rename it appropriately.

add the VMDK on the VMFS on the lun to your VM and you're laughing!


Wednesday 11 January 2012

Snapvault Backups

These instructions are in NetApp's docs, but they're kinda vague, especially when it's time to do the "snapvault snap create" command--which volume should be named? SnapVault copy or Source? I've logged this with NetApp and our expert consultants and neither got to the correct answer easily or quickly.

1. Create lun for destination on snapvault

2. set schedules for retention, but disable snapshots as in this case they'll be run from the NetApp client via a script which has ssh access.
netapp1> snapvault snap sched src_vol src_vol_weekly
snapvault> snapvault snap sched src_vol src_vol_weekly

3. Ensure remote access between MetroCluster source and Snapvault Destination is good.

4. Establish initial baseline transfer between both volumes on both heads.
SnapVault> snapvault start -S netapp1:/vol/volume sv_volume

5. connect schedules
SnapVault> snapvault snap sched -x blah:/blah blah

SnapVaultA> snapvault snap sched
...
xfer sv_vol sv_vol_snap_sv_weekly 12@- preserve=default,warn=0

netapp1> snapvault snap sched
create vol sv_vol_snap_sv_weekly 4@-

Note: we keep 4 copies of the snapshot on the source/primary/MetroCluster and 12 copies on the destination/secondary/SnapVault

6. create new snapshots and verify date/time stamps

SnapVault> snapvault snap create sv_vol sv_vol_snap_sv_weekly

NOTE: this next command is the bit that goes in your script on the NetApp client right after quiescing/stopping your database and right before starting it up again:

7. remember always pull TO snapvault, don't push:
SnapVault> snapvault update netapp1:/vol/vol/sv1
Transfer started.
Monitor progress with 'snapvault status' or the snapmirror log.

SnapVault> snapvault snap create sv_vol sv_vol_snap_sv_weekly
NetApp> snapvault snap create vol sv_vol_snap_sv_weekly


8. Verify snapshots (look for Status and Contents fields below)

SnapVault> snapvault status -l SnapVault:/vol/sv_vol/sv1
Snapvault is ON.
Source: netapp1:/vol/volume/qtree
Destination: SnapVault:/vol/sv_vol/sv1
Status: Transferring
Progress: 4856300 KB
State: Snapvaulted
Lag: 03:25:19
Mirror Timestamp: Wed Jan 11 09:10:45 GMT 2012
Base Snapshot: SnapVault(1573980687)_sv_vol_snap_sv_weekly
Current Transfer Type: Update
Current Transfer Error: -
Contents: Transitioning
Last Transfer Type: Update
Last Transfer Size: 4735288 KB
Last Transfer Duration: 00:01:02
Last Transfer From: netapp1:/vol/_sv_vol_snap_sv_weekly/qtree

SnapVault> snapvault status -l SnapVault:/vol/sv_vol/sv1
Source: netapp1:/vol/sv_vol_snap_sv_weekly/qtree
Destination: SnapVault:/vol/sv_vol_snap_sv_weekly/sv1
Status: Idle
Progress: -
State: Snapvaulted
Lag: 00:04:00
Mirror Timestamp: Wed Jan 11 12:34:05 GMT 2012
Base Snapshot: SnapVault(1573980687)_sv_vol_snap_sv_weekly-base.0
Current Transfer Type: -
Current Transfer Error: -
Contents: Replica
Last Transfer Type: Update
Last Transfer Size: 5165432 KB
Last Transfer Duration: 00:01:54
Last Transfer From: netapp1:/vol/sv_vol_snap_sv_weekly/qtree

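When the transfer has completed, the destination will say "Idle" under Status and "Replica" under Contents, as in the second listing above.  State says "Snapvaulted" in both listings, so it isn't the field to watch.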
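
If the update is kicked off from a script, that check can be automated. A sketch in Python that parses the "Key: value" lines of "snapvault status -l" output as pasted above. Treating Status "Idle" plus Contents "Replica" as "finished" is my reading of the two listings rather than anything from NetApp's docs, and the function names are made up:

```python
def parse_status(text):
    """Parse 'Key: value' lines from snapvault status -l output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so paths like netapp1:/vol/... survive.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def transfer_finished(fields):
    """True when the relationship is idle and the destination is a replica."""
    return fields.get("Status") == "Idle" and fields.get("Contents") == "Replica"
```

How the output gets to the script (ssh to the SnapVault filer, for instance) is left out here; the parsing only needs the text.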

Resetting Lost ESXi password

There's one blog post I've found about booting your ESXi host to a bootable Linux CD and hacking the root password, but VMware says the only supported solution is to do a recovery install.

The nice folks at VCE have suggested something else:

You need the full Enterprise license in order to use host profiles for this fix. We did this on ESXi 4.1 update 1 with Virtual Distributed switches and SAN.

1. Login to VC as Administrator and create a new host profile using another ESXi host.
2. Edit the Host profile and change the "Administrator password" to a fixed password.
3. Next "Attach" and "Apply" the modified profile to the host that you don’t know the password for.
4. You should now be able to login via the console or SSH however the change is only temporary.
5. From ESXi command line execute the backup script to make changes persistent - ~ # /sbin/auto-backup.sh
6. Finally put ESX into Maintenance mode, Reboot and verify new login credentials.

Thanks again to VCE!