Monday 9 September 2013

Quick & Dirty: Replacing NetApp Disk

This doesn't include steps for a MetroCluster; I'll add those later if I ever do one.
1.      Verify the failed disk
SnapVault> vol status -f
    shows failed disks
  also,
SnapVault> disk show 0a.02.03
    shows the failed disk
    where 02 is the shelf (see the LED number on the front left; shelf 1 is top, 2 is middle, 3 is bottom)
    where 03 is the bay (see the printed numbers next to each disk location)

The disk's LED is amber instead of green, and the shelf indicates a fault as well.  To make the disk blink so you can be sure it's the correct one, run:

SnapVault> priv set advanced
SnapVault*> blink_on 0a.02.03
SnapVault*> blink_off 0a.02.03
SnapVault*> priv set admin

2.       Physically replace the disk
3.       Assign the newly replaced disk so it becomes a hot spare:
SnapVault> disk assign 0a.02.03

4.  Verify all is well
SnapVault> disk show 0a.02.03
    to verify it's assigned as a spare
SnapVault> vol status -f
    shows failed disks (there should be none now)
Note: the OnCommand GUI can do some of this too, under "Storage", "Disks"
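
The disk name used in the steps above packs the location into three dot-separated fields. Here is a minimal Python sketch of that decoding, assuming the adapter.shelf.bay layout described in step 1. The function name parse_disk_name is purely illustrative, and treating the leading 0a as the adapter port is an assumption on my part:

```python
def parse_disk_name(name):
    """Split a disk name such as '0a.02.03' into adapter, shelf and bay.

    Assumes the adapter.shelf.bay layout described in step 1.
    """
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError("expected adapter.shelf.bay, got %r" % name)
    adapter, shelf, bay = parts
    # Shelf and bay are zero-padded numbers; adapter stays a string.
    return {"adapter": adapter, "shelf": int(shelf), "bay": int(bay)}


print(parse_disk_name("0a.02.03"))
```

Handy if you want to turn a list of failed disk names into "shelf 2, bay 3" style instructions for whoever is standing in front of the rack.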

Monday 2 September 2013

Deleting backups from Networker

Deleting records from Networker is easy enough, but you have to use the CLI:

(I always find it helpful to open the command prompt with the "Run as administrator" option.)

Then you "just" delete each backup, by SSID, from the Networker server like this:

1.  (NetworkerDos)# nsrmm -d -y -S 123456789

In human, that's something like "networker media management command: delete, answer yes to any prompts (like, are you sure?), and the SSID is 123456789".

But you have to find the SSIDs of the backup jobs you want to delete first.
And, of course, be careful as you don't want to delete the wrong backups!

My process was to list all backup records (SSIDs) more than 8 months old:
(NetworkerDos)# mminfo -q "savetime<01/01/2013" > c:\temp\delete-2012\ssids-2012-only.txt

In human, that's something like "give me info from the media management database where the records are from before January 1st 2013".

The < (less than sign) logic isn't really intuitive or obvious at all, so see this post to understand it.

You can add the backup clients as well with:
(NetworkerDos)# mminfo -q "client=client.domain.org,client=client2.domain.org,savetime<01/31/2013" > c:\temp\delete-2012\ssids-2012-clients-list-1.txt
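
With more than a couple of clients, typing that query string by hand gets error-prone. A rough Python sketch of building it from a list, assuming the same comma-separated client=...,savetime<... form as the command above (build_mminfo_query and the client list are illustrative names, not anything Networker provides):

```python
def build_mminfo_query(clients, before):
    """Return an mminfo -q query string for the given clients and cutoff date."""
    terms = ["client=%s" % c for c in clients]
    terms.append("savetime<%s" % before)
    return ",".join(terms)


# Hypothetical client list; substitute your own weekly or daily clients.
weekly_clients = ["client.domain.org", "client2.domain.org"]
query = build_mminfo_query(weekly_clients, "01/31/2013")
print('mminfo -q "%s"' % query)
```

The printed line can then be pasted into the command prompt with the output redirected to a file as before.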


Because of the way my backup retention is configured, I had to get a list of all backup jobs for two sets of clients: those backed up weekly and those backed up daily.  So I built two client lists and ran the command twice, once for each list.

Next came some manual work that I couldn't automate as nicely as I'd have liked.  I had lists of every backup job with the usual information, including the SSID, but I needed to remove from those lists the backups I didn't want to delete.  For the weekly backups, that meant keeping the first Friday backup of each month; for the daily backups, I wanted to keep all the Friday backups.

I needed Linux or Cygwin to manipulate these files, as I still haven't learned PowerShell.

So I looked at a 2012 calendar and made a list of the dates whose backups I wanted to keep.  For example, May 4th 2012 was one of the dates I didn't want to delete, so I grep'ed all the backups from that date out of the list behind my delete-ssid.cmd script file:

$ grep -v "04/05/2012" ssids-2012-weekly-delete-list-removed.txt > ssids-2012-weekly-4-May-removed.txt
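
The calendar lookup and the repeated grep -v runs could probably be scripted. This is only a sketch in Python, not what I actually ran: it computes the Fridays of a year and drops every line that mentions a date to keep. It assumes the listing shows dates as dd/mm/yyyy (as in the grep above), and the sample lines are made up, not real mminfo output:

```python
import calendar
from datetime import date, timedelta


def fridays(year):
    """Yield every Friday in the given year."""
    d = date(year, 1, 1)
    d += timedelta(days=(calendar.FRIDAY - d.weekday()) % 7)
    while d.year == year:
        yield d
        d += timedelta(days=7)


def first_fridays(year):
    """Return the first Friday of each month (the weekly keepers)."""
    return [d for d in fridays(year) if d.day <= 7]


def drop_kept_dates(lines, keep_dates, fmt="%d/%m/%Y"):
    """Like repeated `grep -v`: drop lines that mention any date to keep."""
    stamps = [d.strftime(fmt) for d in keep_dates]
    return [ln for ln in lines if not any(s in ln for s in stamps)]


# Made-up sample lines for illustration only.
sample = [
    "client.domain.org 04/05/2012 1111",
    "client.domain.org 11/05/2012 2222",
]
print(drop_kept_dates(sample, first_fridays(2012)))
```

For the daily clients you would pass list(fridays(2012)) instead of first_fridays(2012). Spot-check the result either way before deleting anything.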


And after doing this for all the dates I wanted to keep, I did some spot-checking before building a DOS batch file to run the nsrmm -d command on all the records I wanted to remove.
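
Building the batch file itself is mechanical. A hedged Python sketch: it assumes each line of the filtered list is whitespace-separated with the SSID as an all-digit field in a known column. The column index depends on how your mminfo report is laid out, so it is a parameter here, and both function names and the sample lines are hypothetical:

```python
def ssids_from_lines(lines, column):
    """Pick the SSID field out of each whitespace-separated line.

    Lines where that field is missing or not all digits (for example
    a header row) are skipped.
    """
    ssids = []
    for ln in lines:
        fields = ln.split()
        if len(fields) > column and fields[column].isdigit():
            ssids.append(fields[column])
    return ssids


def build_delete_batch(ssids):
    """Return one nsrmm delete command per SSID, as in step 1."""
    return ["nsrmm -d -y -S %s" % s for s in ssids]


# Made-up sample lines: a header, then client, date, SSID.
sample = [
    "client date ssid",
    "client.domain.org 11/05/2012 123456789",
]
for cmd in build_delete_batch(ssids_from_lines(sample, column=2)):
    print(cmd)
```

Redirect the output to a .cmd file and read through it before running it, since nsrmm -d -y will not ask for confirmation.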


It took my dedicated (physical) IBM X3650 (2 x 2.39 GHz, 32 GB RAM, Windows 2008 R2) Networker server about 10 hours to remove some 12,000 records.

2.  Then you run nsrim -X (a Networker database check, I guess).  This took less than an hour, if memory serves.

3.  Finally, you're ready to run the Data Domain "clean".  You can do it from the CLI, but it works fine from the GUI too.  It's a 12-step process: most of the steps build a list, step 11 does the copying, and the last step seems to do some checks.  This step took up to 12 hours to run.

References:

Cygwin echo with tab separators: $ echo -e "test \t\tabcdetest"

http://nsrd.moab.be/
nsrvalley.com 
Data Domain Overview of Cleaning Phases, Document ID:1071