SANs and ESX guests

So, I learned a valuable lesson today: if you want to restart a SAN array, FOR GOODNESS SAKE power off any VM guests using the volumes it hosts first.

To flesh out the details: we are moving all our volumes off the loaner PS5000 and onto our new PS6000. I was only seeing one interface being used, so I figured that was a config error. I made sure all the eth interfaces were up and had addresses, and spoke to tech support about it. They said a restart of the array might help things.

Well, they didn't mention shutting down attached guests first! I knew that you shouldn't restart an array with guests still attached, but it didn't click that our file server was using a volume on that array and should have been powered off first. I restarted the array and tried to move a volume again, but it was still only using one eth interface, albeit a different one this time.

It turns out, according to another tech support rep, that the PS doesn't treat volume moves as a priority, and therefore only uses one eth interface for them. Argh!

So I spent the morning cleaning up the disaster that ensued. No file server = no My Docs, and no My Docs = hung logons, can't save files, etc. Since it was a VM, the ESX host needed to be rebooted, as that's the only way to clear something like this up. I tried to shut down the VM on its own, but it timed out... after 20 minutes. Yeah, a 20-minute timeout. That's a bug!

Anyways, things are up and running now, and lesson learned without too much of a cost. Still kinda feel stupid about it though.
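
For next time, here is a minimal sketch of the pre-restart check I skipped. It assumes a hand-maintained inventory of guests and the volumes they sit on; the Guest class, the volume names, and guests_blocking_restart are all made up for illustration and are not part of any ESX or PS array API.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    # Hypothetical inventory record, not something ESX gives you in this shape.
    name: str
    powered_on: bool
    volumes: tuple  # names of the SAN volumes backing this guest's disks


def guests_blocking_restart(guests, array_volumes):
    """Return the names of powered-on guests using any volume on the array."""
    hosted = set(array_volumes)
    return sorted(
        g.name for g in guests
        if g.powered_on and hosted.intersection(g.volumes)
    )


# Example inventory with invented names.
inventory = [
    Guest("fileserver", True, ("vol-files",)),
    Guest("testbox", False, ("vol-files",)),
    Guest("mailserver", True, ("vol-mail",)),
]

blocking = guests_blocking_restart(inventory, ["vol-files"])
if blocking:
    print("Power these off before restarting the array:", ", ".join(blocking))
else:
    print("No powered-on guests on this array.")
```

The point is just to force the question "what is still running on this thing?" before hitting restart, rather than relying on memory.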
