Taking unnecessary risk with important data

I had been meaning to try VMware ESXi for a while, but somehow never had the necessary hardware, the NIC being the biggest problem. So I went and bought a brand new Intel network adapter and put it in my primary desktop, along with an empty HDD for testing.

I wanted to disconnect my primary drives, just in case… that would be in keeping with me being a coward. However, I was afraid of something else, too. I have Intel Matrix RAID set up (a very neat thing), which is software RAID. I don’t really trust this controller to pick up the RAID settings again after I reconnect my drives. So I figured I would just have to be careful to choose the right drive to erase.

ESXi installation is very simple if the hardware agrees. The only thing you have to do is choose the right empty drive as the installation target. I noticed that ESXi ignored the software RAID and listed my two RAID drives as ordinary target candidates. What was interesting was that it marked one as full (having data on it) and one as empty. I ignored that and chose the newly added test drive.

After a restart, ESXi loaded successfully. I connected to it with the vSphere Client that I use at work. I quickly noticed two brand new empty datastores… wait, two? And the second one was exactly the size of one of my RAID drives?

My blood pressure spiked and I quickly pulled out everything new to check whether my system still booted. It didn’t. I instantly pieced together what had happened: my boot partition is on the striped part of the Matrix RAID, so only one drive has valid boot data. ESXi took the other one, as it was “empty”, and kindly formatted it for me, no questions asked.

I was afraid all my striped partitions were gone and wanted to check whether the mirrored data was still there (with Matrix RAID you can have both types of RAID with only two drives). So I booted a Norton Ghost CD and found that all data except my system boot partition was simply still there. VMFS had apparently written only to the beginning of the drive. I was still afraid it had corrupted other parts of the drive with some sort of backup of master sectors or other filesystem data that could potentially spread to the other drive in the mirror (I’m not sure how the RAID would react here).

Luckily, being a coward, I had a full backup image of the system partition from the previous weekend on my mirrored partition (and also off-site, but the local copy was the one within easy reach). I restored it and hoped it would boot. And it did.

I ran all the RAID checks and no inconsistencies were found. I must say I was very lucky that things turned out the way they did. So what did I learn from this?

  • Always have backups.
  • Don’t play with your primary system if you don’t have to.
  • Even if you are careful, there are rare conditions you can’t think of in advance.
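The last point can be turned into a simple pre-flight rule: the only setup I would now call safe is one where the install target is the only disk attached. Below is a minimal sketch of that rule in Python. Everything in it is hypothetical and only for illustration: the `Disk` record, the serial numbers, the sizes, and the two helper functions are made up, and it does not talk to ESXi or to any real hardware.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Disk:
    """Hypothetical description of an attached disk (not read from real hardware)."""
    serial: str
    size_gb: int
    reported_empty: bool  # what an installer claims, which may be wrong for a RAID member


def disks_at_risk(disks: List[Disk], target_serial: str) -> List[Disk]:
    """Return every attached disk other than the intended install target.

    Assumption of this sketch: any disk that stays connected may be touched,
    no matter whether it is reported as empty or full.
    """
    matches = [d for d in disks if d.serial == target_serial]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one disk with serial {target_serial!r}")
    return [d for d in disks if d.serial != target_serial]


def safe_to_install(disks: List[Disk], target_serial: str) -> bool:
    """True only when the target is the sole attached disk."""
    return not disks_at_risk(disks, target_serial)


# Made-up inventory resembling my situation: one test drive plus two RAID members.
attached = [
    Disk("TEST-001", 500, True),
    Disk("RAID-A", 1000, False),
    Disk("RAID-B", 1000, True),  # looks empty because it holds no valid boot data
]

if not safe_to_install(attached, "TEST-001"):
    for disk in disks_at_risk(attached, "TEST-001"):
        print(f"Disconnect {disk.serial} ({disk.size_gb} GB) before installing")
```

The check deliberately ignores the "empty" flag: in my case the drive that got formatted was exactly the one the installer considered empty.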

I will end the post with a thought from the last virtualization event I attended:

There are two types of system administrators: those who have had a major data catastrophe and those who are going to have one.