A recent experience of mine has shown a flaw in the use of snapshots.
How Do Snapshots Work? Essentially, a snapshot is a point-in-time image of your VM that can also contain the memory data. After you take a snapshot, any further data written to the VM's disk goes into a new VMDK file on the VMFS partition. As time goes on this new VMDK file grows, and this is where issues arise: it can keep growing until it fills the disk. When that happens the VM will stop working, and other VMs on the same partition may well be affected too. The standard approach is to delete the snapshots, either through the GUI's "Snapshot Manager" or from the command line. The trouble is that deleting snapshots and consolidating the VMDK files itself needs free space, and the process errors out if there is none.
The Solution? Free some space: delete a VM or move one to another disk. In my case I had to delete the whole VM and restore it from tape, as it got corrupted when the "remove snapshot" process failed.
Lessons Learnt:
- Set an alarm in Virtual Center to email you when your disks get 90% full
- If you don't have Virtual Center, use your hardware management agents to send an email, e.g. HP Insight Manager or IBM RAID Manager
- Delete any snapshots after a prearranged period of time in order to contain their growth
- If snapshots disappear from your VM, you can remove the VM from the inventory and re-import it, which may rediscover the missing snapshots. Adding an extra snapshot can also force a VM to rediscover previous snapshots that it is missing.
- VMware partner support is only Gold level, i.e. 9am to 7pm.
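If you have neither Virtual Center nor a hardware agent to hand, the first and third lessons can be roughed out in a small script. What follows is only a sketch, not a VMware tool: it assumes a Python 3 interpreter on a machine that can see the datastore as an ordinary mounted path, and it assumes snapshot delta files match the pattern "*-delta.vmdk", which may not hold for your ESX version. The function names and the default thresholds are made up for illustration.

```python
import shutil
from pathlib import Path


def percent_used(path):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_over_threshold(used_pct, threshold_pct=90.0):
    """True once usage reaches the alarm threshold (90% in the list above)."""
    return used_pct >= threshold_pct


def large_snapshot_files(root, min_bytes, pattern="*-delta.vmdk"):
    """List (path, size) for snapshot delta files of at least `min_bytes`.

    The file name pattern is an assumption; adjust it to whatever your
    ESX version actually calls its snapshot files.
    """
    found = []
    for p in Path(root).rglob(pattern):
        if p.is_file():
            size = p.stat().st_size
            if size >= min_bytes:
                found.append((p, size))
    # Biggest first, so the worst offenders are at the top of the report.
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Run from cron, you would call percent_used() on the datastore path, send yourself an email when is_over_threshold() returns True, and use large_snapshot_files() to see which snapshots are responsible. The script only reports; deleting the snapshots should still go through Snapshot Manager or the command line, while there is still free space to consolidate.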
More info: http://www.rtfm-ed.co.uk/docs/vmwdocs/whitepaper-vmware-esx2.x-redo-demystified.pdf