Brents IT Blog

Random thoughts by an IT GOAT


More VMware FUN! Or NOT...

For the past week we have had issues with our primary ECW server.  ECW logged on and tried everything they could think of to fix and fine-tune the system.  No matter what tweak they applied to our MySQL instance, our CPU just kept spiking, and users were complaining about slowness and error messages.  A week of this is hard on our company.  Customers are unhappy, providers are unhappy, staff are frustrated and everyone else is pointing fingers.

They finally reached a point where they stated that we needed to switch to a physical box, end of story.  I of course don't agree that a physical box is more robust or better than virtual, so we verbally jousted for several calls.  At this point, I sent out emails to colleagues to see if they had ever run into this issue.  Everything on our end looked fine: the SAN was running at 30%, the hosts at 40%, the network at 20%.  Nothing was overloaded.

Well, after several emails back and forth, one colleague finally suggested that I move the VM from one host to another.  At this point I will try anything, but I thought it was a bit odd of a suggestion.  I mean, after all, I had already shut down the VM completely and let it sit, so what good would a vMotion do?  But what the hell, a few clicks and I was done anyway.  So I shifted all the guests off one host onto another, then shifted the one troubled VM to the now guest-free host.  As the process finished, users got kicked off the system, but shortly after that, everyone who logged back on said they saw marked improvement, and the CPU was back down to 30-60% utilization.  Judging by the fact that users were kicked off during the move, I would say something was "stuck" at the HAL.  I am still unsure as to what prompted the issue, but the investigation will continue.
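For anyone who wants the order of operations spelled out, here is a rough sketch in Python.  It does not talk to vCenter at all; it only works out which moves to make and in what order.  The host and VM names are made up, the function name plan_isolation is hypothetical, and it assumes the evacuated guests simply go to the host the troubled VM is currently on (as they would in a two-host setup).

```python
from typing import Dict, List, Tuple

# One planned migration: (vm name, source host, destination host).
Move = Tuple[str, str, str]


def plan_isolation(inventory: Dict[str, List[str]],
                   troubled_vm: str,
                   target_host: str) -> List[Move]:
    """Return the ordered moves that leave troubled_vm alone on target_host.

    inventory maps each host name to the list of guests running on it.
    Assumption: guests evacuated from target_host are parked on the host
    that currently runs the troubled VM.
    """
    if target_host not in inventory:
        raise ValueError(f"unknown host: {target_host}")

    # Find the host currently running the troubled VM.
    source_host = next(
        (host for host, vms in inventory.items() if troubled_vm in vms),
        None,
    )
    if source_host is None:
        raise ValueError(f"VM not found on any host: {troubled_vm}")
    if source_host == target_host:
        raise ValueError("the troubled VM must actually change hosts")

    moves: List[Move] = []

    # Step 1: empty the target host by moving its guests elsewhere.
    for vm in inventory[target_host]:
        moves.append((vm, target_host, source_host))

    # Step 2: move the troubled VM onto the now guest-free host.
    moves.append((troubled_vm, source_host, target_host))
    return moves


if __name__ == "__main__":
    # Hypothetical two-host inventory for illustration only.
    example = {
        "esx01": ["ecw-db", "file01"],
        "esx02": ["web01", "print01"],
    }
    for vm, src, dst in plan_isolation(example, "ecw-db", "esx02"):
        print(f"migrate {vm}: {src} -> {dst}")
```

The actual migrations would still be done by hand in the vSphere client (or with whatever automation you already trust); the point is just that the target host gets emptied first and the troubled VM moves last.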

So for anyone else who has a guest with trouble such as a sudden CPU spiking issue, I highly suggest you vMotion that VM first before attempting any other fixes.  Just maybe this will save you time and your users some frustration.

For the record, we are on vSphere 4.1 Update 1 with a standalone vCenter Server.