Jan 31

Although this was a while back, I've been meaning to post it somewhere online. On a few servers we had a problem installing this update, and the following worked a charm:

Under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\Setup\Resume: if it is set to 1, change it to 0.

Then install the update and reboot.
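If you prefer the command line, the change can be made from an elevated command prompt. A sketch, assuming Resume is a DWORD value under the Setup key (check the current value first, and note MSSQL.1 refers to the first instance; yours may differ):

```shell
:: Query the current value (run from an elevated cmd prompt)
reg query "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\Setup" /v Resume

:: If it reports 0x1, flip it to 0 before re-running the update
reg add "HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\Setup" /v Resume /t REG_DWORD /d 0 /f
```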

Jan 22

This issue started appearing after we upgraded to ESX4, and to be fair it's only happened maybe three times in total. ESX just disables the VM, stating in the logs:

The CPU has been disabled by the guest operating system

Jan 07 22:22:11.316: vcpu-0| The CPU has been disabled by the guest operating system. You will need to power off or reset the virtual machine at this point.

Some googling didn't turn up much; the machine wasn't overcommitted resource-wise and all the other VMs were fine, so off we went to open a support request with VMware. A short time later they had identified the following in the logs:

Jan 07 22:25:07.336: vmx| VMXVmdb_LoadRawConfig: Loading raw config
Jan 07 22:25:07.415: vmx| Failed to extend memory file from 0x0 bytes -> 0x1000 bytes.
Jan 07 22:25:07.416: vmx| BusMem: Failed to allocate frames for region BusError.
Jan 07 22:25:07.417: vmx| Msg_Post: Error
Jan 07 22:25:07.417: vmx| [msg.memVmnix.ftruncateFailed] Could not allocate 4096 bytes of anon memory: No space left on device.
Jan 07 22:25:07.418: vmx| [msg.moduletable.powerOnFailed] Module PhysMem power on failed.
Jan 07 22:25:07.419: vmx| ----------------------------------------
Jan 07 22:25:07.590: vmx| VMX_PowerOn: ModuleTable_PowerOn = 0
Jan 07 22:25:08.618: vmx| Vix: [112100 mainDispatch.c:3248]: VMAutomation_ReportPowerOpFinished: statevar=1, newAppState=1873, success=1
Jan 07 22:25:08.618: vmx| Vix: [112100 mainDispatch.c:3254]: VMAutomation: Ignoring ReportPowerOpFinished because the VMX is shutting down.
Jan 07 22:25:08.619: vmx| Vix: [112100 mainDispatch.c:3248]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1
Jan 07 22:25:08.620: vmx| Vix: [112100 mainDispatch.c:3254]: VMAutomation: Ignoring ReportPowerOpFinished because the VMX is shutting down.
Jan 07 22:25:08.825: vmx| Transitioned vmx/execState/val to poweredOff
Jan 07 22:25:08.919: vmx| Vix: [112100 mainDispatch.c:3248]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=0
Jan 07 22:25:08.920: vmx| Vix: [112100 mainDispatch.c:3254]: VMAutomation: Ignoring ReportPowerOpFinished because the VMX is shutting down.
Jan 07 22:25:08.921: vmx| VMX idle exit
Jan 07 22:25:09.108: vmx| Vix: [112100 mainDispatch.c:599]: VMAutomation_LateShutdown()
Jan 07 22:25:09.109: vmx| Vix: [112100 mainDispatch.c:549]: VMAutomationCloseListenerSocket. Closing listener socket.
Jan 07 22:25:09.466: vmx| Flushing VMX VMDB connections
Jan 07 22:25:09.766: vmx| IPC_exit: disconnecting all threads
Jan 07 22:25:09.766: vmx| VMX exit (0).
Jan 07 22:25:09.767: vmx| AIOMGR-S : stat o=280 r=427 w=6 i=255 br=6766592 bw=380928
Jan 07 22:25:09.768: vmx| VMX has left the building: 0.

While the resource manager was trying to extend the VM's memory file on the host, it encountered an error. This triggered a failure and an automated power-off of the VM. As I say, nothing is overcommitted and there are plenty of resources available on the host; the VM itself is not particularly intensive.

Some searching by VMware later, and it seems we have stumbled across a bug between ESX and our EqualLogic arrays:

This is a known issue with EqualLogic arrays (Problem Report No 484220). This issue will be fixed as part of ESX4 patch 5. The release date for this patch is currently targeted for the end of March. Looking through the PR, the only available workaround/suggestion to help reduce the occurrence of this issue is to ensure no vmknic binding and to have only one VMkernel port per vmnic.

So currently the only workaround is not to configure the vmnics/vmk ports as per Dell's docs for multipathing on EqualLogic arrays! Hopefully this saves someone some searching 🙂
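For anyone wanting to check their current setup against the PR's workaround, a rough sketch using the ESX 4 service console tools (the vmk1/vmhba33 names here are examples, substitute your own software iSCSI adapter and VMkernel port):

```shell
# List the vmknics currently bound to the software iSCSI HBA
esxcli swiscsi nic list -d vmhba33

# Remove a binding, so each vmnic is left with only one VMkernel port
# (the PR's suggested mitigation)
esxcli swiscsi nic remove -n vmk1 -d vmhba33

# Confirm the remaining VMkernel port layout
esxcfg-vmknic -l
```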

Jan 19

Like me, you may have a few high-I/O Windows machines in your environment; for us it's mainly mail servers. After realising the performance benefits of the paravirtualised drivers in vSphere (along with them now being supported for boot disks in Windows), I wanted to move a few servers over to these drivers.

On an existing machine you may not have separate boot and data VMDKs, and if you just change the driver to pvscsi you can look forward to a bluescreen on boot, as Windows won't have loaded the drivers you need. Instead:

  1. Ensure you are running ESX4 U1, as boot disks using the pvscsi driver are not supported in earlier releases (it may work, but if something goes wrong VMware won't want to know!).
  2. Make sure your VMware Tools install is up to date.
  3. Add a new data disk to your VM, and set the virtual device node to SCSI 1:0 rather than 0:1/2/3 etc.; this will add a new controller just for that disk.
  4. Change the adapter that is created from the default LSI Logic Parallel to 'paravirtualised' and boot the VM.
  5. Once it's booted and the drivers have been installed, you can shut the VM down, remove the temp disk you created, and change the adapter for your main disk to paravirtualised.
  6. Boot the box and all should be well; if you have any problems you can change back to the LSI driver and try again 🙂
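For the curious, the adapter swap in the final steps boils down to the controller type recorded in the VM's .vmx file. A minimal fragment (controller and disk names here are examples, assuming the boot disk sits on scsi0):

```
scsi0.present = "TRUE"
scsi0.virtualDev = "pvscsi"       # was "lsilogic" before the change
scsi0:0.fileName = "myvm.vmdk"    # example disk name, yours will differ
```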

Enjoy the increased performance!

*Please note these instructions are provided for use at your own risk, while they worked fine for me on quite a few machines your setup may be different, so take a snapshot and make sure you have a backup!*

Jan 17

Recently, on a few of our more I/O-intensive Linux machines, we noticed dramatic iowait increases, especially during the backup window. While they are quite heavily used boxes, and the backup will always hit the disks hard, it was more than I would have expected!

While the server itself was still perfectly responsive, throughput did drop to around 7-8MB/s during these times, and the googling began!

Firstly we looked at our EqualLogic SANs, and ran some Iometer tests with the same average workload on a physical machine with a volume on the SAN, and on a VM. The volume connected directly to the SAN performed as expected, while the VM started to struggle slightly; not miles off what you would expect, but the difference was there! Using esxtop we were able to check the latency on the requests and how big the disk queue was, and all looked normal, with around 2-3ms latency. While we checked a few other bits and bobs, I came across a few recent posts on the paravirtualised drivers available for ESX, which have only recently become properly supported.
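For reference, this is roughly how we pulled the latency figures; a sketch from the ESX 4 service console (sample counts are just examples):

```shell
# Interactively: run esxtop, press 'd' for the disk adapter view, and watch
# DAVG/cmd (device latency), KAVG/cmd (kernel latency) and QUED (queue depth)
esxtop

# Or capture a batch sample to analyse offline: 10 iterations, 5 seconds apart
esxtop -b -d 5 -n 10 > esxtop-sample.csv
```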

Over the last week we have started using the PVSCSI driver available in vSphere, which has significantly improved performance over the old LSI Logic Parallel adapter. While not yet supported for boot disks on RHEL, it is on Windows 2003 and 2008, and I would definitely recommend trying this adapter out for any machines with a more significant I/O demand.
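A quick sanity check from inside a Linux guest that the new adapter is actually in use (a sketch; the module name varies between pvscsi and vmw_pvscsi depending on where the driver came from):

```shell
# The paravirtual controller should show up on the virtual PCI bus
lspci | grep -i "vmware"

# And the paravirtual SCSI module should be loaded
lsmod | grep -i "pvscsi"
```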

There is a great tutorial for RHEL at http://southbrain.com/south/tutorials/installing-redhat-enterprise-5.html and I will be adding a Windows how-to shortly!