Jan 22

This issue started appearing after we upgraded to ESX4, and to be fair it’s happened maybe 3 times in total. ESX just disables the VM, stating in the logs:

The CPU has been disabled by the guest operating system

Jan 07 22:22:11.316: vcpu-0| The CPU has been disabled by the guest operating system. You will need to power off or reset the virtual machine at this point.

Some googling didn’t turn up much, the machine wasn’t overcommitted resource-wise, and all other VMs were fine, so off we went to open a support request with VMware. A short time later they had identified the following in the logs:

Jan 07 22:25:07.336: vmx| VMXVmdb_LoadRawConfig: Loading raw config

Jan 07 22:25:07.415: vmx| Failed to extend memory file from 0x0 bytes -> 0x1000 bytes.

Jan 07 22:25:07.416: vmx| BusMem: Failed to allocate frames for region BusError.

Jan 07 22:25:07.417: vmx| Msg_Post: Error

Jan 07 22:25:07.417: vmx| [msg.memVmnix.ftruncateFailed] Could not allocate 4096 bytes of anon memory: No space left on device.

Jan 07 22:25:07.418: vmx| [msg.moduletable.powerOnFailed] Module PhysMem power on failed.

Jan 07 22:25:07.419: vmx| ----------------------------------------

Jan 07 22:25:07.590: vmx| VMX_PowerOn: ModuleTable_PowerOn = 0

Jan 07 22:25:08.618: vmx| Vix: [112100 mainDispatch.c:3248]: VMAutomation_ReportPowerOpFinished: statevar=1, newAppState=1873, success=1

Jan 07 22:25:08.618: vmx| Vix: [112100 mainDispatch.c:3254]: VMAutomation: Ignoring ReportPowerOpFinished because the VMX is shutting down.

Jan 07 22:25:08.619: vmx| Vix: [112100 mainDispatch.c:3248]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1

Jan 07 22:25:08.620: vmx| Vix: [112100 mainDispatch.c:3254]: VMAutomation: Ignoring ReportPowerOpFinished because the VMX is shutting down.

Jan 07 22:25:08.825: vmx| Transitioned vmx/execState/val to poweredOff

Jan 07 22:25:08.919: vmx| Vix: [112100 mainDispatch.c:3248]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=0

Jan 07 22:25:08.920: vmx| Vix: [112100 mainDispatch.c:3254]: VMAutomation: Ignoring ReportPowerOpFinished because the VMX is shutting down.

Jan 07 22:25:08.921: vmx| VMX idle exit

Jan 07 22:25:09.108: vmx| Vix: [112100 mainDispatch.c:599]: VMAutomation_LateShutdown()

Jan 07 22:25:09.109: vmx| Vix: [112100 mainDispatch.c:549]: VMAutomationCloseListenerSocket. Closing listener socket.

Jan 07 22:25:09.466: vmx| Flushing VMX VMDB connections

Jan 07 22:25:09.766: vmx| IPC_exit: disconnecting all threads

Jan 07 22:25:09.766: vmx| VMX exit (0).

Jan 07 22:25:09.767: vmx| AIOMGR-S : stat o=280 r=427 w=6 i=255 br=6766592 bw=380928

Jan 07 22:25:09.768: vmx| VMX has left the building: 0.

While the resource management was trying to extend memory on the host for the VM, it encountered an error. This triggered a failure and an automated power-off of the VM. As I say, nothing is overcommitted and there are plenty of resources available on the host; the VM itself is not particularly intensive.
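If you want to check whether your own VMs have hit the same thing, a rough sketch like the one below can pull the telltale entries out of a vmware.log. The message fragments and the timestamp layout are taken straight from the excerpt above; the function name `find_failures` and the `Hit` tuple are just names made up for this example, and other ESX builds may word or format their entries differently.

```python
import re
from collections import namedtuple

# Message fragments copied from the log excerpt in this post.
SIGNATURES = (
    "The CPU has been disabled by the guest operating system",
    "Failed to extend memory file",
    "msg.memVmnix.ftruncateFailed",
    "Module PhysMem power on failed",
)

# Assumes the "Jan 07 22:25:07.415: vmx| message" layout seen in the excerpt.
LINE_RE = re.compile(
    r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}): ([\w-]+)\| (.*)$"
)

Hit = namedtuple("Hit", "timestamp thread message")


def find_failures(lines):
    """Return a Hit for every log line that contains one of SIGNATURES."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(sig in match.group(3) for sig in SIGNATURES):
            hits.append(Hit(*match.groups()))
    return hits


# Two entries from the excerpt: the first is a failure, the second is not.
sample = [
    "Jan 07 22:25:07.415: vmx| Failed to extend memory file from 0x0 bytes -> 0x1000 bytes.",
    "Jan 07 22:25:08.921: vmx| VMX idle exit",
]
for hit in find_failures(sample):
    print(hit.timestamp, hit.thread, hit.message)
```

To run it against a real log, pass an open file object instead of `sample`; the function only needs something it can iterate over line by line.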

Some searching by VMware and it seems we have stumbled across a bug between them and our EqualLogic arrays:

This is a known issue with EqualLogic arrays (Problem Report No 484220). This issue will be fixed as part of ESX4 patch 5. The release date for this patch is currently targeted for the end of March. Looking through the PR, the only available workaround/suggestion to help reduce the occurrence of this issue is to ensure no vmknic binding and to have only 1 vmkernel port per vmnic.
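The "only 1 vmkernel port per vmnic" part is easy to get wrong across several hosts, so here is a small sketch of a sanity check. It assumes you have already written down which vmk port sits on which vmnic for a host; the `layout` mapping and the function name `overloaded_vmnics` are hypothetical, and nothing here queries ESX itself.

```python
from collections import defaultdict


def overloaded_vmnics(vmk_to_vmnic):
    """Return {vmnic: [vmk ports]} for each vmnic with more than one vmkernel port."""
    ports = defaultdict(list)
    for vmk, vmnic in sorted(vmk_to_vmnic.items()):
        ports[vmnic].append(vmk)
    return {nic: vmks for nic, vmks in ports.items() if len(vmks) > 1}


# Hypothetical host layout, collected by hand for illustration only.
layout = {"vmk0": "vmnic2", "vmk1": "vmnic2", "vmk2": "vmnic3"}
print(overloaded_vmnics(layout))  # {'vmnic2': ['vmk0', 'vmk1']}
```

An empty result means the layout you typed in matches the suggestion; it says nothing about vmknic binding, which still has to be checked on the host.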

So currently the only workaround is not to configure the vmnics/vmk ports as per Dell’s docs for multipathing on EqualLogic arrays! Hopefully this saves some searching 🙂