Re: VMWare workstation monitor.suspend_on

So here's the monitor ring buffer from the log you provided. It records various CPU emulation events, many of which are faults or exceptions, and only some of which are actually visible to the guest. Entries are listed in reverse chronological order (newest first), so the story reads from the bottom up. I'll walk through it and explain as I go:

------MONITOR RING BUFFER START(entries=256, indexUnwrapped 42835 entrySz=64)-----

082 --- CS:XIP 0010 ffffffff81842bdd SS:XSP 0000 ffff8800337c6f88 #TRIPLE 000e CPL0 PROT 64-bit fluff=0000

081 --- CS:XIP 0010 ffffffff81842bdd SS:XSP 0000 ffff8800337c6f88 NESTED #PF addr= ffffffffff574080 CPL0 PROT 64-bit fluff=0000

080 --- CS:XIP 0010 ffffffff81842bdd SS:XSP 0000 ffff8800337c6f88 #DF 000e CPL0 PROT 64-bit fluff=0000

079 --- CS:XIP 0010 ffffffff81842bdd SS:XSP 0000 ffff8800337c6f88 NESTED #PF addr= ffffffffff5740e0 CPL0 PROT 64-bit fluff=0000

078 --- CS:XIP 0010 ffffffff81842bdd SS:XSP 0000 ffff8800337c6f88 #PF addr= ffffffff81842bdd CPL0 PROT 64-bit fluff=0004

So the point where the guest clearly went off the rails was the fetch of the instruction at 10:ffffffff81842bdd (entry 078). The fault address is equal to XIP (RIP in 64-bit code), which means the fault happened on the instruction fetch itself: the instruction's address was not mapped, and the CPU attempted to deliver a page fault.
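If you want to make that "fault address equals XIP" check mechanically on a longer log, here's a minimal Python sketch. The line layout is inferred only from the five entries quoted above (index, CS:XIP, SS:XSP, then the event text), so treat the regular expression as an assumption; `parse`, `Entry` and `is_fetch_fault` are hypothetical helpers, not part of any VMware tooling.

```python
import re
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: layout inferred only from the entries quoted in this thread;
# the real monitor log may contain other event shapes.
LINE_RE = re.compile(
    r"^(?P<idx>\d+) --- CS:XIP (?P<cs>[0-9a-f]+) (?P<xip>[0-9a-f]+) "
    r"SS:XSP (?P<ss>[0-9a-f]+) (?P<xsp>[0-9a-f]+) (?P<event>.*)$"
)
ADDR_RE = re.compile(r"addr=\s*(?P<addr>[0-9a-f]+)")


@dataclass
class Entry:
    idx: int                   # ring-buffer slot as printed (it wraps around)
    xip: int                   # instruction pointer when the event was logged
    event: str                 # everything after SS:XSP, e.g. "#PF addr= ..."
    fault_addr: Optional[int]  # the addr= operand, if the event has one


def parse(line: str) -> Optional[Entry]:
    """Parse one ring-buffer line; return None for blank or unrecognized lines."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    a = ADDR_RE.search(m["event"])
    return Entry(
        idx=int(m["idx"]),
        xip=int(m["xip"], 16),
        event=m["event"],
        fault_addr=int(a["addr"], 16) if a else None,
    )


def is_fetch_fault(e: Entry) -> bool:
    """True for a #PF whose fault address equals XIP, i.e. the fetch itself faulted."""
    return "#PF" in e.event and e.fault_addr == e.xip


SAMPLE = """\
082 --- CS:XIP 0010 ffffffff81842bdd SS:XSP 0000 ffff8800337c6f88 #TRIPLE 000e CPL0 PROT 64-bit fluff=0000
081 --- CS:XIP 0010 ffffffff81842bdd SS:XSP 0000 ffff8800337c6f88 NESTED #PF addr= ffffffffff574080 CPL0 PROT 64-bit fluff=0000
080 --- CS:XIP 0010 ffffffff81842bdd SS:XSP 0000 ffff8800337c6f88 #DF 000e CPL0 PROT 64-bit fluff=0000
079 --- CS:XIP 0010 ffffffff81842bdd SS:XSP 0000 ffff8800337c6f88 NESTED #PF addr= ffffffffff5740e0 CPL0 PROT 64-bit fluff=0000
078 --- CS:XIP 0010 ffffffff81842bdd SS:XSP 0000 ffff8800337c6f88 #PF addr= ffffffff81842bdd CPL0 PROT 64-bit fluff=0004
"""

entries = [e for e in map(parse, SAMPLE.splitlines()) if e is not None]
# The log is newest-first, so reverse it to read events in the order they happened.
for e in reversed(entries):
    tag = "  <-- instruction fetch faulted" if is_fetch_fault(e) else ""
    print(f"{e.idx:03d} {e.event}{tag}")
```

Reversing the list rather than sorting by index is deliberate: the indices live in a 256-entry ring, so they wrap, and only the printed order is reliable.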

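The IDT entry for the #PF handler was not mapped either (entry 079). With IDTR=ffffffffff574000 and 16-byte gate descriptors in 64-bit mode, the gate for vector 14 (#PF) sits at ffffffffff574000 + 0xe0 (14 x 16 bytes) = ffffffffff5740e0, which is exactly the faulting address in that entry. So our emulation raised a double fault (entry 080). The IDT entry for the #DF handler (vector 8, at ffffffffff574080) was, unsurprisingly at this stage, also not mapped (entry 081), so our emulation raised a triple fault (entry 082). At that point, a physical host would have rebooted.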
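The IDT arithmetic above is easy to double-check. A small Python sketch, assuming 64-bit mode (16-byte gate descriptors) and the IDTR base quoted above; `gate_address` is just an illustrative helper:

```python
# In 64-bit mode each IDT gate descriptor is 16 bytes, so the gate for
# vector N lives at IDTR.base + N * 16.
IDT_BASE = 0xFFFFFFFFFF574000  # IDTR base from the log discussed above
GATE_SIZE = 16
VECTORS = {"#DF": 8, "#PF": 14}


def gate_address(base: int, vector: int) -> int:
    """Linear address of the IDT gate descriptor for `vector` in 64-bit mode."""
    return base + vector * GATE_SIZE


for name, vec in VECTORS.items():
    print(f"{name} (vector {vec}): {gate_address(IDT_BASE, vec):016x}")

# Both gates fall in the same 4 KiB page, so a single missing mapping
# for that page accounts for both nested #PF entries.
pages = {gate_address(IDT_BASE, v) >> 12 for v in VECTORS.values()}
print(f"distinct 4 KiB pages touched: {len(pages)}")
```

The two computed addresses match the `addr=` values of entries 081 and 079, which is what ties those nested faults to IDT delivery rather than to some unrelated data access.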

So it would seem that something executing shortly before 10:ffffffff81842bdd corrupted the page tables, or changed the core's CR3 or operating mode, so that the paging structures referenced by CR3 no longer permitted fetching that instruction or accessing the IDT.

vmss2core's diagnostic output may give additional clues as to what's happening. When run without the "-M" option, it walks the guest's paging structures in much the same way as the virtual CPU would, so that it can produce a corefile covering the entire mapped address space of each virtual CPU according to its operating mode and CR3 value. If that doesn't turn anything up, you'll need to dig into how execution reached 10:ffffffff81842bdd without usable paging structures.
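If you do end up walking the paging structures by hand (say, from a physical memory dump), it helps to know which table entries each address depends on. A minimal Python sketch, assuming ordinary 4-level paging (CR4.LA57 clear); `split_4level` and `pml4_entry_phys` are hypothetical helpers, and the CR3 value has to come from the vCPU state, since it doesn't appear in the excerpt above:

```python
# ASSUMPTION: 4-level paging (CR4.LA57 clear). With 5-level paging an extra
# 9-bit index (bits 56:48) sits above the PML4 level.
LEVELS = (("pml4", 39), ("pdpt", 30), ("pd", 21), ("pt", 12))


def split_4level(va: int) -> dict:
    """Split a canonical virtual address into its 4-level paging table indices."""
    idx = {name: (va >> shift) & 0x1FF for name, shift in LEVELS}
    idx["offset"] = va & 0xFFF
    return idx


def pml4_entry_phys(cr3: int, va: int) -> int:
    """Physical address of the PML4 entry that the walk for `va` reads first.

    CR3 bits 51:12 hold the PML4 base; the low 12 bits are flags or a PCID,
    so they are masked off. Each paging-structure entry is 8 bytes.
    """
    pml4_base = cr3 & 0x000F_FFFF_FFFF_F000
    return pml4_base + split_4level(va)["pml4"] * 8


for label, va in (("RIP", 0xFFFFFFFF81842BDD), ("IDT base", 0xFFFFFFFFFF574000)):
    print(label, {k: hex(v) for k, v in split_4level(va).items()})
```

For these two addresses the walks share PML4 slot 511 and then diverge (PDPT slot 510 for the RIP, 511 for the IDT), so a bad PML4 entry or a wrong CR3 would be consistent with both faults; that's one candidate to check first, not a conclusion.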

Hope that is of some help!

Cheers,

Darius