Current temperature readings of the same system under similar conditions...
Sensor            Location   X   Y   Status  Reading  Thresholds
01-Inlet Ambient  Ambient    3   0   OK      23C      Caution: 42C; Critical: 46C
02-CPU            CPU        10  6   OK      40C      Caution: 70C; Critical: N/A
03-P1 DIMM 1-2    Memory     14  7   OK      34C      Caution: 87C; Critical: N/A
05-Chipset        System     4   2   OK      55C      Caution: 105C; Critical: N/A
06-Chipset Zone   System     3   4   OK      42C      Caution: 68C; Critical: 73C
07-VR P1 Zone     System     9   12  OK      48C      Caution: 88C; Critical: 93C
09-iLO Zone       System     7   15  OK      46C      Caution: 72C; Critical: 77C
11-PCI 1 Zone     I/O Board  2   11  OK      39C      Caution: 64C; Critical: 69C
12-Sys Exhaust    Chassis    10  15  OK      43C      Caution: 68C; Critical: 73C
RAM parity is certainly another possibility, although I'm not convinced that RAM parity and overheating are the only possibilities. Somewhere in the status fields of the detailed log messages lies, I suspect, a precise reason that would remove the guesswork. Full message logs were...
13 Critical CPU 01/23/2020 21:19 01/23/2020 21:19 1 Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000002, Bank 0x00000003, Status 0xF2000000'00800400, Address 0x00000000'00000000, Misc 0x00000000'00000000)
12 Critical CPU 01/23/2020 21:19 01/23/2020 21:19 1 Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000000, Bank 0x00000003, Status 0xF2000000'00800400, Address 0x00000000'00000000, Misc 0x00000000'00000000)
But I'm not expecting anybody here to be able to decode them; it's obviously specialised stuff, probably known only to Intel's internal CPU gurus. I was just hoping for pointers on how to decode them myself, or on what tools might decode them for me. I have installed mcelog, which might give more information in the event of a recurrence.
https://mcelog.org
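
As a starting point for decoding by hand: the Status value looks like it follows the architectural IA32_MCi_STATUS register layout described in the machine-check chapter of Intel's Software Developer's Manual. Below is a minimal Python sketch under that assumption. The function name, the dictionaries and the short list of "simple" error codes are my own illustration, not part of mcelog or any HP tool, and the model-specific bits are left undecoded.

```python
# Sketch: split an MCA status value into the architectural fields of
# IA32_MCi_STATUS. Assumes the logged Status field uses that layout.

# Flag bits in the upper part of the register (bit number -> name).
FLAG_BITS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds valid data
    57: "PCC",    # processor context corrupt
}

# A few of the "simple" MCA error codes (bits 15:0); deliberately not exhaustive.
SIMPLE_CODES = {
    0x0000: "No error",
    0x0001: "Unclassified",
    0x0002: "Microcode ROM parity error",
    0x0003: "External error",
    0x0004: "FRC error",
    0x0005: "Internal parity error",
    0x0400: "Internal timer error",
}


def decode_mci_status(text):
    """Decode a status string such as "0xF2000000'00800400"."""
    # The log separates the two 32-bit halves with an apostrophe.
    value = int(text.replace("'", ""), 16)
    flags = {name: bool((value >> bit) & 1) for bit, name in FLAG_BITS.items()}
    mca_code = value & 0xFFFF             # architectural error code
    model_code = (value >> 16) & 0xFFFF   # model-specific, not interpreted here
    meaning = SIMPLE_CODES.get(mca_code, "compound or other code (not decoded here)")
    return {
        "flags": flags,
        "mca_code": mca_code,
        "model_code": model_code,
        "meaning": meaning,
    }


if __name__ == "__main__":
    result = decode_mci_status("0xF2000000'00800400")
    set_flags = [name for name, on in result["flags"].items() if on]
    print("flags set:", ", ".join(set_flags))
    print("MCA code: 0x%04X (%s)" % (result["mca_code"], result["meaning"]))
    print("model-specific code: 0x%04X" % result["model_code"])
```

For the Status value in both log entries this reports VAL, OVER, UC, EN and PCC set, with MISCV and ADDRV clear (which fits the all-zero Address and Misc fields), and an MCA code of 0x0400, which the SDM lists as an internal timer error. What the model-specific code 0x0080 in bank 3 means, and whether the root cause is the CPU, firmware or something else, is not something this sketch can tell.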