Kitz ADSL Broadband Information

Topic: Linux (CentOS) Machine Check Exceptions

sevenlayermuddle

« on: January 25, 2020, 01:10:35 PM »


roseway

« Reply #1 on: January 25, 2020, 03:32:52 PM »

I've never heard of machine check exceptions, but I would suggest that a likely cause of the event is overheating. Do you have any temperature monitoring on that machine?
  Eric

sevenlayermuddle

« Reply #2 on: January 25, 2020, 06:34:17 PM »


roseway

« Reply #3 on: January 25, 2020, 06:59:08 PM »

If it were a hardware failure, it would most likely be permanent, surely? Your machine recovered after power cycling, which seems to suggest some sort of thermal effect.
  Eric

sevenlayermuddle

« Reply #4 on: January 25, 2020, 07:10:09 PM »


burakkucat

« Reply #5 on: January 25, 2020, 09:26:43 PM »

MCEs -- either thermal events or RAM parity errors
:cat:  100% Linux and, previously, Unix. Co-founder of the ELRepo Project.

Please consider making a donation to support the running of this site.

sevenlayermuddle

« Reply #6 on: January 25, 2020, 10:24:15 PM »

Current temperature readings of the same system under similar conditions...

01-Inlet Ambient   Ambient     3   0   OK  23C  Caution: 42C; Critical: 46C
02-CPU             CPU        10   6   OK  40C  Caution: 70C; Critical: N/A
03-P1 DIMM 1-2     Memory     14   7   OK  34C  Caution: 87C; Critical: N/A
05-Chipset         System      4   2   OK  55C  Caution: 105C; Critical: N/A
06-Chipset Zone    System      3   4   OK  42C  Caution: 68C; Critical: 73C
07-VR P1 Zone      System      9  12   OK  48C  Caution: 88C; Critical: 93C
09-iLO Zone        System      7  15   OK  46C  Caution: 72C; Critical: 77C
11-PCI 1 Zone      I/O Board   2  11   OK  39C  Caution: 64C; Critical: 69C
12-Sys Exhaust     Chassis    10  15   OK  43C  Caution: 68C; Critical: 73C

RAM parity is certainly another possibility, although I'm not convinced that RAM parity and overheating are the only ones. Somewhere in the status fields of the detailed log messages lies, I suspect, a precise reason that would remove the guesswork. The full log messages were...

   13    Critical   CPU   01/23/2020 21:19   01/23/2020 21:19   1   Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000002, Bank 0x00000003, Status 0xF2000000'00800400, Address 0x00000000'00000000, Misc 0x00000000'00000000)
   12    Critical   CPU   01/23/2020 21:19   01/23/2020 21:19   1   Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000000, Bank 0x00000003, Status 0xF2000000'00800400, Address 0x00000000'00000000, Misc 0x00000000'00000000)

I'm not expecting anybody here to be able to decode them; it's obviously specialised stuff, probably known only to Intel's internal CPU gurus. I was just hoping for pointers on how to decode them myself, or on tools that might decode them for me. I have installed mcelog, which might give more information in the event of a recurrence.

https://mcelog.org
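
For anyone wanting a first pass at the Status field by hand, here is a minimal Python sketch that splits the logged value into the architectural fields of the machine-check status register. The bit positions and the small error-code table are assumptions taken from my reading of the machine-check chapter of the Intel Software Developer's Manual, and the table is deliberately partial; the function and table names are my own, not part of mcelog or any other tool. Treat it as a starting point and check it against the manual.

```python
# Sketch: split an IA32_MCi_STATUS value into its architectural fields.
# Bit positions below are an assumption based on the Intel SDM
# machine-check chapter; verify against the manual before relying on them.

FLAG_BITS = {
    "VAL": 63,    # register contains valid error information
    "OVER": 62,   # a further error arrived before this one was cleared
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting for this error was enabled
    "MISCV": 59,  # the Misc register holds valid data
    "ADDRV": 58,  # the Address register holds valid data
    "PCC": 57,    # processor context may be corrupt
}

# Partial table of "simple" MCA error codes (bits 15:0). Incomplete on
# purpose; compound codes (cache, bus, memory controller) are not covered.
SIMPLE_ERROR_CODES = {
    0x0000: "No error",
    0x0001: "Unclassified",
    0x0002: "Microcode ROM parity error",
    0x0003: "External error",
    0x0004: "FRC error",
    0x0005: "Internal parity error",
    0x0400: "Internal timer error",
}


def decode_mci_status(text):
    """Decode a status string such as "0xF2000000'00800400"."""
    # The log separates the two 32-bit halves with an apostrophe.
    value = int(text.replace("'", ""), 16)
    flags = {name: bool((value >> bit) & 1) for name, bit in FLAG_BITS.items()}
    mca_code = value & 0xFFFF            # bits 15:0, architectural error code
    model_code = (value >> 16) & 0xFFFF  # bits 31:16, model-specific code
    return {
        "flags": flags,
        "mca_error_code": mca_code,
        "model_specific_code": model_code,
        "meaning": SIMPLE_ERROR_CODES.get(mca_code, "not in this partial table"),
    }


if __name__ == "__main__":
    result = decode_mci_status("0xF2000000'00800400")
    for name, is_set in result["flags"].items():
        print(f"{name:5} = {int(is_set)}")
    print(f"MCA error code      = 0x{result['mca_error_code']:04X}")
    print(f"Model-specific code = 0x{result['model_specific_code']:04X}")
    print(f"Meaning             = {result['meaning']}")
```

Run against the status logged above, this reports VAL, OVER, UC, EN and PCC set, with MISCV and ADDRV clear, which at least agrees with the Address and Misc fields being logged as all zeros. The low 16 bits come out as 0x0400, which the partial table above labels an internal timer error, if I have that table right. mcelog or the manual should have the final say.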

burakkucat

« Reply #7 on: January 25, 2020, 10:56:05 PM »

There should be an appropriate RPM package available for the version of CentOS that you are using.

[bcat ~]$ sudo yum info mcelog
Loaded plugins: fastestmirror, product-id, refresh-packagekit, search-disabled-repos, subscription-manager
Loading mirror speeds from cached hostfile
 * elrepo-kernel: mirrors.coreix.net
Available Packages
Name        : mcelog
Arch        : x86_64
Epoch       : 2
Version     : 128
Release     : 1.c83713fd.el6
Size        : 68 k
Repo        : rhel-6-server-rpms
Summary     : Tool to translate x86-64 CPU Machine Check Exception data.
URL         : https://github.com/andikleen/mcelog.git
License     : GPLv2
Description : mcelog is a daemon that collects and decodes Machine Check Exception data
            : on x86-64 machines.

[bcat ~]$

The above output was obtained on a RHEL6 system.

sevenlayermuddle

« Reply #8 on: January 25, 2020, 11:48:03 PM »


burakkucat

« Reply #9 on: January 26, 2020, 12:22:08 AM »

What I failed to type was that with the mcelog package installed, there should be some documentation installed under the /usr/share/ directory, as well as the usual manual page, etc.

rpm -qd mcelog

(Just in case you have forgotten . . .)

sevenlayermuddle

« Reply #10 on: January 26, 2020, 12:52:13 AM »


Weaver

« Reply #11 on: January 26, 2020, 02:06:53 AM »

« Last Edit: January 26, 2020, 02:54:45 AM by Weaver »

sevenlayermuddle

« Reply #12 on: January 26, 2020, 08:39:30 AM »


sevenlayermuddle

« Reply #13 on: January 26, 2020, 09:04:39 AM »


Weaver

« Reply #14 on: January 26, 2020, 09:53:46 AM »

> causing some new instability as well as the possibility of fixing it

Very wise. I had my doubts whilst writing it.