DELL E1420 CPU BUS PERR ERROR on front LCD

This has been going on for months, since January to be exact. The server is a Dell PowerEdge 2950 running Windows Server 2003 Datacenter x64 Edition. Dell has no clue why this is happening. I’ve called them about once a week on average, and each time they say it’s been escalated. Is anyone else having this issue?

Update 7/31/2008

I got so frustrated with the issue, and there was still no fix after 7 months, so they sent me a new server: a Dell PowerEdge 2970 with the AMD Opteron. I decided to install Windows Server 2008 64-bit edition on it.

They did narrow the problem down to the Generation III servers of the PowerEdge 2950 and 1950 product line.




  • I have the exact same issue with a 2950. I am searching for an answer also. Have you found anything?

  • Hello. I’ve troubleshot so many things with Dell, including disabling virus scanning, enabling VT, and BIOS firmware updates. My last conference call with my account manager, a lead tech, and two others finally established that it is in fact the Generation III PowerEdge 2950s that are having this issue.

    We worked with returns to get a brand-new box, which shipped out Friday and which I should be receiving today. Basically this box is the new PowerEdge 2970 series, and it’s running AMD. I’m going to run Windows Server 2008 Datacenter, and hopefully the combination of the two will fix this error.

  • Hi,

    In my experience, the error is often traced to either a faulty CPU or a faulty motherboard.

    I have had several cases where BIOS 1.5.1 was installed and updating to 2.3.1 resolved the issue. Other times I had to cross-test the CPUs in slot 1 in order to find the faulty component.

    There is no real need to replace the entire server, as this takes too long compared to some simple troubleshooting one can do oneself.

  • The thing is, the server crashed twice when I first got it. We did many levels of troubleshooting with Dell enterprise techs and their team leads: updated the BIOS, spoke with Intel, Microsoft, you name it. Dell then said they would replace the motherboard or a CPU. I did not want that, since those would be refurbished parts, so I opted for a brand-new replacement 2950. Well, a few weeks later, that crashed too. Now, months after all the crashes, they finally upgraded me to a PowerEdge 2970 running an AMD chipset. It hasn’t crashed since.

  • One other note: how long has your machine been running since you replaced the CPU/motherboard? The crashes can happen anytime, without notice.

  • Out of curiosity, how many of you were using Microsoft’s Virtual Server 2005?

    When working on the issue, perform the following steps to get into a Microsoft-supported configuration:

    1. Update to the latest Service Pack (SP) or patch level for Microsoft Virtual Server 2005. Install SP1 Update for Virtual Server 2005. This requires SP1 to be installed. If Microsoft Virtual Server 2005 is running with no service pack, download and install SP1 first and then download SP1 Update.

    MS Virtual Server 2005 SP1

    http://www.microsoft.com/downloads/details.aspx?FamilyId=BC49C7C8-4840-4E67-8DC4-1E6E218ACCE4&displaylang=en

    MS Virtual Server 2005 SP1 Update

    http://www.microsoft.com/downloads/details.aspx?FamilyId=A79BCF9B-59F7-480B-A4B8-FB56F42E3348&displaylang=en

    2. Enable Intel Virtualization Technology in the BIOS. The following Microsoft white paper explains the benefits of this setting.

    Microsoft-Intel Virtualization Whitepaper

    http://www.microsoft.com/downloads/details.aspx?FamilyID=851126EA-FBFE-4AB0-BE06-857F7797ED59&displaylang=en

    3. Verify that the host operating system is supported. See the release notes for the supported host operating systems and service pack levels. This must be done for the installed version of Microsoft Virtual Server 2005 (that is, no SP or SP1). The same information is available for SP1 Update, and all of it can be viewed from the links provided above.

    4. Verify that the guest operating system(s) are supported. Use the same instructions provided in step 3 to verify this information.

    For example: without Microsoft Virtual Server 2005 SP1 Update, Windows Vista® and Windows Server 2008 are not supported as a host or guest OS.

    5. Exclude virtual machine directories from any anti-virus scanning.

    6. Update the Dell hardware’s BIOS and Baseboard Management Controller (BMC) / Dell Remote Access Controller (DRAC) firmware to the latest versions.
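
The six steps above can be tracked as a simple checklist. The following is only a minimal sketch: the `HostConfig` class, its field names, and the `unmet_steps` helper are hypothetical names invented for illustration, and every flag is assumed to be filled in by hand after inspecting the host. Nothing here queries Virtual Server, the BIOS, or Dell firmware.

```python
from dataclasses import dataclass


@dataclass
class HostConfig:
    # Hypothetical flags, set manually after checking the server.
    sp1_installed: bool
    sp1_update_installed: bool
    vt_enabled_in_bios: bool
    host_os_supported: bool
    guest_os_supported: bool
    vm_dirs_excluded_from_av: bool
    firmware_up_to_date: bool


# Each entry pairs a flag with the step it corresponds to in the list above.
CHECKS = [
    ("sp1_installed", "Step 1: install Virtual Server 2005 SP1"),
    ("sp1_update_installed", "Step 1: install SP1 Update"),
    ("vt_enabled_in_bios", "Step 2: enable Intel VT in the BIOS"),
    ("host_os_supported", "Step 3: use a supported host OS"),
    ("guest_os_supported", "Step 4: use supported guest OSes"),
    ("vm_dirs_excluded_from_av", "Step 5: exclude VM directories from anti-virus"),
    ("firmware_up_to_date", "Step 6: update BIOS and BMC/DRAC firmware"),
]


def unmet_steps(cfg: HostConfig) -> list[str]:
    """Return the descriptions of every step whose flag is still False."""
    return [text for field, text in CHECKS if not getattr(cfg, field)]
```

Calling `unmet_steps` on a filled-in `HostConfig` returns the steps still outstanding, in the same order as the list above; an empty list means every step has been ticked off.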

  • Yeah, I had installed Virtual Server 2005 SP1 beforehand. With VT both enabled and disabled in the BIOS, the server still crashed. The host operating system was supported, and we used Windows Server 2003 as the guest. As for antivirus, none was installed in one deployment. All updates were current, and after 6 months of troubleshooting with Dell, I had no choice but to request a new server.

  • Did you apply the post-SP1 update for Virtual Server 2005?

  • Hello

    Yes, I think it was this:
    http://www.microsoft.com/downloads/details.aspx?FamilyId=A79BCF9B-59F7-480B-A4B8-FB56F42E3348&displaylang=en

    That did not help either, unfortunately. I also started a huge thread on the Dell support forums, and there were numerous people with this issue, except that those running Red Hat had no problems, since Red Hat released a fix a few weeks later.

  • I am curious to know whether anyone is experiencing this issue in a Windows environment that is not using virtualization software of some sort. All of the threads that I have read regarding this issue within a Windows environment have involved hosting “guest” operating systems on top of a Windows “host” OS through Virtual Server 2005, Hyper-V, or VMware Server.

  • Yeah, I think every thread I’ve seen this on has been an MS Virtual Server 2005 setup. I have another Dell 2950, which I believe isn’t a Generation III, and it is not experiencing this problem.

  • Hi Shank,
    I had this problem on PE2950s and also on M600 blades. There is now a new BIOS for the x950s which seems to fix CPU IERR and maybe also CPU BUS PERR; so far so good.
    You were talking about RH releasing a fix for the error. Could you please tell me about it? My customer has 144 nodes with RH 2.6.9-55 plus the Lustre filesystem.
    Any help will be appreciated
    David

  • Hello.

    Now the IERR and PERR are totally different errors. I had the IERR on another 2950 of mine. The exact message was:

    CPU1 Status: Processor sensor for CPU1, IERR was asserted
    CPU2 Status: Processor sensor for CPU2 IERR was asserted

    Did you have this error? Dell asked me to reseat everything, which made no sense to me. Without reseating, and after a simple reboot, the machine ran fine, and it still does.

    Let me take a look into the RedHat fix for you.
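
For anyone sifting through a hardware log for the IERR messages quoted above, here is a minimal sketch that picks out which CPUs reported one. The pattern is modelled only on those two quoted lines, and the function name `cpus_with_ierr` is made up for illustration. It deliberately does not try to match PERR entries, since the exact log text for those is not quoted in this thread.

```python
import re

# Matches lines shaped like the two IERR messages quoted above.
# The comma after the CPU name is optional because the two quoted
# lines differ on that point.
IERR_LINE = re.compile(
    r"^(CPU\d+) Status: Processor sensor for CPU\d+,? IERR was asserted$"
)


def cpus_with_ierr(lines):
    """Return the sorted, de-duplicated CPU names that logged an IERR."""
    found = set()
    for line in lines:
        match = IERR_LINE.match(line.strip())
        if match:
            found.add(match.group(1))
    return sorted(found)
```

Passing it the lines of an exported log as a list of strings gives back the CPU names that asserted IERR; any line that does not fit the assumed shape is simply ignored.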

  • Thanks Shank,
    I know they are different, but because they released a new BIOS to address those CPU errors, I thought this one would be fixed as well.
    The bad news is that with the new BIOS the machines still log the message and sometimes crash. This started again when they put the Lustre filesystem back on preproduction… the machines had previously been fully tested (CPU stress testing and memtest) with no errors without the Lustre filesystem.
    So there must be something linked to the filesystem or to the kernel patched for the Lustre filesystem.
    A fix like this for RH would be a great help in seeing what has been addressed within the kernel.
    Thanks again
    David

  • http://kbase.redhat.com/faq/FAQ_85_11695

    ^ That link may help, that’s what I found in my emails.

    Still looking for the actual update link.

  • Hi Shank,
    Thanks a lot for the info; the situation matches ours completely. We are updating the kernel, but while compiling the updated version we are having crashes when accessing the storage due to multipath (other issues).
    Is the link you are referring to a link to the new kernel or to a patch that fixes this problem? If it is a fix, it would be great to test it quickly; if not… we need to keep on working… :-(
    Thanks again for your help…
    David

  • Shank, after the kernel upgrade the error seems to be gone… fingers crossed!

  • Hi David.
    Yes, it is an upgrade, but I think you figured that out. Good to hear that the error seems to be gone.

    Keep me posted on what happens. I emailed a Dell tech I was working with a few days ago to find out if there were any updates on the Linux end, and of course on the Windows side. Still waiting for a response from them.

  • Shank,
    Still no issues so far.
    Thanks
    David

  • Hi David,

    How’s it looking today?

  • Hi Shank,
    Great, I think that the problem has been solved.
    David

  • So I believe it is okay to say that the Red Hat fix was successful. As for the Dell/Microsoft setup, the 1950s and 2950s are still having the problem.

    I’m running on the 2970 with Hyper-V now, for well over a month, not a single problem.
