Hope someone can give me an idea of how to resolve this issue short of replacing the whole machine.
Background/History
I have an ASUS P8Z68-M Pro MB / G620 CPU / 16GB DDR3 1333MHz CL 9-9-9-24 DRAM. The system is about 4 years old, and it had memory errors about 2 years ago. I bought new RAM at the time, then RMA'd the bad set and kept the replacement as a spare.
Last week I noticed some weird errors in FreeNAS (which have been happening for some time), so I took the machine down and started running Memtest86+ v4.2, and found an easily reproducible error in one of the DIMMs at address 0019bd12878.
The first time, memory failed on Pass 1, Test 2. The error bit was 00010000: 0 was expected, but 1 was read.
The second time, the error was on Pass 1, Test 1. The error bit was 00020000: again 0 was expected, but 1 was read.
The problem was very easy to reproduce: I put the bad DIMM in a different slot for each of the two tests, and it failed both times.
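To make the two error masks easier to compare, here is a minimal Python sketch that turns a Memtest86+ "error bits" value into bit positions. It assumes the mask is plain hexadecimal with bit 0 as the least significant bit; the helper name `flipped_bits` is made up for illustration and is not part of Memtest86+.

```python
def flipped_bits(error_mask_hex: str) -> list[int]:
    """Return the positions of the set bits in a hex error mask (bit 0 = LSB)."""
    mask = int(error_mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# The two masks reported above, one per failing run.
for label, mask in [("first run", "00010000"), ("second run", "00020000")]:
    print(label, flipped_bits(mask))
```

Under that assumption the first failure is bit 16 and the second is bit 17, so the two runs flipped adjacent but different bits.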
The problem
I replaced the bad RAM with the spare RAM from the first RMA: brand new Patriot VIPER DDR3 1600MHz CL9-9-9-24, which I set up to run at 1333MHz in the BIOS (the G620 won't take the higher multiplier). I enabled XMP in the BIOS, then set the clock speed to 1333.
I now have a weird situation with the replacement.
This ran fine for just over 24 hours, then I started getting a few errors at 0004d2fxxxx. (That is a range of addresses; the program only shows a few on the screen, and I don't have a printer hooked up to it or any other way to capture more details.)
Without taking down the machine, I changed the Memtest86+ settings to spot-test the area that was reporting the errors, and got about 4500 errors very quickly. All of the errors were reported by Test 8, "Random Patterns".
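For narrowing a spot test to the failing area, it can help to convert the reported addresses into a megabyte window. The sketch below does that arithmetic in Python, assuming the addresses are hexadecimal byte addresses and that each `x` stands for an unknown hex digit; `address_span` and `mib_window` are hypothetical helper names, not anything provided by Memtest86+.

```python
MIB = 1 << 20  # bytes per mebibyte


def address_span(pattern: str) -> tuple[int, int]:
    """Lowest and highest byte address matching a hex pattern ('x' = unknown digit)."""
    pattern = pattern.lower()
    low = int(pattern.replace("x", "0"), 16)
    high = int(pattern.replace("x", "f"), 16)
    return low, high


def mib_window(pattern: str) -> tuple[int, int]:
    """Smallest whole-MiB window [start, end) that covers every matching address."""
    low, high = address_span(pattern)
    return low // MIB, high // MIB + 1


print(mib_window("0004d2fxxxx"))  # range seen with the replacement DIMMs
print(mib_window("0019bd12878"))  # single address seen with the original bad DIMM
```

By this arithmetic the 0004d2fxxxx range sits entirely inside the 1234 to 1235 MiB window, while the earlier failure at 0019bd12878 is at roughly 6589 MiB. On a dual-channel board, a physical address does not necessarily map to one DIMM in a simple way, so the window is only a starting point for the test range, not proof of which module is at fault.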
I then tried to reproduce and localize the problem by pulling one of the two DIMMs, and the errors stopped. So the power cycle and/or reinserting the other DIMM cleared the problem.
I went back to the original configuration, and so far it has been running error-free for over 37 hours, which makes a simple thermal problem less likely.
Questions
- Any suggestions on how I can localize this problem?
- Any other test programs I should run that might help?
- Is this more likely to be a memory problem or a motherboard problem (or even a CPU or power supply issue)?
Any suggestions or input would be most appreciated.
Thanks.