First post here.
I have read everything I could find on the topic (and not only here), but nothing has solved my case so far. I have about 30 years of experience building and servicing PCs, and this is the first time I have had to surrender, so I am humbly here to ask for advice.
I have a newly built Skylake Z170 PC with a clean install of Windows 10 Pro, all drivers updated (from the manufacturers' websites) and very few programs. The machine is for photo editing and I try to keep everything as clean as possible: no antivirus (I work mostly offline), no bloatware, only the bare necessities.
The hardware (two months old) is stable and runs at stock clocks. I ran Memtest86 for 8 passes (a day and a half) without errors, and I have never had a single BSOD.
In November, after the Windows 10 10586 update, I started to notice that at idle the "System and compressed memory" process was constantly using around 12%-13% CPU (CPU 0 was fully loaded).
I tried everything to debug the issue, uninstalled ALL, checked and tweaked everything, with no results, but after some days of tinkering the issue disappeared without my being able to understand exactly why.
In any case, once the system was fixed, I started reinstalling drivers and apps, checking after every step whether the issue reappeared, and all was good.
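The 12%-13% figure is consistent with exactly one logical processor being pinned at 100%: Task Manager reports a process's CPU usage as a share of all logical processors, so one saturated logical CPU out of N shows up as 100/N percent. A minimal sketch of that arithmetic, assuming an 8-thread Skylake CPU (for example a quad-core with Hyper-Threading; the post does not name the exact model):

```python
def one_cpu_share(logical_processors: int) -> float:
    """Percentage of total CPU shown when exactly one logical processor is fully busy."""
    if logical_processors < 1:
        raise ValueError("need at least one logical processor")
    return 100.0 / logical_processors


# Assumption: 8 logical processors (not stated in the post).
print(f"{one_cpu_share(8):.1f}%")  # prints 12.5%
```

On a 4-thread CPU the same single busy core would instead appear as 25%.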
The PC ran perfectly until last week.
As far as I remember, I installed some minor Windows updates, a BIOS update (to fix the Prime95 issue with the latest CPU microcode), the latest Nvidia drivers and an X-Rite screen calibration program.
At some point (I can't say exactly when) I noticed the weird issue again: the damn "System and compressed memory" at 12%-13% CPU usage, constantly and immediately after boot, as had happened before.
I reverted all the changes, including going back to the previous BIOS and settings, but this did not solve it. Nothing was changed in the hardware.
At this point I have the following setup:
- Win 10 Pro (10586.63)
- Disabled page file
- Disabled SuperFetch
- Disabled sleep and hibernation
- Power profile set to maximum performance
- Disabled Cortana, indexing and every possible background task
- Disabled RunFullMemoryDiagnosticEntry from Task Scheduler, as suggested in another answer here on Super User
- Intel Graphics is disabled in BIOS and no drivers installed
- All drivers are again up to date
- The system is 100% clean: only a few original, trusted programs are installed, and it is never used for browsing or any other internet activity apart from Windows Update.
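Several items in the list above can be applied from an elevated command prompt, which makes the setup easier to reproduce after a reinstall. A minimal dry-run sketch, assuming Windows 10 with `powercfg`, `sc` and `schtasks` available; the dictionary and function names are hypothetical, the Task Scheduler folder path is an assumption (only the task name comes from the post), and the page file and Cortana settings are not covered:

```python
from __future__ import annotations

import subprocess
import sys

# Hypothetical mapping from a setting in the list to a Windows command line.
SETTINGS = {
    "disable hibernation": ["powercfg", "/hibernate", "off"],
    "never sleep on AC power": ["powercfg", "/change", "standby-timeout-ac", "0"],
    # SCHEME_MIN is the powercfg alias for the High performance plan.
    "high performance plan": ["powercfg", "/setactive", "SCHEME_MIN"],
    # SysMain is the service name behind SuperFetch.
    "disable SuperFetch": ["sc", "config", "SysMain", "start=", "disabled"],
    # WSearch is the Windows Search (indexing) service.
    "disable indexing": ["sc", "config", "WSearch", "start=", "disabled"],
    # Assumption: folder path of the task; the task name is taken from the post.
    "disable memory diagnostic task": [
        "schtasks", "/Change", "/TN",
        r"\Microsoft\Windows\MemoryDiagnostic\RunFullMemoryDiagnosticEntry",
        "/Disable",
    ],
}


def apply_settings(dry_run: bool = True) -> list[str]:
    """Return one 'setting: command' line per entry; run the commands only if dry_run is False."""
    lines = []
    for name, cmd in SETTINGS.items():
        lines.append(f"{name}: {subprocess.list2cmdline(cmd)}")
        if not dry_run:
            subprocess.run(cmd, check=True)  # needs an elevated prompt
    return lines


if __name__ == "__main__":
    # Commands are only printed unless --apply is passed explicitly.
    for line in apply_settings(dry_run="--apply" not in sys.argv):
        print(line)
```

Keeping the dry run as the default makes it possible to review each command before anything is changed on the machine.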
To make things even worse, the issue is present even when booting in Safe Mode.
I ran several CPU traces with WPR, one of them in Safe Mode; the related ETL file is provided here:
ETL CPU trace in safe mode zip file
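A trace like this can be captured with the Windows Performance Recorder command line by starting the built-in `CPU` profile, letting the idle system run, and then stopping the recording into an ETL file. A minimal sketch, assuming WPR is installed and the script runs from an elevated prompt; the function names, the output file name and the 60-second duration are hypothetical choices, not the settings used for the trace above:

```python
from __future__ import annotations

import subprocess
import time
from pathlib import Path


def wpr_commands(etl_path: Path, profile: str = "CPU") -> tuple[list[str], list[str]]:
    """Return the start and stop command lines for a WPR capture with a built-in profile."""
    start = ["wpr", "-start", profile]
    stop = ["wpr", "-stop", str(etl_path)]
    return start, stop


def capture(etl_path: Path, seconds: int = 60) -> None:
    """Record an idle-system CPU trace (Windows only, elevated prompt)."""
    start, stop = wpr_commands(etl_path)
    subprocess.run(start, check=True)
    try:
        time.sleep(seconds)  # leave the machine idle while WPR samples
    finally:
        subprocess.run(stop, check=True)  # writes the ETL file for WPA


if __name__ == "__main__":
    capture(Path("idle_cpu.etl"))
```

The resulting ETL file can then be opened in WPA.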
From what I can see in the WPA analysis, the culprit is the hal.dll -> HalpReadPCIConfig function, as shown in the attached screenshot.
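The WPA view that points at this function essentially groups sampled CPU time by module and function and ranks the totals. A minimal sketch of that aggregation, assuming the sampled CPU table has been exported to CSV with hypothetical columns `Module`, `Function` and `Weight` (real WPA exports use their own column layout):

```python
from __future__ import annotations

import csv
import io
from collections import Counter


def top_functions(csv_text: str, n: int = 5) -> list[tuple[str, float]]:
    """Sum sampled CPU weight per module!function and return the n largest shares in percent."""
    totals: Counter[str] = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = f"{row['Module']}!{row['Function']}"
        totals[key] += float(row["Weight"])
    grand = sum(totals.values())
    if grand == 0:
        return []  # empty export: nothing to rank
    return [(key, 100.0 * weight / grand) for key, weight in totals.most_common(n)]
```

A function that dominates the idle samples, as HalpReadPCIConfig does here, appears at the top of the returned list.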
I have tried investigating whether this could be a PCI conflict, but the hardware was not changed, and the same hardware, BIOS and settings ran smoothly for more than one month without the issue, so I tend to rule out hardware causes.
On the other hand, the fact that the issue appears even in Safe Mode rules out a driver as a suspect too, so... no idea...
Yes, I admit I have reached my limit.
If you have any suggestion, please help me see the light.
The only thing I would like to avoid is a complete reinstall, because I have configured a lot of little things that I would be really frustrated to redo, at least not before having found a clear answer about what caused the issue:
I would not want to reinstall and set everything up again only to risk seeing this happen again because I never found the real cause (the appearing/disappearing behavior without any apparent explanation is really worrying)...
Sorry for the long post. I hope this will not be considered a duplicate question, because I have tried all the workarounds in the other answers without success.
Many thanks in advance.
Ciao.
Andrea :)
Answer
After a lot of debugging work, I have decided to post a preliminary answer here describing what I did, because I was able to solve the issue.
In my opinion it should be considered only a temporary workaround: given the past recurring behavior, I want to keep things under control and see what happens with future Windows/driver/BIOS updates before claiming a definitive victory.
I started a series of PC reboots, entering the BIOS each time and disabling the motherboard devices one by one. Each time I cumulatively disabled one more device and then booted into Windows, because I wanted a step-by-step workflow that would let me identify the offending resource exactly.
- disabled CPU VT-d, fast boot, boot logo, Num Lock at boot, Trusted Platform, power management, Wake-on-LAN, BIOS Guard: no effect
- disabled serial port: no effect
- disabled CPU integrated graphics: no effect
- disabled unused SATA ports: no effect
- disabled onboard Realtek audio: no effect
- disabled onboard Thunderbolt 3 (Intel Alpine Ridge) controller: no effect
- disabled the Intel SATA controller completely (still able to boot from the PCIe NVMe SSD): no effect
- disabled the onboard Intel network adapter AND "IOAPIC 24-119 entries" (NOTE: at this point only the CPU, PCI slots and USB ports were enabled; it was impossible to go further): SOLVED!
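The procedure above is a linear, cumulative search: each step disables one more item and the symptom is checked again, so the first step after which the symptom disappears contains the culprit. A minimal sketch of that logic, where the function name and the simulated `symptom_present` check are hypothetical (in practice each check is a reboot and a look at Task Manager); note that when the fixing step disables two items, as the last step did, the result only narrows the cause down to that pair:

```python
from __future__ import annotations

from typing import Callable, Sequence


def first_fixing_step(
    steps: Sequence[Sequence[str]],
    symptom_present: Callable[[frozenset[str]], bool],
) -> tuple[int, frozenset[str]] | None:
    """Disable each step's items cumulatively; return the first step after which the symptom is gone."""
    disabled: set[str] = set()
    for index, step in enumerate(steps):
        disabled.update(step)  # items stay disabled for all later steps
        if not symptom_present(frozenset(disabled)):
            return index, frozenset(step)  # the culprit is among these items
    return None  # symptom survived every step


if __name__ == "__main__":
    steps = [["VT-d"], ["serial port"], ["onboard LAN", "IOAPIC 24-119"]]
    # Simulated check: pretend the symptom depends only on the IOAPIC setting.
    print(first_fixing_step(steps, lambda off: "IOAPIC 24-119" not in off))
```

Splitting a multi-item step into single items would have identified which of the two settings mattered.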
After the last Windows reboot, CPU usage was at 0.2% and "System and compressed memory" never rose again.
Too bad that in the last step I disabled two things together rather than one at a time.
After that I re-enabled ALL the relevant devices step by step in reverse order, and the issue never came back. This is really curious, and it prevents me from reproducing the effect at this point.
In any case, the PC has now been working perfectly for several days; I have installed some minor Windows updates and all is OK. I have not yet tried to update to the latest Nvidia driver (361.75, released yesterday), and for now I will wait, because I don't want to recalibrate my monitor and I have seen there are some issues with the preliminary Thunderbolt 3 support it adds, so I will skip it.
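To check the idle load of the System process over time instead of watching Task Manager, its counter can be sampled with `typeperf "\Process(System)\% Processor Time" -sc 10`. That counter can reach 100% times the number of logical processors, so dividing by that number gives the Task Manager style share. A minimal parsing sketch, assuming the typeperf CSV output has been captured as text and assuming 8 logical processors; the function name is hypothetical:

```python
from __future__ import annotations

import csv
import io


def system_cpu_percent(typeperf_csv: str, logical_processors: int) -> list[float]:
    """Parse typeperf CSV output and normalise each sample to a share of total CPU."""
    samples = []
    for row in csv.reader(io.StringIO(typeperf_csv)):
        if len(row) < 2:
            continue  # blank lines and trailing status messages
        try:
            raw = float(row[1])
        except ValueError:
            continue  # header row (counter path) or an empty sample
        # % Processor Time of a process can reach 100 x logical processors.
        samples.append(raw / logical_processors)
    return samples
```

With this normalisation, one fully busy logical processor on the assumed 8-thread CPU maps to 12.5%.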
CONCLUSION:
As suspected, the debugging work confirmed to me that the issue was not strictly hardware related (a failure or a conflict), nor driver related (because it was present even in Safe Mode). If it had been a hardware conflict, it should have reappeared once the conflicting device was enabled again.
I strongly suspect that in the past (twice) something went wrong inside the Windows configuration, probably during a Windows/driver/BIOS update, due to erroneous behavior of Windows resource management. Once that happened, it was difficult to correctly "override" the setup, even by selectively disabling hardware.
After freeing up a lot of resources/IRQs by disabling all the devices, in my opinion the ultimately decisive factor was disabling the IOAPIC 24-119 entries remap: this probably forced Windows to rebuild its resource configuration from scratch, and this time it succeeded. After that, even re-enabling the BIOS setting and the motherboard devices resulted in a better final configuration that no longer triggered the high CPU load of "System and compressed memory" (which was caused by the hal.dll -> PCI activity visible in the ETL trace).
Since I am currently unable to reproduce the phenomenon, I will keep the whole issue on standby for now.
I will keep this post updated if anything else happens or if I find something more to share.
I still hope you appreciate my efforts and that these results can be useful to someone else.
Thanks, ciao.
Andrea :)