Tuesday, May 14, 2019

linux - Processes accessing USB device freeze and become uninterruptible


I have a program that is supposed to write a large number of files to an external USB 3 drive with an NTFS filesystem. Something goes wrong with this: at first the program periodically freezes for several minutes, then it freezes permanently.


The process becomes uninterruptible. See screenshot: system monitor.


The waiting channel is "read_descriptor". All of these processes (I tried launching the program multiple times) have opened the file /sys/.../usb4/descriptors.
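The same information shown in the system monitor can be read directly from /proc. The following is only a minimal sketch, assuming a Linux /proc filesystem that provides the per-process stat and wchan files; the function names parse_stat and uninterruptible_processes are made up for this illustration.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' rather than on whitespace.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Yield (pid, comm, wchan) for every process in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            # The process exited meanwhile, or the file is not readable.
            continue
        yield pid, comm, wchan


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, comm, wchan in uninterruptible_processes():
        print(pid, comm, wchan)
```

On a system in the state described here, the stuck processes would be expected to show up with "read_descriptor" in the last column. Depending on kernel settings, wchan may only be meaningful when read as root.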


In this state, it seems that every command that accesses the USB device freezes, including:



  • lsusb

  • cat /sys/kernel/debug/usb/devices

  • USB Reset scripts which call /sys/..../unbind


I then tried to unmount the partition. After umount --force (and possibly other commands, I no longer remember exactly), it did unmount and no longer appears in the output of mount. However, the Disks application still shows it as "Unmounting Filesystem".


Also, the disk already has a lot of bad sectors (984 so far), even though it is a completely new drive. It seems to acquire bad sectors when it is written to from Linux.


(Screenshot: Disks application)


Is there any way to restart the USB subsystem or force the device to be disconnected without restarting the system? (update-grub also blocks, and the default entry of the bootloader menu is set incorrectly, so I can't connect remotely after rebooting.)


And what could cause this problem with the USB drive?


The system also seemed to have similar problems with another external USB drive (reads slowed down to less than 1 MB/s).


The system is Ubuntu Linux 16.04.2 LTS (Xenial) on a 64-bit machine.


Update:


lspci:


00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB xHCI Controller
00:16.0 Communication controller: Intel Corporation 9 Series Chipset Family ME Interface #1
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V
00:1a.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2
00:1b.0 Audio device: Intel Corporation 9 Series Chipset Family HD Audio Controller
00:1c.0 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 (rev d0)
00:1c.3 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d0)
00:1d.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1
00:1f.0 ISA bridge: Intel Corporation 9 Series Chipset Family H97 Controller
00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode]
00:1f.3 SMBus: Intel Corporation 9 Series Chipset Family SMBus Controller
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [GeForce GT 740] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GK107 HDMI Audio Controller (rev a1)
03:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 04)



lsmod:


Module                  Size  Used by
btrfs 987136 0
xor 24576 1 btrfs
raid6_pq 102400 1 btrfs
ufs 73728 0
qnx4 16384 0
hfsplus 106496 0
hfs 57344 0
minix 36864 0
ntfs 98304 0
msdos 20480 0
jfs 180224 0
xfs 970752 0
libcrc32c 16384 1 xfs
pci_stub 16384 1
vboxpci 24576 0
vboxnetadp 28672 0
vboxnetflt 28672 0
vboxdrv 454656 3 vboxnetadp,vboxnetflt,vboxpci
binfmt_misc 20480 1
snd_hda_codec_hdmi 53248 1
eeepc_wmi 16384 0
nvidia_uvm 745472 0
asus_wmi 28672 1 eeepc_wmi
mxm_wmi 16384 0
sparse_keymap 16384 1 asus_wmi
intel_rapl 20480 0
x86_pkg_temp_thermal 16384 0
intel_powerclamp 16384 0
coretemp 16384 0
kvm_intel 172032 0
kvm 544768 1 kvm_intel
irqbypass 16384 1 kvm
snd_hda_codec_realtek 86016 1
crct10dif_pclmul 16384 0
snd_hda_codec_generic 77824 1 snd_hda_codec_realtek
crc32_pclmul 16384 0
ghash_clmulni_intel 16384 0
snd_hda_intel 40960 5
aesni_intel 167936 0
snd_hda_codec 135168 4 snd_hda_codec_realtek,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_intel
snd_seq_midi 16384 0
aes_x86_64 20480 1 aesni_intel
snd_seq_midi_event 16384 1 snd_seq_midi
snd_hda_core 73728 5 snd_hda_codec_realtek,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec,snd_hda_intel
lrw 16384 1 aesni_intel
snd_hwdep 16384 1 snd_hda_codec
gf128mul 16384 1 lrw
snd_rawmidi 32768 1 snd_seq_midi
glue_helper 16384 1 aesni_intel
snd_seq 69632 2 snd_seq_midi_event,snd_seq_midi
snd_pcm 106496 4 snd_hda_codec_hdmi,snd_hda_codec,snd_hda_intel,snd_hda_core
ablk_helper 16384 1 aesni_intel
snd_seq_device 16384 3 snd_seq,snd_rawmidi,snd_seq_midi
cryptd 20480 3 ghash_clmulni_intel,aesni_intel,ablk_helper
snd_timer 32768 2 snd_pcm,snd_seq
snd 81920 21 snd_hda_codec_realtek,snd_hwdep,snd_timer,snd_hda_codec_hdmi,snd_pcm,snd_seq,snd_rawmidi,snd_hda_codec_generic,snd_hda_codec,snd_hda_intel,snd_seq_device
mei_me 36864 0
soundcore 16384 1 snd
input_leds 16384 0
mei 98304 1 mei_me
lpc_ich 24576 0
shpchp 36864 0
serio_raw 16384 0
tpm_infineon 20480 0
8250_fintek 16384 0
wmi 20480 2 mxm_wmi,asus_wmi
acpi_pad 24576 0
mac_hid 16384 0
parport_pc 32768 1
ppdev 20480 0
lp 20480 0
parport 49152 3 lp,ppdev,parport_pc
autofs4 40960 2
hid_generic 16384 0
usbhid 49152 0
hid 118784 2 hid_generic,usbhid
uas 24576 8
usb_storage 69632 1 uas
nvidia_drm 53248 1
nvidia_modeset 778240 4 nvidia_drm
drm_kms_helper 155648 1 nvidia_drm
syscopyarea 16384 1 drm_kms_helper
sysfillrect 16384 1 drm_kms_helper
sysimgblt 16384 1 drm_kms_helper
fb_sys_fops 16384 1 drm_kms_helper
drm 364544 4 drm_kms_helper,nvidia_drm
nvidia 11931648 63 nvidia_modeset,nvidia_uvm
ahci 36864 3
e1000e 237568 0
libahci 32768 1 ahci
ptp 20480 1 e1000e
pps_core 20480 1 ptp
fjes 28672 0
video 40960 1 asus_wmi

Answer



If the "waiting channel" is "read_descriptor", it means that the USB channel went into heavy recovery after a fairly serious hardware problem: the "descriptor stage" occurs only on port reset, and port reset happens only when an unrecoverable transaction error occurs.
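One way to look for such port resets is to search the kernel log for the message the USB core prints when it resets a device. The following is only a rough sketch, assuming the usual "reset ... USB device number N using ..." wording of that message (the exact speed string varies between kernel versions); the name find_port_resets is hypothetical.

```python
import re

# Matches kernel log lines of the assumed form
#   usb <port>: reset <speed> USB device number <n> using <driver>
RESET_RE = re.compile(
    r"usb (\S+): reset (.+?) USB device number (\d+) using (\S+)"
)


def find_port_resets(log_lines):
    """Return (port, speed, device_number, driver) for each reset message."""
    resets = []
    for line in log_lines:
        match = RESET_RE.search(line)
        if match:
            port, speed, number, driver = match.groups()
            resets.append((port, speed, int(number), driver))
    return resets
```

It could be fed the lines of the dmesg output or of a kernel log file such as /var/log/kern.log. Repeated resets of the same port while the drive is being written to would be consistent with the recovery scenario described above, although they do not identify the cause.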


The fact that it works under Windows only means that the OS software there likely uses a different hardware configuration and different controller/PHY parameters.


I strongly suspect that Link Power Management (LPM) is at fault here. The Linux distribution likely enables every available feature, including the latest ones, while Windows might use a so-called "filter driver" devised by Intel to work around some controller deficiencies.


The LPM states U1 and U2 are entered at the hardware level, so they are likely invisible from the software side. To determine whether the link goes back and forth between LPM states, you would need a SuperSpeed USB protocol analyzer, such as the Ellisys 280 or the Teledyne LeCroy Advisor T3, or some other tool that detects LPM states on a SuperSpeed link, like this much less expensive tool.
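What software can at least report is whether hardware LPM has been enabled for a device at all. The sketch below assumes the kernel exposes the power/usb3_hardware_lpm_u1 and power/usb3_hardware_lpm_u2 sysfs attributes described in the kernel's sysfs-bus-usb ABI documentation; older kernels, or devices without LPM support, will not have them. The function name read_usb3_lpm is made up for this example.

```python
import glob
import os


def read_usb3_lpm(sysfs_usb="/sys/bus/usb/devices"):
    """Map device name -> {"u1": ..., "u2": ...} where the attributes exist."""
    result = {}
    for dev in sorted(glob.glob(os.path.join(sysfs_usb, "*"))):
        states = {}
        for link_state in ("u1", "u2"):
            attr = os.path.join(
                dev, "power", "usb3_hardware_lpm_" + link_state
            )
            try:
                with open(attr) as f:
                    states[link_state] = f.read().strip()
            except OSError:
                # Attribute missing: LPM not supported or kernel too old.
                continue
        if states:
            result[os.path.basename(dev)] = states
    return result


if __name__ == "__main__":
    for name, states in read_usb3_lpm().items():
        print(name, states)
```

This only shows the configured setting, not the actual U1/U2 transitions on the wire, which still require an analyzer. Note also that reading sysfs attributes of a device that is already hung may itself block.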

