Sunday, August 12, 2018

raid - XFS in-memory corruption after hibernate

I keep getting the errors below on an XFS filesystem that sits on a software RAID-1 array which was later converted to a 3-disk RAID-5. The errors occur only after resuming from hibernation, usually either immediately or within a few minutes. dmesg shows the following (full output: http://bpaste.net/show/130895/):



[155389.814032] PM: restore of devices complete after 1700.425 msecs
[155389.814783] Restarting tasks ... done.
[155390.161993] r8168: enp2s0: link up
[155392.181215] r8168: enp2s0: link up
[155398.859967] sd 7:0:0:0: [sdh] No Caching mode page present
[155398.859972] sd 7:0:0:0: [sdh] Assuming drive cache: write through
[155398.876927] sd 7:0:0:0: [sdh] No Caching mode page present
[155398.876932] sd 7:0:0:0: [sdh] Assuming drive cache: write through
[155398.877945] sdh:
[155690.215471] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 342 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff812049d1
[155690.215478] CPU: 5 PID: 17532 Comm: kworker/5:0 Tainted: P O 3.10.7-gentoo #1
[155690.215481] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 R2.0, BIOS 0601 07/17/2012
[155690.215490] Workqueue: xfsalloc xfs_bmapi_allocate_worker
[155690.215493] ffffffff81565b8a 0000000000000071 ffffffff81201c57 ffff880418328000
[155690.215498] ffff880418328270 ffff8803124ee460 0000000081206839 0000000000000800
[155690.215502] ffff8803990ffd18 ffff880418328000 0000000000000800 0000000000000800
[155690.215506] Call Trace:
[155690.215514] [] ? dump_stack+0xd/0x17
[155690.215520] [] ? xfs_alloc_fixup_trees+0x1e7/0x370
[155690.215524] [] ? xfs_alloc_ag_vextent_near+0xa21/0xd90
[155690.215528] [] ? xfs_alloc_ag_vextent+0xbd/0xf0
[155690.215532] [] ? xfs_alloc_vextent+0x478/0x800
[155690.215536] [] ? xfs_bmap_btalloc_nullfb+0x316/0x350
[155690.215541] [] ? xfs_bmap_btalloc+0x31a/0x770
[155690.215546] [] ? internal_add_timer+0x18/0x50
[155690.215551] [] ? internal_add_timer+0x18/0x50
[155690.215556] [] ? __xfs_bmapi_allocate+0xcd/0x2e0
[155690.215560] [] ? xfs_bmapi_allocate_worker+0x3c/0x70
[155690.215566] [] ? process_one_work+0x150/0x480
[155690.215570] [] ? manage_workers.isra.26+0x1aa/0x2b0
[155690.215575] [] ? worker_thread+0x114/0x370
[155690.215579] [] ? manage_workers.isra.26+0x2b0/0x2b0
[155690.215584] [] ? kthread+0xb3/0xc0
[155690.215588] [] ? async_run_entry_fn+0xf0/0x120
[155690.215593] [] ? kthread_freezable_should_stop+0x60/0x60
[155690.215598] [] ? ret_from_fork+0x7c/0xb0
[155690.215603] [] ? kthread_freezable_should_stop+0x60/0x60
[155690.215619] XFS (md1): page discard on page ffffea000d1df580, inode 0x22057716, offset 8323072.
[155720.362810] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 342 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff812049d1

<...> (a big bunch of similar errors skipped)

[156100.313075] CPU: 4 PID: 27035 Comm: kworker/4:2 Tainted: P O 3.10.7-gentoo #1
[156100.313078] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 R2.0, BIOS 0601 07/17/2012
[156100.313099] Workqueue: xfsalloc xfs_bmapi_allocate_worker
[156100.313103] ffffffff81565b8a 0000000000000071 ffffffff81201c57 ffff88041a811d00
[156100.313107] ffff88041a811dd0 0000000000000001 000000007f95bfd8 0000000000000002
[156100.313111] ffff88017f95bd18 ffff88041a811d00 0000000000000001 0000000000000001
[156100.313115] Call Trace:
[156100.313123] [] ? dump_stack+0xd/0x17
[156100.313129] [] ? xfs_alloc_fixup_trees+0x1e7/0x370
[156100.313133] [] ? xfs_alloc_ag_vextent_near+0x96a/0xd90
[156100.313138] [] ? xfs_alloc_ag_vextent+0xbd/0xf0
[156100.313141] [] ? xfs_alloc_vextent+0x478/0x800
[156100.313146] [] ? xfs_bmap_btalloc_nullfb+0x316/0x350
[156100.313150] [] ? xfs_bmap_btalloc+0x31a/0x770
[156100.313156] [] ? internal_add_timer+0x18/0x50
[156100.313161] [] ? internal_add_timer+0x18/0x50
[156100.313165] [] ? __xfs_bmapi_allocate+0xcd/0x2e0
[156100.313170] [] ? xfs_bmapi_allocate_worker+0x3c/0x70
[156100.313176] [] ? process_one_work+0x150/0x480
[156100.313186] [] ? worker_thread+0x114/0x370
[156100.313208] [] ? manage_workers.isra.26+0x2b0/0x2b0
[156100.313214] [] ? kthread+0xb3/0xc0
[156100.313228] [] ? async_run_entry_fn+0xf0/0x120
[156100.313239] [] ? kthread_freezable_should_stop+0x60/0x60
[156100.313249] [] ? ret_from_fork+0x7c/0xb0
[156100.313258] [] ? kthread_freezable_should_stop+0x60/0x60
[156100.313275] XFS (md1): page discard on page ffffea0008f25340, inode 0x22057716, offset 8499200.
[156155.266439] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1617 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff81205f1c
[156155.266443] CPU: 4 PID: 32209 Comm: QThread Tainted: P O 3.10.7-gentoo #1
[156155.266444] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 R2.0, BIOS 0601 07/17/2012
[156155.266446] ffffffff81565b8a 0000000000000070 ffffffff81202e8c ffff88041b3c3980
[156155.266448] ffff88041821ee40 0000000000000000 0000000000000003 ffff88041b3c3980
[156155.266449] ffff880417bf6800 0000000000000000 ffff8801eac77c5c 0000000800000000
[156155.266451] Call Trace:
[156155.266456] [] ? dump_stack+0xd/0x17
[156155.266460] [] ? xfs_free_ag_extent+0x53c/0x850
[156155.266461] [] ? xfs_free_extent+0xec/0x130
[156155.266463] [] ? kmem_zone_alloc+0x5e/0xe0
[156155.266465] [] ? xfs_bmap_finish+0x16a/0x1b0
[156155.266467] [] ? xfs_itruncate_extents+0x103/0x320
[156155.266469] [] ? xfs_inactive+0x32e/0x450
[156155.266470] [] ? xfs_fs_evict_inode+0x4b/0x130
[156155.266473] [] ? evict+0xa7/0x1b0
[156155.266476] [] ? do_unlinkat+0x19c/0x1f0
[156155.266477] [] ? SyS_newstat+0x23/0x30
[156155.266480] [] ? system_call_fastpath+0x16/0x1b
[156155.266483] XFS (md1): xfs_do_force_shutdown(0x8) called from line 916 of file fs/xfs/xfs_bmap.c. Return address = 0xffffffff812193d3
[156155.445552] XFS (md1): Corruption of in-memory data detected. Shutting down filesystem
[156155.445557] XFS (md1): Please umount the filesystem and rectify the problem(s)
[156160.004902] XFS (md1): xfs_log_force: error 5 returned.
[156190.132832] XFS (md1): xfs_log_force: error 5 returned.
[156220.260719] XFS (md1): xfs_log_force: error 5 returned.
[156250.388550] XFS (md1): xfs_log_force: error 5 returned.
[156280.516400] XFS (md1): xfs_log_force: error 5 returned.
[156310.644246] XFS (md1): xfs_log_force: error 5 returned.
[156340.772019] XFS (md1): xfs_log_force: error 5 returned.
[156370.899941] XFS (md1): xfs_log_force: error 5 returned.
[156401.027736] XFS (md1): xfs_log_force: error 5 returned.
[156431.155576] XFS (md1): xfs_log_force: error 5 returned.
[156461.283434] XFS (md1): xfs_log_force: error 5 returned.
[156491.411366] XFS (md1): xfs_log_force: error 5 returned.
[156521.539215] XFS (md1): xfs_log_force: error 5 returned.
[156551.666963] XFS (md1): xfs_log_force: error 5 returned.
[156581.795447] XFS (md1): xfs_log_force: error 5 returned.
[156611.922687] XFS (md1): xfs_log_force: error 5 returned.
[156642.050630] XFS (md1): xfs_log_force: error 5 returned.
[156672.178470] XFS (md1): xfs_log_force: error 5 returned.
[156702.306332] XFS (md1): xfs_log_force: error 5 returned.
[156732.434176] XFS (md1): xfs_log_force: error 5 returned.
[156762.561988] XFS (md1): xfs_log_force: error 5 returned.


The kernel version is 3.10.7; I saw the same error on 3.8.13. Note that md1 is not the only RAID device holding an XFS filesystem: / is also on a RAID1 array (SSD + HDD).
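
To put a number on "immediately or within a few minutes", the delay between a resume and the first XFS internal error can be read straight from the dmesg timestamps. The following is only a rough sketch: the function name `resume_to_first_xfs_error` and the `sample` string are made up for illustration, and it assumes the two message prefixes look exactly as they do in the log above.

```python
import re

# Matches the "[seconds.micros] message" prefix that dmesg prints.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def resume_to_first_xfs_error(dmesg_text):
    """Return (resume_ts, error_ts, delay_seconds) for the first
    "XFS: Internal error" line that follows a resume marker,
    or None if no such pair is found."""
    resume_ts = None
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank lines and lines without a timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if msg.startswith("PM: restore of devices complete"):
            resume_ts = ts  # remember the most recent resume
        elif msg.startswith("XFS: Internal error") and resume_ts is not None:
            return resume_ts, ts, ts - resume_ts
    return None


# Three lines copied from the log above (hypothetical variable name).
sample = """\
[155389.814032] PM: restore of devices complete after 1700.425 msecs
[155389.814783] Restarting tasks ... done.
[155690.215471] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 342 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff812049d1
"""

if __name__ == "__main__":
    result = resume_to_first_xfs_error(sample)
    if result is not None:
        print(f"first XFS error {result[2]:.1f} s after resume")
```

For the excerpt posted here the gap works out to roughly 300 seconds, i.e. about five minutes after the resume completed. Feeding it the full `dmesg` output instead of `sample` would give the same figure for other hibernate cycles.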
