Valid HTML 4.01 Transitional

Linux (3.11.x) Filesystem Horkage

James F. Carter <jimc@math.ucla.edu>, 2014-04-07

UCLA Mathematics Department has about 100 machines running OpenSuSE 13.1 Linux with kernel 3.11.6. After two servers were upgraded to this distro version (Cedar and Walnut), they repeatedly had alleged filesystem corruption leading to the filesystem being mounted readonly. (It is not practical to clear this on a live system, so we rebooted each time; the filesystem was always repairable automatically, so far.) We badly need to solve this issue. So what are the symptoms, as close as can be described?

So would could be the cause, and the cure?

Current Plan for Interventions

Here is what we're doing on Cedar:

Appendix: Syslog Output

Apr  7 13:32:16 cedar kernel: [846079.180677] JBD2 unexpected failure: jbd2_jour
nal_revoke: !buffer_revoked(bh);
Apr  7 13:32:16 cedar kernel: [846079.180682] inconsistent data on disk
Apr  7 13:32:16 cedar kernel: [846079.180685] EXT4-fs: ext4_alloc_branch:395: ab
orting transaction: IO failure in __ext4_forget
Apr  7 13:32:16 cedar kernel: [846079.195435] EXT4-fs error (device sdb2): ext4_
alloc_branch:395: error -5 when attempting revoke
Apr  7 13:32:16 cedar kernel: [846079.195442] EXT4-fs (sdb2): Remounting filesystem read-only
Apr  7 13:32:16 cedar kernel: [846079.195445] Aborting journal on device sdb2-8.
Apr  7 13:32:16 cedar kernel: [846079.219374] EXT4-fs error (device sdb2) in ext4_free_blocks:4818: Journal has aborted
Apr  7 13:32:16 cedar kernel: [846079.225946] EXT4-fs error (device sdb2) in ext4_orphan_add:2616: Journal has aborted
Apr  7 13:32:16 cedar kernel: [846079.231938] EXT4-fs error (device sdb2) in ext4_write_begin:999: IO failure
(wait 23 secs)
Apr  7 13:32:39 cedar kernel: [846102.151759] ------------[ cut here ]------------
Apr  7 13:32:39 cedar kernel: [846102.151768] WARNING: CPU: 9 PID: 340 at /home/abuild/rpmbuild/BUILD/kernel-default-3.11.6/linux-3.11/fs/buffer.c:1194 __brelse+0x2f/0x40()
Apr  7 13:32:39 cedar kernel: [846102.151769] VFS: brelse: Trying to free free buffer
Apr  7 13:32:39 cedar kernel: [846102.151770] Modules linked in: md5 sha1_generic cbc cts nfsv3 nfs fscache rpcsec_gss_krb5 nfsd lockd nfs_acl auth_rpcgss sunrpc oid_registry quota_v2 quota_tree coretemp binfmt_misc kvm iTCO_wdt joydev gpio_ich iTCO_vendor_support dcdbas crc32c_intel pcspkr serio_raw sr_mod cdrom lpc_ich ata_generic ses enclosure i7core_edac edac_core bnx2 shpchp button mperf sg ipmi_devintf ipmi_si ipmi_msghandler dm_mod autofs4 hid_generic usbhid hid mgag200 ttm drm_kms_helper uhci_hcd drm ehci_pci i2c_algo_bit ehci_hcd sysimgblt sysfillrect i2c_core syscopyarea usbcore usb_common scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh fan ata_piix libata megaraid_sas thermal processor thermal_sys hwmon
Apr  7 13:32:39 cedar kernel: [846102.151812] CPU: 9 PID: 340 Comm: systemd-journal Not tainted 3.11.6-4-default #1
Apr  7 13:32:39 cedar kernel: [846102.151813] Hardware name: Dell Inc. PowerEdge R710/0M233H, BIOS 1.1.4 05/08/2009
Apr  7 13:32:39 cedar kernel: [846102.151815]  00000009 c06abefe f3579978 c02422
7e c0832590 f3579990 00000154 c0832508
Apr  7 13:32:39 cedar kernel: [846102.151820]  000004aa c036ad8f c036ad8f 00000008 00000008 c2d4de48 e32c3e48 c02422d3
Apr  7 13:32:39 cedar kernel: [846102.151824]  00000009 f3579978 c0832590 f3579990 c036ad8f c0832508 000004aa c0832590
Apr  7 13:32:39 cedar kernel: [846102.151828] Call Trace:
Apr  7 13:32:39 cedar kernel: [846102.151836]  [] try_stack_unwind+0x179/0x190
Apr  7 13:32:39 cedar kernel: [846102.151841]  [] dump_trace+0x47/0xf0
Apr  7 13:32:39 cedar kernel: [846102.151845]  [] show_trace_log_lvl+0x3f/0x50
Apr  7 13:32:39 cedar kernel: [846102.151847]  [] show_stack_log_lvl+0x50/0xd0
Apr  7 13:32:39 cedar kernel: [846102.151850]  [] show_stack+0x1f/0x40
Apr  7 13:32:39 cedar kernel: [846102.151855]  [] dump_stack+0x3e/0x4e
Apr  7 13:32:39 cedar kernel: [846102.151859]  [] warn_slowpath_common+0x7e/0xa0
Apr  7 13:32:39 cedar kernel: [846102.151862]  [] warn_slowpath_fmt+0x33/0x40
Apr  7 13:32:39 cedar kernel: [846102.151865]  [] __brelse+0x2f/0x40
Apr  7 13:32:39 cedar kernel: [846102.151869]  [] __find_get_block+0x1bf/0x1f0
Apr  7 13:32:39 cedar kernel: [846102.151872]  [] __getblk+0x2c/0x310
Apr  7 13:32:39 cedar kernel: [846102.151877]  [] ext4_read_block_bitmap_nowait+0x46/0x300
Apr  7 13:32:39 cedar kernel: [846102.151881]  [] ext4_read_block_bitmap+0xc/0x40
Apr  7 13:32:39 cedar kernel: [846102.151885]  [] ext4_mb_mark_diskspace_used+0x3d/0x510
Apr  7 13:32:39 cedar kernel: [846102.151888]  [] ext4_mb_new_blocks+0x296/0x580
Apr  7 13:32:39 cedar kernel: [846102.151894]  [] ext4_alloc_branch+0x2ca/0x310
Apr  7 13:32:39 cedar kernel: [846102.151898]  [] ext4_ind_map_blocks+0x1ab/0x740
Apr  7 13:32:39 cedar kernel: [846102.151900]  [] ext4_map_blocks+0x14c/0x570
Apr  7 13:32:39 cedar kernel: [846102.151903]  [] _ext4_get_block+0x62/0x160
Apr  7 13:32:39 cedar kernel: [846102.151906]  [] __block_write_begin+0x140/0x380
Apr  7 13:32:39 cedar kernel: [846102.151909]  [] ext4_write_begin+0x177/0x410
Apr  7 13:32:39 cedar kernel: [846102.151913]  [] generic_file_buffered_write+0x105/0x260
Apr  7 13:32:39 cedar kernel: [846102.151916]  [] __generic_file_aio_write+0x26c/0x4b0
Apr  7 13:32:39 cedar kernel: [846102.151919]  [] generic_file_aio_write+0x59/0xc0
Apr  7 13:32:39 cedar kernel: [846102.151922]  [] ext4_file_write+0xd4/0x4e0
Apr  7 13:32:39 cedar kernel: [846102.151926]  [] do_sync_write+0x6b/0xa0
Apr  7 13:32:39 cedar kernel: [846102.151930]  [] vfs_write+0x95/0x1b0
Apr  7 13:32:39 cedar kernel: [846102.151933]  [] SyS_pwrite64+0x6a/0xa0
Apr  7 13:32:39 cedar kernel: [846102.151937]  [] sysenter_do_call+0x12/0x28
Apr  7 13:32:39 cedar kernel: [846102.151942]  [] 0xb77a142f
Apr  7 13:32:39 cedar kernel: [846102.151943] ---[ end trace 2141fd7e68c3f778 ]---
(wait 46 secs)
Apr  7 13:33:15 cedar kernel: [846137.980264] ------------[ cut here ]------------
Apr  7 13:33:15 cedar kernel: [846137.980273] WARNING: CPU: 15 PID: 2054 at /home/abuild/rpmbuild/BUILD/kernel-default-3.11.6/linux-3.11/fs/buffer.c:1194 __brelse+0x2f/0x40()
Apr  7 13:33:15 cedar kernel: [846137.980274] VFS: brelse: Trying to free free buffer
Apr  7 13:33:15 cedar kernel: [846137.980275] Modules linked in: md5 sha1_generic cbc cts nfsv3 nfs fscache rpcsec_gss_krb5 nfsd lockd nfs_acl auth_rpcgss sunrpc oid_registry quota_v2 quota_tree coretemp binfmt_misc kvm iTCO_wdt joydev gpio_ich iTCO_vendor_support dcdbas crc32c_intel pcspkr serio_raw sr_mod cdrom lpc_ich ata_generic ses enclosure i7core_edac edac_core bnx2 shpchp button mperf sg ipmi_devintf ipmi_si ipmi_msghandler dm_mod autofs4 hid_generic usbhid hid mgag200 ttm drm_kms_helper uhci_hcd drm ehci_pci i2c_algo_bit ehci_hcd sysimgblt sysfillrect i2c_core syscopyarea usbcore usb_common scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh fan ata_piix libata megaraid_sas thermal processor thermal_sys hwmon
Apr  7 13:33:15 cedar kernel: [846137.980317] CPU: 15 PID: 2054 Comm: qmgr Tainted: G        W    3.11.6-4-default #1
Apr  7 13:33:15 cedar kernel: [846137.980318] Hardware name: Dell Inc. PowerEdge R710/0M233H, BIOS 1.1.4 05/08/2009
Apr  7 13:33:15 cedar kernel: [846137.980320]  00000009 c06abefe d6de1bc8 c024227e c0832590 d6de1be0 00000806 c0832508
Apr  7 13:33:15 cedar kernel: [846137.980324]  000004aa c036ad8f c036ad8f 00000008 00000008 ec9f5cc0 e32c3e48 c02422d3
Apr  7 13:33:15 cedar kernel: [846137.980328]  00000009 d6de1bc8 c0832590 d6de1be0 c036ad8f c0832508 000004aa c0832590
Apr  7 13:33:15 cedar kernel: [846137.980333] Call Trace:
Apr  7 13:33:15 cedar kernel: [846137.980341]  [] try_stack_unwind+0x179/0x190
Apr  7 13:33:15 cedar kernel: [846137.980346]  [] dump_trace+0x47/0xf0
Apr  7 13:33:15 cedar kernel: [846137.980349]  [] show_trace_log_lvl+0x3f/0x50
Apr  7 13:33:15 cedar kernel: [846137.980352]  [] show_stack_log_lvl+0x50/0xd0
Apr  7 13:33:15 cedar kernel: [846137.980355]  [] show_stack+0x1f/0x40
Apr  7 13:33:15 cedar kernel: [846137.980360]  [] dump_stack+0x3e/0x4e
Apr  7 13:33:15 cedar kernel: [846137.980365]  [] warn_slowpath_common+0x7e/0xa0
Apr  7 13:33:15 cedar kernel: [846137.980368]  [] warn_slowpath_fmt+0x33/0x40
Apr  7 13:33:15 cedar kernel: [846137.980371]  [] __brelse+0x2f/0x40
Apr  7 13:33:15 cedar kernel: [846137.980374]  [] __find_get_block+0x1bf/0x1f0
Apr  7 13:33:15 cedar kernel: [846137.980378]  [] __getblk+0x2c/0x310
Apr  7 13:33:15 cedar kernel: [846137.980382]  [] __ext4_get_inode_loc
+0xf7/0x490
Apr  7 13:33:15 cedar kernel: [846137.980386]  [] ext4_reserve_inode_write+0x16/0x70
Apr  7 13:33:15 cedar kernel: [846137.980389]  [] ext4_mark_inode_dirty+0x34/0x1b0
Apr  7 13:33:15 cedar kernel: [846137.980392]  [] add_dirent_to_buf+0x106/0x190
Apr  7 13:33:15 cedar kernel: [846137.980395]  [] ext4_add_entry+0x685/0xb90
Apr  7 13:33:15 cedar kernel: [846137.980399]  [] ext4_rename+0x22b/0x950
Apr  7 13:33:15 cedar kernel: [846137.980404]  [] vfs_rename+0x113/0x4d0
Apr  7 13:33:15 cedar kernel: [846137.980408]  [] SYSC_renameat+0x2ac/0x2c0
Apr  7 13:33:15 cedar kernel: [846137.980412]  [] SyS_renameat+0x1b/0x20
Apr  7 13:33:15 cedar kernel: [846137.980415]  [] SyS_rename+0x27/0x30
Apr  7 13:33:15 cedar kernel: [846137.980418]  [] sysenter_do_call+0x12/0x28
Apr  7 13:33:15 cedar kernel: [846137.980423]  [] 0xb774a42f
Apr  7 13:33:15 cedar kernel: [846137.980425] ---[ end trace 2141fd7e68c3f779 ]-
(wait 1 hour almost exactly) (irrelevancies snipped)
Apr  7 14:33:55 cedar systemd[1]: Stopping Session 17899 of user root.
Apr  7 14:34:13 cedar systemd[482]: Reached target Shutdown.
Apr  7 14:34:14 cedar rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="993" x-info="http://www.rsyslog.com"] exiting on signal 15.