Can I safely ignore I/O errors on dm devices?

The root user of a system using LVM may occasionally receive a message similar to the following in the daily logwatch email:

--------------------- Kernel Begin ------------------------

WARNING:  Kernel Errors Present
    Buffer I/O error on device dm-7,  ...:  11 Time(s)
    EXT3-fs error (device dm-7): e ...:  90 Time(s)
    lost page write due to I/O error on dm-7 ...:  11 Time(s)

Likewise, you may notice similar error messages in the /var/log/messages file:

May 16 04:04:52 eclipse kernel: lost page write due to I/O error on dm-20
May 16 04:04:52 eclipse kernel: Buffer I/O error on device dm-20, logical block 0
May 16 04:04:52 eclipse kernel: lost page write due to I/O error on dm-20
May 16 04:04:52 eclipse kernel: Buffer I/O error on device dm-20, logical block 0
May 16 04:04:52 eclipse kernel: lost page write due to I/O error on dm-20
May 16 04:04:52 eclipse kernel: Buffer I/O error on device dm-20, logical block 0
May 16 04:04:52 eclipse kernel: lost page write due to I/O error on dm-20

If the device mapper (dm-n) device(s) mentioned in the messages refer to a snapshot logical volume (LV), these messages can be ignored. Snapshot LVs are temporary by design: they are created and removed as needed, and they expire when the changes written to them exceed their predefined capacity. Once a snapshot has expired, I/O to it fails, which produces the messages above.

To determine if the dm device points to a snapshot LV:

First, locate the “dm” device number in the logs (in our example, 20):

[root@eclipse ~]# grep "I/O error" /var/log/messages
May 16 04:04:52 eclipse kernel: Buffer I/O error on device dm-20, logical block 1545
May 16 04:04:52 eclipse kernel: lost page write due to I/O error on dm-20
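
When several devices appear in the log, it can help to reduce the output to a list of distinct dm names. The following is a minimal sketch, not part of the original procedure; the function name `affected_dm_devices` is hypothetical, and it assumes the kernel message format shown above:

```shell
# Hypothetical helper: list the distinct dm-N devices named in I/O error lines.
# Reads log text on stdin and assumes messages of the form shown above.
affected_dm_devices() {
    grep 'I/O error' | grep -o 'dm-[0-9][0-9]*' | sort -u
}

# Usage:
#   affected_dm_devices < /var/log/messages
```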

Next, list the devices under /dev/mapper/ and find the one whose minor device number matches the “dm” device number (in our example, 20):

[root@eclipse ~]# ls -l /dev/mapper/ | grep 20
brw-rw---- 1 root disk 253, 20 May 16 14:00 datavg-lvol1
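
Note that a plain `grep 20` also matches any line that merely contains "20", such as a timestamp or minor number 120. As a hedged sketch, the helper below matches the minor-number column exactly. The function name `dm_name_for_minor` is hypothetical, and the code assumes the `ls -l` column layout shown above, where /dev/mapper entries are block device nodes; on systems where they are symlinks, this approach would need adjusting.

```shell
# Hypothetical helper: print the /dev/mapper name whose minor number is $1.
# Reads `ls -l /dev/mapper/` output on stdin. Assumes the layout shown above:
# field 5 is "major," and field 6 is the minor number of a block device node.
dm_name_for_minor() {
    awk -v minor="$1" '$1 ~ /^b/ && $6 == minor { print $NF }'
}

# Usage:
#   ls -l /dev/mapper/ | dm_name_for_minor 20
```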

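Finally, list the LVs in the volume group you noted (the part of the device name before the hyphen; in our example, datavg) to determine whether the LV is a snapshot. A snapshot has an “s” as the first character of its Attr column and has values in the Origin and Snap% columns: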

[root@eclipse ~]# lvs datavg
  LV       VG     Attr   LSize   Origin   Snap%  Move Log Copy%  Convert
  eclipse  datavg owi-ao 395.00G                                        
  ereports datavg owi-ao   1.00G                                        
  lvol0    datavg swi-ao   1.00G u2         0.41                        
  lvol1    datavg swi-ao  34.82G eclipse    0.74                        
  lvol2    datavg swi-ao   1.00G ereports   0.00                        
  lvol3    datavg swi-ao   3.61G pdw        0.01                        
  pdw      datavg owi-ao  45.00G                                        
  u2       datavg owi-ao   4.00G                                        
  uvtmp    datavg -wi-ao   4.00G

In our example, dm-20 maps to lvol1 in datavg. Its Attr value begins with “s” and its Origin LV is /dev/datavg/eclipse, so the dm-20 device referenced in the error messages is indeed a snapshot LV.
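
The Attr check can also be scripted. The following is a minimal sketch, assuming the Attr format shown in the lvs output above, where the first character gives the volume type. The function name `is_snapshot_attr` is hypothetical; the uppercase “S” case is included because lvs uses it for snapshots in a special state.

```shell
# Hypothetical helper: succeed if an lvs Attr string describes a snapshot.
# Assumes the first character of Attr is the volume type, as in the output above.
is_snapshot_attr() {
    case "$1" in
        s*|S*) return 0 ;;   # "s" = snapshot; "S" = snapshot in a special state
        *)     return 1 ;;   # origin ("o"), plain LV ("-"), and so on
    esac
}

# Usage (lv_name and lv_attr are standard lvs report fields):
#   lvs --noheadings -o lv_name,lv_attr datavg | while read -r name attr; do
#       is_snapshot_attr "$attr" && echo "$name is a snapshot"
#   done
```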

If the dm-n device(s) mentioned in the messages do not refer to a snapshot logical volume (LV), you may have a filesystem, software or hardware issue, and you should contact your Red Hat support provider.
