Error while rebuilding a Linux RAID-1

Question


2013-09-04 at 06:08

I have a Linux box that acts as a home NAS, with two 1 TB hard drives in Linux RAID-1. Recently one of the two drives failed, so I bought a new one (1TB WD Blue) and installed it. The rebuild starts and then stops at 7.8%, with an error saying that /dev/sdd (the good drive) has a bad block and the process cannot continue. I tried removing and re-adding the new drive, but the process always stops at the same point. The good news is that I can still access my data, which is mounted at /storage (XFS). More information about the problem follows:

The good (source) drive:

    sudo fdisk -l /dev/sdd

    WARNING: GPT (GUID Partition Table) detected on '/dev/sdd'! The util fdisk doesn't support GPT. Use GNU Parted.

    Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
    255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0x00000000

    Device     Boot  Start         End      Blocks  Id  System
    /dev/sdd1           63  1953525167  976762552+  da  Non-FS data

The new (target) drive:

    sudo fdisk -l /dev/sdc

    Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
    81 heads, 63 sectors/track, 382818 cylinders, total 1953525168 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disk identifier: 0x5c5d0188

    Device     Boot  Start         End     Blocks  Id  System
    /dev/sdc1         2048  1953525167  976761560  da  Non-FS data

The RAID-1 array:

    cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md3 : active raid1 sdc1[3] sdd1[2]
          976761382 blocks super 1.2 [2/1] [U_]
          [=>...................] recovery = 7.7% (75738048/976761382) finish=601104.0min speed=24K/sec

dmesg (this message repeats many times):

    [35085.217154] ata10.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x0
    [35085.217160] ata10.00: irq_stat 0x40000008
    [35085.217163] ata10.00: failed command: READ FPDMA QUEUED
    [35085.217170] ata10.00: cmd 60/08:08:37:52:43/00:00:6d:00:00/40 tag 1 ncq 4096 in
    [35085.217170]          res 41/40:00:3c:52:43/00:00:6d:00:00/40 Emask 0x409 (media error) <F>
    [35085.217173] ata10.00: status: { DRDY ERR }
    [35085.217175] ata10.00: error: { UNC }
    [35085.221619] ata10.00: configured for UDMA/133
    [35085.221636] sd 9:0:0:0: [sdd] Unhandled sense code
    [35085.221639] sd 9:0:0:0: [sdd]
    [35085.221641] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [35085.221643] sd 9:0:0:0: [sdd]
    [35085.221645] Sense Key : Medium Error [current] [descriptor]
    [35085.221649] Descriptor sense data with sense descriptors (in hex):
    [35085.221651]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
    [35085.221661]         6d 43 52 3c
    [35085.221666] sd 9:0:0:0: [sdd]
    [35085.221669] Add. Sense: Unrecovered read error - auto reallocate failed
    [35085.221671] sd 9:0:0:0: [sdd] CDB:
    [35085.221673] Read(10): 28 00 6d 43 52 37 00 00 08 00
    [35085.221682] end_request: I/O error, dev sdd, sector 1833128508
    [35085.221706] ata10: EH complete

mdadm detail:

    sudo mdadm --detail /dev/md3
    /dev/md3:
            Version : 1.2
      Creation Time : Fri Apr 13 19:10:18 2012
         Raid Level : raid1
         Array Size : 976761382 (931.51 GiB 1000.20 GB)
      Used Dev Size : 976761382 (931.51 GiB 1000.20 GB)
       Raid Devices : 2
      Total Devices : 2
        Persistence : Superblock is persistent

        Update Time : Wed Sep 4 08:57:46 2013
              State : active, degraded, recovering
     Active Devices : 1
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 1

     Rebuild Status : 7% complete

               Name : hypervisor:3 (local to host hypervisor)
               UUID : b758f8f1:a6a6862e:83133e3a:3b9830ea
             Events : 1257158

        Number   Major   Minor   RaidDevice   State
           2       8      49        0         active sync /dev/sdd1
           3       8      33        1         spare rebuilding /dev/sdc1

One thing I noticed is that the partition on the source drive (/dev/sdd) starts at sector 63, whereas the partition on the new drive (/dev/sdc) starts at sector 2048. Is this related to the problem? Is there a way to tell mdadm to ignore this bad block and continue rebuilding the array? As a last resort, I was thinking of cloning the source drive (/dev/sdd) to the new drive (/dev/sdc) with ddrescue (from a live CD) and then using the clone as the source drive. Would that work?

Thanks for any help. Yannis

I repartitioned both /dev/sdd and /dev/sdc, so now they look like this:

    sudo fdisk -l -u /dev/sdc

    Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
    81 heads, 63 sectors/track, 382818 cylinders, total 1953525168 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disk identifier: 0x0002c2de

    Device     Boot  Start         End     Blocks  Id  System
    /dev/sdc1         2048  1953525167  976761560  da  Non-FS data

    sudo fdisk -l -u /dev/sdd

    Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
    23 heads, 12 sectors/track, 7077989 cylinders, total 1953525168 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0x00069b7e

    Device     Boot  Start         End     Blocks  Id  System
    /dev/sdd1         2048  1953525167  976761560  da  Non-FS data

Is this OK?

OK, I rebuilt the array again and then restored all the data from backup. Everything looks fine, except that after a reboot /dev/md3 is renamed to /dev/md127.

    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md127 : active raid1 sdd1[0] sdc1[2]
          976630336 blocks super 1.2 [2/2] [UU]
    md1 : active raid0 sdb5[0] sda5[1]
          7809024 blocks super 1.2 512k chunks
    md2 : active raid0 sdb6[0] sda6[1]
          273512448 blocks super 1.2 512k chunks
    md0 : active raid1 sdb1[0] sda1[2]
          15623096 blocks super 1.2 [2/2] [UU]

    cat /etc/mdadm/mdadm.conf
    ARRAY /dev/md/0 metadata=1.2 UUID=5c541476:4ee0d591:615c1e5a:d58bc3f7 name=hypervisor:0
    ARRAY /dev/md/1 metadata=1.2 UUID=446ba1de:407f8ef4:5bf728ff:84e223db name=hypervisor:1
    ARRAY /dev/md/2 metadata=1.2 UUID=b91cba71:3377feb4:8a57c958:11cc3df0 name=hypervisor:2
    ARRAY /dev/md/3 metadata=1.2 UUID=5c573747:61c40d46:f5981a8b:e818a297 name=hypervisor:3

    sudo mdadm --examine --scan --verbose
    ARRAY /dev/md/0 level=raid1 metadata=1.2 num-devices=2 UUID=5c541476:4ee0d591:615c1e5a:d58bc3f7 name=hypervisor:0
       devices=/dev/sdb1,/dev/sda1
    ARRAY /dev/md/1 level=raid0 metadata=1.2 num-devices=2 UUID=446ba1de:407f8ef4:5bf728ff:84e223db name=hypervisor:1
       devices=/dev/sdb5,/dev/sda5
    ARRAY /dev/md/2 level=raid0 metadata=1.2 num-devices=2 UUID=b91cba71:3377feb4:8a57c958:11cc3df0 name=hypervisor:2
       devices=/dev/sdb6,/dev/sda6
    ARRAY /dev/md/3 level=raid1 metadata=1.2 num-devices=2 UUID=5c573747:61c40d46:f5981a8b:e818a297 name=hypervisor:3
       devices=/dev/sdd1,/dev/sdc1

    cat /etc/fstab
    # /etc/fstab: static file system information.
    #
    # Use 'blkid' to print the universally unique identifier for a
    # device; this may be used with UUID= as a more robust way to name devices
    # that works even if disks are added and removed. See fstab(5).
    #
    # <file system> <mount point> <type> <options> <dump> <pass>
    proc /proc proc nodev,noexec,nosuid 0 0
    # / was on /dev/md0 during installation
    UUID=2e4543d3-22aa-45e1-8adb-f95cfe57a697 / ext4 noatime,errors=remount-ro,discard 0 1
    #was /dev/md3 before
    UUID=13689e0b-052f-48f7-bf1f-ad857364c0d6 /storage ext4 defaults 0 2
    # /vm was on /dev/md2 during installation
    UUID=9fb85fbf-31f9-43ff-9a43-3ebef9d37ee8 /vm ext4 noatime,errors=remount-ro,discard 0 2
    # swap was on /dev/md1 during installation
    UUID=1815549c-9047-464e-96a0-fe836fa80cfd none swap sw

Any suggestions on this?


Your "good" drive is actually bad. Michael Hampton, 11 years ago

3 answers


Community, 2013-09-04 at 06:36

You may try the procedure I have described here: Remake SW RAID1 from a new HDD and an old HDD with bad blocks. It uses hdparm to read and write the bad sectors so that the disk remaps them, if possible.
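As a rough illustration of the hdparm options this kind of procedure relies on (not the linked procedure itself), here is a minimal Python sketch that only builds the command lines and prints them. The helper name `hdparm_sector_commands` is hypothetical. The device and sector are taken from the dmesg output in the question. Note that `--write-sector` overwrites the sector with zeros, so whatever was stored there is lost.

```python
import shlex


def hdparm_sector_commands(device: str, sector: int) -> dict:
    """Build (but do not run) hdparm commands for one suspect sector.

    Hypothetical helper for illustration only. The "read" command tests
    whether the sector is readable; the "write" command zero-fills it,
    which gives the drive a chance to remap it but destroys its contents.
    """
    if sector < 0:
        raise ValueError("sector must be non-negative")
    read_cmd = ["hdparm", "--read-sector", str(sector), device]
    write_cmd = [
        "hdparm",
        "--yes-i-know-what-i-am-doing",  # hdparm refuses to write without it
        "--write-sector",
        str(sector),
        device,
    ]
    return {"read": read_cmd, "write": write_cmd}


# Sector reported in the question's dmesg output for /dev/sdd.
for name, cmd in hdparm_sector_commands("/dev/sdd", 1833128508).items():
    print(f"{name}: {shlex.join(cmd)}")
```

Printing the commands instead of running them is deliberate: the write step is destructive and should only be typed by hand after the data has been backed up.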


GioMac, 2013-09-07 at 07:06

The sdd drive has definitely failed and is out of internal reallocation space.

Anyway, you can try to update firmware if available.

BTW, these are GPT disks, so use parted or gdisk for listing and manipulating partitions. fdisk doesn't support GPT and is generally a very buggy application.

Accepted Answer · 2013-09-04 06:15:47

"The good news are that I can still have access to my data which are mounted at /storage"

No, you can't; you have a problem reading the data at those dodgy blocks on /dev/sdd. You just don't know that in ordinary operation, either because you don't happen to read those blocks, or your application is tolerant of read errors.
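To see which sectors are affected, the numbers can be pulled out of the kernel log. The following is a minimal Python sketch, assuming log lines in the same format as the dmesg excerpt in the question. The function names are made up for illustration. It also decodes the trailing bytes `6d 43 52 3c` of the sense data dump, which appear to carry the same LBA as the `end_request` line.

```python
import re

# Matches lines such as:
#   end_request: I/O error, dev sdd, sector 1833128508
SECTOR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def failed_sectors(log_text: str) -> dict:
    """Return {device: sorted list of distinct failing sectors}."""
    found = {}
    for dev, sector in SECTOR_RE.findall(log_text):
        found.setdefault(dev, set()).add(int(sector))
    return {dev: sorted(sectors) for dev, sectors in found.items()}


def lba_from_sense_bytes(hex_bytes: str) -> int:
    """Interpret space-separated hex bytes as one big-endian integer."""
    return int("".join(hex_bytes.split()), 16)


sample = "[35085.221682] end_request: I/O error, dev sdd, sector 1833128508"
print(failed_sectors(sample))               # {'sdd': [1833128508]}
print(lba_from_sense_bytes("6d 43 52 3c"))  # 1833128508
```

If the same message repeats many times, as in the question, the set of distinct sectors shows whether the drive is stuck on one bad spot or failing in several places.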

I find messages like those that /dev/sdd is logging to be extremely worrying. If it were my device, I'd back the data up as fast as possible, preferably twice, replace the other drive as well, and restore from such a backup as I'd been able to get.

In addition, as you point out, you're trying to mirror a 976762552 block partition with a 976761560 block one, and that won't work; the new partition needs to be at least as big as the old one. I'm slightly surprised that mdadm allowed reconstruction to proceed, but you don't say what distro you're running, so it's hard to know how old the version is; perhaps it's old enough not to check that sort of thing.
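The size difference can be checked from the start and end sectors in the two fdisk listings in the question. This is a small Python sketch of that arithmetic, assuming 512-byte logical sectors as both listings report. The function and variable names are only illustrative.

```python
SECTOR_BYTES = 512  # logical sector size reported by fdisk for both drives


def partition_sectors(start: int, end: int) -> int:
    """Number of sectors in a partition whose end sector is inclusive."""
    return end - start + 1


# Start/end values copied from the fdisk output in the question.
old_sectors = partition_sectors(63, 1953525167)    # /dev/sdd1
new_sectors = partition_sectors(2048, 1953525167)  # /dev/sdc1

shortfall = old_sectors - new_sectors
new_is_big_enough = new_sectors >= old_sectors

print(old_sectors, new_sectors)        # 1953525105 1953523120
print(shortfall)                       # 1985
print(shortfall * SECTOR_BYTES / 1024) # 992.5
print(new_is_big_enough)               # False
```

The shortfall of 1985 sectors is exactly the difference between the two start sectors (2048 - 63), so the later start on the new drive is what makes its partition smaller.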

Edit: Yes, you should enlarge the partition as you describe. I'm not an Ubuntu fan, so I can't comment on that version. If you get this resync done, I'd replace the other drive immediately. If you have a decent backup, I'd stop wasting time with the resync, replace it now, recreate the array, and restore from backups.
