How to find out how much (and which) data I lost after a disk failure

Paco el Cuqui

One of the disks in my cluster (with RAID 5) died last week, and I had to replace it. After the rebuild and a consistency check, some data belonging to various users was lost. I need to know exactly how much and which data we lost. Is there a log file or a program / package for this (there are no backups)?

Thanks

1
Compare against your backups? DavidPostill 8 years ago 2
That's the problem, there were no backups. Paco el Cuqui 8 years ago 0
There is no way to do this unless you have a current backup to compare against. Moab 8 years ago 0

1 answer

0
TOOGAM

Rebuilds are not generally intended to be reversible, meaning that the process does not typically save information about how to back out of a rebuild process that doesn't work right. Instead, people rely on backups.

You should always have backups of important data.

Another way to identify changed data is to use a file integrity checker, such as AIDE, Integrit, or Tripwire. This solution, however, also requires comparing against data that was recorded earlier.
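To illustrate the idea behind such checkers (this is not how AIDE, Integrit, or Tripwire are actually implemented), here is a minimal Python sketch. It assumes you recorded a manifest of file hashes before the failure; the function names `build_manifest` and `compare_manifests` are made up for this example. Without that earlier manifest there is nothing to compare against, which is exactly the limitation described above.

```python
import hashlib
import os


def build_manifest(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest


def compare_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report files that disappeared, changed content, or newly appeared."""
    return {
        "missing": sorted(set(old) - set(new)),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
        "added": sorted(set(new) - set(old)),
    }
```

In practice you would save the result of `build_manifest` somewhere off the array (for example as JSON on another machine) on a regular schedule, then run it again after a rebuild and pass both results to `compare_manifests`.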

Logs can be helpful. Even a saved directory listing can be helpful. However, if you don't have these resources, there may not be a commonly known way to determine the information that you're seeking.

To understand why: I have been shown how RAID 5 works using XOR. XOR compares two bits and yields 0 when they are the same and 1 when they differ, so it effectively records sameness. So, in a simplified three-drive picture, you have two drives holding data, and a parity drive that records whether the two corresponding bits on the other two drives are the same. (Real RAID 5 rotates the parity blocks across all the drives rather than dedicating one drive to parity, but the logic for each stripe is the same.) If you lose the parity, you simply re-make it. If you lose either of the other drives, you can recover by looking at the data drive that remains and at whether the bits were the same (as recorded by the parity), and from that you can figure out which bit was on the drive that was lost.
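The XOR reasoning above can be tried out in a few lines of Python. This is only a toy model of one stripe with two data blocks and one parity block; the block contents and the helper name `xor_blocks` are invented for the illustration, and real arrays work on much larger blocks.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Two data blocks and the parity block computed from them.
data_a = b"user"
data_b = b"file"
parity = xor_blocks(data_a, data_b)

# Drive A is lost: XOR the surviving data block with the parity to rebuild it.
rebuilt_a = xor_blocks(data_b, parity)
assert rebuilt_a == data_a

# If the surviving block was silently corrupted, the same calculation still
# produces a result. Nothing in the parity says that result is wrong.
corrupted_b = b"fi1e"
wrong_a = xor_blocks(corrupted_b, parity)
assert wrong_a != data_a
```

The last two lines are the important part for this question: the rebuild has no way of telling a correct result from an incorrect one, so it cannot produce a list of what was damaged.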

There, that's it. It's that simple.* Unfortunately, if the process fails for some reason, there may be no magical method to figure out what specific bad thing happened, and what specific good thing should have happened instead. As a result, there is no clear path for software to know how to just simply "fix" the situation. I know this is not the news you were hoping for. Sorry.

* Well, I said it's that simple. Actually, my understanding is that real RAID implementations often write headers (metadata) to the drives, which complicates things and can cause incompatibilities with other RAID implementations. However, the header occupies only a small part of the drive relative to the total drive space. So if the problems affect other areas of the drive, which make up the majority of it, you're not likely to find salvation there.
