Saturday, November 27, 2010

Testing raidpycovery through mdadm

Hi!

I'm still working on polishing raidpycovery. Today I started doing real experiments to recover data from broken RAID5s. To do that, I used Linux's md driver, which also let me study md a little while testing raidpycovery.

So, let's get our hands dirty.

First, let's create a directory where we will work, for example:

$ mkdir raid5
$ cd raid5

Now let's create the separate image files that will make up our RAID5. I will use 4 disks of 10 MB each:

$ for i in 0 1 2 3; do dd if=/dev/zero of=disk$i bs=1M count=10; done
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.0584047 s, 180 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.100258 s, 105 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.0691083 s, 152 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.0508324 s, 206 MB/s
$

Now we have four empty files that we will feed to md to create the RAID for our tests.

First, we will "loop" them so that we can use them with md (I don't know if this is really needed, but I'll do it just in case). As I'm working on a live USB, loop0 and loop1 are already taken, so I'll have to start from /dev/loop2 onward. Check the loop devices already in use with losetup -a (as root... or with sudo):

$ sudo losetup -a
/dev/loop0: [0811]:31 (/cdrom/casper/filesystem.squashfs)
/dev/loop1: [0811]:38 (/casper-rw-backing/casper-rw)
$

Now, I loop the files:
$ sudo losetup /dev/loop2 disk0
$ sudo losetup /dev/loop3 disk1
$ sudo losetup /dev/loop4 disk2
$ sudo losetup /dev/loop5 disk3
$ sudo losetup -a
/dev/loop0: [0811]:31 (/cdrom/casper/filesystem.squashfs)
/dev/loop1: [0811]:38 (/casper-rw-backing/casper-rw)
/dev/loop2: [000f]:27898 (/home/ubuntu/raid5/raidpycovery/bin/raid5/disk0)
/dev/loop3: [000f]:27899 (/home/ubuntu/raid5/raidpycovery/bin/raid5/disk1)
/dev/loop4: [000f]:27900 (/home/ubuntu/raid5/raidpycovery/bin/raid5/disk2)
/dev/loop5: [000f]:27901 (/home/ubuntu/raid5/raidpycovery/bin/raid5/disk3)
$

Great. Now we can use the loop devices to create our RAID device:

$ sudo mdadm --create /dev/md0 -l 5 -p ls -n 4 /dev/loop2 /dev/loop3 /dev/loop4 /dev/loop5
mdadm: array /dev/md0 started.
$

If you read mdadm's man page you will see that the layout we asked for (-p ls, left-symmetric, what raidpycovery calls "left sync") is also the default, and that the default stripe/chunk size is 64 KB. We will need that information later on.
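
To picture what left-symmetric means, here is a minimal Python sketch of how the two "left" RAID5 layouts place data and parity chunks on the disks. It's only an illustration of the layouts, not raidpycovery's or md's actual code:

# Minimal sketch of the two "left" RAID5 layouts (illustration only).
# In both, the parity chunk rotates from the last disk towards the first,
# one stripe at a time; they differ only in where each stripe's data starts.
def left_layout(data_chunk, n_disks, symmetric=True):
    """Return (stripe, disk) for a given data chunk index."""
    stripe, pos = divmod(data_chunk, n_disks - 1)      # pos: position inside the stripe
    parity_disk = (n_disks - 1) - (stripe % n_disks)   # parity rotates "to the left"
    if symmetric:
        # left-symmetric ("left sync"): data starts right after the parity disk and wraps
        disk = (parity_disk + 1 + pos) % n_disks
    else:
        # left-asymmetric ("left async"): data fills disks in order, skipping the parity disk
        disk = pos if pos < parity_disk else pos + 1
    return stripe, disk

if __name__ == "__main__":
    for name, symmetric in (("left-symmetric", True), ("left-asymmetric", False)):
        print(name)
        for chunk in range(12):   # with 4 disks there are 3 data chunks per stripe
            print("  data chunk %2d -> stripe %d, disk %d"
                  % ((chunk,) + left_layout(chunk, 4, symmetric)))

The symmetric variant starting each stripe's data right after the parity disk is exactly the small difference that will bite me further down.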

Now, let's format it so we can use it as a normal partition:

$ sudo mkfs.ext3 /dev/md0
mke2fs 1.41.11 (14-Mar-2010)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=64 blocks, Stripe width=192 blocks
7648 inodes, 30528 blocks
1526 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=31457280
4 block groups
8192 blocks per group, 8192 fragments per group
1912 inodes per group
Superblock backups stored on blocks:
8193, 24577

Writing inode tables: done
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 23 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
$

Now we can mount our freshly formatted RAID device:
$ sudo mount /dev/md0 /mnt/tmp
$ sudo mount
.
.
.
/dev/md0 on /mnt/tmp type ext3 (rw)
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
.
.
.
/dev/md0 29557 1400 26631 5% /mnt/tmp
$

There you go: a new partition with around 25 MB available for mortal users (four 10 MB disks in RAID5 give 3 x 10 MB of raw space, minus the ext3 overhead and reserved blocks). Now, let's copy some files into that partition:

$ sudo cp blah blah blah /mnt/tmp/

I'm using sudo to do the copy because right now that directory is owned by root.

After copying the files I wanted, let's check their MD5 sums:

$ md5sum /mnt/tmp/*
a27ebcacc64644dba00936abc758486e /mnt/tmp/IMSLP32718-PMLP01458-Beethoven_Sonaten_Piano_Band1_Peters_9452_14_Op27_No2_1200dpi.pdf
e68fabdcda296ef4a76d834a11a6f1df /mnt/tmp/IMSLP44764-PMLP48640-Mahler-Sym9.TimpPerc.pdf
md5sum: /mnt/tmp/lost+found: Permission denied
d7bfe06473430aad5ca0025598111556 /mnt/tmp/putty.log
670536c55ae9c77b04c85f98459c0cd8 /mnt/tmp/Resume Edmundo Carmona.pdf
8727e8ff88739feca15eb82b4d9cb09b /mnt/tmp/Titulo Ingenieria.png

Now, let's unmount our RAID and stop it:

$ sudo umount /mnt/tmp
$ sudo mdadm --stop /dev/md0
mdadm: stopped /dev/md0
$

Great... now let's try to rebuild the RAID with the raidpycovery tools. I don't have the tools in the same directory, so I'll have to move there and use relative paths for the disk images; keep that in mind:

$ ./Raid5Recovery.py 4 left async 65536 raid5/disk0 raid5/disk1 raid5/disk2 raid5/disk3 > wholedisk
Number of disks: 4
Algorithm: Left Asynchronous
Chunk size: 65536 bytes
Skip 0 bytes from the begining of the files
Finished! Output size: 31457280 bytes
$

Now, let's mount it and see if we can get any data from the recovered RAID:

$ sudo mount -o loop,ro wholedisk /mnt/tmp
mount: unknown filesystem type 'linux_raid_member'
$

Not good. From my tests, that's because there's md metadata "garbage" at the end of the member disks and mount is picking up that signature (there's a quick check for it sketched right after the next attempt). I'll force the filesystem type then:

$ sudo mount -t ext3 -o loop,ro wholedisk /mnt/tmp
mount: wrong fs type, bad option, bad superblock on /dev/loop6,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
$
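
As for the md signature I mentioned: if you want to confirm that it really is an md superblock sitting near the end of the member images, here is a minimal sketch (not part of raidpycovery). I'm assuming the array was created with the old 0.90 metadata format, whose superblock lives at the last 64 KiB-aligned offset of each member and starts with the magic number 0xa92b4efc:

# Minimal check for an md 0.90 superblock near the end of an image (assumption:
# 0.90 metadata, written in host byte order, i.e. little-endian on x86).
import os
import struct

MD_MAGIC = 0xa92b4efc

def has_md_090_superblock(path):
    size = os.path.getsize(path)
    offset = (size & ~0xFFFF) - 0x10000   # last 64 KiB-aligned block of the image
    with open(path, 'rb') as f:
        f.seek(offset)
        magic, = struct.unpack('<I', f.read(4))
    return magic == MD_MAGIC

for image in ('disk0', 'disk1', 'disk2', 'disk3'):
    print(image, has_md_090_superblock(image))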

Gotcha! Did you notice that I reassembled the RAID using left async? It has to be left sync, remember? Let's try again:

$ ./Raid5Recovery.py 4 left sync 65536 raid5/disk0 raid5/disk1 raid5/disk2 raid5/disk3 > wholedisk
Number of disks: 4
Algorithm: Left Synchronous
Chunk size: 65536 bytes
Skip 0 bytes from the begining of the files
Finished! Output size: 31457280 bytes
$ sudo mount -t ext3 -o loop,ro wholedisk /mnt/tmp

No complaints. Great! Now let's see if the data is really there in the recovered RAID:

$ md5sum /mnt/tmp/*
a27ebcacc64644dba00936abc758486e /mnt/tmp/IMSLP32718-PMLP01458-Beethoven_Sonaten_Piano_Band1_Peters_9452_14_Op27_No2_1200dpi.pdf
e68fabdcda296ef4a76d834a11a6f1df /mnt/tmp/IMSLP44764-PMLP48640-Mahler-Sym9.TimpPerc.pdf
md5sum: /mnt/tmp/lost+found: Permission denied
d7bfe06473430aad5ca0025598111556 /mnt/tmp/putty.log
670536c55ae9c77b04c85f98459c0cd8 /mnt/tmp/Resume Edmundo Carmona.pdf
8727e8ff88739feca15eb82b4d9cb09b /mnt/tmp/Titulo Ingenieria.png

And there you are. Everything is right there. Using this method today I discovered that I was not rebuilding chunks from missing images correctly and that one of the right-handed algorithms needs to be corrected. Let's see what I find.

As homework, try running the recovery script with one disk missing (write none in its place) and see if it works. A sketch of the idea behind that reconstruction follows.
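
In case you want to picture what that reconstruction does, here is a minimal sketch of the idea (not raidpycovery's actual code): within each stripe, the chunk that lived on the missing disk is simply the XOR of the corresponding chunks on the surviving disks, parity included.

# Toy illustration of rebuilding a missing RAID5 chunk by XORing the survivors.
def rebuild_missing_chunk(surviving_chunks):
    """XOR together the same-sized chunks of one stripe from the surviving disks."""
    missing = bytearray(len(surviving_chunks[0]))
    for chunk in surviving_chunks:
        for i, byte in enumerate(chunk):
            missing[i] ^= byte
    return bytes(missing)

# One stripe of a 3-disk RAID5: two data chunks plus their parity (data0 XOR data1).
data0 = bytes([0x11, 0x22, 0x33, 0x44])
data1 = bytes([0xAA, 0xBB, 0xCC, 0xDD])
parity = bytes(a ^ b for a, b in zip(data0, data1))

# Pretend the disk holding data1 is the one passed as "none": XOR brings it back.
assert rebuild_missing_chunk([data0, parity]) == data1
print("recovered:", rebuild_missing_chunk([data0, parity]).hex())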

Hope you liked the read or found it useful.

Update: I have corrected the right-handed algorithm issue. Make sure you get the latest stable release if you are going to use it:

bzr branch -r tag:2.00.02 lp:~eantoranz/+junk/raidpycovery

10 comments:

  1. Hi,

    will this method work for RAID 0 too?

    -Marcin

  2. Marcin, if you lost any of the disks that made up the array, you are lost.

    Now, if the disk controller doesn't want to recognize the array but you have all the disks, you should be able to rebuild the disk very easily... probably by just using dd... provided you have enough free space to rebuild the array's disk, of course.

  3. This comment has been removed by the author.

  4. Hi Edmundo,

    I have two images done already; each is 73 GB. I have found the stripe size to be 128 KB, and each sector is 512 bytes.
    My problem is that I do not know how to reconstruct the RAID 0 array using R-Studio. I see that the way the stripes are concatenated is off by 63 sectors. For the last two days I have not been able to get the RAID 0 array assembled, so I have been googling and I found your posts.

    What I need is a custom size for stripe 1 on disk 1: it should be 192 sectors, then 256 sectors from disk 2, then 256 sectors from disk 1, and so on.

    I am wondering what the best way is to reassemble this RAID 0 array, given the fact that my disk 1 image has 192 sectors for the first stripe. I would appreciate your suggestions.

    I tried offsets in R-Studio etc.... nothing worked
    Best,
    Marcin

  5. I know now that to build my RAID 0 array I need software that can specify 193 sectors for the first stripe on DISK 1 and 256 sectors for the rest of the array. Is there anything else that can do this? Do you know?

  6. Hi man, nice work on the raids.

    I am looking for a bit of advice with a RAID I am trying to recover. The backups were not done correctly by in-house staff and now there is hell to pay :).

    Specifics:
    RAID 5, 3 disks (each 320 GB); 2 disks failed, but I got them both mostly working again for long enough to image. I now have 2 full disk images and 1 half image. The half image is clearly the first disk; it shows a partition table. Now, I do not know the algorithm, the order, the chunk size, or whether an offset should be applied.

    The output of file -s is as follows:
    Image 1
    /media/sdd1/image00.dat: x86 boot sector, Microsoft Windows XP MBR, Serial 0xb98fb98f; partition 1: ID=0x7, active, starthead 1, startsector 63, 41945652 sectors; partition 2: ID=0x7, starthead 0, startsector 41945715, 1833466320 sectors, code offset 0xc0

    Image 2
    /media/sdd1/drive01.dat: PDP-11 UNIX/RT ldp

    Image 3
    /media/sdb1/drive03.dat: data

    The output of fdisk -l is:
    Image 1
    You must set cylinders.
    You can do this from the extra functions menu.

    Disk /media/sdd1/image00.dat: 0 MB, 0 bytes
    255 heads, 63 sectors/track, 0 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0xb98fb98f

    Device Boot Start End Blocks Id System
    /media/sdd1/image00.dat1 * 1 2611 20972826 7 HPFS/NTFS
    Partition 1 has different physical/logical endings:
    phys=(1023, 254, 63) logical=(2610, 254, 63)
    /media/sdd1/image00.dat2 2612 116739 916733160 7 HPFS/NTFS
    Partition 2 has different physical/logical beginnings (non-Linux?):
    phys=(1023, 0, 1) logical=(2611, 0, 1)
    Partition 2 has different physical/logical endings:
    phys=(1023, 254, 63) logical=(116738, 254, 63)

    Image 2
    doesn't contain a valid partition table

    Image 3
    Same as image 2

    If I run your script using, say, left sync or left async with image 1, then 2, then 3, I get the file partition back. However, if I run it with none, image 2, image 3 and any combination of left *, I get nothing.

    The best thing would obviously be to work out the chunk size; then random tests should quickly give the order and algorithm, since there are only 2 possible orders here.

    Could you please advise on discovering the chunk size?

    Cheers
    Shane

  7. ShaneW, I found no way to email you back. Throw me an email to see what I can do to help you (your situation sounds like it's perfectly possible to recover). eantoranz gmail

  8. I have a RAID5 with 3 partitions (only 2 images, because 1 hard disk failed) of 1.4 TB, and I used your program raidrec to recover the RAID.

    I am running this command:
    raidpycovery2/src/raidrec 5 3 left async 32768 /mnt/diskimage-sda3 none /mnt/diskimage-sdc3 > final_disk

    The process is running and the result file is over 2.2 TB at the moment.
    Is that correct? How much space do I need for the recovery?

  9. If it was a 3-disk array (1.4 TB each), it should go up to some 2.8 TB: the rebuilt disk is (number of disks - 1) times the size of one member.
    Side note: "partitions" (as in the partitions of the array's "virtual" disk) are not related to the number of disks that make up the array.
