Tuesday, November 22, 2011

How to connect your smartphone through your GNU/Linux box. Part I: With a spare wireless router


So, you forgot to pay your phone bill? Or you are just in the middle of a financial crisis? Whatever the reason, you lost the connection on the most important communication device in your daily life: your smartphone.

But you have a computer that does connect to the internet. Well, I got my BlackBerry back online (and regained simple communication with a very dear group of friends) by connecting it through the computer.

You see, as I've always said, GNU/Linux is about fun and flexibility. This is no exception. I connect to my ISP through a USB dongle (GSM connection). This box has a wireless interface but, in this chapter, I will use a spare wireless router I have at home. I'll try connecting directly through the box's wireless interface in a following chapter (if the interface allows me to).

So, first, I set up my wireless router and my box to use static IPs (so that my computer and the router are able to "see" each other). To keep things simple, I used this setup:
My Box:
Router's GW: (your box)
Router's DNS: Whatever my ISP's DNSs are (cat /etc/resolv.conf)

Now, in order for the traffic coming from the router to go through your box, you have to make sure two things are set up.

1. There must not be any rule/policy in netfilter's FORWARD chain keeping the traffic from going through:
# iptables -L FORWARD -nv
Chain FORWARD (policy ACCEPT 1756 packets, 917K bytes)
 pkts bytes target     prot opt in     out     source               destination

You have a DROP policy? Don't want to get rid of it? Add rules to allow traffic going from the router to internet to pass through, and also the traffic that comes back. That's a whole topic in and of itself so I won't go into it. I have an ACCEPT policy on FORWARD, so I have no problem with that.

2. The kernel must be enabled to forward IPv4 traffic:
# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

It says 0? You have to enable it:
# sysctl -w net.ipv4.ip_forward=1
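If you would rather check this from a script, here is a minimal Python sketch. It assumes the usual Linux layout, where the same setting is exposed as the file /proc/sys/net/ipv4/ip_forward; the function name is mine, just for illustration.

```python
from pathlib import Path


def ipv4_forwarding_enabled(path="/proc/sys/net/ipv4/ip_forward"):
    """Return True if the kernel reports IPv4 forwarding as enabled.

    Assumption: `path` is the proc file behind the net.ipv4.ip_forward
    sysctl, containing "1" (enabled) or "0" (disabled).
    """
    return Path(path).read_text().strip() == "1"


if __name__ == "__main__":
    state = "enabled" if ipv4_forwarding_enabled() else "disabled"
    print("IPv4 forwarding is", state)
```

It only reads the value; enabling it still takes the sysctl -w command above (as root).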

That should be it... or almost. The kernel is now letting traffic coming from the router get to the internet, but it's not doing any network address translation on that traffic. So it's going out not with your network interface's IP address but with the wireless router's (which is probably doing NAT on the traffic from its own clients), and that won't hold water. The next step is to masquerade all traffic that is going out to the internet. Something like this should be enough:

# iptables -t nat -A POSTROUTING -o ppp0 -j MASQUERADE

I'm using ppp0 in my example (as I said, USB dongle) but it has to be whatever interface you use to connect to the internet. Now your smartphone (if it's already on the router's wireless network) should be able to reach the internet.

I hope it does the trick for you.

Tuesday, November 8, 2011

raidpycovery supports raid4 and repetitions


Just wanted to announce that I just tagged/pushed version 2.01 of raidpycovery which now supports raid4 and also repetitions (like the one an HP SmartArray P410i displayed).

Repetitions work this way. On a normal RAID5 (left-asymmetric) you would see something like this:
A0 B0 C0 D0 X0
A1 B1 C1 X1 D1
A2 B2 X2 C2 D2
A3 X3 B3 C3 D3
X4 A4 B4 C4 D4
A5 B5 C5 D5 X5

On a raid with 2 "repetitions" you would see:
A0 B0 C0 D0 X0
A1 B1 C1 D1 X1
A2 B2 C2 X2 D2
A3 B3 C3 X3 D3
A4 B4 X4 C4 D4
A5 B5 X5 C5 D5

You get the idea, right? On the raid5 I broke last weekend the controller did 16 repetitions.
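To spell the idea out, here is a small Python sketch that reproduces both tables. It is not the code from raidpycovery, just an illustration under the assumption that the parity chunk (X) starts on the last disk and moves one disk to the left every `repetitions` rows; the function names are made up for the example.

```python
def parity_disk(row, n_disks, repetitions=1):
    """Index (0-based) of the disk holding the parity chunk for a row.

    Assumes a left-asymmetric layout: parity starts on the last disk and
    moves one disk to the left every `repetitions` rows, wrapping around.
    """
    return (n_disks - 1) - (row // repetitions) % n_disks


def layout(rows, n_disks, repetitions=1):
    """Render rows in the notation used above (X marks the parity chunk)."""
    names = "ABCDEFGHIJKLMNOPQRSTUVW"  # labels for the data chunks of a row
    lines = []
    for row in range(rows):
        p = parity_disk(row, n_disks, repetitions)
        data = iter(names[: n_disks - 1])
        cells = [
            f"X{row}" if disk == p else f"{next(data)}{row}"
            for disk in range(n_disks)
        ]
        lines.append(" ".join(cells))
    return lines


if __name__ == "__main__":
    print("\n".join(layout(6, 5)))                  # plain RAID5
    print()
    print("\n".join(layout(6, 5, repetitions=2)))   # 2 repetitions
```

With repetitions=1 this is a plain left-asymmetric RAID5; a very large value keeps the parity on the last disk for a long run of rows, which is why such an array can look like a RAID4 at first.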

Repository for the project is here.

Sunday, November 6, 2011

Edmundo 750 GBs - HP 0 Gbs


Just today I finished an almost perfect data recovery from another RAID5 that its controller had given up on (another HP controller, yet again). This time around, it was a 6-disk RAID5 (some 130 GBs each).

I knew beforehand that my python library (which I had translated from java about one year ago) was too slow and that this task was going to take days if I didn't do something about it. So, first I took on the task of translating it to C++, which I finished some days ago.

When faced with the disks, the HP controller reported that one of them was physically dead, one of them had been hotswapped and another was about to fail soon (public administration, don't ask about the details... gruesome). Fact of the matter is that the controller didn't want to make the array visible, so the server would not start. HP support asked me to dip the array in holy water and forget about it. But that was not to happen, was it?

The disks were mounted on a separate computer one by one (well, not one by one but on a separate computer), and images were made from each one of them (only one of the drives didn't allow itself to be copied). Tips: LiveUSB, ddrescue.

After getting my hands on the images, I started analyzing how the array was built to figure out the order of the drives, the chunk size and the algorithm used. Normally the first MBs of each image are enough for this job... After taking a long cold look at the images, I had figured out the chunk size (64 KB) by looking at some numbered markers present on the ext2 partition that was at the beginning of the disks. However, as this was a Linux server with no FAT or NTFS stuff on it, there was no data (especially text) to help figure out the order... and especially not crammed at the beginning of the disk (does NTFS still work like this? You gotta be kidding me).

After thinking about it (and working on it) for a while, I ended up finding whole stripes made out of text (except, most probably, for a single chunk in every stripe). Finding these stripes and their sequence took its time, but eventually I was able to break it. The only thing that bothered me was that for many stripes (one after the other) the checksum chunk was always on the same disk (which I didn't expect) and the sequence remained the same on those stripes. At first I thought that perhaps it had been built as a RAID4 instead of a RAID5, but after checking many stripes the whole thing became clear: it's a RAID5 in which row configurations repeat themselves a number of times (in this case, 16 times). All in all, left-asymmetric, 64 KB chunks, with 16 repetitions. This led me to make an adjustment to my C++ utility just for this task, and in a single shot I was able to rebuild the RAID5 from its pieces. Rebuilding took some 14 hours (probably not because of the utility but because of the bottleneck of having all 5 images on a single drive).
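For those who have never done it: what makes rebuilding with a missing disk possible is that the checksum chunk of a RAID5 stripe is the XOR of its data chunks, so any single missing chunk is the XOR of all the others. Here is a minimal Python sketch of that one step. It is not the code of my utility, and the function names are made up for the example.

```python
from functools import reduce


def xor_chunks(chunks):
    """XOR equally sized byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_missing(stripe):
    """Return a copy of the stripe with its single missing chunk filled in.

    `stripe` is a list with one chunk per disk (data and parity alike);
    the chunk from the unreadable disk is given as None.
    """
    missing = [i for i, chunk in enumerate(stripe) if chunk is None]
    if len(missing) != 1:
        raise ValueError("exactly one chunk must be missing")
    present = [chunk for chunk in stripe if chunk is not None]
    rebuilt = list(stripe)
    rebuilt[missing[0]] = xor_chunks(present)
    return rebuilt
```

The same property helps while analyzing images: XORing all the chunks of a complete, healthy stripe gives nothing but zero bytes.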

After rebuilding, I was able to do fdisk -lu on the disk image, do the losetup trick, mount the server's / partition and get data out of it. There were some IO errors (not that many, fortunately... given that the controller had already reported more than one disk as broken, maybe some pieces of the images were broken, or the controller had stopped using one of the drives altogether some time before crashing). Some other people will take care of getting the stuff working, but my task is definitely over.

So, next time you have a RAID5 crash, always remember that breaking it can take some time and thought... but in the end, it can pay off.

I'm starting to feel like the Messi of data recovery. Does anybody need a data recovery hacker around there? I'm willing to travel anywhere!
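The "losetup trick" is just arithmetic: fdisk -lu prints the start of each partition in sectors, and losetup wants the offset in bytes. A minimal Python sketch follows, assuming 512-byte sectors; the image name, start sector and loop device are made-up examples, not the values from this recovery.

```python
def partition_offset(start_sector, sector_size=512):
    """Byte offset of a partition inside a whole-disk image."""
    return start_sector * sector_size


def losetup_command(image, start_sector, loop_device="/dev/loop0", sector_size=512):
    """Build (not run) a read-only losetup command for that partition.

    Hypothetical helper: it only assembles the argument list.
    """
    offset = partition_offset(start_sector, sector_size)
    return ["losetup", "-r", "-o", str(offset), loop_device, image]


if __name__ == "__main__":
    # Example values only: a partition starting at sector 63 of raid.img
    print(" ".join(losetup_command("raid.img", 63)))
```

After that, the loop device can be mounted like any other block device (read-only is a good idea on a recovered image).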

PS And before you ask, it's 750 GBs (or close) for this array plus the one I had recovered in 2005.

Thursday, November 3, 2011

Remember the times when I used python for RAID5 Recovery? Not anymore


Remember the times when I translated the raid5 recovery utility from java to python? Well, let me tell you something. It was horrendously SLOW. For example, I once helped a guy who had a problem with his array and the operation to rebuild the array took literally days (and the data wasn't recovered in the end, so...). While doing my tests, a (virtual) 300 MBs RAID5 with a missing disk took about 450 secs to be rebuilt in python (on my netbook, anyway).

Right now I'm facing the recovery of a RAID that has 6 disks, 150 GBs each. That is going to take _a lot_ of time to recover. So, just in case we do have to go on with the operation, I decided I was going to translate it to C++, which I started doing a little more than 24 hours ago (I wasn't working full time on the task, so don't complain about me being slow in the process).

So, I'm happy to announce that the translation has worked and after making some tests I can attest that it absolutely kicks ass (at least time-wise :-)). Remember the 300 MBs RAID5 I talked about a couple of paragraphs ago? It's rebuilt in roughly 8 seconds. 8!!!!

The way you call it is the same (same arguments) only that, instead of calling, you will just write Raid5Recovery (with the accompanying ./, you know the drill).

I don't intend to change its name for the time being, so it will remain Raidpycovery.

As usual, I have made some tests on it but they were not exhaustive so feel free to use it at your own risk.

The project is hosted on launchpad (no release) here:

Last but by no means least, I dedicate this project to the memory of my wife, Lina Marcela "Nena" Salgado.