Published on Linux DevCenter (http://www.linuxdevcenter.com/)


Four Tough Lessons of System Recovery

by KIVILCIM Hindistan
08/31/2006

Last week I received a brand-new laptop with 1.5GB of RAM, a 100GB SATA hard disk, and a 15.4-inch widescreen BrightView display. It has basically every technical gizmo that can spoil a new employee.

The computer came with Windows XP Pro preinstalled. My game plan was to transfer my files from a USB disk to the NTFS partition, then transfer my second partition, which runs Debian Sarge (so-called) Unstable, and get on with my regular business.

When I have to use Windows, my weapon of choice is VMware Workstation, configured to work with the real partitions rather than a virtual disk file. This means that if I change anything, my files are still there when I boot Debian.

So, I started VMware as usual, configured it to use the physical hard disk, and began my operation.

I used a USB disk to transfer my old system. After that, I began to erase my old disk, which contained the NTFS system partition (C), an NTFS data partition (D), and the Debian partition.

While doing this, I first deleted my D and Debian partitions with cfdisk and wrote the changes to the disk.

After I exited cfdisk, I caught a glimpse of hda1, which troubled me and left me staring at the black screen and its root prompt, wondering what was wrong. That device should have been sda, the attached USB drive, not hda, the laptop's internal disk.

I turned red. I had just wiped the partition that contained my backup data and my laptop's installation files. Fortunately, my boot partition was still there, so I just had to collect my backup data (some 60GB) from different computers and copy it all again, which looked like half a day or so of work.

My First Attempt

I checked the files and saw that they were still there. Apparently, Windows does not re-read the partition table until it reboots. If I moved the most important files to my USB disk, I'd have something left. Unfortunately, I could only save half of the data because my USB backup disk was already crowded with junk.

After that, I tested my theory by rebooting. Windows booted fine, but there was no D drive and no Debian partition.

Then I began thinking (this is where things took a turn for the worse, honest). The partition data was still there and I had not overwritten anything, so it shouldn't be difficult to rescue my files. Right?

I began looking for a program to scout my hard disk. Unfortunately, the program I found was not under active development, and the only version I found was a cracked copy. Nevertheless, I started the program and it found the partitions as expected. It told me that there was something wrong and asked if it could correct things. Of course I wanted that. Then it asked me if I was sure. Sure, sure--I sure wanted to rescue my missing partition.

And then it was okay. I just had to reboot to see... that now even Windows would not boot. It started booting with all good intentions (I'm sure), but then some error screen (which put the famous Guru Meditation screen of the good old Amiga to shame) appeared and I was ruined.

Recovering My Data, Badly

Now I not only needed to collect my backup data, but also to re-install Windows XP, which included finding a series of exotic drivers.

In situations like these, I always remember a quote attributed to Albert Einstein: "If I had 60 minutes to rescue the world, I'd spend 59 minutes defining the problem and 1 minute solving it." I don't claim to understand the real wisdom of this quote, but since I'm a bit on the lazy side, I like it. I humbly think that real laziness is not one of the seven deadly sins but a virtue, earned by combining two mandatory skills: cunningly avoiding work and getting the job done properly at the same time.

As with every computer change, I had to transfer my vital files from the old computer to the new one. I usually run two operating systems: Windows for office work and Debian GNU/Linux for security tools and the like. Every time I switch computers, I prefer to install Windows fresh and keep my good old Debian. Having switched laptops three times in the last six months, I've developed a nice method: after I finish installing Windows, I install VMware Workstation and use it to create a virtual machine that uses the physical hard disk.

This allows me to keep working in Windows while I install and transfer the Linux partition, even booting it and cleaning up glitches in the network and other settings for the new machine.

This method has another advantage: if you suspect that you have messed something up, such as LILO or the partitioning, you can try to boot (still from VMware) and see if everything is fine. If not, you still have a running computer with a network connection, CD writer, etc., ready for backup and recovery procedures.

What could have gone wrong?

Almost everything, I should say.

This time I had a wonderful USB 2.0 enclosure with transfer rates of up to 25MB/second, which really eased my file transfers. I simply unscrewed the old laptop's hard disk, put it into the USB enclosure, and plugged it into my brand-new laptop.

Everything was fine as I transferred files to my NTFS partition. After I finished, I booted Knoppix 5.0 inside VMware (with the physical disks attached, as I mentioned) and began to transfer my Debian partition, which also went wonderfully.

After everything had transferred, I wanted to erase the partitions on the old laptop's disk, both so that the new user would have a clean hard disk and to make sure that my files were gone for good. Windows would not delete the Linux partition, so I decided to use Knoppix for this too.

I started cfdisk and began erasing the two partitions (my NTFS data partition and the Debian partition). After I erased them both, my intention was to write the partition table back, then create new partitions and fill them with garbage data, which, in my case, would be precaution enough.
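Filling a partition with garbage, as planned here, is usually done from a live system with dd and /dev/urandom. The Python sketch below shows the same idea under stated assumptions: it overwrites a target path in fixed-size chunks of random bytes and syncs at the end. The function name `overwrite_with_random` and the 1 MiB chunk size are my own choices, and the demo runs against a throwaway file; pointed at a block device, it destroys that device's contents.

```python
import os
import tempfile
from typing import Optional

CHUNK = 1024 * 1024  # bytes per write; an arbitrary choice for this sketch


def overwrite_with_random(path: str, size: Optional[int] = None) -> int:
    """Overwrite `path` with random bytes and return how many bytes were written.

    If `size` is None, the target's current size is used. Seeking to the end
    works for regular files and, on Linux, for block devices too.
    """
    with open(path, "r+b") as f:
        if size is None:
            size = f.seek(0, os.SEEK_END)
        f.seek(0)
        written = 0
        while written < size:
            n = min(CHUNK, size - written)
            f.write(os.urandom(n))
            written += n
        f.flush()
        os.fsync(f.fileno())  # push the garbage out to the disk, not just the cache
    return written


if __name__ == "__main__":
    # Demo on a throwaway 3 MB file instead of a real partition.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * 3_000_000)
    print(overwrite_with_random(tmp.name), "bytes overwritten")
    os.remove(tmp.name)
```

The obvious weakness, as the rest of this story shows, is that nothing here checks *which* device `path` names.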

Re-Losing My Data

As always, cfdisk asked me to confirm writing the partition table by typing out the whole word "yes", which I did without thinking.

After that, I exited cfdisk. At that last glance, I noticed a small, irritating detail: the USB disk I was erasing should have been sda or sdb, but the screen showed hdb.

At that moment, I realized that I had just erased the partition on my new laptop that held my old backups and the .CAB files for the Windows installation. It was no big deal; I just had to transfer some 40GB again, but I was definitely upset that I could make a mistake like that.
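The whole mishap came down to one device name. On Linux, a cheap guard is to ask sysfs how a disk is attached before partitioning it. The sketch below assumes the usual sysfs layout, where /sys/block/<name>/device is a symlink into /sys/devices and the resolved path of a USB disk passes through a root hub directory such as usb1; it is a heuristic, not something tested in this story. `is_usb_disk`, `require_usb_target`, and the `sys_block` parameter (added so the code can run against a fake directory tree) are hypothetical names.

```python
import os


def is_usb_disk(name: str, sys_block: str = "/sys/block") -> bool:
    """Return True if block device `name` (e.g. "sdb") is attached through USB.

    Heuristic, assuming a Linux sysfs layout: <sys_block>/<name>/device is a
    symlink into /sys/devices, and for USB disks the resolved path contains a
    USB root hub component such as "usb1".
    """
    device_link = os.path.join(sys_block, name, "device")
    if not os.path.exists(device_link):
        raise FileNotFoundError(f"no block device {name!r} under {sys_block}")
    resolved = os.path.realpath(device_link)
    return any(part.startswith("usb") for part in resolved.split(os.sep))


def require_usb_target(name: str, sys_block: str = "/sys/block") -> None:
    """Hypothetical guard: abort before partitioning anything that is not USB."""
    if not is_usb_disk(name, sys_block):
        raise SystemExit(f"refusing to touch /dev/{name}: it is not a USB disk")
```

A wrapper script would call `require_usb_target("sdb")` before launching cfdisk. Inside a virtual machine the guest may only see virtual controllers, so the heuristic would need rethinking there.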

Out of curiosity, I clicked on D and saw that my files were still there. I realized that Windows apparently does not re-read the partition table unless it reboots or does something to the partition directly. I was fascinated by this while I copied my most important files to my USB disk. I was on a lousy 11Mbit/s wireless network, so backing up some 40GB to a network share was out of the question, and my spare USB disk did not have enough space either.
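Some rough arithmetic shows why the network backup was out of the question. This sketch assumes decimal gigabytes and the nominal link rates mentioned in the text (11Mbit/s for the wireless link, and the enclosure's 25 megabytes per second, i.e. 200Mbit/s, for USB). Real 802.11b throughput is well below its nominal rate, so the true wireless figure would be worse.

```python
def transfer_hours(gigabytes: float, megabits_per_second: float) -> float:
    """Best-case hours to move `gigabytes` (decimal GB) over a link of the given rate."""
    bits = gigabytes * 1e9 * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    print(f"40GB over 11Mbit/s wireless: {transfer_hours(40, 11):.1f} h")
    print(f"40GB over USB at 25MB/s (200Mbit/s): {transfer_hours(40, 200):.1f} h")
```

That works out to roughly 8 hours over the wireless link at best, versus under half an hour over USB.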

Destroying My Boot

After that, I booted the machine to see the damage.

As expected, C was there, but D and the Debian partition were missing. In fact, all the data was intact and nothing had been overwritten; the computer merely did not know where to find it. As I said, being on the lazy side and seeing myself as a technology-savvy work avoider, I began searching for a program that would find the exact physical locations of the missing partitions so that I could restore them. Because I was using Windows, I looked for freeware to solve my problem.

The truth was, there wasn't any such freeware for Windows. I came across gpart several times; it is a GPL-licensed console program that does exactly what I wanted, but on Linux. The real problem was that I was not lazy in the proper way. I was under the influence of a spoiled kind of laziness: gpart would only supply me with the partition table details and then leave me to rebuild the partitions by hand, so I passed it by. This was my second and biggest mistake.
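gpart works by scanning the raw disk for filesystem signatures and guessing partition boundaries from them. The toy Python sketch below illustrates only that core idea and is far simpler than gpart itself. It checks each 512-byte sector of a disk image for two signatures: an NTFS boot sector (OEM ID "NTFS    " at byte 3 plus the 0x55AA marker at byte 510) and an ext2 superblock (magic 0xEF53, stored little-endian 1080 bytes into the partition). It does not compute partition sizes. Both filesystems keep backup copies of these structures, so a real tool must filter out duplicate hits. The names are my own.

```python
import os
from typing import List, Tuple

SECTOR = 512
NTFS_OEM_ID = b"NTFS    "          # 8 bytes at offset 3 of an NTFS boot sector
BOOT_MARKER = b"\x55\xaa"          # last two bytes of a boot sector
EXT2_MAGIC = b"\x53\xef"           # s_magic 0xEF53, stored little-endian
EXT2_MAGIC_AT = 1024 + 56          # superblock starts 1 KiB in; s_magic is at +56


def scan_for_signatures(path: str) -> List[Tuple[int, str]]:
    """Toy signature scan: return (start_sector, fs_guess) pairs for a disk image."""
    hits: List[Tuple[int, str]] = []
    total = os.path.getsize(path) // SECTOR
    with open(path, "rb") as f:
        for sector in range(total):
            f.seek(sector * SECTOR)
            head = f.read(EXT2_MAGIC_AT + 2)  # enough bytes for both checks
            if head[3:11] == NTFS_OEM_ID and head[510:512] == BOOT_MARKER:
                hits.append((sector, "ntfs"))
            if head[EXT2_MAGIC_AT:EXT2_MAGIC_AT + 2] == EXT2_MAGIC:
                hits.append((sector, "ext2"))
    return hits
```

Reading one sector at a time is far too slow for a 100GB disk, and `os.path.getsize` reports 0 for block devices, so this is meant for a small image file only.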

After an hour of Internet mining, I came across a commercial program that claimed to do just what I wanted. Unfortunately, I was not ready to pay $50 for such a program, so I found an obsolete, unsupported version. I downloaded the program and started it.

It diagnosed my problem correctly and started to fix the partition table with the real values. After 50 seconds it reported everything was okay.

The performance had convinced me, so I confidently rebooted... to a blinking black screen.

Now I was done for. Not only had I lost D and my Debian partition, but I had also somehow destroyed the partition table, and my laptop would not boot from its hard disk.

The Real Solution

At this point, I came eye to eye with the Knoppix CD on the shelf (figuratively speaking). Remembering my mistake, I took the CD from the shelf, put it in the laptop, and booted it by typing knoppix lang=tr 2 at the boot prompt, which starts Knoppix with the Turkish locale in runlevel 2 (text mode), more than enough for me.

I was worried that Knoppix would not recognize my brand new SATA device and I would be totally helpless. To my relief, cfdisk recognized the disk just fine, although there was nothing to see thanks to that wonderful and unnamed partition rescue software.

After that, I started gpart, which easily found my missing partitions.

Figure 1. gpart finds the partitions.

I noted the exact sector numbers of those partitions before starting fdisk. The exact output of gpart was:

root@1[knoppix]# gpart /dev/sda

Begin scan...
Possible partition(Windows NT/W2K FS), size(40000mb), offset(0mb)
Possible partition(Windows NT/W2K FS), size(24998mb), offset(40000mb)
Possible partition(DOS FAT), size(13348mb), offset(64998mb)
Possible partition(Linux ext2), size(16286mb), offset(79106mb)
End scan.

Checking partitions...
Partition(OS/2 HPFS, NTFS, QNX or Advanced UNIX): primary
Partition(OS/2 HPFS, NTFS, QNX or Advanced UNIX): primary
Partition(DOS or Windows 95 with 32 bit FAT, LBA): primary
Partition(Linux ext2 filesystem): primary
Ok.

Guessed primary partition table:
Primary partition(1)
   type: 007(0x07)(OS/2 HPFS, NTFS, QNX or Advanced UNIX)
   size: 40000mb #s(81920097) s(63-81920159)
   chs:  (0/1/1)-(1023/254/63)d (0/1/1)-(5099/74/63)r

Primary partition(2)
   type: 007(0x07)(OS/2 HPFS, NTFS, QNX or Advanced UNIX)
   size: 24998mb #s(51196312) s(81920160-133116471)
   chs:  (1023/254/63)-(1023/254/63)d (5099/75/1)-(8286/29/55)r

Primary partition(3)
   type: 012(0x0C)(DOS or Windows 95 with 32 bit FAT, LBA)
   size: 13348mb #s(27336896) s(133116543-160453438)
   chs:  (1023/254/63)-(1023/254/63)d (8286/31/1)-(9987/194/62)r

Primary partition(4)
   type: 131(0x83)(Linux ext2 filesystem)
   size: 16286mb #s(33354720) s(162010800-195365519)
   chs:  (1023/254/63)-(1023/254/63)d (10084/180/1)-(12160/239/63)r
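Only a few numbers in this output matter for rebuilding the table: each partition's type byte and its s(start-end) sector range, where #s is simply end - start + 1. The Python sketch below pulls those fields out of the "Guessed primary partition table" section with regular expressions and cross-checks #s against the range. It is written against the output format shown above and nothing else; other gpart versions may print differently. `parse_gpart` and `GuessedPartition` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional


class GuessedPartition(NamedTuple):
    number: int   # gpart's primary partition number (1-4)
    type_id: int  # partition type byte, e.g. 0x07 NTFS, 0x0C FAT32 LBA, 0x83 Linux
    start: int    # first sector, inclusive
    end: int      # last sector, inclusive

    @property
    def sectors(self) -> int:
        return self.end - self.start + 1


PART_RE = re.compile(r"Primary partition\((\d+)\)")
TYPE_RE = re.compile(r"type:\s*\d+\(0x([0-9A-Fa-f]+)\)")
SIZE_RE = re.compile(r"size:\s*\d+mb\s+#s\((\d+)\)\s+s\((\d+)-(\d+)\)")


def parse_gpart(text: str) -> List[GuessedPartition]:
    """Extract the guessed primary partitions from gpart's text output."""
    parts: List[GuessedPartition] = []
    number: Optional[int] = None
    type_id: Optional[int] = None
    for line in text.splitlines():
        if (m := PART_RE.search(line)):
            number, type_id = int(m.group(1)), None
        elif (m := TYPE_RE.search(line)):
            type_id = int(m.group(1), 16)
        elif (m := SIZE_RE.search(line)) and number is not None and type_id is not None:
            count, start, end = (int(g) for g in m.groups())
            if count != end - start + 1:  # gpart's #s should match its own range
                raise ValueError(f"partition {number}: #s={count} but range is {start}-{end}")
            parts.append(GuessedPartition(number, type_id, start, end))
    return parts
```

For the output above, all four #s values agree with their ranges; partition 1, for example, spans sectors 63 through 81920159.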

In fdisk I switched to sector units (because gpart reports partition positions in sectors) and began to build a new partition table that was, in fact, an identical copy of the old one.

For this I ran fdisk /dev/sda and pressed o to create a new, empty partition table.

Next I pressed u to switch to sector units. Then I pressed n (new) and p (primary) and created my first partition (formerly C), entering the start and end sectors from gpart's output. Then I pressed t and changed its type to 07 for NTFS.

One by one, I rebuilt the remaining partitions the same way. Finally, I pressed a to mark sda1 as the active (bootable) partition.
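The table here was rebuilt interactively in fdisk. As an alternative (not what was done in this story), sfdisk from util-linux can read a partition table from a script. The sketch below renders the four start/end pairs from gpart's output as sfdisk's named-field input (start=, size=, type=, bootable) with sector units. It assumes a modern sfdisk that understands this format; older versions expect a different input syntax. Review the generated script, ideally with sfdisk's --no-act option, before it goes anywhere near a real disk. `sfdisk_script` and `GPART_GUESS` are hypothetical names.

```python
from typing import List, Tuple

# (start_sector, end_sector, type_byte) copied from gpart's guessed table above
GPART_GUESS: List[Tuple[int, int, int]] = [
    (63, 81920159, 0x07),          # was C: (NTFS)
    (81920160, 133116471, 0x07),   # was D: (NTFS)
    (133116543, 160453438, 0x0C),  # FAT32 (LBA)
    (162010800, 195365519, 0x83),  # Debian (Linux)
]


def sfdisk_script(partitions: List[Tuple[int, int, int]], bootable_index: int = 0) -> str:
    """Render partitions as sfdisk named-field input (MBR 'dos' label, sector units)."""
    lines = ["label: dos", "unit: sectors", ""]
    for i, (start, end, type_id) in enumerate(partitions):
        fields = [f"start={start}", f"size={end - start + 1}", f"type={type_id:02x}"]
        if i == bootable_index:
            fields.append("bootable")  # same effect as toggling fdisk's 'a' flag
        lines.append(", ".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Review first, e.g.: python3 this.py > table.txt; sfdisk --no-act /dev/sda < table.txt
    print(sfdisk_script(GPART_GUESS), end="")
```

Type codes are written as two hex digits (07, 0c, 83) so that sfdisk cannot mistake a single letter for one of its type shortcuts.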

After I wrote the partition table, Knoppix noticed the change, and udev detected the partitions and mounted them. Needless to say, all my data was there, with nothing missing.

What a relief. I rebooted my laptop--with great expectation, if I may add--and saw that... it would not boot.

I was pretty sure that this had nothing to do with the partition table. Most probably the MBR or the Windows boot loader had some kind of problem. Solving this would be easy; I just had to boot from the XP Pro installation CD and enter the Recovery Console.
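Telling a broken partition table from broken boot code is easier with a look at the MBR itself. A classic MBR is one 512-byte sector: boot code first, then four 16-byte partition entries starting at offset 446, then the 0x55AA signature. Each entry holds a boot flag (0x80 for active), the type byte, and the starting LBA and sector count as little-endian 32-bit integers. The sketch below decodes those fields from a saved copy of sector 0, for example one taken with dd if=/dev/sda of=mbr.bin bs=512 count=1. It cannot tell whether the boot code itself is healthy, which is the part fixmbr rewrites. `decode_mbr` and `MbrEntry` are hypothetical names.

```python
import struct
from typing import List, NamedTuple

SECTOR = 512
TABLE_OFFSET = 446   # partition table follows the boot code
ENTRY_SIZE = 16
SIGNATURE = b"\x55\xaa"


class MbrEntry(NamedTuple):
    slot: int        # 1..4
    active: bool     # boot flag 0x80
    type_id: int     # 0x07 NTFS, 0x83 Linux, ...
    start_lba: int
    sectors: int


def decode_mbr(sector0: bytes) -> List[MbrEntry]:
    """Decode the four primary entries of a classic MBR, skipping empty slots."""
    if len(sector0) < SECTOR or sector0[510:512] != SIGNATURE:
        raise ValueError("no 0x55AA signature: not a valid MBR")
    entries = []
    for slot in range(4):
        off = TABLE_OFFSET + slot * ENTRY_SIZE
        # '<' little-endian: B status, 3x CHS start, B type, 3x CHS end, I LBA, I count
        status, type_id, start_lba, sectors = struct.unpack_from("<B3xB3xII", sector0, off)
        if type_id != 0:
            entries.append(MbrEntry(slot + 1, status == 0x80, type_id, start_lba, sectors))
    return entries
```

If the entries match gpart's guess and one is marked active, the table is probably fine and the boot code is the next suspect.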

Figure 2. The cfdisk partition list.

Finally Reinstalling Windows

Despite all my good expectations, I once more had a problem. My Windows XP installation CD lacked drivers for the new SATA controller, so it could not see the hard disk. As if that were not enough, it asked me, IMHO mockingly, whether I had a disk with the needed drivers. I politely told the computer that even if I had such a disk, I would not know where to insert it, because my brand-new laptop, quite understandably, lacked a floppy drive. The installer was not completely unreasonable: it told me that a USB floppy drive would do the job, but a USB flash disk or a CD-ROM would not.

This was ridiculous. Looking around for any signs of a Candid Camera crew, I used my old laptop to search for a workaround.

There was a way. A fabulous little program called nLite makes it trivial to prepare a custom Windows installation disk. This disk can include your preferred service pack, hotfixes, device drivers, and even preinstalled software. In addition, nLite can build a fully automated, unattended installation.

Figure 3. The nLite welcome screen.

I made a new installation CD including my SATA drivers and booted the laptop with that.

Figure 4. nLite lets you customize your installation.

After entering the Recovery Console by pressing R, I ran two commands, fixboot and fixmbr, and rebooted one final time. (I was prepared to accept defeat if even this proved futile.)

I don't know if the gods showed me mercy or the little computer gnomes finally gave up. Windows booted just fine with everything in its proper place (some four hours later if I may add).

After this, booting from Knoppix again and reinstalling GRUB was easy.

I've learned four lessons from this incident.

KIVILCIM Hindistan works as a full-time computer security consultant, holds a CISSP, and uses Linux and Free Software as weapons of choice.



Copyright © 2009 O'Reilly Media, Inc.