

2011-12-25

Imaging drives using DD under Linux

I wrote this for work. It is highly transferable information. If I got anything wrong, scream at me. Comment on this post or email me. The address is somewhere on the page, I think.

I used Ubuntu 11.10 for the commands in this document.

Writing zeros to the drive before imaging

Remember, this is done on the drive you are about to make the image of. It will make the final compressed image much smaller, because free space full of zeros compresses down to almost nothing.

Find out the free space on the disk.

>$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             147G  114G   26G  82% /mnt/disk2

Write the zeros

>$ cd /mnt/disk2
>$ dd if=/dev/zero of=big.txt bs=1G count=26

If you get an error saying something like “memory exhausted”, drop the bs (block size) value and multiply the count accordingly. The result will be the same. Don't worry about running past the end of the partition; we are just trying to get loads of zeros out there.

>$ dd if=/dev/zero of=big.txt bs=256M count=104
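A quick sanity check that both spellings write the same amount of data:

```shell
# 26 blocks of 1 GiB vs 104 blocks of 256 MiB, both expressed in MiB:
echo $((26 * 1024)) $((104 * 256))    # prints: 26624 26624
```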

Delete the file immediately after the above command finishes. The zeros are left behind on the drive.

>$ rm big.txt

Write partition to file

From the drive where you want to store the image, run the following command.

>$ dd if=/dev/sdb1 | bzip2 > file.bz2

Write file to disk/partition

When restoring to a partition, first create and format a partition the same as the original. This command only writes to the data portion of the partition and will not overwrite any partition metadata on the drive.

>$ bzip2 -dc file.bz2 | dd of=/dev/sdb1

When writing a whole drive, write directly to the whole-disk device, /dev/sdb for example. This will overwrite the partition table. It should also allow boot information to transfer properly.

>$ bzip2 -dc file.bz2 | dd of=/dev/sdb

Image file integrity check

After making the image file, always do an integrity check before you let the machine go or do anything that will alter the data.

>$ bzip2 -t file.bz2

Errors will be listed.
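A habit worth pairing with the integrity check (my addition, not part of the original workflow): record a checksum of the image right after creating it, so you can re-verify the file any time before restoring. Sketched with a hypothetical demo file name:

```shell
# Demo file standing in for the real image (hypothetical name).
echo "demo image data" | bzip2 > demo_image.bz2

md5sum demo_image.bz2 > demo_image.bz2.md5   # record the checksum now
md5sum -c demo_image.bz2.md5                 # later: verify, prints "demo_image.bz2: OK"
```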

Recovering broken image files

All this does is break the massive file into a huge number of smaller files. Run an integrity check on all of them, and rename the bad ones so they do not match the wildcard. You can then write the remaining files back to your partition using wildcards. The resulting partition will be damaged and must have filesystem repair utilities run on it to fix those errors. Some data may be recovered and some may not.

The number of files depends on the size of the original. A 9.5G file of mine broke into 26,933 files, whose total size was just a bit larger than the original.

>$ bzip2recover file.bz2

Run an integrity check on all the files after they have been “recovered” above.

>$ bzip2 -t rec*file.bz2
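The test-and-rename step above can be scripted. A minimal sketch, using made-up file names to stand in for real bzip2recover output:

```shell
# Demo fixtures (hypothetical names): one valid block, one corrupted block.
echo "good data" | bzip2 > rec00001file.bz2
echo "not bzip2 at all" > rec00002file.bz2

# Test every recovered block; rename broken ones out of the way so a
# later wildcard restore only matches the good blocks.
for f in rec*file.bz2; do
    bzip2 -t "$f" 2>/dev/null || mv "$f" "$f.bad"
done
```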

Notes

File names

I strongly suggest using descriptive file names: not only what is being imaged, but also the type of file system and whether it is a full drive or a single partition.

For example:

<computer id>_<desktop|laptop>_<internal|external>_<part|disk>_<ntfs|ext3>.bz2

Time

All of these steps take a long time.

Machine

This is the sort of thing you need to do with a machine that is not used for anything else. I recommend booting from a thumb drive or CD, because you might accidentally overwrite your OS drive.

The right drive

Do absolutely everything you can to make sure you have the correct source and destination drives. In Ubuntu, I use the Disk Utility tool on the menu to get a GUI description of the drives and which /dev/sd* device each one comes up as. Most of the time you can use the drive size and the partitions on the drive to double check.

The instant you hit Enter on dd, the damage is done. The first part of a drive to be overwritten is the partition table; the first part of a partition is usually directory information.
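Beyond the GUI, a couple of read-only commands can confirm what the kernel thinks each device is (nothing here writes to any drive):

```shell
# Every block device the kernel knows about, sizes in 1 KiB blocks.
cat /proc/partitions

# Stable model/serial names mapped to /dev/sd* (if the directory exists).
ls -l /dev/disk/by-id/ 2>/dev/null || true
```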

Watching things happen

You can use the watch command to keep track of things like disk free space and file size. Open new terminals for these commands. The watch command updates every 2 seconds by default.

>$ watch "ls -lah"

and

>$ watch "df -h"

and

>$ top
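GNU dd itself can also report progress: send it SIGUSR1 and it prints a status line on stderr. A sketch using a throwaway copy (for a real run, signal your actual dd process instead):

```shell
# Throwaway long-running copy; its stderr (where progress goes) lands in dd.log.
dd if=/dev/zero of=/dev/null bs=4k count=100000000 2>dd.log &
DD_PID=$!
sleep 1
kill -USR1 "$DD_PID"   # progress line ("... records in ...") is appended to dd.log
sleep 1
kill "$DD_PID"         # stop the throwaway copy
```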

Outstanding issues

This trick seems to work regardless of the drive's block size, but I haven't done enough testing yet to know for sure.

I haven't yet tried to put an NTFS image onto an ext3 partition. That should be interesting.

The USB cables seem to cause errors, not to mention how slow they are. Stick to real SATA connections.

The stock bzip2 tool is single threaded. gzip is faster, but does make larger files.
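A toy comparison of the size trade-off, run on highly compressible throwaway data (all zeros, much like a zero-filled partition); speed isn't meaningfully measured here, only output size:

```shell
# 1 MB of zeros, compressed both ways.
head -c 1000000 /dev/zero > sample.bin
gzip  -c sample.bin > sample.gz
bzip2 -c sample.bin > sample.bz2
ls -l sample.gz sample.bz2    # bzip2 output is noticeably smaller
```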
