Confused about tape drive compression

2007-12-25 10:40:00

Thanks to:

 The Hermit Hacker <scrappy@hub.org>

 Charlie Mengler <charliem@anchorchips.com>

 "Peter L. Wargo" <plw@ncgr.org>

 "Adams, Chad M CRL" <cadams@crl02.crrel.usace.army.mil>

 mason@ncipher.com

 Jochen Bern <bern@penthesilea.uni-trier.de>

 Harvey Wamboldt <harvey@iotek.ns.ca>

 "Brion Leary" <brion@dia.state.ma.us>

 "Ackerson, Greg" <ackerson_ga@nns.com>

 Greg Sawicki <sawicki@interlog.com>

 Rich Kulawiec <rsk@gsp.org>

 Michael Sullivan <mike@trdlnk.com>

 nobroin@sced.esoc.esa.de (Niall O Broin)

 martin@stavanger.geoquest.slb.com (Martin Oksnevad)

(and a few others - forgive me if I missed you)

who all supplied lots of good info on tape compression.

Our original question was about how much data can actually be stored

on an 8705DX Sun tape drive rated to hold 7GB uncompressed, 14GB compressed.

We are seeing about 27GB of Oracle data going onto a tape which should

only hold 14GB of compressed data.

A number of people explained that the specific data we were backing up in

this instance, Oracle databases, could be compressed to a very small size

because it often consisted mostly of empty space, waiting to be filled with

data.

"Oracle pre-allocates the disk space when you create the table, so if the

tables are relatively empty, even though 'df' shows it using 29gig of disk

space, compressed it will use much much less than that, conceivably next to

nothing..."

Others pointed out that there's "average compression", and then there's

actual compression, and the actual result can be much better or much

worse than the average.

"The "rule of thumb" on data compression is "2 to 1" "on the average".

For some files, compression might leave the file the same size or

even larger. On some ASCII files I've seen compression ratios of

10 to 1 or greater."

There are ways to see this for yourself:

==========================================================

% gzip -9v csh.txt

csh.txt: 71.2% -- replaced with csh.txt.gz

(The text file is 71.2% smaller when compressed.)

% gzip -9v tire1-1.jpg

tire1-1.jpg: 1.0% -- replaced with tire1-1.jpg.gz

(The .jpg photo is already compressed, and can't go much further.)

============================================================

Try this simple test.

% mkfile 1m test

% ls -l test

% compress test

% ls -l test.Z

Note the difference in file size. This is because mkfile fills the

file with zeros, a pattern that is highly compressible.

=============================================================

Two posters clarified the concept of compression:

"The SUN Guy was right in *some* Places and confused in others.

A Tape of 7G uncompressed Capacity holds 7G, either without or

after Compression, PERIOD. If you get 20G onto it, your Compression

Ratio is approx. 3:1 or better."

"For one thing the Sun person told you wrong when they said if it was

compressed 2:1 you could fit 28 GB; a 160 m tape is rated to hold 14 GB

assuming a 2:1 ratio; hence, to get 28 GB you'd need a 4:1 ratio."
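
As a rough check of the arithmetic in these two replies, here is a minimal Python sketch. The function names are hypothetical, chosen only for illustration; the 7 GB native capacity and the 27 GB figure come from the original question.

```python
def required_ratio(data_gb: float, native_gb: float) -> float:
    """Compression ratio needed to fit data_gb onto a tape of native_gb."""
    return data_gb / native_gb


def effective_capacity(native_gb: float, ratio: float) -> float:
    """Amount of uncompressed data that fits at a given compression ratio."""
    return native_gb * ratio


NATIVE_GB = 7  # uncompressed rating of the tape in the original question

print(effective_capacity(NATIVE_GB, 2))          # 14, the advertised 2:1 figure
print(required_ratio(28, NATIVE_GB))             # 4.0, ratio needed for 28 GB
print(round(required_ratio(27, NATIVE_GB), 2))   # 3.86, ratio implied by 27 GB
```

In other words, the tape never holds more than 7 GB of compressed bytes; the "extra" capacity is entirely a property of the data being written.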

One poster pointed out that if an Oracle database compresses to a

very small size, you need to plan for when that database fills up

(good point!):

"However, there is some cause for concern: if the reason your data is

being compressed so well is due to having many empty fields of data, then it

stands to reason should meaningful data start getting imported or changed at

a fast pace, the database as a whole could quickly become a 2-tape deal.

(this happened to us just last week) It's better to plan for this

contingency than continuously expect your DB to fit on one tape."
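
To get a feel for how quickly this can happen, here is a small Python sketch. It assumes the on-disk size stays at the 27 GB from our question (the space is pre-allocated) and only the achievable compression ratio drops as real data arrives. The function name and the sample ratios are illustrative assumptions, not measurements.

```python
import math


def tapes_needed(data_gb: float, native_gb: float, ratio: float) -> int:
    """Number of tapes required for data_gb at a given compression ratio."""
    return math.ceil(data_gb / (native_gb * ratio))


# As the database fills with real data, the achievable ratio drops.
for ratio in (4.0, 3.0, 2.0, 1.3):
    print(f"{ratio}:1 -> {tapes_needed(27, 7, ratio)} tape(s)")
```

Under these assumptions the backup stops fitting on a single 7 GB tape as soon as the ratio falls below roughly 3.9:1.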

It was pointed out that there's a compression FAQ:

"There's a three-part data compression FAQ in Usenet's news.answers that

goes into detail on algorithms, examples, etc."

Another poster suggested an alternative way to back up Oracle

databases:

"One thing that I am wondering about is whether or not you export your

database before backing it up. Over the years, I've found that it's

much better to use the database's native tools to dump it in a simple

format (e.g. ASCII) and then back *that* up because should you need to

recover from a failure of some kind, your chances of successfully

being able to do so from ASCII (which you can work on with standard Unix

tools and re-import using the database's tools) are much better than your

chances of restoring the relatively fragile database itself.

Not only that, if you export the database, you'll get a better idea of

its true size, since only data that exists will get exported."

Another poster explained where the 2:1 ratio comes from:

"The commonly mentioned ratio of 2:1 compression is just

a convenient simplification for marketing purposes that roughly corresponds

to what "typical" users can expect with "typical" data. If the original

data is random, there is no redundancy, so the data cannot be compressed and

the ratio will be 1 (or perhaps, slightly less than 1 if the compression

algorithm imposes some fixed overhead). If the data is very redundant,

then very high compression ratios can be achieved."
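
Both extremes are easy to reproduce. The Python sketch below uses the standard zlib module, which is an assumption made for illustration: the tape drive's hardware compression uses its own algorithm, so the exact numbers will differ, but the behaviour at the two extremes is the same.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, 9))


SIZE = 1024 * 1024  # 1 MiB per sample

random_data = os.urandom(SIZE)   # no redundancy to exploit
redundant_data = bytes(SIZE)     # all zero bytes, maximally redundant

print(f"random:    {compression_ratio(random_data):.4f}")
print(f"redundant: {compression_ratio(redundant_data):.0f}")
```

The random sample should come out very slightly below 1 (the compressor adds a little framing overhead), while the all-zero sample should reach a ratio in the hundreds or more.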

One poster talked about the more typical situation, as opposed to the Oracle

compression issue:

"It's very unusual to get 2:1 compression on tape drives and with

29gb on a 7gb tape you have 4:1 compression.

Data compression is very data (type) dependent but people normally

expect ~30% (20-40%) data compression (1.3:1). If your data is

compressed already (ex. compressed tar files) don't expect to get

any compression at all on tape.

30% data compression on an 8705DX with 160 meter tapes should give

you ~9gb per tape.

With (a more normal, above-average) 45% data compression your 29gb

would just fit on a 20gb tape in an Exabyte 8900 tape drive (Mammoth)."
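
Note that the percentages in this reply appear to mean "extra data per tape" (30% gives 1.3:1), whereas gzip -v reports how much smaller the file became. A minimal Python sketch of the two conventions follows; the function names are hypothetical, and the reading of the poster's percentages is an inference from the 1.3:1 and ~9gb figures quoted above.

```python
def ratio_from_capacity_gain(gain_percent: float) -> float:
    """Ratio when the percentage means extra data per tape (30% -> 1.3:1)."""
    return 1 + gain_percent / 100


def ratio_from_size_reduction(reduction_percent: float) -> float:
    """Ratio when the percentage means how much the file shrank (gzip -v)."""
    return 1 / (1 - reduction_percent / 100)


print(round(7 * ratio_from_capacity_gain(30), 1))    # 9.1 GB on a 7 GB tape
print(round(20 * ratio_from_capacity_gain(45), 1))   # 29.0 GB on a 20 GB tape
print(round(ratio_from_size_reduction(71.2), 2))     # 3.47, the csh.txt example
```

So the 71.2% reported by gzip for csh.txt corresponds to roughly 3.5:1, far better than the "30%" this poster treats as normal for tape.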

Thanks to all who wrote!!!


--
Judith Reed
jreed@appliedtheory.com
(315) 453-2912 x335
