On Nov 30, 2008, at 7:29 AM, Roy Hann wrote:
> What is the current thinking on HIDATA compression?
>
> I have lately been converted to the performance benefits of using
> COMPRESSION=DATA, but COMPRESSION=HIDATA seems unimpressive.
>
> I don't feel like digging into the code just now; is it done with
> Lempel-Ziv-Welch compression (which is what I was once told), or is it
> done with a faster technique now? (Or will it one day?)
Yes, it's an LZW-style symbol-replacement compressor. Or so
the code claims.
I am not impressed by HIDATA. The typical row is too short to
compress well with LZW because of the dictionary overhead.
HIDATA requires an extra leading byte that says whether or
not that particular row could be compressed, so in the worst
case, the table gets bigger rather than smaller. And yes,
it's extremely CPU-intensive. For very long rows in archival
tables it might be a win.
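
To make the short-row problem concrete, here is a rough sketch of a textbook LZW coder with a one-byte per-row flag. It is not the HIDATA code: the fixed 12-bit code width, the 4096-entry dictionary limit, the flag-byte accounting, and the names lzw_codes and stored_size are all assumptions made for illustration.

```python
CODE_BITS = 12                 # assumed fixed code width, not taken from Ingres
MAX_ENTRIES = 1 << CODE_BITS   # dictionary stops growing at 4096 entries


def lzw_codes(data: bytes) -> list[int]:
    """Textbook LZW: emit one dictionary code per longest known prefix."""
    table = {bytes([i]): i for i in range(256)}
    codes = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            if len(table) < MAX_ENTRIES:
                table[candidate] = len(table)
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def stored_size(row: bytes) -> int:
    """Bytes needed for one row: a flag byte plus the smaller of raw or LZW."""
    packed = (len(lzw_codes(row)) * CODE_BITS + 7) // 8
    # The flag byte is paid whether or not compression helped.
    return 1 + min(packed, len(row))


short_row = b"1042|Hann|Roy|UK"   # 16 bytes, no repeated substrings
long_row = b"A" * 2000            # long, highly repetitive row
print(len(short_row), stored_size(short_row))
print(len(long_row), stored_size(long_row))
```

In this sketch every code costs 12 bits, so a row with no repeated substrings packs to 1.5 times its size, is stored raw, and still pays the flag byte. Only the long repetitive row comes out ahead.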
I would like to see a simpler run-length compressor that
is more intelligent than standard trailing-blank compression.
I do have a code candidate, although it was written for
hash-join spill file compression rather than DMF row
compression. With a bit of fiddling to add NULL inspection,
which can be a big win when a table is defined WITH NULL,
the code might work OK for row compression. I have been
meaning to fool with that at some point.
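
For illustration only, a minimal sketch of the kind of run-length coder meant here: one that collapses runs anywhere in the row, not just trailing blanks. This is not the hash-join spill code mentioned above. The control-byte layout, the names rle_encode and rle_decode, and the idea that a NULL column's value bytes are zero-filled are all assumptions.

```python
MIN_RUN = 3      # runs shorter than this are cheaper as literals
MAX_RUN = 130    # 128 + (MAX_RUN - MIN_RUN) must fit in one byte
MAX_LIT = 128    # literal control bytes are 0..127, meaning 1..128 bytes


def _flush(out: bytearray, literals: bytearray) -> None:
    """Write pending literal bytes as: (count - 1), then the bytes."""
    if literals:
        out.append(len(literals) - 1)
        out += literals
        literals.clear()


def rle_encode(row: bytes) -> bytes:
    out = bytearray()
    literals = bytearray()
    i = 0
    while i < len(row):
        run = 1
        while i + run < len(row) and row[i + run] == row[i] and run < MAX_RUN:
            run += 1
        if run >= MIN_RUN:
            _flush(out, literals)
            # Control byte >= 128 means "repeat the next byte".
            out += bytes([128 + run - MIN_RUN, row[i]])
            i += run
        else:
            literals.append(row[i])
            i += 1
            if len(literals) == MAX_LIT:
                _flush(out, literals)
    _flush(out, literals)
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        control = data[i]
        if control < 128:
            count = control + 1
            out += data[i + 1:i + 1 + count]
            i += 1 + count
        else:
            out += bytes([data[i + 1]]) * (control - 128 + MIN_RUN)
            i += 2
    return bytes(out)


# Hypothetical 50-byte row: a blank-padded name, a zero-filled NULL
# column (assumed representation), then a short code.
row = b"Hann" + b" " * 36 + b"\x00" * 8 + b"UK"
packed = rle_encode(row)
print(len(row), len(packed))
```

Trailing-blank compression would not help this row, because the blanks and the zero-filled column sit in the middle. A real NULL-aware version would presumably look at the null indicator and skip the value bytes outright rather than rely on them being zeros.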
Karl