Gzip on a PCI card
steve writes "The German tech news site heise.de is reporting here (in German, of course) about a PCI card developed by the University of Wuppertal and Vigos AG, being shown at CeBIT, which does Gzip compression in hardware, thus freeing the CPU to do other tasks. The PCI card can compress 32MB/sec, which is more than enough to compress a 100Mbit LAN in realtime. A future version will do 64MB/sec. The article mentions that this will be of particular interest for web servers. The card should be on sale by the end of the year."
Useful for netbackups too (Score:5, Insightful)
Re:Useful for netbackups too (Score:5, Informative)
-Baz
Re:Useful for netbackups too (Score:4, Insightful)
It would be very cool if the card supported multiple compression algorithms. Considering that GNU tar supports bzip2 as well, this would definitely be useful.
Re:Useful for netbackups too (Score:2, Interesting)
Re:Useful for netbackups too (Score:2)
Re:Useful for netbackups too (Score:1)
Zipping backups? (Score:2)
One of the big problems with compressed backups, particularly if you are tar-gzipping something, is that any damage/error in the file can render the entire archive unusable.
Hopefully, most people are into tar-clustering their files (that is to say... tar'ing a large backup as several grouped archives, then gzip'ing each grouped archive), so an error only hurts one cluster. You might save a lit
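For what it's worth, here is a rough Python sketch of the per-file idea (the paths are made up); the point is just that each file gets its own gzip stream, so one bad byte only corrupts one member instead of the whole backup:

import gzip
import shutil
import tarfile
from pathlib import Path

SOURCE_DIR = Path("/home/example/data")        # hypothetical source tree
ARCHIVE = Path("/backup/example-backup.tar")   # plain tar of already-gzipped members

with tarfile.open(str(ARCHIVE), "w") as tar:
    for src in sorted(SOURCE_DIR.rglob("*")):
        if not src.is_file():
            continue
        gz_path = Path(str(src) + ".gz")
        with open(src, "rb") as f_in, gzip.open(gz_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)    # each file compressed on its own
        tar.add(str(gz_path), arcname=str(gz_path.relative_to(SOURCE_DIR)))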
Re:Zipping backups? (Score:2)
Re:Zipping backups? (Score:2)
bandwidth saving (Score:5, Insightful)
i love the idea of a hardware based gzip... but i'd start by educating the software users on the cost vs benefit ratio of their existing configuration... i always seem to find that those who don't know what they're doing are the ones that have it set to maximum compression
interresting (Score:2)
How would this be implemented in Unix? Would there be a device to stream to, and a replacement for the gzip command and the compression libraries?
Hardware Gzip (Score:2)
As an aside, this could of course easily be done with an FPGA PCI card, one that can do anything you want. Reprogram it to accelerate SETI@home or stick some routines used in Quake into it. Much more versatile.
The only problems are standardisation and convincing developers to use them.
Re:Hardware Gzip (Score:5, Funny)
Maybe I can even make some money on Intel, as they were in clear violation of my patent with their arithmetic coprocessor for use with the 80386SX family of microprocessors.
Re:Hardware Gzip (Score:2)
Re:Hardware Gzip (Score:1)
Don't get out much, do you?
This is relatively simple to do, and most of the major FPGA vendors offer "PCI development kits" which allow you to develop your own PCI card using their FPGAs. They're quite expensive, though, as they're aimed at OEMs.
The biggest problem with this is that the compilers are proprietary and expensive.
That would be why FPGA vendors like Altera [altera.com]
A bzip2 version would be nice ... (Score:5, Insightful)
Browser Compression (Score:4, Informative)
It'd be nice if I could convince my boss to get some of these for us, but our CPU usage is pretty low right now with the mod_gzip module installed, so it'd be an unnecessary luxury at this point for us.
Re:A bzip2 version would be nice ... (Score:5, Informative)
gzip works with streams, producing output as it consumes input. OTOH, bzip2 treats the input as blocks, so it needs a whole block before it produces any output. Similarly, the client needs to receive a whole block of data before it can even start rendering the page. The bzip2 man page says the default block size is 900,000 (!) bytes. So while bzip2 may save bandwidth, it adds a lot of latency.
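You can see the difference with Python's zlib/bz2 bindings (just an illustration of the streaming behaviour, nothing to do with the card's own interface):

import bz2
import os
import zlib

data = os.urandom(256 * 1024)                     # 256 KB of test input

gz = zlib.compressobj(6, zlib.DEFLATED, 16 + 15)  # wbits=16+15 selects the gzip wrapper
bz = bz2.BZ2Compressor(9)                         # 900,000-byte blocks

print(len(gz.compress(data)))   # gzip: compressed bytes start coming out right away
print(len(bz.compress(data)))   # bzip2: 0 -- nothing until a full block (or a flush)

That buffering is exactly the extra latency a browser would see with a hypothetical bzip2 content encoding.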
Re:A bzip2 version would be nice ... (Score:5, Interesting)
Gzip works with blocks of data too, but the block size is 32KB instead of nearly 1MB, and it is not nearly as CPU-intensive as bzip2, which is why it appears to produce a continuous stream of compressed data (even if, strictly speaking, it doesn't).
Gzip just seems to be a well-balanced compromise between resources and resulting compression ratio, plus it is Free Software (hint: bzip2 is Free Software too, but Rar isn't).
That's a 32K window, not block (Score:2)
IIRC it hashes the three bytes at its current position, looks that hash up against positions from the previous 32k of input, and then checks the hash bucket for how far it can extend a match beyond the initial 3 bytes.
The BWT actually sorts every position in the block. It's not streamable in any significant way.
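Roughly, that match search looks like this toy Python sketch (illustrative only; real zlib uses hash chains, lazy matching and a 258-byte match cap):

WINDOW = 32 * 1024   # deflate's sliding window
MIN_MATCH = 3

def find_match(data, pos, index):
    """Find the longest earlier match within 32 KB for the bytes at pos, then record pos."""
    best_dist, best_len = 0, 0
    key = data[pos:pos + MIN_MATCH]
    for prev in index.get(key, ()):
        if pos - prev > WINDOW:
            continue                      # too far back, outside the window
        length = 0
        while pos + length < len(data) and data[prev + length] == data[pos + length]:
            length += 1
        if length > best_len:
            best_dist, best_len = pos - prev, length
    index.setdefault(key, []).append(pos)
    return best_dist, best_len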
Complete, Utter, Comprehension! (Score:1)
A joint venture between the University of Wuppertal and the Hagen-based Vigos AG is showing the prototype of a "GZIP Accelerator Board" at CeBIT (hall 11, D26). The PCI card takes the time-consuming compression off the processor and, in its current version, is already said to squeeze 32 MByte per second. That is enough to compress the traffic of a 100-MBit link in real time; thanks to a modular design, later versions should reach up to 64
Re:Complete, Utter, Comprehension! (Score:3, Informative)
Comparison (Score:4, Interesting)
dd if=/dev/urandom of=32m bs=1024k count=32 ; time gzip 32m
P4-1.8Ghz:
real 0m4.428s
user 0m4.220s
sys 0m0.170s
AthlonXP2200+
real 0m3.579s
user 0m3.310s
sys 0m0.160s
So 32MB/s sounds pretty good to me.
Not a good comparison (Score:3, Interesting)
Note that this isn't necessarily a bad thing; at the expense of maybe 5-10% less compression, you're getting that high throughput. Depending on your task, it's a good trade-off.
Re:Not a good comparison (Score:4, Interesting)
P4-1.8Ghz: gzip -9
real 0m4.437s
user 0m4.200s
sys 0m0.210s
P4-1.8Ghz: gzip -1
real 0m4.366s
user 0m4.130s
sys 0m0.200s
AthlonXP2200+: gzip -9
real 0m3.387s
user 0m3.160s
sys 0m0.210s
AthlonXP2200+: gzip -1
real 0m3.427s
user 0m3.200s
sys 0m0.170s
The really funny part is that I ran the Athlon one several times and the gzip -9 was always just ever so slightly faster than the gzip -1 version.
Maybe random data is not the best for testing the different compression levels though, since if it is truly random it cannot be compressed no matter how hard you try.
Even if this is not a perfect(or even reasonable) "apples to apples" comparison, it is a good end-to-end system level comparison. While it may not be "4x faster than a 2Ghz CPU", when building a system that _needs_ to do compression, adding this card would _effectively_ boost my CPU speed.
Re:Not a good comparison (Score:2, Interesting)
It clearly is a flawed test to compare the CPU loads of -9 and -1 but it is an excellent example that IO is often the bottleneck.
Break out the ramdisk! (Score:2)
Merlin? Mind running those tests one more time, this time to a ramdisk?
Re:Break out the ramdisk! (Score:1)
It's on an athlon Tbird 1Ghz
time gzip -9 1m
real 0m2.403s
user 0m0.180s
sys 0m0.020s
time gzip -1 1m
real 0m1.813s
user 0m0.180s
sys 0m0.010s
yeah, I know, pretty useless.
Now pretending like I can multiply this by thirty-two to get the rate for 32MB... 76.896s for gzip -9... hmm, that can't be right. ah whatever. Someone with more ram than I have can figure it out.
Re:Comparison (Score:1)
Try saving the data to a file first, and then gzipping that.
/August
Re:Comparison (Score:2)
please type then type then type
Re:Comparison (Score:1)
/August, better get some coffee...
dd yields much smaller file here (Score:1)
Re:Comparison (Score:1)
If you want to test compression, try something like large log files, which usually have a lot of repetition.
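Quick illustration with Python's zlib (exact numbers will vary, but the gap between repetitive and random input is always dramatic):

import os
import zlib

log_like = b"GET /index.html HTTP/1.0 200 1043\n" * 30000  # ~1 MB of repetitive "log" text
noise = os.urandom(len(log_like))                           # ~1 MB of random bytes

print(len(zlib.compress(log_like, 9)))   # a few KB
print(len(zlib.compress(noise, 9)))      # essentially the input size, or a hair more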
Re:Comparison (Score:2)
translation (Score:1)
GZIP compression in hardware: A joint venture between the University of Wuppertal and the Hagen-based Vigos AG is showing the prototype of a "GZIP accelerator board" at CeBIT (hall 11, D26). The PCI plug-in card takes the time-consuming compression off the processor and, in its current version, can already compress 32 MByte per second. That is enough to compress the traffic of a 100-MBit link in real time; through a modular design
Re:translation (Score:2)
what about decompression??? (Score:1)
Re:what about decompression??? (Score:1, Informative)
Re:what about decompression??? (Score:1)
I would imagine this card would be aimed at the server market, where the application is in serving dynamic data to a large number of clients. By compressing that data at the server side, the effective network bandwidth can be increased. The hit for real-time decompression is less for the client, since they are only decompressing one set of data, while the server needs hardware acceleration as it's compressing many data sets.
Another potential applicat
Re:what about decompression??? (Score:1)
Most likely any replacement for libz.so would try to use the hardware as much as possible, offloading compression and decompression. Ideally it'd be configurable by the administrator.
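Nobody outside Wuppertal knows what the driver interface will look like, so the device path and driver hook below are pure invention; the sketch only shows the "use the card if it's there, otherwise plain zlib" dispatch such a library replacement would presumably do:

import os
import zlib

HW_DEVICE = "/dev/gzip0"   # hypothetical device node for the accelerator

def _compress_on_card(data, level):
    raise OSError("placeholder: no real driver API is public")  # made-up hook

def gzip_compress(data, level=6):
    if os.path.exists(HW_DEVICE):
        try:
            return _compress_on_card(data, level)   # offload to the card
        except OSError:
            pass                                    # card missing/busy: fall back
    return zlib.compress(data, level)               # ordinary software path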
How cute but useless. (Score:5, Interesting)
On a Xeon 2.8GHz, I just got 71 MB/s for gzip.
What's the use for such hardware then?
Plus it will eat the PCI bus, because data has to go out of memory to the processing card, back to memory, then to the network card. You triple the traffic on the PCI bus. (Not true if the compression is embedded in the network card.)
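Back-of-the-envelope numbers for that, assuming an ideal 32-bit/33 MHz PCI bus and three trips across it (which is already optimistic about protocol overhead):

pci_peak = 33000000 * 4          # ~133 MB/s theoretical peak for 32-bit @ 33 MHz
card_rate = 32 * 1000 * 1000     # 32 MB/s of data being compressed
trips = 3                        # RAM -> card, card -> RAM, RAM -> NIC
print(card_rate * trips / pci_peak)   # ~0.73 of the whole bus, before disks get a word in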
Re:How cute but useless. (Score:3, Insightful)
Xeons aren't THAT cheap, but hey, 1GHz machines (or even 500MHz machines) with this card would easily match your Xeon once the 64MB/s cards come out. Or was that 64Mb/s. Well, you get the point.
As for the bus latency, well... you are right, it'd be better in the network card, but remember, that's layer 1 and 2 stuff you'd be meddling with, where gzip would end up in layer 4. Layer 3 is tcp/u
Re:How cute but useless. (Score:1)
Re:How cute but useless. (Score:2)
I know there's a 7 layer one, which is what I think you are describing.
Re:How cute but useless. (Score:2)
General-purpose CPU power is still more expensive than specialized processing for compute-heavy tasks. High-level gzip compression still eats CPU on multi-GHz machines.
Besides, that's not the trend at all. The trend
Re:How cute but useless. (Score:2)
Re:How cute but useless. (Score:1)
You have an important point... (Score:5, Interesting)
When the PCI bus is busy, other transfers the CPU needs will also stall. And the PCI bus is much slower than the FSB anyway.
I think what we need to push distributed computing further is to alter the RAM and DMA channels. There should be many physical channels to the RAM, capable of simultaneously reading/writing different parts of it. As in: if the RAM can output 200 MB per sec, 16 devices could attach themselves to the RAM via maybe EDMA (enhanced DMA?) and simultaneously read at 200 MB/s each. This might be done by:
(1) Altering the addressing logic in the memory ICs, maybe putting in 16 different addressing systems and multiplying their pins x16, then having an external matrix, more advanced than the 802x DMA chip, to allow simultaneity.
(2) Separating the addressing schemes of each chip, so an OS kernel could smartly put data of important processes in the right chip to be worked on by external devices, again with an external matrix for the address multiplexing.
This way such a PCI gzip device could have its PCI address space, IRQ, as well as an (EDMA?) address, which it would use to access the data to gzip and put back into the RAM, at full speed, without taking up RAM bandwidth, PCI bandwidth, IRQs or the CPU at all.
AGP has achieved this by separating the AGP channel from PCI, but it still uses dedicated memory rather than smartly-shared memory. I understand multiprocessor systems technically do the same thing, but in this case we are treating the external devices as complete slaves, like the GPU, for dedicated purposes only, and I'm emphasizing the smart sharing of memory that doesn't exist in multiprocessor systems either. In this scheme, one could add CPU cards, maybe hot-plugged, and have an insta-multiprocessor system, or use it to offload kernel compilation, zipping, 3D transformations, or even take user tasks while the main CPU just works in supervisor mode.
PCI details and Addressing tricks (Score:1)
On current PCI architectures, you already have that implemented.
Here is the description of the Serverworks chipset [serverworks.com] (Scroll down to the drawings) Intel's (e7500/7501) is very similar, in architecture at least.
The memory subsystem is one leg of the northbridge (the center of the chipset); two channels let the chipset double the bandwidth, though not improve the latency.
The CPU(s) sit on another bus.
The PCI busses are interconnected through hubs and specialised links. With this kind of architecture, you can reac
Re:How cute but useless. (Score:1)
now if the limit of your use of gzip consists of sitting at a command prompt and typing
tar xvfz pr0n.tar.gz
then you are correct
Re:How cute but useless. (Score:1)
What kind of line are you serving if you want to do 10s of MB/s? If you do, don't you have a load balancer? What is the average throughput of an individual server then?
A hardware DB offload engine would definitely be more impressive, and I think much more useful.
Re:How cute but useless. (Score:1)
yes, we do run a load balancer with multiple servers
Reconfigurable (Score:5, Interesting)
The best idea would be to make the chip an FPGA, not a specially designed processor. Then you could load in different chip designs for whatever was currently needed. Need to do RSA encryption? The board reconfigures the FPGA for it. Same goes for DivX compression, gzip, SETI@Home, etc. FPGAs take a few milliseconds to reconfigure, but when they operate as a dedicated signal processor they can leave a general-purpose processor in the dust - leaving the main CPU to run the other apps, the desktop, etc.
Check out the IEEE archives and journals, searching for "adaptive computing" or "reconfigurable computing".
KingPrad
Re:Reconfigurable (Score:1)
Re:Reconfigurable (Score:1)
Re:Reconfigurable (Score:2, Interesting)
Re:Reconfigurable (Score:2)
Only useful for dynamic sites? (Score:2, Interesting)
At any rate, most of the visitors to my site rarely get the gzipped pages, as their browsers don't seem to support it.
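Server-side gzip is always negotiated, and the core of it is roughly this check (a bare-bones Python sketch; real mod_gzip also has user-agent exclusions, size thresholds and so on):

import gzip

def maybe_gzip(body, accept_encoding):
    """Compress the response only if the client said it can handle gzip."""
    if "gzip" in (accept_encoding or "").lower():
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}

# e.g. maybe_gzip(page_bytes, request_headers.get("Accept-Encoding", ""))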
Cool (Score:5, Informative)
Another thing about gzip is that it is asymmetric: decompression is much faster than compression. Again this is a nice feature, because most files will be decompressed many times but compressed only once. Thus, for instance, all man pages are stored in gzipped form and decompressed on demand.
But I can't see the point of implementing it in a PCI card. Wouldn't it be better to integrate it with either the processor or the network interface?
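The asymmetry mentioned above is easy to check with zlib from Python (pick any reasonably large text file; the exact ratio varies, but decompression is typically several times faster):

import time
import zlib

with open("/usr/share/dict/words", "rb") as f:    # any large-ish text file will do
    data = f.read()
packed = zlib.compress(data, 9)

t0 = time.perf_counter(); zlib.compress(data, 9);  t1 = time.perf_counter()
t2 = time.perf_counter(); zlib.decompress(packed); t3 = time.perf_counter()
print("compress   %.3fs" % (t1 - t0))
print("decompress %.3fs" % (t3 - t2))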
Re:Cool (Score:2)
I think that if one were planning to dedicate hardware to the task of compression, one would decide that space should take precedence over speed. Performance is the reason that hardware gets dedicated to a task. Why design something to be efficient with your CPU, and then solve the efficiency problem with dedicated CPUs?
And the algorithm is
Not quiet yet... (Score:5, Informative)
I'm assuming one is referring to something that will work with mod_gzip. That may be fine and dandy, but I just recently had to disable mod_gzip on my server. You can blame Microsoft.[1] It seems that both IE 5.5 and 6.0 have nasty little "sometimes" bugs[2] where they won't know what to do with gzipped content. I tried to disable it by user-agent header with no luck. If anyone else has some good pointers, or perhaps even a link to a patched version of mod_gzip that avoids those two bugs, I would appreciate it.
[1] No, really. This isn't a troll. They even admit the bugs.
[2] Microsoft Knowledge Base Articles: Q313712 IE 5.5 [microsoft.com] Q312496 IE 6.0 [microsoft.com]
Re:Not quiet yet... (Score:3, Funny)
Re:Not quiet yet... (Score:2)
For example, my own tests have revealed that Flash is installed in 70% or less of browsers that frequent one of these sites. That's 30+% of your users that you'd be locking out! That's also quite a bit smaller than the 93% that I've seen Macromedia claim; I wonder w
Re:Not quiet yet... (Score:2)
Re:Not quiet yet... (Score:2)
Re:Not quite yet... (Score:2)
Re:Not quiet yet... (Score:1)
This might sound counter-intuitive, but 2k of spaces compresses *very* well (to about 14 bytes, according to a quick test).
Of course, it's always a shame to have to put in a hack like this to get around IE's "features" (after being in as many versions as this has, it's hard to think of it as just a bug anymore) in the fi
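Easy to verify (the exact byte count depends on the compression level and whether you count the zlib/gzip header, but it's a couple of dozen bytes at most):

import zlib
print(len(zlib.compress(b" " * 2048)))   # 2048 spaces collapse to a dozen-odd bytes plus header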
Re:Not quiet yet... (Score:1)
If the problem is with an MS dll and MS patches it, don't expect mod_gzip to work around it when your clients are the ones with the malfunctioning software.
Re:Not quiet yet... (Score:2)
It's still necessary to work around the malfunctioning software, since many of those users won't update for a long time.
Moo (Score:2, Informative)
This thing is going to sit on the PCI bus? Isn't that where your hard drives are too? On older computers with a 33 MHz bus, compression running flat out would keep the hard drive from receiving any of the data. So it would actually have to compress at a slower rate, unless it caches everything. Even at 133 MHz, the hard drive would be both reading and writing while trying to compress, and that's without worrying about swap.
Re:Moo (Score:2)
Putting it on the PCI makes sense from a research perspective - later implementations may be in other places, say on the network card or the disc controller. Danger, but fun.
Re:Moo (Score:3, Informative)
That's Bytes, as in 8 bits. A 100 Mbit/sec NIC is only 12.5 MBytes/sec.
*most* new PCs have a 33 MHz PCI bus (Score:2)
Re:Moo (Score:1)
Feel free to flame me, but I think the motherboard's days should be numbered.
Re:Moo (Score:2)
Warning (Score:1, Funny)
Mod_Gzip on a card.... (Score:2)
Why? Gzip already uses minimal processor time...and many [netcraft.com] sites [netcraft.com] already [netcraft.com] use [netcraft.com] Mod_Gzip [schroepl.net]...
So, as far as I'm concerned, unless the Mod_Gzip project supports this hardware, it's not gonna float...
Re:Mod_Gzip on a card.... (Score:1)
i think the definition of "minimal" might be useful here. if you have access to a relatively high-volume web server (something on the order of 1mil+ hits per day), take a look at MRTG graphs with and without mod_gzip running ... you might be shocked.
if this card is in the sub-$200 range, i'd outfit my server farm with it immediatel
Gcc (Score:2)
Re:Gcc (Score:1)
Ok, but (Score:1)
Oh... wait...
Sorry about that; my computer date was set for January 3rd, 1987... let me get out my soldering iron and correct it
Sun machines use PCI busses, too. (Score:3, Insightful)
I think it's a little naive to say "Oh, my 1000-hits-a-day web box, running on a cheap 686, wouldn't benefit from this, so it must suck." Hey, don't get mad! You said it! :P
Here is a thought! (Score:2, Insightful)
a sure way to profit... (Score:1)
that does both ssl/cram-md5/AES/etc.. and gzip/zlib/other compression
I can see my clients salivating already (saving the processors for those
well except for the IO-bound jobs...
Very interesting, but a little late (Score:3, Interesting)
The card is being sold on an OEM basis to manufacturers of load balancers and SSL accelerators. These boxes front-end multiple Web servers and have very high performance requirements. Also, the CPU has plenty of other work to do, for example TCP/IP processing. This is the application that needs hardware acceleration.
For a low performance site, mod_gzip is fine. But, if you have a busy site with hundreds of Web servers, you don't want to go around installing mod_gzip hundreds of times. It is a lot cheaper to buy a load balancer with gzip hardware acceleration.
bzip2 is irrelevant here as IE and Netscape would not understand bzip2 encoding anyway. But they understand gzip just fine (unless you have a version that is many years old).
Monish Shah
CTO, Indra Networks
www.indranetworks.com
Why use Gzip? (Score:2)
Using just the standard options, here are my results:
Original file: 732,921,856 bytes
.ZIP compressed: 725,244,234 bytes
.CAB compressed: 719,244,234 bytes
.RAR compressed: 719,855,409 bytes
.TAR compressed: 732,928,000 bytes
.BZ2 compressed: 732,884,505 bytes
.LHA/.LZH compressed: 725,886,696 bytes
.BH compressed: 725,251,468 bytes
.tar.gz compressed: 725,254,634 bytes
Re:Why use Gzip? (Score:2)
Also, why on earth do you have
And finally, the use of this card is for compressing web pages. That's plain text of about 5 to 30k. Why on earth are you comparing 730 megs of binary data (quite possibly already compressed, judging by the bad results from everything) to make your point?
Re:Why use Gzip? (Score:2)
Why not go with the superior cross-platform compression? With a little work they could have done just that. That was the point.
Re:Why use Gzip? (Score:2)
Also, your comparison is flawed; looking at the compression factors you achieved (and the file size), I'm guessing that what you're trying to compress is already compressed (a DivX file?).
Re:Why use Gzip? (Score:2)
The real answer to your question, though, is: #1) web browsers know how to decode gzip, not rar, so gzip is useful for a web server sending web pages while rar is useless for that purpose, and #2) somebody mentioned that gzip is designed to work with a stre
Re:Why use Gzip? (Score:2, Insightful)
a. It appears (as someone mentioned elsewhere) that you are compressing an already compressed file
b. You have not specified the options used when compressing, which can seriously alter the result
c. You have thrown in TAR, which can be overlooked; however, tarring a single file before gzip-compressing it is simply a waste of time unless there is some particularly pertinent permissions/directory-structure data you want to preserve
Re:Why use Gzip? (Score:2)
Of course it also largely depends on what it is you are compressing. Let's not forget that "real" compression is, after all, impossible: by a simple counting argument, any lossless compressor that shrinks some inputs must expand others.
Much better: Reprogrammable Co-processors (Score:4, Informative)
A lot of computing records over the years have been set by vector computers or other specialized hardware. Put that power on a PCI card like this gzip solution, make the algorithm reprogrammable and reconfigurable in addition, and you get: the Mitron co-processor on a PCI card [flowcomputing.com].
have been traditional areas for these kinds of devices, but with the new FPGAs and PCI Express on the horizon I can see them becoming usable for even more specialized applications. [idi.ntnu.no]
Here is a crude translation of an article in Swedish ( Source Elektroniktidningen [elektroniktidningen.se])
FPGA enhances PC
You don't have to be a logic designer to make use of FPGA chips. With a normal PCI card and a compiler from the startup Flow Computing in Lund, programming in Flow's dialect of C is enough.
- We can make a normal PC do calculations that would otherwise have needed supercomputers or large Linux clusters, said Josef Macznik of Carlstedt Research & Technology, a company that has invested in and works together with Flow Computing.
The main idea is parallelism. That means the PC's hardware has to be extended in some way, since normal PC processors work sequentially and normal programs are written to be executed that way.
Flow has chosen to use normal PCI cards. The cards are equipped with an FPGA chip from Xilinx with two million gates, but the size of the chip can be selected depending on requirements, according to Josef Macznik.
The company's secret lies in the compiler. Software has to be written in Flow's own variety of C, and the compiler decides which processes gain the most from parallel execution, configuring the FPGA for maximum efficiency.
- The user doesn't see the FPGA chip and doesn't really have to know what kind of hardware is on the card. We are targeting programmers - that's where the market is, said Josef Macznik.
Flow's solution is currently used by a bioinformatics company in Lund. But according to the company, the technology can be used anywhere the computing power of a PC needs to be multiplied through parallelism, and where the effort of adapting programs to the special variety of C is worthwhile.
Re:no bz2 (Score:1)
Re:no bz2 (Score:1)
Off-topic, but anyway, it would be nice to see dedicated MPEG-2 encoders aimed at consumers. Something like that would be great for sticking in a MythTV box. Right now you need a fast CPU to do anything useful on those, purely because of the encoding.
Re:no bz2 (Score:1)
Re:no bz2 (Score:2)
Also, some video cards include mpeg2 decoding
Re:no bz2 (Score:2)
Now, as for dedicated encoders, there must be some around. Like here, [dealnews.com] or maybe high speed here [jmfiberoptics.com] (just a quick Google turned these up).
Of course, another thing that can be improved is I/O bandwidth and compression tweaks. As storage gets cheaper and cheaper you can use HuffYUV lossless,
Re:ahem (Score:2)
But of course, i may be wrong...
Re:huh? (Score:2)