To compress?

If Booko looks up 1000 book prices in a day, it makes 1000 queries to each of 33 online stores. How much quota could be saved by using HTTP compression? Picking a page such as the one below from Book Depository, I did a quick test. First, I fired up the Booko console, set the URL, grabbed the page and checked how big it is:

>> url = "http://www.bookdepository.co.uk/browse/book/isbn/9780141014593"
>> open(url).length
=> 38122

That’s the length in characters, which is ~ 37 KB. Let’s turn on compression and see if it makes much difference:

>> open(url, "Accept-encoding" => "gzip;q=1.0,deflate;q=0.6,identity;q=0.3").length
=> 7472

That’s around 7 KB, which is about 20% of the non-compressed version.
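Just to be sure that 7 KB really is the whole page and not a truncated one, the response can be inflated with Ruby's Zlib. This is a quick sketch which assumes, as the numbers above suggest, that open-uri hands back the raw gzipped bytes rather than decompressing them itself:

require 'open-uri'
require 'zlib'
require 'stringio'

url = "http://www.bookdepository.co.uk/browse/book/isbn/9780141014593"

# Grab the raw gzipped body (roughly 7 KB)
gzipped = open(url, "Accept-encoding" => "gzip;q=1.0,deflate;q=0.6,identity;q=0.3").read

# Inflate it; the result should be the full ~37 KB page
page = Zlib::GzipReader.new(StringIO.new(gzipped)).read
puts "#{gzipped.length} bytes gzipped, #{page.length} bytes inflated"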

So, 1000 books from 33 shops is 33,000 requests per day. If they were all 37 KB (of course they aren't, but let's play along) we get around 1,200 MB of data, or 1.2 GB. If they were all compressed down to 7 KB, that would come to around 235 MB. Using compression means there's a trade-off: higher CPU utilisation. However, the price grabbers spend most of their time waiting for data to be transmitted, so any reduction in that time should yield faster results with significantly lower bandwidth usage.
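For the curious, the arithmetic works out like this using the exact byte counts from the console session above:

requests_per_day = 1000 * 33                                  # 1000 books x 33 shops

uncompressed_mb  = requests_per_day * 38122 / 1024.0 / 1024   # => roughly 1,200 MB
compressed_mb    = requests_per_day * 7472  / 1024.0 / 1024   # => roughly 235 MB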

No prizes for guessing the next feature I’m working on adding to Booko 😉
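If you're wondering what that might look like, here is a rough sketch only, not Booko's actual grabber code: request gzip from the store, and fall back gracefully for stores that don't compress.

require 'net/http'
require 'uri'
require 'zlib'
require 'stringio'

# Hypothetical helper for illustration - not Booko's real price grabber.
def fetch_page(url)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port) do |http|
    http.get(uri.request_uri, "Accept-Encoding" => "gzip;q=1.0,identity;q=0.3")
  end

  if response["Content-Encoding"] == "gzip"
    # The store compressed the body, so inflate it before parsing
    Zlib::GzipReader.new(StringIO.new(response.body)).read
  else
    response.body   # the store didn't compress; use the body as-is
  end
end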

Update: Thanks to Anish for fixing my overly pessimistic calculation that 1,200 MB == 1.2 TB.