All posts by Dan Milne

Summer fixes

A couple of little fixes, the largest of which is that search is no longer powered by AJAX. This means the Back button will work correctly after you’ve searched for a book, gone and viewed one, then clicked back. Surprisingly, Safari actually did clever stuff to make this work – clicking back in Safari would take you back to the search results – but in every other browser you’d be taken to an unexpected page, usually the front page.

Secondly, I’ve added HTML5 attributes to various fields, as described by Mark Pilgrim. The most obvious change will be for Safari and Chrome users (at least in the Mac versions), where the search box will have rounded corners and the like. Other fields, such as the email entry on login and register and the OpenID field, will now be easier to use on the iPhone, with an alternative keyboard making that data easier to enter.
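For the curious, here’s a rough sketch of the markup involved – this isn’t Booko’s actual view code, and the field names are made up. Rails 3’s form helpers emit the new HTML5 input types directly (on earlier versions you’d just set the type attribute by hand):

search_field_tag :q          # => <input type="search" ...> rounded corners in Safari/Chrome
email_field_tag  :email      # => <input type="email" ...>  iPhone shows an @ key
url_field_tag    :openid_url # => <input type="url" ...>    iPhone shows a URL keyboard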

Let me know what you think.

Google / Yahoo user?

Logging into Booko just got easier. If you have a Google or Yahoo account, just hit the appropriate button and you’re in, registration included.

Turns out, this was super easy to add to Booko since it already does vanilla OpenID logins. Basically, Booko just fills in the OpenID URL with “https://www.google.com/accounts/o8/id” or “http://yahoo.com/” and OpenID Directed Identity does the rest. You could also just type those URLs in yourself and it’ll work just the same.  Sweet!
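In code, that’s about all there is to it – a sketch only, with the hash and method names invented for illustration:

# Map each login button to its directed identity endpoint, then hand the
# URL to the same OpenID flow a typed-in identifier would go through.
PROVIDER_ENDPOINTS = {
  :google => "https://www.google.com/accounts/o8/id",
  :yahoo  => "http://yahoo.com/"
}
begin_openid_login(PROVIDER_ENDPOINTS[params[:provider].to_sym])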

New Booko features

Hey everyone, Booko has some new features.

User accounts – you can now create an account to save your cart. Booko also accepts OpenID, so you can use your OpenID provider to log in to Booko.

If you have a Booko account, you can now create additional lists for keeping track of books. For example, you could create a wish list, or a scifi list, or a kids’ book list or a list of all Tintin books. Just go to the “Manage Lists” page and go to town.

Finally, you can make lists public and sharable. Once you’ve created a list, just tick the “Public?” checkbox.  Booko will then create a public URL for you to use.   For example, here’s a list I made of all Tintin comics:
http://www.booko.com.au/lists/view/nx9deGGd9Qgy2SE8

You can copy all the items from any public list into your own cart or into any of your lists.

Let me know what you think.

Updated cart implementation

Tonight I introduced a new shopping cart implementation to Booko. This new cart code is in preparation for allowing Booko users to create lists of books and save those lists in their user accounts (coming soon). The shopping cart will simply be one of these lists.

The new cart includes the ability to increase the quantity of any particular title without needing to visit that book’s page.

I wasn’t able to migrate existing carts across to the new system, but any book you’ve viewed recently will show up in the  “Your Recently Viewed” section.  Enjoy!

Playing on the Master branch

I’m working on adding a new feature to Booko, but I accidentally started working on the Git master branch, which I like to keep sync’d with the production version of Booko. After a few commits I wanted to add some fixes to Booko, but I’d polluted my master branch with untested, unfinished changes. What to do?

After reading up on Stack Overflow, I decided to fix things. What I want to do is reset my master branch to the version in production, then take all the subsequent commits and turn them into a new branch. Turns out, it’s easy.

1. Find the commit you want the master branch to point at. You can find its SHA-1 name with “git log”.

2. Create a branch from that commit: git checkout -b new_master <SHA-1 commit name> (hint: this will become the new master).

3. Rename your current master branch to the feature’s new name: git branch -m master branchname

4. Rename new_master to master: git branch -m new_master master

And, job done. You’re no longer messing up your master.

So awesome.

There are times I really enjoy using Ruby on Rails. Recently, Fishpond started 403’ing HTTP requests for cover images if the referrer isn’t fishpond.com.au. Sites do this so that other sites don’t steal their bandwidth. Really, Booko should be downloading the images and serving them itself (it’s on the todo list, BTW). Since Booko had been using Fishpond image URLs to display covers, you may have noticed a bunch of missing cover images – some of them are caused by Fishpond’s new (completely reasonable) policy.

So I’ve updated the code so Booko no longer links to Fishpond images, but now I need to go through every product Booko’s ever seen and update those with a Fishpond image URL. This is laughably easy with Ruby on Rails. Just fire up the console and run this:

Product.find_each do |p|
  # Only touch products whose cover image still points at Fishpond
  if p.image_url =~ /fishpond/
    puts "updating details for #{p.gtin}"
    p.image_url = nil  # clear the dead link
    p.get_detail       # re-fetch details, picking up a replacement image if one exists
    p.save
  end
end

The Rails console gives you access to all the data and models of your application – this code, just pasted in, finds every product linking to a Fishpond image and either finds a replacement image or sets it to nil. Point of interest – Booko has 396,456 products in its database. Iterating with Product.all.each would load every product into memory before hitting the each – that would probably never return. Product.find_each, on the other hand, loads records in batches of 1000 by default. Pretty cool.
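If the default batch size doesn’t suit, find_each also takes a :batch_size option – this is the standard Rails API, shown here with an arbitrary value:

Product.find_each(:batch_size => 500) do |p|
  # process 500 records per query instead of the default 1000
end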

* Thanks to http://ryandaigle.com/ for posting about this feature.

Fun with git post-checkout

While developing new features or bug fixes for Booko, I usually work in branches. This makes keeping things separate easy, and means I can easily keep the current production version clean and easy to find. But when changing branches I often have to restart the Rails server and the price grabber to pick up any changes. For example, if I’m adding a new shop in a branch, I want the price grabber to restart when I switch branches.

Turns out git makes this super easy. You just create a shell script: .git/hooks/post-checkout

That script gets called after checkout. So, mine is pretty simple:

#!/bin/sh
# Restart the price grabber and the app server so the new branch's code is live
./bin/fetch_price.rb 0 restart;
thin restart

There’s probably a better way to get Thin to reload itself, but this works nicely.

You can check out all the hooks here: http://www.kernel.org/pub/software/scm/git/docs/v1.5.5.4/hooks.html

On being Google’d

[Graph: Google crawling Booko]

I love Google – they send me stacks of traffic and help sites like Booko reach a far greater audience than I could on my own. Recently, however, Google’s taken a bigger interest in Booko than usual. These kinds of numbers are no problem in general – the webserver and database are easily capable of handling the load.

The problem for Booko, when Google comes calling, is that they request pages for specific books such as:

http://www.booko.com.au/books/isbn/9780140232929

When this request comes in, Booko checks how old the prices are – if they’re more than 24 hours old, Booko will attempt to update them. Booko used to load the prices into the browser via AJAX – so, as far as I can tell, Google wasn’t even seeing the prices. Further, Booko has a queuing system for price lookups, so each page Google requests adds another book to the queue. Google views books faster than Booko can grab prices, so we end up with hundreds of books scheduled for lookup, frustrating normal Booko users, who see a page full of spinning wheels and wonder why Booko isn’t giving them prices. Meanwhile, the price grabbers are hammering through hundreds of requests from Google and, in turn, hammering all the sites Booko indexes. So, what to do?

Well, the first thing I did was drop Google’s traffic. I hate the idea of doing this – but Booko really needs to be available for people to use, and being indexed by Google won’t help if you can’t actually use the site. So to the iptables command we go:

iptables -I INPUT -s 66.249.71.108 -j DROP
iptables -I INPUT -s 66.249.71.130 -j DROP

These commands drop all traffic from those two Googlebot addresses.

The next step was to sign up for Google Webmaster Tools and reduce the page crawl rate.

[Screenshot: Google Webmaster Tools]

Once I’d dialled back Google’s crawl rate, I dropped the iptables rules:

iptables -F

To make Booko more Google-friendly, the first code change was to render book pages immediately with the available pricing (provided it’s complete) and deliver updates to that pricing via AJAX. Google now gets to see the entire page and should (hopefully) index it better.
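As a rough sketch of the idea (the attribute and queue names here are invented – this isn’t Booko’s actual controller code):

# Render whatever prices we already have so crawlers see a complete page,
# and only queue a background refresh, delivered via AJAX, when stale.
def show
  @product = Product.find_by_isbn(params[:isbn])
  @prices  = @product.prices
  price_queue.push(@product) if @product.prices_updated_at < 24.hours.ago
end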

The second change was to create a second queue for price updates – the bulk queue. The price grabbers first check for regular price update requests, meaning people get their prices first. Requests from bulk users, such as Google, Yahoo & Bing, are added to the bulk queue and looked up when there are no normal requests. In addition, I can restrict the number of price grabbers that will service the bulk queue.
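The scheduling logic boils down to something like this (a sketch with made-up queue names, assuming pop returns nil when a queue is empty):

# Grabbers always drain the regular queue before touching the bulk queue,
# so a real user's lookup is never stuck behind a crawler's.
def next_lookup
  regular_queue.pop || bulk_queue.pop
end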

This work has opened up a new idea I’ve been thinking about – pre-emptively grabbing the prices of the previous day’s or week’s most popular titles. The idea would be to add these popular titles to the bulk queue during the quiet time between 03:00 and 06:00. That way, when people viewed those titles later in the day, the prices would be fresh.
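Something as simple as this, run from cron during that lull, would do it (again just a sketch – popular_since is a hypothetical scope):

# Push yesterday's most-viewed titles onto the bulk queue overnight
Product.popular_since(1.day.ago).each { |p| bulk_queue.push(p) }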

I’ve just pushed these changes to the Booko site and, with some luck, Google & Co will be happier, Booko users will be happier, and I should be able to build new features on this groundwork. Nice for a Sunday evening’s work.