Booko's Blogo

All about Booko

Archive for the ‘Development’ Category

A new Booko feature

with 6 comments

Booko now supports displaying “Works” – that is, a list of editions of the same book. A given work may have several editions – for example, a paperback, hardback or eBook version. Booko is starting to collect these various editions together as a “Work”.  We’ve had this feature for a while – but now Booko includes the minimum and maximum price for each edition of a work.  I think it should be a pretty useful feature.
Here are the top 10 works from the last few days. Check them out to see the new feature in action.
  1. The Girl with the Dragon Tattoo
  2. Tomorrow, When the War Began
  3. The Girl Who Played with Fire
  4. Eat, Pray, Love: One Woman’s Search for Everything Across Italy, India and Indonesia
  5. The Girl Who Kicked the Hornets’ Nest
  6. The Catcher in the Rye
  7. The Brain That Changes Itself
  8. Anna Karenina
  9. To Kill a Mockingbird
  10. Brave New World

Enjoy!

Written by Dan Milne

October 8th, 2010 at 10:34 pm

Posted in Booko,Development

On Daemons

with 8 comments

As you might imagine (depending on just how nerdy and imaginative you are), Booko is a poster child for the concept of long running background tasks. Grabbing prices from 40 online stores isn’t a fast process and you certainly would not want your front end webservers making your users wait as long as the slowest of the 40 stores before responding to a user request.

Over the years, I’ve tried various approaches to running user level daemons. My first attempt was ok – I rolled my own and slowly improved it. It could handle HUP signals, write PID files, die gracefully and it knew if it hadn’t died properly and attempted to kill zombie versions of itself. It had stop / start / restart commands. But it wasn’t all sweetness and light.  What happens when it dies? This is probably the trickiest part of running daemons (Well, having to fork twice and make sure you have detached from the terminal is probably trickier, but still).

So, how do you make sure your daemon is running? Cron immediately springs to mind. So, part two of writing your own daemons is writing something to keep them going.  You may have found yourself in this position and felt a little tickle in the back of your mind when you set up a cron job to solve this problem. My cron job looked at the modified time of the daemon’s log file and, if it was more than 5 minutes old, looked for the PID file and sent that process a KILL signal.
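A minimal sketch of that kind of watchdog, to make the idea concrete. The file paths, the constant names and the helper names are assumptions for illustration, not the script Booko actually ran:

```ruby
# Watchdog sketch, intended to be run from cron every few minutes.
# LOG_FILE, PID_FILE and MAX_AGE are hypothetical values, not Booko's real ones.
LOG_FILE = '/var/log/fetcher.log'
PID_FILE = '/var/run/fetcher.pid'
MAX_AGE  = 5 * 60 # seconds

# True when the log has not been written to for longer than max_age seconds.
def stale?(log_path, max_age, now = Time.now)
  (now - File.mtime(log_path)) > max_age
end

# Sends KILL to the process recorded in the PID file, if it still exists.
# Returns the PID that was signalled, or nil if there was nothing to kill.
def kill_from_pid_file(pid_path)
  pid = Integer(File.read(pid_path).strip)
  Process.kill('KILL', pid)
  pid
rescue Errno::ENOENT, Errno::ESRCH, Errno::EPERM, ArgumentError
  nil # no PID file, process already gone, not ours, or unreadable PID
end

# In the real cron script the last line would be something like:
#   kill_from_pid_file(PID_FILE) if File.exist?(LOG_FILE) && stale?(LOG_FILE, MAX_AGE)
```

Note that this only kills a hung daemon; something else still has to start it again.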

It’s an easy, stable solution to the problem at hand – albeit with a 5 minute lag to detect crashed daemons. It’s ok because I run multiple daemons which can take the load if one dies.  But, what happens if you only have a single daemon? Increase the frequency of checking?  Cron’s smallest resolution is 1 minute – that’s not really ok (depending on what your daemon does, it may be fine).  But now you have to make sure that your daemon’s writing to the log at least every minute.  Ugh.

This solution is starting to smell. So, what does everyone else do? Well, I checked out God – but it just doesn’t feel like an elegant solution to this problem. It may solve the problem nicely, but there must be a better way? Hard core nerds would probably move on to daemontools but it’s too much work for me.

That tickle you may have had in the back of your mind earlier was your subconscious telling you the problem is already solved and you already use it for your webserver, mail server, DNS server, ssh server and more. Your operating system can provide this exact service for you. Since I’m using Ubuntu that service is provided by Upstart.

Running your service with Upstart has two very nice consequences. Firstly – you can remove all the code used to manage daemonising. You can now write your code to hang around in the foreground. Leaving your code in the foreground while you’re in development mode is good anyway – you can watch it more closely. If you really want to daemonise in your dev environment, bang up a tiny Ruby script with the Ruby Daemons gem which calls your actual script and manages PIDs, signals and a stop/start interface for you.
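As a rough sketch of what "hanging around in the foreground" can look like: the loop below never forks or writes a PID file, and only stops when asked. The method name, its parameters and the work done in the block are assumptions for illustration, not Booko's fetcher:

```ruby
# Foreground worker sketch: no forking, no PID files, no detaching.
# The block passed in is a hypothetical stand-in for the real job.
def run_worker(max_iterations: nil, interval: 1)
  running = true
  # Upstart stops a job by sending TERM; finish the current pass, then exit.
  trap('TERM') { running = false }
  trap('INT')  { running = false }

  iterations = 0
  while running
    yield
    iterations += 1
    break if max_iterations && iterations >= max_iterations
    sleep interval
  end
  iterations
end

# In a real script you would run it without a limit, for example:
#   $stdout.sync = true
#   run_worker { puts "fetching prices at #{Time.now}" }
```

The max_iterations parameter is only there so the loop can be tried out without running forever.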

Setting up a service to run with Upstart requires just a config file – here’s one I prepared earlier:

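description "Price Fetcher Upstart script"
author      "Dan Milne"

start on startup
stop on shutdown

console output

# Restart the job automatically if it exits unexpectedly.
respawn
# Allow several copies of the job, told apart by the value of FID.
instance $FID

# "env" is a job-level stanza, so it is set here rather than inside the script block.
env RAILS_ENV=production

script
    exec sudo -u booko RAILS_ENV=production /opt/ruby-enterprise/bin/ruby /var/www/booko.com.au/booko/bin/fetcher.rb $FID
end script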

That file gets named “fetcher.conf” and goes in the /etc/init/ directory. This has some nice features; the first of which is that once it’s started, it will keep running. If it dies, it’ll respawn (you can see the option right there in the script).  The fact that it died goes in /var/log/daemons – but what’s even awesomer, you can run multiple instances of the same script, by passing in FID=0 or FID=1 etc when you’re starting it. Finally, it gets the standard init features. You can start it with ‘service fetcher start FID=0’ for example.

The only missing feature I can see is that, because I need to pass in FID=0 to the script, it doesn’t start at bootup. There appears to be no way of stating “Startup 2 of these at boot time”.

In summary, if you use your OS init services, you get to write simpler code, get respawning at an OS level and you get all the normal daemon control features.

Written by Dan Milne

August 7th, 2010 at 11:48 pm

Booko’s moved, features added.

with 3 comments

I’ve been working on a beta version of Booko for, like, 7 months now.  I finally upgraded it while moving from Slicehost to Linode.  I’ve made a large number of changes to the way Booko performs long running tasks and how those tasks communicate. But I’ll leave the nerdy stuff for later.

The biggest change from a user point of view is some integration into Freebase.com. You’ll notice extra information appear in book listings now. For example, the Booko page for The Girl with the Dragon Tattoo now tells you:

  • that the book is part of the Millennium Trilogy
  • the other books in the series (The Girl who played with Fire & The Girl who kicked the Hornets’ Nest)
  • the other editions of The Girl with the Dragon Tattoo – hardcover and paperback.

The data at Freebase is a long way from complete, but it’s constantly growing. This should be a very useful feature. I’ll be doing more to integrate Freebase into the search results – for example, so a book shows up only once, listing the different editions in that single search result item.

List management also got an overhaul. That old list manager page was pretty bad. There’s still work to do, but it should be far more usable now. Log in and check it out!

There are still bugs to fix (soooo many missing images for cover art) and features to add (smarter list price calculation, used books).

Written by Dan Milne

August 1st, 2010 at 10:55 pm

Google / Yahoo user?

without comments

Logging into Booko just got easier. If you have a Google or Yahoo account, just hit the appropriate button and you’re in, registration included.

Turns out, this was super easy to add to Booko since it already does vanilla OpenID logins. Basically, Booko just fills in the OpenID URL with “https://www.google.com/accounts/o8/id” or “http://yahoo.com/” and OpenID Directed Identity does the rest. You could also just type those URLs in yourself and it’ll work just the same.  Sweet!
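In code, the buttons amount to little more than a lookup table. This is a hedged sketch: the two URLs are the ones quoted above, while the constant and method names are made up for illustration and are not Booko's actual code:

```ruby
# Provider buttons are just shortcuts for well-known OpenID URLs.
# The two URLs come from the post; the helper and its names are hypothetical.
PROVIDER_OPENID_URLS = {
  'google' => 'https://www.google.com/accounts/o8/id',
  'yahoo'  => 'http://yahoo.com/'
}.freeze

# Returns the identifier to hand to the existing OpenID login code:
# the provider's URL when a button was pressed, otherwise the typed URL.
def openid_identifier(provider: nil, typed_url: nil)
  PROVIDER_OPENID_URLS.fetch(provider.to_s.downcase) { typed_url.to_s.strip }
end
```

Everything after this point is the same vanilla OpenID flow, which is why the feature was cheap to add.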

Written by Dan Milne

December 30th, 2009 at 8:03 am

Posted in Booko,Development,Ruby

Playing on the Master branch

with one comment

I’m working on adding a new feature to Booko, but I accidentally started working on the Git Master branch, which I like to keep sync’d with the production version of Booko. So after a few commits I want to be able to add some fixes to Booko, but I’ve polluted my master branch with untested, unfinished changes.  What to do?

After reading up on Stack Overflow, I decided to fix things.  What I want to do, is reset my master branch to the version in production, and take all the subsequent commits and create a new branch with them.  Turns out, it’s easy.

1. Find the commit you want the master branch to be at. You can find the SHA-1 name with “git log”

2. Create a branch from that commit with: git checkout -b new_master <SHA-1 commit name> (hint: this will be the new master)

3. Rename your current master branch to the new name of the feature: git branch -m master branchname

4. Rename the new_master to master: git branch -m new_master master

And, job done. You’re no longer messing up your master.
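If you'd rather script steps 2 to 4, a small Ruby wrapper is enough. This is a sketch only: the method names are invented, and the SHA and the feature branch name are whatever you pass in:

```ruby
# Builds the three git commands from steps 2-4 above.
# production_sha and feature_branch are placeholders supplied by the caller.
def rescue_master_commands(production_sha, feature_branch)
  [
    ['git', 'checkout', '-b', 'new_master', production_sha], # step 2
    ['git', 'branch', '-m', 'master', feature_branch],       # step 3
    ['git', 'branch', '-m', 'new_master', 'master']          # step 4
  ]
end

# Runs each command in order, stopping at the first one that fails.
def rescue_master!(production_sha, feature_branch)
  rescue_master_commands(production_sha, feature_branch).all? do |cmd|
    system(*cmd)
  end
end
```

Calling rescue_master_commands on its own just returns the commands, which is a cheap way to check them before running anything.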

Written by Dan Milne

November 2nd, 2009 at 11:45 pm

Posted in Development,Git

So awesome.

without comments

There are times I really enjoy using Ruby on Rails.  Recently, Fishpond started 403’ing http requests for cover images if the referrer isn’t fishpond.com.au.  Sites do this so that other sites don’t steal their bandwidth.  Really, Booko should be downloading the images and serving them itself (It’s on the todo list BTW).  Since Booko had been using Fishpond image URLs to display covers, you may have noticed a bunch of missing cover images – some of them are caused by Fishpond’s new (completely reasonable) policy.

So I’ve updated the code so I don’t link to Fishpond images, but now I need to go through every product Booko’s ever seen and update those with a Fishpond image URL. This is laughably easy with Ruby on Rails. Just fire up the console and run this:

Product.find_each do |p|
  if p.image_url =~ /fishpond/
    puts "updating details for #{p.gtin}"
    p.image_url=nil
    p.get_detail
    p.save
  end
end

The Rails console gives you access to all the data and models of your application – and this code, just pasted in, will find links to all Fishpond images, find a replacement image, or set it to nil. Point of interest – Booko has 396,456 products in its database.  Iterating with Product.all.each would load every product into memory before hitting the each – that would probably never return. On the other hand Product.find_each loads records in batches of 1000 by default.  Pretty cool.
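To see why batching matters, here is a plain-Ruby illustration of the idea behind find_each. FakeTable and Row are hypothetical stand-ins for a database table (this is not ActiveRecord, and the fake table itself lives in memory); the point is only that the loop holds one batch at a time and resumes after the last id it saw:

```ruby
# Plain-Ruby illustration of batched iteration, the idea behind find_each.
# FakeTable is a hypothetical stand-in for a database table; it only mimics
# "give me the next N rows after this id".
Row = Struct.new(:id, :image_url)

class FakeTable
  def initialize(rows)
    @rows = rows.sort_by(&:id)
  end

  # Equivalent in spirit to: WHERE id > last_id ORDER BY id LIMIT limit
  def rows_after(last_id, limit)
    @rows.select { |r| r.id > last_id }.first(limit)
  end
end

# Yields every row while holding at most batch_size rows at a time.
def each_in_batches(table, batch_size = 1000)
  last_id = 0
  loop do
    batch = table.rows_after(last_id, batch_size)
    break if batch.empty?
    batch.each { |row| yield row }
    last_id = batch.last.id
  end
end
```

With 396,456 products and the default batch size of 1000, a loop like this makes a few hundred small queries instead of one enormous one.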

* Thanks to http://ryandaigle.com/ for posting about this feature.

Written by Dan Milne

October 25th, 2009 at 10:17 pm

Posted in Development,Ruby

Fun with git post-commit

with one comment

While developing new features or bug fixes for Booko, I usually work in branches. This makes keeping things separate easy, and means I can easily keep the current production version clean and easy to find.  But when changing branches I often have to restart the rails server and the price grabber to pick up any changes.  For example, if I’m adding a new shop in a branch, when I switch branches I want the price grabber to restart.

Turns out git makes this super easy. You just create an executable shell script: .git/hooks/post-checkout

That script gets called after checkout. So, mine is pretty simple:

#!/bin/sh

./bin/fetch_price.rb 0 restart;
thin restart

There’s probably a better way to get Thin to reload itself, but this works nicely.
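Since a hook can be any executable, the same thing can be written in Ruby. The sketch below is an assumption about how one might refine it, not the hook Booko uses: it relies on the three arguments git passes to post-checkout (previous HEAD, new HEAD, and a flag that is 1 for a branch checkout) to skip the restart when nothing really changed. The method and constant names are invented:

```ruby
#!/usr/bin/env ruby
# Sketch of .git/hooks/post-checkout written in Ruby instead of sh.
# git calls the hook with three arguments: the previous HEAD, the new HEAD,
# and a flag that is "1" for a branch checkout and "0" for a file checkout.

# Restart only when a branch checkout actually moved HEAD.
def restart_needed?(previous_head, new_head, branch_flag)
  branch_flag == '1' && previous_head != new_head
end

# The commands are the ones from the shell version of the hook.
RESTART_COMMANDS = [
  ['./bin/fetch_price.rb', '0', 'restart'],
  %w[thin restart]
].freeze

# Returns true if the restart commands were run, false if they were skipped.
def run_hook(argv)
  return false unless restart_needed?(*argv.first(3))
  RESTART_COMMANDS.each { |cmd| system(*cmd) }
  true
end

# As the last line of the real hook file you would call: run_hook(ARGV)
```

This avoids bouncing the servers when you only check out a single file.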

You can check out all the hooks here: http://www.kernel.org/pub/software/scm/git/docs/v1.5.5.4/hooks.html

Written by Dan Milne

October 24th, 2009 at 3:15 pm

Posted in Development,Ruby

On Users and Passwords

with 6 comments

Update: thanks to the commenters for pointing out some flaws in the logic of the previous version of this page. I’ve updated the page to incorporate their feedback.

I’ve been thinking about adding wish lists to Booko. Wishlists require an implementation of Users – after all, what’s the point of having Wishlists if you can’t change them or publish them?  Booko’s built on Ruby on Rails, so I had a look around for plugins, but, truth be told, I’m too much of a Ruby n00b to trust other people’s plugins.  I’m sure they’re easy to install, but how do you keep them up to date? Finding any kind of bug will mean reading and understanding the code and seriously, that’s as much work as implementing Users on my own.  Plus, I have concerns about their implementation which I’ll discuss more later.

So, I figure having users requires two bits of data:

  • Email address
  • Password

Having the email address as the login name makes sense to me – it’s unique and if someone forgets their password I can email them a password reset link.  No need to remember a separate username.  One day I’ll add OpenID because I’m a freetard and like the concept.

Now, passwords are valuable bits of information and they need to be protected. This may sound obvious, but they need to be protected for a couple of reasons:

  • only the actual user (or owner of the email address) can log in and manipulate Booko Wishlists
  • many users use the same email address and password on other sites (PayPal & Amazon for example).

Maybe you don’t do this, but if you have a separate password for every site requiring login, you’re a better person than me. Someone getting hold of your email and password can lead to some … difficulties on sites where you’ve used the same email address and password.

So, we need to protect passwords from prying eyes.  There are two main ways for your password to be discovered:

  • Database compromise
  • Web sniffing proxies.

So, what to do? We turn to hashing functions. Hashing functions (like MD5 & SHA*) take information like a password and send it on a one-way trip.  In effect, it is impossible to work backwards from the hash to the password. However, some clever people have calculated the hashes of hundreds of millions of passwords and stored them – then they can simply look up a hash stored in Booko in their pre-calculated table and find the password. This is known as a dictionary attack.

To thwart this attack, we introduce a “salt”. The salt is a random string of characters which is combined with the password before it is hashed.  This means that the dictionary attack is now useless – they would need an entire dictionary which included your salt combined with all those guessed passwords. The correct way of combining a password with a salt is to use an HMAC function.  In Ruby you can do it like this:

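require 'hmac-sha1'

# Returns the hex HMAC-SHA1 of the password, keyed with the salt.
def self.do_hashing(password, salt)
    passwd_hmac = HMAC::SHA1.new(salt)  # the salt is the HMAC key
    passwd_hmac << (password)           # the password is the message
    passwd_hmac.hexdigest
end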

Ideally, each User has their own salt. This means an attacker would need to generate an entire dictionary attack per user.
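A sketch of per-user salts using only Ruby's standard library. Assumptions to note: it swaps the hmac-sha1 gem for OpenSSL::HMAC (which should produce the same HMAC-SHA1 digest), the 16-byte salt size is an arbitrary choice, and the method and field names are invented for illustration:

```ruby
require 'openssl'
require 'securerandom'

# Stdlib-only equivalent of do_hashing: HMAC-SHA1 keyed with the salt.
def hash_password(password, salt)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA1'), salt, password)
end

# A fresh random salt for each user: 16 random bytes, hex encoded
# (the size is an assumption, not a figure from the post).
def new_salt
  SecureRandom.hex(16)
end

# What would be stored for a new user (the field names are hypothetical).
def build_credentials(password)
  salt = new_salt
  { salt: salt, password_hash: hash_password(password, salt) }
end
```

Two users who pick the same password end up with different salts and therefore different stored hashes, so one pre-calculated dictionary no longer covers them both.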

In any case, this is where other implementations of Users seem to stop. When you type your password into a form on a web page, it gets sent to the server – the server hashes your password (along with the salt) and checks if that hash matches the stored hash. If they match, you’re in.

But there is still the matter of your password being sent over the internets, possibly via proxy servers which can listen in on the traffic. There are two ways to stop the password being sent in the clear: either hash the password first, or use SSL. If you’re using SSL to send the passwords, you could stop here. Booko currently doesn’t have SSL certificates, so we need to stop passwords travelling over the internet in the clear. How do we do that? Easy, we hash the password before sending it over the internet. We’ll create a single salt for this hash and call it the transport salt.

On the server side, we’ll also hash the password with the transport salt, prior to hashing it with the per-user salt.

This means we send the browser the transport salt, and the browser calculates the hash. This hash is sent to the server. The server can validate the password as shown below:

require 'hmac-sha1'

def self.do_hashing(password, salt)
    passwd_hmac = HMAC::SHA1.new(salt)
    passwd_hmac << (password)
    passwd_hmac.hexdigest
end

final_hash_to_check = do_hashing(password_hash_from_browser, user_salt)

if final_hash_to_check == stored_password_hash
    # User provided correct password.
    session[:user_id] = user.id
end

In the code above, when the response comes back from the client, the server calculates the final hash by using the user_salt. This final hash is compared with the stored_password_hash – if they’re the same the client provided the correct password.
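Putting the two stages together, the whole round trip can be sketched in one file. This is an illustration under stated assumptions rather than Booko's code: it uses OpenSSL::HMAC from the standard library instead of the hmac-sha1 gem, the transport salt value is made up, and the browser side is simulated in Ruby where the real site would use JavaScript:

```ruby
require 'openssl'
require 'securerandom'

def hmac_sha1(key, data)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA1'), key, data)
end

# Hypothetical site-wide transport salt, sent to every browser.
TRANSPORT_SALT = 'example-transport-salt'

# Browser side (JavaScript on the real site): plain password -> transport hash.
def transport_hash(password)
  hmac_sha1(TRANSPORT_SALT, password)
end

# Server side at registration: store the per-user salt and the final hash.
def register(transport_hashed_password)
  user_salt = SecureRandom.hex(16)
  { user_salt: user_salt,
    stored_password_hash: hmac_sha1(user_salt, transport_hashed_password) }
end

# Server side at login: re-hash what the browser sent and compare.
def valid_login?(user, password_hash_from_browser)
  candidate = hmac_sha1(user[:user_salt], password_hash_from_browser)
  candidate == user[:stored_password_hash]
end
```

The server never sees the plain password, and presenting the stored hash itself as a login attempt fails because it gets hashed once more before the comparison.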

So, where does this leave us?

  • At no point is the password sent in the clear, nor stored in the clear.
  • The client only sees the transport salt, not the per-user salt and can’t precalculate a dictionary attack
  • Each user has a separate salt, making it far more difficult for an attacker to perform a dictionary attack
  • Compromising the database and retrieving the hashes doesn’t allow you to log in with that hash
  • Compromising the database and retrieving the hashes doesn’t allow you to log onto any other sites
  • Sniffing the transport hash in transit will allow an attacker to access your account

So, we’ve achieved some of our goals. The password is never sent in the clear. However, if an attacker snoops traffic between the client and server and gets a copy of the hashed password, that hash can be used to log on to the service.

If that’s unacceptable, then moving to SSL is the next step. You can use SSL to protect the hash as it moves between the client and server. However, you can also use SSL to protect the plain password from being discovered. Is there any point in doing both? Hashing the password prior to SSL is slightly more secure. If you hash the password before it leaves the client, there’s no danger of the password appearing in the log files of your web server or application server. If you have a dedicated host decrypting the SSL prior to passing it to your web server, the password could be sniffed between those servers.

After all that, I’ve decided that Booko will use both hashing the password before transmission and, eventually, SSL.

Notes on SHA1 and extra security:
SHA1 hashes are very secure and can be calculated very fast. The faster the hash, the faster you can create a dictionary attack.  Ideally for this scenario, we want a slow hashing function, or, more correctly, we want the method that generates our hashes to be slow. This can easily be achieved by simply hashing the password multiple times.  Here are some timings:

>> require 'hmac-sha1'
>> require 'benchmark'
>> salt = (0..256).map { ((0..9).to_a + ('a'..'z').to_a + ('A'..'Z').to_a).rand }.join
>> hash =  nil
>> Benchmark.realtime { hash =  HMAC::SHA1.new(salt)<< "MyPassword" }
=> 6.60419464111328e-05
>> Benchmark.realtime { 100.times {  hash = HMAC::SHA1.new(salt) << hash.hexdigest } }
=> 0.00437498092651367
>> Benchmark.realtime { 1000.times {  hash = HMAC::SHA1.new(salt) << hash.hexdigest } }
=> 0.0426349639892578
>> Benchmark.realtime { 10000.times {  hash = HMAC::SHA1.new(salt) << hash.hexdigest } }
=> 0.463771820068359
>> Benchmark.realtime { 100000.times {  hash = HMAC::SHA1.new(salt) << hash.hexdigest } }
=> 4.64294099807739

So, 10,000 hashes took around 1/2 second on my 2.8GHz Core 2 Duo MacBook Pro. You can see that performing 10,000 hashes will seriously slow any attempt at creating a dictionary attack on your passwords. I haven’t timed how long it takes in JavaScript, but you’d want to keep it under a second to make user log in not too painful.
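Wrapped up as a method, the repeated hashing from the irb session might look like the sketch below. It assumes OpenSSL::HMAC from the standard library in place of the hmac-sha1 gem, and the method name and default round count are choices made for illustration:

```ruby
require 'openssl'

# Repeatedly feeds the previous hex digest back through HMAC-SHA1, as in the
# irb session above. rounds = 10_000 is the figure discussed in the post.
def stretched_hash(password, salt, rounds = 10_000)
  digest = OpenSSL::Digest.new('SHA1')
  hash = OpenSSL::HMAC.hexdigest(digest, salt, password)
  rounds.times { hash = OpenSSL::HMAC.hexdigest(digest, salt, hash) }
  hash
end

# To time it on your own machine:
#   require 'benchmark'
#   Benchmark.realtime { stretched_hash('MyPassword', 'some-salt') }
```

The round count has to be fixed (or stored with the user), since the same number of rounds must be applied when checking a login.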

Links:

You can find JavaScript libraries for doing HMAC on PAJ’s Homepage or jsSHA on sourceforge.

Written by Dan Milne

September 23rd, 2009 at 11:20 pm

Posted in Booko,Development,Ruby