The need for speed

January 2nd, 2007

We’ve been doing a lot of work behind the scenes trying to get some of the slower parts of the site up to speed. Two of the slowest were 1) when you bookmark an item via the browser button, it takes a long time for the matches to load, and 2) when you search site-wide for an item, the search takes a long time to come back (although searches of just your stuff or just your friends’ stuff should be snappy).

The thing both of these areas have in common is that we simultaneously search our database (people’s wishes) and also the whole Amazon.com catalog for items that match what you’re looking for. Going off and searching Amazon’s catalog is always going to be a little slow because of the round-trip we have to make to their servers, but in doing some profiling I realized a lot of the time was being sucked up in XML parsing with REXML. Not only is there overhead in loading and running an XML processor, but the format Amazon gives search results back in also required quite a bit of massaging on our end to get into a format suitable for our database.

Fortunately, Amazon provides a facility for you to give them an XSL stylesheet and then they’ll do that processing for you. That’s pretty sweet, because now we can get back better-formatted data from them, and it doesn’t cost us any processing power since they do the transformation on their servers. Since I could use XSLT to get the data back in any format I wanted, I really wanted to avoid having to do any XML processing at all. So, I chose to create a stylesheet that would return the Amazon search results in YAML. YAML is a super-simple data format which, because of its simplicity, is fast and easy to parse… and thankfully Ruby has some very YAML-friendly methods which meant I didn’t have to write any parsing code.
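As a rough illustration of why this is so little work on the Ruby side, here is a minimal sketch. The field names (asin, title, price), the sample data, and the parse_search_results helper are all made up for the example; they are not Amazon’s actual response format or our production code. It uses YAML.safe_load from Ruby’s standard library.

```ruby
require 'yaml'

# Hypothetical sample of what a stylesheet-generated YAML response might look like.
# The structure and field names are assumptions for illustration only.
SAMPLE_RESPONSE = <<~RESULTS
  - asin: "B000EXAMPLE1"
    title: "Example Digital Camera"
    price: 351.00
  - asin: "B000EXAMPLE2"
    title: "Example Camera Case"
    price: 19.99
RESULTS

# Turn the YAML text into an array of plain hashes ready to hand to the database layer.
# An empty response parses to nil (or false on older Rubies), so fall back to an empty list.
def parse_search_results(yaml_text)
  items = YAML.safe_load(yaml_text) || []
  items.map do |item|
    { asin: item['asin'], title: item['title'], price: item['price'] }
  end
end
```

Calling parse_search_results(SAMPLE_RESPONSE) gives back an array of two hashes, with no XML processor involved at any point.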


Play with Wishlisting!

December 14th, 2006

For those of you who didn’t guess the URL, it is http://beta.wishlisting.com. It should now fully support Internet Explorer. Please play around with it and send us bugs! You can send e-mail to beta at wishlisting.com, or just add a comment to this blog post. Eventually we’ll probably have a formal “log a problem” link on the site.

Major things that we know are busted:

  • Searching for a new thing to add to your list is slooow. Searching your own list, a friend’s list, searching for people, etc. should be reasonably fast. Let us know if it isn’t.
  • We need a proper front page, privacy policy (“we don’t sell your personal info”), and terms of service (“don’t screw with the site, and don’t sue us”).
  • The browser button is a huge pain to install and use in Internet Explorer. Firefox isn’t perfect either, but it’s better. We are aware of those issues and plan on rectifying them. The installation instructions are improved now, but still not great. If you have problems, feel free to e-mail me and I’ll walk you through it until we get solid instructions up.
  • Our host still kills our processes once in a while. I’m not sure what activity is causing us to blow out of memory, but I’m trying to find out. If you use the site, you might cause it to happen with more frequency - that would help us figure out what’s going on.

We have a ton of features planned, which we’re not ready to talk about just yet. Our focus is to make sure the site is stable and functional at the moment. However, if you have any suggestions for features by all means send them to us!

When we’re confident that the site is stable and usable, we’ll move it over to wishlisting.com proper. All of your data will move along with it, so feel free to use it in a realistic way.

Some Rails Tips

December 13th, 2006

We recently resolved a few longstanding issues with the site which I thought might be generally applicable to anyone else building a site in Rails. So, FYI:

RJS Templates are Slow

We do almost all of our screen updates via AJAX, and use Rails’ RJS templates to do almost all of that. The problem is, in the current version of Rails (1.1.6) they’re really slow to build. Rails does a lot of string parsing to turn a block of HTML into a JavaScript command. Fortunately, this is a known bug in Rails which will be fixed in 1.2. Thanks to the fact that Ruby is a dynamic language, you can drop this code into your project as a plugin, and effectively patch that part of Rails. It helped our performance in rendering big pages of gifts quite substantially. A list of 75 items was taking nearly 3 seconds to generate, and this patch dropped it to 0.3 seconds.
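If you haven’t seen this kind of patching before, here is a minimal sketch of the general technique: reopening an existing class and replacing one slow method. The SlowFramework module and JavaScriptEscaper class are hypothetical stand-ins I made up for the example; this is not the real Rails code or the actual patch.

```ruby
# Hypothetical stand-in for a framework class. This is NOT the real Rails code.
module SlowFramework
  class JavaScriptEscaper
    # Deliberately wasteful: one full pass over the string per character to escape.
    def escape(html)
      result = html.dup
      { '\\' => '\\\\', "\n" => '\\n', '"' => '\\"', "'" => "\\'" }.each do |from, to|
        result = result.gsub(from) { to }
      end
      result
    end
  end
end

# The "plugin": reopen the same class later and redefine the method.
# Ruby simply swaps in the newer definition for every caller.
module SlowFramework
  class JavaScriptEscaper
    ESCAPES = { '\\' => '\\\\', "\n" => '\\n', '"' => '\\"', "'" => "\\'" }.freeze

    # Single pass over the string, looking up each special character's replacement.
    def escape(html)
      html.gsub(/[\\\n"']/) { |char| ESCAPES[char] }
    end
  end
end
```

Nothing that calls escape has to change; once the second definition is loaded, the faster version is the one that runs.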

Running Background Tasks in Rails

Unlike other sites, the model we use to interact with your Amazon.com wishlist is “associate it, and we’ll keep it sync’d up” rather than “import it and never go back to Amazon.” The reason we do this is twofold:

  1. Don’t force people to abandon their Amazon wishlist if they want to use wishlisting.
  2. Amazon is being very web 2.0-friendly by exposing their wishlists via an API. Consumers of the API shouldn’t take advantage of that openness by driving Amazon’s customers off-site.

So, in order to keep in sync with Amazon, we need to run some background tasks to see if you added/removed anything from your list there, see if anyone bought anything from your list, etc. There is a whole wiki entry on how to run background tasks in Rails, although none of the approaches it lists is particularly good in a shared environment, much less on a Grid (where even script/runner doesn’t work). Here’s a rather clever solution that MediaTemple provided us. In a cron job, run this:

/usr/bin/curl http://yoursite.com/path/to/a_page_that_does_some_work

Everything will run in your already running instance of Rails without needing to load up anything additional. Very handy - and so far it’s doing a good job of keeping everyone’s wishlists in sync.

UPDATE 9/26/2007: MediaTemple has an additional way to run background tasks on the grid. There is an issue on the grid where if you run a request that takes longer than 20 seconds to complete, it will start sending 502 error messages back to clients who are trying to access your site at the same time (yes, this is a little alarming). So, we stopped using the curl method. They posted a KB article on how you can run rake tasks in the background on the gs. With this power, you can effectively do what script/runner does. (Note: If you run into errors about RMagick read this article.) All you need to do is add require RAILS_ROOT + '/config/environment' to the beginning of your rake task to have access to the full Rails environment.
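For what it’s worth, a rake task along these lines might look something like the sketch below. The wishlists:sync task name and the WishlistSync module are hypothetical placeholders, not our actual code, and the defined?(RAILS_ROOT) guard is only there so the sketch can run outside of a Rails app.

```ruby
require 'rake' # a real Rakefile gets the DSL for free; this makes the sketch standalone

# Hypothetical stand-in for the real sync logic (not from our codebase).
module WishlistSync
  @runs = 0

  class << self
    attr_reader :runs

    def run
      @runs += 1
      puts "Synced wishlists (run number #{@runs})"
    end
  end
end

namespace :wishlists do
  desc 'Sync wishlists with Amazon in the background'
  task :sync do
    # In a Rails 1.x app this loads the full environment, as described above.
    # The guard is an addition so the sketch also runs where RAILS_ROOT is undefined.
    require RAILS_ROOT + '/config/environment' if defined?(RAILS_ROOT)
    WishlistSync.run
  end
end
```

With something like this in place, the scheduled job would run rake wishlists:sync instead of hitting a URL with curl.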

There were two interesting articles over at PC World talking about the problem with current price comparison engines.

From Protect Yourself While Using Web Shopping Tools:

Most shopping engines are biased toward companies that pay them fees for prominent placement.

Shopping engines don’t always report the most comprehensive list of products… because merchants often remove low-margin and low-cost items from the product catalogs they share with shopping engines…

And from Shopping Engines: Suspect Advice?:

Nearly a third of the merchants that came up in our searches for popular consumer electronics on Yahoo Shopping, for example, had “unsatisfactory” ratings with the BBB.

I’ve been reading quite a bit about the comparison shopping space and there appear to be a lot of areas for improvement.

Over Thanksgiving break we updated the top banner so that it was both prettier and more usable, which was one of the things preventing us from going beta.  One of the other big ones is IE support, which we’re still working on.  A lot of the problems have been resolved by adding some tables for layout to prevent quirky CSS bugs, and a number of bugs were fixed dealing with image uploads, etc.  It’s nice to see that some of the CSS problems we were having appear to be resolved in Internet Explorer 7.

For those of you who read Digg, you may have seen this article about Media Temple. If you’ve come to the site and found it down, or seen a bunch of errors pop up, it’s quite likely due to the fact that our host has had some serious stability problems lately.  Sometimes the whole server is down, sometimes the database… lots of various problems.  We were one of the customers who was given 2 months’ free hosting for our troubles, but needless to say, the stability concerns us.  It probably won’t stop us from going to public beta, but it’s certainly one of the things we’re going to have to think hard about before the ultimate launch.

Finding Memory Leaks in Ruby

November 27th, 2006

If you suspect you have a memory leak in a Ruby on Rails app, you’re probably going to have a hard time:

  1. Proving that you do have a leak
  2. Finding it

(The “fixing it” part probably is easy.) I had to do both activities because of a memory leak I suspected in wishlisting, and found very little assistance on the web as to how to go about it. A Google search revealed two possibly helpful hints. One is this blog post by Scott Laird which offers you a script that can dump all of your in-memory strings to a file every 10 seconds. I spent a good amount of time playing with this, and generally concluded (as many of the comments suggest) that I couldn’t make much sense of the output. There were thousands of differences, and I didn’t know which were “ok” and which were signs of problems. Another possibility is this commercial tool for watching Ruby’s memory. I didn’t try it at all because it was Windows-only, which meant I’d be installing/testing my app on a non-production platform, and also because the screenshots made me think I’d be in much the same place as I was with Scott’s tool. The tools simply aren’t as refined as you’ll find in the Java or C++ worlds.
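To give a feel for the “snapshot and diff” style of leak hunting, here is a minimal sketch that counts live objects per class instead of dumping strings. To be clear, this is not Scott’s script or either of the tools mentioned above; object_counts and growth are helper names I made up, built on Ruby’s standard ObjectSpace.each_object.

```ruby
# Count live objects per class, after a GC pass so garbage doesn't muddy the numbers.
def object_counts
  GC.start
  counts = Hash.new(0)
  ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }
  counts
end

# Compare two snapshots and keep only classes that grew by at least `min` instances.
def growth(before, after, min: 1)
  after.each_with_object({}) do |(klass, count), grown|
    delta = count - before.fetch(klass, 0)
    grown[klass] = delta if delta >= min
  end
end
```

The idea is to take one snapshot, exercise the suspect part of the app, take another, and look at what growth reports. A class whose count keeps climbing across repeated runs is a reasonable place to start digging, though a single diff on its own doesn’t prove a leak.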

Choices

November 22nd, 2006

The Paradox of Choice is on my reading list for a number of reasons. One of them is that when I’m shopping for, say, a digital camera, sometimes I’ll use one of those price comparison sites to find the lowest price… and I’ll end up with something like this (site to remain nameless):

[Image: Choice]

The lowest price it lists is $351 at a store called Butterfly Photo. There are 51 other options. I understand that a purchase decision involves more than price, which is why for each store they offer ratings of merchants, reviews, shipping and tax breakdowns, etc. You end up with an overwhelming amount of information on each store, and a good chunk of options that are wholly inferior to other options. For example, is anyone really going to consider buying this camera for $535.77 from a store called Compuvest? Similarly, an eagle-eyed reader might notice that Amazon.com isn’t included among the options. It happens to be available there for $361, and there’s a 5% off electronics coupon today.

So, for people who are really worried about getting the best price, do they have to visit multiple shopping engines? Multiple coupon sites? How much trouble is it worth to save $10 on a camera? This is the kind of thing John and I have a lot of discussions about. What do you think?

Thank You Alpha Testers!

November 20th, 2006

We didn’t actually plan on having an ‘alpha’ release of Wishlisting.  However, a few enterprising friends did some URL hacking and found the live test site we’ve been working with, and started using it.  Special thanks to Shuttle and Kate for all of the bugs they’ve sent in.

There are a number of outstanding issues, a subset of which are preventing us from going into real beta.  Namely:

  • Our IE support is weak.  We have various hacks in there to work around CSS, PNG, and compressed JavaScript issues in IE, but there is still much to be done.  Since most people in the world use IE, we really need to make sure that works (John will likely eventually write an article about the frustrations involved in that).
  • There’s a big difference between something that “works” and something that’s “done”.  For example, something that works lets you add stuff to your wishlist, and work with the basic functionality of writing reviews, adding friends, etc.  Something that’s done has a privacy policy, terms of service, a front page with actual graphics, etc.
  • Our host shoots us sometimes.  We’re running on the MediaTemple GridServer so that hopefully we can grow without having to worry too much about hardware.  The rub with the Grid is that although you can theoretically scale across machines and use lots of CPUs, etc… you can’t use lots of memory.  Currently, if our process uses more than 64 MB of RAM, they’ll kill us.  I fully expected us to grow out of 64 MB, but I didn’t expect it with only a handful of alpha users… so there’s definitely something wrong.  Periodically functionality will stop on the site because they’ve killed one of our processes.

Our alpha testers have found a lot of great bugs that were important to fix before going live, including Dan, who immediately started in with the SQL injection attacks.  Kate found a great bug with the images we import, as well as a ton of usability problems.  Shuttle seems to have a knack for performing just the right actions to get MediaTemple to kill our processes.  We’re knocking down problems as fast as possible, but there is a good bit of nastiness left.  Thank you to all the Alpha testers for helping us minimize the unknown unknowns.

Bug squashing

November 10th, 2006

Last night John and I sat down and went through the site to determine how far we are from being able to go beta.  We played around with all of the site’s most common features, nitpicked various areas of the experience that we didn’t like and/or were broken, and logged about 30 bugs.  Although a beta is used to some extent to find bugs, people shouldn’t have to run into obvious bugs during a beta.  They’re supposed to find the “hard” bugs.  The ones that come up because they’re using IE7 with some pop-up blocking toolbar on a release candidate of Windows Vista.  They shouldn’t have to suffer through much in the way of totally predictable mainstream problems.

So… this weekend is going to be a lot of bug-squashing and test case writing.  We want to get the beta out though… soon!

Keeping bugs out

November 7th, 2006

The underlying technology behind wishlisting.com is Ruby on Rails. Coming down the home stretch before showing it to people, one of the things we’re focusing on is preventing people from hitting errors, especially when performing basic operations.

Ruby is widely touted as being a dynamic language - which has its pros and cons. One of its cons is that you have to test everything a lot to make sure that when you make a change, it doesn’t break other things. For example, we started using the conditional caching plugin, got that working, and when we added the querystring action caching plugin, they stomped all over one another, overriding the same methods and not providing any error messages.
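Here is a minimal sketch of how that kind of silent collision happens, and one way to at least get it noticed, using Ruby’s method_added hook. The ActionCache class, the two plugin modules, and the RedefinitionWatcher helper are all hypothetical stand-ins I made up; they are not the real plugins or anything from Rails.

```ruby
# Records the name of any instance method that gets defined a second time
# on a class that extends this module.
module RedefinitionWatcher
  def redefinitions
    @redefinitions ||= []
  end

  # Ruby calls this hook on the class every time an instance method is defined.
  def method_added(name)
    @seen_methods ||= {}
    redefinitions << name if @seen_methods.key?(name)
    @seen_methods[name] = true
    super
  end
end

# Hypothetical class that both plugins want to customize.
class ActionCache
  extend RedefinitionWatcher
end

# Hypothetical stand-ins for two plugins that each define the same method.
module ConditionalCachingPlugin
  def self.install(klass)
    klass.class_eval do
      def cache_path(request)
        "conditional:#{request}"
      end
    end
  end
end

module QueryStringCachingPlugin
  def self.install(klass)
    klass.class_eval do
      def cache_path(request)
        "querystring:#{request}"
      end
    end
  end
end

ConditionalCachingPlugin.install(ActionCache)
QueryStringCachingPlugin.install(ActionCache) # silently replaces the first definition
```

After both installs, ActionCache.redefinitions contains :cache_path, and only the second plugin’s version of the method is left standing. Ruby raises no error by default, which is exactly why this sort of thing is easy to miss without tests.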

In other languages, errors like this would be caught by the compiler. You’d get a broken build, and know immediately that something was wrong. The equivalent in Ruby is to write good, automated unit and functional tests to catch problems. Writing tests is pretty easy - Rails has that baked in. Automating them requires some work. The tool we found worked best was the continuous_builder plugin. Ryan Daigle has a great article on how to install and configure that plugin here. This plugin will fire off your suite of tests whenever you check code into your repository. That way you can get some amount of satisfaction that what you checked in didn’t break other stuff - at least, to the extent that you wrote tests that covered all of the other stuff.

How do you figure out how much of your code your tests are testing? That’s where code coverage tools come in; the leader in the Ruby space is Rcov. We haven’t gotten there yet, but we will. For a quick and dirty way to keep an eye on how successful your functional and unit tests are over time, Ben Curtis has a cool online tool called Tesly, Jr. You can install a plugin into your codebase that will automatically post your test results up to that site, which will summarize them and produce some little graphs. We’re using it now, and although I’m not yet sold on its usefulness, it’s pretty cool and is very easy to install and configure.