Small DB optimization for your apps
January 22nd, 2008
After doing some work with the efficient_sql plugin for Rails, I discovered a small optimization that many sites could probably benefit from but don’t know exists.
Unless you’re using a binary collation on your database tables, your indexes are actually case insensitive (for example, we use utf8_general_ci rather than utf8_bin in MySQL). How can you use this as an optimization? Well, if you build a site with readable URLs, odds are you’ll put a username in them, like:
http://wishlisting.com/#person/tlianza
For usability, you’ll probably want your URLs to be case-insensitive, in case someone created a username like Tlianza and someone else tries to navigate to it with tlianza. So, you end up writing a SQL query that looks something like this (ignoring the appropriate SQL injection cleansing for the purposes of a simple example):
SELECT * FROM users WHERE LOWER(user_name)='username_they_entered_lowercased'
If you put an index on the user_name column (which you probably did, since you’re going to query on it a lot), wrapping the column in LOWER() means the database can no longer use that index: the function has to be applied to every stored value before the comparison, so the lookup can’t go straight to the indexed key. Just take the LOWER out. You’ll be able to use your indexes again, and the comparisons will remain case insensitive.
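A minimal Ruby sketch of the fix, assuming the user_name column uses a case-insensitive collation such as utf8_general_ci: compare the column directly and pass the entered name as a bind parameter, which also handles the SQL injection cleansing the example above skips. The helper name user_lookup is hypothetical.

```ruby
# Hypothetical helper: returns [sql, bind_value] in the style of a
# Rails :conditions array. There is no LOWER() on the column, so MySQL
# can use the index on user_name; the case-insensitive collation
# (assumed here to be utf8_general_ci) still lets 'Tlianza' match 'tlianza'.
def user_lookup(entered_name)
  ['SELECT * FROM users WHERE user_name = ?', entered_name]
end

sql, name = user_lookup('Tlianza')
puts sql   # SELECT * FROM users WHERE user_name = ?
puts name  # Tlianza
```

In a Rails app of this era, the same idea would be written as User.find(:first, :conditions => ['user_name = ?', entered_name]), letting ActiveRecord quote the value.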
Client Side Optimization with YSlow
October 29th, 2007
Last week I spoke at the Seattle Tech Startups meeting about using YSlow to optimize websites.
If you missed it, the video is online here: Client-side Website Optimization with YSlow.
If you want a copy of the slides, they are here. Some slides have additional details in the speaker notes. The presentation covers:
- What YSlow is, how you use it, and what it does for you
- How browser caching works - the Expires header, ETags, and Conditional GETs
- Simple compression techniques for minimizing bandwidth usage
- Strategies for optimizing HTTP requests
- Techniques for modifying web page structure to improve performance
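To make the caching topics concrete, here is a minimal sketch of how a server might answer a conditional GET: it computes an ETag from the response body, sends an Expires header so the browser can skip the request entirely for a while, and returns 304 Not Modified with no body when the browser’s If-None-Match matches. The method name, the MD5-based ETag, and the one-hour default are assumptions for illustration, not taken from the talk.

```ruby
require 'digest/md5'
require 'time'

# Hypothetical helper: decide between a full 200 response and a 304.
# Returns [status, headers, body].
def conditional_get(body, if_none_match, max_age = 3600)
  # ETags are quoted strings; here we derive one from the content itself.
  etag = %("#{Digest::MD5.hexdigest(body)}")
  headers = {
    'ETag'    => etag,
    # Until this time, the browser may reuse its copy without asking at all.
    'Expires' => (Time.now + max_age).httpdate
  }
  if if_none_match == etag
    [304, headers, '']   # browser's cached copy is still valid: send no body
  else
    [200, headers, body] # first request, or the content changed
  end
end
```

On the first request the browser has no ETag, so it gets a 200 and the full body; on a later revalidation it sends the ETag back in If-None-Match and gets a small 304 instead.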
Improve Google Analytics Performance
October 17th, 2007
A lot of sites, including ours, use Google Analytics to study how people are interacting with the site. It’s overkill for simply counting visitors, but if you want to track specific conversion goals (like how many new visitors who hit the front page ultimately sign up for an account), it’s great.
One problem with it is that it’s not terribly reliable, and because it’s entirely JavaScript-based, your visitors can see slowdowns when Google’s servers are particularly busy. One way to help alleviate this is to keep a local copy of Google’s urchin.js file on your server. It will still send stats to Google Analytics, but your visitors won’t need to download that file from Google. There are scripts you can use to download this file nightly, but I wanted a rake task so it would be cross-platform and have some built-in checks to make sure we wouldn’t accidentally overwrite the file with bad data after a bad download.
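A minimal sketch of the download-and-check logic such a rake task could call. The source URL, the 1,000-byte minimum, and the check for the string urchinTracker (the function the old snippet called) are all assumptions, not the checks the original task used. The new copy is written to a temporary file and renamed into place only if it passes, so a failed or truncated download never replaces a good file.

```ruby
require 'net/http'
require 'uri'

URCHIN_URL = 'http://www.google-analytics.com/urchin.js' # assumed source URL
MIN_SIZE   = 1_000                                      # hypothetical sanity threshold, in bytes

# Returns the downloaded body, or nil on any HTTP or network failure.
def fetch_urchin(url = URCHIN_URL)
  response = Net::HTTP.get_response(URI.parse(url))
  response.is_a?(Net::HTTPSuccess) ? response.body : nil
rescue StandardError
  nil
end

# Assumed heuristic: a real urchin.js is reasonably large and defines urchinTracker.
def plausible_urchin?(body)
  !body.nil? && body.bytesize >= MIN_SIZE && body.include?('urchinTracker')
end

# Writes to a temp file first, then renames it over the old copy,
# so visitors are never served a half-written or bad file.
def replace_local_copy(path, body)
  return false unless plausible_urchin?(body)
  tmp = "#{path}.tmp"
  File.open(tmp, 'w') { |f| f.write(body) }
  File.rename(tmp, path)
  true
end
```

A rake task (for example, a hypothetical task :update_urchin in lib/tasks) would then just call replace_local_copy('public/javascripts/urchin.js', fetch_urchin) from a nightly cron job; the path there is also illustrative.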
The case of the mysterious headphones
September 7th, 2007
They look like ordinary headphones. I even have them on my wishlist and I like them - and yet they’ve haunted me for months.
It all started 5 months ago, when an early beta tester reported that these headphones mysteriously appeared on her list. I looked into it and they didn’t seem out of the ordinary, aside from the fact that they didn’t have a rating (and back in those days, we didn’t have the little green check marks, so everything had to have a rating). Months went by and there were no similar issues. Eventually, the headphones started popping up on other people’s lists. No other item in the entire catalog of wishes had ever done this. Just those headphones, and just certain people. If they deleted them, eventually they’d come back. The headphones were unstoppable.
Poring over the logs, the best clue I had was that this was happening at night, probably during our scheduled Amazon sync. The affected users also happened to have linked Amazon wishlists, although not every user with a linked Amazon wishlist got the headphones. The headphones aren’t even interesting: there are no weird characters in the name, and they don’t have a particularly special ID (it’s 4). Today, at long last, the mystery was solved.
(aka "How I spent my weekend")
If you're looking to deploy Rails in a production-quality way on a server, there are a ton of tutorials on how to do that. None did exactly what I wanted, though, so I'm adding another to the mix. We're using MediaTemple’s (dv) 3.0 server. Design goals:
- I wanted to use Mongrel instead of FastCGI.
- Because this server comes with Apache pre-installed, and it’s not version 2.2 (which supports mod_proxy_balancer), I decided to use the Pen load balancer (Coda Hale sold me on this).
- I needed to support RMagick
- I wanted everything to be as simple as possible.
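As a rough sketch of how those goals fit together, the script below prints the commands for a small Mongrel cluster behind Pen, with the existing Apache proxying to Pen. The ports (Pen on 8000, Mongrels on 8001-8003) and the application path are hypothetical, and this is not the exact setup from the post. The mongrel_rails flags are the Mongrel 1.x start options (-d daemonize, -e environment, -p port, -c chdir, -P pid file); Pen is shown in its basic form, pen LISTEN_PORT host:port host:port ....

```ruby
PEN_PORT      = 8000
MONGREL_PORTS = (8001..8003).to_a
APP_ROOT      = '/var/www/myapp' # hypothetical application path

# One daemonized Mongrel per port, each with its own pid file.
def mongrel_start_commands(ports, root, env = 'production')
  ports.map do |port|
    "mongrel_rails start -d -e #{env} -p #{port} -c #{root} " \
      "-P #{root}/log/mongrel.#{port}.pid"
  end
end

# Pen listens on one port and spreads connections across the Mongrels.
def pen_command(listen_port, ports, host = '127.0.0.1')
  backends = ports.map { |p| "#{host}:#{p}" }.join(' ')
  "pen #{listen_port} #{backends}"
end

puts mongrel_start_commands(MONGREL_PORTS, APP_ROOT)
puts pen_command(PEN_PORT, MONGREL_PORTS)
```

Keeping Pen as a separate, tiny process means the pre-installed Apache only needs plain proxying to one port, rather than a balancer module it doesn’t have.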
The need for speed
January 2nd, 2007
We’ve been doing a lot of work behind the scenes to speed up some of the slower parts of the site. Two of the slowest were 1) when you bookmark an item via the browser button, it takes a long time for the matches to load, and 2) when you search site-wide for an item, the search takes a long time to come back (although searches of just your stuff or just your friends’ stuff should be snappy).
The thing both of these areas have in common is that we simultaneously search our database (people’s wishes) and the whole Amazon.com catalog for items that match what you’re looking for. Searching Amazon’s catalog is always going to be a little slow because of the round-trip to their servers, but in doing some profiling I realized a lot of the time was being eaten up by XML parsing with REXML. Not only is there overhead in loading an XML processor, but the format Amazon returns search results in also required quite a bit of massaging on our end to get it into a form suitable for our database.
Fortunately, Amazon lets you give them an XSL stylesheet, and they’ll do that processing for you. That’s pretty sweet, because now we get back better-formatted data, and it doesn’t cost us any processing power since they do the transformation on their servers. Since I could use XSLT to get the data back in any format I wanted, I really wanted to avoid doing any XML processing at all. So, I created a stylesheet that returns the Amazon search results in YAML. YAML is a super-simple data format which, because of its simplicity, is fast and easy to parse… and thankfully Ruby has some very YAML-friendly methods, so I didn’t have to write any parsing code.
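Here is a small sketch of the Ruby side, assuming the stylesheet emits a YAML list of hashes. The field names (asin, title, price) and the sample values are hypothetical, not Amazon’s actual output or our real stylesheet. One YAML call turns the response into plain arrays and hashes.

```ruby
require 'yaml'

# Hypothetical text an XSLT stylesheet might return for a search:
# one YAML list entry per item, using our own field names.
yaml_from_amazon = <<~DOC
  - asin: "B000EXAMPLE"
    title: "Example Headphones"
    price: "$19.99"
  - asin: "B000SAMPLE2"
    title: "Another Item"
    price: "$5.00"
DOC

# safe_load only builds basic types (strings, numbers, arrays, hashes),
# which is all we need from a remote response.
items = YAML.safe_load(yaml_from_amazon)
items.each { |item| puts "#{item['asin']}: #{item['title']} (#{item['price']})" }
```

Quoting the values in the stylesheet is a deliberate choice: unquoted, YAML would try to interpret things like numeric-looking IDs, so quoting keeps every field a string.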
Some Rails Tips
December 13th, 2006
We recently resolved a few longstanding issues with the site which I thought might be generally applicable to anyone else building a site in Rails. So, FYI:
RJS Templates are Slow
We do almost all of our screen updates via AJAX, and use Rails’ RJS templates for almost all of that. The problem is, in the current version of Rails (1.1.6) they’re really slow to build, because Rails does a lot of string parsing to turn a block of HTML into a JavaScript command. Fortunately, this is a known bug in Rails which will be fixed in 1.2. Because Ruby is a dynamic language, you can drop this code into your project as a plugin and effectively patch that part of Rails. It improved our performance in rendering big pages of gifts quite substantially: a list of 75 items was taking nearly 3 seconds to generate, and this patch dropped it to 0.3 seconds.
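The plugin trick works because Ruby classes stay open: code loaded later can reopen a framework class and redefine a method for every caller. A minimal sketch of that mechanism, using a hypothetical JsBuilder class and escape method (not Rails’ actual RJS code or the real patch): the first definition escapes a string one character at a time, and the "plugin" replaces it with a single regex pass that produces the same output.

```ruby
# Stand-in for a framework class (hypothetical; not Rails' real code).
class JsBuilder
  # Original version: builds the escaped string one character at a time.
  def escape(text)
    out = String.new
    text.each_char do |c|
      piece = case c
              when '\\' then '\\\\'
              when "'"  then "\\'"
              when '"'  then '\\"'
              when "\n" then '\\n'
              else c
              end
      out << piece
    end
    out
  end
end

# "Plugin" code loaded later: reopening the class replaces escape for
# every caller, without editing the framework's own source files.
class JsBuilder
  JS_ESCAPES = { '\\' => '\\\\', "'" => "\\'", '"' => '\\"', "\n" => '\\n' }.freeze

  # Replacement: one regex pass over the whole string.
  def escape(text)
    text.gsub(/[\\'"\n]/) { |c| JS_ESCAPES[c] }
  end
end
```

Once the patch ships in the framework itself, the plugin can simply be deleted.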
Running Background Tasks in Rails
Unlike other sites, the model we use to interact with your Amazon.com wishlist is “associate it, and we’ll keep it in sync” rather than “import it and never go back to Amazon.” We do this for two reasons:
- Don’t force people to abandon their Amazon wishlist if they want to use wishlisting.
- Amazon is being very web 2.0-friendly by exposing their wishlists via an API. Consumers of the API shouldn’t take advantage of that openness by driving Amazon’s customers off-site.
So, to stay in sync with Amazon, we need to run some background tasks to see whether you added or removed anything from your list there, whether anyone bought anything from your list, etc. There is a whole wiki page on ways to run background tasks in Rails, although none of them work particularly well in a shared environment, much less on a Grid (where even script/runner doesn’t work). Here’s a rather clever solution that MediaTemple provided us. In a cron job, run this:
/usr/bin/curl http://yoursite.com/path/to/a_page_that_does_some_work
Everything will run in your already running instance of Rails without needing to load up anything additional. Very handy - and so far it’s doing a good job of keeping everyone’s wishlists in sync.
UPDATE 9/26/2007: MediaTemple has an additional way to run background tasks on the Grid. There is an issue on the Grid where, if a request takes longer than 20 seconds to complete, it will start sending 502 error messages back to clients trying to access your site at the same time (yes, this is a little alarming). So, we stopped using the curl method. They posted a KB article on how to run rake tasks in the background on the (gs). With this, you can effectively do what script/runner does. (Note: If you run into errors about RMagick, read this article.) All you need to do is add require RAILS_ROOT + '/config/environment' to the beginning of your rake task to have access to the full Rails environment.
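A minimal sketch of a rake task following that tip. The namespace and task name and the Wishlist.sync_all_with_amazon method are hypothetical placeholders for whatever your sync code is; the only part taken from the tip is requiring config/environment at the start of the task body.

```ruby
require 'rake'

# Hypothetical task; in a Rails app this would live in a file under lib/tasks/.
namespace :wishlists do
  desc 'Sync linked Amazon wishlists in the background'
  task :sync do
    # Per the tip above: load the full Rails environment first, which
    # gives the task what script/runner would otherwise have provided.
    require RAILS_ROOT + '/config/environment'
    # Hypothetical model method that does the actual sync work.
    Wishlist.sync_all_with_amazon
  end
end
```

The task would then be started in the background with rake wishlists:sync, as described in the KB article, instead of hitting a web page with curl; because it runs outside the web server, a long sync no longer ties up a request.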
Finding Memory Leaks in Ruby
November 27th, 2006
If you suspect you have a memory leak in a Ruby on Rails app, you’re probably going to have a hard time:
- Proving that you do have a leak
- Finding it
(The “fixing it” part is probably easy.) I had to do both because of a memory leak I suspected in wishlisting, and found very little help on the web about how to go about it. A Google search turned up two potentially helpful leads. One is this blog post by Scott Laird, which offers a script that dumps all of your in-memory strings to a file every 10 seconds. I spent a good amount of time playing with it, and generally concluded (as many of the comments suggest) that I couldn’t make much sense of the output. There were thousands of differences, and I didn’t know which were “ok” and which were signs of problems. Another possibility is this commercial tool for watching Ruby’s memory. I didn’t try it at all, because it was Windows-only, which meant I’d be installing and testing my app on a non-production platform, and because the screenshots made me think I’d end up in much the same place as I had with Scott’s tool. The tools simply aren’t as refined as those in the Java or C++ worlds.
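For the "proving it" step, one lightweight technique (a different approach from Scott Laird’s string dump) is to count live objects per class, take a snapshot before and after some suspect work, and look at which classes keep growing. A minimal sketch using Ruby’s built-in ObjectSpace and GC; the method names object_counts and growth are hypothetical.

```ruby
# Count live objects per class after a full garbage collection, so only
# objects that are still referenced somewhere are counted.
def object_counts
  GC.start
  counts = Hash.new(0)
  # Passing Object skips BasicObject instances, which lack #class.
  ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }
  counts
end

# Classes whose instance count grew by at least min_delta between the
# two snapshots, largest growth first.
def growth(before, after, min_delta = 1)
  after.map { |klass, n| [klass, n - before[klass]] }
       .select { |_, delta| delta >= min_delta }
       .sort_by { |_, delta| -delta }
end
```

Compared with a raw string dump, per-class counts are a much shorter list to read: a class that grows every time you repeat the same request is a strong lead, while classes that go up and down are usually normal churn.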
Keeping bugs out
November 7th, 2006
The underlying technology behind wishlisting.com is Ruby on Rails. Coming down the home stretch before showing it to people, one of the things we’re focusing on is preventing people from hitting errors, especially when performing basic operations.
Ruby is widely touted as a dynamic language, which has its pros and cons. One of its cons is that you have to test everything thoroughly to make sure a change doesn’t break other things. For example, we started using the conditional caching plugin and got that working, but when we added the querystring action caching plugin, the two stomped all over one another, overriding the same methods without any error messages.
In other languages, errors like this would be caught by the compiler: you’d get a broken build and know immediately that something was wrong. The equivalent in Ruby is to write good automated unit and functional tests to catch problems. Writing tests is pretty easy, since Rails has that baked in. Automating them takes some work. The tool we found worked best was the continuous_builder plugin; Ryan Daigle has a great article on how to install and configure it here. This plugin fires off your test suite whenever you check code into your repository. That way you get some assurance that what you checked in didn’t break other stuff, at least to the extent that your tests cover that other stuff.
How do you figure out how much of your code your tests actually exercise? That’s where code coverage tools come in; the leader in the Ruby space is Rcov. We haven’t gotten there yet, but we will. For a quick and dirty way to keep an eye on how your functional and unit tests fare over time, Ben Curtis has a cool online tool called Tesly, Jr. You install a plugin into your codebase that automatically posts your test results to that site, which summarizes them and produces some little graphs. We’re using it now, and although I’m not yet sold on its usefulness, it’s pretty cool and very easy to install and configure.
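A minimal illustration of why that collision was silent: two "plugins" (hypothetical stand-ins, not the real caching plugins) reopen the same controller class and define the same method, and Ruby simply keeps whichever definition loads last.

```ruby
# Hypothetical stand-ins for two plugins patching the same method.
class PagesController
  def cache_key(path)
    path
  end
end

# "Plugin A": conditional caching.
class PagesController
  def cache_key(path)
    "conditional:#{path}"
  end
end

# "Plugin B": query-string caching, loaded later. It replaces
# Plugin A's version outright; no error is raised.
class PagesController
  def cache_key(path)
    "querystring:#{path}"
  end
end

puts PagesController.new.cache_key('/gifts') # querystring:/gifts
```

Running Ruby with warnings enabled (ruby -w) does print a "method redefined" warning in cases like this, which can help spot the clash, but only a test that checks the behavior you expect from both plugins proves they work together.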