January 2nd, 2007
We’ve been doing a lot of work behind the scenes to get some of the slower parts of the site up to speed. Two of the slowest were: 1) when you bookmark an item via the browser button, the matches take a long time to load, and 2) when you search site-wide for an item, the search takes a long time to come back (although searches of just your stuff or just your friends’ stuff should be snappy).
The thing both of these areas have in common is that we simultaneously search our database (people’s wishes) and the whole Amazon.com catalog for items that match what you’re looking for. Going off and searching Amazon’s catalog is always going to be a little slow because of the round-trip we have to make to their servers, but in doing some profiling I realized a lot of the time was being sucked up in XML parsing with REXML. Not only is there overhead in loading and running an XML processor, but the format Amazon gives search results back in also requires quite a bit of massaging on our end to get into a format suitable for our database.
Fortunately, Amazon provides a facility for you to give them an XSL stylesheet and then they’ll do that processing for you. That’s pretty sweet, because now we can get back better-formatted data from them, and it doesn’t cost us any processing power since they do the transformation on their servers. Since I could use XSLT to get the data back in any format I wanted, I really wanted to avoid having to do any XML processing at all. So, I chose to create a stylesheet that would return the Amazon search results in YAML. YAML is a super-simple data format which, because of its simplicity, is fast and easy to parse… and thankfully Ruby has some very YAML-friendly methods, which means I didn’t have to write any parsing code.
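For the curious, asking Amazon to apply a stylesheet comes down to passing the stylesheet's URL along with the search request. Here is a rough sketch of building such a request in Ruby. It is not our production code: the helper name, the endpoint, the access key, and the stylesheet URL are all placeholders I made up for illustration, and the parameter names (in particular `Style`) should be checked against Amazon's current documentation before you rely on them.

```ruby
require 'uri'

# Assumption: the REST endpoint for the E-Commerce Service. Check Amazon's
# docs for the host you should actually use for XSLT requests.
ENDPOINT = 'http://webservices.amazon.com/onca/xml'

# Hypothetical helper: builds an ItemSearch URL that asks Amazon to run
# our stylesheet over the results before sending them back.
def amazon_search_url(keywords, search_index, stylesheet_url, access_key)
  params = {
    'Service'        => 'AWSECommerceService',
    'AWSAccessKeyId' => access_key,
    'Operation'      => 'ItemSearch',
    'SearchIndex'    => search_index,
    'Keywords'       => keywords,
    'Style'          => stylesheet_url  # URL of the XSL stylesheet to apply
  }
  # encode_www_form takes care of escaping spaces, slashes, colons, etc.
  "#{ENDPOINT}?#{URI.encode_www_form(params)}"
end

url = amazon_search_url('coffee mug', 'Kitchen',
                        'http://example.com/yaml.xsl', 'EXAMPLE_KEY')
puts url
```

The stylesheet has to live at a URL Amazon's servers can fetch, since they are the ones running the transformation.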
So, now when I get an item back, the results come back in a 33K YAML file instead of a 129K XML file. Instead of a wordy 100-line block of XML, including attributes and information we don’t need, we can get a small, readable bit of YAML ready for immediate insertion into our database:
    B0007OF3QU:
      name: "Shakespeare Coffee Mug"
      imageuri: "http://ec2.images-amazon.com/images/P/B0007OF3QU.01-A2B6NBKP2Y88KU._SCMZZZZZZZ_V33857835_.jpg"
      imageh: 151
      imagew: 160
      upc:
      isbn:
      mpn:
      category: Kitchen
      ean:
      salesrank: 105007
      price: 895
      binding: Kitchen
      description: >
        A striking 16-ounce mug of new bone china featuring a delightful illustration of Shakespeare by illustrator Mike Caplanis.
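To give a feel for how little code the Ruby side needs, here is a minimal sketch of loading a result like the one above. It uses a trimmed copy of the mug example as inline text rather than a live response, and the `to_rows` helper and its field names are hypothetical, just to show the shape of the data, not our actual schema.

```ruby
require 'yaml'

# A trimmed copy of the example result, inlined so the sketch runs offline.
yaml_text = <<~YAML
  B0007OF3QU:
    name: "Shakespeare Coffee Mug"
    imageh: 151
    imagew: 160
    upc:
    category: Kitchen
    salesrank: 105007
    price: 895
    description: >
      A striking 16-ounce mug of new bone china.
YAML

# One call gives us a Hash keyed by item ID; numbers arrive as Integers
# and empty fields (like upc) arrive as nil.
items = YAML.load(yaml_text)

# Hypothetical helper: flatten the Hash into one row per item.
def to_rows(items)
  items.map do |asin, attrs|
    {
      asin:     asin,
      name:     attrs['name'],
      price:    attrs['price'],
      upc:      attrs['upc'],
      category: attrs['category']
    }
  end
end

rows = to_rows(items)
puts rows.inspect
```

There is no tree walking and no XPath here: the structure of the YAML is already the structure we want, which is the whole point of doing the transformation in the stylesheet.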
I never really understood YAML’s place in the markup ecosystem (yes, I know, YAML Ain’t Markup Language), but I have a better appreciation for it now. There is a place for a simple file structure for quick operations.
All that said, our search is still pretty slow, although there are a number of less interesting reasons for that (our host is having database issues). If you’re interested in using XSL to convert XML to YAML, you can look at our stylesheet as an example. I also highly recommend the Oxygen XML editor.