Yahoo SearchMonkey Launches Into Private Beta

Yahoo is letting a limited number of beta testers into its SearchMonkey developer platform, a service announced in February that allows site owners to customize their Yahoo search result listings.

With SearchMonkey, site owners create “applications” for Yahoo search that users can install, much as they install Facebook applications. Each application modifies the results that match a particular URL pattern (for example, all reference pages on Wikipedia or all product pages on Amazon). Modifications include both changes to the basic elements of a search result (the title and description) and additions such as an image, deep links, and key/value pairs.

Here’s the Yelp search result example we used in our first post about SearchMonkey:

Notice that the modified result has links to reviews and photos, plus quick data such as the restaurant’s rating, address, and phone number. This extra info is supposed to make the result more useful to searchers and hopefully entice them to click through at higher rates.

Creating an application

I had a chance to play around with the SearchMonkey application creation tool, which is actually quite straightforward (if you’re comfortable with PHP, XSLT, and DataRSS).

While you can import other people’s applications into SearchMonkey and modify them, you’ll probably want to start from scratch when customizing search results for your own website. The process begins with picking a name, description, category, icon, and type for your app.

There are two types: enhanced results and infobars. Most site owners will opt for the “enhanced result” type, which can display plenty of information in full view, as the Yelp example above does. The “infobar” type condenses supplementary information into an interactive bar below each search result. Users have to interact with the bar to reveal more, but it reportedly allows more flexibility in HTML and styling (I couldn’t get it to work in my tests).

Once you’ve chosen those basic settings, you need to specify the URL pattern your application will target. A pattern looks something like *.crunchbase.com/company/*, where the asterisks allow wildcard matching at the beginning and/or end of the URL. That example pattern would customize search results for all company pages on CrunchBase.
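To make the wildcard rule concrete, here is a minimal Python sketch of how such a pattern could be matched against URLs. It assumes that `*` stands for any run of characters and is honored only at the start or end of the pattern, as described above. The function names `compile_url_pattern` and `url_matches` are hypothetical; this is not SearchMonkey’s actual matcher.

```python
import re


def compile_url_pattern(pattern: str) -> re.Pattern:
    """Turn a pattern like '*.crunchbase.com/company/*' into a regex.

    Hypothetical helper: '*' is treated as a wildcard only at the very
    start or end of the pattern; everything else must match literally.
    """
    core = pattern.strip("*")
    if "*" in core:
        raise ValueError("wildcards are only allowed at the start or end")
    prefix = ".*" if pattern.startswith("*") else ""
    suffix = ".*" if pattern.endswith("*") else ""
    # re.escape makes the dots in the core match literal dots only.
    return re.compile(prefix + re.escape(core) + suffix)


def url_matches(pattern: str, url: str) -> bool:
    # fullmatch: the whole URL must fit the pattern, not just a substring.
    return compile_url_pattern(pattern).fullmatch(url) is not None


if __name__ == "__main__":
    pattern = "*.crunchbase.com/company/*"
    for url in (
        "http://www.crunchbase.com/company/facebook",
        "http://www.crunchbase.com/person/mark-zuckerberg",
    ):
        print(url, url_matches(pattern, url))
```

Using `fullmatch` matters when a pattern has no wildcard at one end: the URL must then begin or end exactly where the pattern does.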

After you submit up to ten URLs on which to test your new application, things get a little more complicated. You must select the sources of the data you want to show in search results, and then tell SearchMonkey how to display that data. By default, you have access to 16 basic data fields from Yahoo. These include the result’s title and description, plus less useful items such as its language, file size, and position among all the results.

But you’ll want to pull information from the listed page itself, so users get a taste of what they’d learn by clicking through. That means you have to supply Yahoo with this information, which requires creating a “custom data service”. Setting one up is a multi-step process in its own right, which I describe below.

For now, though, assume that you’ve already created a custom data service. All that’s left is to tell SearchMonkey how to display the custom data, which consists of setting values in a simple PHP class. You can set up to four deep links (restricted to your own domain) and four key/value pairs. As you adjust the appearance of your result, you can preview it at the bottom of the page by clicking “Save & Refresh”, which gives you constant assurance that you’re actually doing things right (always a good thing).
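SearchMonkey’s own display step is a PHP class, which isn’t reproduced here. As a language-neutral illustration of the two limits mentioned (at most four deep links, all on your own domain, and at most four key/value pairs), here is a Python sketch. The `EnhancedResult` class, its methods, and the example URLs are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlparse

MAX_DEEP_LINKS = 4
MAX_KEY_VALUES = 4


@dataclass
class EnhancedResult:
    """Hypothetical model of an enhanced search result for one site."""

    domain: str  # e.g. "crunchbase.com"; deep links must stay on it
    title: str
    summary: str = ""
    deep_links: list[tuple[str, str]] = field(default_factory=list)
    key_values: list[tuple[str, str]] = field(default_factory=list)

    def add_deep_link(self, text: str, url: str) -> None:
        if len(self.deep_links) >= MAX_DEEP_LINKS:
            raise ValueError(f"at most {MAX_DEEP_LINKS} deep links allowed")
        host = urlparse(url).hostname or ""
        # Accept the domain itself or any subdomain (www.crunchbase.com),
        # but not look-alikes such as evilcrunchbase.com.
        if host != self.domain and not host.endswith("." + self.domain):
            raise ValueError(f"{url} is not on {self.domain}")
        self.deep_links.append((text, url))

    def add_key_value(self, key: str, value: str) -> None:
        if len(self.key_values) >= MAX_KEY_VALUES:
            raise ValueError(f"at most {MAX_KEY_VALUES} key/value pairs allowed")
        self.key_values.append((key, value))


if __name__ == "__main__":
    # Illustrative URL only; real deep links depend on your site's layout.
    result = EnhancedResult("crunchbase.com", "Facebook | CrunchBase")
    result.add_deep_link("People", "http://www.crunchbase.com/company/facebook/people")
    result.add_key_value("Founded", "2004")
    print(result)
```

Checking the host suffix against `"." + domain` rather than the bare domain keeps subdomains allowed while rejecting look-alike hosts.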

Creating a custom data service

Those are the basic steps behind creating an application. Before you can finish one, though, you’ll probably have to set up a custom data service. These services obtain the data for your customized result either by semantic page scraping or from web services.

The page scraping method follows through on Yahoo’s earlier announcement that it would embrace semantic web standards. It lets you pull information from your web pages without building any APIs, because it can be taught to crawl your HTML structure for just the data it needs.

Setting up page extraction requires creating an XSLT file, a process made easier by the XPather Firefox extension, which helps you find the XPath expressions that identify data on your site. I won’t go into detail about XSLT, but you’ll need to brush up on it. You’ll also most likely need to tweak your website’s code so that pages meant to display the same type of data are structured consistently.
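Yahoo’s route here is an XSLT file, and XPather’s job is to help you find the XPath expressions inside it. The sketch below shows only that extraction idea, using Python’s standard library rather than the XSLT SearchMonkey expects: a table of hypothetical XPath-style paths (ElementTree supports a limited XPath subset) applied to a well-formed page. The `FIELD_PATHS` table, the class names in the markup, and the sample company are made up for illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical paths; on a real site you might find them with a tool like XPather.
FIELD_PATHS = {
    "address": ".//span[@class='address']",
    "founded": ".//span[@class='founded']",
}


def extract_fields(xhtml: str) -> dict[str, str]:
    """Return whichever fields are present in a well-formed (X)HTML page."""
    root = ET.fromstring(xhtml)
    data: dict[str, str] = {}
    for name, path in FIELD_PATHS.items():
        text = root.findtext(path)  # None if no element matches the path
        if text:
            data[name] = text.strip()
    return data


SAMPLE_PAGE = """
<html><body>
  <div class="company">
    <h1>Example Inc.</h1>
    <span class="address">123 Example Street</span>
  </div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_fields(SAMPLE_PAGE))
```

Because the sample page has no `founded` span, that field is simply left out. This is also why consistent markup matters: a path that matches one page may find nothing on another page of the same type.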

As for the web services method of creating a custom data service, you’ll need to know either XML in OpenSearch format or XSLT.
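With the web services route, your own service returns XML instead of Yahoo scraping your page. The exact schema SearchMonkey expects isn’t described here, so the following Python sketch only illustrates the general shape: an RSS 2.0 document that uses the OpenSearch 1.1 namespace for `totalResults`, with the page’s data carried in a hypothetical `<field name="...">` element. The `build_response` function and that element layout are assumptions, not Yahoo’s format.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

OS_NS = "http://a9.com/-/spec/opensearch/1.1/"
ET.register_namespace("opensearch", OS_NS)


def build_response(page_url: str, title: str, fields: dict[str, str]) -> str:
    """Build an RSS-style XML payload describing one page (hypothetical layout)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Data for {page_url}"
    # OpenSearch response element: how many results this feed contains.
    ET.SubElement(channel, f"{{{OS_NS}}}totalResults").text = "1"
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = page_url
    for key, value in fields.items():
        # Hypothetical element for each key/value pair you want displayed.
        ET.SubElement(item, "field", name=key).text = value
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_response(
        "http://www.crunchbase.com/company/example",
        "Example Inc.",
        {"address": "123 Example Street"},
    ))
```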

Our (half-baked) search result customization

So what did our test application produce? The modification you see below:

Obviously it’s not complete by any means, but you can see how much of the result is customizable – and that we were able to pull out Facebook’s address from its CrunchBase page without much configuration.

A step forward in the evolution of search

Non-technical readers’ eyes have probably glazed over by now, but I can assure even them that customizing search results with SearchMonkey isn’t rocket science. As such, I can see it becoming very popular with website owners big and small.

Perhaps the only question left, then, is how Yahoo will promote these applications to its search users. Facebook benefited from powerful viral channels when it kicked off its platform, but we have yet to see how Yahoo will attract users to its applications. Some of the applications (for larger, more prominent sites) may be enabled by default, but the vast majority will presumably require people to manually opt in.

Assuming a vibrant app ecosystem does arise, SearchMonkey should be a significant step toward the more personalized, and potentially social, search that many have anticipated. Once third-party developers are free to innovate, and users to adopt the best innovations, the market itself can push search forward.