About a week ago we soft-launched a new search engine for TechCrunch, the first to be built and deployed using Yahoo’s new BOSS Custom technology (or “BOSS vertical lens”, as Yahoo called it in its announcement today).
The new TechCrunch Search is considerably more powerful than WordPress’s default search capabilities. For one, you can now search across our entire network of English-language blogs (which includes CrunchGear, MobileCrunch, TechCrunch UK, TechCrunchIT, Gillmor Gang and others in addition to TechCrunch) with just one query. Results also include pages from CrunchBase about notable companies, people, and financial organizations in the technology sector. And perhaps most crucially, results are no longer listed in dumb reverse chronological order but rise to the top based on their relevance to your query (Yahoo’s relevance model has, however, been tuned to highlight our freshest content about particular subjects, such as the iPhone or Facebook).
The improvements don’t stop at better breadth and relevancy. The new user interface features an image for each result (when available), making it easier to identify posts visually. If you search for the name of a company, person or financial organization (such as Sequoia Capital or Jerry Yang), a specially formatted result at the top of the page will guide you to information about them in CrunchBase. And if you conduct a search that yields a lot of results, we have advanced search options that will help you narrow in on just what you’re looking for.
These new search options reflect some of Yahoo BOSS Custom’s strongest suits. Because we manually feed our blog and CrunchBase data to Yahoo using an XML-based API, we can associate any type of information (such as author name and number of comments) with each result. The search options leverage this metadata by allowing users to filter the results by certain criteria. For example, you can choose to view only results about “microsoft” by Michael Arrington that have at least 94 comments. Or you can decide to see what John Biggs has to say about the hottest new watches on CrunchGear.
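To make the filtering idea concrete, here is a minimal Python sketch of narrowing results by attached metadata. The `Result` class, its field names, and `filter_results` are hypothetical; the post does not say whether BOSS Custom applies such filters on Yahoo’s side or in the front-end, so treat this purely as an illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical record: a search result plus the metadata fed alongside it.
    title: str
    author: str
    comments: int


def filter_results(results, author=None, min_comments=0):
    """Keep results matching an optional author and a minimum comment count."""
    return [
        r for r in results
        if (author is None or r.author == author) and r.comments >= min_comments
    ]


# Made-up sample data, only for demonstration.
results = [
    Result("Microsoft post A", "Michael Arrington", 120),
    Result("Microsoft post B", "Michael Arrington", 12),
    Result("Watch roundup", "John Biggs", 30),
]

filtered = filter_results(results, author="Michael Arrington", min_comments=94)
```

With the sample data above, only the first post passes both the author filter and the comment threshold.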
BOSS can also be used to incorporate results from elsewhere on the web. When you conduct a search that doesn’t turn up many results from within the TechCrunch network, Yahoo backfills the pages with results from elsewhere on the web. These are essentially the same results you’d find by searching with Yahoo’s main search engine, except they are provided conveniently at the end of our own. Sponsored results from across the web are also displayed, allowing marketers to reach out to our visitors with products and services that are relevant to their search queries.
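The backfill behavior described above can be sketched roughly as follows. Yahoo performs this step on its own servers, so the function name, the dictionary keys, and the de-duplication rule here are all assumptions made for illustration.

```python
def blend_with_backfill(site_results, web_results, page_size=10):
    """Fill a result page with site results first, then pad with web results."""
    page = list(site_results[:page_size])
    seen_urls = {r["url"] for r in page}
    for r in web_results:
        if len(page) >= page_size:
            break
        if r["url"] not in seen_urls:  # skip pages already shown from our own index
            page.append(r)
            seen_urls.add(r["url"])
    return page


# Made-up sample data: one site result, three candidate web results.
site = [{"url": "http://example.com/a", "source": "site"}]
web = [
    {"url": "http://example.com/a", "source": "web"},
    {"url": "http://example.org/b", "source": "web"},
    {"url": "http://example.net/c", "source": "web"},
]

page = blend_with_backfill(site, web, page_size=3)
```

The site’s own result stays first, and web results only fill whatever room is left on the page.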
Setting up this new search engine was a much more hands-on experience than, say, installing something like Google Site Search. For one, the Yahoo BOSS Custom technology is intended for developers, not those looking for an out-of-the-box solution. BOSS Custom consists primarily of two XML-based APIs: one that we use to supply data from our sites to Yahoo in real time, and one that we use to construct our search’s user interface.
Since we publish using WordPress, supplying our data using the first API essentially required that we design and deploy a plugin that would send information to Yahoo’s servers every time there was a new post or comment on any of our blogs. We also needed to create a similar data indexing system for CrunchBase so that contributions there would show up in the results as well. To ensure that all of our archived content was incorporated in the search index, we supplied Yahoo with historical data dumps from all 10 sites. Perhaps needless to say, this took a considerable amount of time just to ensure that the data we indexed at Yahoo was accurate and complete.
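As a rough illustration of the indexing side, the sketch below builds an XML document for a single post, the kind of payload a publish hook might produce. The element names and the `build_post_document` helper are hypothetical, since the post does not describe the actual BOSS Custom schema or endpoint, and the step that sends the payload to Yahoo is left out for the same reason. Our plugin runs inside WordPress; Python is used here only to keep the example short.

```python
import xml.etree.ElementTree as ET


def build_post_document(post):
    """Serialize one post and its metadata as an XML string (hypothetical schema)."""
    doc = ET.Element("document")
    for field in ("url", "title", "author", "site"):
        ET.SubElement(doc, field).text = post[field]
    # Comment counts change over time, so a new comment would trigger a re-send.
    ET.SubElement(doc, "comment_count").text = str(post["comment_count"])
    return ET.tostring(doc, encoding="unicode")


# Made-up sample post.
post = {
    "url": "http://example.com/2008/11/example-post/",
    "title": "Example post",
    "author": "Example Author",
    "site": "Example Blog",
    "comment_count": 12,
}

xml_payload = build_post_document(post)
```

The same function could serve both the live hook and a historical data dump, since each archived post is just another document.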
Once the data from our sites was synchronized with Yahoo’s servers, we built out the search interface from scratch using the XML returned by Yahoo’s other API. This API takes the user’s search term and returns data about the corresponding results, such as titles, URLs and descriptions. It also returns metadata about the results such as their ordering and refinement options. Our search front-end parses and processes the XML to display the user interface seen by the end user. The creation of this front-end, too, took a significant amount of time since all of the pieces needed to work together in all query scenarios.
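The parsing step can be sketched in a few lines of Python. The response format below is invented for the example (the real BOSS Custom XML is not shown in this post), so the tag and attribute names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: one result plus one refinement option.
SAMPLE_RESPONSE = """<response>
  <result>
    <title>Example post</title>
    <url>http://example.com/post</url>
    <description>An example description.</description>
  </result>
  <refinement name="author" value="Jane Doe" count="3"/>
</response>"""


def parse_response(xml_text):
    """Return (results, refinements) as lists of plain dictionaries."""
    root = ET.fromstring(xml_text)
    results = [
        {child.tag: (child.text or "") for child in result}
        for result in root.findall("result")
    ]
    refinements = [dict(ref.attrib) for ref in root.findall("refinement")]
    return results, refinements


results, refinements = parse_response(SAMPLE_RESPONSE)
```

A front-end would then render `results` as the result list and `refinements` as the filter options, preserving the order in which the results arrive.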
Development proceeded over the course of several months after the initial idea was floated last spring. Granted, it didn’t take several straight months of development time, since there were gaps when our attention went to other projects. In aggregate, however, it consumed about 4-8 weeks of focused attention by two developers, suggesting that Yahoo BOSS Custom projects require a considerable amount of technical resources.
Given how much time and effort it took, why didn’t we just roll out Google Site Search in a matter of days? Simply put, Yahoo BOSS is a more flexible and professional solution. Data gets indexed in near real-time, so there’s no waiting for search engine bots to re-crawl your site. We have the ability to associate non-standard data, like images and number of comments, with the results and then display that data in any way we desire. Users can filter the results by criteria instead of simply accepting the default search scope. We can assign arbitrary boosts to results, ensuring that they show up on top whenever certain search terms are used (see “layoffs” for our Layoff Tracker, as an example). And we can blend a variety of result types, such as blog posts and CrunchBase pages, serving them up in ways that make the most sense for our readers.
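The boosting idea can be illustrated with a small sketch that pins a chosen page to the top for a given query term. How BOSS Custom actually implements boosts is not described here, so the `apply_boosts` function, the boost table, and the URLs are hypothetical placeholders.

```python
def apply_boosts(results, query, boosts):
    """Move the boosted URL for this query (if any) to the top of the results."""
    pinned_url = boosts.get(query.strip().lower())
    if pinned_url is None:
        return list(results)
    pinned = [r for r in results if r["url"] == pinned_url]
    rest = [r for r in results if r["url"] != pinned_url]
    return pinned + rest


# Placeholder boost table: query term -> URL that should appear first.
boosts = {"layoffs": "http://example.com/layoff-tracker"}

ranked = [
    {"url": "http://example.com/post-1"},
    {"url": "http://example.com/layoff-tracker"},
    {"url": "http://example.com/post-2"},
]

boosted = apply_boosts(ranked, "Layoffs", boosts)
```

Queries without an entry in the boost table keep their original relevance order.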
We are continually working to improve TechCrunch Search, so please drop us a note with suggestions and bug reports.

searching “sucks” returned 8,156,952
Ah yes, I forgot to mention in my post: Yahoo backfills the result pages with results from the broader web. So if we don’t have enough content about a particular subject, perhaps you’ll be able to find it among the web results that appear after ours. We don’t actually have 8,156,952 posts with the word “sucks”.
I rank #1 for my blog for now, until TechCrunch covers a story about it; then I’ll lose my #1 spot.
searching “spammer” and could not find myself anyplace.
searching “mylocator” and the results were immaculate. this TC search engine definitely knows its stuff and appears to work beautifully.
CustomLocator.com – suit yourself
ROFL…
I remember meeting you at Startup camp.
I do believe you are overpromoting your company on this site.
mr mylocator, you searched wrong… a search of “irritating douchebag” turns up plenty of results on you.
hi leonid.
i remember you and still have your card. funny you mention startup camp. i used TC’s new search engine and found a result with a picture from the camp i never knew existed. http://wiki.sta...5#MyLocator.com
Pure Envy, i know you really like me and have a strange way of showing affection. Your the kid in school that never had a girlfriend in grade school only to figure out after graduation you never will. Frustrated and confused you lash out at the ones you really desire. there is one small problem between us: im straight and your bent. i do care about you as a person but that is all there can ever be between us.
RelationshipLocator – straighten out
Looks very neat. Definitely worth the time spent. The next iterations should be much quicker I guess.
Yep I would expect it to take less time for future partners to deploy BOSS since it’s a new platform and Yahoo has been working to resolve the largest pain points.
Does one need to pay to use Yahoo BOSS? I wonder how Yahoo can earn anything by giving away free use of its technology and infrastructure, especially at this massive scale.
Kudos Mark. It would be very interesting to write a post about the process you went through and what some of the hardest issues were. Happy Thanksgiving.
The one qualm I have always had with TechCrunch search, which is still there, is the inability to really break my search down by company categories. For example, when I search ‘Travel’, I would like to see every travel startup that has been written about by TechCrunch or indexed @ CrunchBase. For startup researchers or just general interest about companies, this is what would make the search complete.
I’m actually very impressed also. I think Yahoo BOSS would be just what we need for our company’s search system.
What Mark didn’t mention is that he is one of the two developers who oversaw this project on our end. The other is Henry Work. Awesome job, guys.
nice job guys, pretty impressed actually
and regarding speed, i thought it was pretty damn fast..
Are all the options on the side built using BOSS? Looks pretty impressive. The UI can be improved; the font is not pleasing.
Not too shabby.
The search is really fast and returns good results. Good job on that front. However, more work is needed on the UserExperience front. The search results look a bit off. Not as sharp in font and colors etc. Looks like the search is actually loaded in an iframe.
I think it looks awesome! The font is the same font TC uses on their blog. You must be an engineer, stick to your code buddy.
And you must be the yahoo coder who worked on this search. No wonder it is sinking like a stone with people like you.
stinking like a stone? are you serious? does your mom still dress you?
Techcrunch, or how to turn a blog into a powerful industry repertory. impressive!
Thanks for the insight on Yahoo Boss, I’ve been wanting to play around with the API just didn’t see too much out there on it yet.
Awesome! Crunchbase is definitely awesome.
good addition
Mark,
You guys did a nice job on the Boss implementation. I’d love your readers to have a look at our TechCrunch view and give feedback: http://www.view...echcrunch/apple There are some notable differences in UX with the two implementations that I’d be interested in hearing opinions on…
giovanni, viewzi evangelist…
Not bad. But too slow.
Oh and a column for relevance? or filter i.e. ‘Apple in title’ or something like that.
My browser hangs because your site keeps trying to fetch results using ajax calls. I hope you start using simple pagination functionality at least.
And about different views, as you know it yourself, it is nothing but different skins so I would suggest to include more functionality than skins.
But it is impressive until I start using it.
I would recommend some sort of better separation between sponsored ads and not. Something as simple as a horizontal rule or as ‘complicated’ as a second color if that’s possible
Wow, I’m impressed. Maybe not with the font. But impressed nonetheless.
Though I’m surprised Yahoo were not tempted to throw in a few ads in their results when they came up with this. Though that is one of the good things about BOSS. Which brings me to the point: what’s in it for Yahoo?
Looks awesome!
Interesting to see you’ve decided to go with Yahoo Boss
Mark – I’m curious, did you consider Lucene? We’re trying to decide between Lucene and Yahoo Boss for a few projects.
There are several pros and cons to each. Would love to get your feedback
Lucene and BOSS are different beasts. With Lucene you can build your own index, but you don’t get blending of your own results with web results. You can do that yourself by querying both Lucene and the web (via the BOSS API).
Hey it looks pretty sweet and very nicely integrated.
Do you get a rev share of the sponsored results and if so do you have to apply to be part of the YSM partners program?
Looks tight and you guys did a nice job.
Cheers – Eric
i see ads to the right on your custom BOSS engine, but the regular BOSS has no option for monetization as far as i know. it’s still listed as “Coming soon”. what gives?
Extremely well done, not just the search but the initiative of being the first to both cover latest tech news and implement innovative ideas in tech.
Congrats…
Yahoo has always been dominant in search. Great to hear them coming out with something new. On my website I have, with the help of Google, created a mobile search that can search any mobile phone and get ads and news. Just click on my name to go directly to my website.
Search Options work for me in IE but not in firefox.
http://search.t...=online+storage
Bad Boys.
Great innovation, love the new search! Well done!!
never mind, must be an intermittent issue on my pc, restarted the browser and it worked.
Not sure I agree with the order in which actual search results are returned (odd mix of new to old to TC sub-sites) but it’s generally slick so good work.
I won’t complain about the ads other than to point out they’re not very effective or competently targeted. This is 90% Yahoo’s failure but you could tweak as well. Again, I don’t mind ads, just don’t do a crappy job with them, because not only are they annoying at present due to bad targeting, but you’re reducing the effectiveness of the ad unit in the future. Better to display no ad than an irrelevant, idiotic one.
Presently the Yahoo ads do not match up well with your demographic or content as a whole. The screen literally fills with links to British food products when I search “Crunchies”. Did a search for your buddy “Loren Feldman” and got ads for Feltman (with a ‘T’) baking supplies. Moreover the TechCrunch ads are now buried below the fold bottom-right. This only gets worse when you pop the search options panel.
It all seems screwed based on Yahoo’s lack of tech advertisers and its crappy targeting system. Yes, opening up this page on your site may increase the market for better advertisers and Yahoo will target better over time based on clickthrough. “Content-based” ads (like AdWords on blogs) almost always suck, but something on a search page like this should work a lot better.
I also think you paid the early adopter tax on this one. I’ll wait for Google to knock off the tech and for the Wordpress plugins to appear. Then I’ll get the same benefits but with better ad inventory and one day of dev work instead of two months.
I do not think the ads are from yahoo.
Why do you think Yahoo is providing this service? Search “Feldman” here and at Yahoo.com — same ads. No additional targeting given the wealth of data TC provided to them. FAIL.
@meanguy – I am not sure there is much inventory for term Crunchies? Google provides these page one ads:
1.
Hip Hop Abs
Dance your way to hot, sexy abs
and burn fat! Official Web site.
http://www.Beac...y.com/HipHopAbs
2.
Crunchies at Amazon.com
Low prices on crunchies.
Qualified orders over $25 ship free
Amazon.com
3.
Cadbury Crunchie USA
Chocolate, Flake, Roses, Candy
& Tea from UK, ship anywhere in US
http://www.engl...ore.com/cadbury
4.
Crunchies
Crunchies Online.
Shop Target.com.
http://www.Target.com
5.
Australian Cadbury
Yowies, Strawberry & Cherry Ripes
Old Jamaica, Snack, Turkish Delight
http://www.Abou...tralia-Shop.com
Are you suggesting that Yahoo should build a profile and then have targeted keywords, so that if techcrunch.com has a search for “crunchies” it serves up terms related to keyword X, Y, Z? Part of targeted PPC is that the user was looking for something specific and therefore the chances of them clicking on the ad and being a more qualified lead are higher. So I am not sure how one addresses your comment, be it Google, Yahoo or anyone else.
Cheers – Eric
Since the TC content has lots of references to “Crunchies” that do not reference chocolate, Yahoo can avoid showing ads for chocolate in response to Crunchies searches.
Since the local, domain-specific text content supplied to Yahoo references “Crunchies” in regards to startup and venture funding but not chocolate, Yahoo can substitute different ads using those keywords that are already in the blog feed.
For obsessives or valuable verticals, push the burden to content creators to provide blacklists and keyword substitutions. Sites look at their top 100 searches and fix the ads manually. Re-tweak every week as new advertisers come online to the Yahoo network. What a life.
Basic stuff. I’m sure it’s on page one of the design document. Might even be hidden in the product. But the implementation at present sucks.
I guess they could use the URL on which the search is performed as a hint for context instead of just the keyword. This is probably not built in because BOSS is new and I think there is still some discussion on whether BOSS monetization is via ads or payments.
They cannot use search results to fetch better ads because that would introduce too much latency.
@meanguy – If you push the burden to the content supplier then you open yourself up to a lot of abuse. Type in the keyword “Crunchies” and get “Asbestos Lawyers” that payout at 25 bucks a click. This is basic black hat 101 – I don’t think Yahoo is going to fall for that.
In any event I guess it really is that Yahoo is too dumb and doesn’t understand a thing about search and/or the guys at TC are too incompetent to RTFM. Anon-armchair advice is worth its weight in gold for sure.
Cheers – Eric
@qwery: The whole point of the service is that they perform searches over a list of pages you supply. So while a search for “Crunchies” at yahoo.com might guess chocolate, a search here should not.
@Eric: You’re mixing up SEO and ad targeting. TC controls the content catalog so no “rogue” sites are going to appear in the results. For ads, Linkshare and others will give you all the ad units you want for any keyword. Cherry pick the valuable ones if you like and serve ‘em up. If your readers aren’t interested you don’t make any money since it’s paid per action.
@meanguy could you point to a site that does what you are talking about?
Cheers – Eric
Amazing. Their BOSS platform has really recreated how a webmaster can incorporate search into their website.
Yea, it’s great for a “select few” partners. As for the millions of developers out there this is useless.
I like the idea behind this, and it appears that you have good intentions, but when I searched for my website it came up with some really old results and out of date information, I’m sure things will get better though, good job on this project! All the best from Icy Cold Vancouver Canada!
Adrian Eden
If you use Firefox you can get results from the TechCrunch Network each time you search on Google by downloading the search extension here:
http://www.webm...techcrunch.html
The latest version of the WebMynd extension lets you personalize your Google searches with the information sources that you most value. So we’ve set up a version that shows TechCrunch results by default, but you can also get results from Hacker News, YouTube, Flickr, Wikipedia and many others.
Wow, two months.
Seems like a pretty in-depth project.. I built this in 10 minutes
http://www.liji...photo+arrington
http://www.liji...itter+arrington
You guys can get this from Lijit for free (BOSS is not free, or you have to take Yahoo ads, which convert very poorly) and we can provide XML results, you control the ads, we provide an entire stats package, etc.
At least with Google CSE you can use any Ad package.
BOSS is a good solution if you have the time and are OK with only a handful of URLs to index. It’s not quite prime time unless you spend a lot of time with it.
Todd Vernon
CEO Lijit
I don’t know what you are using for the search, but the relevancy sucks. Compare the results for “yahoo” on Lijit and TechCrunch:
http://www.liji...log&q=yahoo
http://search.t...ery.php?s=yahoo.
I’m one of the engineers who works on Yahoo BOSS platform. The current BOSS Open API that is available to the general public is pure web search with site restriction, similar to Google CSE.
TechCrunch is actually using the BOSS Custom platform which indexes the partner’s structured data in realtime (2 min delay), provides Guided Navigation (fancy refinement menu), and allows custom relevancy tuning. Furthermore, for the backfill, the partner provides us whitelist/blacklist of other sites to be included/excluded in the blended web results. You get all of these for FREE plus monetization thru Yahoo ads. Granted the ad platform is not integrated with our search platform, thus not using the domain specific knowledge yet.
I agree with Todd Vernon that one can build a site search using Google CSE and Yahoo BOSS Open api within hours. I should know this since I built a relevancy comparison tool which searches all of these search engines plus BOSS Custom at the same time. BOSS Custom shows way better relevancy. It’s a pain in the neck to parse Google CSE’s results, doesn’t return XML, not easy to integrate with your site’s look and feel. Both BOSS Open and BOSS Custom return XML results.
LOL, so much for Yahoo being “open”. How can others get access to this??
You guys really need to develop products & services that can scale to the entire web population if you want developers to give you more love than Google.
Google Site Search does provide a JSON/XML API for search results.
Nice work guys! Am curious..how real time is the indexing? (I will also be searching to see how soon this comment shows up in search)
About a minute delay
Really cool, I am surprised Mr. A with his yahoo hating would want all of his sites’ searching to be powered by a “doomed” company….
Mark thanks for the detailed overview of implementing this. It does sound like a lot of work to get it working really well, but I agree, if you have the time it is much better than Google Site Search. The results work really well!
Do you get a cut of the sponsored search ad revenue?
Why did you pick Yahoo instead of http://www.goog....com/coop/cse/?
If it took you guys months, I guess I should plan to do it in years…
That looks impressive.
Don’t you need the clickurl to be used in the results?
Do you plan on releasing the wordpress plugin(s) that you have created? I can imagine having those would save others a lot of work.
Google’s Search is the best engine.Why did you pick Yahoo?
Do TechCrunch and Yahoo have some type of financial relationship now that should be disclosed? Did Yahoo pay you to use their search instead of Google?
Nice information. But i am confused.
Is Yahoo Search Boss paid or not? Can i put the new search engine in my blog or site? Is there any cost to pay?
I learned about your new search the hard way. When I went to find a TC article that I read at some point in the past, I go to your search box and type in a 2 or 3 word phrase that I can remember from the post. This technique always worked in the past. But it doesn’t work any more- the results are crowded with too much junk.
It would be nice if you restored WordPress search as a secondary option. Despite its obvious limitations, the WordPress feature was very adept in this way. Did anyone at TC bother to do some user research to find out how your audience typically uses search? If they are like me, the new search doesn’t really work. I will never use your search option (nor will anyone else) instead of Google for general searches, so a broader set of results isn’t going to help anyone.
Using the Google, enter
“type query” site:techcrunch.com .
It always works for me.
Did you throw out Eurekster Swicki?
@mustafa (old style reply since ‘reply’ link was missing)
This is what you get on Google CSE help pages. I think it explains very well why people choose BOSS over Google. BOSS is free (I don’t know about the custom one, someone here should enlighten us).
Also there is no getting away with ‘powered by google’ (see terms of reference). There are plenty of ‘BOSS’ sites where the word Yahoo or the word BOSS is not mentioned.
quote:”Google must be given attribution using the appropriate text or brand elements when Google results are shown”
http://www.goog...arch/index.html
Pricing of CSE (in pages, annual):
<5000 – $100
<50000 – $500
<100000 – $850
<300000 – $2250
more – contact sales
you can also just use google to search techcrunch. open a browser, go to google, and type this search string:
site:techcrunch.com “social network” OR “social search”
you can change your keywords to be whatever you want just so long as you have the site:techcrunch.com command at the beginning. alternatively you can search crunchbase with a similar command site:crunchbase.com then enter your keywords.
Fantastic
since you’re a news site, the article’s posting date should be more present.
Yahoo should partner with WordPress, Expression Engine or other CMS providers.
This is awesome. I am liking it… better than anything else…
It’s looking nice because images are also coming alongside the results, but the bad thing is that lots of banner ads are coming in on the right-hand side.
Any ideas re. how to determine which companies are planning layoffs BEFORE the actual layoff is announced?