February 14, 2008

Topper Makes Search Complicated In The Name Of Progress

Duncan Riley


Topper search offers a search service which….um…machine learning….um something…..results.

Here’s the pitch anyway.

Here is a quick description of how to use it. Enter a search query. Topper Search will go out and get the top 100 results and display the top 10. Select at lease one ‘+’ and one ‘-’ result. Topper will use machine learning to build a model and classify all 100 results. Use the buttons in the top bar to reorder these results to your taste. Topper will then display a different top 10 based on your model.

It is great for building ad hoc models for a single search. Try it for disambiguating a query. The stock example I use is ‘lions’ (for the football team or the animal). It can also be used for multiple searches and has a simple persistence mechanism for later use. For example once you have a model built for ‘lions’ try a search for ‘bears’ or ‘eagles’.

This is just the tip of the iceberg. We are building out tools for blogs and news etc using similar machine learning tools.
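For the curious, the workflow in the pitch can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the snippets are made up, the helper names (`train`, `score`, `rerank`) are hypothetical, and the scoring is a simple smoothed log-odds word weighting, not whatever model Topper actually builds.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a snippet and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(liked, disliked):
    """Build per-word log-odds weights from '+' and '-' snippets.

    Add-one smoothing keeps words seen on only one side from
    producing infinite weights. (Illustrative stand-in model.)
    """
    pos = Counter(w for s in liked for w in tokenize(s))
    neg = Counter(w for s in disliked for w in tokenize(s))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {
        w: math.log((pos[w] + 1) / pos_total) - math.log((neg[w] + 1) / neg_total)
        for w in vocab
    }


def score(model, snippet):
    """Sum the weights of known words; unknown words contribute nothing."""
    return sum(model.get(w, 0.0) for w in tokenize(snippet))


def rerank(model, results, top_n=10):
    """Return the top_n results ordered by descending model score."""
    return sorted(results, key=lambda r: score(model, r), reverse=True)[:top_n]


# Made-up snippets standing in for the fetched search results.
results = [
    "Detroit Lions schedule and NFL scores",
    "African lions hunt in prides on the savanna",
    "Lions quarterback injured before NFL playoff game",
    "Lions and other big cats at the wildlife park",
]

# One '+' and one '-' vote, as the pitch asks for.
model = train(liked=[results[0]], disliked=[results[1]])
ranked = rerank(model, results)
for r in ranked:
    print(round(score(model, r), 2), r)
```

With one vote each way, the football snippets move to the top and the savanna snippet drops to the bottom. A real system would need far more care with common words such as "and" or "the", which this toy version happily treats as evidence.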

There’s a PDF on Topper here for those interested.

It sounds cool, but I’m not a coder and this sounds rather complicated. Using it is worse again: after the slow delivery of search results, you’re asked to + or - results in an attempt to create better search results (sort of a poor man’s Reddit, but more complicated and without the kooks). Developers and coders may find this interesting, so those of you reading this who are that way inclined can take a look. For the vast majority of the rest of the planet, it comes in on the usability scale at between ZOMG and FAIL. Still, it might take off; maybe the world is looking for a complicated version of Google :-)


Comments

  1. Benjamin Parr

    Two of the golden rules of website design are to not force users to remember anything (or almost anything) and to make it simple enough for a gorilla or a pidgin to use it. AKA it should never require instructions.

    Google’s popularity was built on the 60 words or less homepage. Topper…well…all it has is a simple (but unappealing) homepage.

  2. Tim Cinel

    I can see why it’s getting a negative response, but not all websites have to be simple. Some people will appreciate this, I know I will. It’s really useful for queries that use somewhat generic terms, even though you’re looking for something specific.
    Thanks for introducing me!

  3. Jim

    This might be a great idea, but I see a massive downside if this would take off: it seems to rely on the underlying index of other search engines.

    Plus, personally, I don’t want to take time to teach a search engine what are good results for me. I suspect there are a few others that feel the same way.

  4. shams

    I agree with @1, homepage is not convincing enough :-) .

  5. ben

    That was the funniest thing I have read this week. ZOMG and Fail… classic.

  6. max

    I always wondered why search engines give no option of marking results as incorrect. Kind of logical, don’t you think?

  7. duncansmom

    I have a better solution. Go to the bash prompt and type python. Sit for 10 minutes and get creative. Same amount of BS. Perhaps even better search results. WTF. These idiots really came short of including an interpreter and letting people write their own queries. Oh how exciting… NOT.

  8. Greg

    Wow, this is almost exactly the topic I worked with for my master’s project several years ago. However, I never did implement it for a couple of reasons:

    1. The interface design, as mentioned above, can put too much burden on the user, when the user just wants to find something simply and quickly.

    2. There are a lot of searches where this kind of approach doesn’t actually improve the results. Any search that has polar groupings (like “abortion”, which would have pro-life and pro-choice users “pimping” their viewpoint through the search engine) or that has multiple possible correct answers (like “jaguar”…are they searching for the car, the animal, or the NFL team?) doesn’t see any substantial improvement based on user feedback.

    It seems like a good idea, and I’m sure they got some funding for it. However, I spent a year and a half learning that it just doesn’t work very well. Machine learning can figure out a lot, but it can’t judge intent.

  9. Scott

    Dear Topper - What the hell is a “Hat” and why should I care?!

    I did a search for “mercury” (great test of a search engine’s disambiguation power). I got no results for mercury as a liquid metal in the top ten, so I voted all of the visible results down. Nothing interesting happened beyond that, so I’ll probably never visit the site again.

  10. Marcin Grodzicki

    This thing is incredibly complicated for such a simple feature but I must admit that I got far better results with it than I got with simple googling for the past two days.

  11. Solo Programmer

    Do people really not know how to disambiguate their searches on their own? Really, “lions nfl” is not that hard to figure out.

  12. GoRilla

    Didn’t Salton work on stuff like this in the 1970’s? Originally people didn’t approach it from a ‘machine learning’ viewpoint, but rather they perturbed the document vectors towards the + documents and away from the - documents.

    I don’t think it worked all that well back then, and this method doesn’t address the fundamental problem of web search: adversarial information retrieval. At least half of all web pages are spam (in the time I wrote this, a black hat made 10000 pages.) Job #1 is to get rid of the spam without losing all the real content. If you can’t do that, all of the gimmicks in the world aren’t going to make a useful search engine.
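The older technique this comment alludes to resembles Rocchio-style relevance feedback, which is usually stated as moving the query vector toward the centroid of the liked documents and away from the centroid of the disliked ones. A minimal sketch follows, assuming plain bag-of-words vectors; the function names, the toy documents, and the `alpha`/`beta`/`gamma` values are illustrative choices, not anything taken from Topper or from a specific historical system.

```python
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-split text."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Average a list of term-count vectors; empty input gives an empty vector."""
    if not vectors:
        return {}
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def rocchio(query, liked, disliked, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward liked documents and away from disliked ones."""
    pos, neg = centroid(liked), centroid(disliked)
    terms = set(query) | set(pos) | set(neg)
    updated = {
        t: alpha * query.get(t, 0) + beta * pos.get(t, 0) - gamma * neg.get(t, 0)
        for t in terms
    }
    # Terms pushed to zero or below are dropped (a common simplification).
    return {t: w for t, w in updated.items() if w > 0}


def dot(a, b):
    """Unnormalized similarity between two sparse vectors."""
    return sum(w * b.get(t, 0) for t, w in a.items())


query = vectorize("lions")
liked = [vectorize("lions nfl football detroit")]
disliked = [vectorize("lions africa savanna wildlife")]
new_query = rocchio(query, liked, disliked)

football_doc = vectorize("detroit lions football score")
animal_doc = vectorize("lions wildlife documentary africa")
print(dot(new_query, football_doc), dot(new_query, animal_doc))
```

After feedback, the football page scores higher than the animal page for the same one-word query. Note that nothing here addresses the spam problem the comment raises: feedback only reorders whatever the underlying index returns.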

  13. jb

    Better and easier: http://clusty.com/

  14. William - Neutralize

    I think the website doesn’t implement this well and it will never take off because it is too techy. However, it is a very cool feature for the geek inside you!

    The use is very clear, but i think a major downside is that you have to select an upside. If it could somehow do a similar thing simply, and automatically filter the results (a slider feature is just too much) it could probably do reasonably well in the geek side of the net.

    I think a major downside to this product is that it’s not using enough information from the websites (or at least I don’t think it is) to rank the results. It seems like only the title is being used, which is, let’s be honest, pretty damn rubbish!

  15. Tual

    This is an interesting area for research, but the usability is not there for the general audience at this time.

  16. I Am Not Posting To Spam My Blog

    Extremely important and relevant question here. When you read “ZOMG”, do you read it as “Zed o em jee” (or “Zee o em gee”, I guess), or, since it isn’t wholly a proper acronym as the Z doesn’t stand for anything, do you read it as one word, somewhere between “Zong” and “Zom”?

    I read it as one word, personally.

  17. Fabian Schonholz

    I just tried to use this THING. I had no idea what to do. I am sorry to say, but … unless they rework the UI this is a non-starter.

  18. New Fast Browser

    Looks like it is the beginning of the end.

    TechCrunch today has 2 BAD news Posts

  19. sourceroot

    Realllly , the complexity is a bit unnecessary .. Yahoo, used to count intent scores based on visitors leaving and not coming back to the same search. I imagine that still weighs in at some level. but an integrated dynamic Bayesian logic algorithm (which would make a snappy slogan for Topper) into a serp is … retarded?

  20. AK

    Who would really want to sit and tune a search engine & hope it would work fine for you in future!
    I compare search engines to inquiry booths. I don’t want to keep analysing and teaching the person in the inquiry booth! I just want my answer or direction to the place where I can get the answer!

  21. Matt

    @ duncan: lol @ between ZOMG and FAIL…. the master!! duncan you are awesome, man.

    @ #1… i doubt a pidgin [sic] could upload a video shot on his mobile phone to you tube then commence to linking all his friends to it via a proverbial cacophony of social networks and instant messaging services… then again, these things break the “golden rule” so will never really take off in popularity… btw, jakob nielson called… he wants his narrow mindedness back. ;)

  22. Andrew

    did they spell “lease” instead of “least” in the “Select at lease one” or did TC just type it wrong?

  23. CanCar

    Topper models can be applied again & again. Topper doesn’t simply categorize your results. It learns, in real time, from your likes and dislikes in order to filter out results that you don’t want or consider spam.

  24. Topper

    Ok. I will try to address the complaints.

    1. Simplicity - yes there is additional complexity. The plus/minus buttons are similar to thumbsup/thumbsdown on reddit or other such collective opinion sites. Really the only addition is the pseudo-slider at the top. This does make things less than obvious but we are hoping that the benefits of personalized search outweigh this.

    2. Already been done. The difference with Topper is that it is not simply classifying your results based on some global categories (e.g. Clusty). Topper allows you to make up the categories as you go along. As you build your category there is instant gratification that it is working because the learning is actually done in your browser. This is something new. If you can find another example of real-time explicit personalization of web content please let me know.

    3. Privacy. Other personalization attempts have failed mainly because of privacy issues. Topper both learns and classifies without sending any data back to us so there are no privacy issues. Users have to opt-in to give us information that they want to share with others.

    I concede that Topper may not be for everyone. There has already been large acceptance of voting on content however. These votes usually fall into some vast pool to determine the global popularity of something. Our thought was to build a system where those votes can help you immediately to filter out irrelevant content. Yes we are still experimenting with interface. Yes it is not perfect. We are, however, trying to push the envelope of how people can filter content on the web. Often I wish I could apply a Topper model to comments on some popular blogs that would fit my personal taste. I would like to filter out the negative comments on this page. However I wouldn’t want my personal bias to act like censorship across the board. This personalization of content is what Topper is trying to achieve.

  25. i am under age

    i just tried to use it, it is slow and i can’t get a search. I had no incentive to go to the next page because it takes so long to load compared to Google.

  26. Topper

    I agree that it is slow. It is a demo built on top of Yahoo Search results. Topper is relying on them for the top 100 results for each query. If the traffic gets too high in the course of a day they will cut off this demo btw. I would like to try it out over Google search results. Unfortunately they have restrictions on reordering their results. I don’t know how they would feel about someone filtering their results to taste. Alternatively we are looking into filtering other content other than search in the same way.

  27. Christopher Smith

    Seriously, have you guys actually taken a second to grok what this thing does before commenting? This isn’t some way of learning what are good and bad search results in general (search engines already do that). It is about customizing search results for a particular user, immediately.

    I do think the current search engine experience has kind of biased people’s expectations. There is an expectation that you spend a microsecond typing in a query and then a lot of time combing through the results and that is how it is. To me Topper is a tool that can speed up that second part. Yes it requires some more interaction with the search engine (although if you save your hats I imagine that becomes less necessary over time), but it sure beats reading through pages and pages of results to find what I’m looking for.

    The UI could definitely use some more work to make the experience smoother, but as a concept, it is a great idea. One along the lines of what I’d been hoping to see for a while now.

  28. EH

    For me, the #1 unmet need in commoditized search is the ability to blacklist domains. Give the user tools to remove nextag, experts-exchange, cooks.com, bizrate, etc. I See A Great Need.

  29. Topper

    #28 - try building a Topper model for it. You should find that it works quite well. Filtering spam is just one possible use however.

  30. skline

    As a gadget freak I was able to toss aside non-gadget listings associated with ambiguous words like Ocean or Mogul to get the good stuff within 6 feedback clicks. Client-side only is the only way to go. I don’t need the search engines knowing my preferences. Already made it my new default search engine.

    Haven’t published any “hats” to the global pool yet, but thinking about it. I am working on a Lebowski “hat” that will send Lebowski related results for general terms right to the top. Does that mean I easily amuse myself and have way too much time on my hands? Maybe.

  31. Bottomer

    Dear Mr. TopperSearch,

    Did you submit your (gramatically poor) pitch to TechCrunch or did they find your site? I just want to know how these humiliating posts keep happening and how to avoid it when I launch my site!

  32. Topper

    Nice goad. I assume you meant ‘grammatically’ unless that is a valid alternative spelling. Yes, I did submit the text to TechCrunch. This seemed like a good idea given they have treated new innovation favorably in the past. Without any glitz and polish I did expect the worst. Keep in mind this is a demonstration of an idea not a company announcement so I was being informal. More demos, that may or may not be more appealing to this audience, will follow. I appreciate all of the feedback that I have received so far - no matter how banal. If you are interested in funding these projects and ideas please let me know.

  33. Markus Thomson

    Topper,

    Don’t listen to the negative jerks around here. Keep on coding and refining your idea(s). People can be so jealous, immature and petty when hidden behind a keyboard.

    Search mashups are so rare nowadays but search is light years from being perfected. I wish we saw more of them.

    Here’s an idea/suggestion:

    Why not apply your algorithm based on click-throughs instead of +/- buttons? If I search “lions” and my first click is on a page regarding the football team, then you can apply that filter if/when I hit the back button to return to the SERPS page (via Ajax presumably).

  34. Topper

    #33 Thanks for input. We certainly can instrument it via click-throughs. That is easy enough. We borrowed the +/- idea from the various collective opinion sites thinking it would be obvious for people what to do. It also makes the task a little easier when scanning through documents than having to follow a link. I imagine that as we apply Topper to more and more content areas using ‘followed links’ might be a natural evolution.

  35. zellius

    I’d like to disagree with others and say the interface is actually quite cool. I love the instant fading/bolding of search terms as I filter because it lets me see the consequences of my actions immediately. It isn’t too complicated to use, IMO.

    However, it does seem to me like a solution looking for a problem. Like a previous commenter said, I can disambiguate my query to google very easily (and I’ve gotten quite good at it). Even if Topper is perfect at disambiguating based on my clicks (which it never can be - since it’s a machine and all), it would still take me longer than to just type some disambiguation words into Google.

    Still though - it’s a pretty innovative idea, and perhaps for others (search engine newbies, foreign people making english searches, etc) it could be useful. Plus, brownie points for actually reading and responding to criticism about your site (don’t feed the trolls, though).

  36. Bjoern Wilmsmann

    Splendid, this sounds very much like a concept Karen Spärck-Jones (http://en.wikipedia.org/wiki/Karen_Sp%C3%A4rck_Jones) wrote about roughly four decades ago.

  37. Topper

    #35 I agree about the disambiguation with Google. Topper is valuable as a search tool only if we assume that expressing what you want by selecting examples is as easy as augmenting the search query. Many times this is not true. I built Topper over search first because I thought this would be the easiest form for users to grok what it was doing. It is unique in its ability to provide instant gratification in just clicking on what you like. I do think that applying it to non-search content is more valuable simply because there is no query language to provide filtering.

  38. Jack Sauce

    For me, this was a sexual experience. First off, Topper? As in Top Her? On top of her? Having anal sex with her…I think we all know what kind of “research” this is.

    On the other hand, this is a very cool app and something that seems like should already exist. The interface is fine. Are people so retarded that they can’t click + or -?

    Good work.

  39. tz

    Doesn’t google already do this by using the bounce rate different results get for different queries? If they’re not then Topper should use this instead of the + - buttons since it’s virtually the same thing.

    No bounce == +
    Bounce == -
    No click on result x which is higher than clicked result y == x--
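The three rules in this comment translate fairly directly into code. The sketch below is hypothetical throughout: the function name, the shape of the click log, and the 10-second dwell threshold used to decide what counts as a "bounce" are all assumptions made for illustration.

```python
BOUNCE_SECONDS = 10.0  # assumed threshold; the comment does not define "bounce"


def implicit_labels(ranked_ids, dwell_by_click):
    """Derive '+' / '-' labels from click behaviour.

    ranked_ids: result ids in the order they were displayed.
    dwell_by_click: maps each clicked result id to the seconds spent
                    on the page before returning to the results.
    """
    labels = {}
    # Rules 1 and 2: a click that sticks is a '+', a quick return is a '-'.
    for rid, dwell in dwell_by_click.items():
        labels[rid] = "-" if dwell < BOUNCE_SECONDS else "+"

    # Rule 3: any unclicked result shown above a clicked one was skipped,
    # so it gets a '-' as well.
    clicked_positions = [
        i for i, rid in enumerate(ranked_ids) if rid in dwell_by_click
    ]
    if clicked_positions:
        lowest_click = max(clicked_positions)
        for rid in ranked_ids[:lowest_click]:
            labels.setdefault(rid, "-")
    return labels


ranked = ["a", "b", "c", "d"]
stuck = implicit_labels(ranked, {"c": 45.0})
bounced = implicit_labels(ranked, {"b": 3.0})
print(stuck)
print(bounced)
```

Results below the last click (here `d`) stay unlabeled, since the user may never have looked at them. Labels produced this way could feed the same kind of model that explicit +/- votes would.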

  40. tz

    Tried it out, seems to be kind of useful. Don’t think most people who complain have actually tried it. It was fast as well.

    But please consider the bounce rate model instead of +/-. It’s hard to set a +/- without opening a link, and it requires unnecessary interaction.

  41. Matt

    who says they can’t have the -/+ AND the clickthrough=+ (minus can act in servitude of “learning” the users “blacklist”) … then those who don’t do the -/+ thing can gain the benefits of + while those who want to use it more hardcore can use the minus to teach hate and the pluses to issue love in batches…

    new idea = cool… and like he said.. test demo, not company… interesting. awaiting follow up article that starts “In Feb. I reviewed Topper…”

  42. Cem

    Topper,

    > 2. [...] This is something new. If you can find another example of real-time
    > explicit personalization of web content please let me know.

    We have something very similar to what you guys are trying to accomplish. Take a look at http://leaptag.com

    Our client version starts with a set of results obtained from a few search engines. Our current version on Facebook takes a slightly different approach; it relies on our own feeds. In both, we evaluate/filter results based on ‘thumbs up/down’ feedback.

    BTW, it looks like you use SVM with RBF. Are you satisfied with the performance?

  43. Topper

    Ok - thanks. I think downloadable extensions will not be universally appealing which is why we developed Topper. I can see how the intent is similar but users should have privacy without adding additional software to their computer. Topper loads dynamically in the browser whenever the user needs it. The goal is to be lightweight, transparent and ad hoc.

    Also, no, we do not use an SVM with an RBF kernel. While RBF can be fast enough to support real-time learning it does not work very well with small amounts of text. I’m sure you have observed the same issue. Topper does contain an SVM method as one of its tools but this is a proprietary version based on SMO designed specifically for text and real-time training. Topper contains both supervised and unsupervised learning methods that will be applied to different types of web content. What is shown in this demo is simply for search.

    I would like to know more about what your group does. It certainly sounds like our interests are similar. If you are interested in speaking further please contact me at info@toppersearch.com

  44. Jarett

    I’m a developer and a coder, and no, I don’t find this interesting at all. It seems like they were too lazy to write an algorithm to pick out relevant search results, so they have you do the work instead. Classy.