
After months of negotiations and holding both at bay, Twitter now has agreements with both Bing and Google to give them access to its full feed of public Tweets. Both search engines have been yearning to drink directly from Twitter’s realtime firehose of micro-messages and all that they carry. A rudimentary version of Bing’s Twitter search is already live, and it will soon add public Facebook updates to its search results as well.
While financial terms of the deals were not disclosed, full access to Twitter’s data stream is very valuable to both search engines. Depending on how much Twitter was able to squeeze out of Google and Bing for these licensing deals, they are likely to provide its first major source of revenue. (Imagine if they had to pay by the Tweet.)
Tweets and other realtime data streams are valuable to Google and Bing because for many types of searches (news, events, sports, stocks, shopping, etc.), the most recent information is often the most relevant. And it’s hard to beat millions of people Tweeting out their thoughts (the “pulse of the planet,” if you will) for realtime information about every subject imaginable. Google and Bing need access to this stream of data if they want to keep their results fresh and relevant.
Up until now, they had to try to index Twitter’s site selectively by concentrating on high-profile Twitterers like celebrities. Twitter wouldn’t let their robots gobble up and index every Tweet because its servers wouldn’t be able to take that kind of pounding. But Twitter didn’t just want to hand over the feed of all of its public Tweets (the firehose) to the search engines without getting paid for it either.
Now that Google and Bing are getting the firehose, it could have a big impact on search results. For the search engines, the firehose is much more valuable than any single Tweet. They can index it and sift it, looking for patterns and spikes in keywords and shared links to get a better sense of what people across the Web are paying attention to at any given moment. This data can then be folded back into regular search results, even if the top result isn’t a Tweet.
For example, if a link to a post about healthcare reform on an obscure blog suddenly gains currency and is retweeted hundreds of times, that is a signal to perhaps rank that link higher in searches about “healthcare reform.” If people stop Tweeting about it, then maybe it goes down in the ranking. Either way, Google and Bing can use the firehose as a rich source of signals to mine and then blend back into regular search results.
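To make the idea concrete, here is a minimal sketch of how a spike in shared links might be turned into a ranking boost. It is not how Google or Bing actually do it; the function names `link_boosts` and `rerank`, the `min_shares` threshold, and the `weight` factor are all assumptions made up for illustration.

```python
from collections import Counter


def link_boosts(recent_links, baseline_links, min_shares=3):
    """Return a boost per link: shares in the recent window divided by
    (shares in the baseline window + 1). Thresholds are illustrative."""
    recent = Counter(recent_links)
    baseline = Counter(baseline_links)
    boosts = {}
    for link, count in recent.items():
        # Ignore links shared only a handful of times (hypothetical cutoff).
        if count >= min_shares:
            boosts[link] = count / (baseline[link] + 1)
    return boosts


def rerank(results, boosts, weight=0.1):
    """Blend each boost into a base relevance score and sort descending.

    `results` is a list of (url, base_score) pairs from regular search.
    """
    scored = [
        (score * (1 + weight * boosts.get(url, 0.0)), url)
        for url, score in results
    ]
    return [url for _, url in sorted(scored, reverse=True)]
```

With no boosts the original order is kept, so the ranking falls back to regular search once people stop Tweeting about a link.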
Of course, Tweets and other micro-messages will become part of results. And how the search engines display them and rank them will also determine how relevant their results are. Here is where it gets interesting because realtime search is a hard problem that has not yet been solved. Do you show the most recent, random Tweets first, or the ones with the most authority? And how do you rank a Tweet? We already have PageRank, but what we now need is StreamRank.
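One way to picture the recency-versus-authority tradeoff is a score that starts from some authority value and decays as the Tweet ages. The sketch below is purely hypothetical: "StreamRank" is not a published algorithm, and `stream_score`, `rank_tweets`, the `authority` field, and the one-hour half-life are assumptions chosen for illustration.

```python
def stream_score(authority, age_seconds, half_life_seconds=3600.0):
    """Authority decayed exponentially with age.

    The score halves every `half_life_seconds` (an assumed tuning knob).
    """
    return authority * 0.5 ** (age_seconds / half_life_seconds)


def rank_tweets(tweets, now):
    """Sort tweets (dicts with 'authority' and 'posted_at') by decayed score."""
    return sorted(
        tweets,
        key=lambda t: stream_score(t["authority"], now - t["posted_at"]),
        reverse=True,
    )
```

A short half-life favors the newest Tweets; a long one lets high-authority Tweets stay on top for longer.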
Many startups are tackling this problem, as is Twitter itself. And now Google and Bing can try their hand at finding the most important bits of data in the firehose. The result should be a more relevant, faster feedback loop between data appearing on the Internet and the search engines finding it.
Photo credit: Flickr/ZeroOne

The patterns found in the real time data are important but I have a few other questions about these deals which I haven’t heard addressed anywhere yet.
1. Has Twitter given Bing & Google access to the full ARCHIVE of tweets as well as to the ongoing firehose of new tweets? I’d argue that there is a great deal of value in the full archive (including, in fact, in the spammers’ accounts – where the fact that they were flagged as spam is valuable data about the sites they were spamming on behalf of – negative data but valuable data nonetheless).
2. While what seems to be most discussed re Twitter (and Facebook) are tweets w/links to something and/or tweets which are “retweets” (or both), I think there is also, if harder to extrapolate, a great deal of value from concurrent tweets, especially when correlated with events happening in realtime (e.g. all the tweets discussing a given TV show as it airs, a piece of breaking news or a live event). This includes tweets from folks “on the ground” (in the case of events) as well as those conversing with the folks there (think the reactions to the tsunami and other natural or manmade disasters).
In realtime, search engines with firehose access may soon be able to do some interesting stuff alongside monitoring realtime search queries – to identify emerging trends & events (e.g. if a bunch of tweets start to correlate with an uptick in search queries for a given set of phrases, perhaps something is happening – a TV show’s season has started, for example).
The Indian search engine company Yauba, http://www.yauba.com, has been doing this for a while now.
Really? Yauba has access to Twitter’s XMPP-based feed?
Yauba is just a custom google search engine.
yauba looks cool, i see they have both twitter and identi.ca results. the identi.ca results go back 2 months while the twitter results seem to go back only 1 day. i hope bing or google can make the entire archive of conversation on twitter, identi.ca, FB etc. searchable.
Nice article.
And very interesting thoughts you mentioned. About 1, I think that you have to consider that Twitter has limited storage resources. And in the recent past they had even less. So they might delete the spammer tweets and account data to save space. So maybe even they don’t have that data. But if Google and Microsoft had access to this data, they could use it outside of search as well. For instance, consider internet browser security. If a user tries to access a site that had been marked as spam on Twitter, the browser could warn the user.
Regarding your second thought, that information would be essential in any breaking news situation, like when MJ died or the Iran election etc. But with sentiment engines, both MS and Google could provide real time data about people’s thoughts about a show that’s airing or a movie that just hit the theaters. That could be MUCH more valuable than simple viewer share data, or the number of page views on a site.
The potential to extract different meaningful information from the firehose is limited only by one’s imagination and the funding they get.
The archive of tweets isn’t really relevant, because that stuff is technically already crawled by Google and Bing, just not in real-time. The value is tapping into the stream in real-time, not whether or not they can actually crawl the data.
Actually the archive is pretty important and no, it isn’t well indexed by anyone. I, for one, would love to be able to search for & see my first tweets but at present that doesn’t appear to be possible (since I have vastly more tweets than the 3200 at which the Twitter API caps out when returning a given user’s history).
As far as I am aware, Twitter does not allow mass crawling of tweets (and in any case unless an individual status update was linked to directly externally to Twitter finding the right link is quite difficult).
The value is BOTH in the mass archive (if Twitter allows access to it) and in the ongoing, real time (in the actual sense) firehose of new public tweets.
The mass archive, assuming access to it was allowed (and as I noted it may/may not include spam) could allow for large scale analysis & pattern tracking – as well as historical usage & trends/patterns over time (for example establishing a user’s normal usage which, in turn, could be a flag if that usage changes considerably – I’m a 10+ tweet a day user but at some conferences that has gone up to 200+)
“since I have vastly more tweets than the 3200”
Why doesn’t that surprise me?
There are already plenty of tools for real-time search and trending (although more innovation is possible). If Twitter is as serious about search as people claim it to be, the full archive of Tweets should be available.
Twitter is more than just “OMG i’m late lolz”. Some people are posting useful links and using it to replace or supplement their blogs. What a pity it falls off the edge after 2 weeks.
Real-time search could flood my page with garbage.
garbage in, garbage out, apparently.
Remember this day. This is the day Twitter lost.
How is this a loss?
This is the day Twitter, I would hope, found a revenue stream.
Google & Bing both potentially using them as a part of their algorithm and search offering? Yeah that would suck…
No, it’s exactly the opposite. This is the day that Google and Bing lost. The last thing I want in my search results is ANYTHING from Twitter. I’m not interested in pop culture or the feeble musings of a twitterer.
…and get off my lawn.
What’s your deal? At least put a good sentence together that is constructive.
Awesome!
Can’t wait for a ton of useless crap in my search results yay!
+1. will indisputably prove that the emperor has no clothes.
This is no surprise, it was only a matter of time
Poor Kanye West.
http://www.kayn...stisnotdead.com
-kpaul
http://www.kany...stisnotdead.com even…
Scoopler.com is doing some cool real time search stuff….
Who cares? I don’t understand what the hell kind of Kool-Aid people are drinking in Silicon Valley. Who wants a bunch of random-ass people’s status updates in their search results? This entire obsession with “real time search” is absurd.
haha. Agreed!
totally agree! what are you searching for? people’s clever takes on current events? links being shared? i honestly don’t get “real-time.” i get the technology, just not the utility.
The kind of person who writes a sentence like: “Both search engines have been yearning to drink directly from Twitter’s the realtime firehose of micro-messages and all that they carry.” Besides the lack of the most basic grammar skills, one senses infantile fandom rarely seen outside teenage myspace pages…
You want to make money on the Internet but you don’t want to listen to the people using it.
Put down that crack pipe my friend, this news is hardly worth all the hype you describe.
I think authority will play a large role in ranking real-time results. Followers is not a good metric, but retweets, mentions, favorites, likes etc will play a role. So will participation and popularity across multiple social properties.
I agree. They need to find the formula where the information is valid.
+1
we have a real time search topic twitter widget for about a week now. works great. who knew we would beat google and bing to the realtime punch.
-tweets regarding a search term are time consuming to watch not very efficient for niche location based content retrieval.
-marissa had it right search is 95% solved. will more twit clutter enhance user experience?
-the google killer will be a location based language not an algorithm.
Ha ha, are you still doing that stupid location based stuff.
Today is the day SEO died.
SEO has been constantly dying for years as algorithms and metrics change. Probably any metric can be gamed; I don’t see why Twitter would be the exception. If tweets really influence the results, I expect many, many more bots tweeting, retweeting, and adding noise in the near future.
This is a truly exciting development. Years from now, we will look at today as the day web search got transformed. Some of us have been working hard at real-time search for several years. Our creation TipTop at http://FeelTipTop.com, a semantic Twitter-based search engine, is now available in a beta version. Please give it a try.
I sure as hell don’t want to start seeing useless tweets in my search results…
This will be a lot of info for the servers to process. There might be a lot of personal stuff searched through as well. Good luck.
open warfare. hope we aren’t collateral damage
I’d really like to see Google or Twitter or whoever come up with an interesting visual system.
Realtime streams are very interesting. I think the company to get visualization of realtime data right will be a big winner… too much out there to take in at one time but if there was a clever way to visualize it… just like with realtime data in information security… that will be a huge winner, methinks.
Startup guys attack!
http://www.trendsmap.com/
Just launched this a couple of weeks ago.
Substitute firehose for crackpipe and you may have a story.
I’m shocked at the number of TechCrunch readers who can’t see past their noses on this. Guess they’re the same sort who once said ‘why would anyone want to read a bunch of HTML pages linked together online?’
Think of any industry which uses real-time data – news, finance, health-care, education – the list goes on. Then index and show real-time information in relevant searches e.g. most-read news, trending stocks, local/global disease patterns, the most-used education resources.
Google and Bing aren’t going to just throw these results into a standard search, they’re going to use them in very clever ways which ultimately add massive value for everyone. Get ready for the ride.
Keep guessing.
So, how is this different from being able to search every IM ever?
In other words, this is useful because _________.
His suggestion of a streamRank is a system or algorithm that increases relevance of data subsets in a specific point in time.
search is predicated on the fact that you already know what you’re looking for…
so an example could be and I think Evan talked about this at TED as soon as an earthquake hits in China that subject becomes more relevant and is potentially pushed to the top of a list of things you might want to find out about…
The tyranny of the crowd begins.
“Twitter wouldn’t let their robots gobble up and index every Tweet because its servers wouldn’t be able to take that kind of pounding.”
This makes sense because sometimes I use Google to find the profile of a brand. Like, say, a denim brand like “adriano goldschmied twitter”.
They are 3rd in the list when they should be #1.
Does Twitter really represent the pulse of the planet (and therefore Google is out of touch)?
Recent Wired magazine on this subject :
http://www.wire...-10/st_thompson
and a demonstration this may not be the case
http://onehands...ld-view-20.html
how soon is now?
we need to look beyond this and develop what’s really useful for people and, importantly, business — and that’s social-casting, or whatever tag you want to give it.
we need to find ways for this “data” stream to predict what will happen in the near future based on patterns from recent past. the data are there. do it! if i was a developer i’d do it ;-P
Who cares about real time? Think about the power the first search company that develops future time search will have. Imagine knowing what people are twittering before they twitter it….
Haha nice, I’m drooling over the prospect…
Yauba.com
is just a custom google search engine.
A beautiful design
I want to see my first tweets right now from 8 months ago. How do I do that now?
I also want to find the Twitterer who replied to me about a certain subject some time in May or June, I don’t remember the time.
How do I find that?
This ability to search for our own tweets/updates and remember who replied to us (and who has RTd us the most) is where I see the biggest personal gain for Internet users.
So tell me please…how can I search for that reply someone made to me last May or June?
Do you think that 10% of Twitter users who do 90% of the Tweeting should be able to affect how search results appear?
What comes after real-time?! Minority reports?!
Get Ready For The Firehose of Spam.
The best way to spam is now Twitter.
Spam on Twitter and both Google and Bing will reward your spam efforts.
The spam floodgates are open: spam on Twitter and get free links on Google and Bing.
All I get from bing is “Twitter search results are currently unavailable.” error message.
this is def interesting for improving search results by parsing data (links, hashtags, keywords, numbers, addresses etc.) from the tweets to improve ranking and relevancy in real time….. SEO be damned
also interesting is how it will improve longer queries which usually turn up questions/forum threads, and with enough ppl in the world tweeting there’s bound to be some tweets worded exactly the same as your long ash search query. it might be “valuable” to provide that inline with old school search results, but other than that i don’t see a practical reason to display them in line
as far as tweets being displayed as search results….
you can have a character at the beginning of the search query such as # $ * to represent the intent of wanting realtime search results instead of regular search results
you can also have a tab similar to the image search
as far as displaying them obviously there will be a need for users to set/adjust criteria to help filter and present the most relevant content such as:
- time (most recent, specific date, maybe a slider to represent hours, days, months etc.)
-tweets with the most retweets?
-authority people who are being retweeted/copied the most in real time?
-location – originating from or about/nearby/around a location
-of course you could always search the search results by keyword then filter the results again
-authority by activity? who is tweeting the most in a given amount of time at any given moment
-social filtering – filter content by contacts?
im no software engineer or CS undergrad but i came up with this in a matter of minutes and i have always thought that stacking/aggregating meta data to improve relevancy by filtering content in search results and providing real time recommendations is pretty much where its at
the more meta data you have the more accurate it will be period
well at least i think so
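The filter criteria listed above could be stacked roughly as in the sketch below. Everything in it is a hypothetical illustration: the `Tweet` fields, the `filter_tweets` function, and its parameter names are assumptions, not part of any Twitter, Google, or Bing API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    author: str
    text: str
    posted_at: float          # seconds since epoch
    retweets: int = 0
    location: Optional[str] = None


def filter_tweets(tweets, *, since=None, min_retweets=0,
                  location=None, keyword=None, contacts=None):
    """Keep only tweets that pass every criterion that was supplied."""
    kept = []
    for t in tweets:
        if since is not None and t.posted_at < since:
            continue  # too old for the requested time window
        if t.retweets < min_retweets:
            continue  # not retweeted enough
        if location is not None and t.location != location:
            continue  # wrong place
        if keyword is not None and keyword.lower() not in t.text.lower():
            continue  # keyword filter applied on top of the results
        if contacts is not None and t.author not in contacts:
            continue  # social filter: only people the user follows
        kept.append(t)
    return kept
```

Each criterion is optional, so a user could combine as few or as many as they like, for example `filter_tweets(tweets, since=cutoff, min_retweets=10, contacts={"alice"})`.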
of course the analytics of tweet activity will improve results as well but thats pretty much common sense sorry forgot to add that in my comment above
What about spam? You don’t talk about this in your article. People will have a much higher incentive to spam twitter (extremely easy to do) when they know it can manipulate google’s results.
Google seems cautious about what it’s doing for the moment. They don’t want to turn their results into Digg.
I agree, this will definitely increase the incentive to spam twitter.
Not only spam, but Twitter also needs to be the shield against phishing attacks, something they’re undoubtedly wrestling with now.
wow this is getting interesting. these are very smart deals from the Microsoft camp as it gets Bing deeper into real-time data that’s very beneficial for advertisers. especially as the semantic web starts to emerge and the ability to analyze and target in real-time is essential.
*cough* http://www.thed...archengines.com *cough*
In my opinion, people typically search under these conditions:
1. Exploratory, discovery type of search
For example: Like last night as I was helping out my daughter with her science project, I really needed to look things up! This is the type of search where the results from Google, Bing, Kosmix, etc are of great help.
2. Real time search
For example: When I need to know what people are currently talking about Kanye West, or the latest LCD TV. For this, search engines focusing on Twitter feeds are perfect.
And now, we are talking about combining both of them!
I may not need to see the tweets of all internet junkies every time I search, especially not when I am searching for topics of my daughter’s science project. My needs are similar to a visit to the good old library!
I will definitely need to see the tweets of all internet junkies every time I search for which laptop or TV to buy. My needs are similar to asking around for feedback and understanding general trends.
The point I am trying to make is that what I search for, my needs, and my expected results are all different all the time – and search engines should respect that, understand that and offer me appropriate results.
While analyzing the historical trends, relevancy and popularity from tweets and incorporating them in search results might be good, displaying search results sprinkled with tweets may not be required all the time.
I think it should be up to the users to decide and inform the search engine what they expect out of it – and the search engines should behave accordingly.
BTW Twitter: letting Google and Bing get your feeds is the same mistake that Yahoo and others made in the late nineties by letting Google become their default search engine. Realize that your value (and thus the power) is not in the tweets but in what the tweets are “about”.
Build a real-time search engine and an efficient business model around it or else be ready to be gobbled up by one of these giants in the next couple of years!
Thanks for the post- it not only explains what Google and Bing might get from the Twitter deal but also gives a glimpse of what Twitter’s business model might be.
I for one am very excited about the direction this is going. We’re building tools to help process the huge amounts of information that come unfiltered from Twitter and I hope to see a lot more innovation from the big search engines.