Google Sends Man to Jail, Metaphysically
by John Biggs on May 29, 2008

Investment News wrote up an interesting piece on Google News’ algorithm and its apparent failures. The trouble began with the magazine’s own story on a new SEC ruling, which included this paragraph:

“Mr. Radano, who in the early 1980s was a staff aide to then-Rep. John Breaux, D-La., before Mr. Breaux was elected to the Senate, was charged in 2002 with helping investment adviser Steven Bolla hide from clients the fact that Mr. Bolla had been barred from the industry.”

Fair enough: John Breaux gets a parenthetical mention in a story about someone else, and he isn’t the subject of the sentence. Now, however, when you search for “John Breaux SEC,” you find this on the first page of results:

John Breaux, D-La., before Mr. Breaux was elected to the Senate, was charged in 2002 with helping investment adviser Steven Bolla hide from clients the fact …


Bloops. Apparently the algorithm knocked out the sentence’s actual subject, Mr. Radano, without even adding an ellipsis or a “[SNIP!]”. This is a fairly rare occurrence and a very specialized case, but in our “post fast and loose” news cycle, someone could be seriously hurt by this robotic excision. Context is everything, and aggregators remove the context with a vengeance. I’m down with robots doing my news hunting and gathering for me; I just hope that once they’re done I’ll be able to tell the poisonous berries from the good ones.

Comments

This is one of the reasons we’re working hard to revive Brijit. I’m keeping people posted on our blog at http://brijit.wordpress.com.

-Jeremy from Brijit

 

This has to be the case, because the reverse (“Man sends Google to Jail”) cannot be true.. :P

 

john…

i know this is anal…

but given that i tell my 7 year old, i’ll tell you as well.

please learn/use the difference between “their” and “they’re” and “there”.

and at the same time, keep using that spell checker!!

she just had an assignment where she was looking for spelling/grammatical errors in “news” media. she found way more on “blog” sites than in traditional newsprint/magazines.

guess you really do get what you pay for!!

peace..

 

your right. blogs are failing you’re 7 year old

 

John,

I don’t know the real point of your story. If you are trying to say Google is bad for snippets, this would be my response to lazy people like yourself.

You are blaming Google for bad snippets?

Are you just that stupid, or just lazy?

Everyone should be responsible to RESEARCH all the facts; if you read something, RESEARCH it to make sure it’s true.

When are people like you going to take responsibility instead of blaming others for your lack of research?

It is everyone else’s fault I am a failure.

Feel free to flame me back, it will only show how lazy and pathetic you are. This is my opinion.

 

No, they’re failing all of us. How are we to take a writer’s ideas seriously when the writer comes across as ignorant? And when young people do read, wouldn’t it be nice if we writers used the language correctly (even close to correctly) so they can see how it’s done?

A little respect for the language goes a long way.

You’re = “you are”
Your = “belongs to you”

 

I’m down with investors doing my news hunting-and-gathering for me “I just hope once they’re done I’ll be able tell the poisonous berries from the good ones”

 

I see “it’s” used incorrectly in the first sentence. The “they’re” in the last sentence is correct.

 

In all fairness, the quoted sentence confused me too.

 

You should make a review of http://www.techdosh.com, it’s a great resource for tech news including TC.

 

John,

OMG! I’m trying to understand the point of all your stories. Are you sure you know what you are doing? You are doing this for free, right?

If you do not have anything interesting to write about.. go see some movies.

 

This article strikes me as pretty silly. “‘Post fast and loose’ news cycle, someone could be seriously hurt by this robotic excision.” Oh really? Are we really supposed to believe that soon people will start referencing automated snippets and start blatantly getting facts wrong based on this? Look, I have nothing against cynicism, but this isn’t even newsworthy.

 

Holy shit, John Biggs is as talented a blogger as Arrington. TechCrunch got double plus awesome.

Now if only he could shill for Microsoft too…I can’t wait!

 

Why did I just waste a minute of my life reading this article?

 
 
 

it’s should be its. But nobody cares anymore…

 

a new low for TechCrunch in terms of irrelevance, grammar, and arrogance. Time to check out other blogs.

 

“This is very sad story hope he will have computer in is room”

 

If some of you are having a hard time understanding the point of this article you seriously need to work on your reading comprehension skills (or lack of).

 
 

@20

Did you and John go to the same school?

 

Why is this article on TC ?

 

“Apparently the algorithm knocked out a few important parts without even adding an ellipses or a “[SNIP!]””

Apparently the algorithm read “Rep.” as the end of a sentence and therefore didn’t think an ellipsis was needed. If they removed the period from the text of the article, the sentence would probably not have been snipped there. It doesn’t take a genius to figure this out.
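To make that theory concrete, here is a toy snippet generator, a minimal sketch that is emphatically not Google’s actual algorithm (which isn’t public). It assumes the generator splits text into sentences at any token ending in a period unless the token is on a known-abbreviation list, then shows the first “sentence” containing the query term. The names `split_sentences`, `make_snippet`, and the two abbreviation sets are hypothetical, and `max_words=26` was picked only so the output lines up with the quoted search result.

```python
# Hypothetical abbreviation lists; real sentence splitters handle far more cases.
INCOMPLETE_ABBREVIATIONS = {"Mr."}          # knows "Mr." but forgets "Rep."
FULLER_ABBREVIATIONS = {"Mr.", "Rep."}

PARAGRAPH = (
    "Mr. Radano, who in the early 1980s was a staff aide to then-Rep. John Breaux, "
    "D-La., before Mr. Breaux was elected to the Senate, was charged in 2002 with "
    "helping investment adviser Steven Bolla hide from clients the fact that "
    "Mr. Bolla had been barred from the industry."
)


def split_sentences(text, abbreviations):
    """Break text after any token ending in '.' unless it is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        # Check only the part after the last hyphen, so "then-Rep." is looked up as "Rep.".
        last_part = token.split("-")[-1]
        if token.endswith(".") and last_part not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


def make_snippet(text, query_term, abbreviations, max_words=26):
    """Return the first 'sentence' containing query_term, cut to max_words."""
    for sentence in split_sentences(text, abbreviations):
        if query_term in sentence:
            words = sentence.split()
            clipped = " ".join(words[:max_words])
            # Only a trailing ellipsis is added: the splitter believes the
            # snippet already begins at a real sentence start.
            return clipped + (" …" if len(words) > max_words else "")
    return ""


if __name__ == "__main__":
    print(make_snippet(PARAGRAPH, "Breaux", INCOMPLETE_ABBREVIATIONS))
    print(make_snippet(PARAGRAPH, "Breaux", FULLER_ABBREVIATIONS))
```

With “Rep.” missing from the list, the splitter ends a “sentence” at “then-Rep.”, so the snippet begins at “John Breaux, D-La., …” and matches the result quoted in the post, with no leading ellipsis because the code thinks nothing was cut. Adding “Rep.” keeps the paragraph whole, and the snippet starts with Mr. Radano as the subject.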

 

I miss the good old days when blogs like TechCrunch were real blogs on industry and technology. TC is turning into a Digg look-alike. Is there nothing of substance left to write or talk about? I appreciate the dangerous work conditions faced by tower workers, but if I wanted to prosecute AT&T, I would read about it on CNN. And now this… and the debate on English. Disappointing.

 

This Google result IS bad. Most people probably won’t be following each search result to check the validity of the summary. They’ll subconsciously incorporate the snippet as a fact. “If a search result is repeated enough times then it becomes fact.” “That John Breaux…I recall he was convicted of something???”

 

Reminds me of the woman who wouldn’t vote for Obama because his name is Hussein. Why does she have a right to vote???

 
 

Personally, I think this is important John.

I’m not a lawyer, but if a human wrote and widely published that statement about a private person, it’d probably be libelous. However, since it’s a machine, I guess there’s no “intent” to harm, plus John Breaux is a public figure which raises the bar on libel/slander/privacy issues/etc.

But I suspect we’ll be wrestling with this more and more as we rely on machines to filter content for us. And if machines take snippets out of context, extract some meaning from text that wasn’t intended, or display something as factual because they didn’t “get” the joke or parody, somebody (given our litigious society) is eventually going to be held responsible for that.

Just imagine a poorly-written semantic filter getting the wrong impression of comment #3 with the use of “anal” and “7 year old”, for example.

 

Is it weekend already?

 

this is especially funny in light of Arrington’s headline yesterday which invented a quote from Jerry Yang.

 

Biggs,

You really suck. They’re, I have said it their (there).

- Jack

 

I really liked this story!

Sorry, guys. Usually I am the first one to bash poor articles on here. This is a clever Google result. Maybe it doesn’t belong on TechCrunch, but I liked it.

 

Actually, the more I think about it… I think I’ll set up a blog with several innocent quotes about myself on it, and then let Google index it falsely, and then sue them for libel (or slander, or whatever).

Man, that’s a million dollar idea! Thanks TC! This is the best idea since the idea that you gave me to buy YHOO at the end of January!

 

‘in our “post fast and loose” news cycle, someone could be seriously hurt’

pot kettle black, techcrunch?

 

Why do I get the feeling that many of the comments are written by the same person?

I personally liked the story. It demonstrates that we still have a long way to go in the IR community in terms of web site summarization and generating meaningful and helpful snippets in search results. This is in fact an active research area in the community and you can find research papers on summarization and snippet generation in the top information retrieval conferences such as WWW and SIGIR.

Keep it up, John!

 

Please talk about the actual story!

And Now Wait and see what our “Matt WHO ALWAYS Cutts” will say??

 

Blogs do not have professional content because the content is user generated.

 
