New Israeli startup IRSeek is indexing public Internet Relay Chat (IRC) channels at a rate of 6 million conversations a day, and the company has now indexed 300 million conversations. The most popular networks, including EFnet, DALnet, Freenode, QuakeNet and Undernet, are all being monitored: IRSeeK is now “listening” to 2000+ channels across 10 networks.
There are few IRC search engines today, and most focus on specific niches or single networks, the company says. Nearly two decades’ worth of data contained on IRC servers has effectively been lost. IRSeek wants to make sure that future conversations are properly indexed and searchable. It’s a huge untapped knowledge base.
So if you want to see what people are saying on IRC about, say, iPhone unlocks, now you can. The most popular search terms populate a query cloud on the front page of the site.
The company was founded by Eran Cohen (CEO) and Ariel Berkman (CTO). Development began in mid-2006.
The company says a channel is dropped when file sharing activity is detected, and that private conversations are not eavesdropped on in any way. Still, some IRC users, who have a possibly unreasonable expectation of privacy, may be troubled by IRSeeK. Personal information is often revealed in IRC chats. That information is now indexed and searchable. Searches can also be conducted by IRC nickname, and all conversations involving that nickname (even ones where the user was just in the room) are linked. Of course, nicknames aren’t unique, and many users may choose the same nickname over time. But even so, the knowledge that everything being typed can later be found by others may have a chilling effect on users.





Certainly seems like a useful knowledge base to developers and IT managers…
The key point that you’re missing in the article is that IRSeek are ARCHIVING as well as indexing IRC channels. Once it’s archived and made publicly available, anyone, including google, can index it.
Ben, That’s theoretically true, but would take a very long time to achieve. Also, the nice thing about browsing an archived conversation on IRSeek is that you can remove users (talking about uninteresting info on the channel), highlight your interesting users, and make it much easier for you to read through the discussions.
I think it’s a cool idea, because IRC (just like Usenet was) is a valuable source. However, indexing conversations already happened in an unstructured way years ago (through eggdrop et al. statistics published on the www), so I would not call all conversations “lost”.
It is an interesting source. I recently found some interesting stuff about astronomy.
SELECT conversation WHERE text = “a/s/l”
903290923 results found.
Name of the company, IRSeeK, is a little odd, too. How many times do people in the US look through their postal mail, and say “IRS? eeK!”
@3 I completely agree that the IRSeek UI is way more valuable than just static text. My point was that the author of the article is confusing “indexing” with “archiving”. The “About” page on the company website makes it clear they are both archiving and indexing.
@7 I agree… The “IRS” part really jumps out at you!
reminds me of Dejanews a little bit, certainly that data will make them a valuable takeover target in the future…I wonder if Google wants in on IRC?
I just use netsplit.de..
As an old school IRC Junkie, I can say that this idea won’t work..
doesn’t match the IRC spirit
Oh, and one more thing, I am Israeli too..
Irc is indeed an interesting source. I recently found lots of fascinating stuff about tubgirl and goatse.
Ilan, netsplit.de is only a statistical site.. and i think there’s a LOT of valuable knowledge lying in there..
I am extremely pleased to see that a company is dedicated to indexing as much IRC conversation as possible.
After providing and receiving loads of support in topics such as CSS, HTML, Actionscript, and PHP through IRC channels - I have always felt disheartened that the conversations were only real-time. At least now, more problems will be solved, and quicker.
- Tim.
Thanks, this will work great.
NO!
http://fakesteveballmer.blogspot.com
Anyone doing highly illegal things on IRC probably has either their own network and/or password-locked channel(s). And anyone who simply doesn’t want their conversations indexed can take similar measures. It’s like saying Google has a chilling effect on the web. Maybe it does, but only for people who don’t know how to take simple steps to hide their content. What I’m wondering is if IRSeek has the equivalent of robots.txt
How do they do this? Are they just having a bot join every channel they can find and log everything it sees? If so, it could easily be banned.
The C in IRC stands for “chat” - as in what humans do. Bots are (and have been) an abomination, sort of like answering machines talking to each other. Nobody (yet) archives your conversation on the street corner, in the office, or in a bar .. yet they want to archive your chat on a keyboard. It won’t work anyway; any sensible network will ban the bot from joining.
Lord have mercy! Most of IRC is completely and utterly nonsensical banter between people. Would you really be interested in a public search engine of everyone’s IM conversations? I mean, really? *yawn*
Sure, it could stand some value in support channels as #15 suggested, but should be an on-request or publicly announced feature of the channel, not some sneaky background logging done by someone’s dodgy bot.
I look forward to this company’s bots getting banned everywhere.
This is an awesome service. We use an IRC channel for the community to connect and communicate with one another, but you run into the same problems of “what was said about that policy?” having it be searchable is HUGE and an extremely valuable thing.
I requested that #productwiki gets added to their list of channels.
I would definitely respect this company if they had told the operators of the channels in question that they log them, or at least used the company name as the bot nickname. They didn’t, for obvious reasons: channel operators do not want people logging their channels and making the contents searchable. This is the same reason that people don’t want everyone to listen to their coffee break conversations. I hope they go bankrupt soon!
People still use IRC?
This service is wonderful !!!
@spif - I’m with you, this is questionable if it’s not with consent or at the very least disclosure. Yeah, yeah, we know what you do in public on the web can be archived. At some point there needs to be some respect in the data collection process.
this is just retarded, the idea of IRC is to ask a question and get an instant answer. If you just search you won’t get any results because its just banter. A big ass block of text
So all it does is index all ongoing chats? I am surprised why there was nothing like this earlier. That aside, it feels like eavesdropping.
It can be a useful utility to make sense of IRC Chats. But I would prefer something similar to a no-call list here. Networks/channels should get an option whether they want their chat indexed and archived, or not. And please be more transparent.
If you want something coherently incoherent from the irc lands, http://www.bash.org is the place to go.
seriously… IRC alone is just not a valuable or credit worthy information source.
what would this be used for mainly .. I can think of 2 things
1. stalking
2. reviews
probably the thing i hate most about google search is when i type in something relating to a specific technology issue and i get back a boatload of irrelevant results from listservs that have been posted online.
i can see this data source resulting in the same pain. too much noise, not enough signal.
if somebody could do natural language search on it, then it’d be sweet.
I’d like to know how to ban their bots. Nothing on their site. I know a lot of irc opers would g-line them.
I too would like to know the bot names. These bots are not welcome in the channels that I maintain.
all of the bots that have been found so far logging are on the host bzq-219-46-202.isdn.bezeqint.net. banning this host from your channel should stop the logging.
Just looked through all of the users in one of my channels and I do not see *.bezeqint.net. So they must be coming from multiple hosts. I do see a lot of their bots from this host, however. Just not in my channels.
IRC has an awfully small signal-to-noise ratio, even in the technical channels where lots of questions get answered. I could see the value for some channels but by and large this would not be of much use when applied to most IRC discussions. Then again, I don’t much care for IRC in the first place so maybe I don’t get it. I do share the opinion of other posters here that these bots should and will be banned by most channel ops.
My issue with this is that they did not ask prior permission.
There is also no information on the site on how to have all information removed from their database.
I know a lot of people say, well, Google crawls my site and I never gave them permission. Unfortunately there’s no “robots.txt” for IRC.
Other than that, people talk on irc with some confidence that their conversation is not going to be indexed by a bot and then crawled by google, which is what they are aiming at. Billions of pages of unique content generated for free in which to run google adsense on.
It’s definitely a quick buck scheme and I hope most IRC admins will ban the address globally from all networks they are associated with.
Pierce
Stuff like this makes me wanna vote for Arabs.
What about the irc users themselves? I only see responses from IRC and Channel operators that “seem” to know what’s best for their users without actually hearing what they have to say… I think they wouldn’t mind, and actually be happy that such a site exists…but that’s just me.. an irc-user.
This is awesome - well the idea of logging any chat is a bit voyeuristic….the IRC is known as a sort of shady back alley of the internet where people talk about the filthiest things and share the most exclusive (and expensive) files and applications. So these companies are effectively helping the RIAA and the FBI do their jobs by providing them ‘yet another’ resource for tapping into the black market.
I wouldn’t be surprised if federal agencies tap these types of websites for their source code - so they can develop and integrate it into their own search product for illegal content.
what a terrible idea. I am glad I don’t use IRC anymore, because when I did, I enjoyed talking fairly freely to others in the same channel. We didn’t lock it because we weren’t doing anything illegal and didn’t require it to be totally private, but we did have the (reasonable) expectation that our chat was limited to those in the channel at that moment. As someone pointed out, this is like recording people talking at a bar and putting it up online for posterity.
I find it disturbing that the bots connect via tor exit nodes with random human-like nicks and identify as mIRC clients. The service might be useful sometimes, but it’s still highly questionable to log without the explicit knowledge and permission of users.
Freenode has banned all users arriving via non-authenticated TOR, which is how most of these bots have connected.
@Gustaf: You’re right. Freenode is in contact with them to have them stop logging on freenode and remove existing logs also.
The german community is gathering information and knowledge on a wiki page: http://wiki.vorratsdatenspeicherung.de/IRSeeK
An english version should be in the process of creation, too.
As of about 2 hours ago the website is deactivated. They want to work at the issues together with the community. Freenode will put an official statement in their blog, I guess: http://blog.freenode.net
All I can say is any such indexing or archive bot that connects to my network without permission will be instantly BANNED!!
Networks and ultimately their users have the RIGHT to privacy.
How can you guys write comments? Don’t you know Google is indexing this? Why aren’t you guys being chilled by the chilling effect of Googlebot?
brrrr
Any idiot who thinks IRC channels are by default a secure communication channel is too stupid to have an opinion worth considering. This kind of archiving has been done for years, now decades. Everything you type publicly online is recorded somewhere. Wake up morons.
Every user (human or bot) must follow the rules set by the networks, servers and individual channels in order to participate without interruption.
Accessing these services is a privilege, not a right. That clearly means the rules should not be abused.
“IRSeek wants to make sure that future conversations are properly indexed and searchable.”
IRSeek has no authority over this matter regardless of their goals or agendas.
It is in fact up to the networks (e.g. efnet, freenode) or servers (e.g. arcti.ca, servercentral.net) or the channels (e.g. #css) to make that call.